
Fundamenta Informaticae XX (2002) 1–30

IOS Press

Residual Finite State Automata

François Denis
LIF, UMR 6166 CNRS, Université de Provence, Marseille

Aurélien Lemay, Alain Terlutte
GRAPPA-LIFL, Université de Lille I

Abstract. We define a new variety of Nondeterministic Finite Automata (NFA): a Residual Finite State Automaton (RFSA) is an NFA all the states of which define residual languages of the language $L$ that it recognizes; a residual language according to a word $u$ is the set of words $v$ such that $uv$ is in $L$. We prove that every regular language is recognized by a unique (canonical) RFSA which has a minimal number of states and a maximal number of transitions. Canonical RFSAs are based on the notion of prime residual languages, i.e. residual languages that are not the union of other residual languages. We provide an algorithmic construction of the canonical RFSA from a given NFA. We study the size of canonical RFSAs and the complexity of our constructions.

1. Introduction

Regular languages are among the most studied objects in formal language theory. Deterministic finite automata (DFA) and nondeterministic finite automata (NFA) are two basic types of representation of regular languages. DFAs have many good properties: most classical constructions are polynomial and there exists a unique minimal element for a given regular language. Moreover, the Myhill-Nerode theorem shows that the states of a DFA correspond to natural components of the language it recognizes: its residual languages, also called Brzozowski derivatives or left quotients (here, we shall use the first name). NFAs are a generalization of DFAs which lose most properties of DFAs but gain concision. For example, the minimal DFA which recognizes the language $\Sigma^* a \Sigma^n$, with $\Sigma = \{a, b\}$, has $2^{n+1}$ states while a minimal equivalent NFA has only $n+2$ states. However, there may exist several non-isomorphic equivalent minimal NFAs, and states of NFAs may correspond to no natural component of the language they recognize.

Email: {lemay, terlutte}@lifl.fr

2 F. Denis, A. Lemay, A. Terlutte / Residual Finite State Automata

Both kinds of property, concision and the fact that states are based on natural components of the associated language, can be necessary in certain application fields such as Grammatical Inference. The main goal of Grammatical Inference is to identify some target language from examples of this language, i.e. words together with a piece of information indicating whether each word belongs to the language. General machine learning properties show that targets with short representations should be identified from fewer examples than others. On the other hand, inference algorithms try to detect properties of the target language from properties of some of its examples in order to build some representation of it. However, it has been shown that regular languages can be polynomially identified from given data using DFA representations [7, 8] but that they cannot be identified under the same conditions using NFA representations. As a consequence, languages as simple as $\Sigma^* a \Sigma^n$ cannot be inferred efficiently by inference algorithms using DFA representations, and NFA representations cannot be used. Hence, it is a natural goal to look for intermediate representations having both kinds of properties.

In this paper, we consider NFAs all the states of which define residual languages of the language they recognize. We call such automata Residual Finite State Automata (RFSA). RFSAs were introduced in [5]. Clearly, all DFAs are RFSAs but the converse is false. We show that we can naturally associate with every regular language $L$ an RFSA which has a minimal number of states and which we call the canonical RFSA of $L$: each of its states is associated with a prime residual language, i.e. a residual language which is not the union of other residual languages. We provide an algorithmic construction of the canonical RFSA equivalent to a given NFA, which stems from the classical subset construction used to build the minimal DFA. We give some results on the size of RFSAs: for example, there are canonical RFSAs exponentially larger (resp. smaller) than equivalent minimal NFAs (resp. DFAs). Then, we study RFSAs over a one-letter alphabet and we show that the gap between the sizes of minimal DFAs and canonical RFSAs is quadratic in the worst case; recall that the gap between the sizes of minimal DFAs and NFAs can be superpolynomial. Finally, we give some complexity results which show that natural constructions and decision problems are PSPACE-complete.

In Section 2, we recall classical definitions and notations about regular languages and automata. We define RFSAs in Section 3 and we study their properties in Section 4. In particular, we introduce the notion of canonical RFSA. The construction of the canonical RFSA using the subset method is given in Section 5. In Section 6, we study some particular (and pathological) RFSAs. RFSAs over a one-letter alphabet are studied in Section 7. Section 8 is devoted to the study of the complexity of decision and construction problems on RFSAs. We conclude in Section 9.

2. Preliminaries

In this section, we recall some definitions on finite automata. For more information, we invite the reader to consult [9, 14].

2.1. Automata and languages

Let $\Sigma$ be a finite alphabet, and let $\Sigma^*$ be the set of words on $\Sigma$. We denote by $\varepsilon$ the empty word and by $|u|$ the length of a word $u$. For an integer $n$, we define $\Sigma^n = \{u \in \Sigma^* \mid |u| = n\}$ and $\Sigma^{\le n} = \{u \in \Sigma^* \mid |u| \le n\}$. A language is a subset of $\Sigma^*$.

A nondeterministic finite automaton (NFA) is a quintuple $A = \langle \Sigma, Q, Q_0, F, \delta \rangle$ where $Q$ is a finite set of states, $Q_0 \subseteq Q$ is the set of initial states, $F \subseteq Q$ is the set of final states, and $\delta$ is the transition function of the automaton, defined from a subset of $Q \times \Sigma$ to $2^Q$. We also denote by $\delta$ the extended transition function defined from a subset of $2^Q \times \Sigma^*$ to $2^Q$ by
• $\delta(\{q\}, \varepsilon) = \{q\}$,
• $\delta(\{q\}, x) = \delta(q, x)$, where $x \in \Sigma$,
• $\delta(Q', u) = \bigcup_{q \in Q'} \delta(\{q\}, u)$, where $Q' \subseteq Q$ and $u \in \Sigma^*$,
• $\delta(\{q\}, ux) = \delta(\delta(q, u), x)$, where $x \in \Sigma$ and $u \in \Sigma^*$.

An NFA is deterministic (DFA) if $Q_0$ contains only one element $q_0$ and if $\forall q \in Q$, $\forall x \in \Sigma$, $\mathrm{Card}(\delta(q, x)) \le 1$. An NFA is trimmed if and only if $\forall q \in Q$, $\exists w_1 \in \Sigma^*$, $q \in \delta(Q_0, w_1)$ and $\exists w_2 \in \Sigma^*$, $\delta(q, w_2) \cap F \ne \emptyset$. A state $q$ is reachable by the word $u$ if $q \in \delta(Q_0, u)$.

A word $u \in \Sigma^*$ is recognized by an NFA $A = \langle \Sigma, Q, Q_0, F, \delta \rangle$ if $\delta(Q_0, u) \cap F \ne \emptyset$, and the language recognized by $A$ is $L_A = \{u \in \Sigma^* \mid \delta(Q_0, u) \cap F \ne \emptyset\}$. Let $Q' \subseteq Q$. We denote by $L_{A,Q'}$ the language $\{v \in \Sigma^* \mid \delta(Q', v) \cap F \ne \emptyset\}$. So, $L_A = L_{A,Q_0}$. When $Q'$ contains exactly one state $q$, we simply denote $L_{A,Q'}$ by $L_{A,q}$. We denote by $\mathrm{Rec}(\Sigma^*)$ the class of recognizable languages. It can be proved that every recognizable language can be recognized by a DFA. There exists a unique minimal DFA that recognizes a given recognizable language (minimal with regard to the number of states and unique up to isomorphism). Finally, the Kleene theorem [10] proves that the class of regular languages $\mathrm{Reg}(\Sigma^*)$ is identical to $\mathrm{Rec}(\Sigma^*)$.
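The extended transition function and the acceptance condition above can be sketched in a few lines of Python. This is only an illustration: the dictionary encoding of $\delta$, the integer state names and the sample automaton (a three-state NFA for $\Sigma^* a \Sigma$ over $\{a, b\}$) are assumptions made for the example, not notation from the paper.

```python
from typing import Dict, Iterable, Set, Tuple

# Assumed encoding: delta maps (state, letter) to a set of successor states.
Delta = Dict[Tuple[int, str], Set[int]]


def step(delta: Delta, states: Iterable[int], word: str) -> Set[int]:
    """Extended transition function: states reached from `states` by reading `word`."""
    current = set(states)
    for x in word:
        current = {r for q in current for r in delta.get((q, x), set())}
    return current


def accepts(delta: Delta, initial: Set[int], final: Set[int], word: str) -> bool:
    """A word u is recognized iff delta(Q0, u) intersects F."""
    return bool(step(delta, initial, word) & final)


# Hypothetical NFA for Sigma* a Sigma: state 0 loops on every letter,
# guesses the penultimate 'a', then reads exactly one more letter.
DELTA: Delta = {(0, "a"): {0, 1}, (0, "b"): {0}, (1, "a"): {2}, (1, "b"): {2}}
INITIAL, FINAL = {0}, {2}

print(accepts(DELTA, INITIAL, FINAL, "bab"))  # penultimate letter is 'a'
print(accepts(DELTA, INITIAL, FINAL, "abb"))  # penultimate letter is 'b'
```

Reading the empty word leaves the set of states unchanged, which matches the first clause of the definition.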

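The reversal $u^R$ of a word $u$ is defined inductively by $\varepsilon^R = \varepsilon$ and $(xv)^R = v^R x$ for $x \in \Sigma$ and $v \in \Sigma^*$. The reversal $L^R$ of a language $L \subseteq \Sigma^*$ is defined by $L^R = \{u^R \in \Sigma^* \mid u \in L\}$. The reversal $A^R$ of an NFA $A = \langle \Sigma, Q, Q_0, F, \delta \rangle$ is defined by $A^R = \langle \Sigma, Q, F, Q_0, \delta^R \rangle$ with $q \in \delta^R(q', x)$ if and only if $q' \in \delta(q, x)$. We have $(L_A)^R = L_{A^R}$.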
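As a small illustration of the reversal of an automaton, the following Python sketch flips every transition and swaps initial and final states, then checks $(L_A)^R = L_{A^R}$ on all words up to a bounded length. The dictionary encoding and the sample NFA for $\Sigma^* a \Sigma$ are assumptions made for the example.

```python
from itertools import product


def step(delta, states, word):
    """States reached from `states` by reading `word`."""
    current = set(states)
    for x in word:
        current = {r for q in current for r in delta.get((q, x), set())}
    return current


def reverse_nfa(delta, initial, final):
    """Return (delta_R, Q0_R, F_R): arrows flipped, initial and final states swapped."""
    rdelta = {}
    for (q, x), targets in delta.items():
        for r in targets:
            rdelta.setdefault((r, x), set()).add(q)
    return rdelta, set(final), set(initial)


# Hypothetical NFA for Sigma* a Sigma over {a, b}.
DELTA = {(0, "a"): {0, 1}, (0, "b"): {0}, (1, "a"): {2}, (1, "b"): {2}}
RDELTA, RINITIAL, RFINAL = reverse_nfa(DELTA, {0}, {2})


def reversal_agrees(max_len=5):
    """Check that u is in L_A iff the mirror of u is in L_{A^R}, for |u| <= max_len."""
    for n in range(max_len + 1):
        for letters in product("ab", repeat=n):
            u = "".join(letters)
            in_l = bool(step(DELTA, {0}, u) & {2})
            in_lr = bool(step(RDELTA, RINITIAL, u[::-1]) & RFINAL)
            if in_l != in_lr:
                return False
    return True


print(reversal_agrees())
```

The bounded check is of course not a proof; it only exercises the definition on short words.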

2.2. Residual Languages

Let $L$ be a language over $\Sigma^*$ and let $u \in \Sigma^*$. The residual language of $L$ with regard to $u$ is defined by $u^{-1}L = \{v \in \Sigma^* \mid uv \in L\}$ and we say that $u$ is a characterizing word for $u^{-1}L$. The Myhill-Nerode theorem [11, 12] proves that the set of distinct residual languages of a language $L$ is finite if and only if $L$ is regular. Automata and residual languages are linked by the following properties.

Let $A = \langle \Sigma, Q, Q_0, F, \delta \rangle$ be an NFA. For any state $q \in Q$ and any word $u \in \Sigma^*$, if $q \in \delta(Q_0, u)$, then $L_{A,q} \subseteq u^{-1}L_A$.

Let $A = \langle \Sigma, Q, q_0, F, \delta \rangle$ be a trimmed DFA.
• For every non-empty residual language $u^{-1}L_A$, there exists a state $q \in Q$ such that $L_{A,q} = u^{-1}L_A$.
• For every state $q \in Q$, there exists a residual language $u^{-1}L_A$ such that $u^{-1}L_A = L_{A,q}$.
Furthermore, if $A$ is the minimal DFA, the correspondence between states of $A$ and non-empty residual languages of $L_A$ is bijective.
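To make the link between states and residual languages concrete, here is a hedged Python sketch that compares, on words of bounded length, the language of a state $q \in \delta(Q_0, u)$ with the residual $u^{-1}L_A$. The helper names, the length bound and the sample NFA for $\Sigma^* a \Sigma$ are assumptions introduced for the illustration; a bounded comparison can refute an inclusion but cannot prove it.

```python
from itertools import product


def step(delta, states, word):
    """States reached from `states` by reading `word`."""
    current = set(states)
    for x in word:
        current = {r for q in current for r in delta.get((q, x), set())}
    return current


def words(alphabet, max_len):
    """All words over `alphabet` of length at most `max_len`."""
    for n in range(max_len + 1):
        for letters in product(alphabet, repeat=n):
            yield "".join(letters)


def language_upto(delta, states, final, alphabet, max_len):
    """Words of length <= max_len accepted when starting from `states`."""
    return {v for v in words(alphabet, max_len) if step(delta, states, v) & final}


def residual_upto(delta, initial, final, u, alphabet, max_len):
    """Bounded view of u^{-1} L_A: start from delta(Q0, u) instead of Q0."""
    return language_upto(delta, step(delta, initial, u), final, alphabet, max_len)


# Hypothetical NFA for Sigma* a Sigma over {a, b}.
DELTA = {(0, "a"): {0, 1}, (0, "b"): {0}, (1, "a"): {2}, (1, "b"): {2}}
INITIAL, FINAL = {0}, {2}

res_a = residual_upto(DELTA, INITIAL, FINAL, "a", "ab", 2)
state_1 = language_upto(DELTA, {1}, FINAL, "ab", 2)
print(sorted(res_a), sorted(state_1), state_1 <= res_a)
```

State 1 is reached by the word $a$, and its language is included in $a^{-1}L_A$ on this sample, but the inclusion is strict: an NFA state need not define a residual language.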


3. Definition of Residual Finite State Automaton

Definition 3.1. A Residual Finite State Automaton (RFSA) is an NFA $A = \langle \Sigma, Q, Q_0, F, \delta \rangle$ such that, for each state $q \in Q$, $L_{A,q}$ is a residual language of $L_A$. More formally, $\forall q \in Q$, $\exists u \in \Sigma^*$ such that $L_{A,q} = u^{-1}L_A$.

Trimmed DFAs are RFSAs. So, every regular language is recognized by an RFSA.

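Example 3.1. Let $L = \Sigma^* a \Sigma$ where $\Sigma = \{a, b\}$. This language is recognized by the following automata $A_1$, $A_2$ and $A_3$ (Figures 1, 2, 3):
• $A_1$ is an NFA which is neither a DFA nor an RFSA. The languages associated with its states are $L_{A_1,q_0} = \Sigma^* a \Sigma$, $L_{A_1,q_1} = \Sigma$, $L_{A_1,q_2} = \{\varepsilon\}$. As for every $u$ in $\Sigma^*$ we have $uL \subseteq L$, and so $L \subseteq u^{-1}L$, neither $L_{A_1,q_1}$ nor $L_{A_1,q_2}$ is a residual language.
• $A_2$ is the minimal DFA that recognizes $L$. $A_2$ is also an RFSA. We have $L_{A_2,q_0} = \Sigma^* a \Sigma = \varepsilon^{-1}L$, $L_{A_2,q_1} = \Sigma^* a \Sigma \cup \Sigma = a^{-1}L$, $L_{A_2,q_2} = \Sigma^* a \Sigma \cup \Sigma \cup \{\varepsilon\} = (aa)^{-1}L$, $L_{A_2,q_3} = \Sigma^* a \Sigma \cup \{\varepsilon\} = (ab)^{-1}L$.
• $A_3$ is an RFSA. Indeed, we have $L_{A_3,q_0} = \varepsilon^{-1}L$, $L_{A_3,q_1} = a^{-1}L$, $L_{A_3,q_2} = (ab)^{-1}L$. One can notice that $A_3$ is not a DFA.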

Definition 3.2. Let $A = \langle \Sigma, Q, Q_0, F, \delta \rangle$ be an RFSA, and let $q$ be a state of $A$. The word $u$ is a characterizing word for $q$ if $L_{A,q} = u^{-1}L_A$. The automaton $A$ is consistent if each state $q$ is reachable by a characterizing word for $q$. We say that $A$ is strongly consistent if each state $q$ is reachable by every characterizing word for $q$.

Examples of an RFSA which is not consistent and of an RFSA which is consistent but not strongly consistent are shown in Figure 4 and Figure 5.

If, for every state $q$ of an NFA $A$, there exists a word $u_q$ that only leads to $q$ (that is, such that $\delta(Q_0, u_q) = \{q\}$), then $A$ is an RFSA since $L_{A,q} = u_q^{-1}L_A$. However, the next example shows that the converse is false.

Example 3.2. Let $L = a^* b^* + b^* a^*$ and consider the automaton described in Figure 6. This automaton is a strongly consistent RFSA. But we can observe that there exists no word $u$ such that $\delta(Q_0, u) = \{q_0\}$.

4. Properties of Residual Finite State Automata

4.1. General Properties

Definition 4.1. Let $L$ be a regular language over $\Sigma$ and let $u \in \Sigma^*$. The residual language $u^{-1}L$ is prime if it is not equal to the union of the residual languages it strictly contains: letting $R_u = \{v \in \Sigma^* \mid v^{-1}L \subsetneq u^{-1}L\}$, $u^{-1}L$ is prime if

$$\bigcup_{v \in R_u} v^{-1}L \subsetneq u^{-1}L.$$


Figure 1. $A_1$ is an automaton recognizing $\Sigma^* a \Sigma$; it is neither a DFA nor an RFSA.

Figure 2. $A_2$ is the minimal DFA recognizing $\Sigma^* a \Sigma$.

Figure 3. $A_3$ is the canonical RFSA recognizing $\Sigma^* a \Sigma$.

Figure 4. An RFSA which recognizes $L = \{\varepsilon, a\}$. It is not consistent as the only word which reaches $q_0$ is $\varepsilon$ but $L_{q_0} \ne \varepsilon^{-1}L$.

Figure 5. An RFSA which is consistent but not strongly consistent.

Figure 6. An RFSA recognizing the language $a^* b^* + b^* a^*$.

A residual language is composite if it is not prime. A state $q$ of an RFSA $A$ is prime (resp. composite) if the residual language $L_{A,q}$ is prime (resp. composite).

Note that a prime residual language is not empty and that the set of distinct prime residual languages of a regular language is finite.
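For a finite language, residual languages and primality can be computed directly from the definitions. The Python sketch below does so under stated assumptions: the language is given as a set of strings, only non-empty residuals are enumerated (the empty residual is never prime), and the sample language $\{xa, yb, za, zb\}$ is a hypothetical example chosen so that exactly one residual is composite.

```python
def residual(lang, u):
    """u^{-1}L for a finite language given as a set of strings."""
    return frozenset(w[len(u):] for w in lang if w.startswith(u))


def nonempty_residuals(lang):
    """Distinct non-empty residuals: one per prefix of a word of the language."""
    prefixes = {w[:i] for w in lang for i in range(len(w) + 1)}
    return {residual(lang, u) for u in prefixes}


def is_prime(r, all_residuals):
    """r is prime iff the union of the residuals it strictly contains differs from r."""
    strictly_smaller = [s for s in all_residuals if s < r]
    return frozenset().union(*strictly_smaller) != r


# Hypothetical sample language: z^{-1}L = {a, b} is the union of x^{-1}L and y^{-1}L.
LANG = frozenset({"xa", "yb", "za", "zb"})
RES = nonempty_residuals(LANG)
PRIMES = {r for r in RES if is_prime(r, RES)}

print(len(RES), len(PRIMES))
print(is_prime(frozenset({"a", "b"}), RES))
```

Here the language itself, $\{a\}$, $\{b\}$ and $\{\varepsilon\}$ are prime, while $\{a, b\}$ is composite.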

Proposition 4.1. Let $A = \langle \Sigma, Q, Q_0, F, \delta \rangle$ be an RFSA. For each prime residual language $u^{-1}L_A$, there exists a state $q \in \delta(Q_0, u)$ such that $L_{A,q} = u^{-1}L_A$.

Proof: As $u^{-1}L_A$ is prime, $\delta(Q_0, u)$ is not empty. Let $\delta(Q_0, u) = \{q_1, \ldots, q_s\}$ and let $v_1, \ldots, v_s$ be words such that $L_{A,q_i} = v_i^{-1}L_A$ for every $1 \le i \le s$. We have

$$u^{-1}L_A = \bigcup_{i=1}^{s} L_{A,q_i} = \bigcup_{i=1}^{s} v_i^{-1}L_A.$$

As $u^{-1}L_A$ is prime, there exists $v_k$ such that $u^{-1}L_A = v_k^{-1}L_A = L_{A,q_k}$. □

As a corollary, an RFSA $A$ has at least as many states as the number of prime residual languages of $L_A$.

4.2. Saturation operator

We define a saturation operator which may add transitions and initial states to an NFA without modifying the language it recognizes.

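Definition 4.2. Let $A = \langle \Sigma, Q, Q_0, F, \delta \rangle$ be an NFA. The saturation of $A$ is the automaton $A^s = \langle \Sigma, Q, Q_0^s, F, \delta^s \rangle$ where $Q_0^s = \{q \in Q \mid L_{A,q} \subseteq L_A\}$ and $\delta^s(q, x) = \{q' \in Q \mid x L_{A,q'} \subseteq L_{A,q}\}$ for $q \in Q$ and $x \in \Sigma$. We say that an automaton $A$ is saturated if $A = A^s$.

Lemma 4.1. Let $A_1$ and $A_2$ be two NFAs over $\Sigma$ sharing the same set of states $Q$. If $L_{A_1} = L_{A_2}$ and if for every state $q \in Q$, $L_{A_1,q} = L_{A_2,q}$, then $A_1^s = A_2^s$.

Proof: Let $A_1 = \langle \Sigma, Q, Q_{0,1}, F_1, \delta_1 \rangle$ and $A_2 = \langle \Sigma, Q, Q_{0,2}, F_2, \delta_2 \rangle$. The state $q$ is initial in $A_1^s$ iff $L_{A_1,q} \subseteq L_{A_1}$, i.e. iff $q$ is initial in $A_2^s$. In the same way, $q' \in \delta_1^s(q, x)$ in $A_1^s$ iff $x L_{A_1,q'} \subseteq L_{A_1,q}$, i.e. iff $q' \in \delta_2^s(q, x)$ in $A_2^s$. Finally, the state $q$ is terminal in $A_1^s$ iff $\varepsilon \in L_{A_1,q}$, i.e. iff $q$ is terminal in $A_2^s$. □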


Proposition 4.2. Let $A = \langle \Sigma, Q, Q_0, F, \delta \rangle$ be an NFA and let $A^s = \langle \Sigma, Q, Q_0^s, F, \delta^s \rangle$ be the saturation of $A$. For each $q$ in $Q$, we have $L_{A,q} = L_{A^s,q}$.

Proof: Clearly, $L_{A,q} \subseteq L_{A^s,q}$ as the saturation of an automaton is obtained by adding transitions and initial states. The converse inclusion can be proved by induction on the length of the words of $L_{A^s,q}$. □

We can then deduce the following corollaries.

Corollary 4.1. Let $A$ be an NFA and $A^s$ be its saturation. Then $A$ and $A^s$ recognize the same language and $A^s = (A^s)^s$.

Corollary 4.2. If $A$ is an RFSA, then $A^s$ is also an RFSA.
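The saturation operator can be sketched in Python once language inclusion between sets of states is available. The sketch below is an illustration under assumptions: inclusion is decided by a simultaneous subset exploration (exponential in the worst case), the condition $x L_{A,q'} \subseteq L_{A,q}$ is tested in the equivalent form $L_{A,q'} \subseteq L_{A,\delta(q,x)}$, and the sample automaton is an encoding of the minimal DFA for $\Sigma^* a \Sigma$ reconstructed from the state languages listed in Example 3.1.

```python
def succ(delta, states, x):
    """One-letter successor of a set of states."""
    return frozenset(r for q in states for r in delta.get((q, x), ()))


def included(delta, final, alphabet, left, right):
    """True iff the language of the state set `left` is included in that of `right`."""
    start = (frozenset(left), frozenset(right))
    seen, todo = {start}, [start]
    while todo:
        p, s = todo.pop()
        if (p & final) and not (s & final):
            return False  # some word is accepted from `left` but not from `right`
        for x in alphabet:
            nxt = (succ(delta, p, x), succ(delta, s, x))
            if nxt not in seen:
                seen.add(nxt)
                todo.append(nxt)
    return True


def saturate(states, alphabet, delta, initial, final):
    """Return (delta_s, Q0_s) following Definition 4.2."""
    new_initial = {q for q in states
                   if included(delta, final, alphabet, {q}, initial)}
    new_delta = {}
    for q in states:
        for x in alphabet:
            target = succ(delta, {q}, x)  # recognizes x^{-1} L_{A,q}
            new_delta[(q, x)] = {r for r in states
                                 if included(delta, final, alphabet, {r}, target)}
    return new_delta, new_initial


# Assumed encoding of the minimal DFA for Sigma* a Sigma (states 0..3).
STATES, ALPHABET = {0, 1, 2, 3}, "ab"
DELTA = {(0, "a"): {1}, (0, "b"): {0}, (1, "a"): {2}, (1, "b"): {3},
         (2, "a"): {2}, (2, "b"): {3}, (3, "a"): {1}, (3, "b"): {0}}
INITIAL, FINAL = {0}, frozenset({2, 3})

SAT_DELTA, SAT_INITIAL = saturate(STATES, ALPHABET, DELTA, INITIAL, FINAL)
print(SAT_INITIAL, sum(len(t) for t in SAT_DELTA.values()))
```

On this example the saturation keeps the single initial state and raises the number of transitions from 8 to 18.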

It might have seemed better to define the saturation operator another way: in order to saturate an NFA, add initial states and transitions as long as this operation does not modify the language recognized by the automaton. Unfortunately, this procedure does not lead to a unique NFA in the general case: in the automaton defined in Figure 4, it is possible to add a transition labelled by $a$ from $q_1$ to $q_0$, or from $q_0$ to $q_2$, without modifying the language, but not both simultaneously. However, we have the following lemma:

Lemma 4.2. Let $A$ be a consistent RFSA, let $q, q'$ be two states of $A$ and let $x$ be a letter such that adding the transition $(q, x, q')$ to $A$ does not modify the language recognized by $A$. Then, $x L_{A,q'} \subseteq L_{A,q}$.

Proof: Let $v \in L_{A,q'}$ and let $u$ be such that $L_{A,q} = u^{-1}L_A$ and $q$ is reachable by $u$. As adding the transition $(q, x, q')$ to $A$ does not modify the language recognized by $A$, the word $uxv$ belongs to $L_A$. Then, $xv \in u^{-1}L_A = L_{A,q}$. Therefore, $x L_{A,q'} \subseteq L_{A,q}$. □

Corollary 4.3. In order to compute the saturation of a consistent RFSA $A$, the following procedure can be applied: add initial states and transitions as long as the recognized language is not modified.

Proof: A state $q$ of $A$ can become an initial state without changing the recognized language if and only if $L_{A,q} \subseteq L_A$. The result follows directly from this remark and the previous lemma. □

There are saturated RFSAs which are not consistent (see Figure 7). Moreover, there are consistent saturated RFSAs which are not strongly consistent (see Figure 8). However, we have the following result.

Proposition 4.3. If $A$ is a saturated RFSA and if $q$ is a prime state of $A$, then for every word $u$ such that $L_{A,q} = u^{-1}L_A$, $q$ is reachable by $u$.

Proof: If $u = \varepsilon$ and $L_{A,q} = u^{-1}L_A$, we have $L_{A,q} \subseteq L_A$ and as $A$ is saturated, $q$ is initial.


Figure 7. A saturated inconsistent RFSA: the state $q_3$ defines the residual language of $bb$ but it is not reachable by $bb$.

Figure 8. A saturated consistent RFSA which is not strongly consistent; $aa$ and $bb$ define the same residual language $\{a, b\}$ but the state $q_5$ is not reachable by $bb$.


Suppose now that $L_{A,q} = (ux)^{-1}L_A$, where $x \in \Sigma$. As $L_{A,q}$ is a prime residual language, by Proposition 4.1, there exists $q'' \in \delta(Q_0, ux)$ such that $L_{A,q''} = (ux)^{-1}L_A$. Let $q'$ be such that $q' \in \delta(Q_0, u)$ and $q'' \in \delta(q', x)$.

Let $v \in L_{A,q}$. As $L_{A,q} = L_{A,q''}$, $xv \in L_{A,q'}$. That is, $x L_{A,q} \subseteq L_{A,q'}$ and as $A$ is saturated, $q \in \delta(q', x)$. Therefore, $q$ is reachable by $ux$ in $A$. □

4.3. Reduction operator $\phi$

We define a reduction operator $\phi$ which may delete states in an NFA without changing the language it recognizes.

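Definition 4.3. Let $A = \langle \Sigma, Q, Q_0, F, \delta \rangle$ be an NFA, and let $q$ be a state of $Q$. We denote by $R(q)$ the set $\{q' \in Q \setminus \{q\} \mid L_{A,q'} \subseteq L_{A,q}\}$. We say that $q$ is erasable in $A$ if $L_{A,q} = \bigcup_{q' \in R(q)} L_{A,q'}$.

If $q$ is erasable, we define $\phi(A, q) = \langle \Sigma, Q', Q'_0, F', \delta' \rangle$ where:
• $Q' = Q \setminus \{q\}$,
• $Q'_0 = Q_0$ if $q \notin Q_0$, and $Q'_0 = (Q_0 \setminus \{q\}) \cup R(q)$ otherwise,
• $F' = F \cap Q'$,
• for every $q' \in Q'$ and every $x \in \Sigma$, $\delta'(q', x) = \delta(q', x)$ if $q \notin \delta(q', x)$, and $\delta'(q', x) = (\delta(q', x) \setminus \{q\}) \cup R(q)$ otherwise.

If $q$ is not erasable, we define $\phi(A, q) = A$.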

Note that if $A$ is a saturated NFA and if $q$ is an erasable state in $A$, then $\phi(A, q)$ is obtained by deleting $q$ and its associated transitions from $A$; no transitions are added.

Definition 4.4. Let $A$ be an NFA. If there is no erasable state in $A$, we say that $A$ is reduced.
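The reduction operator of Definition 4.3 can be sketched along the same lines. In the hedged Python illustration below, the inclusion test is a simultaneous subset exploration, the function names are hypothetical, and the sample automaton is an assumed encoding of the minimal DFA for $\Sigma^* a \Sigma$ reconstructed from Example 3.1, in which state 2 (the residual of $aa$) is erasable.

```python
from itertools import product


def succ(delta, states, x):
    """One-letter successor of a set of states."""
    return frozenset(r for q in states for r in delta.get((q, x), ()))


def included(delta, final, alphabet, left, right):
    """True iff the language of the state set `left` is included in that of `right`."""
    start = (frozenset(left), frozenset(right))
    seen, todo = {start}, [start]
    while todo:
        p, s = todo.pop()
        if (p & final) and not (s & final):
            return False
        for x in alphabet:
            nxt = (succ(delta, p, x), succ(delta, s, x))
            if nxt not in seen:
                seen.add(nxt)
                todo.append(nxt)
    return True


def reduce_state(states, alphabet, delta, initial, final, q):
    """phi(A, q): erase q when its language is the union of the languages of R(q)."""
    rq = {r for r in states
          if r != q and included(delta, final, alphabet, {r}, {q})}
    if not included(delta, final, alphabet, {q}, rq):
        return states, delta, initial, final  # q is not erasable

    def redirect(targets):
        # Replace q by R(q) wherever q occurs.
        return (set(targets) - {q}) | rq if q in targets else set(targets)

    new_delta = {(p, x): redirect(t) for (p, x), t in delta.items() if p != q}
    return states - {q}, new_delta, redirect(initial), final - {q}


def accepts(delta, initial, final, word):
    current = frozenset(initial)
    for x in word:
        current = succ(delta, current, x)
    return bool(current & final)


STATES, ALPHABET = {0, 1, 2, 3}, "ab"
DELTA = {(0, "a"): {1}, (0, "b"): {0}, (1, "a"): {2}, (1, "b"): {3},
         (2, "a"): {2}, (2, "b"): {3}, (3, "a"): {1}, (3, "b"): {0}}
INITIAL, FINAL = {0}, frozenset({2, 3})

NEW_STATES, NEW_DELTA, NEW_INITIAL, NEW_FINAL = reduce_state(
    STATES, ALPHABET, DELTA, INITIAL, FINAL, 2)
SAME_LANGUAGE = all(
    accepts(DELTA, INITIAL, FINAL, "".join(w))
    == accepts(NEW_DELTA, NEW_INITIAL, NEW_FINAL, "".join(w))
    for n in range(6) for w in product(ALPHABET, repeat=n))
print(sorted(NEW_STATES), NEW_DELTA[(1, "a")], SAME_LANGUAGE)
```

Because the input here is not saturated, the transition that entered the erased state is redirected to $R(q)$; on a saturated automaton the same call would only delete the state.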

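Proposition 4.4. Let $A$ be an NFA, $q$ a state of $A$ and $A' = \phi(A, q)$. For every state $q' \in Q \setminus \{q\}$, we have $L_{A,q'} = L_{A',q'}$. As a consequence, $L_A = L_{A'}$.

Proof: If $q$ is not an erasable state, the proposition is straightforward. Suppose now that $q$ is an erasable state. Let $A = \langle \Sigma, Q, Q_0, F, \delta \rangle$ and $\phi(A, q) = A' = \langle \Sigma, Q', Q'_0, F', \delta' \rangle$.

We first prove by induction on $n$ that, for every state $\bar q \ne q$ and every integer $n$,

$$L_{A,\bar q} \cap \Sigma^{\le n} \subseteq L_{A',\bar q} \cap \Sigma^{\le n}.$$

For $n = 0$, if $\varepsilon \in L_{A,\bar q}$, then $\bar q \in F$ and as $\bar q \ne q$, $\bar q \in F'$ and $\varepsilon \in L_{A',\bar q}$.

For $n \ge 1$, let $u = x\bar u \in L_{A,\bar q}$ where $u \in \Sigma^{\le n}$ and $x \in \Sigma$ (so that $\bar u \in \Sigma^{\le n-1}$), and let $\bar q_1 \in Q$ be such that $\bar q_1 \in \delta(\bar q, x)$ and $\delta(\{\bar q_1\}, \bar u) \cap F \ne \emptyset$.
• If $\bar q_1 \in Q'$, we can apply the inductive hypothesis: $\bar u \in L_{A',\bar q_1}$ and as $\bar q_1 \in \delta'(\bar q, x)$, we have $u \in L_{A',\bar q}$.
• Otherwise, $\bar q_1 = q$ and there exists a state $\bar{\bar q}_1 \ne q$ such that $L_{A,\bar{\bar q}_1} \subseteq L_{A,q}$ and $\bar u \in L_{A,\bar{\bar q}_1}$. Using the inductive hypothesis again, we have $\bar u \in L_{A',\bar{\bar q}_1}$ and as $\bar{\bar q}_1 \in \delta'(\bar q, x)$, we have $u \in L_{A',\bar q}$.

So, we have shown that for every state $\bar q \ne q$, $L_{A,\bar q} \subseteq L_{A',\bar q}$.

We now prove by induction that for every state $\bar q \ne q$ and every integer $n$,

$$L_{A',\bar q} \cap \Sigma^{\le n} \subseteq L_{A,\bar q} \cap \Sigma^{\le n}.$$

For $n = 0$, if $\varepsilon \in L_{A',\bar q}$, then $\bar q \in F' \subseteq F$ and then $\varepsilon \in L_{A,\bar q}$.

For $n \ge 1$, let $u = x\bar u \in L_{A',\bar q}$ where $u \in \Sigma^{\le n}$ and $x \in \Sigma$, and let $\bar q_1 \in Q'$ be such that $\bar q_1 \in \delta'(\bar q, x)$ and $\delta'(\{\bar q_1\}, \bar u) \cap F' \ne \emptyset$. We can apply the inductive hypothesis: $\bar u \in L_{A,\bar q_1}$. If $\bar q_1 \in \delta(\bar q, x)$, we directly have $u \in L_{A,\bar q}$. Otherwise, we have $L_{A,\bar q_1} \subseteq L_{A,q}$ and $q \in \delta(\bar q, x)$, so we also have $u \in L_{A,\bar q}$.

So, we have shown that for every state $\bar q \ne q$, $L_{A',\bar q} = L_{A,\bar q}$.

We show that $A$ and $A'$ recognize the same language by studying the two following cases.
• If $q \notin Q_0$, we have $L_A = \bigcup_{q' \in Q_0} L_{A,q'} = \bigcup_{q' \in Q_0} L_{A',q'} = \bigcup_{q' \in Q'_0} L_{A',q'} = L_{A'}$.
• If $q \in Q_0$, we have

$$\begin{aligned}
L_A &= \bigcup_{q' \in Q_0} L_{A,q'} \\
&= \Big(\bigcup_{q' \in Q_0,\, q' \ne q} L_{A,q'}\Big) \cup L_{A,q} \\
&= \Big(\bigcup_{q' \in Q_0,\, q' \ne q} L_{A,q'}\Big) \cup \Big(\bigcup_{\bar q \in R(q)} L_{A,\bar q}\Big) \\
&= \Big(\bigcup_{q' \in Q_0,\, q' \ne q} L_{A',q'}\Big) \cup \Big(\bigcup_{\bar q \in R(q)} L_{A',\bar q}\Big) \\
&= \bigcup_{q' \in Q'_0} L_{A',q'} \\
&= L_{A'}.
\end{aligned}$$

The reduction operator $\phi$ does not change the language recognized by the automaton. □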

Corollary 4.4. The reduction operator $\phi$ is an internal operator in the class of RFSAs.

We shall now show that saturation and reduction operators commute.

Lemma 4.3. Let $A = \langle \Sigma, Q, Q_0, F, \delta \rangle$ be an NFA and let $q$ be a state of $Q$. Then the automaton $\phi(A^s, q)$ is saturated.


Proof: Let $A = \langle \Sigma, Q, Q_0, F, \delta \rangle$, $A^s = \langle \Sigma, Q, Q_0^s, F, \delta^s \rangle$, $\phi(A^s, q) = \langle \Sigma, Q', Q'_0, F', \delta' \rangle$ and let $L$ be the language recognized by these three automata.

Let $q' \in Q' = Q \setminus \{q\}$. If $L_{\phi(A^s,q),q'} \subseteq L_{\phi(A^s,q)}$ then $L_{A^s,q'} \subseteq L_{A^s}$ from Proposition 4.4. Since $A^s$ is saturated, we have $q' \in Q_0^s$ and as $Q'_0 = Q_0^s \setminus \{q\}$, we have $q' \in Q'_0$.

Let $x \in \Sigma$ and $q', q'' \in Q'$ be such that $x L_{\phi(A^s,q),q''} \subseteq L_{\phi(A^s,q),q'}$. Then, $x L_{A^s,q''} \subseteq L_{A^s,q'}$ from Proposition 4.4. Since $A^s$ is saturated, we have $q'' \in \delta^s(q', x)$ and since $\delta^s(q', x) \setminus \{q\} \subseteq \delta'(q', x)$, we have $q'' \in \delta'(q', x)$.

Therefore $\phi(A^s, q)$ is saturated. □

Proposition 4.5. Let $A = \langle \Sigma, Q, Q_0, F, \delta \rangle$ be an NFA recognizing a regular language $L$ and $q$ a state of $Q$. We have

$$[\phi(A, q)]^s = \phi(A^s, q).$$

Proof: We can observe that $\phi(A, q)$ and $\phi(A^s, q)$ have the same set of states. Furthermore, the languages associated with every state $q'$ are identical in the two automata because of Propositions 4.2 and 4.4. Because of Lemma 4.1, the saturations of these automata are isomorphic, therefore $[\phi(A, q)]^s = [\phi(A^s, q)]^s$. But as $\phi(A^s, q)$ is a saturated automaton by Lemma 4.3, the proposition is proved. □

4.4. Canonical RFSA

Definition 4.5. Let $L$ be a regular language. We define the canonical RFSA $A$ of $L$ in the following way: $A = \langle \Sigma, Q, Q_0, F, \delta \rangle$ where
• $\Sigma$ is the alphabet of $L$,
• $Q$ is the set of prime residual languages of $L$, so $Q = \{u^{-1}L \mid u^{-1}L \text{ is prime}\}$,
• its initial states are the prime residual languages included in $L$, so $Q_0 = \{u^{-1}L \in Q \mid u^{-1}L \subseteq L\}$,
• its final states are the prime residual languages containing the empty word, so $F = \{u^{-1}L \in Q \mid \varepsilon \in u^{-1}L\}$,
• its transition function is defined by $\delta(u^{-1}L, x) = \{v^{-1}L \in Q \mid v^{-1}L \subseteq (ux)^{-1}L\}$, for $u^{-1}L \in Q$ and $x \in \Sigma$.

Example 4.1. An example of canonical RFSA is shown in Figure 3.

This definition assumes that the canonical RFSA is an RFSA; we shall prove this below.

We have shown that the reduction operator $\phi$ transforms an RFSA into an RFSA, and that it commutes with the saturation operator. We shall now show that, if $A$ is a saturated RFSA, the reduction operator converges and that the resulting automaton is the canonical RFSA of $L_A$.


Proposition 4.6. Let $L$ be a regular language. If $A = \langle \Sigma, Q, Q_0, F, \delta \rangle$ is a reduced saturated RFSA recognizing $L$, then $A$ is (isomorphic to) the canonical RFSA of $L$.

Proof: As $A$ is an RFSA, every prime residual language $u^{-1}L$ of $L$ can be defined as a language $L_{A,q}$ associated with some state $q \in Q$ (Proposition 4.1). As there are no erasable states in $A$, for every state $q$, $L_{A,q}$ is a prime residual language and distinct states define distinct languages. As $A$ is saturated, prime residual languages contained in $L$ correspond to initial states of $Q_0$. As $A$ is saturated, for a prime residual language $u^{-1}L$ and a letter $x \in \Sigma$, we have $\delta(u^{-1}L, x) = \{v^{-1}L \in Q \mid x(v^{-1}L) \subseteq u^{-1}L\} = \{v^{-1}L \in Q \mid v^{-1}L \subseteq (ux)^{-1}L\}$, which is the transition function of the canonical RFSA. □

Let $A_0, \ldots, A_n$ be a sequence of NFAs such that for every index $1 \le i \le n$, there exists a state $q_i$ of $A_{i-1}$ such that $A_i = \phi(A_{i-1}, q_i)$. Propositions 4.5 and 4.6 show that if $A_0$ is a saturated RFSA and if $A_n$ is reduced, then $A_n$ is the canonical RFSA of the language recognized by $A_0$.

Theorem 4.1. The canonical RFSA of a regular language $L$ is a strongly consistent RFSA which recognizes $L$ and is minimal with regard to the number of states. Moreover, the canonical RFSA possesses a maximal number of transitions.

Proof: It is an RFSA that recognizes $L$ since it can be obtained from any RFSA that recognizes $L$ using the saturation and reduction operators, and since these two operators change neither the language recognized by the automaton nor the fact that the automaton is an RFSA. It possesses a minimal number of states because of Proposition 4.1 and it is strongly consistent by Proposition 4.3. It has a maximal number of transitions by Corollary 4.3. □

Several transitions of the canonical RFSA may be redundant, as Example 4.3 shows. Unfortunately, there may exist several non-isomorphic RFSAs having as many states as the canonical RFSA and a minimal number of transitions, i.e. such that no transition is redundant. However, it is possible to describe a procedure that rules out some redundant transitions in a systematic way.

Definition 4.6. Let $L$ be a regular language. For any set $R$ of distinct residual languages of $L$, define $\max(R) = \{L' \in R \mid \forall L'' \in R,\ L' \subseteq L'' \Rightarrow L' = L''\}$. Let $A = \langle \Sigma, Q, Q_0, F, \delta \rangle$ be the canonical RFSA which recognizes $L$. The simplified canonical RFSA of $L$ is the automaton $A' = \langle \Sigma, Q, Q'_0, F, \delta' \rangle$ where $Q'_0 = \max(Q_0)$ and $\delta'(q, x) = \max(\delta(q, x))$, for $q \in Q$ and $x \in \Sigma$.

It can be shown that the simplified canonical RFSA of $L$ is an RFSA which recognizes $L$. Clearly, every regular language admits a unique simplified canonical RFSA.

Example 4.2. Let $\Sigma = \{a, b\}$. The simplified canonical RFSA for $\Sigma^* a \Sigma$ has 4 fewer transitions than the canonical RFSA and is the minimal RFSA with respect to the number of states and transitions.

However, it may happen that several non-isomorphic RFSAs have as many states as the simplified canonical RFSA but fewer transitions.


Figure 9. The simplified canonical RFSA for $\Sigma^* a \Sigma$.

Example 4.3. Consider the language $L = \{aa, ab, ba, bc, cb, cc, da, ea, eb, ec\}$. With regard to the number of states, the minimal RFSAs recognizing $L$ have 6 states. The canonical RFSA has 17 transitions (see Figure 10). The simplified canonical RFSA has 14 transitions as $a^{-1}L \supseteq d^{-1}L$ and $b^{-1}L \supseteq d^{-1}L$. There are three non-isomorphic RFSAs with 13 transitions as $e^{-1}L = a^{-1}L \cup b^{-1}L = a^{-1}L \cup c^{-1}L = b^{-1}L \cup c^{-1}L$, and none with fewer transitions.

Figure 10. A canonical RFSA with 17 transitions. The simplified canonical RFSA has 14 transitions. There are three equivalent non-isomorphic RFSAs with 6 states and 13 transitions.
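The transition counts of Example 4.3 can be recomputed from Definitions 4.5 and 4.6. The Python sketch below is an illustration restricted to finite languages given as sets of strings; the helper names are hypothetical, and the language is taken to be $\{aa, ab, ba, bc, cb, cc, da, ea, eb, ec\}$, which is the reading consistent with the counts stated in the example.

```python
def residual(lang, u):
    """u^{-1}L for a finite language given as a set of strings."""
    return frozenset(w[len(u):] for w in lang if w.startswith(u))


def nonempty_residuals(lang):
    prefixes = {w[:i] for w in lang for i in range(len(w) + 1)}
    return {residual(lang, u) for u in prefixes}


def is_prime(r, all_residuals):
    strictly_smaller = [s for s in all_residuals if s < r]
    return frozenset().union(*strictly_smaller) != r


def maximal(family):
    """max(R): members of the family not strictly contained in another member."""
    return {r for r in family if not any(r < other for other in family)}


LANG = frozenset({"aa", "ab", "ba", "bc", "cb", "cc", "da", "ea", "eb", "ec"})
ALPHABET = sorted({x for w in LANG for x in w})
RES = nonempty_residuals(LANG)
PRIMES = {r for r in RES if is_prime(r, RES)}

# Canonical RFSA: states are prime residuals, and delta(R, x) collects the
# prime residuals included in x^{-1}R.
DELTA = {(r, x): {p for p in PRIMES if p <= residual(r, x)}
         for r in PRIMES for x in ALPHABET}

canonical_transitions = sum(len(t) for t in DELTA.values())
simplified_transitions = sum(len(maximal(t)) for t in DELTA.values())
print(len(PRIMES), canonical_transitions, simplified_transitions)
```

The only composite non-empty residual is $e^{-1}L = \{a, b, c\}$, which is why the automaton has 6 states although $L$ has 7 non-empty residuals.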

5. Construction of the canonical RFSA using the subset method

In the previous section, we described a way to build the canonical RFSA from a given DFA using the saturation and reduction operators. Starting from an NFA, this method requires building an equivalent DFA and checking whether residual languages are composite. These checks can be very expensive, even for simple automata. In this section, we present another method, which stems from a classical construction of the minimal DFA of a language and which is easier to implement.

The subset construction is a classical method used to build a DFA equivalent to a given NFA. Let $A = \langle \Sigma, Q, Q_0, F, \delta \rangle$ be an NFA. The method consists in building the set of reachable sets of states of $A$. We denote by $Q_R(A)$ the set $\{p \in 2^Q \mid \exists u \in \Sigma^* \text{ s.t. } \delta(Q_0, u) = p\}$ and we define the subset automaton $D(A) = \langle \Sigma, Q_D, Q_{D0}, F_D, \delta_D \rangle$ with
• $Q_D = Q_R(A)$,
• $Q_{D0} = \{Q_0\}$,
• $F_D = \{p \in Q_D \mid p \cap F \ne \emptyset\}$,
• $\delta_D(p, x) = \{\delta(p, x)\}$ if $\delta(p, x) \ne \emptyset$ and $\emptyset$ otherwise, for $p \in Q_D$ and $x \in \Sigma$.

The language associated with a state $p$ in the automaton $D(A)$ is the union of the languages associated with the states that compose $p$ in the automaton $A$, i.e. $L_{D(A),p} = \bigcup_{q \in p} L_{A,q}$.

The automaton $D(A)$ is a deterministic trimmed automaton that recognizes the same language as $A$. Recall that the reversal of a language $L$ (resp. of an NFA $A$) is denoted by $L^R$ (resp. $A^R$). The following result provides a method to build the minimal DFA of $L$.

Theorem 5.1. [1] Let $L$ be a regular language and $B$ an NFA such that $B^R$ is a DFA that recognizes $L^R$. Then $D(B)$ is the minimal DFA recognizing $L$.

We can deduce from this theorem that for an NFA $A$, $D(D(A^R)^R)$ is the minimal DFA recognizing the language $L_A$.
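The double-reversal construction $D(D(A^R)^R)$ can be sketched as follows. This Python illustration assumes a dictionary encoding of transitions, represents DFA states as frozensets of NFA states, and skips the empty subset so that the result is trimmed; the sample automaton is a hypothetical three-state NFA for $\Sigma^* a \Sigma$.

```python
def reverse_nfa(delta, initial, final):
    """Return (delta_R, Q0_R, F_R): arrows flipped, initial and final states swapped."""
    rdelta = {}
    for (q, x), targets in delta.items():
        for r in targets:
            rdelta.setdefault((r, x), set()).add(q)
    return rdelta, set(final), set(initial)


def determinize(alphabet, delta, initial, final):
    """Subset automaton D(A), returned as (states, delta_D, initial_D, final_D)."""
    fin = frozenset(final)
    start = frozenset(initial)
    states, todo, ddelta = {start}, [start], {}
    while todo:
        p = todo.pop()
        for x in alphabet:
            t = frozenset(r for q in p for r in delta.get((q, x), ()))
            if t:  # delta_D(p, x) is empty when delta(p, x) is empty
                ddelta[(p, x)] = {t}
                if t not in states:
                    states.add(t)
                    todo.append(t)
    return states, ddelta, {start}, {p for p in states if p & fin}


def minimal_dfa(alphabet, delta, initial, final):
    """D(D(A^R)^R), following the remark after Theorem 5.1."""
    _, d1, i1, f1 = determinize(alphabet, *reverse_nfa(delta, initial, final))
    return determinize(alphabet, *reverse_nfa(d1, i1, f1))


# Hypothetical NFA for Sigma* a Sigma over {a, b}.
DELTA = {(0, "a"): {0, 1}, (0, "b"): {0}, (1, "a"): {2}, (1, "b"): {2}}
MIN_STATES, MIN_DELTA, MIN_INITIAL, MIN_FINAL = minimal_dfa("ab", DELTA, {0}, {2})
print(len(MIN_STATES), len(MIN_DELTA), len(MIN_FINAL))
```

The result has four states, in agreement with the minimal DFA $A_2$ of Example 3.1.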

We adapt the subset construction technique to deal with inclusions of sets of states. Let $A = \langle \Sigma, Q, Q_0, F, \delta \rangle$ be an NFA. We say that $p \in Q_R(A)$ is coverable if there exist $p_1, \ldots, p_l \in Q_R(A) \setminus \{p\}$ such that $p = \bigcup_{i=1}^{l} p_i$. We define the automaton $C(A) = \langle \Sigma, Q_C, Q_{C0}, F_C, \delta_C \rangle$ with
• $Q_C = \{p \in Q_R(A) \mid p \text{ is not coverable}\}$,
• $Q_{C0} = \{p \in Q_C \mid p \subseteq Q_0\}$,
• $F_C = \{p \in Q_C \mid p \cap F \ne \emptyset\}$,
• $\delta_C(p, x) = \{p' \in Q_C \mid p' \subseteq \delta(p, x)\}$ for any $p \in Q_C$ and $x \in \Sigma$.

Lemma 5.1. Let $A$ be an NFA. The automaton $C(A)$ is an RFSA recognizing $L_A$, all of whose states are reachable.

Proof: The automaton $C(A)$ can be obtained from the DFA $D(A)$ in three steps, by using operations like the saturation and reduction defined in Section 4. The states of $D(A)$ are associated with residual languages of $L$. We only have to verify that the transformations do not change these residual languages.

Let $A = \langle \Sigma, Q, Q_0, F, \delta \rangle$ be an NFA. Let $D(A) = \langle \Sigma, Q_D, Q_{D0}, F_D, \delta_D \rangle$.
• Build $A_1 = \langle \Sigma, Q_D, Q_{DC0}, F_D, \delta_D \rangle$ where $Q_{DC0} = \{p \in Q_D \mid p \subseteq Q_0\}$. The new initial states $p$ verify $L_{D(A),p} \subseteq L_{D(A),Q_{D0}} \subseteq L$. This does not change the language.
• Build $A_2 = \langle \Sigma, Q_D, Q_{DC0}, F_D, \delta_{A_2} \rangle$ where $\delta_{A_2}(p, x) = \{p' \in Q_D \mid p' \subseteq \delta(p, x)\}$ for $p \in Q_D$ and $x \in \Sigma$. We have $L_{D(A),p'} = L_{A,p'} \subseteq L_{A,\delta(p,x)} = L_{D(A),\delta_D(p,x)}$ and then $x L_{D(A),p'} \subseteq L_{D(A),p}$. Thus, as in Proposition 4.2, the languages associated with states are not changed: $L_{A_2,p} = L_{D(A),p}$ for $p \in Q_D$.
• Build $C(A) = \langle \Sigma, Q_C, Q_{C0}, F_C, \delta_C \rangle$ by removing coverable states from $A_2$. Let $p''$ be a coverable state and $(p, x, p'')$ be a transition leading to $p''$ in $A_2$. Let $p''_1, \ldots, p''_l$ be such that $p'' = \bigcup_{i=1}^{l} p''_i$. By the previous step, $p''_i \in \delta_{A_2}(p, x)$ for every $1 \le i \le l$. We also have $x L_{A_2,p''} = x\big(\bigcup_{i=1}^{l} L_{A_2,p''_i}\big)$. So we can remove the transition $(p, x, p'')$ from $A_2$ without changing the language $L_{A_2,p}$. When all transitions leading to $p''$ are removed, we can remove $p''$ itself; if $p''$ is an initial state, each $p''_i$ belongs to $Q_{DC0}$ and the language $L_{A_2}$ is not changed. When all coverable states are removed, we obtain the automaton $C(A)$. □

Theorem 5.2. Let $L$ be a regular language and let $B$ be an NFA such that $B^R$ is an RFSA recognizing $L^R$, all of whose states are reachable. Then $C(B)$ is the canonical RFSA recognizing $L$.

In order to prove this theorem, we introduce some lemmas.

Lemma 5.2. Let $B = \langle \Sigma, Q_B, Q_0, F, \delta \rangle$ be an NFA such that $B^R$ is a trimmed RFSA. Let $q \in Q_B$ and $v \in \Sigma^*$ be such that $L_{B^R,q} = (v^R)^{-1}L_B^R$, and let $p \in Q_R(B)$. Then $v \in L_{B,p}$ if and only if $q \in p$.

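Proof: Let $u$ be such that $p = \delta(Q_0, u)$; thus $L_{B,p} = u^{-1}L_B$. We have

$$q \in p \Leftrightarrow q \in \delta(Q_0, u) \Leftrightarrow u^R \in L_{B^R,q} = (v^R)^{-1}L_B^R \Leftrightarrow v^R u^R \in L_B^R \Leftrightarrow uv \in L_B \Leftrightarrow v \in u^{-1}L_B = L_{B,p}.$$ □

Lemma 5.3. Let $B = \langle \Sigma, Q_B, Q_0, F, \delta \rangle$ be an NFA such that $B^R$ is a trimmed RFSA. For every $p, p' \in Q_R(B)$, we have $L_{B,p} \subseteq L_{B,p'}$ if and only if $p \subseteq p'$.

Proof: If $p \subseteq p'$, then $L_{B,p} = \bigcup_{q \in p} L_{B,q} \subseteq \bigcup_{q \in p'} L_{B,q} = L_{B,p'}$.

Conversely, let $q \in p$. As $B^R$ is an RFSA, there exists a word $v$ such that $L_{B^R,q} = (v^R)^{-1}L_B^R$. From Lemma 5.2, we know that $v$ is in $L_{B,p}$. As $L_{B,p} \subseteq L_{B,p'}$, $v$ also belongs to $L_{B,p'}$ and, using Lemma 5.2, we have $q \in p'$. □

Lemma 5.4. Let $B$ be an NFA such that $B^R$ is a trimmed RFSA. For every $p, p_1, p_2, \ldots, p_n \in Q_R(B)$, $L_{B,p} = \bigcup_{1 \le k \le n} L_{B,p_k}$ is equivalent to $p = \bigcup_{1 \le k \le n} p_k$.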


Proof: It is obvious that $p = \bigcup_{1 \le k \le n} p_k$ implies $L_{B,p} = \bigcup_{1 \le k \le n} L_{B,p_k}$.

Suppose that $L_{B,p} = \bigcup_{1 \le k \le n} L_{B,p_k}$. Then $L_{B,p_k} \subseteq L_{B,p}$ for every $1 \le k \le n$. Using Lemma 5.3, we have $p_k \subseteq p$ for all $k$ and so $\bigcup_{1 \le k \le n} p_k \subseteq p$.

Let $q \in p$. Since $B^R$ is a trimmed RFSA, there exists a word $v$ such that $L_{B^R,q} = (v^R)^{-1}L_B^R$. From Lemma 5.2, we have $v \in L_{B,p}$. As $L_{B,p} = \bigcup_{1 \le k \le n} L_{B,p_k}$, there exists an index $k$ such that $v \in L_{B,p_k}$. From Lemma 5.2 again, we have $q \in p_k$. So $p \subseteq \bigcup_{1 \le k \le n} p_k$. □

Proof of Theorem 5.2:
• Reachable sets of states of $B$ correspond to residual languages of $L$ and Lemma 5.4 shows that composite residual languages correspond to coverable sets of states. So, $p \in Q_C$ if and only if $L_{B,p}$ is a prime residual language of $L$, and $Q_C$ can naturally be identified with the set of states of the canonical RFSA. Due to Lemma 5.3, we also verify that $L_{B,p} = u^{-1}L$ implies $p = \delta(Q_0, u)$; the converse is obvious.
• $Q_{C0} = \{p \in Q_C \mid p \subseteq Q_0\}$. Lemma 5.3 tells us that $p \subseteq Q_0$ is equivalent to $L_{B,p} \subseteq L_{B,Q_0} = L$. So $Q_{C0} = \{p \in Q_C \mid L_{B,p} \subseteq L\}$ corresponds to the set of initial states of the canonical RFSA.
• $F_C = \{p \in Q_C \mid p \cap F \ne \emptyset\}$. We have $\varepsilon \in L_{B,p}$ if and only if there exists $q_i \in p \cap F$. So $F_C = \{p \in Q_C \mid \varepsilon \in L_{B,p}\}$ is the set of final states of the canonical RFSA.
• For each state $p \in Q_C$, let $u_p \in \Sigma^*$ be such that $\delta(Q_0, u_p) = p$. Clearly, $u_p^{-1}L = L_{B,p}$. For any $p, p' \in Q_C$ and $x \in \Sigma$, $\delta(p, x) = \delta(Q_0, u_p x) \in Q_R(B)$ and $p' \in \delta_C(p, x)$ iff $p' \subseteq \delta(p, x)$. As $L_{B,p'} = u_{p'}^{-1}L$ and $L_{B,\delta(p,x)} = (u_p x)^{-1}L$, we can deduce from Lemma 5.3 that $p' \subseteq \delta(p, x)$ iff $u_{p'}^{-1}L \subseteq (u_p x)^{-1}L$. So, the transition functions are equivalent. □

We can deduce from Lemma 5.1 and Theorem 5.2 that for any NFA $A$, $C(C(A^R)^R)$ is the canonical RFSA of $L_A$.
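The construction $C(C(A^R)^R)$ can be sketched in Python along the lines of the subset construction. The encoding below is an assumption made for illustration: transitions are stored in a dictionary, states of $C(A)$ are frozensets, the empty subset is skipped, and a reachable subset is treated as coverable when it equals the union of the reachable subsets it strictly contains. The sample automaton is a hypothetical three-state NFA for $\Sigma^* a \Sigma$.

```python
def reverse_nfa(delta, initial, final):
    """Return (delta_R, Q0_R, F_R): arrows flipped, initial and final states swapped."""
    rdelta = {}
    for (q, x), targets in delta.items():
        for r in targets:
            rdelta.setdefault((r, x), set()).add(q)
    return rdelta, set(final), set(initial)


def successor(delta, p, x):
    return frozenset(r for q in p for r in delta.get((q, x), ()))


def c_construction(alphabet, delta, initial, final):
    """C(A): reachable non-coverable subsets with inclusion-based transitions."""
    q0, fin = frozenset(initial), frozenset(final)
    reachable, todo = {q0}, [q0]
    while todo:
        p = todo.pop()
        for x in alphabet:
            t = successor(delta, p, x)
            if t and t not in reachable:
                reachable.add(t)
                todo.append(t)

    def coverable(p):
        parts = [s for s in reachable if s < p]
        return bool(parts) and frozenset().union(*parts) == p

    qc = {p for p in reachable if not coverable(p)}
    cdelta = {}
    for p in qc:
        for x in alphabet:
            targets = {s for s in qc if s <= successor(delta, p, x)}
            if targets:
                cdelta[(p, x)] = targets
    return qc, cdelta, {p for p in qc if p <= q0}, {p for p in qc if p & fin}


def canonical_rfsa(alphabet, delta, initial, final):
    """C(C(A^R)^R), following Lemma 5.1 and Theorem 5.2."""
    _, d1, i1, f1 = c_construction(alphabet, *reverse_nfa(delta, initial, final))
    return c_construction(alphabet, *reverse_nfa(d1, i1, f1))


# Hypothetical NFA for Sigma* a Sigma over {a, b}.
DELTA = {(0, "a"): {0, 1}, (0, "b"): {0}, (1, "a"): {2}, (1, "b"): {2}}
CAN_STATES, CAN_DELTA, CAN_INITIAL, CAN_FINAL = canonical_rfsa("ab", DELTA, {0}, {2})
CAN_TRANSITIONS = sum(len(t) for t in CAN_DELTA.values())
print(len(CAN_STATES), CAN_TRANSITIONS, len(CAN_INITIAL), len(CAN_FINAL))
```

For this language the sketch yields 3 states and 11 transitions, which is 4 more than the 7 transitions kept by the max operator, as stated in Example 4.2.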

The simplified canonical RFSA can be computed from the canonical RFSA by using the operator $\max$. However, there exists a similar construction which yields it directly from any NFA. Let $C'(A) = \langle \Sigma, Q_C, Q_{C'0}, F_C, \delta_{C'} \rangle$ with
• $Q_C = \{p \in Q_R(A) \mid p \text{ is not coverable}\}$,
• $Q_{C'0} = \{p \in Q_C \mid p \subseteq Q_0 \text{ and } \nexists p' \in Q_C \text{ s.t. } p \subsetneq p' \subseteq Q_0\}$,
• $F_C = \{p \in Q_C \mid p \cap F \ne \emptyset\}$,
• $\delta_{C'}(p, x) = \{p' \in Q_C \mid p' \subseteq \delta(p, x) \text{ and } \nexists p'' \in Q_C \text{ s.t. } p' \subsetneq p'' \subseteq \delta(p, x)\}$, for $p \in Q_C$ and $x \in \Sigma$.

The simplified canonical RFSA is obtained by $C'(C'(A^R)^R)$.

Example 5.1. Figures 11, 12, 13 and 14 show the construction of the canonical RFSA recognizing the language $\Sigma^* a \Sigma^2$ with $\Sigma = \{a, b\}$ by using the subset construction.

The automaton $A$ recognizes $\Sigma^* a \Sigma^2$ and the reversal automaton $A^R$ is deterministic.

The first steps of the construction of $C(A)$ are represented in Figure 12; they are identical to the steps of the classical subset construction. The state 012 (resp. 013) is coverable with 01 and 02 (resp. 01 and 03). Since the states 012 and 013 are coverable, it is not necessary to build the next states.

In Figure 13, the coverable states 012 and 013 have been removed. Transitions reaching the state 012 (resp. 013) have been redirected to the states 01 and 02 (resp. 01 and 03). We obtain an RFSA which recognizes the language $\Sigma^* a \Sigma^2$ and which happens to be the simplified canonical RFSA too.

Finally, the canonical RFSA $C(A)$ is obtained by saturation. Since the state 0 is included in every other state, every transition which reaches a state also has to reach the state 0.
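The construction of Example 5.1 can be replayed mechanically. The following Python sketch is only an illustration: the function names (`step`, `reachable_subsets`, `is_coverable`, `canonical_rfsa`) are hypothetical, and it assumes that a reachable set is coverable exactly when it is the union of the reachable sets strictly included in it. Unlike the figures, it computes all reachable subsets before discarding the coverable ones, and it yields the canonical RFSA only under the hypothesis of Theorem 5.2 (the reversal of the input NFA is trimmed and deterministic), which holds for the NFA of Figure 11.

```python
from itertools import chain

def step(delta, p, x):
    """Image of the state set p under the letter x."""
    return frozenset(chain.from_iterable(delta.get((q, x), ()) for q in p))

def reachable_subsets(alphabet, delta, initial):
    """Non-empty state sets produced by the classical subset construction."""
    start = frozenset(initial)
    seen, todo = {start}, [start]
    while todo:
        p = todo.pop()
        for x in alphabet:
            q = step(delta, p, x)
            if q and q not in seen:
                seen.add(q)
                todo.append(q)
    return seen

def is_coverable(p, subsets):
    """Assumed definition: p is the union of reachable sets strictly inside it."""
    parts = [q for q in subsets if q < p]
    return bool(parts) and frozenset().union(*parts) == p

def canonical_rfsa(alphabet, delta, initial, final):
    """Apply the operator C: keep non-coverable sets, saturate the transitions."""
    subsets = reachable_subsets(alphabet, delta, initial)
    states = {p for p in subsets if not is_coverable(p, subsets)}
    init = {p for p in states if p <= frozenset(initial)}
    fin = {p for p in states if p & frozenset(final)}
    trans = {(p, x): {q for q in states if q <= step(delta, p, x)}
             for p in states for x in alphabet}
    return states, init, fin, trans

# NFA of Figure 11: state 0 loops on a and b, then 0 -a-> 1 -a,b-> 2 -a,b-> 3.
DELTA = {(0, 'a'): {0, 1}, (0, 'b'): {0},
         (1, 'a'): {2}, (1, 'b'): {2},
         (2, 'a'): {3}, (2, 'b'): {3}}
STATES, INIT, FIN, TRANS = canonical_rfsa('ab', DELTA, {0}, {3})
```

On this input the sketch keeps the four states 0, 01, 02 and 03, and every transition set contains the state 0, in agreement with Figure 14.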

Note that, as in the deterministic case, this construction may produce cumbersome intermediate automata; indeed, it is possible to find examples for which $C(A^R)$ has an exponential number of states with regard to the number of states of $A$ or $C(C(A^R)^R)$. Thus in the worst case, this algorithm is exponential with regard to the size of the canonical RFSA. This situation can be observed with the reversal of the automaton used in Proposition 6.2.

6. Results on size of RFSAs

We classically take the number of states of an automaton as a measure of its size. It can be argued that the number of states of an automaton is suitable for DFAs but not for NFAs, since the number of transitions in the latter can be quadratic with regard to the number of states. However, our results show the existence of exponential or superpolynomial gaps between the number of states of particular NFAs, RFSAs and DFAs. These results imply similar gaps between the numbers of transitions.

The size of a canonical RFSA is bounded by the size of the equivalent minimal DFA and by the size of one of its equivalent minimal NFAs. We show that both bounds can be reached despite the fact that there is an exponential gap between them.

Proposition 6.1. There exist languages for which the minimal DFA has a size exponentially larger than the size of the canonical RFSA, and for which the canonical RFSA has the same size as a minimal NFA.

Proof: Consider the languages $L_n = \Sigma^* a \Sigma^n$, where $n$ is an integer and $\Sigma = \{a, b\}$.

It is well known that minimal NFAs for $L_n$ have $n + 2$ states and that $L_n$ has $2^{n+1}$ distinct residual languages. It is easy to verify that only $n + 2$ of them are prime: $\varepsilon^{-1}L_n$ and $(ab^i)^{-1}L_n$ for $0 \le i \le n$. □

Proposition 6.2. The size of the canonical RFSA of a language $L$ can be exponentially larger than the size of a smallest NFA recognizing $L$.


Figure 11. $A$ is an NFA recognizing $\Sigma^* a \Sigma^2$; $A^R$ is deterministic.

Figure 12. First steps of the construction of $C(A)$.

Figure 13. The coverable states have been removed and transitions are redirected; this automaton is an RFSA.

Figure 14. The canonical RFSA $C(A)$ recognizing $\Sigma^* a \Sigma^2$ is obtained by saturation.

Figure 15. Automaton $A_4$, whose canonical RFSA is exponentially larger than $A_4$.

Proof: We can verify this proposition on the automata $A_n = \langle \Sigma, Q, Q_0, F, \delta \rangle$ defined by

• $\Sigma = \{a, b\}$,
• $Q = \{q_i \mid 0 \le i \le n - 1\}$,
• $Q_0 = \{q_i \mid 0 \le i < n/2\}$,
• $F = \{q_0\}$,
• $\delta(q_i, a) = q_{i+1}$ for $0 \le i < n - 1$, $\delta(q_{n-1}, a) = q_0$, $\delta(q_0, b) = q_0$, $\delta(q_i, b) = q_{i-1}$ for $1 < i < n$ and $\delta(q_1, b) = q_{n-1}$.

The automaton $A_4$ is represented in Figure 15.
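For small values of $n$, the subset construction on $A_n$ can be carried out by brute force. The Python sketch below is illustrative only (the names `build_A` and `reachable_sets` are hypothetical, and $n \ge 2$ is assumed): it encodes the two letters as permutations of the state indices $0, \ldots, n-1$ and enumerates the sets of states reachable from $Q_0$.

```python
from math import comb

def build_A(n):
    """Letters a and b of A_n as permutations of the indices 0..n-1 (n >= 2)."""
    a = {i: (i + 1) % n for i in range(n)}      # q_i -a-> q_{i+1}, q_{n-1} -a-> q_0
    b = {0: 0, 1: n - 1}                        # q_0 -b-> q_0, q_1 -b-> q_{n-1}
    b.update({i: i - 1 for i in range(2, n)})   # q_i -b-> q_{i-1} for 1 < i < n
    return a, b

def reachable_sets(n):
    """State sets reachable from Q_0 = {q_i | 0 <= i < n/2} in the subset construction."""
    a, b = build_A(n)
    start = frozenset(i for i in range(n) if i < n / 2)
    seen, todo = {start}, [start]
    while todo:
        p = todo.pop()
        for perm in (a, b):
            q = frozenset(perm[i] for i in p)
            if q not in seen:
                seen.add(q)
                todo.append(q)
    return seen

SETS_4 = reachable_sets(4)
SETS_5 = reachable_sets(5)
```

For $n = 4$ and $n = 5$ the enumeration returns every set of $\lceil n/2 \rceil$ states, that is $\binom{4}{2} = 6$ and $\binom{5}{3} = 10$ sets respectively. Since all of them have the same cardinality, none is strictly included in another.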

The reversal of $A_n$ is trimmed and deterministic, thus we can apply Theorem 5.2. The automaton $C(A_n)$ is the canonical RFSA.

The initial state in the subset construction has $\lceil n/2 \rceil$ elements. The reachable sets of states are all sets of states with $\lceil n/2 \rceil$ elements. So, none of them is coverable.

Therefore, the canonical RFSA $C(A_n)$ has a size exponentially larger than the size of the initial NFA. □

Every non-empty residual language of a regular language $L$ has a minimal characteristic word whose length is bounded by the number of states of the minimal DFA. The next proposition shows that this is no longer true if we consider RFSAs.

Proposition 6.3. There exist regular languages for which the smallest characterizing word for someresidual language is longer than any polynomial in the number of states of the canonical RFSA.

Figure 16. Automaton $A_P$ for $P = \{2, 3\}$.

Proof:

Let $P = \{p_1, \ldots, p_n\}$ be a set of $n$ distinct prime numbers. Let us define the automaton $A_P = \langle \Sigma, Q, Q_0, F, \delta \rangle$ by:

• $\Sigma = \{a\} \cup \{b_p \mid p \in P\}$,
• $Q = \{q^p_j \mid p \in P, 0 \le j < p\}$,
• $Q_0 = \{q^p_0 \mid p \in P\}$,
• $F = Q_0$

and

$\delta(q^p_j, a) = \{q^p_{(j+1) \bmod p}\}$ for $0 \le j < p$, $p \in P$
$\delta(q^p_j, b_{p'}) = \{q^p_j, q^p_{j+1}\}$ for $0 \le j < p - 1$, $p, p' \in P$
$\delta(q^p_{p-1}, b_{p'}) = \{q^{p'}_0\}$ for $p, p' \in P$

See Figure 16 for $P = \{2, 3\}$.


Let $N = p_1 \times \ldots \times p_n$ and let $u_{ij} = a^{N-1} b_{p_i} a^j$ for $1 \le i \le n$ and $0 \le j < p_i$. We can check that $\delta(Q_0, u_{ij}) = \{q^{p_i}_j\}$. Therefore, $L_{A_P, q^{p_i}_j} = u_{ij}^{-1}L$ and $A_P$ is an RFSA.
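The behaviour of the words $u_{ij}$ can be checked experimentally on small sets of primes. The following Python sketch is an illustration under stated assumptions: states are encoded as pairs `(p, j)`, letters as `'a'` or `('b', p)`, and the helper names `build_AP`, `run` and `shortest_singleton_length` are hypothetical. It builds $A_P$ from the definition above, runs a word from $Q_0$, and searches breadth-first for the length of a shortest word leading from $Q_0$ to a single state (at least two primes are assumed, so that $Q_0$ itself is not a singleton).

```python
from math import prod

def build_AP(primes):
    """Transition table of A_P; keys are (state, letter), values are sets of states."""
    delta = {}
    for p in primes:
        for j in range(p):
            delta[(p, j), 'a'] = {(p, (j + 1) % p)}
            for t in primes:
                if j < p - 1:
                    delta[(p, j), ('b', t)] = {(p, j), (p, j + 1)}
                else:
                    delta[(p, j), ('b', t)] = {(t, 0)}
    return delta

def run(delta, states, word):
    """Set of states reached from `states` after reading `word`."""
    for x in word:
        states = frozenset(r for q in states for r in delta[q, x])
    return states

def shortest_singleton_length(primes):
    """Length of a shortest word u with |delta(Q_0, u)| = 1 (level-by-level search)."""
    delta = build_AP(primes)
    letters = ['a'] + [('b', p) for p in primes]
    start = frozenset((p, 0) for p in primes)
    seen, level, length = {start}, [start], 0
    while level:
        if any(len(s) == 1 for s in level):
            return length
        nxt = []
        for s in level:
            for x in letters:
                t = run(delta, s, [x])
                if t not in seen:
                    seen.add(t)
                    nxt.append(t)
        level, length = nxt, length + 1
    return None

P = [2, 3]
N = prod(P)
Q0 = frozenset((p, 0) for p in P)
U = ['a'] * (N - 1) + [('b', 3)] + ['a'] * 2   # the word a^{N-1} b_3 a^2
REACHED = run(build_AP(P), Q0, U)
```

For $P = \{2, 3\}$ the word $a^5 b_3 a^2$ leads to the single state $q^3_2$, and the search reports that no word shorter than $N = 6$ reaches a singleton; for $P = \{2, 3, 5\}$ it reports $N = 30$.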

Let $1 \le i \le n$, $0 \le j < p_i$ and $1 \le k \le n$, $0 \le l < p_k$.

If $p_i - j \neq p_k - l$, then $a^{p_i - j} \in L_{A_P, q^{p_i}_j} \setminus L_{A_P, q^{p_k}_l}$.

If $p_i - j = p_k - l$, then $a^{2p_i - j} \in L_{A_P, q^{p_i}_j} \setminus L_{A_P, q^{p_k}_l}$.

Therefore, all residual languages $L_{A_P, q^{p_i}_j}$ are different, none of them is included in another one, and $A_P$ has the same number of states as the canonical RFSA.

We can check that for every $u \in \Sigma^{<N}$, $|\delta(Q_0, u)| > 1$. So, the smallest word $u_q$ such that $L_{A_P, q} = u_q^{-1}L$ has length $|u_q| \ge N$.

Now, let $f$ be some polynomial. We can choose different prime numbers $p_1, \ldots, p_n$ such that $p_1 \times \ldots \times p_n > f(p_1 + \ldots + p_n) = f(|A_P|)$. □

The next proposition shows that the simplified canonical RFSA can have far fewer transitions than the canonical RFSA.

Proposition 6.4. The number of transitions of the canonical RFSA of a language $L$ can be quadratic with respect to the number of transitions of the equivalent simplified canonical RFSA.

Proof: We can verify this proposition on the languages $L_n = a^* a^n$ where $n \in \mathbb{N}$.

It is easy to verify that the number of transitions of the simplified canonical RFSA of $L_n$ is $N_Q = n + 1$ and that the number of transitions of the canonical RFSA is $(N_Q^2 + 3N_Q)/2 - 1$.
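The count can be spelled out as follows. This is a sketch which assumes that the states of the canonical RFSA are the sets $p_k = \{0, \ldots, k\}$ for $0 \le k \le n$ obtained by the subset construction on the NFA of Figure 17 (so that there are $N_Q = n + 1$ states), that $\delta(p_k, a) = p_{k+1}$ for $k < n$ and $\delta(p_n, a) = p_n$, and that in the canonical RFSA a state $p_k$ reaches every $p_j \subseteq \delta(p_k, a)$.

```latex
\begin{align*}
\#\text{transitions of the canonical RFSA}
  &= \sum_{k=0}^{n-1} \bigl|\{\, j \mid p_j \subseteq p_{k+1} \,\}\bigr|
     + \bigl|\{\, j \mid p_j \subseteq p_n \,\}\bigr| \\
  &= \sum_{k=0}^{n-1} (k+2) + (n+1)
   = \sum_{m=2}^{N_Q} m + N_Q \\
  &= \frac{N_Q (N_Q + 1)}{2} - 1 + N_Q
   = \frac{N_Q^2 + 3 N_Q}{2} - 1 .
\end{align*}
```

In the simplified canonical RFSA only the maximal target is kept, so each of the $N_Q$ states has exactly one outgoing transition. With $n = 3$, hence $N_Q = 4$, this gives 13 transitions against 4.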

For $n = 3$, the canonical RFSA and the simplified canonical RFSA are represented in Figures 18 and 19. □

7. RFSAs over a one-letter alphabet

Here, we consider the case where the underlying alphabet possesses only one letter and we compare the state complexity of DFAs and RFSAs. Such a study has already been done for DFAs and NFAs in [2], where it has been shown that the minimal DFA equivalent to a given NFA with $m$ states could have $\Theta(e^{\sqrt{m \log m}})$ states. Here we show that the number of states of the minimal DFA over a one-letter alphabet is at most quadratic in terms of the number of states of the equivalent canonical RFSA.

Throughout Section 7, we suppose that $\Sigma = \{a\}$, $L$ is a non-empty regular language over $\Sigma$ and $A = \langle \Sigma, Q, q_0, F, \delta \rangle$ is the trimmed minimal DFA which recognizes $L$. Let us set $n_L = |Q| - 1$. Let us define $q_i = \delta(q_0, a^i)$ and, for the sake of simplicity, let $L_i = L_{A, q_i}$, for $0 \le i \le n_L$.

If $L$ is infinite, $\delta(q_0, a^i) \neq \emptyset$ for any integer $i$ and the previous notations can be extended: for every integer $i$, let $q_i = \delta(q_0, a^i)$ and $L_i = L_{A, q_i}$. Let $m_L$ be the smallest index such that $q_{m_L} = \delta(q_{n_L}, a)$ and let $d = n_L - m_L + 1$ be the length of the loop (see Figure 20). Note that $d > 0$ and that for all $i \ge m_L$ and every non-negative integer $k$, $q_{i+kd} = q_i$.


Figure 17. An NFA $A$ which recognizes $a^* a^3$; $A^R$ is deterministic.

Figure 18. The canonical RFSA $C(A)$ which recognizes $a^* a^3$.

Figure 19. The simplified canonical RFSA $C'(A)$ which recognizes $a^* a^3$.

Figure 20. A minimal DFA that recognizes an infinite regular language over $\{a\}$ (final states have been omitted).


7.1. Inclusion relations between residual languages of an infinite regular language over a one-letter alphabet

We suppose in this subsection that $L$ is infinite.

Lemma 7.1. For all integers $i, j, k$, $L_i \subseteq L_j \Rightarrow L_{i+k} \subseteq L_{j+k}$.

Proof: Let $u \in L_{i+k}$. We have $a^{i+k}u \in L$, i.e. $a^k u \in L_i$. Then $a^k u \in L_j$ and $a^{j+k}u \in L$. Therefore $u \in L_{j+k}$. □

Lemma 7.2. For all integers $i, j$, $L_i \subseteq L_j \Rightarrow d$ divides $(i - j)$. Moreover, if $L_i \subsetneq L_j$, then $\min(i, j) < m_L$.

Proof: Let $r$ be an integer such that $rd - i + j \ge 0$ and $m_L + rd - i \ge 0$. Let $i_1 = i + (m_L + rd - i)$ and $j_1 = j + (m_L + rd - i)$. We have $L_{i_1} = L_{m_L + rd} = L_{m_L} \subseteq L_{j_1}$ and $j_1 \ge m_L$. Applying Lemma 7.1 with $k = j_1 - m_L$ we obtain
$L_{m_L} \subseteq L_{j_1} \subseteq L_{2j_1 - m_L} \subseteq \ldots \subseteq L_{d(j_1 - m_L) + m_L} = L_{m_L}$.
As $A$ is minimal, this implies that $q_{j_1} = q_{m_L}$, i.e. that $d$ divides $j_1 - m_L$ and also $i - j$.

As $\max(i, j) = \min(i, j) + \frac{\max(i, j) - \min(i, j)}{d} \times d$, if $L_i \subsetneq L_j$, we must have $\min(i, j) < m_L$. □

Proposition 7.1. We have the following cases:

1. If $A$ is a loop, i.e. if $m_L = 0$, then there are no inclusion relations between residual languages.

2. If there are no inclusion relations between $L_{m_L - 1}$ and $L_{n_L}$, then there are no inclusions between residual languages.

3. If $L_{m_L - 1} \subsetneq L_{n_L}$, then $[L_i \subsetneq L_j \Rightarrow i < j]$.

4. If $L_{n_L} \subsetneq L_{m_L - 1}$, then $[L_i \subsetneq L_j \Rightarrow i > j]$.

Proof: The first assertion is clear from Lemma 7.2.

If there exist some $i, j$ such that $L_i \subsetneq L_j$, we have $\min(i, j) \le m_L - 1$. If $k = m_L - 1 - \min(i, j)$, then $L_{\min(i,j)+k} = L_{m_L - 1}$ and $L_{\max(i,j)+k} = L_{n_L}$. Now, using Lemma 7.1, the last three points are clear. □


7.2. Prime and composite residual languages

Proposition 7.2. Every non-empty residual language of a finite regular language over a one-letter alphabet is prime.

Proof: Let $L$ be a finite regular language over $\Sigma$ and let $A$ be its trimmed minimal DFA. We have $q_{n_L} \in F$ and $\delta(q_{n_L}, a) = \emptyset$. For any $0 \le i \le n_L$, $a^{n_L - i} \in L_i$. Let $j \neq i$. If $a^{n_L - i} \in L_j$, then $j < i$ and $a^{n_L - j} \notin L_i$, i.e. $L_j \not\subseteq L_i$. Hence, $L_i$ is prime. □

Lemma 7.3. Let $L$ be an infinite regular language over a one-letter alphabet and let $A$ be its trimmed minimal DFA. If some residual language of $L$ is composite, then $A$ is not a loop, i.e. $m_L > 0$, and $L_{m_L - 1} \subsetneq L_{n_L}$.

Proof: Suppose that some residual language of $L$ is composite. From Proposition 7.1, $m_L > 0$ and there must exist an inclusion relation between $L_{m_L - 1}$ and $L_{n_L}$. Suppose that $L_{n_L} \subsetneq L_{m_L - 1}$. Let $q_j$ be a composite state. From Proposition 7.1 and Lemma 7.2, we have $j < m_L$. Let $u \in L_{m_L - 1}$. We have $a^{m_L - 1 - j}u \in L_j$. Let $i$ be such that $L_i \subsetneq L_j$ and $a^{m_L - 1 - j}u \in L_i$. We have $u \in L_{m_L - 1 + i - j}$. From Proposition 7.1, we have $i > j$ and from Lemma 7.2, we have $i - j = rd$ with $r > 0$. So, $u \in L_{m_L - 1 + i - j} = L_{n_L + (r-1)d} = L_{n_L}$. Therefore, $L_{m_L - 1} = L_{n_L}$, which is contradictory. □

Proposition 7.3. Let $L$ be a regular language over a one-letter alphabet and let $A$ be its trimmed minimal DFA. If some state $q$ of $A$ is composite, then all the states which follow $q$ are composite too.

Proof: From the previous lemmas, $L$ is infinite, $A$ is not a loop, $L_{m_L - 1} \subsetneq L_{n_L}$ and $L_i \subsetneq L_j$ implies $i < j$. Let $i$ be the first index such that $L_i$ is a composite residual language. Let $R_i = \{j \mid j < i \text{ and } L_j \subsetneq L_i\}$. We have $L_i = \bigcup_{j \in R_i} L_j$. It is easy to verify that for every $i \le k \le n_L$, we have $L_k = \bigcup_{j \in R_i} L_{j+k-i}$ and $L_{j+k-i} \neq L_k$ for every $j \in R_i$. □

7.3. Ratio between the size of the minimal DFA and the canonical RFSA for a one-letter alphabet regular language

If all states of the minimal DFA $A$ are prime, the canonical RFSA has the same size as the minimal DFA. From Proposition 7.2 and Lemma 7.3, it remains to consider the case where $L_{m_L - 1} \subsetneq L_{n_L}$. Let $Q_P$ be the set of prime states of $Q$ and $n_P = |Q_P|$. Due to Proposition 7.3, $i \ge n_P$ implies that $q_i$ is composite. Let $A_S = \langle \Sigma, Q_P, \{q_0\}, F \cap Q_P, \delta_S \rangle$ be the trimmed NFA built from $\phi(A, q_{n_P})$, where $\phi$ is the reduction operator defined in Section 4.3: $A_S$ is an RFSA recognizing $L$ that has the same number of states as the canonical RFSA. Let $n_0$ be the smallest index such that $q_{n_0} \in \delta_S(q_{n_P - 1}, a)$.

Lemma 7.4. For every $i \ge 0$, we have $|\delta_S(q_0, a^i)| \le |\delta_S(q_0, a^{i+1})|$. Moreover, if $|\delta_S(q_0, a^i)| = |\delta_S(q_0, a^{i + n_P - n_0})|$ for some $i \ge n_0$, then $\delta_S(q_0, a^i) = \delta_S(q_0, a^{i + n_P - n_0})$.

Figure 21. The minimal DFA corresponding to this canonical RFSA has $(n - 1)^2 + 2$ states.

Proof: If $i < n_P$, $\delta_S(q_0, a^i) = \{q_i\}$. Now let $i \ge n_P$. Due to the way $\delta_S$ is built, $q_j \in \delta_S(q_0, a^i)$ implies $n_0 \le j < n_P$. We have $\delta_S(q_0, a^{i+1}) = \bigcup_{q_j \in \delta_S(q_0, a^i)} \delta_S(q_j, a)$. If $j < n_P - 1$, then $\delta_S(q_j, a) = \{q_{j+1}\}$ and if $j = n_P - 1$, then $q_{n_0} \in \delta_S(q_j, a)$. And as $q_{n_0} \notin \bigcup_{\{q_j \in \delta_S(q_0, a^i) \mid j < n_P - 1\}} \delta_S(q_j, a)$, we have $|\delta_S(q_0, a^{i+1})| \ge |\delta_S(q_0, a^i)|$, which proves the first point.

For every $j \ge n_0$, we have $q_j \in \delta_S(q_j, a^{n_P - n_0})$. So, for every $i \ge n_0$, $\delta_S(q_0, a^i) \subseteq \delta_S(q_0, a^{i + n_P - n_0})$. This proves the second point. □

Proposition 7.4. Let $L$ be a regular language over a one-letter alphabet. Let $n_R$ (resp. $n_P$) be the number of non-empty residual languages (resp. prime residual languages) of $L$. Then,
$n_R \le (n_P - 1)^2 + 2$.

Proof: If $n_R = n_P$, the proposition is clear. Otherwise, let $A$ be the minimal DFA of $L$ and let $A_S = \langle \Sigma, Q_S, \{q_0\}, F_S, \delta_S \rangle$ be the trimmed NFA built from $\phi(A, q_{n_P})$. The number of reachable states in $A_S$ is an upper bound for $n_R$. Let us define the function $f$ by $f(i) = |\delta_S(q_0, a^i)|$ for any integer $i$. We verify that:

• For every $n$ such that $1 < n < n_P$, we can show using Lemma 7.4 that either there are $n_P - n_0$ states $q_i$ such that $f(i) = n$, but in this case there is no index $i$ such that $f(i) > n$, or there are at most $n_P - n_0 - 1$ states $q_i$ such that $f(i) = n$.
• There is at most one state $q_i$ such that $f(i) = n_P - n_0$.

From this, we can calculate that $n_R \le n_P + (n_P - 1)(n_P - 2) + 1 = (n_P - 1)^2 + 2$. □

The upper bound is reached by the automaton described in Figure 21.

8. Complexity results

We have defined the notions of RFSAs, saturated automata and canonical RFSAs; in this section, we evaluate the complexity of the constructions and decision problems linked to them.

We shall mainly use the following classical complexity results concerning finite automata (quoted from [6]).


Proposition 8.1. Deciding whether two NFAs recognize the same language is a PSPACE-complete problem.

As an immediate corollary: given two NFAs $A$ and $A'$, deciding whether $L_A \subseteq L_{A'}$ is a PSPACE-complete problem. The problem is quadratic if $A$ and $A'$ are DFAs.

Deciding whether the intersection of two DFAs is empty can be done in quadratic time. On the other hand:

Proposition 8.2. Deciding whether the intersection of $n$ DFAs is empty or not is a PSPACE-complete problem.

The first notion that we defined is saturation. Clearly, deciding whether a DFA is saturated is a polynomial problem. For NFAs, we have the following result.

Proposition 8.3. Deciding whether an NFA is saturated is a PSPACE-complete problem.

Proof: Given an oracle which decides whether the language recognized by a given NFA is included in another one, we can build the saturated of a given NFA within polynomial time.

Given an oracle which builds the saturated of a given NFA, we can say whether a given NFA is saturated within polynomial time.

It remains to prove that the inclusion problem between two languages represented by NFAs polynomially reduces to the problem of deciding whether an NFA is saturated.

Let $A = \langle \Sigma, Q, Q_0, F, \delta \rangle$ and $A' = \langle \Sigma, Q', Q'_0, F', \delta' \rangle$ be two NFAs. We can suppose that $A$ and $A'$ are trimmed, that $Q \cap Q' = \emptyset$ and that they have a unique initial state which cannot be reached from other states. Let $Q_0 = \{q_0\}$, $Q = \{q_0, q_1, \ldots, q_l\}$, $Q'_0 = \{q'_0\}$ and $Q' = \{q'_0, q'_1, \ldots, q'_{l'}\}$. We complete the alphabet $\Sigma$ by adding $l + l' + 2$ new letters $x_1, \ldots, x_l, x'_1, \ldots, x'_{l'}, z, t$: let $\Sigma'$ be the new alphabet. We consider two new states $q_e$ and $q_f$ and let $B = \langle \Sigma', Q \cup Q' \cup \{q_e, q_f\}, \{q'_0\}, F \cup F' \cup \{q_f\}, \delta'' \rangle$ where $\delta''$ contains the transitions of $\delta \cup \delta'$ and the transitions defined below:

$q_f \in \delta(q_i, x_i)$ for $i = 1 \ldots l$ and $q_f \in \delta(q'_i, x'_i)$ for $i = 1 \ldots l'$;
$q_e \in \delta(q_0, x) \cap \delta(q'_0, x)$ for $x \in \Sigma$;
$q_e \in \delta(q_e, x)$ for $x \in \Sigma$;
$q_f \in \delta(q_e, x_i)$ for $i = 1 \ldots l$ and $q_f \in \delta(q_e, x'_i)$ for $i = 1 \ldots l'$;
$q_f \in \delta(q_f, z)$ and $q_f \in \delta(q_e, t)$ (see Figure 22).

Now, it is easy to verify that $B$ is saturated if and only if $L_A \not\subseteq L_{A'}$. □

The next proposition shows that deciding whether a given NFA is an RFSA is also difficult.

Proposition 8.4. Deciding whether an NFA is an RFSA is a PSPACE-complete problem.

Figure 22. The automaton is saturated iff $L_{q_0} \not\subseteq L_{q'_0}$.


Proof:

First, we prove that the problem of deciding whether the union of $n$ regular languages described by DFAs is equal to $\Sigma^*$ can be polynomially reduced to the problem of deciding whether an NFA is an RFSA.

We can consider $n$ DFAs $A_1 = \langle \Sigma, Q^1, q^1, Q^1_F, \delta^1 \rangle, \ldots, A_n = \langle \Sigma, Q^n, q^n, Q^n_F, \delta^n \rangle$ where $i \neq j$ implies $Q^i \cap Q^j = \emptyset$.

Let $x_1, \ldots, x_n, y_1, \ldots, y_n, a, b$ be $2n + 2$ new letters. We can build the NFA $A = \langle \Sigma, Q, Q_I, Q_F, \delta \rangle$ where

• $Q = \bigcup_{i=1 \ldots n} Q^i \cup \{q_1, \ldots, q_n, q_e, q_f, q_g\}$ where $q_1, \ldots, q_n, q_e, q_f, q_g$ are new states,
• $Q_I = \{q_1, \ldots, q_n, q_e, q_f\}$,
• $Q_F = \bigcup_{i=1 \ldots n} Q^i_F \cup \{q_g\}$,
• $\delta = (\bigcup_{i=1 \ldots n} \delta^i) \cup (\bigcup_{i=1 \ldots n} \{(q_i, x_i, q^i), (q_i, y_i, q_i), (q_e, a, q^i)\}) \cup \{(q_e, b, q_e), (q_f, b, q_f), (q_f, a, q_g)\} \cup \{(q_g, x, q_g) \mid x \in \Sigma\}$ (see Figure 23).

Every state, except maybe $q_e$, defines a residual language of the described language:

• states $q_i$ correspond to residual languages of the words $y_i$,
• states $q$ of the sets $Q^i$ correspond to residual languages of $x_i u$ where $\delta^i(q^i, u) = q$,
• $q_f$ corresponds to the residual language of $b$ and
• $q_g$ corresponds to the residual language of $a$.

The state $q_e$ defines a residual language of the language if and only if the union of the languages recognized by the automata $A_i$ is equal to $\Sigma^*$. That is, $A$ is an RFSA if and only if the union of the languages described by the automata $A_i$ is equal to $\Sigma^*$.

It remains to show that we can decide whether a given NFA $A$ is an RFSA within polynomial space. Consider the subset construction defined in Section 5. The reachable sets of states of $A$ can be enumerated within polynomial space: therefore, for each state $q$ of $A$ and for each reachable set of states, decide whether they define equal regular languages. The NFA $A$ is an RFSA if and only if all its states are equivalent to some reachable set of states. □
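The membership test described in this proof can be sketched in Python as follows. This is only an illustration under explicit assumptions: the names `step`, `same_language` and `is_rfsa` are hypothetical, and the sketch stores all reachable sets and all explored pairs in memory, so it runs in exponential time and space and does not achieve the polynomial-space bound of the proof. Language equality between a state and a reachable set is decided by exploring pairs of state sets on the fly.

```python
def step(delta, p, x):
    """Image of the state set p under the letter x."""
    return frozenset(t for s in p for t in delta.get((s, x), ()))

def same_language(alphabet, delta, final, p, r):
    """True iff the state sets p and r accept exactly the same words."""
    seen, todo = {(p, r)}, [(p, r)]
    while todo:
        u, v = todo.pop()
        if bool(u & final) != bool(v & final):
            return False            # some word is accepted from one set only
        for x in alphabet:
            pair = (step(delta, u, x), step(delta, v, x))
            if pair not in seen:
                seen.add(pair)
                todo.append(pair)
    return True

def is_rfsa(alphabet, states, delta, initial, final):
    """Every state must be equivalent to some set reachable from the initial states."""
    final = frozenset(final)
    start = frozenset(initial)
    reach, todo = {start}, [start]
    while todo:
        p = todo.pop()
        for x in alphabet:
            q = step(delta, p, x)
            if q not in reach:
                reach.add(q)
                todo.append(q)
    return all(any(same_language(alphabet, delta, final, frozenset({q}), r)
                   for r in reach)
               for q in states)

# The NFA of Figure 11 (it recognizes Sigma* a Sigma^2): state 1 defines no residual.
FIG11 = {(0, 'a'): {0, 1}, (0, 'b'): {0},
         (1, 'a'): {2}, (1, 'b'): {2},
         (2, 'a'): {3}, (2, 'b'): {3}}
FIG11_IS_RFSA = is_rfsa('ab', {0, 1, 2, 3}, FIG11, {0}, {3})

# A trimmed DFA for a*b, used here as a sanity check: every state defines a residual.
DFA = {(0, 'a'): {0}, (0, 'b'): {1}}
DFA_IS_RFSA = is_rfsa('ab', {0, 1}, DFA, {0}, {1})
```

On these two inputs the sketch reports that the NFA of Figure 11 is not an RFSA, since every reachable set contains the state 0 and therefore accepts words of length greater than 2, whereas the small DFA is one.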

Using similar techniques, it can be shown that deciding whether the saturated of a given DFA is the canonical RFSA is a PSPACE-complete problem. Hint: given $n$ DFAs $A_1, \ldots, A_n$ over $\Sigma$, it is easy to build a DFA $A$ over an extended alphabet $\Sigma'$ such that $\bigcup_{i=1}^{n} L_{A_i} \neq \Sigma^*$ iff the saturated of $A$ is the canonical RFSA.

9. Conclusion

The class of RFSAs can be viewed as an intermediary class between the DFAs and the NFAs. Based on an important property of the DFAs, namely the fact that each state of an automaton $A$ must define a residual language of the language recognized by $A$, the RFSAs also share with the DFAs the property of having a minimal canonical form. On the other hand, the canonical RFSAs can be in some cases as concise as minimal NFAs.

Figure 23. $A$ is an RFSA iff the union of the languages described by the $A_i$ is equal to $\Sigma^*$.

It has been indicated in the introduction that the ideas developed in this paper come from work done in the domain of Grammatical Inference. A main problem in this field is to infer efficiently (a representation of) a regular language from a finite set of examples of this language. Some positive results can be proved when regular languages are represented by DFAs. For example, it has been shown that regular languages represented by DFAs can be inferred from given data ([7, 8]). In this framework, classical inference algorithms such as RPNI [13] need a number of examples that is polynomial in the size of the minimal DFA that recognizes the language to be inferred. So, regular languages as simple as $\Sigma^* a \Sigma^n$ cannot be inferred efficiently using these algorithms. Hence, it is a natural idea to think of using other kinds of representations for regular languages, such as NFAs. Unfortunately, it has been shown that regular languages represented by NFAs cannot be efficiently inferred from given data ([8]). The main difficulty that arises when one tries to build an NFA from examples comes from the fact that states do not correspond to natural components of the associated language. So, we defined RFSAs in order to obtain an automata representation of regular languages for which states correspond to residual languages. RFSAs have been used to design grammatical inference algorithms in [3, 4].

References

[1] Brzozowski, J. A.: Canonical regular expressions and minimal state graphs for definite events, in: Mathematical Theory of Automata, vol. 12 of MRI Symposia Series, 1962, 529–561.

[2] Chrobak, M.: Finite automata and unary languages, Theoretical Computer Science, 1986, 47(2):149–158.

[3] Denis, F., Lemay, A., Terlutte, A.: Learning regular languages using non deterministic finite automata, ICGI'2000, 5th International Colloquium on Grammatical Inference, 1891, Springer Verlag, 2000.

[4] Denis, F., Lemay, A., Terlutte, A.: Learning regular languages using RFSA, ALT 2001, number 2225 in Lecture Notes in Artificial Intelligence, Springer Verlag, 2001.

[5] Denis, F., Lemay, A., Terlutte, A.: Residual Finite State Automata, STACS 2001, 18th Annual Symposium on Theoretical Aspects of Computer Science, number 2010 in Lecture Notes in Computer Science, Springer Verlag, 2001.

[6] Garey, M. R., Johnson, D. S.: Computers and Intractability, a Guide to the Theory of NP-Completeness, W. H. Freeman and Co, San Francisco, 1979.

[7] Gold, E.: Complexity of Automaton Identification from Given Data, Inform. Control, 37, 1978, 302–320.

[8] Higuera, C. D. L.: Characteristic Sets for Polynomial Grammatical Inference, Machine Learning, 27, 1997, 125–137.

[9] Hopcroft, J., Ullman, J.: Introduction to Automata Theory, Languages, and Computation, Addison-Wesley, 1979.

[10] Kleene, S. C.: Representation of Events in Nerve Nets and Finite Automata, in: Automata Studies, Annals of Math. Studies 34 (C. Shannon, J. McCarthy, Eds.), New Jersey, 1956.

[11] Myhill, J.: Finite Automata and the Representation of Events, Technical Report 57-624, WADC, 1957.

[12] Nerode, A.: Linear Automaton Transformation, Proc. American Mathematical Society, 9, 1958.

[13] Oncina, J., Garcia, P.: Inferring regular languages in polynomial update time, Pattern Recognition and Image Analysis, 1992.

[14] Yu, S.: Handbook of Formal Languages, Regular Languages, vol. 1, chapter 2, Springer Verlag, 1997, 41–110.