Emptiness and Finiteness for Tree Automata with Global Reflexive Disequality Constraints

J Autom Reasoning (2013) 51:371–400
DOI 10.1007/s10817-012-9270-5


Carles Creus · Adrià Gascón · Guillem Godoy

Received: 15 June 2011 / Accepted: 16 October 2012 / Published online: 8 November 2012
© Springer Science+Business Media Dordrecht 2012

Abstract In recent years, several extensions of tree automata have been considered. Most of them are related to the capability of testing equality or disequality of certain subterms of the term evaluated by the automaton. In particular, tree automata with global constraints are able to test equality and disequality of subterms depending on the state to which they are evaluated. The emptiness problem is known to be decidable for this kind of automata, but with a non-elementary time complexity, and the decidability of the finiteness problem remains unknown. In this paper, we consider the particular case of tree automata with global constraints when the constraint is a conjunction of disequalities between states, and the disequality predicate is forced to be reflexive. This restriction is significant in the context of XML definitions with monadic key constraints. We prove that emptiness and finiteness are decidable in triple exponential time for this kind of automata.

Keywords Tree automata · Global constraints · Disequality constraints · Decision problems

The authors were supported by the Spanish Ministry of Education and Science through the FORMALISM project (TIN2007-66523). The second author was also supported by an FPU grant from the Spanish Ministry of Education. The last author was also supported by the Spanish Ministry of Science and Innovation SweetLogics project (TIN2010-21062-C02-01).

C. Creus · A. Gascón · G. Godoy (B)
Universitat Politècnica de Catalunya, Omega Building, Jordi Girona 1-3, Barcelona 08034, Spain
e-mail: [email protected]

C. Creus
e-mail: [email protected]

A. Gascón
e-mail: [email protected]

372 C. Creus et al.

1 Introduction

Due to their good computational properties, tree automata are a widely used formalism for representing sets of terms. For example, they can be used to describe sets of valid configurations of programs in software verification, or to define data types and efficiently compute searches in XML.

Unfortunately, the expressive power of tree automata is limited since runs are based just on a finite number of states. In particular, they cannot test whether certain subterms of the input term are equal or not. For instance, the language { f(t1, t2) | t1 ≠ t2 } is not recognizable by a tree automaton.

In recent years, several extensions to overcome this limitation in expressiveness have been considered. Some of these extensions allow imposing equality or disequality of certain subterms of the given term. According to how the comparisons are imposed, we can distinguish two kinds of such constrained automata. On the one side, we have the tree automata with local equality and disequality constraints. In this case, each transition rule of the automaton is associated with a boolean combination of atomic predicates of the form p1 ≈ p2 or p1 ≉ p2 for positions p1, p2. Such atomic constraints hold for a given rule application if the subterms at relative positions p1 and p2 of the subterm where the rule is applied are equal in the first case and different in the second one. The rule can be applied if the entire constraint holds. Languages like { f(t1, t2) | t1 ≠ t2 } can be described with this kind of transition rules. Unfortunately, the gain in expressiveness comes with a loss of good computational properties. In particular, emptiness and finiteness are undecidable for this kind of automata [5, 12]. Nevertheless, both properties have been proven decidable for several particular cases [3, 6–8].

On the other side, we have the tree automata with global equality and disequality constraints. In this case, the form of the transition rules is the same as for unconstrained tree automata. With respect to plain tree automata, the automata of this kind have, in addition, a boolean combination of predicates of the form q1 ≈ q2 or q1 ≉ q2 for states q1, q2. Such atomic constraints hold for a given run if each two subterms reaching q1 and q2, respectively, are equal in the first case and different in the second one. The entire constraint must hold in a correct run. It is easy to see, using the results in [10, 11], that the membership problem for this kind of automata is NP-complete. As in the case of the kind of automata mentioned above, languages like { f(t1, t2) | t1 ≠ t2 } can be described with global constraints, but also, several natural properties are undecidable for them or their decidability remains unknown. In particular, the emptiness property is decidable for this kind of automata [2], but the complexity of the currently known algorithm is non-elementary. Moreover, decidability of the finiteness property remains an open question and universality is known to be undecidable [10, 11].

Better results for emptiness and finiteness are known for particular cases. For example, when the boolean formula is a conjunction of atoms over the predicate ≈, emptiness and finiteness are EXPTIME-complete [10]. If, in addition, ≈ always relates identical states, then emptiness is decidable in linear time [11] and finiteness is decidable in polynomial time [10]. Also, when the formula is a conjunction of atoms over ≉ and such a predicate is irreflexive, emptiness is in NEXPTIME [10]. Automata with global constraints have been used to prove decidability of extensions of Monadic Second Order logic (MSO) with restricted tree (dis)equality tests [10] and tree (dis)equality tests with counting constraints [2] interpreted on both ranked and unranked terms. Other variants of MSO that are interpreted on data terms and related models [4, 9] have also been studied recently.

TA with Global Reflexive Disequalities 373

In the present paper we consider the case where the global constraint is a conjunction of atoms over the predicate ≉, and this predicate is forced to be reflexive over the set of states involved in the constraint. In other words, whenever a state q occurs in some atom, then the atom q ≉ q also appears in the constraint. This restriction is significant in the context of XML definitions with monadic key constraints. For this particular kind of global constraints, we prove that emptiness and finiteness are decidable in triple exponential time.

The paper is structured as follows. In Section 2 we introduce standard notations and definitions used in the paper. In Section 3 we analyze the expressive power of our kind of automata. In Section 4 we define the notion of compatible runs, which intuitively means that two runs are compatible if both may appear as a subrun of another run without falsifying the constraint. Also, we show how to compute which states have infinite sets of pairwise compatible runs reaching them. In Section 5 we simplify the emptiness and finiteness problems by properly removing such states from the given automaton. Finally, in Section 6 we describe our decision procedures for emptiness and finiteness of the simplified version of the automaton.

2 Preliminaries

2.1 Terms

We use standard notations from the term rewriting literature [1]. A signature Σ is a (finite) set of function symbols with arity, which is partitioned as ∪i Σ^(i) such that f ∈ Σ^(m) if the arity of f is m. We sometimes denote Σ explicitly as { f1 : m1, ..., fn : mn }, where f1, ..., fn are the function symbols and m1, ..., mn are the corresponding arities. We define maxar(Σ) as max({m1, ..., mn}). Symbols in Σ^(0), called constants, are denoted by a, b, with possible subscripts. The set T(Σ) of ground terms (or just terms) over Σ is the smallest set such that f(t1, ..., tm) is in T(Σ) whenever f ∈ Σ^(m) and t1, ..., tm ∈ T(Σ). A language over Σ is a set of ground terms.

A position is a sequence of natural numbers. The symbol λ denotes the empty sequence, also called the root position, and p.p′ denotes the concatenation of the positions p and p′. The set of positions of a term t, denoted Pos(t), is defined recursively as Pos(f(t1, ..., tm)) = {λ} ∪ { i.p | i ∈ {1, ..., m} ∧ p ∈ Pos(ti) }. The length of a position is denoted as |p|. Note that |λ| = 0 and |i.p| = 1 + |p| hold. A position p1 is a prefix of a position p, denoted p1 ≤ p, if there is a position p2 such that p1.p2 = p holds. Also, p1 is a proper prefix of p, denoted p1 < p, if p1 ≤ p and p1 ≠ p hold. Two positions p, p′ are parallel, denoted by p ‖ p′, if p ≰ p′ and p′ ≰ p hold.

The subterm of a term t at a position p, denoted t|p, is defined recursively as t|λ = t and f(t1, ..., tm)|i.p = ti|p. The replacement of a term t at a position p by a term s, denoted t[s]p, is defined recursively as t[s]λ = s and f(t1, ..., tm)[s]i.p = f(t1, ..., ti[s]p, ..., tm). The height of a term t, denoted height(t), is defined recursively as height(t) = 0 if t is a constant, and as height(f(t1, ..., tm)) = 1 + max({height(t1), ..., height(tm)}) otherwise.
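The recursive definitions above translate almost literally into code. The following Python sketch is only an illustration: the encoding of terms as nested tuples, the use of tuples of 1-based indices for positions, and all function names are assumptions of this example, not notation from the paper.

```python
# A term f(t1, ..., tm) is the tuple ("f", t1, ..., tm); a constant a is ("a",).
# A position is a tuple of 1-based child indices; () plays the role of λ.

def positions(t):
    """Pos(t): the root position plus i.p for every p in Pos(ti)."""
    result = [()]
    for i, child in enumerate(t[1:], start=1):
        result.extend((i,) + p for p in positions(child))
    return result

def subterm(t, p):
    """t|p, following t|λ = t and f(t1, ..., tm)|i.p = ti|p."""
    for i in p:
        t = t[i]
    return t

def replace(t, p, s):
    """t[s]p: the term t with the subterm at position p replaced by s."""
    if not p:
        return s
    i = p[0]
    return t[:i] + (replace(t[i], p[1:], s),) + t[i + 1:]

def height(t):
    """0 for a constant, 1 + the maximum height of the arguments otherwise."""
    return 0 if len(t) == 1 else 1 + max(height(c) for c in t[1:])

a = ("a",)
t = ("f", ("h", a), a)          # the term f(h(a), a)
print(positions(t))             # [(), (1,), (1, 1), (2,)]
print(height(t))                # 2
```

Tuples are immutable, so `replace` builds a new term and leaves `t` untouched, which matches the functional reading of t[s]p.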

2.2 Tree Automata

A tree automaton (TA, see [5]) is a tuple A = ⟨Q, Σ, F, Δ⟩, where Q is a set of states, Σ is a signature, F ⊆ Q is the subset of final states (also called accepting states), and Δ is a set of rules of the form f(q1, ..., qm) → q, where q1, ..., qm, q ∈ Q and f ∈ Σ^(m). The size of A is defined by |A| = |Q| + |Δ| + maxar(Σ). Usually, the size of a TA is not defined by means of the signature Σ [5], since it is common to consider Σ as fixed for the problem. However, the asymptotic cost of our algorithm is the same in both notions.

A run r of a TA A = ⟨Q, Σ, F, Δ⟩ on a term t is a function r : Pos(t) → Δ satisfying that, for each position p ∈ Pos(t), if t|p is of the form f(t1, ..., tm) and r(p.1), ..., r(p.m) are rules with right-hand side states q1, ..., qm ∈ Q, respectively, then r(p) is a rule of Δ of the form f(q1, ..., qm) → q, for some q ∈ Q. By abuse of notation we also write r(p) for the right-hand side state of the rule r(p), depending on the context. Moreover, since t can be deduced from r, we often do not make t explicit and just say that r is a run of A. As usual, we sometimes describe a run r as a term in T(Δ). A run r is accepting if r(λ) is accepting. A term t is accepted or recognized by A if there exists an accepting run of A on t. The language recognized by A, denoted L(A), is the set of terms accepted by A. By L(A, q) we denote the set of terms for which there exists a run r of A on them satisfying r(λ) = q. We say that a language L is regular if there exists a TA A such that L(A) = L.
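To make the notions of run and acceptance concrete, the following Python sketch computes, bottom-up, the set of states q such that t ∈ L(A, q). It is a minimal illustration under assumed encodings: terms are nested tuples, a rule f(q1, ..., qm) → q is the triple (f, (q1, ..., qm), q), and the function names are hypothetical. The sample rules are those of ta(A) in Example 1 of this paper.

```python
from itertools import product

def reachable_states(t, rules):
    """States q for which some run r of the TA on t satisfies r(λ) = q."""
    child_states = [reachable_states(c, rules) for c in t[1:]]
    result = set()
    # Try every combination of states reachable at the direct subterms.
    for combo in product(*child_states):
        for f, lhs, q in rules:
            if f == t[0] and lhs == combo:
                result.add(q)
    return result

def accepts(t, rules, final):
    """t is accepted if some run reaches a final state at the root."""
    return bool(reachable_states(t, rules) & final)

# Rules of ta(A) from Example 1: a -> qa, h(qa) -> qh, h(qh) -> qh, f(qh, qh) -> qf.
RULES = {
    ("a", (), "qa"),
    ("h", ("qa",), "qh"),
    ("h", ("qh",), "qh"),
    ("f", ("qh", "qh"), "qf"),
}
FINAL = {"qf"}

a = ("a",)
t_ok = ("f", ("h", a), ("h", ("h", a)))   # f(h(a), h(h(a)))
t_bad = ("f", a, ("h", a))                # f(a, h(a)): no rule f(qa, qh) -> ...
print(accepts(t_ok, RULES, FINAL))        # True
print(accepts(t_bad, RULES, FINAL))       # False
```

This only covers plain tree automata; the global constraint of a TAG[≉] is not checked here.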

Given a TA A, a term t and a run r of A on t, we define the subrun r|p as the run of A on t|p described by r|p(p′) = r(p.p′). Moreover, the subrun is strict if p ≠ λ. In addition, we define Pos(r) as Pos(t), height(r) as height(t), and term(r) as t. Given two runs r1, r2, and a position p ∈ Pos(r1) such that r1|p and r2 reach the same state, we define the replacement r1[r2]p as the run r on term(r1)[term(r2)]p defined as follows: r(p′) = r2(p̄) if p′ is of the form p.p̄, and r(p′) = r1(p′) otherwise.

2.3 Tree Automata with Global Disequality Constraints

A tree automaton with global disequality constraints (TAG[≉]) is a tuple A = ⟨Q, Σ, F, D, Δ⟩, where ⟨Q, Σ, F, Δ⟩ is a tree automaton, denoted ta(A), and D is a conjunction/set of atomic constraints of the form (q ≉ q′), where q, q′ ∈ Q. To ease the presentation, we write q ∈ D to denote that a state q is involved in some constraint. The size of A is defined by |A| = |ta(A)|.

A run r of a TAG[≉] A = ⟨Q, Σ, F, D, Δ⟩ on a term t is a run of ta(A) on t satisfying that, for every pair of different positions p1, p2 in Pos(t), if (r(p1) ≉ r(p2)) ∈ D, then t|p1 ≠ t|p2. Moreover, as with TA's, we often do not make t explicit and just say that r is a run of A. A term t is accepted or recognized by A if there exists an accepting run of A on t. The language recognized by A, denoted L(A), is the set of terms accepted by A. By L(A, q) we denote the set of terms for which there exists a run r of A on them satisfying r(λ) = q.
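The global condition on runs can be checked directly from this definition. The sketch below is an illustration only: it assumes a run is given as a dictionary from positions to right-hand side states (following the abuse of notation mentioned above), that it is already a run of ta(A), and that D is a set of ordered pairs of states. The states qf, qh, q and the atom (qh ≉ qh) are borrowed from Example 2 later in the paper.

```python
from itertools import permutations

def subterm(t, p):
    """t|p for a term encoded as nested tuples and p a tuple of 1-based indices."""
    for i in p:
        t = t[i]
    return t

def satisfies_constraints(t, run, D):
    """True if no atom of D is falsified by two different positions of the run."""
    # permutations yields ordered pairs of distinct positions, so each
    # atom (q1, q2) is tested in both orientations.
    for p1, p2 in permutations(run, 2):
        if (run[p1], run[p2]) in D and subterm(t, p1) == subterm(t, p2):
            return False
    return True

D = {("qh", "qh")}
a = ("a",)

# f(h(a), h(a)): both arguments reach qh and are equal, so qh ≉ qh fails.
t_equal = ("f", ("h", a), ("h", a))
run_equal = {(): "qf", (1,): "qh", (1, 1): "q", (2,): "qh", (2, 1): "q"}

# f(h(a), h(h(a))): the two subterms reaching qh are different.
t_diff = ("f", ("h", a), ("h", ("h", a)))
run_diff = {(): "qf", (1,): "qh", (1, 1): "q",
            (2,): "qh", (2, 1): "q", (2, 1, 1): "q"}

print(satisfies_constraints(t_equal, run_equal, D))  # False
print(satisfies_constraints(t_diff, run_diff, D))    # True
```

The check is quadratic in the number of positions, which is enough for illustration.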

In this paper we focus on a particular case of TAG[≉], which we denote TAG[≉R], where the relation represented by D is reflexive, i.e. D satisfies that if a constraint (q1 ≉ q2) occurs in D then the constraints (q1 ≉ q1) and (q2 ≉ q2) also occur in D. For completeness, we introduce two further particular cases that will be helpful to characterize the expressive power of TAG[≉R]'s. We say that a TAG[≉R] is a TAGR when D only contains reflexive constraints, i.e. when the atomic constraints are all of the form (q ≉ q). By considering similar restrictions, we define a TAGI as a TAG[≉] where D represents an irreflexive relation, i.e. the atomic constraints must be all of the form (q1 ≉ q2) for different q1 and q2.

2.4 Partitions

A partition P of a set S is a set of disjoint sets P1, ..., Pn satisfying S = ∪i∈{1,...,n} Pi and Pi ≠ ∅. Each Pi is said to be a part of the partition. Note that S = ∅ implies P = ∅. With ⋃P we denote ∪i∈{1,...,n} Pi. We define the equivalence relation ∼P on ⋃P as e1 ∼P e2 if and only if there exists Pi ∈ P such that e1, e2 ∈ Pi.

3 Expressive Power

In this section we compare the expressive power of TAG[≉R]'s, TAGI's, and TAGR's and conclude that the classes of languages recognizable by TAG[≉R]'s and TAGI's are incomparable, and that TAG[≉R]'s are strictly more expressive than TAGR's. To ease the presentation, in the rest of the paper we denote by h^n(a) the term h(...(h(a))...) with n occurrences of h, where h is a unary function symbol, n is a natural number, and a is a constant symbol. The same notation is also used for runs. Moreover, in this section we denote terms of the form f(t1, f(t2, f(t3, ... f(tn−1, tn) ...))) as f[t1, ..., tn], where f is a binary function symbol.

Lemma 1 The class of languages recognizable by TAGI's is incomparable with the classes of languages recognizable by TAG[≉R]'s and TAGR's with respect to inclusion.

Proof Let Σ = { f : 2, h : 1, a : 0 } be a signature. We first show that the class of languages recognizable by TAGR's is not included in the class of languages recognizable by TAGI's. Since TAGR's are a particular case of TAG[≉R]'s, this claim holds also for TAG[≉R]'s. Consider the following language over Σ:

$$L = \{\, f[h^{k_1}(a), \ldots, h^{k_n}(a), a] \mid n \geq 1 \wedge k_1, \ldots, k_n \geq 0 \wedge \forall\, 1 \leq i < j \leq n : k_i \neq k_j \,\}$$

It is straightforward that L can be recognized by a TAGR. We proceed by contradiction assuming that there exists a TAGI A recognizing L. Let n be a natural number strictly greater than the number of states of A and consider the following term t ∈ L:

[Figure: the term t = f[h^1(a), h^2(a), h^3(a), ..., h^n(a), a], drawn as a chain of f nodes whose i-th left child is a chain of i occurrences of h above a.]

By the assumption, there exists an accepting run r of A on t. Since n is greater than the number of states of A, there exist two different natural numbers i, j such that the positions p1 = 2^i.1 and p2 = 2^j.1, where 2^i denotes the sequence 2. ... .2 with i occurrences of 2, satisfy that p1, p2 ∈ Pos(r) and the subruns r|p1 and r|p2 reach the same state. Let r′ be the replacement r[r|p1]p2 and note that term(r′) ∉ L. The fact that r′ satisfies the constraint of A follows from the fact that r is a run and that, since constraints of the form (q ≉ q) do not occur in the constraint of A, multiple occurrences of r|p2 do not falsify any constraint. Hence, term(r′) ∈ L(A) but term(r′) ∉ L, a contradiction.

We now show that the class of languages recognizable by TAGI's is not included in the class of languages recognizable by TAG[≉R]'s. Since TAGR's are a particular case of TAG[≉R]'s, the claim holds also for the class of languages recognizable by TAGR's. Consider the following language over Σ:

$$L = \{\, f[h^{k}(a), h^{k_1}(a), \ldots, h^{k_n}(a), a] \mid n \geq 1 \wedge k, k_1, \ldots, k_n \geq 0 \wedge k \neq k_1, \ldots, k_n \,\}$$

It is straightforward that L can be recognized by a TAGI. We proceed by contradiction assuming that there exists a TAG[≉R] A recognizing L. Let n be a natural number strictly greater than the number of states of A and consider the term t ∈ L defined as f[t̄, t1, ..., t_(n^n), a], where t̄ = h^(n!+n)(a) and t1 = ... = t_(n^n) = h^n(a).

[Figure: the term t drawn as a chain of f nodes. The first left child is t̄, a chain of n! + n occurrences of h above a; each of the following n^n left children t1, ..., t_(n^n) is a chain of n occurrences of h above a.]

By the assumption, there exists an accepting run r of A on t. Since n is greater than the number of states of A and n^n is greater than the number of different sequences of states of A of length n, there exist two different natural numbers i, j ≥ 1 such that the positions p1 = 2^i.1 and p2 = 2^j.1 (with i and j occurrences of 2, respectively) satisfy that p1, p2 ∈ Pos(r) and r|p1 = r|p2 hold. Since the constraint relation defined by the constraint of A is reflexive, it follows that the subruns r|p1 and r|p2 do not contain states involved in a constraint of A. Moreover, since height(r|p1) = n, which is greater than the number of states of A, it follows that r|p1 can be pumped to obtain a run r′ such that term(r′) is the first argument t̄ = h^(n!+n)(a) of t, r′ and r|p1 reach the same state, and r′ does not contain states involved in a constraint of A. Finally, note that term(r[r′]p1) = t[t̄]p1, which is not in L. However, r[r′]p1 is an accepting run, a contradiction. □


Lemma 2 The class of languages recognizable by TAGR's is strictly included in the class of languages recognizable by TAG[≉R]'s.

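Proof We only need to prove that the inclusion is strict since TAGR's are a particular case of TAG[≉R]'s. Let Σ = { g : 3, f : 2, h : 1, a : 0 } be a signature and consider the following language over Σ:

$$L = \Big\{\, g\big( f[h^{\alpha_1}(a), \ldots, h^{\alpha_n}(a), a],\; h^{\beta}(a),\; f[h^{\gamma_1}(a), \ldots, h^{\gamma_k}(a), a] \big) \;\Big|\; n, k \geq 1 \wedge \alpha_1, \ldots, \alpha_n, \beta, \gamma_1, \ldots, \gamma_k \geq 0 \wedge \forall\, 1 \leq i < j \leq n : \alpha_i \neq \alpha_j \wedge \forall\, 1 \leq i < j \leq k : \gamma_i \neq \gamma_j \wedge \beta \neq \alpha_1, \ldots, \alpha_n, \gamma_1, \ldots, \gamma_k \,\Big\}$$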

It is straightforward that L can be recognized by a TAG[≉R]. We proceed by contradiction assuming that there exists a TAGR A recognizing L. Let n be a natural number strictly greater than the number of states of A and consider the following term t ∈ L, where k = 1 + 2 · n(n + 1):

[Figure: the term t = g( f[h^(n!)(a), h^(2·n!)(a), ..., h^(k·n!)(a), a], h^(2·k·n!)(a), f[h^(n!)(a), h^(2·n!)(a), ..., h^(k·n!)(a), a] ), drawn as a tree whose first and third arguments are identical chains of f nodes and whose second argument is a chain of 2 · k · n! occurrences of h above a.]

By the assumption, there exists an accepting run r of A on t. We now define all the subruns of r|1 that recognize subterms of t|1 of the form h(h(...h(a)...)). Moreover, we want to refer to them by their height. Hence, let ri,d be the subrun of r|1 at position 2^(i−1).1.1^(i·n!−d), i.e. i − 1 occurrences of 2, followed by 1, followed by i · n! − d occurrences of 1, for all i ∈ {1, ..., k} and d ∈ {0, ..., i · n!}. Note that height(ri,d) = d holds for all i ∈ {1, ..., k} and d ∈ {0, ..., i · n!}. Consider two different runs ri,d, rj,d. Note that ri,d and rj,d cannot reach the same state q of A if q is involved in a constraint, since such a constraint is necessarily of the form (q ≉ q) and term(ri,d) = term(rj,d) holds. It follows that, for a fixed d, the number of ri,d's reaching some state involved in a constraint is smaller than n. Hence, for a fixed d, the number of ri,d's containing some state involved in a constraint is smaller than n · (d + 1). At this point, let ri be ri,i·n!, for all i ∈ {1, ..., k}. Note that those are the subruns of r|1 at positions 2^(i−1).1. The number of ri's whose subrun of height n contains a constrained state is smaller than n(n + 1). Let r̄i,d be the subruns of r|3 defined analogously to the definition of ri,d as subruns of r|1 and let r̄i be r̄i,i·n!. By an analogous argument, the number of r̄i's whose subrun of height n contains a constrained state is smaller than n(n + 1). By definition of k, it follows that there exists e ∈ {1, ..., k} such that re and r̄e satisfy that their subruns of height n do not contain a constrained state. Moreover, since term(re) = term(r̄e) holds, it follows that re,d and r̄e,d cannot reach the same state q of A if q is involved in a constraint, for all d ∈ {0, ..., e · n!}.

Note that the subruns of re and r̄e of height n (where re is the selected subrun of r|1 and r̄e the corresponding subrun of r|3), which do not have states involved in constraints, can be pumped to transform re and r̄e into new runs r_e^i and r̄_e^i, for i ≥ 0, satisfying the following:

(a) term(r_e^i) = term(r̄_e^i) = h^((i+e)·n!)(a), i.e. we can obtain runs on terms of the form h(h(...h(a)...)) with a height greater than e · n! and multiple of n!.

(b) For every position p ∈ Pos(r_e^i), if r_e^i(p) is a constrained state, then re(p) also is a constrained state.

(c) For every position p ∈ Pos(r̄_e^i), if r̄_e^i(p) is a constrained state, then r̄e(p) also is a constrained state.

(d) For every position p ∈ Pos(r_e^i), if r_e^i(p) and r̄_e^i(p) are constrained states, then the states r_e^i(p) and r̄_e^i(p) are different.

Note that conditions (b) and (c) imply that, for all i ≥ 0, any position p such that r_e^i(p) or r̄_e^i(p) is a constrained state satisfies |p| ≤ e · n! − n. Also, by condition (a), for each i ≥ 2 · k − e, the runs r_e^i and r̄_e^i satisfy that height(r_e^i) = height(r̄_e^i) ≥ 2 · k · n!. For any such i, it follows that the height of any subrun of r_e^i and r̄_e^i reaching a state involved in a constraint is greater than or equal to 2 · k · n! − (e · n! − n) = (2 · k − e) · n! + n ≥ k · n! + n. In this case, since k · n! is the maximum height of r1, ..., rk, r̄1, ..., r̄k, constraints between runs r_e^i, r̄_e^i and any other run r1, ..., rk, r̄1, ..., r̄k are satisfied. Therefore, note that the subrun r|2 and the subrun re must share a constrained state in the same position since, otherwise, we can replace re by r_e^(2·k−e) and obtain an accepting run on a term not in L. The same applies to r̄e.

Let pe be the position 2^(e−1).1 (e − 1 occurrences of 2 followed by 1), i.e. the position where re occurs in r|1 and r̄e occurs in r|3. Let p, p̄ be the shortest positions in Pos(r|2) such that re|p and r|2.p reach the same constrained state and r̄e|p̄ and r|2.p̄ reach the same constrained state. Assume, without loss of generality, that |p| ≤ |p̄|. We now want to pump the subruns re and r̄e to obtain runs of height 3 · k · n! and swap the subruns of r at positions 1.pe.p and 2.p, in order to obtain an accepting run r′ on a term not in L. First, let r′e be r_e^(3·k−e) and let r̄′e be r̄_e^(3·k−e). Note that height(r′e) = height(r̄′e) = 3 · k · n! and, by the same arguments as before, the height of any subrun of r′e and r̄′e reaching a state involved in a constraint is greater than or equal to 3 · k · n! − (e · n! − n) = (3 · k − e) · n! + n ≥ 2 · k · n! + n. Hence, constraints between runs r′e, r̄′e and any other run r|2, r1, ..., rk, r̄1, ..., r̄k are satisfied. Second, let r′ be the run r[r|2.p]1.pe.p[r′e|p]2.p[r̄′e]3.pe. Note that term(r′) ∉ L holds since term(r′)|2 = term(r′)|3.pe = h^(3·k·n!)(a). By condition (d) in the definition of r_e^i and r̄_e^i and the assumption on p, constraints between the subruns r′|2 and r′|3.pe are satisfied. Finally, constraints between r′|1.pe, which has height equal to 2 · k · n!, and any other subrun of r′ are also satisfied by a height argument. Hence, r′ is an accepting run, a contradiction. □

4 Compatible Runs

We start by defining a notion of compatibility between runs which, in the next section, allows us to simplify the decision of emptiness and finiteness. Note that any run of a TAG[≉R] A is a run of ta(A), but the converse is not true since a run r of ta(A) may not satisfy the constraint of A. In this case, there exist two different positions p1, p2 of r satisfying that (r(p1) ≉ r(p2)) is a constraint of A and term(r|p1) = term(r|p2). In this sense, we can see the subruns r|p1 and r|p2 as incompatible, since, whenever a run r′ of ta(A) contains them as subruns, r′ is not a run of A since the constraint is falsified.

Definition 1 Let A be a TAG[≉R]. Two runs r1, r2 of A are compatible if for every pair of positions p1 ∈ Pos(r1), p2 ∈ Pos(r2), it holds that, if (r1(p1) ≉ r2(p2)) is a constraint of A, then term(r1|p1) ≠ term(r2|p2).

We say that a set of runs of A is a compatible set if all its runs are pairwise compatible.

The following example illustrates how the incompatibility between runs affects the language recognized by a TAG[≉R]. In the examples, when making a run explicit, we sometimes define it as a term in T(Q) instead of T(Δ) when only the right-hand sides of the rules are relevant.

Example 1 Let A be a TAG[≉R] defined as A = ⟨{qf, qh, qa}, {a : 0, h : 1, f : 2}, {qf}, (qa ≉ qa), {a → qa, h(qa) → qh, h(qh) → qh, f(qh, qh) → qf}⟩. Note that L(ta(A)) = { f(h^n(a), h^m(a)) | n, m ≥ 1 } ≠ ∅.

Any two terms h^n(a) and h^m(a), with n, m ≥ 1, have the associated runs r1 = (qh)^n(qa) and r2 = (qh)^m(qa) of A. Let p1, p2 be the longest positions of r1 and r2, respectively. Note that term(r1|p1) = term(r2|p2) = a and r1(p1) = r2(p2) = qa. Since (qa ≉ qa) is a constraint of A, the runs r1, r2 are incompatible. Hence, L(A) = ∅ since there is no run of A reaching the accepting state qf and satisfying the constraint, even though there are infinitely many terms reaching qh.
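Definition 1 can be tested mechanically on the runs of Example 1. The following Python sketch is illustrative only: it assumes a run is encoded as a nested tuple (state, symbol, child runs...), treats each atom of D as unordered (an assumption of this example), and uses hypothetical function names.

```python
def subruns(r):
    """All subruns r|p of r, one per position."""
    yield r
    for child in r[2:]:
        yield from subruns(child)

def term(r):
    """term(r): forget the states and keep the function symbols."""
    return (r[1],) + tuple(term(c) for c in r[2:])

def compatible(r1, r2, D):
    """Definition 1: no constrained pair of subruns recognizes the same term."""
    for s1 in subruns(r1):
        for s2 in subruns(r2):
            constrained = (s1[0], s2[0]) in D or (s2[0], s1[0]) in D
            if constrained and term(s1) == term(s2):
                return False
    return True

def chain(n):
    """The run (qh)^n(qa) of Example 1 on the term h^n(a), for n >= 1."""
    r = ("qa", "a")
    for _ in range(n):
        r = ("qh", "h", r)
    return r

D = {("qa", "qa")}
print(compatible(chain(1), chain(2), D))      # False: both contain qa on the term a
print(compatible(chain(1), chain(2), set()))  # True: without constraints nothing clashes
```

Every pair of runs of the form (qh)^n(qa) shares the leaf subrun qa on the term a, which is why no run of Example 1 can reach qf.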

Let us give some intuition on how the reflexivity of the constraints of a TAG[≉R] A is related to the notion of compatibility, since it is a key point in the proof of the following lemma. Let R be a compatible set of runs of A. Consider a run r ∈ R and its subrun r|p for any position p ∈ Pos(r) such that r(p) is involved in some constraint. Recall that when a state q occurs in a constraint of a TAG[≉R], it necessarily occurs at least in a constraint of the form (q ≉ q), i.e. in a reflexive constraint. This fact implies that the only run of R containing r|p as a subrun is precisely r, since otherwise R would not be a compatible set. Actually, the definition of compatibility further guarantees that only r has a subrun reaching the state r(p) after recognizing term(r|p). This fact is used in the proof of the following lemma to bound the number of runs in R that are incompatible with a certain fixed run.


Lemma 3 Let A be a TAG[≉R]. Let r be a run of A and let R = {r1, r2, ...} be an infinite compatible set of runs of A. Then, there exists S ⊆ R such that S ∪ {r} is an infinite compatible set of runs of A.

Proof Let A be ⟨Q, Σ, F, D, Δ⟩ more explicitly written. Let p ∈ Pos(r) be a position satisfying r(p) ∈ D. Since R is a compatible set, there are at most |Q| runs r′ in R for which there exists a position p′ satisfying (r′(p′) ≉ r(p)) ∈ D and term(r′|p′) = term(r|p). By defining S as the result of removing from R all such runs r′ for all such positions p, the result follows. □

Corollary 1 Let A be a TAG[≉R]. Let S1, ..., Sn be infinite compatible sets of runs of A. Then, there exists S ⊆ ⋃1≤i≤n Si such that S is an infinite compatible set of runs of A and, for every i ∈ {1, ..., n}, S ∩ Si is infinite.

Proof By applying Lemma 3 several times, there exists a selection r1 ∈ S1, ..., rn ∈ Sn such that E = {r1, ..., rn} is a compatible set and, moreover, there exist S′1 ⊆ S1, ..., S′n ⊆ Sn such that S′1 ∪ E, ..., S′n ∪ E are infinite compatible sets. Analogously, there exists a selection r′1 ∈ S′1, ..., r′n ∈ S′n such that E′ = {r′1, ..., r′n} is a compatible set and, moreover, there exist S″1 ⊆ S′1, ..., S″n ⊆ S′n such that S″1 ∪ E′, ..., S″n ∪ E′ are infinite compatible sets. Moreover, note that E ∪ E′ is a compatible set. This process can be repeated to obtain runs r″1 ∈ S″1, ..., r″n ∈ S″n such that E″ = {r″1, ..., r″n} is a compatible set and, moreover, there exist S‴1 ⊆ S″1, ..., S‴n ⊆ S″n such that S‴1 ∪ E″, ..., S‴n ∪ E″ are infinite compatible sets. As before, note that E ∪ E′ ∪ E″ is a compatible set. This process can be iterated to obtain an infinite number of compatible sets E, E′, E″, ... such that the infinite union S = E ∪ E′ ∪ E″ ∪ ... is the infinite compatible set of the statement. □

Example 2 Let A be a TAG[≉R] defined as A = ⟨{qf, qh, q}, {a : 0, h : 1, f : 2}, {qf}, (qh ≉ qh), {a → q, h(q) → q, h(q) → qh, f(qh, qh) → qf}⟩. Note that its recognized language is L(A) = { f(h^n(a), h^m(a)) | n, m ≥ 1 ∧ n ≠ m }, which is not regular.

Let us consider the set S of runs of A defined as S = { qf(qh(q^(2n)), qh(q^(2n+1))) | n ≥ 1 }. Note that S is infinite and, moreover, it is a compatible set. Now consider a run r = qf(qh(q^(m1)), qh(q^(m2))), with m1, m2 ≥ 1 and m1 ≠ m2. By Lemma 3, r is compatible with an infinite number of runs in S. In fact, it is easy to see that r can be incompatible with, at most, two runs in S, the ones with subruns qh(q^(m1)) and qh(q^(m2)). Hence, an infinite compatible set of runs from S and containing r can be constructed.
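The language claimed for the automaton of Example 2 can be checked on small terms by brute force: enumerate every run of ta(A) and keep those that satisfy the constraint. The Python sketch below is only an illustration under assumed encodings (terms and runs as nested tuples, rules as triples, atoms of D treated as unordered) with hypothetical function names. It enumerates runs exhaustively and makes no attempt at efficiency.

```python
from itertools import product

def runs(t, rules):
    """All runs of ta(A) on t, each encoded as (state, symbol, child runs...)."""
    for children in product(*(list(runs(c, rules)) for c in t[1:])):
        for f, lhs, q in rules:
            if f == t[0] and lhs == tuple(c[0] for c in children):
                yield (q, f) + children

def subruns(r):
    yield r
    for child in r[2:]:
        yield from subruns(child)

def term(r):
    return (r[1],) + tuple(term(c) for c in r[2:])

def satisfies(r, D):
    """The global constraint: constrained subruns at different positions differ."""
    subs = list(subruns(r))          # one entry per position of r
    for i, s1 in enumerate(subs):
        for s2 in subs[i + 1:]:
            constrained = (s1[0], s2[0]) in D or (s2[0], s1[0]) in D
            if constrained and term(s1) == term(s2):
                return False
    return True

def tag_accepts(t, rules, final, D):
    return any(r[0] in final and satisfies(r, D) for r in runs(t, rules))

# The automaton of Example 2.
RULES = {("a", (), "q"), ("h", ("q",), "q"),
         ("h", ("q",), "qh"), ("f", ("qh", "qh"), "qf")}
FINAL, D = {"qf"}, {("qh", "qh")}

def h_pow(n):
    """The term h^n(a)."""
    t = ("a",)
    for _ in range(n):
        t = ("h", t)
    return t

print(tag_accepts(("f", h_pow(1), h_pow(2)), RULES, FINAL, D))  # True
print(tag_accepts(("f", h_pow(2), h_pow(2)), RULES, FINAL, D))  # False
```

On these small instances the outcome agrees with the condition n ≠ m stated in Example 2.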

With the aim of simplifying the decision procedure for emptiness and finiteness of the language recognized by a TAG[≉R] A, we compute which states have infinite sets of compatible runs reaching them. In the next section, this allows us to define from A a simpler automaton such that the emptiness of the recognized language is preserved, and its finiteness may change only under certain conditions.

Definition 2 Let A = ⟨Q, Σ, F, D, Δ⟩ be a TAG[≉R]. We define its set of states with infinite compatible runs, denoted by Q∞_A, to be the set containing all the states q ∈ Q such that there exists an infinite compatible set S of runs of A satisfying that r(λ) = q, for each run r ∈ S.

We use Q∞ as a shorthand when A is clear from the context.

Algorithm 1 computes the set Q∞ for a given TAG[≉R] A. Its correctness is stated in the following two lemmas. With respect to its running time, first note that the algorithm computes finiteness and emptiness of the language recognized by the TA A0 constructed in step 1. Such properties can be decided in polynomial time for a TA [5]. Next, in step 3, it incrementally computes the set Q∞ in, at most, |Q| steps, using operations that can be all computed in polynomial time. It follows that the algorithm runs in polynomial time.

Algorithm 1 Compute the set Q∞ for a given TAG[≉R] A
Input: A = ⟨Q, Σ, F, D, Δ⟩
Output: Q∞

1. Let A0 = ⟨Q0, Σ, F ∩ Q0, Δ0⟩ be a TA, where
   – Q0 = {q ∈ Q | q ∉ D}, and
   – Δ0 = {(f(q1, ..., qm) → q) ∈ Δ | q1, ..., qm, q ∉ D}.
2. Q∞ := {q ∈ Q0 | L(A0, q) is infinite}.
3. While ∃(f(q1, ..., qm) → q) ∈ Δ such that
   – q ∉ Q∞,
   – ∀i ∈ {1, ..., m} : qi ∈ Q∞ ∨ (qi ∉ D ∧ L(A0, qi) ≠ ∅), and
   – ∃i ∈ {1, ..., m} : qi ∈ Q∞
   Do Q∞ := Q∞ ∪ {q}
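The three steps of Algorithm 1 can be sketched in Python as follows. This is a hedged illustration, not the authors' implementation: the paper relies on the polynomial-time emptiness and finiteness tests for plain TA from [5] without spelling them out, so the fixpoint and cycle-based tests below are assumptions of this sketch, as are the rule encoding (f, (q1, ..., qm), q), the parameter `d_states` (the states involved in D) and all function names.

```python
def nonempty_states(rules):
    """States q with L(A0, q) nonempty (bottom-up fixpoint)."""
    ok, changed = set(), True
    while changed:
        changed = False
        for _, lhs, q in rules:
            if q not in ok and all(s in ok for s in lhs):
                ok.add(q)
                changed = True
    return ok

def infinite_states(rules):
    """States q with L(A0, q) infinite: q is reachable from a nonempty
    state lying on a cycle of rules whose left-hand sides are all nonempty."""
    ok = nonempty_states(rules)
    succ = {s: set() for s in ok}
    for _, lhs, q in rules:
        if all(s in ok for s in lhs):
            for s in lhs:
                succ[s].add(q)

    def reach(s):
        """States reachable from s through at least one rule application."""
        seen, stack = set(), list(succ[s])
        while stack:
            x = stack.pop()
            if x not in seen:
                seen.add(x)
                stack.extend(succ[x])
        return seen

    reach_from = {s: reach(s) for s in ok}
    on_cycle = {s for s in ok if s in reach_from[s]}
    return {q for s in on_cycle for q in reach_from[s]}

def q_infinity(rules, d_states):
    # Step 1: restrict the rules to states not involved in any constraint.
    rules0 = {(f, lhs, q) for f, lhs, q in rules
              if q not in d_states and all(s not in d_states for s in lhs)}
    nonempty0 = nonempty_states(rules0)
    # Step 2: states with infinitely many terms in A0.
    q_inf = infinite_states(rules0)
    # Step 3: propagate through the rules of the full automaton.
    changed = True
    while changed:
        changed = False
        for _, lhs, q in rules:
            if (q not in q_inf
                    and all(s in q_inf or s in nonempty0 for s in lhs)
                    and any(s in q_inf for s in lhs)):
                q_inf.add(q)
                changed = True
    return q_inf

# Example 1: the constraint (qa ≉ qa) blocks every state.
RULES1 = {("a", (), "qa"), ("h", ("qa",), "qh"),
          ("h", ("qh",), "qh"), ("f", ("qh", "qh"), "qf")}
# Example 2: only qh is constrained.
RULES2 = {("a", (), "q"), ("h", ("q",), "q"),
          ("h", ("q",), "qh"), ("f", ("qh", "qh"), "qf")}

print(q_infinity(RULES1, {"qa"}))          # set()
print(sorted(q_infinity(RULES2, {"qh"})))  # ['q', 'qf', 'qh']
```

Since every state of `nonempty0` lies outside D by construction of `rules0`, the test `s in nonempty0` covers the condition qi ∉ D ∧ L(A0, qi) ≠ ∅ of step 3.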

Lemma 4 Let A = ⟨Q, Σ, F, D, Δ⟩ be a TAG[≉R]. Let Q∞ be the set computed by Algorithm 1 on input A. If there exists a run r of A satisfying (∀p ∈ Pos(r) \ {λ} : (r(p) ∈ D ⇒ r(p) ∈ Q∞)) ∧ (∃p ∈ Pos(r) : r(p) ∈ Q∞), then r(λ) ∈ Q∞.

Proof Let S = {p ∈ Pos(r) | r(p) ∈ Q∞ ∧ (∄p′ ∈ Pos(r) : p′ < p ∧ r(p′) ∈ Q∞)}. Note that S is a set of parallel positions and that, by the assumptions, S ≠ ∅ holds. Let P be the set of prefixes of the positions in S, i.e. P = {p ∈ Pos(r) | ∃p′ ∈ S : p ≤ p′}. By induction on the terms pending at positions in P, it is easy to see that, for each p ∈ P, r(p) is added to Q∞ either in step 2 or step 3 of the algorithm, since the second condition of step 3 is satisfied by the assumption of the lemma and the third one holds by induction hypothesis. Since λ ∈ P, the statement holds. □

Lemma 5 Let A be a TAG[≉R]. Let Q∞ be the set computed by Algorithm 1 on input A. Then, Q∞ = Q∞_A.

Proof Let A be 〈Q, �, F, D, �〉 more explicitly written. We prove each directionseparately.

(⊆) We first prove soundness showing that any state q ∈ Q∞ is also in Q∞A . We use

induction on the number of iterations of the algorithm until q was added toQ∞.

382 C. Creus et al.

First assume that $q$ was added in step 2 of the algorithm. Hence, it has infinitely many different runs on $A_0$ and, since runs of $A_0$ are compatible runs of $A$, it follows directly that $q \in Q^\infty_A$.

Now assume that $q$ was added after some iterations of step 3 of the algorithm. Hence, there exists a rule $f(q_1, \ldots, q_m) \to q$ in $\Delta$ where each $q_i$ either has already been added to $Q^\infty$ or has non-empty language in $A_0$. Moreover, there exists at least one $j$ such that $q_j$ is in $Q^\infty$. To prove $q \in Q^\infty_A$, it suffices to show that there exists an infinite compatible set of runs reaching $q$ with the rule $f(q_1, \ldots, q_m) \to q$ at their root position. We show that this set can be constructed by properly selecting runs reaching each $q_i$. We consider the following cases.

(a) For every state $q_i \notin Q^\infty$, we take any run of $A_0$ reaching that $q_i$, which exists because $L(A_0, q_i) \neq \emptyset$. Note that the selected run is always compatible with any other run, since it contains no state involved in a constraint.

(b) For all the states in $Q^\infty$ appearing in the left-hand side of the rule, say $\{q'_1, \ldots, q'_k\} \subseteq \{q_1, \ldots, q_m\}$, $q'_1, \ldots, q'_k \in Q^\infty_A$ holds by induction hypothesis and, moreover, there exists at least one such state, i.e. $1 \leq k \leq m$. Hence, by Corollary 1, there exists a compatible set containing an infinite number of runs reaching each $q'_i$.

Using the runs selected in (a) and the infinitely many compatible runs of (b), we can construct infinitely many compatible runs with the rule $f(q_1, \ldots, q_m) \to q$ at root position, and hence $q \in Q^\infty_A$.

(⊇) We prove completeness by contradiction. Assume that there exists a state $q \in Q$ such that $q \in Q^\infty_A$ and $q \notin Q^\infty$. Since $q \in Q^\infty_A$, there exists an infinite compatible set of runs $S$ such that $r(\lambda) = q$, for each $r \in S$. We distinguish two cases.

First, if there exists an infinite subset of runs of $S$ not containing any state involved in a constraint of $A$, then $L(A_0, q)$ is infinite. Therefore, $q$ was added to $Q^\infty$ in step 2 of the algorithm, in contradiction with the assumption.

Otherwise, there exists an infinite compatible set $R \subseteq S$ such that every run in $R$ contains a state involved in some constraint of $A$. Note that, by Lemma 4 and the fact that $q \notin Q^\infty$ holds by the assumption, we can conclude that $\forall r \in S : (\exists p \in Pos(r) \setminus \{\lambda\} : (r(p) \in D \land r(p) \notin Q^\infty)) \lor (\forall p \in Pos(r) : r(p) \notin Q^\infty)$ holds. By definition of $R$, it follows that every run in $R$ contains at least one state $q' \in D$ that also satisfies $q' \notin Q^\infty$.

We now define $\tilde{R}$ as the set of subruns of runs in $R$ such that only the state reached at the root position does not belong to $Q^\infty$ and is involved in a constraint of $A$. More formally, $\tilde{R} = \{r|_p \mid r \in R \land p \in Pos(r) \land r(p) \notin Q^\infty \land r(p) \in D \land \forall p' \in Pos(r) : (p < p' \land r(p') \in D \Rightarrow r(p') \in Q^\infty)\}$. Note that $\tilde{R}$ is an infinite compatible set and, moreover, every strict subrun of each run $r \in \tilde{R}$ does not contain a state involved in a constraint, since otherwise, by Lemma 4, $r(\lambda) \in Q^\infty$.

Since $\tilde{R}$ is infinite and $\Delta$ is finite, there exists an infinite compatible set $R' \subseteq \tilde{R}$ such that every run in $R'$ has the same rule $f(q_1, \ldots, q_m) \to q'$ at root position. Finally, since $R'$ is infinite, there exists $j \in \{1, \ldots, m\}$ such that $L(A_0, q_j)$ is infinite and $q_j \in Q^\infty$. Hence, $q'$ was added to $Q^\infty$ in step 3 of the algorithm, a contradiction with the definition of $\tilde{R}$. □


5 Transformations on the Automaton

Taking advantage of the fact that the set $Q^\infty$ can be computed, we simplify our problem by transforming the initial TAG[$\not\approx_R$] and adopting a slightly different notion of run. The goal of the transformation, as shown in the following example, is to simplify runs of a TAG[$\not\approx_R$] by ignoring subruns reaching states in $Q^\infty$ since they are not relevant in our setting. Next, we further transform the obtained TAG[$\not\approx_R$] to distinguish states that can be reached with runs that do not involve constrained states.

Example 3 Let $A$ be the TAG[$\not\approx_R$] defined in Example 2, i.e. $A = \langle \{q_f, q_h, q\}, \{a : 0, h : 1, f : 2\}, \{q_f\}, (q_h \not\approx q_h), \{a \to q, h(q) \to q, h(q) \to q_h, f(q_h, q_h) \to q_f\} \rangle$.

Note that the constraint of $A$ concerns runs reaching $q_h$. Yet, as seen in Example 2, $q_h \in Q^\infty_A$ holds. Therefore, the constraint is not relevant in terms of emptiness decision since enough compatible runs reaching $q_h$ can always be found. Hence, we will represent all terms reaching $q_h$ with a new constant symbol $\star$. In this sense, if we replace the rule $(h(q) \to q_h)$ by $(\star \to q_h)$, then emptiness of the language is preserved under a relaxed notion of run. In this new notion, satisfiability of constraints is reinterpreted so that they are additionally satisfied when the symbol $\star$ appears in the terms associated with the involved subruns, i.e. a subrun with $\star$ always satisfies a disequality.

As a final remark, note that after representing all the terms reaching $q_h$ by the constant symbol $\star$, the language recognized is $\{f(\star, \star)\}$, which is finite although the original language was infinite. Hence, finiteness of the recognized language is not preserved by this transformation, but it is easy to see that any occurrence of the symbol $\star$ in a term of the language guarantees that the original language was infinite.

As seen in the previous example, we need to compare terms using the following notion, which depends on a special symbol of the signature denoted by $\star$.

Definition 3 Let $\Sigma$ be a signature and let $\star$ be a symbol in $\Sigma$. We define the relation $=_\star$ on $T(\Sigma)$ as $t_1 =_\star t_2$ if and only if $t_1 = t_2$ and $\star$ does not occur in $t_1$ nor in $t_2$. Note that $=_\star$ is a partial equivalence relation, i.e. it is symmetric and transitive, but not reflexive.

Next, we formally define the notion of run discussed in the previous example in terms of $=_\star$. The difference with the usual definition of run is that a term containing $\star$ always satisfies a disequality with any other term (even itself).

Definition 4 Let $A = \langle Q, \Sigma, F, D, \Delta \rangle$ be a TAG[$\not\approx_R$]. Given a symbol $\star \in \Sigma$, we define a $\star$-run $r$ of $A$ as a run of $ta(A)$ satisfying that, for every pair of different positions $p_1, p_2$ of $r$, if $(r(p_1) \not\approx r(p_2)) \in D$, then $term(r|_{p_1}) \neq_\star term(r|_{p_2})$. With $L_\star(A)$ we denote the set of terms $t \in T(\Sigma)$ such that there exists a $\star$-run of $A$ on $t$ reaching an accepting state.

Analogously to Definition 1, two $\star$-runs $r_1, r_2$ of $A$ are $\star$-compatible if for every pair of positions $p_1 \in Pos(r_1)$, $p_2 \in Pos(r_2)$, it holds that, if $(r_1(p_1) \not\approx r_2(p_2))$ is a constraint of $A$, then $term(r_1|_{p_1}) \neq_\star term(r_2|_{p_2})$.


We now formally define the transformation of the automaton discussed in Example 3. This transformation eases the presentation of our decision procedure for the emptiness and finiteness problems for TAG[$\not\approx_R$]'s by simplifying the initial automaton.

Definition 5 Let $A = \langle Q, \Sigma, F, D, \Delta \rangle$ be a TAG[$\not\approx_R$] and let $\star$ be a constant symbol not in $\Sigma$. We define the TAG[$\not\approx_R$] $A_\star$ as $\langle Q_\star, \Sigma_\star, F_\star, D_\star, \Delta_\star \rangle$, where

– $Q_\star = (Q \setminus Q^\infty_A) \cup \{q_\star\}$,
– $\Sigma_\star = \Sigma \cup \{\star : 0\}$,
– $F_\star = (F \cap Q_\star) \cup \{q_\star\}$ if $F \cap Q^\infty_A$ is not empty, and $F_\star = F \cap Q_\star$, otherwise,
– $D_\star = \{(q_1 \not\approx q_2) \in D \mid q_1, q_2 \in (Q \setminus Q^\infty_A)\}$,
– $\Delta_\star = \{\star \to q_\star\} \cup \Delta'$, where $\Delta'$ is the set of rules $f(q'_1, \ldots, q'_m) \to q$ such that $q \notin Q^\infty_A$ and there exist states $q_1, \ldots, q_m \in Q$ satisfying that $(f(q_1, \ldots, q_m) \to q) \in \Delta$ and, for every $i \in \{1, \ldots, m\}$, if $q_i \in Q^\infty_A$ then $q'_i = q_\star$ and, otherwise, $q'_i = q_i$.
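The construction of Definition 5 is mechanical once the set $Q^\infty_A$ is known, and can be sketched as follows. This is a minimal sketch under assumptions: rules are triples `(symbol, children, target)`, constraints are pairs of states, the set `q_inf` is supplied by the caller (for instance, computed with Algorithm 1), and the names `STAR`, `Q_STAR` and `star_transform` are hypothetical.

```python
STAR = "*"      # hypothetical name for the fresh constant symbol
Q_STAR = "q*"   # hypothetical name for the fresh state reached by STAR


def star_transform(states, signature, finals, constraints, rules, q_inf):
    """Build the components of A_star from those of A and the set Q^infinity_A."""
    keep = states - q_inf
    new_states = keep | {Q_STAR}
    new_signature = dict(signature)
    new_signature[STAR] = 0                      # STAR is a constant
    new_finals = finals & keep                   # F ∩ Q_star (Q_STAR is fresh)
    if finals & q_inf:
        new_finals = new_finals | {Q_STAR}
    new_constraints = {(a, b) for (a, b) in constraints
                       if a in keep and b in keep}
    new_rules = {(STAR, (), Q_STAR)}
    for f, kids, q in rules:
        if q not in q_inf:
            # Children in Q^infinity_A are all represented by Q_STAR.
            new_kids = tuple(Q_STAR if k in q_inf else k for k in kids)
            new_rules.add((f, new_kids, q))
    return new_states, new_signature, new_finals, new_constraints, new_rules
```

Rules whose target lies in `q_inf` are dropped altogether, which is why some of the remaining states may become useless after the transformation.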

Note that, given a TAG[$\not\approx_R$] $A$, since all states in $Q^\infty_A$ are represented by a single state $q_\star$ in $A_\star$, some of the states of $A$ may become useless after the transformation. For instance, states that can only appear in runs reaching a state in $Q^\infty_A$ do not appear in any accepting $\star$-run of $A_\star$. Furthermore, it is straightforward that the recognized language (under the notion of $\star$-runs) may change after this transformation is applied. However, these are not problems since the emptiness of the recognized language is preserved, as stated in Lemma 7. First, we prove an intermediate technical lemma.

Lemma 6 Let $A$ be a TAG[$\not\approx_R$]. Let $r$ be an accepting $\star$-run of $A_\star$ on a term with an occurrence of $\star$. Then, there exist infinitely many accepting runs of $A$.

Proof The case where $r = (\star \to q_\star)$ holds trivially. Otherwise, let $A$ be $\langle Q, \Sigma, F, D, \Delta \rangle$ more explicitly written and let $M : Pos(r) \to Q$ be a mapping satisfying:

– $M(p) = q$ if $r(p) = (l \to q) \neq (\star \to q_\star)$,
– $M(p) = q$ if $r(p) = (\star \to q_\star)$, where $q \in Q^\infty$, and
– For each $p \in Pos(r)$ with $r(p) \neq (\star \to q_\star)$, being $f \in \Sigma^{(m)}$ the symbol $term(r)(p)$, it holds that $(f(M(p.1), \ldots, M(p.m)) \to M(p)) \in \Delta$.

Note that such a mapping $M$ exists by Definition 5. Let $\{p_1, \ldots, p_n\}$ be the set of positions in $Pos(r)$ satisfying $r(p_i) = (\star \to q_\star)$ and, for each $i \in \{1, \ldots, n\}$, let $S_i$ be an infinite set of compatible runs reaching $M(p_i)$. Such infinite sets exist by definition of $M$ since $M(p_i) \in Q^\infty$. By Corollary 1, there exists $S \subseteq \bigcup_{1 \leq i \leq n} S_i$ such that $S$ is an infinite compatible set of runs of $A$ and, for every $i \in \{1, \ldots, n\}$, $S \cap S_i$ is infinite. This fact allows us to replace the $\star$-subruns of $r$ at positions $p_1, \ldots, p_n$ to obtain infinitely many accepting runs of $A$. □

Lemma 7 Let $A$ be a TAG[$\not\approx_R$]. $L(A)$ is empty if and only if $L_\star(A_\star)$ is empty.

Proof We prove each direction separately.

(⇒) Given an accepting run $r$ of $A$, it is easy to construct an accepting $\star$-run of $A_\star$. Since the precise construction is quite technical, we just describe it briefly. Intuitively, it consists in replacing the subruns of $r$ at minimum positions (with respect to $<$) that reach a state in $Q^\infty_A$ by the $\star$-run $(\star \to q_\star)$.

(⇐) Let $r$ be an accepting $\star$-run of $A_\star$ on a term $t$. The case where $t$ does not contain $\star$ holds trivially, since $r$ is then also an accepting run of $A$. Otherwise, the statement directly follows from Lemma 6. □

As commented above, as a result of applying the transformation, infiniteness ofthe recognized language may no longer hold. The following lemma states that thiswill not be a problem for our decision procedure.

Lemma 8 Let $A$ be a TAG[$\not\approx_R$]. $L(A)$ is infinite if and only if $L_\star(A_\star)$ is infinite or it contains a term with an occurrence of $\star$.

Proof We prove each direction separately.

(⇒) If there exist infinitely many accepting runs of $A$ not containing any state in $Q^\infty_A$, then $L_\star(A_\star)$ is infinite and the statement holds. Otherwise, let $r$ be an accepting run of $A$ containing some state in $Q^\infty_A$. It is easy to construct from $r$ an accepting $\star$-run $r'$ of $A_\star$, by replacing the subruns of $r$ at minimum positions (with respect to $<$) that reach a state in $Q^\infty_A$ by the $\star$-run $(\star \to q_\star)$. The $\star$-run $r'$ recognizes a term with an occurrence of $\star$, and we are done.

(⇐) If there exist infinitely many accepting $\star$-runs of $A_\star$ not containing the state $q_\star$, then $L(A)$ is infinite and the statement holds. Otherwise, in the case where there exists an accepting $\star$-run of $A_\star$ containing the state $q_\star$, the statement follows from Lemma 6. □

We introduce one last transformation on the automaton. Its goal is to split the states of the original automaton into two disjoint sets $\mathring{Q}^{c}$ and $\mathring{Q}^{\bar c}$ satisfying that every run reaching a state in $\mathring{Q}^{c}$ (resp. $\mathring{Q}^{\bar c}$) contains (resp. does not contain) a subrun reaching a state involved in some constraint. Note that a state $q$ of the original automaton may be reachable by both kinds of runs and, hence, it must be split into two different states $q^{c} \in \mathring{Q}^{c}$ and $q^{\bar c} \in \mathring{Q}^{\bar c}$. This transformation is useful in the decision process since it allows us to deal separately with the two kinds of runs.

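Definition 6 Let $A = \langle Q, \Sigma, F, D, \Delta \rangle$ be a TAG[$\not\approx_R$]. We define the TAG[$\not\approx_R$] $\mathring{A}$ as $\langle \mathring{Q}^{c} \cup \mathring{Q}^{\bar c}, \Sigma, \mathring{F}, \mathring{D}, \mathring{\Delta} \rangle$, where

– $\mathring{Q}^{c} = \{q^{c} \mid q \in Q\}$,
– $\mathring{Q}^{\bar c} = \{q^{\bar c} \mid q \in Q\}$,
– $\mathring{F} = \{q^{c} \mid q \in F\} \cup \{q^{\bar c} \mid q \in F\}$,
– $\mathring{D} = \{(q_1^{c} \not\approx q_2^{c}) \mid (q_1 \not\approx q_2) \in D\}$,
– $\mathring{\Delta}$ is the set of rules of the form $f(q_1^{x_1}, \ldots, q_m^{x_m}) \to q^{x}$, satisfying that there exists a rule $(f(q_1, \ldots, q_m) \to q) \in \Delta$, $x_1, \ldots, x_m, x \in \{c, \bar c\}$, and $x = c \Leftrightarrow (q \in D \lor \exists i \in \{1, \ldots, m\} : x_i = c)$.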
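The rule set of Definition 6 can be generated by enumerating all taggings of the children, as in the following sketch. The encoding is an assumption made for illustration: a tagged state is a pair `(state, tag)`, where the tag `"c"` stands for $c$ and `"n"` stands for $\bar c$, and `constrained` is the set of states involved in some constraint. The function name is hypothetical.

```python
from itertools import product


def split_rules(rules, constrained):
    """Tag every rule f(q1..qm) -> q with 'c'/'n' marks following Definition 6."""
    tagged = set()
    for f, kids, q in rules:
        for tags in product(("c", "n"), repeat=len(kids)):
            # The target is tagged 'c' iff q is constrained or some child is tagged 'c'.
            x = "c" if (q in constrained or "c" in tags) else "n"
            tagged.add((f, tuple(zip(kids, tags)), (q, x)))
    return tagged
```

Each original rule of arity $m$ yields $2^m$ tagged rules, so the construction is exponential only in the maximum arity of the signature.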


As in the case of Definition 5, useless states may appear after the construction of $\mathring{A}$. For instance, a state $q^{\bar c}$ such that $q$ is involved in a constraint of $A$ does not appear in the right-hand side of any rule of $\mathring{A}$. The fact that a run $r$ of $\mathring{A}$ reaches a state in $\mathring{Q}^{c}$ if and only if $r$ contains a subrun reaching a state involved in a constraint can be proven by induction on $height(r)$ and distinguishing cases according to the definition of $\mathring{\Delta}$. Given a TAG[$\not\approx_R$] $A$, we denote by $\mathring{A}_\star$ the TAG[$\not\approx_R$] obtained by applying the transformation of Definition 6 on $A_\star$. The fact that the language is preserved also follows trivially and is stated in the following lemma.

Lemma 9 Let $A$ be a TAG[$\not\approx_R$]. $L(A) = L(\mathring{A})$ and $L_\star(A_\star) = L_\star(\mathring{A}_\star)$ hold.

Note that no rule of $\mathring{A}_\star$ has $q^{c}_\star$ as right-hand side by definition of $A_\star$ and $\mathring{\Delta}_\star$, since $q_\star$ is never involved in a constraint and always occurs in the $\star$-subrun $(\star \to q_\star)$. Hence, we refer to $q^{\bar c}_\star$ simply as $q_\star$. By the definition of the transformations and the order in which they are applied on $A$, note that every $\star$-run of $\mathring{A}_\star$ with a rule of the form $f(\ldots, q_\star, \ldots) \to q^{x}$ at its root position, for $f \in \Sigma_\star$ and $x \in \{c, \bar c\}$, satisfies $q^{x} \in \mathring{Q}^{c}_\star$, i.e. $x = c$. If this was not the case, i.e. $q^{x} \in \mathring{Q}^{\bar c}_\star$, then it is trivial to see that $q \in Q^\infty_A$ follows from Lemmas 4 and 5, which leads to a contradiction with the form of the rule and the definition of $\mathring{\Delta}_\star$. As a final remark, note that any $q^{\bar c} \in \mathring{Q}^{\bar c}_\star$ recognizes a finite language over $T(\Sigma_\star)$. This property can be easily proven by contradiction, since if $q$ could be reached by an infinite number of runs not involving constrained states, again by Lemmas 4 and 5, $q \in Q^\infty_A$ would hold and hence, by Definition 5, $q^{\bar c} = q_\star$, which by construction of $\mathring{\Delta}_\star$ recognizes a finite language.

The following corollaries follow from Lemmas 7, 8, and 9. They state that emptiness of the recognized language is preserved by these transformations, and show how its finiteness is changed.

Corollary 2 Let $A$ be a TAG[$\not\approx_R$]. $L(A)$ is empty if and only if $L_\star(\mathring{A}_\star)$ is empty.

Corollary 3 Let $A$ be a TAG[$\not\approx_R$]. $L(A)$ is infinite if and only if $L_\star(\mathring{A}_\star)$ is infinite or it contains a term with an occurrence of $\star$.

In the rest of this section we present technical results that show that dealing with $\star$-runs of $\mathring{A}_\star$ is useful in our setting.

Lemma 10 Let $A$ be a TAG[$\not\approx_R$]. Let $r$ be a $\star$-run of $\mathring{A}_\star$. Let $p$ be a position and let $k_1$ be a natural number such that $r(p.k_1) = q_\star$. Then, there exists a natural number $k_2$ such that $r(p.k_2) \neq q_\star$.

Proof Assuming that $r(p) = (f(q_\star, \ldots, q_\star) \to q)$ leads to a contradiction with the fact that $r(p) \neq q_\star$ by Lemmas 4 and 5, and Definitions 5 and 6. □

The following lemma is crucial in our global approach. It gives an upper bound for the number of $\star$-runs containing constrained states that can be pairwise $\star$-compatible.

Lemma 11 Let $A = \langle Q, \Sigma, F, D, \Delta \rangle$ be a TAG[$\not\approx_R$]. Let $r_1, \ldots, r_n$ be $\star$-runs of $\mathring{A}_\star$ pairwise $\star$-compatible and such that, for each $i \in \{1, \ldots, n\}$, $r_i(\lambda) \in \mathring{Q}^{c}_\star$. Then, $n \leq |Q| \cdot |\Sigma|^{maxar(\Sigma)^{|Q|}}$.

Proof Let $p_1 \in Pos(r_1), \ldots, p_n \in Pos(r_n)$ be positions satisfying that $r_i(p_i.p) \in \mathring{Q}^{c}_\star \Leftrightarrow p = \lambda$. Let $r'_1, \ldots, r'_n$ be the $\star$-subruns $r_1|_{p_1}, \ldots, r_n|_{p_n}$, respectively. Note that $r'_1, \ldots, r'_n$ are pairwise $\star$-compatible and, moreover, states involved in a constraint only occur at the root position of the $r'_i$'s.

We argue by contradiction assuming that $n > |Q| \cdot |\Sigma|^{maxar(\Sigma)^{|Q|}}$. Let $r'_i$ be $(l_i \to q_i^{c})(r'_{i,1}, \ldots, r'_{i,m_i})$ more explicitly written, for $i \in \{1, \ldots, n\}$. Note that, for each $i \in \{1, \ldots, n\}$ and $j \in \{1, \ldots, m_i\}$, $q_\star$ does not occur in $r'_{i,j}$, since otherwise $q_i \in Q^\infty_A$ holds by Lemmas 4 and 5, and Definitions 5 and 6, which implies that $q_i^{c} = q_\star \notin \mathring{Q}^{c}_\star$, contradicting the assumptions on $r'_i$. Moreover, $height(r'_{i,j}) < |Q| - 1$, since otherwise $r'_{i,j}$ can be pumped, which implies that $q_\star$ occurs in $r'_{i,j}$. Note that the bound $|Q| - 1$ is enough because $q_i^{c}$ cannot appear in $r'_{i,j}$. Furthermore, note that there exist, at most, $|\Sigma|^{maxar(\Sigma)^{h+1}}$ different terms of height $h$. Hence, by the assumption that $n > |Q| \cdot |\Sigma|^{maxar(\Sigma)^{|Q|}}$ and the pigeonhole principle, it follows that there exist different $i, j \in \{1, \ldots, n\}$ such that $q_i^{c} = q_j^{c}$ and $term(r'_i) =_\star term(r'_j)$. This is in contradiction with the $\star$-compatibility of $r'_i$ and $r'_j$ since $(q_i^{c} \not\approx q_j^{c})$ is necessarily in $\mathring{D}_\star$. □
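To get a feeling for the size of the bound in Lemma 11, it can be evaluated directly. The helper below is a hypothetical illustration, assuming the signature is given as a mapping from symbols to arities.

```python
def compatible_runs_bound(num_states, signature):
    """Evaluate |Q| * |Sigma| ** (maxar(Sigma) ** |Q|) from Lemma 11."""
    max_arity = max(signature.values())
    return num_states * len(signature) ** (max_arity ** num_states)


# A small instance: three states over the signature {a:0, b:0, h:1, f:2}.
SMALL_SIGNATURE = {"a": 0, "b": 0, "h": 1, "f": 2}
SMALL_BOUND = compatible_runs_bound(3, SMALL_SIGNATURE)  # 3 * 4 ** (2 ** 3)
```

Even for three states and four symbols the bound is 196608, which reflects the doubly exponential growth of the expression in $|Q|$.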

The following corollary is not used in the rest of the paper, but follows directly from Lemma 11 and gives more intuition. It states that, for any $\star$-run $r$ of $\mathring{A}_\star$, there exists a bound for the number of occurrences of states of $\mathring{Q}^{c}_\star$ at parallel positions of $r$.

Corollary 4 Let $A = \langle Q, \Sigma, F, D, \Delta \rangle$ be a TAG[$\not\approx_R$]. Let $r$ be a $\star$-run of $\mathring{A}_\star$. Let $p_1, \ldots, p_n$ be pairwise parallel positions of $r$ such that $r(p_i) \in \mathring{Q}^{c}_\star$, for each $i \in \{1, \ldots, n\}$. Then, $n \leq |Q| \cdot |\Sigma|^{maxar(\Sigma)^{|Q|}}$.

6 Deciding Emptiness and Finiteness

As a consequence of Corollary 2, deciding emptiness of the language recognized by a given TAG[$\not\approx_R$] $A$ can be reduced to testing whether there exists an accepting $\star$-run of $\mathring{A}_\star$. And, as a consequence of Corollary 3, deciding finiteness of the language recognized by $A$ can be reduced to testing whether $L_\star(\mathring{A}_\star)$ is infinite or contains a term with an occurrence of the symbol $\star$. We present an algorithm that simulates the construction of accepting $\star$-runs and helps to reason about the existence of $\star$-runs satisfying the conditions mentioned above. Roughly speaking, our algorithm non-deterministically simulates the construction of accepting $\star$-runs in a top-down manner. More concretely, in an intermediate step of the algorithm, the top-most part of an accepting $\star$-run $r$ has been already (non-deterministically) constructed, and it remains to determine its $\star$-subruns $r_1, \ldots, r_n$ at certain parallel positions. Moreover, the $r_i$'s are required to reach some concrete states $q_1, \ldots, q_n$ and $term(r_1), \ldots, term(r_n)$ have to satisfy certain equality and disequality constraints with respect to $=_\star$. The states $q_1, \ldots, q_n$ are determined by the part of $r$ that has already been constructed. The (dis)equality constraints among $term(r_1), \ldots, term(r_n)$ are either determined by the set of constraints $\mathring{D}_\star$ and $q_1, \ldots, q_n$, or they are inherited from the part of $r$ that has already been constructed. The algorithm proceeds by guessing the rule at the root position of some of the $r_i$'s, hence extending the constructed part of $r$. The $r_i$'s whose root is determined at this step of the algorithm are the ones that are guessed to have maximal height among all the $r_i$'s. By always extending $r$ in this order, we partially construct the $\star$-subruns of $r$ that have the same height in the same step of the algorithm, which allows us to satisfy or propagate the (dis)equality constraints that have to be satisfied.

The algorithm is presented as an inference system that deals with pairs of the form $\langle M, S \rangle$, where $M$ and $S$ are partitions of labeled states of $\mathring{A}_\star$, i.e. $M$ and $S$ are sets of disjoint sets of pairs $\langle q, \ell \rangle$. To ease the presentation, we denote the labeled states $\langle q, \ell \rangle$ as $q^{\ell}$. Our labels are used simply as identifiers to distinguish repeated occurrences of the same state. We define the labels as sequences of natural numbers standing for the position in the constructed $\star$-run where the state occurs.

Let us specify the role of pairs $\langle M, S \rangle$ and how this data structure is helpful to formalize the behaviour of the algorithm as sketched above. The inference starts with a pair of the form $\langle \{\{q_f^{\lambda}\}\}, \emptyset \rangle$, where $q_f$ is guessed among the accepting states of $\mathring{A}_\star$, and next non-deterministically constructs a $\star$-run $r$ reaching $q_f$, if possible. This construction is done top-down, by guessing the rules of $\mathring{\Delta}_\star$ to be used. In an intermediate step, the pair $\langle M, S \rangle$ contains the states at the deepest positions of the partial $\star$-run constructed so far, i.e. for an element $q^{\ell}$ of $\langle M, S \rangle$ it holds that $r(\ell) = q$. The process guarantees some invariant properties on $\langle M, S \rangle$ to keep track of the constraints imposed by the automaton. For instance, consider two different elements $q_1^{\ell_1}, q_2^{\ell_2}$ of $\langle M, S \rangle$. If both occur in $M$, then $height(r|_{\ell_1}) = height(r|_{\ell_2})$ holds. If $q_1^{\ell_1}$ occurs in $M$ and $q_2^{\ell_2}$ occurs in $S$, then $height(r|_{\ell_1}) > height(r|_{\ell_2})$ holds. Moreover, $q_1^{\ell_1}, q_2^{\ell_2}$ belong to the same part in $M$ or $S$ if and only if $term(r|_{\ell_1}) =_\star term(r|_{\ell_2})$.

We start by giving a definition that relates pairs $\langle M, S \rangle$ with runs that satisfy the conditions imposed by the pair.

Definition 7 Let $A$ be a TAG[$\not\approx_R$]. Let $M, S$ be partitions of labeled states of $\mathring{A}_\star$ such that $\bigcup (M \cup S) = \{q_1^{\ell_1}, \ldots, q_n^{\ell_n}\}$ and $\bigcup M \cap \bigcup S = \emptyset$. Let $r_1, \ldots, r_n$ be $\star$-runs of $\mathring{A}_\star$. We say that $r_1, \ldots, r_n$ fit $\langle M, S \rangle$ if the following conditions hold:

(F1) $r_i(\lambda) = q_i$, for $i \in \{1, \ldots, n\}$,
(F2) $q_i \neq q_\star \Rightarrow (q_i^{\ell_i} \in \bigcup M \Leftrightarrow height(r_i) = \max(\{height(r_j) \mid j \in \{1, \ldots, n\}\}))$, for $i \in \{1, \ldots, n\}$,
(F3) $term(r_i) =_\star term(r_j)$ if and only if $q_i^{\ell_i} \sim_{M \cup S} q_j^{\ell_j}$, for each different $i, j \in \{1, \ldots, n\}$, and
(F4) $r_1, \ldots, r_n$ are pairwise $\star$-compatible.

Let us remark that the antecedent of condition (F2) implies that the height of $\star$-runs reaching $q_\star$ is not considered since they represent an infinite number of terms.
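Conditions (F1) to (F4) can be checked mechanically on concrete runs. The following sketch assumes a hypothetical encoding: a run is a nested triple `(state, symbol, children)`, a labeled state is a pair `(state, label)`, `M` and `S` are lists of sets of labeled states, constraints are pairs of states, and a constant has height 0. All names are illustrative and not taken from the paper.

```python
STAR = "*"  # hypothetical name of the special constant


def term(run):
    _state, sym, kids = run
    return (sym,) + tuple(term(k) for k in kids)


def height(run):
    return 0 if not run[2] else 1 + max(height(k) for k in run[2])


def has_star(t):
    return t[0] == STAR or any(has_star(c) for c in t[1:])


def eq_star(t1, t2):
    """The relation =_star: equal terms with no occurrence of STAR."""
    return t1 == t2 and not has_star(t1)


def subruns(run):
    yield run
    for k in run[2]:
        yield from subruns(k)


def compatible(r1, r2, constraints):
    """star-compatibility: constrained pairs of subruns carry star-different terms."""
    return all(not eq_star(term(s1), term(s2))
               for s1 in subruns(r1) for s2 in subruns(r2)
               if (s1[0], s2[0]) in constraints or (s2[0], s1[0]) in constraints)


def fits(runs, labeled, M, S, constraints, q_star="q*"):
    """Check (F1)-(F4) for runs[i] against the labeled state labeled[i]."""
    in_M = set().union(*M)
    part_of = {x: i for i, part in enumerate(list(M) + list(S)) for x in part}
    top = max(height(r) for r in runs)
    for run, lab in zip(runs, labeled):
        if run[0] != lab[0]:                                   # (F1)
            return False
        if lab[0] != q_star and (lab in in_M) != (height(run) == top):
            return False                                       # (F2)
    for i in range(len(runs)):
        for j in range(i + 1, len(runs)):
            same_part = part_of[labeled[i]] == part_of[labeled[j]]
            if eq_star(term(runs[i]), term(runs[j])) != same_part:
                return False                                   # (F3)
            if not compatible(runs[i], runs[j], constraints):
                return False                                   # (F4)
    return True
```

For instance, with the constraint `("q", "q")`, the runs `("qh", "h", (("qh", "h", (("q", "a", ()),)),))` and `("qh", "h", (("q", "b", ()),))` fit a pair whose `M` holds the first labeled state and whose `S` holds the second, because the first run is strictly higher and the two terms differ.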

Example 4 Let $A$ be a TAG[$\not\approx_R$] defined as $A = \langle \{q_f, q_h, q\}, \{a : 0, b : 0, h : 1, f : 2\}, \{q_f\}, (q \not\approx q), \{a \to q, b \to q, h(q) \to q_h, h(q_h) \to q_h, f(q_h, q_h) \to q_f\} \rangle$. Note that $Q^\infty_A = \emptyset$, and hence $L(A) = L_\star(\mathring{A}_\star) = \{f(h^n(a), h^m(b)) \mid n, m \geq 1\} \cup \{f(h^n(b), h^m(a)) \mid n, m \geq 1\}$.

In this example, any accepting $\star$-run of $\mathring{A}_\star$ fits $\langle \{\{q_f^{\lambda}\}\}, \emptyset \rangle$. In particular, the $\star$-run $r = q_f(q_h(q_h(q)), q_h(q))$ on the term $f(h(h(a)), h(b))$ fits $\langle \{\{q_f^{\lambda}\}\}, \emptyset \rangle$. Now consider the $\star$-runs $r|_1 = q_h(q_h(q))$ and $r|_2 = q_h(q)$ on terms $h(h(a))$ and $h(b)$, respectively. Note that $r|_1, r|_2$ fit $\langle \{\{q_h^{1}\}\}, \{\{q_h^{2}\}\} \rangle$ but not $\langle \{\{q_h^{1}\}, \{q_h^{2}\}\}, \emptyset \rangle$. Moreover, $r|_{1.1}$ and $r|_2$ do not fit $\langle \{\{q_h^{1}\}\}, \{\{q_h^{2}\}\} \rangle$ since they have the same height. Finally, note that there do not exist $\star$-runs fitting pairs of the form $\langle \{\{q^{\ell_1}, q^{\ell_2}\}\}, \emptyset \rangle$ since $\star$-runs reaching $q$ must recognize different terms, with respect to $=_\star$, to be $\star$-compatible due to the constraint $(q \not\approx q)$, yet they are forced to be equal since $q^{\ell_1}, q^{\ell_2}$ are in the same part.

We now define the clean operation on partitions of labeled states. The goals of the clean operation are (i) to erase the occurrences of the state $q_\star$ from the given partition and (ii) to collapse elements of the form $q^{\ell_1}, q^{\ell_2}$ occurring in the same part of the given partition to just one of them, when $q \in \mathring{Q}^{\bar c}_\star$. This technical operation allows us to bound $|\bigcup (M \cup S)|$ for the pairs $\langle M, S \rangle$ considered by the decision procedure and, hence, is key to guarantee its termination.

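Definition 8 Let $A$ be a TAG[$\not\approx_R$]. Let $T$ be a partition of labeled states of $\mathring{A}_\star = \langle \mathring{Q}^{c}_\star \cup \mathring{Q}^{\bar c}_\star, \Sigma_\star, \mathring{F}_\star, \mathring{D}_\star, \mathring{\Delta}_\star \rangle$. We define $clean(T)$ as $\{fold(P) \mid P \in T, P \text{ is not of the form } \{q_\star^{\ell}\}\}$, where $fold(P)$ is a maximal subset of $P$ satisfying that for each different $q^{\ell_1}, q^{\ell_2} \in fold(P)$, it holds that $q \in \mathring{Q}^{c}_\star$.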
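A possible reading of Definition 8 in code is given below. It is a sketch under assumptions: labeled states are pairs `(state, label)`, the predicate `is_constrained` tells whether a state belongs to $\mathring{Q}^{c}_\star$, and, since the definition allows any maximal subset, this version arbitrarily keeps the occurrence with the smallest label. The names `fold` and `clean` follow the paper; their signatures are hypothetical.

```python
def fold(part, is_constrained):
    """Keep every constrained labeled state, and one copy of each other state."""
    kept, seen = [], set()
    for state, label in sorted(part):
        if is_constrained(state) or state not in seen:
            kept.append((state, label))
            seen.add(state)
    return frozenset(kept)


def clean(partition, is_constrained, q_star="q*"):
    """Drop the singleton parts holding q_star and fold the remaining parts."""
    result = set()
    for part in partition:
        if len(part) == 1 and next(iter(part))[0] == q_star:
            continue
        result.add(fold(part, is_constrained))
    return result
```

Because unconstrained states are collapsed to one occurrence per part, the number of labeled states that survive `clean` stays bounded, which is the property the decision procedure relies on for termination.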

Example 5 Let $A$ be a TAG[$\not\approx_R$]. Let $T$ be a partition of labeled states of $\mathring{A}_\star = \langle \mathring{Q}^{c}_\star \cup \mathring{Q}^{\bar c}_\star, \Sigma_\star, \mathring{F}_\star, \mathring{D}_\star, \mathring{\Delta}_\star \rangle$ and let $P$ be a part in $T$.

We start considering $P = \{q^{\ell_1}, q^{\ell_2}, \bar{q}^{\ell_3}, \hat{q}^{\ell_4}\}$, for some different $q, \bar{q} \in \mathring{Q}^{\bar c}_\star$ and $\hat{q} \in \mathring{Q}^{c}_\star$. By the definition of fold, only one instance of $q$ and $\bar{q}$ may remain in $fold(P)$ and, hence, $fold(P)$ is either $\{q^{\ell_1}, \bar{q}^{\ell_3}, \hat{q}^{\ell_4}\}$ or $\{q^{\ell_2}, \bar{q}^{\ell_3}, \hat{q}^{\ell_4}\}$, since the element $\bar{q}^{\ell_3}$ is kept and the element $\hat{q}^{\ell_4}$ has a state in $\mathring{Q}^{c}_\star$.

We now consider some cases dealing with the state $q_\star$. First, if $P$ is of the form $\{q_\star^{\ell}\}$, then $fold(P)$ is never considered due to the definition of clean and, hence, $P \notin clean(T)$ holds. Now consider that $P$ is of the form $\{q_\star^{\ell_1}, q_\star^{\ell_2}\}$. In this case, $fold(P)$ is either $\{q_\star^{\ell_1}\}$ or $\{q_\star^{\ell_2}\}$. Moreover, note that $fold(P) \in clean(T)$ and, hence, $clean(T)$ contains a part of the form $\{q_\star^{\ell}\}$. Although one of the goals of the clean operation is precisely to erase the occurrences of the state $q_\star$, this is not a contradiction since the inference system will guarantee that $q_\star$ only appears in parts of the form $\{q_\star^{\ell}\}$ before applying the clean operation.

As a final remark, note that since $T$ is a partition, any two different parts $P_1, P_2 \in T$ such that they are not of the form $\{q_\star^{\ell}\}$, satisfy that $fold(P_1)$ and $fold(P_2)$ are disjoint and that $fold(P_1)$ and $fold(P_2)$ are parts of $clean(T)$.

The clean operation preserves the fitness property when every labeled state of the form $q_\star^{\ell}$ occurs in a part $\{q_\star^{\ell}\}$. The fact that there exist $\star$-runs fitting $\langle M, S \rangle$ trivially implies the existence of $\star$-runs fitting $\langle clean(M), clean(S) \rangle$. The other direction is stated in the following lemma. Let us remark that condition (b) is rather technical. It serves to guarantee preservation of occurrences of $q_\star$ in the $\star$-runs of the fitting, which is useful later to prove decidability of finiteness for TAG[$\not\approx_R$]'s.

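Lemma 12 Let $A$ be a TAG[$\not\approx_R$]. Let $M, S$ be partitions of labeled states of $\mathring{A}_\star$ such that $\bigcup (M \cup S) = \{q_1^{\ell_1}, \ldots, q_n^{\ell_n}\}$, $\bigcup M \cap \bigcup S = \emptyset$, and, for each $i \in \{1, \ldots, n\}$, $(q_i = q_\star) \Rightarrow (\{q_i^{\ell_i}\} \in M \cup S)$. Let $\bar{M}$ be $clean(M)$ and $\bar{S}$ be $clean(S)$, where $\bigcup (\bar{M} \cup \bar{S}) = \{\bar{q}_1^{\bar{\ell}_1}, \ldots, \bar{q}_{\bar n}^{\bar{\ell}_{\bar n}}\}$.

If there exist $\star$-runs $\bar{r}_1, \ldots, \bar{r}_{\bar n}$ of $\mathring{A}_\star$ fitting $\langle \bar{M}, \bar{S} \rangle$, then there exist $\star$-runs $r_1, \ldots, r_n$ of $\mathring{A}_\star$ fitting $\langle M, S \rangle$ and holding

(a) $\max(\{height(r_i) \mid i \in \{1, \ldots, n\}\}) = \max(\{height(\bar{r}_i) \mid i \in \{1, \ldots, \bar n\}\})$, and
(b) $(\exists i \in \{1, \ldots, n\}, p \in Pos(r_i) : r_i(p) = q_\star) \Leftrightarrow (\exists i \in \{1, \ldots, n\} : q_i = q_\star) \lor (\exists i \in \{1, \ldots, \bar n\}, p \in Pos(\bar{r}_i) : \bar{r}_i(p) = q_\star)$.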

Proof Let $\mathring{A}_\star$ be $\langle \mathring{Q}^{c}_\star \cup \mathring{Q}^{\bar c}_\star, \Sigma_\star, \mathring{F}_\star, \mathring{D}_\star, \mathring{\Delta}_\star \rangle$ more explicitly written. We construct the $\star$-runs $r_1, \ldots, r_n$ fitting $\langle M, S \rangle$ as follows, where we distinguish cases, for each $r_i$, depending on whether $q_i^{\ell_i}$ occurs in $\bigcup (\bar{M} \cup \bar{S})$.

(i) For each $i \in \{1, \ldots, n\}$ such that $q_i^{\ell_i} \in \bigcup (\bar{M} \cup \bar{S})$, we define $r_i$ to be $\bar{r}_j$, being $j$ the one satisfying $q_i^{\ell_i} = \bar{q}_j^{\bar{\ell}_j}$. Note that the set of such $r_i$'s coincides with $\{\bar{r}_1, \ldots, \bar{r}_{\bar n}\}$.

(ii) For each $i \in \{1, \ldots, n\}$ such that $q_i^{\ell_i} \notin \bigcup (\bar{M} \cup \bar{S})$, by definition of the clean operation, either (ii.1) $q_i = q_\star$ holds or (ii.2) $q_i \in \mathring{Q}^{\bar c}_\star$ holds. In the case (ii.1), i.e. when $q_i = q_\star$, note that the element $q_i^{\ell_i}$ appears in $M \cup S$ in a part of the form $\{q_i^{\ell_i}\}$, by definition of $M$ and $S$. In this case, we define $r_i$ to be the $\star$-run $(\star \to q_\star)$. In the case (ii.2), i.e. when $q_i \in \mathring{Q}^{\bar c}_\star$, there exists exactly one $j \in \{1, \ldots, n\}$ such that $q_j^{\ell_j} \in \bigcup (\bar{M} \cup \bar{S})$, $q_i^{\ell_i} \sim_{M \cup S} q_j^{\ell_j}$ and $q_i = q_j$, by definition of the clean operation. In this case, we define $r_i$ to be the $r_j$ defined in (i). In both cases, $r_i$ is $\star$-compatible with any other $\star$-run.

The fact that the $\star$-runs $r_1, \ldots, r_n$ fit $\langle M, S \rangle$ trivially follows from the fact that $\bar{r}_1, \ldots, \bar{r}_{\bar n}$ fit $\langle \bar{M}, \bar{S} \rangle$ and the conditions on the definitions done in (ii). Condition (a) trivially follows from the facts that $r_1, \ldots, r_n$ include all $\bar{r}_1, \ldots, \bar{r}_{\bar n}$, that the definitions of the $r_i$'s done in (i) and (ii.2) preserve the maximum height and that the definition done in (ii.1) has height 0, i.e. minimum height. Finally, condition (b) holds since $r_1, \ldots, r_n$ include all $\bar{r}_1, \ldots, \bar{r}_{\bar n}$, the construction done in (ii.1) introduces the $\star$-run $(\star \to q_\star)$ if and only if there exists $i \in \{1, \ldots, n\}$ such that $q_i = q_\star$, and (ii.2) only replicates $\star$-runs of $\bar{r}_1, \ldots, \bar{r}_{\bar n}$. □

Our inference system uses the rule $R$ in Definition 9 below to non-deterministically construct $\star$-runs. As mentioned at the beginning of this section, $\star$-subruns are constructed top-down, always defining first the ones that are guessed to be maximal in height. In our formalism, this corresponds to guessing a rule reaching each of the labeled states in $M$, replacing such states by the states occurring in the left-hand side of the guessed rules, and leaving the labeled states in $S$ unchanged (condition (a) in the application of $R$). The resulting set of labeled states is then non-deterministically partitioned (condition (b)) satisfying the following:

– Two labeled states $q_1^{\ell_1}, q_2^{\ell_2}$ in the same part stay in the same part when they are in $S$ (condition (c)). Otherwise, if they are in the same part of $M$, rules having $q_1$ and $q_2$ as right-hand sides and with the same function symbol are guessed. The corresponding states in the left-hand sides of the guessed rules are placed in the same parts (condition (d)). Eventually, in both cases two $\star$-compatible $\star$-runs reaching states $q_1$ and $q_2$ and recognizing the same term, with respect to $=_\star$, will be generated.

– Analogously, for two labeled states $q_1^{\ell_1}, q_2^{\ell_2}$ in different parts, two $\star$-compatible $\star$-runs reaching states $q_1$ and $q_2$ and recognizing different terms, with respect to $=_\star$, will be eventually generated (conditions (c) and (e)).

– Labeled states $q_1^{\ell_1}, q_2^{\ell_2}$ are placed in different parts whenever $(q_1 \not\approx q_2) \in \mathring{D}_\star$ holds, in order to guarantee that $\mathring{D}_\star$ is satisfied (condition (f)).

– Since each labeled state $q^{\ell}$ in $M$ must be reached by a term of maximal height, at least one state in the left-hand side of the rule guessed for $q^{\ell}$ must also be reached by a term of maximal height (condition (g)).

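Definition 9 Let $A$ be a TAG[$\not\approx_R$]. Let $M, S$ be partitions of labeled states of $\mathring{A}_\star = \langle \mathring{Q}^{c}_\star \cup \mathring{Q}^{\bar c}_\star, \Sigma_\star, \mathring{F}_\star, \mathring{D}_\star, \mathring{\Delta}_\star \rangle$. We define the non-deterministic derivation rule $R$ as follows:

$$R : \frac{\langle M, S \rangle}{\langle clean(M'), clean(S') \rangle}$$

where, for $\bigcup M = \{q_1^{\ell_1}, \ldots, q_n^{\ell_n}\}$ and for each $i \in \{1, \ldots, n\}$, a rule $f_i(q_{i,1}, \ldots, q_{i,m_i}) \to q_i$ in $\mathring{\Delta}_\star$ is guessed, and $M', S'$ are guessed satisfying

(a) $\bigcup (M' \cup S') = \{q_{i,j}^{\ell_i.j} \mid i \in \{1, \ldots, n\}, j \in \{1, \ldots, m_i\}\} \cup \bigcup S$,
(b) $M', S'$ are partitions and $\bigcup M' \cap \bigcup S' = \emptyset$,
(c) For each different $q^{\ell}, \bar{q}^{\bar\ell} \in \bigcup S$, it holds that $q^{\ell} \sim_{M' \cup S'} \bar{q}^{\bar\ell}$ if and only if $q^{\ell} \sim_{S} \bar{q}^{\bar\ell}$,
(d) For each $q_i^{\ell_i} \sim_{M} q_j^{\ell_j}$ it holds that $f_i = f_j$ and, for each $k \in \{1, \ldots, m_i\}$, $q_{i,k}^{\ell_i.k} \sim_{M' \cup S'} q_{j,k}^{\ell_j.k}$,
(e) For each $q_i^{\ell_i} \not\sim_{M} q_j^{\ell_j}$ it holds that either $f_i \neq f_j$ or $\exists k \in \{1, \ldots, m_i\} : q_{i,k}^{\ell_i.k} \not\sim_{M' \cup S'} q_{j,k}^{\ell_j.k}$,
(f) For each $q_{i,k}^{\ell_i.k}$ and $q^{\ell} \in \bigcup (M' \cup S')$, with $\ell_i.k \neq \ell$, if $(q_{i,k} \not\approx q) \in \mathring{D}_\star$ or $q_{i,k} = q_\star$, it holds that $q_{i,k}^{\ell_i.k} \not\sim_{M' \cup S'} q^{\ell}$, and
(g) $\bigcup (M' \cup S') = \emptyset$ or, for each $i \in \{1, \ldots, n\}$, $\exists k \in \{1, \ldots, m_i\} : q_{i,k} \neq q_\star \land q_{i,k}^{\ell_i.k} \in \bigcup M'$.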

In the rest of the paper, $\to_R$ denotes the derivation relation between pairs of partitions of labeled states. As usual, $\to_R^{+}$ denotes its transitive closure and $\to_R^{*}$ its reflexive-transitive closure. Moreover, by abuse of notation, $\langle M, S \rangle \to_R^{+} \langle M', S' \rangle$ and $\langle M, S \rangle \to_R^{*} \langle M', S' \rangle$ also denote concrete derivations from $\langle M, S \rangle$ to $\langle M', S' \rangle$ using $R$, of lengths $\geq 1$ and $\geq 0$, respectively. The length of a derivation $\langle M, S \rangle \to_R^{*} \langle M', S' \rangle$ is its number of $\to_R$-steps, and is denoted as $|\langle M, S \rangle \to_R^{*} \langle M', S' \rangle|$. Moreover, in order to make explicit the guesses $M', S'$ done by $R$, we use the notation $\langle M, S \rangle \to_R \langle clean(M'), clean(S') \rangle$ or $\langle M, S \rangle \to_R^{+} \langle clean(M'), clean(S') \rangle$.

In the following example we present a successful derivation using $R$.

Example 6 Let $\Sigma = \{a : 0, b : 0, h : 1, f : 5\}$ and consider the language of the terms over $\Sigma$ of the form

$$f(h^{n_1}(a),\ h^{n_2}(a),\ h^{n_3}(\alpha_1),\ h^{n_4}(\alpha_2),\ h^{n_5}(\alpha_3))$$

where the $n_i$'s are natural numbers including 0, $n_1 \neq n_2$, the $\alpha_j$'s are either the symbol $a$ or $b$, $\alpha_1 \neq \alpha_2$, and $\alpha_2 \neq \alpha_3$. Note that the last conditions force that either $\alpha_1 = a, \alpha_2 = b, \alpha_3 = a$ or $\alpha_1 = b, \alpha_2 = a, \alpha_3 = b$.

This language is recognized by the TAG[$\not\approx_R$] $A = \langle Q, \Sigma, F, D, \Delta \rangle$, where

– $Q = \{q_f, q_1, q_2, q_3, q_4, q_5\}$,
– $F = \{q_f\}$,
– $D = \{(q_i \not\approx q_i) \mid i \in \{2, \ldots, 5\}\} \cup \{(q_3 \not\approx q_4), (q_4 \not\approx q_5)\}$,
– $\Delta = \{a \to q_i \mid i \in \{1, \ldots, 5\}\} \cup \{b \to q_i, h(q_i) \to q_i \mid i \in \{3, \ldots, 5\}\} \cup \{h(q_1) \to q_1, h(q_1) \to q_2, f(q_2, q_2, q_3, q_4, q_5) \to q_f\}$.

Note that the accepting runs of $A$ are of the form

$$q_f(q_2(q_1^{n_1}),\ q_2(q_1^{n_2}),\ q_3^{n_3+1},\ q_4^{n_4+1},\ q_5^{n_5+1})$$

where $q^{k}$ stands for a chain of $k$ nested occurrences of the state $q$. Since $Q^\infty_A = \{q_1, q_2\}$, the accepting $\star$-runs of $\mathring{A}_\star$ are of the form

$$q_f(q_\star,\ q_\star,\ q_3^{n_3+1},\ q_4^{n_4+1},\ q_5^{n_5+1})$$

where $q_f, q_3, q_4, q_5$ are states in $\mathring{Q}^{c}_\star$, for which the label $c$ has been omitted to ease the presentation.

Let us consider the following accepting $\star$-run $r$, written in the most explicit form.

$$(f(q_\star, q_\star, q_3, q_4, q_5) \to q_f)\big((\star \to q_\star),\ (\star \to q_\star),\ (h(q_3) \to q_3)((h(q_3) \to q_3)(a \to q_3)),\ (h(q_4) \to q_4)(b \to q_4),\ (h(q_5) \to q_5)((h(q_5) \to q_5)(a \to q_5))\big)$$

The following derivation with $R$ (Definition 9) constructs the previous $\star$-run $r$.

$$\langle \{\{q_f^{\lambda}\}\}, \emptyset \rangle \to_R \langle \{\{q_3^{3}, q_5^{5}\}\}, \{\{q_4^{4}\}\} \rangle \to_R \langle \{\{q_3^{3.1}, q_5^{5.1}\}, \{q_4^{4}\}\}, \emptyset \rangle \to_R \langle \{\{q_3^{3.1.1}, q_5^{5.1.1}\}, \{q_4^{4.1}\}\}, \emptyset \rangle \to_R \langle \emptyset, \emptyset \rangle$$

The derivation starts from the accepting state $q_f$. In the first step, the rule $f(q_\star, q_\star, q_3, q_4, q_5) \to q_f$ is guessed and the elements of the form $q_\star^{\ell}$ are removed by the clean operation. Moreover, note that, due to the constraints $(q_3 \not\approx q_4)$ and $(q_4 \not\approx q_5)$, $q_3^{3}$ and $q_5^{5}$ have to be placed in a different part than $q_4^{4}$ (see condition (f) in the application of $R$). In this derivation, the terms that correspond to $q_3^{3}$ and $q_5^{5}$ have been guessed to be equal with respect to $=_\star$. For this reason they are placed in the same part. Moreover, the terms that correspond to $q_3^{3}$ and $q_5^{5}$ have been guessed to be higher than the term that corresponds to $q_4^{4}$. Checking that the rest of the derivation steps are correct is left to the reader. As a final remark, note that the $\star$-subrun $r|_{\lambda}$ fits the starting pair of the derivation, $r|_3$, $r|_5$, and $r|_4$ fit the second pair, $r|_{3.1}$, $r|_{5.1}$, and $r|_4$ fit the third pair, and $r|_{3.1.1}$, $r|_{5.1.1}$, and $r|_{4.1}$ fit the fourth pair of the derivation.

The following lemma and corollary state the correctness of $R$, i.e. that a derivation of the form $\langle \{\{q_f^{\lambda}\}\}, \emptyset \rangle \to_R^{*} \langle \emptyset, \emptyset \rangle$, where $q_f$ is an accepting state, corresponds to the existence of an accepting $\star$-run. Properties (C1) and (C2) in the lemma relate the form of the derivation to the form of the $\star$-run (its height and its occurrences of $q_\star$). This will be useful when deciding finiteness.

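Lemma 13 Let $A$ be a TAG[$\not\approx_R$] and let $\mathring{A}_\star$ be $\langle \mathring{Q}^{c}_\star \cup \mathring{Q}^{\bar{c}}_\star, \Sigma_\star, \mathring{F}_\star, \mathring{D}_\star, \mathring{\Delta}_\star \rangle$. Let $M, S$ be partitions of labeled states such that $\langle \{\{q_f^{\lambda}\}\}, \emptyset \rangle \to_R^{*} \langle M, S \rangle$, where $q_f \in \mathring{F}_\star$ and $\bigcup (M \cup S) = \{q_1^{\ell_1}, \dots, q_n^{\ell_n}\}$.

Then, there exists a derivation $d$ of the form $\langle M, S \rangle \to_R^{*} \langle \emptyset, \emptyset \rangle$ if and only if there exist $\star$-runs $r_1, \dots, r_n$ of $\mathring{A}_\star$ fitting $\langle M, S \rangle$. Moreover, $d$ and the corresponding $r_1, \dots, r_n$ are such that:

(C1) $|d| = \max(\{1 + \mathrm{height}(r_i) \mid i \in \{1, \dots, n\}\})$, and
(C2) $(\exists i \in \{1, \dots, n\},\ p \in \mathrm{Pos}(r_i) : r_i(p) = q_\star)$ if and only if $d$ can be written in the form $\langle M, S \rangle \to_R^{+} \langle \mathrm{clean}(M'), \mathrm{clean}(S') \rangle \to_R^{*} \langle \emptyset, \emptyset \rangle$ for some $M', S'$ such that a part of the form $\{q_\star^{\ell}\}$ occurs in $M' \cup S'$ or $M \cup S$.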

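Proof Note that, since $\langle M, S \rangle$ is derived from $\langle \{\{q_f^{\lambda}\}\}, \emptyset \rangle$ using $R$, it holds that either $\langle M, S \rangle = \langle \{\{q_f^{\lambda}\}\}, \emptyset \rangle$, or $\langle M, S \rangle = \langle \mathrm{clean}(M'), \mathrm{clean}(S') \rangle$, for some $M', S'$ satisfying conditions (a)-(g) of Definition 9 with respect to some $M'', S''$ such that $\langle \{\{q_f^{\lambda}\}\}, \emptyset \rangle \to_R^{*} \langle M'', S'' \rangle \to_R \langle M, S \rangle$. It follows that $M, S$ are partitions, $\bigcup M \cap \bigcup S = \emptyset$, and, for each two different $q_i^{\ell_i}, q_j^{\ell_j}$, if $(q_i \not\approx q_j) \in \mathring{D}_\star$, then $q_i^{\ell_i} \not\sim_{M \cup S} q_j^{\ell_j}$. These properties trivially hold because of conditions (b) and (f) in the application of $R$ and the fact that the initial $\langle \{\{q_f^{\lambda}\}\}, \emptyset \rangle$ satisfies them. Moreover, since the clean operation removes parts of the form $\{q_\star^{\ell}\}$, a part of the form $\{q_\star^{\ell}\}$ in $M \cup S$ necessarily implies that $q_\star^{\ell} = q_f^{\lambda}$, $\langle M, S \rangle = \langle \{\{q_\star^{\ell}\}\}, \emptyset \rangle$, and $q_\star \in \mathring{F}_\star$.

We prove each direction separately.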

(⇒) We use induction on $|d|$. For the base case, i.e. when the derivation has length 0, $M = S = \emptyset$ and the statement trivially holds (in particular, (C1) holds because the maximum among an empty set is 0, by convention). For the inductive case, we write $d$ more explicitly as $\langle M, S \rangle \to_R \langle \mathrm{clean}(\bar{M}), \mathrm{clean}(\bar{S}) \rangle \to_R^{*} \langle \emptyset, \emptyset \rangle$.

In order to construct the $\star$-runs $r_1, \dots, r_n$ fitting $\langle M, S \rangle$ of the statement, we first construct $\star$-runs fitting $\langle \bar{M}, \bar{S} \rangle$. By induction hypothesis, there exist $\star$-runs $\hat{r}_1, \dots, \hat{r}_{\hat{n}}$ fitting $\langle \mathrm{clean}(\bar{M}), \mathrm{clean}(\bar{S}) \rangle$ and satisfying (C1) and (C2) for the subderivation $\langle \mathrm{clean}(\bar{M}), \mathrm{clean}(\bar{S}) \rangle \to_R^{*} \langle \emptyset, \emptyset \rangle$ of length $|d| - 1$. For $\bigcup (\bar{M} \cup \bar{S}) = \{\bar{q}_1^{\bar{\ell}_1}, \dots, \bar{q}_{\bar{n}}^{\bar{\ell}_{\bar{n}}}\}$, note that if $\bar{q}_i = q_\star$, for some $i \in \{1, \dots, \bar{n}\}$, then the element $\bar{q}_i^{\bar{\ell}_i}$ appears in $\bar{M} \cup \bar{S}$ in a part of the form $\{\bar{q}_i^{\bar{\ell}_i}\}$, by condition (f) in the application of $R$. Hence, we can apply Lemma 12 and conclude that there exist $\star$-runs $\bar{r}_1, \dots, \bar{r}_{\bar{n}}$ fitting $\langle \bar{M}, \bar{S} \rangle$ and satisfying that $\max(\{1 + \mathrm{height}(\bar{r}_i) \mid i \in \{1, \dots, \bar{n}\}\}) = |d| - 1$ and that $q_\star$ occurs in some $\bar{r}_i$ if it occurs in some $\hat{r}_j$ or a part of the form $\{q_\star^{\ell}\}$ is in $\bar{M} \cup \bar{S}$, for $i \in \{1, \dots, \bar{n}\}$ and $j \in \{1, \dots, \hat{n}\}$.

We now construct $r_1, \dots, r_n$ from $\bar{r}_1, \dots, \bar{r}_{\bar{n}}$ and the guesses done for the application of $R$. For each $i \in \{1, \dots, n\}$, if $q_i^{\ell_i} \in \bigcup S$ then, by condition (a) of $R$, $q_i^{\ell_i} \in \bigcup (\bar{M} \cup \bar{S})$, say $q_i^{\ell_i}$ is $\bar{q}_j^{\bar{\ell}_j}$. In this case we define $r_i$ as $\bar{r}_j$. Otherwise, $q_i^{\ell_i} \in \bigcup M$. In this case, a rule $f_i(q_{i,1}, \dots, q_{i,m_i}) \to q_i$ in $\mathring{\Delta}_\star$ is guessed for the application of $R$. Again by condition (a), $q_{i,1}^{\ell_i.1}, \dots, q_{i,m_i}^{\ell_i.m_i} \in \bigcup (\bar{M} \cup \bar{S})$, say they are $\bar{q}_{j_1}^{\bar{\ell}_{j_1}}, \dots, \bar{q}_{j_{m_i}}^{\bar{\ell}_{j_{m_i}}}$. In this case we define $r_i$ as $(f_i(q_{i,1}, \dots, q_{i,m_i}) \to q_i)(\bar{r}_{j_1}, \dots, \bar{r}_{j_{m_i}})$.

It remains to prove that the defined $r_1, \dots, r_n$ fit $\langle M, S \rangle$ and satisfy conditions (C1) and (C2). We prove separately each condition of fitness (Definition 7); (C1) is proven together with case (F2), and (C2) is trivially satisfied by construction.

– Condition (F1) is satisfied by construction.

– We prove that (F2) is satisfied distinguishing cases depending on whether $\mathrm{clean}(\bar{M})$ is empty or not.
We first assume that $\mathrm{clean}(\bar{M}) = \emptyset$. Note that $\mathrm{clean}(\bar{S}) = \emptyset$ follows from the fact that $\hat{r}_1, \dots, \hat{r}_{\hat{n}}$ fit $\langle \mathrm{clean}(\bar{M}), \mathrm{clean}(\bar{S}) \rangle$ by induction hypothesis and hence satisfy condition (F2). It follows, by condition (a) in the application of $R$, that $S = \emptyset$ also holds. Since $\mathrm{clean}(\bar{M}) = \emptyset$ holds, the rules guessed for the application of $R$ are either of the form $f(q_\star, \dots, q_\star) \to q$ or $a \to q$, where $a$ is a constant symbol. The former case is not possible by Lemma 10 and, hence, $\mathrm{height}(r_1) = \dots = \mathrm{height}(r_n) = 0$ holds, and thus, since $S = \emptyset$, condition (F2) holds. Finally, note that $|d| = 1 = \max(\{1 + \mathrm{height}(r_i) \mid i \in \{1, \dots, n\}\})$, satisfying (C1) in this case.
Now assume that $\mathrm{clean}(\bar{M}) \neq \emptyset$. Recall that $|d| - 1 = \max(\{1 + \mathrm{height}(\bar{r}_j) \mid j \in \{1, \dots, \bar{n}\}\})$. Then, by construction, condition (g) in the application of $R$, and the fact that $\bar{r}_1, \dots, \bar{r}_{\bar{n}}$ fit $\langle \bar{M}, \bar{S} \rangle$, the following holds for each $q_i^{\ell_i} \in \bigcup (M \cup S)$. If $q_i = q_\star$, then (F2) holds trivially. Otherwise, if $q_i^{\ell_i} \in \bigcup M$, then it holds that $1 + \mathrm{height}(r_i) = (|d| - 1) + 1 = |d|$, and, if $q_i^{\ell_i} \in \bigcup S$, then, by condition (a), $q_i^{\ell_i} \in \bigcup (\bar{M} \cup \bar{S})$, and hence it holds that $1 + \mathrm{height}(r_i) \leq |d| - 1$. Hence, condition (F2) is satisfied also in this case. Moreover, since $\bigcup M$ is not empty, $|d| = \max(\{1 + \mathrm{height}(r_i) \mid i \in \{1, \dots, n\}\})$ holds, satisfying (C1) also in this case.

– To see that (F3) is satisfied consider two different elements $q_i^{\ell_i}, q_j^{\ell_j} \in \bigcup (M \cup S)$. First, assume that $q_i^{\ell_i}, q_j^{\ell_j} \in \bigcup M$. In this case, $\mathrm{term}(r_i) =_\star \mathrm{term}(r_j)$ if and only if $q_i^{\ell_i} \sim_M q_j^{\ell_j}$ holds by the fact that $\bar{r}_1, \dots, \bar{r}_{\bar{n}}$ fit $\langle \bar{M}, \bar{S} \rangle$, and conditions (d) and (e) in the application of $R$. Now assume that $q_i^{\ell_i}, q_j^{\ell_j} \in \bigcup S$. In this case (F3) holds by the fact that $\bar{r}_1, \dots, \bar{r}_{\bar{n}}$ fit $\langle \bar{M}, \bar{S} \rangle$ and condition (c) in the application of $R$. Finally, assume that $q_i^{\ell_i} \in \bigcup M$ and $q_j^{\ell_j} \in \bigcup S$. In this case, $\mathrm{term}(r_i) \neq_\star \mathrm{term}(r_j)$ follows from the fact that $r_1, \dots, r_n$ satisfy (F2).

– In order to see that (F4) is satisfied consider two different elements $q_i^{\ell_i}, q_j^{\ell_j} \in \bigcup (M \cup S)$. First, assume that $q_i^{\ell_i}, q_j^{\ell_j} \in \bigcup M$. Note that any two strict $\star$-subruns of $r_i$ and $r_j$ are $\star$-compatible since $\bar{r}_1, \dots, \bar{r}_{\bar{n}}$ fit $\langle \bar{M}, \bar{S} \rangle$. Hence, the case where $(q_i \not\approx q_j) \notin \mathring{D}_\star$ directly holds. Otherwise, if $(q_i \not\approx q_j) \in \mathring{D}_\star$, then $r_i$ and $r_j$ are $\star$-compatible because $\mathrm{term}(r_i) \neq_\star \mathrm{term}(r_j)$, since (F3) is satisfied and $q_i^{\ell_i} \not\sim_M q_j^{\ell_j}$ holds by condition (f) in the application of $R$ in the derivation $\langle \{\{q_f^{\lambda}\}\}, \emptyset \rangle \to_R^{*} \langle M, S \rangle$, and the fact that $\bar{r}_1, \dots, \bar{r}_{\bar{n}}$ fit $\langle \bar{M}, \bar{S} \rangle$. Now assume that $q_i^{\ell_i} \in \bigcup S$. Then, $r_i$ is $\star$-compatible with any other $r_j$ by the fact that $\bar{r}_1, \dots, \bar{r}_{\bar{n}}$ fit $\langle \bar{M}, \bar{S} \rangle$ and $r_1, \dots, r_n$ satisfy (F2).

(⇐) We assume that there exist $\star$-runs $r_1, \dots, r_n$ fitting $\langle M, S \rangle$ and hence satisfying conditions (F1)–(F4) in Definition 7. For proving $\langle M, S \rangle \to_R^{*} \langle \emptyset, \emptyset \rangle$ and conditions (C1) and (C2), we use induction on the value $h = \max(\{\mathrm{height}(r_j) \mid j \in \{1, \dots, n\}\})$. For the base case, assume that $h = 0$. If $n = 0$ the statement trivially holds because, in this case, $M = S = \emptyset$ and $\max(\{1 + \mathrm{height}(r_i) \mid i \in \{1, \dots, n\}\})$ is 0 because, by convention, the maximum among an empty set is 0. Otherwise, it holds that each $\star$-run $r_i$ is of the form $(f_i \to q_i)$, for $i \in \{1, \dots, n\}$, and that, by condition (F2) of fitting, $S = \emptyset$. Consider the case in which the rules guessed for the application of $R$ are precisely $f_i \to q_i$, for $i \in \{1, \dots, n\}$. Note that conditions (d) and (e) in the application of $R$ hold since $r_1, \dots, r_n$ fit $\langle M, S \rangle$, and the rest of the conditions are trivially satisfied. Hence, by defining $d$ as $\langle M, S \rangle \to_R \langle \emptyset, \emptyset \rangle$, (C1) is satisfied since $|d| = 1 = \max(\{1 + \mathrm{height}(r_i) \mid i \in \{1, \dots, n\}\})$ holds, and (C2) trivially holds.

For the inductive case, i.e. when $h > 0$, we construct $\bar{M}, \bar{S}$ such that $\langle M, S \rangle \to_R \langle \mathrm{clean}(\bar{M}), \mathrm{clean}(\bar{S}) \rangle$ and show that there exist $\star$-runs fitting $\langle \mathrm{clean}(\bar{M}), \mathrm{clean}(\bar{S}) \rangle$ with height strictly smaller than $h$. Consider that the rules guessed for the application of $R$ are $r_i(\lambda) = (f_i(q_{i,1}, \dots, q_{i,m_i}) \to q_i)$, for each $i \in \{1, \dots, n\}$ such that $q_i^{\ell_i} \in \bigcup M$. Next we define $\bar{M}$ and $\bar{S}$. To simplify definitions and arguments, by abuse of notation, we use $q_{i,\lambda}^{\ell_i.\lambda}$ to denote the element $q_i^{\ell_i} \in \bigcup S$. Taking into account this new notation, assume that $\bar{M}$ and $\bar{S}$ are guessed satisfying that

(i) $\bigcup \bar{M} = \{q_{i,j}^{\ell_i.j} \mid i \in \{1, \dots, n\},\ j \in \mathrm{Pos}(r_i),\ |j| \leq 1,\ q_{i,j} = r_i(j),\ \mathrm{height}(r_i|_j) = h - 1\}$, and

(ii) $\bigcup \bar{S} = \{q_{i,j}^{\ell_i.j} \mid i \in \{1, \dots, n\},\ j \in \mathrm{Pos}(r_i),\ |j| \leq 1,\ q_{i,j} = r_i(j),\ \mathrm{height}(r_i|_j) < h - 1,\ (\mathrm{height}(r_i) < h \Rightarrow j = \lambda)\}$, and

(iii) for each two different $q_{i_1,j_1}^{\ell_{i_1}.j_1}, q_{i_2,j_2}^{\ell_{i_2}.j_2} \in \bigcup (\bar{M} \cup \bar{S})$, $q_{i_1,j_1}^{\ell_{i_1}.j_1} \sim_{\bar{M} \cup \bar{S}} q_{i_2,j_2}^{\ell_{i_2}.j_2}$ if and only if $\mathrm{term}(r_{i_1}|_{j_1}) =_\star \mathrm{term}(r_{i_2}|_{j_2})$.

We now prove that $\langle \mathrm{clean}(\bar{M}), \mathrm{clean}(\bar{S}) \rangle$ can be derived from $\langle M, S \rangle$ with $R$ using the considered guesses. Conditions (a) and (b) are trivially satisfied. Condition (c) follows from the fact that $r_1, \dots, r_n$ fit $\langle M, S \rangle$ and condition (iii) in the definition of $\bar{M}$ and $\bar{S}$. Conditions (d) and (e) follow from the fact that $r_1, \dots, r_n$ fit $\langle M, S \rangle$, the selections of the rules, and condition (iii). In order to see that condition (f) holds, first note that, for each two different $q_{i_1,j_1}^{\ell_{i_1}.j_1}, q_{i_2,j_2}^{\ell_{i_2}.j_2} \in \bigcup (\bar{M} \cup \bar{S})$, if $(q_{i_1,j_1} \not\approx q_{i_2,j_2}) \in \mathring{D}_\star$, then necessarily $\mathrm{term}(r_{i_1}|_{j_1}) \neq_\star \mathrm{term}(r_{i_2}|_{j_2})$ because $r_{i_1}, r_{i_2}$ are $\star$-compatible since $r_1, \dots, r_n$ fit $\langle M, S \rangle$. Thus, $q_{i_1,j_1}^{\ell_{i_1}.j_1} \not\sim_{\bar{M} \cup \bar{S}} q_{i_2,j_2}^{\ell_{i_2}.j_2}$ follows from condition (iii). The other case of condition (f), i.e. when $q_{i_1,j_1} = q_\star$, also follows from condition (iii). Finally, condition (g) follows from Lemma 10 and condition (i). Altogether, this implies that $\langle M, S \rangle \to_R \langle \mathrm{clean}(\bar{M}), \mathrm{clean}(\bar{S}) \rangle$ holds.

Note that the fact that the $r_i|_j$'s such that $q_{i,j}^{\ell_i.j} \in \bigcup (\mathrm{clean}(\bar{M}) \cup \mathrm{clean}(\bar{S}))$ fit $\langle \mathrm{clean}(\bar{M}), \mathrm{clean}(\bar{S}) \rangle$ is straightforward from the definition of $\bar{M}, \bar{S}$ and the clean operation. Moreover, the maximum height of such $r_i|_j$'s is $h - 1$ by conditions (i) and (ii) and the fact that $\bigcup M$ is not empty. Thus, we can apply the induction hypothesis and conclude that $\langle \mathrm{clean}(\bar{M}), \mathrm{clean}(\bar{S}) \rangle \to_R^{*} \langle \emptyset, \emptyset \rangle$ holds, satisfying (C1) and (C2) for the $\star$-runs $r_i|_j$ such that $q_{i,j}^{\ell_i.j} \in \bigcup (\mathrm{clean}(\bar{M}) \cup \mathrm{clean}(\bar{S}))$. Hence, the derivation $d$ exists, (C1) holds since $|d| = 1 + |\langle \mathrm{clean}(\bar{M}), \mathrm{clean}(\bar{S}) \rangle \to_R^{*} \langle \emptyset, \emptyset \rangle| = 1 + h = \max(\{1 + \mathrm{height}(r_i) \mid i \in \{1, \dots, n\}\})$, and (C2) holds by construction of $\bar{M}$ and $\bar{S}$. □

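Corollary 5 Let $A$ be a TAG[$\not\approx_R$]. Let $\mathring{A}_\star$ be $\langle \mathring{Q}^{c}_\star \cup \mathring{Q}^{\bar{c}}_\star, \Sigma_\star, \mathring{F}_\star, \mathring{D}_\star, \mathring{\Delta}_\star \rangle$. There exists a derivation of the form $\langle \{\{q_f^{\lambda}\}\}, \emptyset \rangle \to_R^{*} \langle \emptyset, \emptyset \rangle$, where $q_f \in \mathring{F}_\star$, if and only if $L(A)$ is not empty.

Proof The statement follows by Corollary 2 and Lemma 13. □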

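Lemma 14 Let $A = \langle Q, \Sigma, F, D, \Delta \rangle$ be a TAG[$\not\approx_R$] and let $\mathring{A}_\star$ be $\langle \mathring{Q}^{c}_\star \cup \mathring{Q}^{\bar{c}}_\star, \Sigma_\star, \mathring{F}_\star, \mathring{D}_\star, \mathring{\Delta}_\star \rangle$. Let $M, S$ be partitions of labeled states such that $\langle \{\{q_f^{\lambda}\}\}, \emptyset \rangle \to_R^{*} \langle M, S \rangle \to_R^{*} \langle \emptyset, \emptyset \rangle$, where $q_f \in \mathring{F}_\star$.

Then, $|\bigcup (M \cup S)| \leq 2 \cdot |Q| \cdot |\Sigma|^{\mathrm{maxar}(\Sigma)^{|Q|}}$.
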

Proof We assume that $q_f \neq q_\star$, since the case $q_f = q_\star$ follows trivially. By Lemma 13, for $\bigcup (M \cup S) = \{q_1^{\ell_1}, \dots, q_n^{\ell_n}\}$, there exist $\star$-runs $r_1, \dots, r_n$ of $\mathring{A}_\star$ fitting $\langle M, S \rangle$. Assume, without loss of generality, that there exists $k \in \{0, \dots, n\}$ such that $(r_i(\lambda) \in \mathring{Q}^{\bar{c}}_\star \Leftrightarrow i \leq k)$ holds for each $i \in \{1, \dots, n\}$.

Consider first the $\star$-runs $r_1, \dots, r_k$. Note that, for each $i \in \{1, \dots, k\}$, $q_\star$ does not occur in $r_i$, since otherwise, by Lemmas 4 and 5, and Definitions 5 and 6, $r_i(\lambda) = q_\star$ holds, which is not possible by definition of the clean operation and the assumption that $q_f \neq q_\star$. Moreover, $\mathrm{height}(r_i) < |Q|$, since otherwise $r_i$ can be pumped, which implies that $q_\star$ occurs in $r_i$. Moreover, note that there exist, at most, $|\Sigma|^{\mathrm{maxar}(\Sigma)^{h+1}}$ different terms of height $h$. Hence, $q_1^{\ell_1}, \dots, q_k^{\ell_k}$ belong to, at most, $|\Sigma|^{\mathrm{maxar}(\Sigma)^{|Q|}}$ different parts of $M \cup S$. Again by definition of clean, a certain part of $M \cup S$ may contain, at most, $|Q|$ elements of the form $q^{\ell}$ with $q \in \mathring{Q}^{\bar{c}}_\star$. It follows that $k \leq |Q| \cdot |\Sigma|^{\mathrm{maxar}(\Sigma)^{|Q|}}$.

Now consider the $\star$-runs $r_{k+1}, \dots, r_n$. Recall that $r_i(\lambda) \in \mathring{Q}^{c}_\star$, for $i \in \{k+1, \dots, n\}$. Since $r_1, \dots, r_n$ fit $\langle M, S \rangle$, it follows that $r_{k+1}, \dots, r_n$ are pairwise $\star$-compatible. Hence, we can apply Lemma 11 and conclude that $n - k \leq |Q| \cdot |\Sigma|^{\mathrm{maxar}(\Sigma)^{|Q|}}$.

In summary, $|\bigcup (M \cup S)| = n = k + (n - k) \leq 2 \cdot |Q| \cdot |\Sigma|^{\mathrm{maxar}(\Sigma)^{|Q|}}$. □
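
To get a feeling for the size of the bounds used in this proof, they can be evaluated with exact integer arithmetic. The following Python sketch only restates the two formulas above; the function names and parameter names are assumptions made for this illustration.

```python
def term_count_bound(alphabet_size: int, max_arity: int, height: int) -> int:
    """Bound |Sigma|^(maxar(Sigma)^(h+1)) on the number of terms of height h."""
    return alphabet_size ** (max_arity ** (height + 1))


def labeled_state_bound(num_states: int, alphabet_size: int, max_arity: int) -> int:
    """Bound 2 * |Q| * |Sigma|^(maxar(Sigma)^|Q|) on the size of the union of M and S."""
    return 2 * num_states * alphabet_size ** (max_arity ** num_states)
```

For the example automaton of this section ($|Q| = 6$, $|\Sigma| = 4$, $\mathrm{maxar}(\Sigma) = 5$), `labeled_state_bound(6, 4, 5)` evaluates to $12 \cdot 4^{15625}$, which already illustrates why the number of partitions of such a set dominates the complexity of the search.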

We are ready to conclude decidability of emptiness of the language recognized by a TAG[$\not\approx_R$]. As a technical detail to ease the presentation, from now on we say that two pairs of partitions of labeled states $\langle M, S \rangle$ and $\langle M', S' \rangle$ are equivalent, denoted $\langle M, S \rangle \equiv \langle M', S' \rangle$, if they are equal up to renaming of the labels.

Theorem 1 Emptiness of the language recognized by a TAG[$\not\approx_R$] $A$ can be decided in time $O(2^{2^{2^{|A|}}})$.

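Proof Let $A$ be $\langle Q, \Sigma, F, D, \Delta \rangle$ more explicitly written and consider $\mathring{A}_\star = \langle \mathring{Q}^{c}_\star \cup \mathring{Q}^{\bar{c}}_\star, \Sigma_\star, \mathring{F}_\star, \mathring{D}_\star, \mathring{\Delta}_\star \rangle$, which can be computed in polynomial time. By Corollary 5, the emptiness problem for TAG[$\not\approx_R$]'s can be reduced to checking for the existence of a derivation of the form $\langle \{\{q_f^{\lambda}\}\}, \emptyset \rangle \to_R^{*} \langle \emptyset, \emptyset \rangle$, where $q_f \in \mathring{F}_\star$. The pairs $\langle M, S \rangle$ that have to be considered satisfy that $|\bigcup (M \cup S)| \leq 2 \cdot |Q| \cdot |\Sigma|^{\mathrm{maxar}(\Sigma)^{|Q|}}$, as stated in Lemma 14. Moreover, note that we do not have to consider derivations containing a subderivation of the form $\langle M, S \rangle \to_R^{+} \langle M', S' \rangle$, with $\langle M, S \rangle \equiv \langle M', S' \rangle$, since the existence of a derivation $\langle M', S' \rangle \to_R^{*} \langle \emptyset, \emptyset \rangle$ implies the existence of a derivation $\langle M, S \rangle \to_R^{*} \langle \emptyset, \emptyset \rangle$ of the same length. Thus, the total number of non-equivalent pairs $\langle M, S \rangle$ to be considered is bounded by the number of all possible partitions of a set of size $2 \cdot |Q| \cdot |\Sigma|^{\mathrm{maxar}(\Sigma)^{|Q|}}$, that is in $O(2^{2^{2^{|A|}}})$. This guarantees that the search terminates in time $O(2^{2^{2^{|A|}}})$. □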
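
The search described in this proof can be pictured as a reachability test on the graph of pairs, where equivalent pairs are explored only once. The following Python sketch shows that idea in the abstract and is not an implementation of $R$: the parameters `successors` (one step of $\to_R$ with all its guesses) and `canonical` (a representative of the class of a pair modulo $\equiv$) are hypothetical and have to be supplied by the caller.

```python
from collections import deque


def derives_goal(start, successors, canonical, goal):
    """Breadth-first search from `start`; each equivalence class is visited once.

    successors(pair) -> iterable of pairs reachable in one step (assumed given).
    canonical(pair)  -> hashable representative of the class of `pair` (assumed given).
    """
    goal_key = canonical(goal)
    seen = {canonical(start)}
    queue = deque([start])
    while queue:
        pair = queue.popleft()
        if canonical(pair) == goal_key:
            return True
        for nxt in successors(pair):
            key = canonical(nxt)
            if key not in seen:      # skip pairs equivalent to an explored one
                seen.add(key)
                queue.append(nxt)
    return False
```

Since every class enters the queue at most once, the number of iterations is bounded by the number of non-equivalent pairs, which is the quantity bounded in the proof.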

Note that it can be derived from our arguments that an upper bound for the height of a minimal accepting $\star$-run is in $O(2^{2^{2^{|A|}}})$. The traditional approach for deciding emptiness consists of generating all terms with height smaller than the bound and checking whether one of them is accepted by the given automaton. However, this approach would lead to an algorithm with cost doubly exponential with respect to the bound for the height.

Now we tackle the finiteness problem for TAG[$\not\approx_R$]'s. The following definition and its corresponding lemma show how derivations with $R$ relate to the finiteness of the recognized language.

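Definition 10 Let $A$ be a TAG[$\not\approx_R$]. Let $\mathring{A}_\star$ be $\langle \mathring{Q}^{c}_\star \cup \mathring{Q}^{\bar{c}}_\star, \Sigma_\star, \mathring{F}_\star, \mathring{D}_\star, \mathring{\Delta}_\star \rangle$. A state $q_f \in \mathring{F}_\star$ is a witness of infiniteness if it satisfies one of the following conditions:

(W1) $q_f = q_\star$.
(W2) There exists a derivation of the form $\langle \{\{q_f^{\lambda}\}\}, \emptyset \rangle \to_R^{+} \langle \mathrm{clean}(M), \mathrm{clean}(S) \rangle \to_R^{*} \langle \emptyset, \emptyset \rangle$ such that a part of the form $\{q_\star^{\ell}\}$ occurs in $M \cup S$.
(W3) There exists a derivation of the form $\langle \{\{q_f^{\lambda}\}\}, \emptyset \rangle \to_R^{*} \langle M_1, S_1 \rangle \to_R^{+} \langle M_2, S_2 \rangle \to_R^{*} \langle \emptyset, \emptyset \rangle$, such that $\langle M_1, S_1 \rangle \equiv \langle M_2, S_2 \rangle$.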

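Lemma 15 Let $A$ be a TAG[$\not\approx_R$]. Let $\mathring{A}_\star$ be $\langle \mathring{Q}^{c}_\star \cup \mathring{Q}^{\bar{c}}_\star, \Sigma_\star, \mathring{F}_\star, \mathring{D}_\star, \mathring{\Delta}_\star \rangle$. $L(A)$ is infinite if and only if there exists $q_f \in \mathring{F}_\star$ such that $q_f$ is a witness of infiniteness.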

Proof Let $A$ be $\langle Q, \Sigma, F, D, \Delta \rangle$ more explicitly written. We prove each direction separately.

(⇒) Consider a state $q \in F$ such that $L(A, q)$ is infinite, which is guaranteed to exist by the assumption. If $q \in Q^{\infty}_A$, then $q_\star \in \mathring{F}_\star$ and condition (W1) trivially holds. Otherwise, we distinguish cases depending on whether there exists an accepting run $r$ of $A$ reaching $q$ and containing a state in $Q^{\infty}_A$. If this is the case, it is easy to construct an accepting $\star$-run of $\mathring{A}_\star$ from $r$ reaching $q$ by properly replacing subruns reaching a state in $Q^{\infty}_A$ by the $\star$-run $(\star \to q_\star)$ and, by condition (C2) of Lemma 13, $q$ satisfies (W2). Else, there exist arbitrarily high accepting runs of $A$ reaching $q$ and not involving states in $Q^{\infty}_A$. It follows that there exist arbitrarily high accepting $\star$-runs of $\mathring{A}_\star$ reaching $q$. Thus, by condition (C1) of Lemma 13, there exist arbitrarily long derivations of the form $\langle \{\{q^{\lambda}\}\}, \emptyset \rangle \to_R^{*} \langle \emptyset, \emptyset \rangle$. Since, for any derived pair $\langle M, S \rangle$, $|\bigcup (M \cup S)|$ is bounded as stated in Lemma 14, by the pigeonhole principle, the existence of a derivation $\langle \{\{q^{\lambda}\}\}, \emptyset \rangle \to_R^{*} \langle M_1, S_1 \rangle \to_R^{+} \langle M_2, S_2 \rangle \to_R^{*} \langle \emptyset, \emptyset \rangle$, with $\langle M_1, S_1 \rangle \equiv \langle M_2, S_2 \rangle$, follows, and $q$ satisfies (W3).

(⇐) If $q_f$ satisfies (W1), then note that $(\star \to q_\star)$ is an accepting $\star$-run of $\mathring{A}_\star$ and the statement follows from Corollary 3. If $q_f$ satisfies (W2), then, by condition (C2) of Lemma 13, there exists an accepting $\star$-run of $\mathring{A}_\star$ reaching $q_f$ and containing the $\star$-subrun $(\star \to q_\star)$. In this case, the statement follows again from Corollary 3. Finally, if $q_f$ satisfies (W3), note that the subderivation $\langle M_1, S_1 \rangle \to_R^{+} \langle M_2, S_2 \rangle$ can be pumped and, hence, we can construct arbitrarily long derivations. Thus, by condition (C1) of Lemma 13, an infinite number of accepting $\star$-runs of $\mathring{A}_\star$ fitting $\langle \{\{q_f^{\lambda}\}\}, \emptyset \rangle$ exist. Hence, $L_\star(\mathring{A}_\star)$ is infinite and, by Corollary 3, the statement follows. □

Finally, we prove decidability of finiteness for TAG[$\not\approx_R$]'s.

Theorem 2 Finiteness of the language recognized by a TAG[$\not\approx_R$] $A$ can be decided in time $O(2^{2^{2^{|A|}}})$.

Proof As proven in Lemma 15, infiniteness of the language recognized by $A$ can be reduced to checking for the existence of a witness of infiniteness in $\mathring{A}_\star$, in the sense of Definition 10. The pairs $\langle M, S \rangle$ that have to be considered satisfy that $|\bigcup (M \cup S)| \leq 2 \cdot |Q| \cdot |\Sigma|^{\mathrm{maxar}(\Sigma)^{|Q|}}$, as stated in Lemma 14. Checking condition (W1) of Definition 10 is straightforward. For condition (W2) we do not have to consider derivations containing a subderivation of the form $\langle M, S \rangle \to_R^{+} \langle M', S' \rangle$, with $\langle M, S \rangle \equiv \langle M', S' \rangle$. Finally, for checking condition (W3) we only need to consider derivations with at most one subderivation of the form $\langle M, S \rangle \to_R^{+} \langle M', S' \rangle$, with $\langle M, S \rangle \equiv \langle M', S' \rangle$. Since the total number of non-equivalent pairs $\langle M, S \rangle$ is bounded by the number of all possible partitions of a set of size $2 \cdot |Q| \cdot |\Sigma|^{\mathrm{maxar}(\Sigma)^{|Q|}}$, that is in $O(2^{2^{2^{|A|}}})$, termination is guaranteed in time $O(2^{2^{2^{|A|}}})$. □
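
Condition (W3) can be read as a graph property: some class of pairs is reachable from the start, lies on a cycle, and can still reach $\langle \emptyset, \emptyset \rangle$. The following Python sketch checks exactly that property on an abstract graph. It assumes that the nodes already stand for classes of pairs modulo $\equiv$ and that the hypothetical parameter `successors` returns the classes reachable by one step of $\to_R$; neither is implemented here.

```python
from collections import deque


def reachable(sources, successors):
    """Return every node reachable from `sources` in zero or more steps."""
    seen = set(sources)
    queue = deque(seen)
    while queue:
        node = queue.popleft()
        for nxt in successors(node):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def has_w3_witness(start, goal, successors):
    """True if start ->* v ->+ v ->* goal holds for some node v."""
    for node in reachable([start], successors):
        # Nodes reachable from `node` in at least one step.
        after_one_step = reachable(list(successors(node)), successors)
        if node in after_one_step and goal in after_one_step:
            return True
    return False
```

The sketch recomputes reachability for every candidate node, which is wasteful but keeps the correspondence with (W3) easy to see.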

7 Conclusion

We have obtained algorithms with triple exponential time complexity for deciding emptiness and finiteness of tree automata with global constraints, when the constraint is just a conjunction of atoms over the predicate $\not\approx$, and the formula defines a reflexive relation among all the states occurring in it. The exact complexity of these problems remains open, as well as their complexity for the general case of arbitrary global constraints. Several variants like adding equality constraints or removing the reflexivity condition are also interesting and deserve further study. We believe that our results can be extended to the case where term equality is interpreted modulo commutativity of some function symbols. To this end, it seems necessary to adapt the conditions in the application of the inference system $R$. In particular, condition (d) should establish a bijection between the respective direct children of two labeled states in the same part, ensuring that bijected children go to the same part, and hence generate the same term. Similarly, condition (e) must ensure that such a bijection is not possible.

References

1. Baader, F., Nipkow, T.: Term Rewriting and All That. Cambridge University Press, New York(1998)

2. Barguñó, L., Creus, C., Godoy, G., Jacquemard, F., Vacher, C.: The emptiness problem for treeautomata with global constraints. In: Logic in Computer Science (LICS), pp. 263–272 (2010)

3. Bogaert, B., Tison, S.: Equality and disequality constraints on direct subterms in tree automata.In: Symposium on Theoretical Aspects of Computer Science (STACS), pp. 161–171 (1992)

4. Bojanczyk, M., Muscholl, A., Schwentick, T., Segoufin, L.: Two-variable logic on data trees andXML reasoning. J. ACM 56(3), 13:1–13:48 (2009)

5. Comon, H., Dauchet, M., Gilleron, R., Jacquemard, F., Löding, C., Lugiez, D., Tison, S.,Tommasi, M.: Tree automata techniques and applications (2007). http://tata.gforge.inria.fr

6. Comon, H., Jacquemard, F.: Ground reducibility and automata with disequality constraints. In:Symposium on Theoretical Aspects of Computer Science (STACS), pp. 151–162 (1994)

7. Comon, H., Jacquemard, F.: Ground reducibility is EXPTIME-complete. In: Logic in ComputerScience (LICS), pp. 26–34 (1997)

http://tata.gforge.inria.fr

400 C. Creus et al.

8. Dauchet, M., Caron, A.C., Coquidé, J.L.: Automata for reduction properties solving. J. Symbol.Comput. 20(2), 215–233 (1995)

9. David, C., Libkin, L., Tan, T.: Efficient reasoning about data trees via integer linear program-ming. In: ICDT, pp. 18–29 (2011)

10. Filiot, E., Talbot, J., Tison, S.: Tree automata with global constraints. Int. J. Found. Comput. Sci.21(4), 571–596 (2010)

11. Jacquemard, F., Klay, F., Vacher, C.: Rigid tree automata and applications. Inf. Comput. 209(3),486–512 (2011)

12. Mongy, J.: Transformation de noyaux reconnaissables d’arbres. forêts rateg. Ph.D. thesis, Labo-ratoire d’Informatique Fondamentale de Lille, Université des Sciences et Technologies de Lille,Villeneuve d’Ascq, France (1981)
