
Journal of Computer and System Sciences 78 (2012) 1910–1927

Learnability of automatic classes

Sanjay Jain a,∗,1, Qinglong Luo b, Frank Stephan c,1

a Department of Computer Science, National University of Singapore, Singapore 117417, Singapore
b DSO National Laboratories, 20 Science Park Drive, Singapore 118230, Singapore
c Department of Mathematics and Department of Computer Science, National University of Singapore, Singapore 119076, Singapore

Article history: Received 7 July 2010; Received in revised form 6 March 2011; Accepted 17 November 2011; Available online 22 December 2011

Keywords: Inductive inference; Automatic structures

Abstract

The present work initiates the study of the learnability of automatic indexable classes, which are classes of regular languages of a certain form. Angluin's tell-tale condition characterises when these classes are explanatorily learnable. Therefore, the more interesting question is when learnability holds for learners with complexity bounds, formulated in the automata-theoretic setting. The learners in question work iteratively, in some cases with an additional long-term memory, where the update function of the learner, mapping old hypothesis, old memory and current datum to new hypothesis and new memory, is automatic. Furthermore, the dependence of the learnability on the indexing is also investigated. This work brings together the fields of inductive inference and automatic structures.

© 2012 Elsevier Inc. All rights reserved.

1. Introduction

The present work studies inductive inference [17] within the framework of automata theory and, in particular, automatic structures. The basic scenario of inductive inference is that a learner is receiving, one piece at a time, data about a target concept. As the learner is receiving the data, it conjectures a hypothesis about what the target concept might be. The hypothesis may be modified or changed as more data is received. One can consider the learner to be successful if the sequence of hypotheses converges to a correct hypothesis which explains the target concept.

The concept classes of interest to us in this paper are classes of regular languages. (A regular language is a subset of Σ∗, for some finite alphabet Σ, which is recognised by a finite automaton.) The data provided to the learner then becomes a sequential presentation of all the elements of the target language, in arbitrary order, with repetition allowed. To deal with the empty language, we also allow a special symbol � to be presented to the learner. This symbol represents no data. Such a presentation of data is called a text for the language. Note that a text presents only positive data to the learner, and not negative data, that is, the learner is not explicitly told which elements do not belong to the language. If both positive and negative data are presented to the learner, then the mode of presentation is called informant. In this paper we will only be concerned with learning from texts.

In many cases, one considers only recursive learners. The hypotheses produced by the learner describe the language to be learnt in some form. For example, they might be grammars generating the language. The learner is said to Ex-learn the target language iff the sequence of hypotheses converges to one correct hypothesis describing the language to be learnt. Here "Ex-learn" stands for explanatory learning. Learning of one language L is not interesting, as the learner might ignore all inputs and always output a hypothesis which is correct for L. Thus, what is considered is whether all languages from a class of languages are Ex-learnt by some learner. When such a learner exists for a class, that class is said to be Ex-learnable.

* Corresponding author. E-mail addresses: [email protected] (S. Jain), [email protected] (Q. Luo), [email protected] (F. Stephan).

1 Sanjay Jain was supported in part by NUS grants R252-000-308-112 and C252-000-087-001. Frank Stephan was supported in part by NUS grants R146-000-114-112 and R252-000-308-112.

0022-0000/$ – see front matter © 2012 Elsevier Inc. All rights reserved. doi:10.1016/j.jcss.2011.12.011

Since [17], several other models of learning have been considered by researchers. For example, in behaviourally correct learning (BC-learning) [5] the learner is not required to converge syntactically to one correct hypothesis; rather, it is just required that all hypotheses are correct from some time onwards. In other words, one requires only semantic convergence in this case. In vacillatory learning (FEx-learning) [10], the learner eventually vacillates between only finitely many hypotheses, all of which are correct.

Besides the mode of convergence, researchers have also considered several properties of learners such as

• consistency, where the hypothesis of the learner is required to contain the elements seen in the input so far (see [6,7]),
• conservativeness, where the learner is not allowed to change a hypothesis which is consistent with the data seen so far (see [1]) and
• iterativeness, where the new hypothesis of the learner depends only on the previous hypothesis and the latest datum (see [36,37]). Iterative learning is often also called incremental learning.

The formal definitions of the above criteria are given in Section 2 below.

Besides considering models of learning, there has also been interest in considering learning of practical and concrete classes such as the pattern languages [2,14,23,27], elementary formal systems [35] and the regular languages [4]. As the class of all regular languages is not learnable from positive data [17], Angluin [3] initiated the study of the learnability of certain subclasses of the regular languages from positive data. In particular, she showed the learnability of the class of k-reversible languages. These studies were later extended [13,15,18]. The classes considered in these studies were all superclasses of the class of all 0-reversible languages, which is not automatic; for example, every language {0,1}∗{2}n{0,1}∗ is 0-reversible but the class of these languages is not automatic. Also many other subclasses of the regular languages which have been considered in the literature are not automatic. Furthermore, learnability of regular languages from counterexamples and queries has also been studied (for example by Angluin [4] and Ibarra and Jiang [21]) but we will not be concerned with these learning models in this paper.

In this work, we consider those subclasses of the regular languages where the membership problem is regular in the sense that one automaton accepts a combination (called a convolution) of an index and a word iff the word is in the language given by the index. This is formalised in the framework of automatic structures [8,9,19,20,24,32,33]. Here are some examples of automatic classes:

• The class of sets with up to k elements for a constant k.
• The class of all finite and cofinite subsets of {0}∗.
• The class of all intervals of an automatic linear order on a regular set.
• Given an automatic presentation of (Z,+,<) and a first-order formula Φ(x,a1, . . . ,an) with parameters a1, . . . ,an ∈ Z, the class consisting of all sets {x ∈ Z: Φ(x,a1, . . . ,an)} with a1, . . . ,an ∈ Z.

It is known that the automatic relations are closed under first-order definability, as proven by Khoussainov and Nerode [24]. This makes several properties of such classes regular and thus decidable; it also makes it possible to define learners using first-order definitions. Studies in automatic structures have connections to model checking and Boolean algebra [9,24].

A tell-tale set for a language L in a class L is a finite subset D of L such that, for every L′ ∈ L, D ⊆ L′ ⊆ L implies L′ = L. A class L satisfies Angluin's tell-tale condition iff every language L in L has a tell-tale set (with respect to L). Angluin [1] showed that any class of languages which is learnable (even by a non-recursive learner, for which Ex-, BC- and FEx-learning are all the same) must satisfy Angluin's tell-tale condition. We show in Theorem 9 that every automatic class that satisfies Angluin's tell-tale condition is Ex-learnable by a recursive learner which is additionally consistent and conservative. Additionally, it is decidable whether an automatic class satisfies Angluin's tell-tale condition and thus whether it is Ex-learnable (see Corollary 10).

As we are considering learning of automatic classes, it is natural to also consider learners which are simpler than just being recursive. A natural idea would be to consider learners which are themselves described via automatic structures. This would put both the learners and the classes to be learnt into a unified framework. Furthermore, the automatic learners are linear time computable [11] and additional constraints on the memory can be satisfied.2 This approach is justified by the observation that a learner might observe much more data than it can remember and therefore it is not realistic to assume that the whole learning history can be remembered. To model the above, we consider variants of iterative learners [36,37] and learners with bounded long-term memory [16,25]. The basic idea is that the learner reads in each round a datum and updates the long-term memory and the hypothesis based on this datum; for automatic learners, this update function is then required to be automatic.

2 Note that the constant in this linear time computation is at most proportional to the size of the automata representing the automatic learner. Furthermore, the memory of such automatic learners cannot grow too fast, even for the most general case (see Proposition 15).


As automatic structures are relatively simple to implement and analyse, it is interesting to explore the capabilities of such learners. In Section 3 we formally define automatic learners: iterative learners as well as iterative learners with long-term memory. Specifically, we consider the following bounds on memory: memory bounded by a constant, memory bounded by the size of the hypothesis and memory bounded by the size of the largest word seen in the input so far, besides the default cases of no memory (iterative learning) and the case where we do not put any specific bounds on memory except as implicit from the definition of automatic learners. Theorem 16 shows that there are automatic classes which are Ex-learnable (even iteratively) but not learnable by any automatic learner.

In Section 3 we show the relationship between various iterative automatic learners and iterative automatic learners with long-term memory. For example, if long-term memory is not explicitly bounded, then automatic Ex-learning is the same as automatic BC-learning, in contrast to the situation in learning of recursively enumerable languages by recursive learners, where there is a difference [5]. Additionally, for BC-learning, different bounds on long-term memory do not make a difference, as all automatically BC-learnable classes (with no explicit long-term memory bound) are iteratively automatically BC-learnable (see Theorem 17). Similarly, for FEx-learning, all automatically FEx-learnable classes with long-term memory bounded by the size of the hypothesis are iteratively FEx-learnable. However, for both explanatory learning and vacillatory learning, there is a difference if one considers long-term memory bounded by hypothesis size, or whether long-term memory is bounded by the size of the largest word seen in the input so far (see Theorem 18). For both Ex- and FEx-learning, it is open at this point whether bounding the size of the long-term memory by the size of the longest word seen so far is equivalent to there being no explicit bound on the size of the long-term memory. For explanatory learning, it is additionally open whether constant size memory is equivalent to having hypothesis size memory and whether longest word size memory can simulate hypothesis size memory.

In Section 4 we consider consistent learning by automatic learners. Unlike Theorem 9, where we show that general learners for automatic classes can be made consistent, automatic learners cannot in general be made consistent. Theorem 22 shows that there is an automatic class L which is Ex-learnable by an automatic iterative learner but not Ex-learnable by a consistent automatic learner with no constraints on long-term memory, except those implicit due to the learner being automatic. Theorem 23 shows that there is an automatic class L which is Ex-learnable by a consistent automatic learner or an iterative automatic learner, but not by a consistent iterative learner. Theorem 25 shows the existence of an automatic class L which is Ex-learnable by a consistent and iterative automatic learner using a class comprising hypothesis space (i.e., using hypotheses from an automatic class which is a superset of the class L), but not Ex-learnable by a consistent automatic learner (with no constraints on long-term memory, except those implicit due to the learner being automatic) using a class preserving hypothesis space, i.e., using a hypothesis space which contains languages only from L.

One of the reasons for the difficulty of learning by iterative learners is that they forget past data. An attempt to overcome this is to require that every datum appears infinitely often in the text — such a text is called a fat text [31]. Fat texts are quite frequently studied in learning theory. In Section 5 we investigate the natural question of whether requiring fat texts permits the limitations of iterative learning and related criteria to be overcome. In Theorem 28 we show that every automatic class that satisfies Angluin's tell-tale condition is Ex-learnable (using the automatic class itself as the hypothesis space) from fat texts by an automatic learner with long-term memory bounded by the size of the largest word seen so far. If one allows an arbitrary class preserving hypothesis space, then one can even do Ex-learning in the above case by iterative automatic learners and no additional long-term memory is needed.

In Theorem 31, we show the existence of automatic classes which are automatically iteratively learnable (even from normal texts) using a class preserving hypothesis space, but not conservatively iteratively learnable using a one–one class preserving hypothesis space (even by arbitrary recursive learners) on fat texts.

Partial identification is a very general learning criterion, where one requires that some fixed correct hypothesis is output infinitely often by the learner while all other hypotheses are output only finitely often [31]. In Theorem 35 we show that every automatic class is partially learnable by an automatic iterative learner. This corresponds to the result by Osherson, Stob and Weinstein [31] that the whole class of all recursively enumerable languages is partially learnable by some recursive learner.

2. Preliminaries

Let N denote the set of natural numbers. Let Z denote the set of integers. The symbol ∅ denotes the empty set. Symbols ⊆, ⊇, ⊂, ⊃, respectively, denote subset, superset, proper subset and proper superset. Furthermore, max S, min S and card S, respectively, denote the maximum, minimum and cardinality of a set S, where max ∅ = 0 and min ∅ = ∞.

An alphabet Σ is any non-empty finite set and Σ∗ is the set of all strings (words) over the alphabet Σ. The symbol ε denotes the empty string. A string of length n over Σ will be treated as a function from the set {0,1,2, . . . ,n − 1} to Σ. Thus, string x of length n is the same as x(0)x(1)x(2) . . . x(n − 1). A language is a subset of Σ∗ and a class is a set of languages.

The relation x <lex y denotes that x is lexicographically (that is, in dictionary order) before y. The relation x <ll y denotes that x is length-lexicographically before y, that is, either |x| < |y|, or |x| = |y| and x <lex y. Note that, for any given alphabet, the reflexive closure of <ll is a linear order over the strings of that alphabet. When we consider sets of strings, we take min S to denote the length-lexicographically least string in S. Let succL(x) denote the length-lexicographically least y, if any, in L such that x <ll y.


For a language L ⊆ Σ∗, we define the characteristic function CFL as an infinite string as follows. Suppose z0, z1, . . . is the ordering of all strings in Σ∗ in length-lexicographic order. Then, CFL(n) = 1, if zn ∈ L; CFL(n) = 0, otherwise. For any language L, we let L[y] denote the set {x ∈ L: x ≤ll y}.
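To make these conventions concrete, the following is a minimal sketch in Python (our own illustration, not part of the paper; all function names are hypothetical) of the length-lexicographic order, the successor function succL and a finite prefix of the characteristic string CFL, for languages given here as finite sets of strings.

    from itertools import count, product

    def ll_less(x, y):
        # x <ll y: shorter strings come first, ties are broken lexicographically
        return (len(x), x) < (len(y), y)

    def all_strings(sigma):
        # enumerate Sigma* in length-lexicographic order: eps, length 1, length 2, ...
        for n in count(0):
            for t in product(sorted(sigma), repeat=n):
                yield "".join(t)

    def succ_L(L, x):
        # length-lexicographically least y in L with x <ll y (here L is a finite set)
        candidates = [y for y in L if ll_less(x, y)]
        return min(candidates, key=lambda y: (len(y), y)) if candidates else None

    def cf_prefix(L, sigma, n):
        # first n bits of CF_L along the length-lexicographic enumeration of Sigma*
        gen = all_strings(sigma)
        return "".join("1" if next(gen) in L else "0" for _ in range(n))

    # example over Sigma = {0,1} with L = {eps, 0, 00}
    print(succ_L({"", "0", "00"}, "0"))                # prints 00
    print(cf_prefix({"", "0", "00"}, {"0", "1"}, 7))   # prints 1101000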

In the present work we will only consider classes of regular sets. Furthermore, Σ will always refer to the alphabet over which languages and language classes are defined.

Definition 1. An indexing of a class L is a sequence of sets Lα with α ∈ I , for some domain I , such that L = {Lα: α ∈ I}.

Often we will refer to both the class and the indexing as {Lα: α ∈ I}, where the indexing is implicit. The set I above is called the set of legal indices. We will always assume that the indices in I are taken as words over an alphabet and we usually denote this alphabet with the letter Γ.

Now we consider notions related to automatic structures. First, we consider the definition of a convolution of a tuple of strings. Intuitively, a convolution transforms rows of strings into a string of columns.

Definition 2. (See Khoussainov, Nerode [24].) Let n > 0 and Σ1, Σ2, . . . , Σn be alphabets not containing #. Let x1 ∈ Σ1∗, x2 ∈ Σ2∗, . . . , xn ∈ Σn∗ be given. Let ℓ = max{|x1|, |x2|, . . . , |xn|} and let yi = xi#^(ℓ−|xi|). Define z to be a string of length ℓ such that z(j) is the symbol made up of the j-th symbols of the strings y1, y2, . . . , yn: z(j) = (y1(j), y2(j), . . . , yn(j)), where z(j) is a symbol in the alphabet (Σ1 ∪ {#}) × (Σ2 ∪ {#}) × · · · × (Σn ∪ {#}). We call z the convolution of x1, x2, . . . , xn and denote it as conv(x1, x2, . . . , xn). Let R ⊆ Σ1∗ × Σ2∗ × · · · × Σn∗. We call the set S = {conv(x1, x2, . . . , xn): (x1, x2, . . . , xn) ∈ R} the convolution of R. Furthermore, we say that R is automatic iff the convolution of R is regular.

For ease of notation, we often write just (x1, x2, . . . , xn) instead of conv(x1, x2, . . . , xn) and Lx1,...,xn or Hx1,...,xn in place of Lconv(x1,...,xn) or Hconv(x1,...,xn) respectively. We next define the notion of an automatic indexing.
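As an illustration, here is a small Python sketch (our own, not from the paper) of the convolution of a tuple of strings, padding with # as in Definition 2.

    def conv(*strings, pad="#"):
        # pad all strings with # to the maximum length and read off the columns;
        # each symbol of the result is the tuple of j-th characters of the padded strings
        length = max((len(s) for s in strings), default=0)
        padded = [s + pad * (length - len(s)) for s in strings]
        return [tuple(p[j] for p in padded) for j in range(length)]

    print(conv("01", "1"))   # [('0', '1'), ('1', '#')]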

Definition 3. An indexing {Lα: α ∈ I} is automatic iff I is regular and E = {(α, x): x ∈ Lα, α ∈ I} is automatic. A class is automatic iff it has an automatic indexing.

Khoussainov and Nerode [24] found the following fundamental result on automatic structures which is useful to define automatic learners and to decide the learnability of automatic classes.

Fact 4. (See Blumensath, Grädel [9], Khoussainov, Nerode [24].) Any relation that is first-order definable from existing automatic relations is automatic.

Next, we recall a few learning-relevant definitions, followed by a result from Angluin [1] that characterises learnable classes. For any alphabets Σ, Γ, we let

• � be a special character not in Σ∗ which is called the pause symbol;
• ? be a special character not in Γ∗ which is called the no-conjecture symbol.

Let Σ be the alphabet over which languages are being considered. We use σ, τ to denote finite sequences over Σ∗ ∪ {�} and T to denote infinite sequences over Σ∗ ∪ {�}. Furthermore, λ denotes the empty sequence. The length of a sequence σ is denoted by |σ|. T[m] denotes the initial segment of T of length m. We let σ � τ (respectively, σ � T) denote the concatenation of σ and τ (respectively, σ and T). For a sequence σ and string x, we often use σ � x to denote the concatenation of the sequence σ with the sequence of length 1 consisting of the string x. For ease of notation, when it is clear from the context that concatenation of sequences is meant, we sometimes drop the symbol �. Thus, στ means σ � τ. For a finite sequence σ over Σ∗ ∪ {�}, the content of σ, denoted by cnt(σ), is defined as cnt(σ) = {x ∈ Σ∗: ∃n < |σ| (σ(n) = x)}. Similarly, for every infinite sequence T over Σ∗ ∪ {�}, the content of T, denoted by cnt(T), is defined as cnt(T) = {x ∈ Σ∗: ∃n ∈ N (T(n) = x)}. For every set L and every infinite sequence T over Σ∗ ∪ {�} with L = cnt(T), we call T a text for L. For every L ⊆ Σ∗, let txt(L) = {T ∈ (Σ∗ ∪ {�})ω: cnt(T) = L} and seq(L) = {σ ∈ (Σ∗ ∪ {�})∗: cnt(σ) ⊆ L}.
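A minimal sketch in Python (hypothetical helper names, not from the paper) of the content of a finite sequence and of a simple text for a finite language; a separate sentinel object stands in for the pause symbol written � above.

    from itertools import islice

    PAUSE = object()   # sentinel standing in for the pause symbol

    def cnt(seq):
        # content of a finite sequence: the words occurring in it, pauses ignored
        return {x for x in seq if x is not PAUSE}

    def text_for(L):
        # an infinite text for a finite language L; for L = empty set, only pauses are presented
        elems = sorted(L, key=lambda w: (len(w), w))
        i = 0
        while True:
            yield elems[i % len(elems)] if elems else PAUSE
            i += 1

    print(cnt(["0", PAUSE, "01", "0"]))              # {'0', '01'}
    print(list(islice(text_for({"0", "01"}), 5)))    # ['0', '01', '0', '01', '0']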

Given a class L, a hypothesis space for L is an indexing {Hα: α ∈ J} ⊇ L, where J is the set of indices for the hypothesis space. We will only consider automatic hypothesis spaces. A hypothesis space is class preserving with respect to L iff L = {Hα: α ∈ J}. A hypothesis space is class comprising with respect to L iff L ⊆ {Hα: α ∈ J}. A hypothesis space is one–one class preserving with respect to L iff it is class preserving and, for every L ∈ L, there is exactly one α ∈ J with L = Hα. In use of the above definitions, we often drop "with respect to L" if the class L is clear from context.

A learner is a function F : (Σ∗ ∪ {�})∗ → J ∪ {?}. We use M and N for recursive learners, and F for learners which may not be recursive. We use P for iterative learners and Q for iterative learners with additional long-term memory. The learners P and Q are usually automatic. Iterative and automatic learners are defined in Section 3 below.


Definition 5. Fix a class L and a hypothesis space {Hα: α ∈ J } with J being the set of indices. Let F be a learner.

(a) (See [17].) We say that F Ex-learns L iff for every L ∈ L and every T ∈ txt(L), there exists an n ∈ N and an α ∈ J with Hα = L such that, for every m ≥ n, F(T[m]) = α.

(b) (See [5].) We say that F BC-learns L iff for every L ∈ L and every T ∈ txt(L), there exists an n ∈ N such that, for every m ≥ n, HF(T[m]) = L.

(c) (See [10].) We say that F FEx-learns L iff F BC-learns L and for every L ∈ L and every T ∈ txt(L), the set {F(T[n]): n ∈ N} is finite.

(d) (See [31].) We say that F Part-learns L iff for every L ∈ L and every T ∈ txt(L), there exists an α ∈ J such that (i) Hα = L, (ii) for every n ∈ N, there exists a k ≥ n such that F(T[k]) = α and (iii) for every β ∈ J with β ≠ α, there exists an n ∈ N such that, for every k ≥ n, F(T[k]) ≠ β.

For Ex, FEx, BC and Part learning, one can assume without loss of generality that the learner never outputs ?. However, for some other criteria of learning, this is not necessarily the case.

Definition 6. Let Σ and Γ be alphabets. Let {Hα: α ∈ J} be a hypothesis space with some J ⊆ Γ∗ being the set of indices. Let F be a learner.

(a) (See [6].) We say that F is consistent on L ⊆ Σ∗ iff for every σ ∈ seq(L), if F(σ) ∈ J, then cnt(σ) ⊆ HF(σ). We say that F is consistent on L ⊆ powerset(Σ∗) iff it is consistent on each L ∈ L.

(b) (See [1].) We say that F is conservative on L ⊆ Σ∗ iff for every σ, σ′ ∈ seq(L), if F(σ) ∈ J and cnt(σ � σ′) ⊆ HF(σ), then F(σ � σ′) = F(σ). We say that F is conservative on L ⊆ powerset(Σ∗) iff it is conservative on each L ∈ L.

(c) (See [30,34].) We say that F is set-driven iff for every σ1, σ2 ∈ (Σ∗ ∪ {�})∗, if cnt(σ1) = cnt(σ2), then F(σ1) = F(σ2).

When we are considering learning a class L consistently (conservatively, set-drivenly), we mean learning of the class by a learner which is consistent (conservative, set-driven) on L.

For each learning criterion LC such as Ex, FEx, BC and Part, we let LC also denote the collection of all classes which are LC-learned by a recursive learner using some class comprising hypothesis space.

Blum and Blum [7] introduced the notion of a locking sequence for a learner F on a set L learnt by F: a locking sequence for a learner F on L is any sequence σ ∈ seq(L) such that, for some fixed index e for L, F(στ) = e for all τ ∈ seq(L). Blum and Blum showed that a locking sequence always exists for languages Ex-learnt by F and this notion can be adapted for most learning criteria considered in this paper.

Using locking sequences, techniques of Angluin [1] can be used to characterise classes that are Ex-learnable by a, not necessarily recursive, learner. First, let us recall the definition of a tell-tale set, while introducing the definition of a tell-tale cut-off word.

Definition 7 (Angluin’s tell-tale condition). (See [1].) Suppose L is a class of languages.

(a) For every L ∈ L, we say that D is a tell-tale set of L (in L) iff D is a finite subset of L and for every L′ ∈ L with D ⊆ L′ ⊆ L we have L′ = L.

(b) For every L ∈ L and x ∈ Σ∗, we say that x is a tell-tale cut-off word of L (in L) iff {y ∈ L: y ≤ll x} is a tell-tale set of L (in L).

(c) We say that L satisfies Angluin's tell-tale condition iff every L ∈ L has a tell-tale set (in L), or equivalently, a tell-tale cut-off word (in L).

Fact 8. (Based on Angluin [1].) Let Σ be an alphabet. A class L of recursively enumerable languages is Ex-learnable (by a not necessarily recursive learner) iff L satisfies Angluin's tell-tale condition.

Note that for non-recursive learners, Ex, FEx and BC learning are equivalent. Given a uniformly recursive class {Lα: α ∈ J}, Angluin [1] proved that the learner can be chosen to be recursive iff there is a uniformly recursively enumerable class of sets, {Eα: α ∈ J}, such that each Eα is a tell-tale set for Lα. Note that in general such recursive learners may not be consistent, conservative or set-driven. In particular it can be shown that there are classes of languages which can be recursively learnt, but cannot be consistently, conservatively or set-drivenly learnt (see respectively [1,6] and [30,34]).

Using the Fundamental Theorem for automatic structures, the following theorem shows that any automatic class satisfying Angluin's tell-tale condition is Ex-learnable and the learner can be made to be recursive, consistent, conservative and set-driven.

Theorem 9. Suppose L is automatic. Then, there is a learner which recursively, consistently, conservatively and set-drivenly Ex-learns L iff L satisfies Angluin's tell-tale condition.


Proof. (⇒) This follows from Fact 8.

(⇐) Suppose L is automatic and satisfies Angluin's tell-tale condition. Let {Lα: α ∈ I} be an indexing of L. Now consider the learner M (which uses {Lα: α ∈ I} as the hypothesis space) such that M(σ) is defined as follows.

• If there exists an α ∈ I such that, for some w ∈ Σ∗,
(a) cnt(σ) ⊆ Lα,
(b) {x: x ≤ll w, x ∈ Lα} ⊆ cnt(σ),
(c) for all β ∈ I, [{x: x ≤ll w, x ∈ Lα} ⊆ Lβ ⇒ ¬[Lβ ⊂ Lα]],
(d) for all β ∈ I, [β ≤ll α ∧ cnt(σ) ⊆ Lβ ⇒ Lα ⊆ Lβ],
• then M(σ) is the length-lexicographically least such α,
• else M(σ) = ?.

It is easy to verify that M is consistent, recursive and set-driven. Also, if M(σ) = α and cnt(τ) ⊆ Lα, then M(σ � τ) = α as well (as the conditions (a)–(d) above are also satisfied for σ � τ) and thus M is conservative.

Consider now any text T for a language L ∈ L and let α be its minimal index. If n is sufficiently large, then it follows from Angluin's tell-tale condition that (i) cnt(T[n]) is a tell-tale set for L and (ii) for all β <ll α, either cnt(T[n]) ⊈ Lβ or Lα ⊆ Lβ. So all large enough n satisfy M(T[n]) = α. It follows that M Ex-learns L. □
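To illustrate the shape of this learner, here is a small Python sketch (our own, not from the paper) that checks conditions (a)-(d) when the class is given explicitly as a finite family of finite sets indexed by strings; for a genuine automatic class the same conditions would instead be decided by automata via Fact 4.

    def learner_M(family, content):
        # family: dict from index strings to finite sets (a stand-in for an automatic indexing)
        # content: the set cnt(sigma) of words seen so far; returns an index or None for '?'
        ll = lambda w: (len(w), w)
        words = sorted(set().union(*family.values()), key=ll)
        for alpha in sorted(family, key=ll):
            L_a = family[alpha]
            if not content <= L_a:                       # (a) conjecture contains the data
                continue
            for w in words:
                D = {x for x in L_a if ll(x) <= ll(w)}   # candidate tell-tale {x in L_alpha: x <=ll w}
                if not D <= content:                     # (b) tell-tale already observed
                    continue
                ok_c = all(not (family[b] < L_a) for b in family if D <= family[b])        # (c)
                ok_d = all(L_a <= family[b] for b in family
                           if ll(b) <= ll(alpha) and content <= family[b])                 # (d)
                if ok_c and ok_d:
                    return alpha
        return None

    # example: two nested languages; after seeing only '0' the learner conjectures the smaller one
    print(learner_M({"a": {"0"}, "b": {"0", "1"}}, {"0"}))   # 'a'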

As the tell-tale cut-off word version of Angluin's tell-tale condition is first-order definable, we have the following corollary.

Corollary 10. It is decidable whether an automatic family L = {Lα: α ∈ I} is Ex-learnable, where the inputs given to the decision procedure are descriptions of Σ and Γ (where Lα ⊆ Σ∗ and I ⊆ Γ∗) and finite automata recognising the regular languages I and {(α, x): x ∈ Lα, α ∈ I}.

Remark 11. One can obtain similar characterisations for other fundamental notions of learning.

(a) Recall that a class is finitely learnable [17] iff there is an Ex-learner which on every text T of a language in the class outputs exactly one index (plus perhaps the symbol ?) and this index is correct. For automatic classes L, finite learnability can be characterised as follows.

L is finitely learnable iff for every L ∈ L there is a finite set DL such that DL ⊆ L and DL ⊈ L′ for all L′ ∈ L − {L}.

The implication (⇒) follows directly from the work of Mukouchi [29]. For (⇐), suppose the right-hand side holds. Let {Lα: α ∈ I} be an automatic indexing of L. Now the following learner M finitely learns L. On input σ, M(σ) conjectures the length-lexicographically least α ∈ I such that

• cnt(σ) ⊆ Lα and
• for all β ∈ I, cnt(σ) ⊆ Lβ implies Lα = Lβ.

If such an α does not exist then M(σ) = ?. It is easy to verify that M finitely learns L.

(b) A class is strong monotonically learnable [22] iff there exists an Ex-learner M for the class such that for any two subsequent hypotheses α, β of M on a text, with α ≠ ? and β ≠ ?, it holds that Lα ⊆ Lβ. Given an automatic class L, one can again characterise whether L is strong monotonically learnable:

L is strong monotonically learnable iff for all L ∈ L, there exists a finite set DL such that DL ⊆ L and for all L′ ∈ L, if DL ⊆ L′ then L ⊆ L′.

Lange, Zeugmann and Kapur [28] showed the direction (⇒). For the direction (⇐), assume that the right-hand side holds. Let {Lα: α ∈ I} be an automatic indexing of L. Now the following learner M strong monotonically learns L. On input σ, M(σ) conjectures the length-lexicographically least α such that cnt(σ) ⊆ Lα and, for all β ∈ I, if cnt(σ) ⊆ Lβ then cnt(σ) ⊆ Lα ⊆ Lβ. In the case that there is no such α, then M(σ) = ?. It is easy to verify that M strong monotonically learns L.

3. Automatic learning of automatic classes

It was shown above that all automatic classes that satisfy Angluin's tell-tale condition can be learnt using a recursive learner. However, there are practical limitations to recursive learners. Learners that are able to memorise all past data are not practical. Rather, most learners in the setting of artificial intelligence are iterative, in the sense that these learners conjecture incrementally as they are fed the input, one word at a time [36,37]. An iterative learner bases its new conjecture only on its previous conjecture and the new datum. In other words, such a learner does not remember its past data, except as coded in the hypothesis.

In the realm of automatic structures, it is natural to consider automatic learners, where the learning function is in some way automatic. In the case of general recursive learners, there does not seem to be any natural correspondence which would lead to an interesting model. However, for iterative learners, there is a natural corresponding definition for automatic learners where the update function is automatic. Below we formally define automatic iterative learning and its variant, iterative learning with long-term memory.

Definition 12. (See Wexler and Culicover [36], Wiehagen [37].) Let the alphabets Σ, Γ and Δ be given. Let L be a class (defined over the alphabet Σ) and {Hα: α ∈ J} be a hypothesis space with J ⊆ Γ∗. An iterative learner is any function

P : (J ∪ {?}) × (Σ∗ ∪ {�}) → J ∪ {?}.

An iterative learner with long-term memory is any function

Q : ((J ∪ {?}) × Δ∗) × (Σ∗ ∪ {�}) → (J ∪ {?}) × Δ∗,

where the strings in Δ∗ represent the memory of the learner.

Given an iterative learner P, we now write P(w0w1 . . . wn) as a shorthand for the expression P(. . . P(P(?, w0), w1), . . . , wn). Similarly, for an iterative learner Q with long-term memory, we write Q(w0w1 . . . wn) as a shorthand for the expression Q(. . . Q(Q((?, ε), w0), w1), . . . , wn). Here, for Q(σ) = (α, μ), we consider α as the conjecture and μ implicitly as its memory and not as its output. With these modifications, P and Q are seen as learners and the definitions of all the learning criteria carry over. Note that convergence of a learner Q is defined only with respect to the hypothesis and not the memory. For example, Q Ex-learns L on a text T iff the sequence of hypotheses converges syntactically to a correct one while there are no convergence constraints on the memory. Similarly one defines the other learning criteria only with respect to the sequence of hypotheses. Parts (b)–(d) of the following definition are based on [16,25].
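The shorthand above simply folds the update function over the input sequence; a minimal Python sketch (illustrative names, not from the paper):

    def run_learner(Q, sequence, no_conjecture="?", empty_memory=""):
        # start from (?, empty memory) and feed one datum per round
        hypothesis, memory = no_conjecture, empty_memory
        for datum in sequence:
            hypothesis, memory = Q((hypothesis, memory), datum)
        return hypothesis, memory

    # toy update function: memorise and conjecture the length-lexicographically largest word seen
    def Q_max(state, datum):
        _, memory = state
        best = max([memory, datum], key=lambda w: (len(w), w))
        return best, best

    print(run_learner(Q_max, ["0", "10", "1"]))   # ('10', '10')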

Definition 13. Suppose L is defined over the alphabet Σ, and {Hα: α ∈ J}, J ⊆ Γ∗, is a hypothesis space. Suppose P is an iterative learner and Q is an iterative learner with long-term memory over some alphabet Δ.

(a) We say that P is automatic iff the relation

{(α, w, β): α, β ∈ J ∪ {?}, w ∈ Σ∗ ∪ {�} and P(α, w) = β}

is automatic. We say that Q is automatic iff the relation

{(α, μ, w, β, ν): α, β ∈ J ∪ {?}, μ, ν ∈ Δ∗, w ∈ Σ∗ ∪ {�} and Q((α, μ), w) = (β, ν)}

is automatic.

(b) We say that the long-term memory of Q is bounded by the longest datum seen so far iff there exists a constant c ∈ N such that, for every σ ∈ (Σ∗ ∪ {�})∗, if Q(σ) = (α, μ), then max{|α|, |μ|} ≤ max{|x|: x ∈ cnt(σ)} + c.

(c) We say that the long-term memory of Q is bounded by the hypothesis size iff there exists a constant c ∈ N such that, for every σ ∈ (Σ∗ ∪ {�})∗, if Q(σ) = (α, μ), then |μ| ≤ |α| + c.

(d) We say that the long-term memory of Q is bounded by a constant iff there exists a constant c ∈ N such that, for every σ ∈ (Σ∗ ∪ {�})∗, if Q(σ) = (α, μ), then |μ| ≤ c.3

Automatic iterative learners with long-term memory are called automatic learners from here on.

Definition 14. For the following, the hypothesis space is allowed to be any class comprising automatic family. Let LC be one of Ex, FEx, BC and Part. We let

(a) AutoLC be the set of all classes of languages that are LC-learned by some automatic learner with arbitrary long-term memory,

(b) AutoWordLC be the set of all classes of languages that are LC-learned by some automatic learner with long-term memory that is bounded by the longest datum seen so far,

(c) AutoIndexLC be the set of all classes of languages that are LC-learned by some automatic learner with long-term memory that is bounded by the hypothesis size,

(d) AutoConstLC be the set of all classes of languages that are LC-learned by some automatic learner with long-term memory that is bounded by a constant and

(e) AutoItLC be the set of all classes of languages that are LC-learned by some automatic iterative learner.

3 Note that in subsequent work [12], a more restrictive version of constant memory was considered. Therein, the learner Q only memorises μ and forms each hypothesis as a function of μ and the current datum. To address the issue of a learner wishing to repeat its previous hypothesis (which is not stored), a slight modification of the learning definition is done: the learner is said to be successful iff eventually it conjectures a correct hypothesis α, and from then onwards always outputs either ? or α. This restrictive learnability notion is not implied by iterative learnability as the class of all finite subsets of {0}∗ is iteratively learnable but not with constant memory in the just described setting.


We first show that automatic learners are not as powerful as general learners, even for learning automatic classes. The following proposition is useful:

Proposition 15. Suppose Q is an automatic iterative learner with long-term memory. Then, for some constant c, for all σ ∈ (Σ∗ ∪ {�})∗, if Q(σ) = (α, μ), then max({|μ|, |α|}) ≤ c ∗ |σ| + max{|w|: w ∈ cnt(σ)}.

Proof. The proposition follows using the fact that, for some constant c, if Q((α, μ′), x) = (α′, μ′′), then max{|μ′′|, |α′|} ≤ max{|μ′|, |α|, |x|} + c. □

We will implicitly use the above proposition in several of our proofs.

Theorem 16. There exists an automatic L that is Ex learnable by some recursive iterative learner, but which is not AutoEx-learnable.

Proof. Any class of finite sets is easily seen to be learnable by a recursive iterative learner. However, the class L given by the indexing Lα = {x: |x| = |α|, x ≠ α}, α ∈ {0,1}∗, is an automatic class but not in AutoEx. To see this, suppose Q AutoEx-learns L. Then, for large enough m, there exist σ, σ′ such that (i) each of σ, σ′ is of length m and contains m distinct strings from {0,1}^m, (ii) cnt(σ) ≠ cnt(σ′) and (iii) Q(σ) = Q(σ′). Note that there exist such σ, σ′ for large enough m as there are (2^m choose m) possibilities for the sequences of length m (with distinct content) containing exactly m elements from {0,1}^m, but the size of the hypothesis and memory of Q after seeing such sequences can be of length at most cm, for some constant c (see Proposition 15). Let y, y′ respectively be in cnt(σ) − cnt(σ′) and cnt(σ′) − cnt(σ). Let T be a text for {z: |z| = |y|, z ≠ y, z ≠ y′}. Then, Q on σT and σ′T converges to the same index or diverges on both. Thus, Q does not AutoEx-learn L. □
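The counting step can be made explicit; the following LaTeX fragment is our own elaboration, writing d for the size of the alphabet used for hypotheses and memory and c for the constant of Proposition 15.

    % possible contents of a sequence of m distinct words from {0,1}^m versus
    % an upper bound on the number of (hypothesis, memory) pairs of length at most cm + m:
    \[
      \binom{2^m}{m} \;\ge\; \left(\frac{2^m}{m}\right)^{m} \;=\; 2^{\,m^2 - m\log_2 m}
      \;>\; d^{\,2(cm+m+1)}
      \qquad\text{for all sufficiently large } m,
    \]
    % so two sequences with different content are mapped to the same hypothesis and memory.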

We now consider the relationship between various long-term memory limitations for the main criteria of learning: Ex, BC and FEx. Interestingly, if the memory is not explicitly constrained, then every automatic class which is BC-learnable can be Ex-learnt. For BC-learning, long-term memory is not useful (for automatic learners), as such memory can be coded into the hypothesis itself, as long as one is allowed padding of the hypothesis.

Theorem 17. The following equivalences and containments hold.

(a) AutoBC = AutoWordBC = AutoIndexBC = AutoConstBC = AutoItBC.
(b) AutoEx = AutoFEx = AutoBC.
(c) AutoIndexFEx = AutoConstFEx = AutoItFEx.
(d) AutoWordEx = AutoWordFEx.
(e) AutoIndexEx = AutoIndexFEx.
(f) AutoConstEx = AutoItEx.

Proof. For the simulations below, we assume without loss of generality that the simulated learner does not output ?.

(a) It follows from the definitions that AutoItBC ⊆ AutoConstBC ⊆ AutoWordBC ⊆ AutoBC and AutoItBC ⊆ AutoIndexBC ⊆ AutoBC. Thus it suffices to show that AutoBC ⊆ AutoItBC. Suppose Q AutoBC-learns L, where the hypothesis space is {Hα: α ∈ I} and the memory is over the alphabet Δ. Let H′α,μ = Hα, for α ∈ I, μ ∈ Δ∗. If Q((α, μ), x) = (α′, μ′), then let P((α, μ), x) = (α′, μ′). It can easily be verified that P AutoItBC-learns L using the hypothesis space {H′α,μ: α ∈ I, μ ∈ Δ∗}.

(b) It suffices to show that AutoBC ⊆ AutoEx. Suppose Q AutoBC-learns L, where the hypothesis space is {Hα: α ∈ I}, I ⊆ Γ∗, and the memory is over the alphabet Δ. Then consider the following Q′. Q′ uses the same hypothesis space Hα, but the memory is an element of Γ∗ × Δ∗. Suppose Q((β, μ), x) = (β′, μ′). Then Q′((α, (β, μ)), x) = (α′, (β′, μ′)), where α′ is the length-lexicographically least member of I such that Hα′ = Hβ′. It is easy to verify that the above Q′ AutoEx-learns L.

(c) It suffices to show AutoIndexFEx ⊆ AutoItFEx. Suppose that Q AutoIndexFEx-learns L. Then the construction of part (a) witnesses that P AutoItFEx-learns L, as the number of distinct (α, μ) which are output by Q on a given text for a language learnt by Q will be finite.

(d) The direction AutoWordEx ⊆ AutoWordFEx follows from the definition. For the converse direction, one can use the same proof as under (b); but one has to note explicitly that the sizes of β and μ are always bounded by a constant plus the size of the longest datum seen so far; as α is the length-lexicographically first index with Hα = Hβ, the (hypothesis, memory) pair of the new learner Q′ given as (α, (β, μ)) satisfies the same length-bound.

(e) It suffices to show that AutoIndexFEx ⊆ AutoIndexEx. This can be proved similarly to part (b), except that instead of simply choosing the length-lexicographically least equivalent index, one additionally pads the index so that its length is at least the length of the largest hypothesis output by Q so far. (This is to make sure that the memory length is bounded by the size of the hypothesis plus a constant.)

(f) It suffices to show that AutoConstEx ⊆ AutoItEx. Suppose Q AutoConstEx-learns L using the hypothesis space {Hα: α ∈ I} and constant memory over the alphabet Δ. Without loss of generality assume that the memory size is always 1. Define H′α,w,S = Hα, where w ∈ Δ, S ⊆ Δ × Δ. Define Q′, using the hypothesis space given by {H′α,w,S: α ∈ I, w ∈ Δ, S ⊆ Δ × Δ}, as follows. Suppose Q((α, w), x) = (β, y). If α ≠ β, then Q′((α, w, S), x) = (β, y, ∅); else if α = β and (y, w) is in the reflexive and transitive closure of S viewed as a relation, then Q′((α, w, S), x) = (α, w, S ∪ {(w, y)}); else Q′((α, w, S), x) = (α, y, S ∪ {(w, y)}).

Note that, for any σ, if Q′(σ) = (α, w, S), then for all (y, y′) ∈ S, there exists an x ∈ cnt(σ) ∪ {�} such that Q((α, y), x) = (α, y′). Thus, if (y, w) is in the reflexive and transitive closure of S, then there exists a sequence τ, with cnt(τ) ⊆ cnt(σ), such that Q((α, y), τ) = (α, w). In other words, for every σ, there is a σ′, which is obtained by replacing each symbol x in the sequence σ by a sequence x � τx such that, if Q′(x0 � x1 � · · · � xn) = (α, w, S), then Q(x0 � τx0 � x1 � τx1 � · · · � xn � τxn) = (α, w). Now fix a text T for L ∈ L. Suppose Q′(T[n]) = (αn, wn, Sn). Then, there exists an n0 and an index α with Hα = L such that, for all n ≥ n0, αn = α. This holds because, by the previous analysis, there exists a suitably modified text for L on which Q converges to an index α for L. Further note that, if Q′((α, w, S), x) = (α, w′, S′), then S ⊆ S′ and (w, w′) is in the reflexive and transitive closure of S′. It follows that limn→∞ Sn converges and limn→∞ wn converges, as all but finitely many wn belong to the same equivalence class (with respect to the relation defined by S = limn→∞ Sn). It follows that Q′ Ex-learns L. □

Note that the above theorem (along with its proof) also holds if we require class preserving learning in all the cases, that is, if the hypothesis space used by the learners is class preserving.

The next theorem shows that, for Ex and FEx learning, there are classes which can be learnt by automatic learners having long-term memory bounded by longest word size seen so far while they cannot be learnt by automatic learners having long-term memory bounded by hypothesis size. Note that AutoIndexEx = AutoIndexFEx, by Theorem 17.

The following theorem holds even when one considers class preserving hypothesis spaces. The diagonalisation in part (c) can be done by using the given indexing as the hypothesis space on the positive side, and any class comprising hypothesis space on the negative side.

Theorem 18.

(a) AutoItEx ⊆ AutoWordEx ⊆ AutoEx.
(b) AutoItEx ⊆ AutoIndexEx ⊆ AutoEx.
(c) AutoWordEx ⊈ AutoIndexEx.

Proof. The statements (a) and (b) follow from the definitions.

For statement (c), consider the class L = {Lα: α ∈ {0,1}∗} with Lε = 0+ and Lα = {0^(i+1): α(i) = 1} ∪ {ε} for all α ∈ {0,1}+.

To AutoWordEx-learn L, one uses memory over the alphabet {0,1} and memorises all strings in Lε seen so far. The memory of the learner (on any input σ) is a word z = z(0)z(1) . . . z(n) such that z(i) = 1 iff 0^(i+1) ∈ cnt(σ), where n = max({i: 0^(i+1) ∈ cnt(σ)} ∪ {0}). Now the learner outputs the index ε (with memory z as computed above) as long as it has not seen ε. Once it has seen ε, it outputs z as its conjecture and has z also as its memory. It is easy to verify that the above learner witnesses that L ∈ AutoWordEx.
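A rough Python sketch (our own rendering, not the paper's construction) of this memory update; here a word 0^(i+1) is a Python string of 0s, the empty Python string stands for ε as a datum, and the token "eps" stands for the index ε.

    def update(state, datum):
        # state = (hypothesis, memory); the memory z is a bit string with z[i] == '1' iff 0^(i+1) was seen
        hypothesis, memory = state
        seen_eps = (hypothesis != "eps")
        if datum == "":                                  # the empty word has appeared
            seen_eps = True
        elif set(datum) == {"0"}:                        # datum is 0^(i+1)
            i = len(datum) - 1
            memory = memory.ljust(i + 1, "0")
            memory = memory[:i] + "1" + memory[i + 1:]
        hypothesis = memory if seen_eps else "eps"
        return hypothesis, memory

    state = ("eps", "")
    for w in ["0", "000", "", "0"]:                      # a prefix of a text for L_101 = {0, 000, eps}
        state = update(state, w)
    print(state)                                          # ('101', '101')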

On the other hand, suppose by way of contradiction that Q AutoIndexEx-learns L. Then, let σ be such that (i) cnt(σ) ⊆ Lε and (ii) for all σ′ ⊇ σ such that cnt(σ′) ⊆ Lε, if Q(σ′) = (α, μ) and Q(σ) = (α′, μ′), then α = α′. Such a σ is called the locking sequence for Q on Lε. Note that there exists such a sequence σ, as Q Ex-learns L. Now there exist τ, τ′ with cnt(τ) ∪ cnt(τ′) ⊆ Lε such that cnt(στ) ≠ cnt(στ′) and Q(στ) = Q(στ′). The existence of such τ and τ′ follows from the fact that the memory of Q(στ) has only finitely many possibilities, even though cnt(στ) takes infinitely many possibilities.

Let T1 = στ � ε∞ and T2 = στ′ � ε∞. It follows that Q would fail to AutoIndexEx-learn at least one of cnt(T1) and cnt(T2), respectively from the texts T1 and T2. □

Note that the class L used in Theorem 18(c) is also not iteratively learnable by a recursive learner. Essentially the same proof as used above shows this. The following lists some of the open problems for automatic learners.

Open problem 19. The following problems are currently open:

(a) Is AutoEx = AutoWordEx?
(b) Is AutoIndexEx ⊆ AutoWordEx?
(c) Is AutoIndexEx ⊆ AutoItEx?

If the alphabet is unary, then every AutoEx-learner can be replaced by an AutoWordEx-learner, which answers (a) and (b) above in the affirmative for this special case. Also, note that the separation in Theorem 18(c) is witnessed by a family of languages defined over a unary alphabet.

Theorem 20. Suppose that Σ = {0} and L ⊆ powerset(Σ∗) is an automatic class. Then L is in AutoWordEx as witnessed by a conservative, consistent and set-driven learner iff L satisfies Angluin's tell-tale condition.


Proof. (⇒) This follows from Fact 8.

(⇐) This proof is similar to the proof of Theorem 9. Suppose L is {Lα: α ∈ I}, where I is the set of indices. The learner codes into memory, using the alphabet {0,1}, all the strings seen so far. The memory of the learner after having seen input σ is a word z = z(0)z(1) . . . z(n) such that z(i) = 1 iff 0^i ∈ cnt(σ), where n = max({|w| + 1: w ∈ cnt(σ)} ∪ {0}). Then, on any input σ, the learner searches for an α such that, for some w ∈ Σ∗,

(a) cnt(σ) ⊆ Lα,
(b) {x: x ≤ll w, x ∈ Lα} ⊆ cnt(σ),
(c) for all β ∈ I, [{x: x ≤ll w, x ∈ Lα} ⊆ Lβ ⇒ ¬[Lβ ⊂ Lα]],
(d) for all β ∈ I, [β ≤ll α ∧ cnt(σ) ⊆ Lβ ⇒ Lα ⊆ Lβ].

The learner then outputs the length-lexicographically least such α, if any; otherwise, the learner outputs ?. Note that the above learner is automatic, as cnt(σ) can be obtained using the memory and the new input element. Furthermore, the size of α as above is bounded by the size of the largest element in σ plus a constant: the reason is that the memory is not longer than the longest word seen so far and that the hypothesis is computed by an automatic function from the memory and the current datum. Now, similarly to the proof of Theorem 9, it can be shown that the above learner AutoWordEx-learns L. The theorem follows. □

Hence, for language classes over a unary alphabet, AutoWordEx and AutoEx coincide and properly contain AutoIndexEx.

Remark 21. If one were to consider not an automatic class, but just a subclass L of an automatic class K, then one could solve some of the open problems mentioned above.

For example, there is a class L ⊆ powerset({0}∗) which is a subclass of an automatic class and which has an automatic learner but neither a conservative nor a set-driven learner. Furthermore, there is no learnable automatic class H with L ⊆ H. Also, no automatic learner of L can be an AutoWordEx-learner.

Here is a proof-sketch of this fact. Let k(0), k(1), . . . be a recursive one–one enumeration of K, the halting problem. The class consists of all sets Ln = {0^m: m ≥ n} for all n and all sets Ln,r = {0^m: n ≤ m ≤ n + r} for which there exists a number s > r with k(s) = n. Note that the set Ln,r is added to the class iff n ∈ K − {k(0), k(1), . . . , k(r)}.

It can be shown that L is automatically learnable using an automatic class comprising hypothesis space, given by Hn,0 = Ln and Hn,r+1 = Ln,r. It can also be shown that the class is neither conservatively nor set-drivenly learnable nor AutoWordEx-learnable.

Furthermore, assume by way of contradiction that a learnable automatic class H ⊇ L exists. Then no infinite set in H is the ascending union of finite sets in H, see [17]. Hence there exists, for every n, a number h(n) ≥ n such that {0^m: n ≤ m ≤ h(n)} ∉ H. As 0^n ↦ 0^(h(n)) is first-order definable from H, h is recursive and n ∈ K iff n ∈ {k(0), k(1), . . . , k(h(n) − n)}, a contradiction.

4. Consistent learning

Note that for general recursive learners, all learnable automatic classes have a consistent, conservative and set-driven recursive learner (see Theorem 9 above). Thus, on one hand, consistency, conservativeness and set-drivenness are not restrictive for learning automatic classes by recursive learners. On the other hand, in this section, we will show that consistency is a restriction when learning automatic classes by automatic learners. It will be interesting to explore similar questions for conservativeness and set-drivenness.

The following theorem gives an automatic class which can be Ex-learnt by an iterative automatic learner but which cannot be Ex-learnt by any consistent automatic learner.

Theorem 22. There exists an automatic L such that

(a) L is AutoItEx-learnable using a class preserving hypothesis space;
(b) L is not consistently AutoEx-learnable even using a class comprising hypothesis space.

Proof. Let Σ = {0,1,2}. Let L = {Ly: y ∈ {0,1}∗ ∪ {2}} where

• Lε = {0,1}∗,
• Ly = {2^|y|} ∪ {x ∈ {0,1}∗: y is not a prefix of x}, for all y ∈ {0,1}+,
• L2 = {0,1,2}∗.

We first show that L can be AutoItEx-learnt. We use the following hypothesis space:

• Hε,ε = Lε,
• H2,2 = {0,1,2}∗,
• for y, z ∈ {0,1}+ with |y| = |z| and y ≤ll z, Hy,z = {0,1,2}∗ and
• for y ∈ {0,1}+, Hy,2 = Ly.

Thus, the hypothesis space used is {Hα: α ∈ J}, where J = {(ε, ε), (2,2)} ∪ {(y,2), (y, z): y, z ∈ {0,1}+, y ≤ll z, |y| = |z|}. Below, let succ(w) denote succ{0,1}∗(w), the length-lexicographically least string w′ in {0,1}∗ such that w <ll w′. We now define the iterative learner P. If the learner ever sees the input 02 then it outputs (2,2) and never changes its mind thereafter. Besides the above case, the learner starts with the conjecture (ε, ε). If it ever sees 2^i, for some i > 0, in the input, then it continues with conjectures of the form (y, z), where |y| = |z| = i and initially y = z = 0^i. Intuitively, a conjecture of the form (y, z) (with |y| = |z| > 0) means that the learner has seen extensions (in {0,1}∗) for all y′ ≤ll z with y′ ≠ y and |y′| = i. If the learner later sees an extension of y, then it updates both y, z to succ(z). If the learner sees an extension of succ(z), then it will update z to succ(z). This continues until the learner has seen extensions of all strings of length i, except for the one currently denoted by y. At this point, the learner can conclude that the input language must be Ly (unless it sees 02 in the input). Formally,

• P(λ) = (ε, ε).
• P((ε, ε), 02) = P((y,2), 02) = P((y, z), 02) = P((2,2), w) = (2,2), for all w ∈ {0,1,2}∗, y, z ∈ {0,1}+, |y| = |z| and y ≤ll z.
• P((ε, ε), w) = (ε, ε), if w ∉ {2^i: i > 0} ∪ {02}.
• For i > 0, P((ε, ε), 2^i) = (0^i, 0^i).
• For y, z ∈ {0,1}+, |y| = |z|, P((y, z), w) = (succ(z), succ(z)), if w ∈ {0,1}∗ and w is an extension of y and succ(z) is not the length-lexicographically maximal string of length |z|.
• For y, z ∈ {0,1}+, |y| = |z|, P((y, z), w) = (succ(z), 2), if w ∈ {0,1}∗ and w is an extension of y and succ(z) is the length-lexicographically maximal string of length |z|.
• For y, z ∈ {0,1}+, |y| = |z|, P((y, z), w) = (y, succ(z)), if w ∈ {0,1}∗ and w is an extension of succ(z) and succ(z) is not the length-lexicographically maximal string of length |z|.
• For y, z ∈ {0,1}+, |y| = |z|, P((y, z), w) = (y, 2), if w ∈ {0,1}∗ and w is an extension of succ(z) and succ(z) is the length-lexicographically maximal string of length |z|.
• For y, z ∈ {0,1}+, |y| = |z|, P((y, z), w) = (y, z), if w ≠ 02 and (w ∉ {0,1}∗ or w is not an extension of either y or succ(z)).
• P((y,2), w) = (y,2), for w ≠ 02.

It is easy to verify that the above P AutoItEx-learns L.
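For concreteness, a rough Python sketch of the case distinction above (our own rendering, not the paper's automatic presentation): conjectures are pairs of strings, the index ε is written "eps", and succ is taken within the strings of the current length, which suffices because z never reaches the maximal string 1...1 in reachable states.

    def succ_same_length(z):
        # length-lexicographic successor of z among strings of the same length (z is not 1...1)
        return format(int(z, 2) + 1, "0{}b".format(len(z)))

    def P(state, w):
        y, z = state
        if w == "02" or state == ("2", "2"):
            return ("2", "2")                            # 02 identifies L_2 = {0,1,2}*
        if state == ("eps", "eps"):
            if w != "" and set(w) == {"2"}:              # some 2^i with i > 0 has appeared
                return ("0" * len(w), "0" * len(w))
            return ("eps", "eps")
        if z == "2":                                     # final conjecture (y, 2) is kept
            return (y, "2")
        maximal, s = "1" * len(z), succ_same_length(z)
        if set(w) <= {"0", "1"} and w.startswith(y):     # an extension of y was seen
            return (s, s) if s != maximal else (s, "2")
        if set(w) <= {"0", "1"} and w.startswith(s):     # an extension of succ(z) was seen
            return (y, s) if s != maximal else (y, "2")
        return (y, z)

    state = ("eps", "eps")
    for w in ["22", "00", "01", "11"]:                   # data from L_10
        state = P(state, w)
    print(state)                                          # ('10', '2'), an index for L_10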

To see that L is not consistently AutoEx-learnable, suppose by way of contradiction otherwise, as witnessed by Q using the hypothesis space {Hα: α ∈ J}.

Consider a locking sequence σ (conjecture-wise) for Q on Lε — that is, for some α such that Hα = Lε: σ ∈ seq(Lε), Q(σ) = (α, μ) and, for all τ ∈ seq(Lε), Q(σ � τ) = (α′, μ′) implies α = α′. Let τ′, τ′′ and m be such that

(i) m is greater than the length of any string in cnt(σ),
(ii) each of τ′ and τ′′ is of length m and contains m distinct strings from {0,1}^m,
(iii) cnt(τ′) ≠ cnt(τ′′) and
(iv) Q(στ′) = Q(στ′′).

Note that there exist such τ′, τ′′ for large enough m as there are (2^m choose m) possibilities for the sequences of length m from {0,1}^m with distinct content, but only c^(2m+s) possibilities for Q(στ′′′′), for some constant c, where τ′′′′ is a sequence of length m from {0,1}^m and s = |σ| (see Proposition 15). Suppose y, y′ ∈ {0,1}^m are such that y ∈ cnt(τ′) − cnt(τ′′) and y′ ∈ cnt(τ′′) − cnt(τ′). Let T be a text for Ly. Then, Q(στ′′T) must converge to an index for Ly. Let σ′′′ be a prefix of T such that Q(στ′′σ′′′) = (α, μ) where Hα = Ly. But then Q is not consistent on the text στ′σ′′′T′, where T′ is a text for L2. □

The following theorem gives an automatic class L which can be Ex-learnt by a consistent automatic learner or Ex-learnt by an iterative automatic learner but which cannot be Ex-learnt by a consistent iterative automatic learner. Thus, requiring both consistency and iterativeness is more restrictive than requiring only one of them.

Theorem 23. There exists an automatic L such that

(a) L is consistently AutoEx-learnable using L as the hypothesis space;
(b) L is AutoItEx-learnable using L as the hypothesis space;
(c) L is not consistently AutoItEx-learnable.


Proof. Let Σ = {0,1,2}. Let L = {Lα: α ∈ {ε} ∪ {0,1}∗1} where Lε = {0,1}∗ and Lα = {2} ∪ {x: x ≤lex α0^∞} for α ∈ {0,1}∗1.

It is easy to verify that L is AutoItEx-learnable, as one can initially output ε as the conjecture and, once 2 appears in the input, search for the lexicographically largest α ending in a 1 such that some extension of α is in the remaining text (such extensions will appear infinitely often, for each α which has an extension in the input language).

To see that L is consistently AutoEx-learnable, note that one can memorise the lexicographically largest possible α ending in a 1 which is a prefix of some input string. Thus, we could essentially use the above algorithm to consistently AutoEx-learn L.

To see that L cannot be consistently AutoItEx-learned, suppose by way of contradiction that P witnesses such learning. Let σ be a locking sequence for P on Lε. Then, let α be the lexicographically largest string ending in 1 which is a prefix of some string in σ; if there is no such string, then we take α to be 1. Then, P on the text σ � T, where T is a text for Lα, must converge to an index for Lα. Thus, P(στ) is an index for Lα, for some τ ∈ seq(Lα). But then P is not consistent on σ � α1 � τ, as α1 ∉ Lα. □

Remark 24. Note that the class from Theorem 23 is also not set-driven iteratively learnable: given an iterative learner P, let σ be a locking sequence for Lε and α = P(σ � 2). There is now a sequence τ of strings in Lε such that P(σ � 2 � τ) ≠ P(σ � 2). But from the locking sequence property of σ, it follows that P(σ � τ � 2) = α and P is not set-driven.

One can extend this result and also show that an AutoIndexEx-learner Q of this class cannot be set-driven. The long-term memory of such a learner after having seen σ is bounded by a constant plus the hypothesis size and there are only finitely many different values which the long-term memory can take after input of the form σ � τ, with τ being a sequence of data from Lε. But there are infinitely many languages in L which contain cnt(σ � 2). Hence there are two sequences τ, τ′ over Lε such that Q(σ � 2 � τ) and Q(σ � 2 � τ′) output different conjectures while the long-term memory after σ � τ and σ � τ′ is the same. It follows that the hypotheses issued by Q(σ � τ � 2) and Q(σ � τ′ � 2) are the same while those issued by Q(σ � 2 � τ) and Q(σ � 2 � τ′) are different; hence Q is not set-driven.

The following theorem shows the existence of an automatic class which can be Ex-learnt by a consistent automatic iterative learner using a class comprising hypothesis space, but cannot be Ex-learnt by a consistent automatic learner using a class preserving hypothesis space. Thus, in some cases having a larger hypothesis space makes the consistency problem easier to handle. A similar phenomenon for monotonic learning (for recursive learners) has been observed by Lange and Zeugmann [26].

Theorem 25. There exists an automatic class L such that

(a) L is AutoItEx-learnable using a class preserving hypothesis space;
(b) L is consistently AutoItEx-learnable using some class comprising hypothesis space for L;
(c) L is not consistently AutoEx-learnable using any class preserving hypothesis space for L.

Proof. Let Σ = {0,1,2}. Let L = {Lε} ∪ {Ly: y ∈ {0,1}+} where

• Lε = {0,1}∗;
• Ly = {2^{|y|}} ∪ {x ∈ {0,1}∗: y is not a prefix of x}, for all y ∈ {0,1}+.

One can verify that the learner P given in the proof of Theorem 22 AutoItEx-learns L using a class comprising hypothesis space. This learner is consistent on L.

To see AutoItEx learnability using a class preserving hypothesis space, one can use the learner P in the proof of Theorem 22, but for y, z ∈ {0,1}∗ with |y| = |z|, we define H2,2 and Hy,z to be Lε instead of {0,1,2}∗ (in particular, we do not need to use H2,2).

To show that no learner using a class preserving hypothesis space can consistently AutoEx-learn L, we proceed as follows. Suppose by way of contradiction that Q consistently AutoEx-learns L using a class preserving hypothesis space {Hα: α ∈ J}.

Consider the locking sequence σ (conjecture wise) for Q on Lε (that is, for some α such that Hα = Lε: σ ∈ seq(Lε), Q(σ) = (α, μ) and, for all τ ∈ seq(Lε), Q(σ ⋄ τ) = (α′, μ′) implies α = α′).

Let τ′, τ′′ and m be such that (i) m is greater than the length of any string in cnt(σ), (ii) each of τ′, τ′′ is of length m and contains m distinct strings from {0,1}^m, (iii) cnt(τ′) ≠ cnt(τ′′) and (iv) Q(στ′) = Q(στ′′). Note that there exist such τ′, τ′′ for large enough m, as there are (2^m choose m) possibilities for the sequences of length m from {0,1}^m with distinct content, but only c·2^{m+s} possibilities for Q(στ′′′′), for some constant c, where τ′′′′ is a sequence of length m from {0,1}^m and s = |σ| (see Proposition 15). Suppose y, y′ ∈ {0,1}^m are such that y ∈ cnt(τ′) − cnt(τ′′) and y′ ∈ cnt(τ′′) − cnt(τ′). Let τ be a sequence which contains all elements of length m, except for y and y′. Then, Q(στ′τ ⋄ 2^m) = Q(στ′′τ ⋄ 2^m), but Q cannot be consistent on both στ′τ ⋄ 2^m and στ′′τ ⋄ 2^m. □
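For intuition on the counting step in the proof above, the following small computation (an illustration only; the values of c and s are hypothetical placeholders) shows how quickly the number of possible contents outgrows the bound on the number of learner states.

    from math import comb

    m, c, s = 10, 4, 20                # c and s are made-up values for illustration
    contents = comb(2**m, m)           # possible contents of m distinct strings from {0,1}^m
    states = c * 2**(m + s)            # an upper bound of the form c*2^(m+s) on learner states
    print(contents, states, contents > states)   # roughly 3e23 versus 4e9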


5. Automatic learning from fat text

One of the reasons why iterative learning and its variations are restrictive is that the learners forget past data. So it is interesting to study the case when each datum appears infinitely often. Such a text is called a fat text. In the case of learning recursively enumerable sets, it has been shown that every explanatorily learnable class is also iteratively learnable from fat texts [31]. In the following, it is investigated to which extent this result transfers to automatic learners.

Definition 26. (See [31].) Let Σ be an alphabet. Let T ∈ (Σ∗ ∪ {#})ω. We say that T is fat iff for every x ∈ cnt(T) and n ∈ N, there exists a k ≥ n such that T(k) = x. For L ⊆ Σ∗, we let ftxt(L) = {T ∈ txt(L): T is fat}.

Definition 27. Let Σ be an alphabet. Let {Hα: α ∈ J} be a hypothesis space with some J being the set of indices. Let P be an iterative learner. We say that P Ex-learns L from fat texts iff for every L ∈ L and every T ∈ ftxt(L), there exists an n ∈ N and an α ∈ J with Hα = L such that, for every m ≥ n, P(T[m]) = α. The other learning criteria considered in this paper are similarly adapted to fat texts.
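As an illustration of Definition 26 (our sketch, not taken from the paper; the helper names are ours), a fat text can be produced from any enumeration of a language by replaying all elements seen so far in every round, so that each element recurs infinitely often.

    from itertools import count, islice

    def fat_text(enumerate_L):
        # enumerate_L() returns a (possibly infinite) generator of the elements of L.
        seen = []
        gen = enumerate_L()
        while True:
            nxt = next(gen, None)      # pull at most one fresh element per round
            if nxt is not None:
                seen.append(nxt)
            if not seen:
                yield '#'              # only pauses: a valid text for the empty language
            else:
                for w in seen:         # replay everything seen so far
                    yield w

    # Example: the beginning of a fat text for the language 0*1.
    text = fat_text(lambda: ('0' * i + '1' for i in count()))
    print(list(islice(text, 12)))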

Corollary 30 to the proof of the following theorem shows that fat texts allow one to iteratively automatically learn any class which is potentially learnable, that is, which satisfies Angluin's tell-tale condition.

Theorem 28. Let L = {Lα: α ∈ I} be an automatic class. Then L is AutoWordEx-learnable from fat texts using the given hypothesis space {Lα: α ∈ I} iff L satisfies Angluin's tell-tale condition.

Proof. (⇒) This follows from Fact 8, as for learning by recursive learners without memory constraints, Ex-learnability from fat texts is the same as Ex-learnability from normal texts.

(⇐) Let Σ be the alphabet used for L and I be the set of indices. We will assume below that α, β range over I. Without loss of generality we assume that if ∅ ∈ L, then ε ∈ I and Lε = ∅.

We will now construct the learner Q. We denote the conjecture/memory of the learner Q by (α, x, cons), where α is the conjecture and (x, cons) is the memory. Here x ∈ Σ∗ and cons is just a consistency bit.

Suppose T is the input fat text for a language L ∈ L. Then we will have the following four invariants, whenever α, α′ below are not ?:

(I) If Q(T[m]) = (α, x, cons), then Lα[x] ⊆ cnt(T[m]). Furthermore, for any m′ < m, if Q(T[m′]) = (α′, x′, cons′), then Lα[x] ⊆ Lα′[x′] ∪ {T(m′), T(m′ + 1), . . . , T(m − 1)}.
(II) If Q(T[m]) = (α, x, cons) and Q(T[m + m′]) = (α′, x′, cons′), then CF_{Lα[x]} ≤lex CF_{Lα′[x′]} ≤lex CF_L.
(III) If Q(T[m]) = (α, x, cons), then either Lα[x] = Lα and no β <ll α satisfies Lα[x] = Lβ, or Lα[x] ∉ L and no β <ll α satisfies Lα[x] = Lβ[x].
(IV) If cons = 0, then L ⊈ Lα.

If Lα is infinite, then let ttcow(α) denote the length-lexicographically least word w in Lα such that w is a tell-tale cut-off word for Lα and, for all β <ll α such that Lα ≠ Lβ, Lβ[w] ≠ Lα[w]. If Lα is finite, then let ttcow(α) denote max Lα. Note that ttcow(α) is automatic. We now define our learner Q.

• If ∅ ∈ L, then Q(λ) = (ε, ε, 1).
• If ∅ ∉ L, then Q(λ) = ?. In this case, Q continues to output ? until it receives an input y such that, for some α, y is the length-lexicographically least element of Lα. At that point it outputs (α, y, 1), for the length-lexicographically least α such that Lα = {y}, if there is such an α; otherwise it outputs (α, y, 1), for the length-lexicographically least α such that y is the length-lexicographically least element of Lα.
• Q((α, x, cons), #) = (α, x, cons).
• To define Q((α, x, cons), y), for y ≠ #, use the first case below which applies.
  – Case 1: If y ≤ll x and y ∈ Lα, then output (α, x, cons).
  – Case 2: If y ≤ll x and y ∉ Lα and there exists a β such that Lβ[y] = Lα[y] ∪ {y}, then
    if there exists a β such that Lβ = Lα[y] ∪ {y}, then output (β, y, 1) for the length-lexicographically least such β,
    else output (β, y, 1) for the length-lexicographically least β with Lβ[y] = Lα[y] ∪ {y}.
  – Case 3: If y >ll x and [y ∉ Lα or x <ll ttcow(α) or cons = 0] and there exists a β such that Lβ[y] = Lα[x] ∪ {y}, then
    if there exists a β such that Lβ = Lα[x] ∪ {y}, then output (β, y, 1) for the length-lexicographically least such β,
    else output (β, y, 1) for the length-lexicographically least β with Lβ[y] = Lα[x] ∪ {y}.
  – Case 4: Otherwise, let cons′ = (cons ∧ y ∈ Lα) and output (α, x, cons′).
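The case analysis can be summarised by the following Python sketch (our illustration; the family-dependent tests are assumed helper callbacks bundled in H, which in the automatic setting are first-order definable and hence effective): H.ll_le(u, v) and H.ll_lt(u, v) decide u ≤ll v and u <ll v, H.member(α, y) decides y ∈ Lα, H.segment(α, y) returns the finite set Lα[y], H.ttcow(α) returns ttcow(α), and H.least_index_equal(S) respectively H.least_index_with_segment(S, y) return the length-lexicographically least β with Lβ = S respectively Lβ[y] = S, or None if no such β exists.

    def update(state, y, H):
        # One step of the learner Q on datum y; state is (alpha, x, cons).
        alpha, x, cons = state
        if y == '#':                                        # pause symbol
            return state
        if H.ll_le(y, x):
            if H.member(alpha, y):                          # Case 1
                return (alpha, x, cons)
            S = H.segment(alpha, y) | {y}                   # Case 2: L_alpha[y] plus {y}
            beta = H.least_index_equal(S)
            if beta is None:
                beta = H.least_index_with_segment(S, y)
            if beta is not None:
                return (beta, y, 1)
        elif not H.member(alpha, y) or H.ll_lt(x, H.ttcow(alpha)) or cons == 0:
            S = H.segment(alpha, x) | {y}                   # Case 3: L_alpha[x] plus {y}
            beta = H.least_index_equal(S)
            if beta is None:
                beta = H.least_index_with_segment(S, y)
            if beta is not None:
                return (beta, y, 1)
        return (alpha, x, 1 if cons and H.member(alpha, y) else 0)   # Case 4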


Note that the size of the new hypothesis β in Case 2 and Case 3 above, if any, is bounded by the size of y plus a constant. Furthermore, it is easy to verify that the four invariants are satisfied. Also, clearly, if ∅ ∈ L, then Q learns ∅. Now, suppose T is a fat text for a language L = Lβ ∈ L, where β is length-lexicographically minimised and L ≠ ∅. Let (αn, xn, consn) denote Q(T[n]). Note that, by construction, except for an initial period where Q conjectures ∅ or ?, xn always belongs to L. Furthermore, ttcow(β) ∈ L.

Claim 29.

(a) For y ∈ L, if L[y] = Lαn[y] and y ≤ xn, then for all n′ ≥ n, L[y] = Lαn′[y] and y ≤ xn′.
(b) For y = min L, for all but finitely many n, L[y] = Lαn[y] and y ≤ xn.
(c) Suppose y < ttcow(β), and for all n ≥ n0, L[y] = Lαn[y] and y ≤ xn. Then, there exists an n3 ≥ n0 such that xn3 ≥ succL(y).
(d) For all y ∈ L with y ≤ ttcow(β), for all but finitely many n, L[y] = Lαn[y] and y ≤ xn.

Proof of Claim. (a) This follows from the invariants (I) and (II).

(b) Let n be least such that T(n − 1) = min L. By using invariant (I) and either by the first non-? hypothesis of Q or by the usage of Case 2 or 3 in the definition of Q when it receives T(n − 1), we have that L[min L] = Lαn[min L] and min L ≤ xn. Part (b) now follows from part (a).

(c) Suppose by way of contradiction that such an n3 does not exist. Then, for all n > n0, we have that xn = y and αn = αn0, as Case 2 would not apply and an application of Case 3 would make xn > y. Now, if y < ttcow(αn0), then for the least n1 > n0 such that T(n1 − 1) = succL(y), we would have that xn1 is made to be succL(y) by Case 3. On the other hand, if y ≥ ttcow(αn0), then L ⊈ Lαn0, by Angluin's tell-tale condition. Thus, for some n2 > n0, we have that cons = 0. It follows that, for the least n3 > n2 such that T(n3 − 1) = succL(y), we would have that xn3 = succL(y), by Case 3.

(d) We show the statement by induction on the length-lexicographic ordering of y ∈ L with y ≤ ttcow(β). By part (b), the statement holds for y = min L. Suppose the statement holds for some y < ttcow(β), y ∈ L. Then we show it for succL(y). Let n0 be such that, for all n ≥ n0, L[y] = Lαn[y] and y ≤ xn. By part (c), there exists an n3 such that xn3 ≥ succL(y). Now, if succL(y) ∈ Lαn3, then we are done by part (a), invariants (I), (II) and induction. Otherwise, for the least n4 > n3 such that T(n4 − 1) = succL(y), we will have that xn4 = succL(y) and succL(y) ∈ Lαn4, by Case 2 (all the intermediate steps between n3 and n4 will not reduce x to below succL(y), as succL(y) does not appear between T(n3 − 1) and T(n4 − 1), and Case 2 is the only case which can reduce xn). This proves the statement for succL(y) and completes the proof of the claim. □

It follows from part (d) of the claim that for some number n5, for all n ≥ n5, CF_{L[ttcow(β)]} ≤lex CF_{Lαn[xn]} ≤lex CF_L. If αn = β, for one such n, then the learner Q will not change its mind later, by the construction of Q. We show that such an n must exist. So suppose αn5 ≠ β. This means that L[xn5] ≠ Lαn5[xn5] (by invariant (III), the definition of ttcow and the fact that Lαn5 cannot be equal to L[xn5], by Angluin's tell-tale condition). Thus, using invariants (I) and (III), we have that L − Lαn5[xn5] contains a length-lexicographically least element x such that ttcow(β) < x ≤ xn5. It follows (using part (a)) that, for the least n′ > n5 such that T(n′ − 1) = x, Q(T[n′]) will be (β, xn′, cons), by Case 2 and the definition of ttcow.

It follows that Q AutoWordEx-learns L. □

Suppose that instead of using the given hypothesis space {Lα: α ∈ I} one uses the hypothesis space {Hα,x,cons: α ∈ I, x ∈ Σ∗, cons ∈ {0,1}}, where Hα,x,cons = Lα. Then the above learning algorithm Q becomes an iterative learner using this hypothesis space. It uses conjectures (α, x, cons) instead of conjecture α and memory (x, cons). Note that the update rules guarantee that the learner Q does not update its hypothesis if x ≥ ttcow(α) and Lα is the set to be learnt. Hence the modified learner using the new hypothesis space also converges syntactically on texts for sets to be learnt. This yields the following corollary.

Corollary 30. Every automatic class satisfying Angluin's tell-tale condition is AutoItEx-learnable from fat texts using a class preserving hypothesis space.

The next result shows that one cannot learn every given class iteratively from fat texts using a one–one class preserving hypothesis space. So "padding", that is, the usage of the hypothesis as an auxiliary memory, is necessary for iterative learning from fat texts in the above theorem. Furthermore, the following also shows constraints of iterative conservative automatic learners.

Theorem 31. Let Σ = {0,1} and for every n ∈ Z and m ∈ {0,1}, let g(m,n) = 4n + 2m if n ≥ 0 and g(m,n) = −3 − 4n + 2m if n < 0. Then the class L defined by the indexing

L_{0^{g(m,n)}} = {0^{g(i,j)}1^k: i ∈ {0,1} ∧ j ∈ Z ∧ k ∈ N ∧ (i = m ⇒ j ≤ n)}

is automatic. This class is class preservingly AutoItEx-learnable from normal texts, class comprisingly conservatively AutoItEx-learnable from normal texts, but neither conservatively iteratively learnable from fat texts using a one–one class preserving hypothesis space nor iteratively learnable from fat texts using a one–one class preserving hypothesis space.


Proof. Intuitively, g(0, ·) and g(1, ·) are 1–1 computable functions such that {g(0,n): n ∈ Z} and {g(1,n): n ∈ Z} partition the set of natural numbers. Furthermore, from 0^{g(i,j)} and 0^{g(i′,j′)} one can automatically determine whether i = 0 or i = 1, whether i′ = 0 or i′ = 1, whether j < j′ and whether j′ < j (this latter property is needed for P below to be automatic).
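A quick sanity check of this intuition (our illustration, not part of the proof):

    def g(m, n):
        return 4 * n + 2 * m if n >= 0 else -3 - 4 * n + 2 * m

    range0 = {g(0, n) for n in range(-50, 50)}
    range1 = {g(1, n) for n in range(-50, 50)}
    assert range0.isdisjoint(range1)                 # the two ranges are disjoint
    assert set(range(100)) <= range0 | range1        # together they cover 0..99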

For conservative AutoItEx-learning using a class comprising hypothesis space, the hypothesis space used is:

• H_{h(0,n0,n1)} = L_{0^{g(0,n0)}}, for n0, n1 ∈ Z;
• H_{h(1,n0,n1)} = L_{0^{g(1,n1)}}, for n0, n1 ∈ Z;
• H_{h(ε,n0,n1)} = ∅,

where, for n0, n1 ∈ {ε} ∪ Z, h(a, b, c) is the convolution of a, b′ and c′ with b′ = 0^{g(0,b)} if b ≠ ε, b′ = 1 if b = ε; c′ = 0^{g(0,c)} if c ≠ ε and c′ = 1 if c = ε. The learner P initially conjectures h(ε, ε, ε). We mention below the cases when P modifies its conjecture. In all other cases, the conjectures are not modified. Intuitively, conjectures of the form h(·, j, ·) (respectively h(·, ·, j)) imply that a string of the form 0^{g(0,j)}1^k (respectively, a string of the form 0^{g(1,j)}1^k) has been seen in the input.

• P(h(ε, ε, ε), 0^{g(0,j)}1^k) = h(ε, j, ε);
• P(h(ε, ε, ε), 0^{g(1,j)}1^k) = h(ε, ε, j);
• P(h(ε, j, ε), 0^{g(1,j′)}1^k) = h(0, j, j′);
• P(h(ε, ε, j), 0^{g(0,j′)}1^k) = h(0, j′, j);
• P(h(0, j, j′), 0^{g(0,r)}1^k) = h(1, r, j′), if r > j;
• P(h(1, j, j′), 0^{g(1,r)}1^k) = h(0, j, r), if r > j′.
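In code, these transitions amount to the following update (our sketch; parse(w) is an assumed helper extracting the pair (i, j) from a datum of the form 0^{g(i,j)}1^k, which is possible automatically; 'e' stands for ε):

    def P_update(state, w, parse):
        # state = (phase, n0, n1) models the conjecture h(phase, n0, n1).
        phase, n0, n1 = state
        i, j = parse(w)                               # datum w = 0^{g(i,j)} 1^k
        if (phase, n0, n1) == ('e', 'e', 'e'):
            return ('e', j, 'e') if i == 0 else ('e', 'e', j)
        if phase == 'e' and n1 == 'e' and i == 1:     # first type-1 datum after a type-0 one
            return (0, n0, j)
        if phase == 'e' and n0 == 'e' and i == 0:     # first type-0 datum after a type-1 one
            return (0, j, n1)
        if phase == 0 and i == 0 and j > n0:          # larger type-0 parameter observed
            return (1, j, n1)
        if phase == 1 and i == 1 and j > n1:          # larger type-1 parameter observed
            return (0, n0, j)
        return state                                  # in all other cases: no change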

One can verify that P conservatively AutoItEx-learns L.

For class preserving AutoItEx-learning, one just modifies the above hypothesis space to have H_{h(ε,n0,n1)} = L_{0^{g(0,0)}} and the rest of the proof remains the same. Note that the learner is no longer conservative.

To show that L is neither conservatively learnable from fat texts using a one–one class preserving hypothesis space nor iteratively learnable from fat texts using a one–one class preserving hypothesis space, note the following: an iterative learner that uses a one–one class preserving hypothesis space is conservative. So it suffices to show that no conservative learner that uses a one–one class preserving hypothesis space iteratively learns L from fat texts.

Let F be any conservative iterative learner that uses a one–one class preserving hypothesis space {Hα: α ∈ J}. Let x be such that F(?, x) ≠ ?. Without loss of generality assume that F(?, x) conjectures a language of the form L_{0^{g(0,n)}}, for some n. (The case of the conjecture being of the form L_{0^{g(1,·)}} is symmetric.) If x ∈ L_{0^{g(0,n−1)}}, then F has overgeneralised and thus does not conservatively learn L. Otherwise, if there is no σ (where cnt(σ) ⊆ 0∗1∗) such that F(x ⋄ σ) conjectures a language of the form L_{0^{g(1,·)}}, then we have that F does not learn L. Otherwise, let y1, y2, . . . , yk ∈ 0∗1∗ be such that H_{F(x⋄y1⋄y2⋄···⋄yk)} = L_{0^{g(1,n′)}}, for some n′, where H_{F(x⋄y1⋄y2⋄···⋄yr)} = L_{0^{g(0,sr)}}, for r < k and some sr, where the sr's are distinct and different from n. Then, we have that y1, y2, . . . , yk must be of the form 0^{g(0,·)}1∗, since the learner is conservative and uses a one–one class preserving hypothesis space. Also, note that x is of the form 0^{g(0,·)}1∗ as x ∉ L_{0^{g(0,n−1)}}. Thus x, y1, . . . , yk belong to L_{0^{g(1,n′−1)}} and thus F overgeneralises and cannot conservatively learn L. □

Theorem 32. Suppose an automatic iterative learner Ex-identifies L using a class preserving hypothesis space. Then, there is an automatic, conservative and iterative learner M′ which identifies L from fat texts.

Proof. Suppose M is an automatic iterative learner which Ex-identifies {Lα: α ∈ I} from fat texts using the hypothesis space {Lα: α ∈ I}. Without loss of generality assume that the initial conjecture of M is ?.

Let S = ⋂_{α∈I} Lα. For α ∈ I ∪ {?}, let mc(α) = 1 if α = ? or there exists an x ∈ Lα such that M(α, x) ≠ α; otherwise, let mc(α) = 0.

If S ∈ L, then let e0 = 0 and H_{e0} = S; otherwise, let e0 = ?.

For α ∈ I ∪ {?} and w ∈ Σ∗ − S, let Hα,w = Lα if mc(α) = 0; otherwise, let Hα,w = Lα′, where α′ is the length-lexicographically least index such that w ∉ Lα′.

Define M′ as follows, where M′ uses the hypothesis space {He: e ∈ J}, where J = {(α, w): α ∈ I ∪ {?}, w ∈ Σ∗ − S} ∪ {e0} − {?}.

Initially, M′(λ) = e0.
M′(e0, x) = e0, for x ∈ S ∪ {#}.
M′(e0, x) = (M(?, x), x), for x ∉ S ∪ {#}.
M′((α, w), x) = (M(α, x), w).
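As a sketch (ours), the wrapper can be phrased as follows, where M is given as an update function, '?' and 'e0' are sentinel values, and in_S(x) is an assumed helper deciding x ∈ S:

    def M_prime_update(state, x, M, in_S):
        # One step of M'; state is either 'e0' or a pair (alpha, w).
        if state == 'e0':
            if x == '#' or in_S(x):
                return 'e0'                   # keep the default conjecture for S
            return (M('?', x), x)             # memorise the first datum w outside S
        alpha, w = state
        return (M(alpha, x), w)               # afterwards simulate M, keeping w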

Now, suppose T is the input fat text for a language L ∈ L. If L = S, then clearly M′ identifies L. Otherwise, let n be least such that T(n) ∉ S. Let T′ be obtained from T by deleting T[n], that is, T′(m) = T(n + m). Let w = T′(0) = T(n). Then, it is easy to see that T′ is still a fat text for L. Furthermore, for all m ≥ 0, M′(T[n + m + 1]) = (M(T′[m + 1]), w). Also, note that M converges on T′ to an α such that mc(α) = 0 (otherwise, due to T′ being a fat text, either M does not identify L or


makes a further mind change on T′). Thus, M′ converges on T to (α, w) and Hα,w = Lα. Furthermore, M′ is conservative, as on previous conjecture (α′, w) and input x, if M′((α′, w), x) ≠ (α′, w), then either mc(α′) = 1 (and thus w ∉ Hα′,w, but w belongs to the input seen so far) or mc(α′) = 0 (and thus x ∉ Lα′ = Hα′,w). □

Remark 33. Suppose one uses the following modified definition of conservativeness: M is conservative if for any σ and x, if x ∈ H_{M(σ)}, then M(σ x) = M(σ). Then the class used in Theorem 31 cannot be learnt by any conservative and iterative learner from fat texts using a class preserving hypothesis space; the diagonalisation proof given for Theorem 31 works for this case also.

One might ask whether there are classes which can be learnt using some one–one class preserving hypothesis space but cannot be learnt using some other hypothesis space. The answer is "no". That is, if a class is AutoItEx-learnable using a one–one class preserving hypothesis space then it is also prescribed AutoItEx-learnable, that is, it can be learnt using any class comprising automatic indexing as hypothesis space. In the next result, the option "(from fat texts)" has to be taken either at both places or at no place in the theorem.

Proposition 34. If {Lα: α ∈ I}, {Hβ: β ∈ J} are automatic indexings, the mapping α ↦ Lα is one–one, every Lα is equal to some Hβ and {Lα: α ∈ I} is AutoItEx-learnable (from fat texts) using the hypothesis space {Lα: α ∈ I}, then {Lα: α ∈ I} is also AutoItEx-learnable (from fat texts) using the hypothesis space {Hβ: β ∈ J}.

Proof. The proof of this proposition can be given by a straightforward translation of the learner: Let f(α) = min{β: Hβ = Lα} and g be the (partial) inverse with g(β) = min{α: Lα = Hβ}. Furthermore, let f(?) = ? and g(?) = ?. The functions f, g are both first-order definable and hence automatic. Furthermore, g is defined on the range of f. Now one can replace the learner Q using the hypothesis space {Lα: α ∈ I} by a new learner Q′ mapping a hypothesis β and an input x to f(Q(g(β), x)); note that, under the assumption that the initial value of the learner is ?, one can easily see by induction that all hypotheses output by Q′ are in the range of f and hence in the domain of g. Thus Q′ is well defined on valid inputs for the class being learnt. As automatic functions are closed under composition, the learner Q′ is automatic. Furthermore, Q′ converges to f(α) whenever Q converges to α. Hence the learner Q′ is correct and uses the hypothesis space {Hβ: β ∈ J}. Note that the type of text used (normal text or fat text) is the same for both learners. □
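As a sketch (ours) of this translation, with f and g supplied as the functions defined above:

    def translate(Q, f, g):
        # Turn a learner Q for {L_alpha} into a learner Q' using {H_beta}:
        # pre-compose the hypothesis with g and post-compose the output with f.
        def Q_prime(beta, x):
            return f(Q(g(beta), x))
        return Q_prime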

The next theorem shows that every automatic class (even those that may not satisfy Angluin's tell-tale condition) is partially learnable from fat texts by an automatic iterative learner. This corresponds to the result in [31] that the whole class of all recursively enumerable languages is partially learnable by some recursive learner.

Theorem 35. Every automatic L is AutoWordPart-learnable from fat texts.

Proof. This is a modification of the proof of Theorem 28. In this case we do not need to keep track of cons and the memory x may grow unbounded.

Let Σ be the alphabet used for L and I be the set of indices. We will assume below that α, β range over I. Without loss of generality we assume that if ∅ ∈ L, then ε ∈ I and Lε = ∅.

We now construct the learner Q. We will denote the conjecture/memory of the learner Q by (α, x), where α is the conjecture and x is the memory. Here x ∈ Σ∗.

Suppose T is the input fat text for a language L ∈ L. Then we will have the following invariants, whenever α and α′ below are not ?:

(I) If Q(T[m]) = (α, x), then Lα[x] ⊆ cnt(T[m]). Furthermore, for any m′ < m, if Q(T[m′]) = (α′, x′), then Lα[x] ⊆ Lα′[x′] ∪ {T(m′), T(m′ + 1), . . . , T(m − 1)}.
(II) If Q(T[m]) = (α, x) and Q(T[m + m′]) = (α′, x′), then CF_{Lα[x]} ≤lex CF_{Lα′[x′]} ≤lex CF_L.

We now describe the learner Q.

• If ∅ ∈ L then Q(λ) = (ε, ε).
• If ∅ ∉ L then Q(λ) = ?. In this case, Q continues to output ? until it receives an input y such that, for some α, y = min Lα. At that point it outputs (α, y), for the length-lexicographically least such α.
• Q((α, x), #) = (α, x).
• To define Q((α, x), y), for y ≠ #, use the first case below which applies.
  – Case 1: If y >ll x and there exists a β such that Lα[x] ∪ {y} = Lβ[y], then output (β, y), for the length-lexicographically least such β.
  – Case 2: If y ≤ll x and y ∉ Lα, and there exists a β such that Lβ[y] = Lα[y] ∪ {y}, then output (β, y), for the length-lexicographically least such β.
  – Case 3: If there exists a β such that Lβ = Lα[x], then output (β, x), for the length-lexicographically least such β.
  – Case 4: Otherwise output (α, x).
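As with the learner for Theorem 28, the update step can be summarised in a short sketch (ours; H bundles assumed helper callbacks: H.ll_le(u, v) for u ≤ll v, H.member(α, y) for y ∈ Lα, H.segment(α, y) for the finite set Lα[y], and H.least_index_equal(S) respectively H.least_index_with_segment(S, y) for the length-lexicographically least index whose language respectively initial segment up to y equals the finite set S, or None if none exists):

    def update_part(state, y, H):
        # One step of the partial learner; state is (alpha, x).
        alpha, x = state
        if y == '#':
            return state
        if not H.ll_le(y, x):                            # Case 1: y >_ll x
            beta = H.least_index_with_segment(H.segment(alpha, x) | {y}, y)
            if beta is not None:
                return (beta, y)
        elif not H.member(alpha, y):                     # Case 2: y <=_ll x and y not in L_alpha
            beta = H.least_index_with_segment(H.segment(alpha, y) | {y}, y)
            if beta is not None:
                return (beta, y)
        beta = H.least_index_equal(H.segment(alpha, x))  # Case 3
        if beta is not None:
            return (beta, x)
        return (alpha, x)                                # Case 4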


Fig. 1. Major results and open problems. Solid arrows denote inclusion. Arrows with a question mark denote that the inclusion is open. If an inclusion does not follow by using reflexive and transitive closure of any of these two types of arrows, then it does not hold.

Note that the size of the new hypothesis β in Cases 1 to 3 above, if any, is bounded by the size of y plus a constant. Furthermore, it is easy to verify that the invariants are satisfied. Clearly, if ∅ ∈ L, then Q learns ∅. So suppose L ≠ ∅ and L = Lβ ∈ L, where β is length-lexicographically minimised. Suppose T is a fat text for L. Let (αn, xn) denote Q(T[n]).

Claim 36.

(a) For all n, if L[y] = Lαn[y] and y ≤ xn, then for all n′ ≥ n, L[y] = Lαn′[y] and y ≤ xn′.
(b) For all y ∈ L, there exists an n such that L[y] = Lαn[y] and y ≤ xn.

Proof of Claim. Part (a): This follows using invariants (I) and (II).

Part (b): Clearly, y = min L satisfies part (b), as for the least n such that T(n − 1) = min L, we will have L[min L] = Lαn[min L] and min L ≤ xn (using invariant (I) and either by the first hypothesis of Q or by the usage of Case 1 or 2 in the definition of Q when it receives T(n − 1)).

Now suppose part (b) holds for some y ∈ L. Then we show that it holds for succL(y). Let n0 be large enough such that for all n ≥ n0, L[y] = Lαn[y] and xn ≥ll y. Let n′′ > n0 be such that T(n′′ − 1) = succL(y). Then, Lαn′′[succL(y)] = L[succL(y)] and xn′′ ≥ll succL(y) (Cases 1 and 2 will both ensure this, if it is not already true). This completes the proof of the claim. □

It follows that CF_{Lαn[xn]} converges to CF_L from below. If L is finite, then let n1 be such that CF_{Lαn1} = CF_L and xn1 ≥ max L. Let n2 > n1 be such that T(n2 − 1) ≠ #. Then, by Case 3, it follows that, for all n ≥ n2, αn = β. Thus, Q AutoItPart-learns all the finite sets in L.

Now suppose L is infinite. As CF_{Lαn[xn]} converges to CF_L from below, it follows that no α with Lα ≠ L would be output infinitely often. Furthermore, no index which is not the length-lexicographically minimal index for some language is ever output. So it suffices to show that β is output infinitely often. Let x be large enough so that x ∈ L and L[x] ≠ Lα[x], for any α <ll β. Now, using the claim above and as CF_{Lαn[xn]} converges to CF_L from below, for large enough n, for all n′ ≥ n, xn′ ≥ll x and CF_{L[x]} = CF_{Lαn′[x]}. Now consider any n′ ≥ n. Note that either αn′ = β or L − Lαn′ ≠ ∅, as either Lαn′ is finite or L[x] = Lαn′[x] ⊆ Lαn′[xn′] ⊆ L and αn′ is the length-lexicographically least index α which satisfies Lα[xn′] = Lαn′[xn′] (by the definition of Q). Thus, by Case 1, using invariant (I), for any n′′ > n′ such that T(n′′ − 1) is the length-lexicographically least element in L − Lαn′[x], we have αn′′ = β, unless αn′′′ = β for some n′′′ with n′ ≤ n′′′ ≤ n′′. The theorem follows. □

6. Conclusion

The present work initiates the investigation of the learnability of automatic classes and also the notion of automatic learners. Such learners are restrictive when they have to learn from all texts; only if they are fed with fat texts, where each data item occurs infinitely often, can they explanatorily learn all automatic classes which satisfy Angluin's tell-tale condition. Furthermore, partial automatic learners can infer all automatic families from fat text. Fig. 1 gives the most important inclusions found; note that all notions except for the two topmost ones learn from normal texts. Several implications linked to memory are neither proven nor disproven. For example, is there a class which can be learnt by an automatic learner using hypothesis sized memory which cannot be learnt by an iterative automatic learner? Furthermore, is restricting the memory to the size of the longest word seen so far a real restriction in automatic learning? Besides these fundamental questions, we also studied the amount of restrictions given by consistency and conservativeness. While in standard inductive inference the undecidability of the membership problem with respect to the hypotheses is the main reason for inconsistent learners being more powerful, in automatic learning, the main reason that inconsistent learners might be more powerful than consistent


ones are the implicit and explicit memory restrictions during the learning process which make it impossible to keep track of all the data observed so far.

Acknowledgments

We would like to thank John Case, Henning Fernau, Pavel Semukhin, Trong Dao Le and Thomas Zeugmann for discussions about the subject of learning classes with automatic indexings. We thank Trong Dao Le for pointing out an error in an earlier version of Theorem 31. We also thank the anonymous referees for several helpful comments.

References

[1] Dana Angluin, Inductive inference of formal languages from positive data, Inform. and Control 45 (1980) 117–135.
[2] Dana Angluin, Finding patterns common to a set of strings, J. Comput. System Sci. 21 (1980) 46–62.
[3] Dana Angluin, Inference of reversible languages, J. ACM 29 (1982) 741–765.
[4] Dana Angluin, Learning regular sets from queries and counterexamples, Inform. and Comput. 75 (1987) 87–106.
[5] Janis Barzdinš, Two theorems on the limiting synthesis of functions, in: Theory of Algorithms and Programs, vol. 1, 1974, pp. 82–88.
[6] Janis Barzdinš, Inductive inference of automata, functions and programs, in: Twentieth International Congress of Mathematicians, 1974, pp. 455–460 (in Russian); English translation in: Amer. Math. Soc. Transl. Ser. 2, vol. 109, 1977, pp. 107–112.
[7] Lenore Blum, Manuel Blum, Toward a mathematical theory of inductive inference, Inform. and Control 28 (1975) 125–155.
[8] Achim Blumensath, Automatic structures, Diploma thesis, RWTH Aachen, 1999.
[9] Achim Blumensath, Erich Grädel, Automatic structures, in: Fifteenth Annual IEEE Symposium on Logic in Computer Science (LICS), IEEE Computer Society, 2000, pp. 51–62.
[10] John Case, The power of vacillation in language learning, SIAM J. Comput. 28 (1999) 1941–1969.
[11] John Case, Sanjay Jain, Trong Dao Le, Yuh Shin Ong, Pavel Semukhin, Frank Stephan, Automatic learning of subclasses of pattern languages, in: Fifth International Conference on Language and Automata Theory and Applications (LATA), in: Lecture Notes in Comput. Sci., vol. 6638, Springer, 2011, pp. 192–203.
[12] John Case, Sanjay Jain, Yuh Shin Ong, Pavel Semukhin, Frank Stephan, Automatic learners with feedback queries, in: Models of Computation in Context, Seventh Conference on Computability in Europe (CiE), in: Lecture Notes in Comput. Sci., vol. 6735, Springer, 2011, pp. 31–40.
[13] François Denis, Aurélien Lemay, Alain Terlutte, Some classes of regular languages identifiable in the limit from positive data, in: Sixth International Colloquium on Grammatical Inference: Algorithms and Applications (ICGI), in: Lecture Notes in Comput. Sci., vol. 2484, Springer, 2002, pp. 63–76.
[14] Thomas Erlebach, Peter Rossmanith, Hans Stadtherr, Angelika Steger, Thomas Zeugmann, Learning one-variable pattern languages very efficiently on average, in parallel, and by asking queries, Theoret. Comput. Sci. 261 (2001) 119–156.
[15] Henning Fernau, Identification of function distinguishable languages, Theoret. Comput. Sci. 290 (2003) 1679–1711.
[16] Rusins Freivalds, Efim Kinber, Carl Smith, On the impact of forgetting on learning machines, J. ACM 42 (1995) 1146–1168.
[17] E. Mark Gold, Language identification in the limit, Inform. and Control 10 (1967) 447–474.
[18] Tom Head, Satoshi Kobayashi, Takashi Yokomori, Locality, reversibility, and beyond: learning languages from positive data, in: Ninth International Conference on Algorithmic Learning Theory (ALT), in: Lecture Notes in Artificial Intelligence, vol. 1501, Springer, 1998, pp. 191–204.
[19] Bernard R. Hodgson, Théories décidables par automate fini, PhD thesis, University of Montréal, 1976.
[20] Bernard R. Hodgson, On direct products of automaton decidable theories, Theoret. Comput. Sci. 19 (1982) 331–335.
[21] Oscar H. Ibarra, Tao Jiang, Learning regular languages from counterexamples, in: First Annual Workshop on Computational Learning Theory (COLT), Morgan Kaufmann Publishers, 1988, pp. 371–385.
[22] Klaus P. Jantke, Monotonic and non-monotonic inductive inference, New Generation Computing 8 (1991) 349–360.
[23] Michael Kearns, Leonard Pitt, A polynomial-time algorithm for learning k-variable pattern languages from examples, in: Second Annual Workshop on Computational Learning Theory (COLT), Morgan Kaufmann Publishers, 1989, pp. 57–71.
[24] Bakhadyr Khoussainov, Anil Nerode, Automatic presentations of structures, in: Logical and Computational Complexity (LCC), 1994, in: Lecture Notes in Comput. Sci., vol. 960, Springer, 1995, pp. 367–392.
[25] Efim Kinber, Frank Stephan, Language learning from texts: mind changes, limited memory and monotonicity, Inform. and Comput. 123 (1995) 224–241.
[26] Steffen Lange, Thomas Zeugmann, Language learning in dependence on the space of hypotheses, in: Sixth Annual Conference on Computational Learning Theory (COLT), ACM Press, 1993, pp. 127–136.
[27] Steffen Lange, Rolf Wiehagen, Polynomial time inference of arbitrary pattern languages, New Generation Computing 8 (1991) 361–370.
[28] Steffen Lange, Thomas Zeugmann, Shyam Kapur, Characterizations of monotonic and dual monotonic language learning, Inform. and Comput. 120 (1995) 155–173.
[29] Yasuhito Mukouchi, Characterization of finite identification, in: Third International Workshop on Analogical and Inductive Inference (AII), in: Lecture Notes in Artificial Intelligence, vol. 642, Springer, 1992, pp. 260–267.
[30] Daniel Osherson, Michael Stob, Scott Weinstein, Learning strategies, Inform. and Control 53 (1982) 32–51.
[31] Daniel Osherson, Michael Stob, Scott Weinstein, Systems That Learn: An Introduction to Learning Theory for Cognitive and Computer Scientists, Bradford – The MIT Press, Cambridge, MA, 1986.
[32] Sasha Rubin, Automatic structures, PhD thesis, University of Auckland, 2004.
[33] Sasha Rubin, Automata presenting structures: a survey of the finite string case, Bull. Symbolic Logic 14 (2008) 169–209.
[34] Gisela Schäfer-Richter, Über Eingabeabhängigkeit und Komplexität von Inferenzstrategien, PhD thesis, RWTH Aachen, 1984.
[35] Takeshi Shinohara, Rich classes inferable from positive data: length-bounded elementary formal systems, Inform. and Comput. 108 (1994) 175–186.
[36] Kenneth Wexler, Peter W. Culicover, Formal Principles of Language Acquisition, MIT Press, 1980.
[37] Rolf Wiehagen, Limes-Erkennung rekursiver Funktionen durch spezielle Strategien, Elektronische Informationsverarbeitung und Kybernetik (Journal of Information Processing and Cybernetics) 12 (1976) 93–99.

