The Subset Principle: Consequences and Conspiracies Mayfest 2006, University of Maryland

Page 1: The Subset Principle:  Consequences and Conspiracies Mayfest 2006, University of Maryland

1

The Subset Principle: Consequences and Conspiracies

Mayfest 2006, University of Maryland

William Gregory Sakas
City University of New York (CUNY)

Page 2: The Subset Principle:  Consequences and Conspiracies Mayfest 2006, University of Maryland

2

Joint work with

Janet Dean Fodor and Arthur Hoskey

CUNY-CoLAG

CUNY Computational Language Acquisition Group

http://www.colag.cs.hunter.cuny.edu

Fodor, J.D. & Sakas, W.G. (2005). The Subset Principle in Syntax: Costs of Compliance. Journal of Linguistics 41, 513–569.

Page 3: The Subset Principle:  Consequences and Conspiracies Mayfest 2006, University of Maryland

3

Overview for today:

1) Brief introduction to Gold-style learnability

2) The Subset Principle and Conspiracies

3) Innate enumeration: A possible way to thwart the conspiracies

Page 4: The Subset Principle:  Consequences and Conspiracies Mayfest 2006, University of Maryland

4

Syntax acquisition can be viewed as a state space search

— nodes represent grammars including a start state and a target state.

— arcs represent a possible change from one hypothesized grammar to another.

[State-space diagram: grammar nodes G0, G2, G3, G4, G5, G6, and Gtarg connected by arcs]

Page 5: The Subset Principle:  Consequences and Conspiracies Mayfest 2006, University of Maryland

5

Gold’s grammar enumeration learner (1967)

[Diagram: the enumeration G0 → G1 → G2 → G3 → … → Gtarg; the learner advances whenever s ∉ L(Gi) for its current hypothesis Gi, and stops advancing once s ∈ L(Gi)]

where s is a sentence drawn from the input sample being encountered by the learner

Page 6: The Subset Principle:  Consequences and Conspiracies Mayfest 2006, University of Maryland

6

Gold’s grammar enumeration learner (cont’d)

• the learner is error-driven

• error-driven learners converge on the target in the limit (no oracle required)

• no limit on how far along the enumeration the learner can go on a single sentence

• in Gold’s original work, s was a set of accumulated sentences

• subsets precede supersets in the enumeration

[Diagram repeated from the previous slide]
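To make the procedure concrete, here is a minimal Python sketch of a Gold-style identification-by-enumeration learner over a hypothetical finite domain; the grammar names, the toy languages (modeled simply as sets of strings), and the sample text are invented for illustration and are not from the talk.

```python
def gold_enumeration_learner(enumeration, text):
    """enumeration: list of (grammar, language) pairs, subsets before supersets.
    text: an iterable of sentences drawn from the target language."""
    seen = set()   # Gold's original learner could consult the accumulated sample
    i = 0          # index of the current hypothesis in the enumeration
    for s in text:
        seen.add(s)
        if s not in enumeration[i][1]:
            # Error-driven: only an unparsable sentence triggers a change, and the
            # learner may skip arbitrarily far ahead on a single sentence.
            while not seen <= enumeration[i][1]:
                i += 1
    return enumeration[i][0]

# Hypothetical chain of languages L(G0) ⊂ L(G1) ⊂ L(Gtarg):
enum = [("G0", {"a"}), ("G1", {"a", "a b"}), ("Gtarg", {"a", "a b", "a b c"})]
print(gold_enumeration_learner(enum, ["a", "a b", "a b c", "a"]))   # -> Gtarg
```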

Page 7: The Subset Principle:  Consequences and Conspiracies Mayfest 2006, University of Maryland

7

Definitions

Learnability - Under what conditions is learning possible?

Feasibility - Is acquisition possible within a reasonable amount of time and/or with a reasonable amount of work?

A domain of grammars 𝒢 is learnable iff ∃ a learner such that ∀ G ∈ 𝒢 and ∀ (fair) texts generable by G, the learner converges on G.

Page 8: The Subset Principle:  Consequences and Conspiracies Mayfest 2006, University of Maryland

8

An early learnability result (Gold, 1967)

Exposed to input strings of an arbitrary target language Ltarg = L(G), where G ∈ 𝒢, it is impossible to guarantee that a learner can converge on G if 𝒢 is any class in the Chomsky hierarchy.

Page 9: The Subset Principle:  Consequences and Conspiracies Mayfest 2006, University of Maryland

9

Angluin’s Theorem (1980)

A class of grammars 𝒢 is learnable iff for every language Li = L(Gi), Gi ∈ 𝒢, there exists a finite subset D ⊆ Li such that no other language L(G), G ∈ 𝒢, includes D and is included in Li.

[Diagram: a telltale set D ⊆ L(G) ⊆ L(Gi); if this intermediate language L(G) can be generated by a grammar in 𝒢, then 𝒢 is not learnable!]
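Purely as an illustration of the theorem, the following sketch brute-forces Angluin's telltale condition over a hypothetical domain of small finite languages (sets of strings standing in for the L(Gi)). Note that for a finite language the whole language can serve as its own telltale, so genuine failures of the condition require infinite languages, which this toy cannot represent.

```python
from itertools import combinations

def has_telltale(Li, others):
    """Is there a finite D ⊆ Li such that no other language in the class
    both includes D and is properly included in Li?"""
    sentences = sorted(Li)
    for r in range(len(sentences) + 1):
        for D in map(set, combinations(sentences, r)):
            if not any(D <= L < Li for L in others):
                return True
    return False

def satisfies_angluin(languages):
    """languages: list of sets standing in for the L(Gi) of a class of grammars."""
    return all(has_telltale(Li, languages[:i] + languages[i + 1:])
               for i, Li in enumerate(languages))

# A hypothetical subset chain: each language can serve as its own telltale.
print(satisfies_angluin([{"a"}, {"a", "b"}, {"a", "b", "c"}]))   # True
```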

Page 10: The Subset Principle:  Consequences and Conspiracies Mayfest 2006, University of Maryland

10

Relevant assumptions (for today) about the Learning Mechanism (LM):

Incremental – no memory for either past hypotheses or past input sentences

receives only ‘positive’ inputs from the linguistic environment (the target language)

error-driven

Page 11: The Subset Principle:  Consequences and Conspiracies Mayfest 2006, University of Maryland

11

The Subset Principle (SP)

Avoid an Overgeneralization Hazard:

LM adopts a language hypothesis that is overly general (OverGen-L) from which there is no retreat

OverGen-L = a language containing all the sentences of the target and then some, i.e., a proper superset of the target language.

A learning problem noticed early on by Gold (1967)

Page 12: The Subset Principle:  Consequences and Conspiracies Mayfest 2006, University of Maryland

12

[Venn diagram: nested languages Safe-L ⊆ Targ-L ⊆ OverGen-L, annotated with the example sentences “She walked.”, “Walked.”, “Walked she.”, “She eated.”, “Eated.”]

Page 13: The Subset Principle:  Consequences and Conspiracies Mayfest 2006, University of Maryland

13

An interpretation adopted by many linguists, put forth by Berwick (1985) and by Manzini and Wexler (1987) (who coined the term Subset Principle, SP):

LM must never hypothesize a language which is a proper superset of another language that is equally compatible with the available data

Given a choice, don’t choose the superset hypothesis!

Note that simply avoiding a superset of the target is an impossible strategy since LM has no knowledge of what the target is! Faced with a choice, LM must always pick the subset.

Page 14: The Subset Principle:  Consequences and Conspiracies Mayfest 2006, University of Maryland

14

Can SP be obeyed under our Incremental assumption?

Well, it depends on how one interprets:

“available data”

Many interesting problems come to light when “standard” psycholinguistically attractive memory constraints (e.g. Incrementality) are applied to LM working under SP.

Page 15: The Subset Principle:  Consequences and Conspiracies Mayfest 2006, University of Maryland

15

A Safe Incremental SP Interpretation (SP-d’ in the paper):

Inc-SP: When LM’s current language is incompatible with a new input sentence s, LM should hypothesize a UG-compatible language which is a smallest language containing s.

A smallest language is one that contains s and has no proper subset that also contains s

Notes:

1) There may be more than one smallest language containing s. If so, any are safe for LM to choose.

2) Our use of “smallest” does not indicate any relation of set size (cardinality). A smallest language might be the “largest” language in the domain!
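A minimal sketch, assuming a hypothetical finite domain of languages represented as sets of strings, of how Inc-SP's “smallest language containing s” could be computed; the names and sentences are invented. The second call also illustrates Note 2: the smallest language containing “c” is the largest language in this toy domain.

```python
def smallest_languages_containing(s, domain):
    """Return the names of every language in the domain that contains s and has
    no proper subset in the domain that also contains s (Inc-SP's safe hypotheses)."""
    containing = {name: lang for name, lang in domain.items() if s in lang}
    return [name for name, lang in containing.items()
            if not any(other < lang for other in containing.values())]

# Hypothetical nested domain:
domain = {"L1": {"a"}, "L2": {"a", "b"}, "L3": {"a", "b", "c"}}
print(smallest_languages_containing("a", domain))   # ['L1']
print(smallest_languages_containing("c", domain))   # ['L3'] -- largest by cardinality
```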

Page 16: The Subset Principle:  Consequences and Conspiracies Mayfest 2006, University of Maryland

16

A Safe Incremental definition of SP (cont’d)

Inc-SP safely protects against adopting an OverGen-L, but it leads to problematic “retrenchment”:

Previous facts that were correctly learned may have to be abandoned if the input does not exhibit them; each and every sentence is, in effect, the first sentence ever encountered (Janet’s focus at a talk here last semester).

Page 17: The Subset Principle:  Consequences and Conspiracies Mayfest 2006, University of Maryland

17

Learnability under Inc-SP

A domain is not learnable under Inc-SP unless every potential target language, targ-L, contains a subset-free-trigger.

Subset-free-trigger (sft) – a sentence t that is an element of targ-L but is not in any proper subset of targ-L (i.e., targ-L is the smallest language containing t).

An sft is not necessarily an unambiguous trigger, so a single encounter does not instantly identify targ-L.

(Note that sft’s function similarly to an Angluin telltale set of cardinality 1.)
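As a rough illustration of the definition (all names and sentences invented), over a hypothetical finite domain the sft's of a target language can be computed directly:

```python
def subset_free_triggers(targ, domain):
    """Sentences of targ that occur in no proper subset of targ within the domain,
    i.e. sentences for which targ is the smallest language containing them."""
    proper_subsets = [lang for lang in domain if lang < targ]
    return {s for s in targ if not any(s in sub for sub in proper_subsets)}

# Hypothetical chain domain:
domain = [{"a"}, {"a", "b"}, {"a", "b", "c"}]
print(subset_free_triggers({"a", "b", "c"}, domain))   # {'c'} -- its only sft
print(subset_free_triggers({"a", "b"}, domain))        # {'b'}
```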

Page 18: The Subset Principle:  Consequences and Conspiracies Mayfest 2006, University of Maryland

18

Even a finite domain of languages might be unlearnable if some potential target does not contain an sft

[Diagram: Targ-L = Li ∪ Lj, with Li and Lj each a proper subset of Targ-L]

Targ-L does not have a subset-free-trigger, since by construction, Li and Lj cover all sentences of Targ-L and both are smaller languages than Targ-L.

Inc-SP forces LM to chronically oscillate between the two smaller languages, never reaching the target language.
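To see the oscillation concretely, here is a sketch of an Inc-SP update step run on a hypothetical covering domain of the kind pictured above, where Targ-L = Li ∪ Lj; the languages and the input text are invented.

```python
def inc_sp_step(current, s, domain):
    """One Inc-SP step: if s is incompatible with the current hypothesis, adopt a
    smallest language in the domain that contains s (no memory for past input)."""
    if current is not None and s in domain[current]:
        return current
    containing = {name: lang for name, lang in domain.items() if s in lang}
    for name, lang in containing.items():
        if not any(other < lang for other in containing.values()):
            return name

# Targ-L is exactly the union of Li and Lj, so it is never a smallest choice:
domain = {"Li": {"a", "b"}, "Lj": {"b", "c"}, "Targ-L": {"a", "b", "c"}}
hyp = None
for s in ["a", "b", "c", "a", "c"]:        # a text drawn from Targ-L
    hyp = inc_sp_step(hyp, s, domain)
    print(s, "->", hyp)                    # Li, Li, Lj, Li, Lj -- never Targ-L
```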

Page 19: The Subset Principle:  Consequences and Conspiracies Mayfest 2006, University of Maryland

19

Also, though necessary, sft’s are not sufficient to ensure acquisition

[Diagram: Li, Lj, and Targ-L, none of which is a subset of another]

All the sentences of Targ-L are sft’s, but all three languages stand on equal footing in terms of Inc-SP – it appears to Inc-SP that there are no supersets in the domain, so Inc-SP has nothing to say.

If LM utilizes a metric which evaluates Li and Lj as “closer” to each other than either is to Targ-L, LM is forced to chronically oscillate between them, never reaching the target language.

Page 20: The Subset Principle:  Consequences and Conspiracies Mayfest 2006, University of Maryland

20

Summary so far:

Inc-SP is a safe interpretation of an Incremental version of SP

It is necessary for a learner that obeys Inc-SP to make use of subset-free-triggers to assure convergence on the target

Page 21: The Subset Principle:  Consequences and Conspiracies Mayfest 2006, University of Maryland

21

Problems with Inc-SP and sft’s

1) Previous facts are unlearned – no batch processing is allowed and no gradualism; when an sft triggers the target, the jump might be directly from a very different language

2) sft’s are not sufficient to attain targ-L – some superset conspiracies are effectively hidden from Inc-SP

3) linguistically meaningful domains may lack sft’s

Page 22: The Subset Principle:  Consequences and Conspiracies Mayfest 2006, University of Maryland

22

SP is basically inconsistent with incremental learning

Solution: Give LM Memory for past hypotheses

One way might be to borrow an important notion from the formal learning community and try to give it psycho-computational reality – give LM:

an “innate” enumeration (listing) that posits subset languages in the domain before their supersets

An enumeration offers a solution to all three problems on the previous slide. How?

Page 23: The Subset Principle:  Consequences and Conspiracies Mayfest 2006, University of Maryland

23

Identification by Enumeration Learners

- Never taken up by the (psycho)linguistics community. Why not? The notion of innateness is certainly not new!

- For Gold and many other folk since, LM is endowed not only with an enumeration but also with the ability to “skip ahead” in the enumeration to the next language hypothesis compatible with X (X = the data encountered; could be one or more sentences).

- The motivation for this was to determine what even a powerful learner can’t learn, in order to set a strong learnability bound. Very reasonable!

- But from the viewpoint of developmental or psychocomputational modeling, we need to explain how LM can exploit the enumeration.

Page 24: The Subset Principle:  Consequences and Conspiracies Mayfest 2006, University of Maryland

24

A psychocomputational parser/learner that could, plausibly, exploit an enumeration.

- What would be required of an Enumeration-LM? A possible strategy:

  - get a sentence
  - if it is consistent with the current hypothesized language, do nothing (error-driven)
  - otherwise, try the next hypothesis in the enumeration
  - if still inconsistent, try the next; if still inconsistent, try the next; etc.

This works – but the computation is excessive: possibly thousands/millions/billions of grammars to check (see Pinker, 1979).
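A sketch of a single step of this strategy under the incrementality assumption, where consistency can be checked only against the current sentence; the counter makes the worry about excessive computation explicit. The domain, names, and sentences are hypothetical.

```python
def next_consistent_hypothesis(i, s, enumeration):
    """Starting from the current index i, walk forward through the enumeration
    until a language containing s is found. Returns the new index and the number
    of grammars that had to be checked on this one sentence."""
    checked = 0
    while s not in enumeration[i][1]:
        i += 1          # try the next hypothesis in the enumeration
        checked += 1    # in a realistic domain this count could run into the thousands or more
    return i, checked

enumeration = [("G0", {"a"}), ("G1", {"a", "a b"}), ("G2", {"a", "a b", "a b c"})]
print(next_consistent_hypothesis(0, "a b c", enumeration))   # (2, 2)
```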

Page 25: The Subset Principle:  Consequences and Conspiracies Mayfest 2006, University of Maryland

25

A psychocomputational parser/learner that could, plausibly, exploit an enumeration.

- Another strategy: if a sentence is inconsistent with the current hypothesized language,
  - generate all parses,
  - order the parses by the grammars that license them,
  - “jump” to the grammar that appears first in the enumeration.

Psychologically infeasible:

– thousands, even millions, of parses per (ambiguous) sentence.

Page 26: The Subset Principle:  Consequences and Conspiracies Mayfest 2006, University of Maryland

26

But we have a model: The Structural Triggers Learner (Fodor, 1998; Sakas & Fodor, 2001)

It can fold all the grammars into one supergrammar – and can decode the generating grammar serially from each parse tree.

Page 27: The Subset Principle:  Consequences and Conspiracies Mayfest 2006, University of Maryland

27

STL Serial Algorithm

[Diagram: a Sentence is input to the Parser, which outputs a Sentence structure; the Parser draws on the Current grammar, the pool of Parameter value treelets, and UG principles]

At a choice point, go to the pool of parameter treelets, take what is needed (perhaps several possibilities) and fold the chosen treelets into the current grammar.

Page 28: The Subset Principle:  Consequences and Conspiracies Mayfest 2006, University of Maryland

28

A psychocomputational parser/learner that could, plausibly, exploit an enumeration.

- Another strategy, which might work out:
  - parse serially with the supergrammar
  - if there is no choice point (an unambiguous trigger), jump to the target. Stop.
  - otherwise, complete the parse with the parameter value treelets from the supergrammar
  - jump to the earliest grammar in the enumeration that is indicated by the parse.

This is a “resource-palatable” model, but there is a danger of overshooting in the enumeration into a superset of the target, because the parse does not yield perfect information :-(
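The sketch below is only a schematic stand-in for this strategy: it assumes, hypothetically, that the completed serial parse indicates a value for each parameter it had to decide, and that grammars in the enumeration are identified with full parameter settings; none of that machinery is spelled out here, so the two-parameter domain and all names are invented.

```python
def jump_in_enumeration(parse_settings, enumeration):
    """Jump to the earliest grammar in the enumeration whose parameter settings
    agree with the settings indicated by the completed parse."""
    for grammar, settings in enumeration:    # subsets precede supersets
        if all(settings[p] == v for p, v in parse_settings.items()):
            return grammar
    return None

# Hypothetical two-parameter domain, ordered subset-first:
enumeration = [("G_sub", {"p1": 0, "p2": 0}),
               ("G_mid", {"p1": 1, "p2": 0}),
               ("G_super", {"p1": 1, "p2": 1})]

# Unambiguous trigger: the parse pins down both parameters -> a safe jump.
print(jump_in_enumeration({"p1": 1, "p2": 0}, enumeration))   # G_mid
# At a choice point the parser had to pick some treelet; if it picked p2 = 1 when
# the target really has p2 = 0, the learner overshoots into a superset grammar.
print(jump_in_enumeration({"p1": 1, "p2": 1}, enumeration))   # G_super -- the hazard
```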

Page 29: The Subset Principle:  Consequences and Conspiracies Mayfest 2006, University of Maryland

29

A psychocomputational parser/learner that could, plausibly, exploit an enumeration.

One hope is that parsing heuristics or strategies could be ranked in terms of subset-superset relations.

- parse serially with the supergrammar
- if there is no choice point (an unambiguous trigger), jump to the target. Stop.
- otherwise, at each choice point, pick the option that will eventually yield a subset grammar rather than a superset.
- jump to the earliest grammar in the enumeration that is indicated by the (now full) parse.

Page 30: The Subset Principle:  Consequences and Conspiracies Mayfest 2006, University of Maryland

30

Summary:

1) Gold-style learnability offers bounds on what can and can’t be learned; introduces the fatal consequences of overgeneralization.

2) Inc-SP is (as far as we know) the only existing safe definition of SP under incrementality assumptions. But it needs subset-free-triggers.

3) An innate enumeration together with a decoding learner can possibly be used to implement a psychologically plausible model of SP; it will help LM avoid learning failures and help to stave off excessive retrenchment.

4) But still not “the” answer. Suggestions welcome!

Page 31: The Subset Principle:  Consequences and Conspiracies Mayfest 2006, University of Maryland

31

Thank you.

