(Preliminary)
Experiments in Morphological Evolution
Richard Sproat, University of Illinois at Urbana-Champaign
3rd Workshop on "Quantitative Investigations in Theoretical Linguistics" (QITL-3), Helsinki, 2-4 June 2008
Overview
• The explananda
• Previous work on evolutionary modeling
• Computational models and preliminary experiments
Phenomena
• How do paradigms arise?
– Why do words fall into different inflectional "equivalence classes"?
• Why do stem alternations arise?
• Why is there syncretism?
– Why are there "rules of referral"?
Stem alternations in Sanskrit
[Table: guna-grade vs. zero-grade stem alternants]
Examples from: Stump, Gregory (2001) Inflectional Morphology: A Theory of Paradigm Structure. Cambridge University Press.
Stem alternations in Sanskrit
[Table: vrddhi-grade alternants; lexeme-class-particular patterns]
morphomic (Aronoff, M. 1994. Morphology by Itself. MIT Press.)
Evolutionary Modeling (A tiny sample)
• Hare, M. and Elman, J. L. (1995) Learning and morphological change. Cognition, 56(1):61--98.
• Kirby, S. (1999) Function, Selection, and Innateness: The Emergence of Language Universals. Oxford: Oxford University Press.
• Nettle, D. (1999) "Using Social Impact Theory to simulate language change". Lingua, 108(2-3):95--117.
• de Boer, B. (2001) The Origins of Vowel Systems. Oxford: Oxford University Press.
• Niyogi, P. (2006) The Computational Nature of Language Learning and Evolution. Cambridge, MA: MIT Press.
Experiment 1: Rules of Referral
Rules of referral
• Stump, Gregory (1993) "On rules of referral". Language, 69(3), 449-479.
– (After Zwicky, Arnold (1985) "How to describe inflection." Berkeley Linguistics Society, 11, 372-386.)
Latin declensions
Are rules of referral interesting?
• Are they useful for the learner?
– Wouldn't the learner have heard instances of every paradigm?
• Are they historically interesting?
– Does morphological theory need mechanisms to explain why they occur?
Another example: Böğüstani nominal declension
[Table: cases Nom, Acc, Gen, Dat, Loc, Inst, Abl, Illat × numbers Sg, Du, Pl]
Böğüstani: a language of Uzbekistan
ISO 639-3: bgs. Population: 15,500 (1998 Durieux). Comments: Capsicum chinense and Coffea arabica farmers.
Monte Carlo simulation (generating Böğüstani)
• Select a re-use bias B
• For each language:
– Generate a set of vowels, consonants and affix templates:
• a, i, u, e
• n f r w B s x j D
• V, C, CV, VC
– Decide on p paradigms (minimum 3), r rows (minimum 2), c columns (minimum 2)
Monte Carlo simulation
• For each paradigm in the language:
– Iterate over (r, c):
• Let α be the previous affix stored for r: with probability B retain α in L
• Let β be the previous affix stored for c: with probability B retain β in L
• If L is non-empty, set (r, c) to a random choice from L
• Otherwise generate a new affix for (r, c)
• Store (r, c)'s affix for r and c
• Note that P(new-affix) = (1−B)²
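The procedure above can be sketched in Python. This is a reconstruction from the bullets, not the original code: the function names, the upper bounds on p, r and c, and the use of language-wide (rather than per-paradigm) row/column stores are my assumptions.

```python
import random

def new_affix(vowels, consonants, templates):
    """Realize a random template, e.g. 'CV' -> consonant + vowel."""
    return "".join(random.choice(vowels if ch == "V" else consonants)
                   for ch in random.choice(templates))

def generate_language(bias, vowels="aiue", consonants="nfrwBsxjD",
                      templates=("V", "C", "CV", "VC")):
    """Generate one language's paradigms with re-use bias `bias` (= B)."""
    p = random.randint(3, 8)   # minimum 3 paradigms (upper bound assumed)
    r = random.randint(2, 6)   # minimum 2 rows
    c = random.randint(2, 6)   # minimum 2 columns
    row_store, col_store = {}, {}   # last affix stored for each row/column
    paradigms = []
    for _ in range(p):
        table = {}
        for i in range(r):
            for j in range(c):
                pool = []   # the list L of the slide
                if i in row_store and random.random() < bias:
                    pool.append(row_store[i])   # retain alpha with prob. B
                if j in col_store and random.random() < bias:
                    pool.append(col_store[j])   # retain beta with prob. B
                # P(new affix) = (1 - B)^2 when both stores are non-empty
                affix = (random.choice(pool) if pool
                         else new_affix(vowels, consonants, templates))
                table[i, j] = affix
                row_store[i] = col_store[j] = affix
        paradigms.append(table)
    return paradigms
```

Running this with bias = 0.04 yields tables like the sample languages on the following slides, with occasional shared exponents across rows, columns and paradigms.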
Sample language: bias = 0.04
Consonants: x n p w j B t r s S m
Vowels: a i u e
Templates: V, C, CV, VC
Sample language: bias = 0.04
Consonants: n f r w B s x j D
Vowels: a i u e
Templates: V, C, CV, VC
Sample language: bias = 0.04
Consonants: r p j d G D
Vowels: a i u e o y O
Templates: V, C, CV, VC, CVC, VCV, CVCV, VCVC
Sample language: bias = 0.04
Consonants: D k S n b s l t w j B g G d
Vowels: a i u e
Templates: V, C, CV, VC
Results of Monte Carlo simulations (8000 runs, 5000 languages per run)
Interim conclusion
• Syncretism, including rules of referral, may arise as a chance byproduct of tendencies to reuse inflectional exponents --- and hence reduce the number of exponents needed in the system.
• Side question: is the amount of ambiguity among inflectional exponents statistically different from that among lexemes? (cf. Beard's Lexeme-Morpheme Base Morphology)
– Probably not, since inflectional exponents tend to be shorter, so the chances of collisions are much higher
Experiment 2: Stabilizing Multiple Paradigms in a Multiagent Network
Paradigm Reduction in Multi-agent Models with Scale-Free Networks
• Agents connected in a scale-free network
• Only connected agents communicate
• Agents are more likely to update forms from interlocutors they "trust"
• Each individual agent has pressure to simplify its morphology by collapsing exponents:
– Exponent collapse is picked to minimize the increase in paradigm entropy
– Paradigms may be simplified, removing distinctions and thus reducing paradigm entropy
– As the number of exponents decreases, so does the pressure to reduce
– Agents analogize paradigms to other words
Scale-free networks
• Connection degrees follow the Yule-Simon distribution:

p(k) = ρ · B(k, ρ + 1)

where for sufficiently large k:

p(k) ≈ ρ · Γ(ρ + 1) · k^−(ρ+1)

i.e. it reduces to Zipf's law (cf. Baayen, Harald (2000) Word Frequency Distributions. Springer.)
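Such heavy-tailed degree distributions arise from preferential attachment (the Barabási-Albert construction). The slides do not say how the networks were built, so the following is an illustrative sketch, not the model's actual code:

```python
import random

def preferential_attachment(n, m=1, seed=0):
    """Grow a scale-free graph: each new node links to m existing nodes
    chosen with probability proportional to their current degree."""
    rng = random.Random(seed)
    # small complete seed graph on m+1 nodes
    edges = [(i, j) for i in range(m + 1) for j in range(i)]
    stubs = [v for e in edges for v in e]   # each node repeated per edge end
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(stubs))  # degree-proportional choice
        for t in targets:
            edges.append((new, t))
            stubs.extend((new, t))
    return edges

# degree histogram for a 1000-node network, as in the plot that follows
degrees = {}
for a, b in preferential_attachment(1000, m=1, seed=42):
    degrees[a] = degrees.get(a, 0) + 1
    degrees[b] = degrees.get(b, 0) + 1
```

A few nodes end up as hubs with degrees far above the mean of about 2, while most nodes keep one or two connections, matching the power-law tail above.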
Scale-free vs. Random: 1000 nodes
Relevance of scale-free networks
• Social networks are scale-free
• Nodes with multiple connections seem to be relevant for language change.
– cf. James Milroy and Lesley Milroy (1985) "Linguistic change, social network and speaker innovation." Journal of Linguistics, 21:339-384.
Scale-free networks in the model
• Agents communicate individual forms to other agents
• When two agents differ on a form, one agent will update its form with a probability p proportional to how well connected the other agent is:
– p = MaxP × ConnectionDegree(agent)/MaxConnectionDegree
– (Similar to PageRank)
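In code the update decision is a one-liner; this is a sketch whose parameter names simply mirror the formula above:

```python
import random

def maybe_update(my_form, other_form, other_degree, max_degree, max_p):
    """Adopt the interlocutor's form with probability proportional to the
    interlocutor's connectedness: p = max_p * other_degree / max_degree."""
    p = max_p * other_degree / max_degree
    return other_form if random.random() < p else my_form
```

With max_p = 0.2, a maximally connected hub is copied from 20% of the time, while a leaf with one of 50 possible connections is copied from only 0.4% of the time.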
Paradigm entropy
• For exponents φ and morphological functions μ, define the Paradigm Entropy as:

H(μ | φ) = − Σ_φ p(φ) Σ_μ p(μ | φ) log p(μ | φ)

(NB: this is really just the conditional entropy)
• If each exponent is unambiguous, the paradigm entropy is 0
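Under the simplifying assumption that all paradigm cells are equiprobable (the actual simulation assigns each slot its own probability), the quantity can be computed as follows; the names are mine:

```python
from collections import defaultdict
from math import log2

def paradigm_entropy(cells):
    """H(mu | phi): uncertainty about the morphological function mu given
    the exponent phi, for a {function: exponent} paradigm table."""
    by_exponent = defaultdict(list)
    for mu, phi in cells.items():
        by_exponent[phi].append(mu)
    n = len(cells)
    # with equiprobable cells, each exponent phi contributes
    # p(phi) * log2(number of functions it expresses)
    return sum((len(mus) / n) * log2(len(mus))
               for mus in by_exponent.values())
```

An unambiguous paradigm scores 0; total syncretism of a two-cell paradigm scores 1 bit.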
Example
Syncretism tends to be most common in “rarer” parts of paradigm
Old Latin 1st/2nd Declensions
puella, –aī 'girl, maiden' f.
Singular Plural
Nominative puella puellai
Genitive puellās/-es/-aī puellōm/ -āsom
Dative puellai puellaīs/-eīs/ -abos
Accusative puellam puellā
Ablative puellād puellaīs/-eīs/ -abos
Vocative puella puellai
Locative puellā puellaīs/-eīs
campos, –oī 'field, plain' m.
saxom, –oī 'rock, stone' n.
Singular Plural Singular Plural
Nominative campos campoī saxom saxa
Genitive campoī campōm/ -ōsom saxoī saxōm/ -ōsom
Dative campoī campoīs saxoī saxoīs
Accusative campom campōs saxom saxa
Ablative campōd campoīs saxōd saxoīs/ -oes
Vocative campe campoī saxe saxoī
Locative campō campoīs saxō saxoīs/ -oes
Simulation
• 100 agents in a scale-free or random network
– Roughly 250 connections in either case
• 20 bases
• 5 "cases", 2 "numbers": each slot associated with a probability
• Max probability of updating one's form for a given slot given what another agent has is 0.2 or 0.5
• Probability of analogizing within one's own vocabulary is 0.01, 0.02 or 0.05
– Also a mode where we force analogy every 50 iterations
– Analogize to words within the same "analogy group" (4 such groups in the current simulation)
– Winner-takes-all strategy
• (Numbers in the titles of the ensuing plots are given as UpdateProb/AnalogyProb, e.g. 0.2/0.01)
• Run for 1000 iterations
Features of simulation
• At the nth iteration, compute:
– The paradigm distribution over agents for each word
• Paradigm purity is the proportion of agents holding the "winning paradigm"
– The number of distinct winning paradigms
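Both quantities are simple to compute from the per-agent paradigm assignments. A sketch, representing each agent's paradigm for a word as a hashable tuple of exponents:

```python
from collections import Counter

def purity_and_winner(paradigms):
    """Paradigm purity for one word: the share of agents holding the most
    common ('winning') paradigm, plus that winning paradigm itself."""
    winner, count = Counter(paradigms).most_common(1)[0]
    return count / len(paradigms), winner

def num_winning_paradigms(words):
    """Number of distinct winning paradigms across all words."""
    return len({purity_and_winner(ps)[1] for ps in words.values()})
```

For example, if three of four agents agree on a word's paradigm, its purity is 0.75.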
Scale-free Network: 0.2/0.01
Scale-free network: 0.5/0.05
Random network: 0.5/0.05
Scale-free network: 0.5/0.05, 5000 runs
Random network: 0.5/0.05, 5000 runs
Scale-free network: 0.5/0.00, 5000 runs: No analogy
Scale-free network: 0.5/0.00, 30,000 runs: No analogy
Sample final state
0.24 0.21 0.095 0.095 0.06 0.12 0.095 0.048 0.024 0.012
Adoption of acc/acc/acc/acc/acc/ACC/ACC/ACC/ACC/ACC in a 0.5/0.05 run
Interim conclusions
• Scale-free networks don't seem to matter: convergence behavior seems to be no different from a random network
– Is that a big surprise?
• Analogy matters
• Paradigm entropy (conditional entropy) might be a model for paradigm simplification
Experiment 3: Large-scale multi-agent evolutionary modeling with learning
(work in progress…)
Synopsis
• System is seeded with a grammar and a small number of agents
– Initial grammars all show an agglutinative pattern
– Each agent randomly selects a set of phonetic rules to apply to forms
– Agents are assigned to one of a small number of social groups
• 2 parents "beget" child agents.
– Children are exposed to a predetermined number of training forms combined from both parents
• Forms are presented proportional to their underlying “frequency”
– Children must learn to generalize to unseen slots for words
– Learning algorithm similar to:
• David Yarowsky and Richard Wicentowski (2000) "Minimally supervised morphological analysis by multimodal alignment." Proceedings of ACL-2000, Hong Kong, pages 207-216.
• Features include the last n characters of the input form, plus semantic class
– Learners select the optimal surface form to derive other forms from (optimal = requiring the simplest resulting ruleset – a Minimum Description Length criterion)
• Forms are periodically pooled among all agents and the n best forms are kept for each word and each slot
• Population grows, but is kept in check by “natural disasters” and a quasi-Malthusian model of resource limitations
– Agents age and die according to reasonably realistic mortality statistics
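The base-form selection step can be illustrated with a toy MDL criterion. This is not the actual learner (which uses Yarowsky-Wicentowski-style alignments and richer features); it only shows the idea of picking the base whose derivations need the simplest ruleset, here restricted to suffix substitution:

```python
def suffix_rule(src, dst):
    """Shortest suffix substitution turning src into dst."""
    i = 0
    while i < min(len(src), len(dst)) and src[i] == dst[i]:
        i += 1
    return (src[i:], dst[i:])

def pick_base(paradigm):
    """Choose the slot whose form derives all others with the fewest
    distinct suffix rules (a Minimum Description Length flavor)."""
    def ruleset_size(base_slot):
        base = paradigm[base_slot]
        return len({suffix_rule(base, form)
                    for slot, form in paradigm.items() if slot != base_slot})
    return min(paradigm, key=ruleset_size)
```

For a purely agglutinative paradigm every slot ties; stem alternations and fused exponents break the ties in favor of more predictive bases.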
Population growth, 300 “years”
Phonological rules
• c_assimilation
• c_lenition
• degemination
• final_cdel
• n_assimilation
• r_syllabification
• umlaut
• v_nasalization
• voicing_assimilation
• vowel_apocope
• vowel_coalescence
• vowel_syncope
K = [ptkbdgmnNfvTDszSZxGCJlrhX]
L = [wy]
V = [aeiouAEIOU&@0âêîôûÂÊÎÔÛãõÕ]
## Regressive voicing assimilation
b -> p / - _ #?[ptkfTsSxC]
d -> t / - _ #?[ptkfTsSxC]
g -> k / - _ #?[ptkfTsSxC]
D -> T / - _ #?[ptkfTsSxC]
z -> s / - _ #?[ptkfTsSxC]
Z -> S / - _ #?[ptkfTsSxC]
G -> x / - _ #?[ptkfTsSxC]
J -> C / - _ #?[ptkfTsSxC]
K = [ptkbdgmnNfvTDszSZxGCJlrhX]
L = [wy]
V = [aeiouAEIOU&@0âêîôûÂÊÎÔÛãõÕ]
[td] -> D / [aeiou&âêîôûã]#? _ #?[aeiou&âêîôûã]
[pb] -> v / [aeiou&âêîôûã]#? _ #?[aeiou&âêîôûã]
[gk] -> G / [aeiou&âêîôûã]#? _ #?[aeiou&âêîôûã]
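Rules in this notation compile naturally to regular-expression replacements. A sketch for the regressive-voicing-assimilation block above (not the original rule compiler; the left-context marker `-` in the notation is not modeled, and `#` is taken to be an optional morph boundary):

```python
import re

# target -> replacement pairs from the rules b -> p, d -> t, ...
DEVOICE = {"b": "p", "d": "t", "g": "k", "D": "T",
           "z": "s", "Z": "S", "G": "x", "J": "C"}
VOICELESS = "ptkfTsSxC"
# a voiced obstruent followed by an optional '#' and a voiceless obstruent
PATTERN = re.compile("[" + "".join(DEVOICE) + "](?=#?[" + VOICELESS + "])")

def voicing_assimilation(form):
    """Apply all eight devoicing rules in one left-to-right pass."""
    return PATTERN.sub(lambda m: DEVOICE[m.group(0)], form)
```

This is the pattern behind alternations like Agsaf ~ Aksaf in the final example later in the deck: g devoices to k before voiceless s.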
Example run
• Initial paradigm:
– Abog pl+acc Abogmeon
– Abog pl+dat Abogmeke
– Abog pl+gen Abogmei
– Abog pl+nom Abogmeko
– Abog sg+acc Abogaon
– Abog sg+dat Abogake
– Abog sg+gen Abogai
– Abog sg+nom Abogako
• NUMBER 'a' sg 0.7 'me' pl 0.3
• CASE 'ko' nom 0.4 'on' acc 0.3 'i' gen 0.2 'ke' dat 0.1
• PHONRULE_WEIGHTING=0.60
• NUM_TEACHING_FORMS=1500
Behavior of agent 4517 at 300 “years”
Abog pl+acc Abogmeon
Abog pl+dat Abogmeke
Abog pl+gen Abogmei
Abog pl+nom Abogmeko
Abog sg+acc Abogaon
Abog sg+dat Abogake
Abog sg+gen Abogai
Abog sg+nom Abogako
Abog pl+acc Abogmeô
Abog pl+dat Abogmeke
Abog pl+gen Abogmei
Abog pl+nom Abogmeko
Abog sg+acc Abogaô
Abog sg+dat Abogake
Abog sg+gen Abogai
Abog sg+nom Abogako
lArpux pl+acc lArpuxmeô
lArpux pl+dat lArpuxmeGe
lArpux pl+gen lArpuxmei
lArpux pl+nom lArpuxmeGo
lArpux sg+acc lArpuxaô
lArpux sg+dat lArpuxaGe
lArpux sg+gen lArpuxai
lArpux sg+nom lArpuxaGo
lIdrab pl+acc lIdravmeô
lIdrab pl+dat lIdrabmeke
lIdrab pl+gen lIdravmei
lIdrab pl+nom lIdrabmeGo
lIdrab sg+acc lIdravaô
lIdrab sg+dat lIdravaGe
lIdrab sg+gen lIdravai
lIdrab sg+nom lIdravaGo
59 paradigms covering 454 lexemes
Another run
• Initial paradigm:
– Adgar pl+acc Adgarmeon
– Adgar pl+dat Adgarmeke
– Adgar pl+gen Adgarmei
– Adgar pl+nom Adgarmeko
– Adgar sg+acc Adgaraon
– Adgar sg+dat Adgarake
– Adgar sg+gen Adgarai
– Adgar sg+nom Adgarako
• PHONRULE_WEIGHTING=0.80
• NUM_TEACHING_FORMS=1500
Behavior of agent 5061 at 300 “years”
109 paradigms covering 397 lexemes
Albir pl+acc Elbirmen
Albir pl+dat ElbirmeGe
Albir pl+gen Elbirm
Albir pl+nom ElbirmeGo
Albir sg+acc Elbiran
Albir sg+dat Elbira
Albir sg+gen Elbi
Albir sg+nom Elbira
Abog pl+acc Abogmeon
Abog pl+dat Abogmeke
Abog pl+gen Abogmei
Abog pl+nom Abogmeko
Abog sg+acc Abogaon
Abog sg+dat Abogake
Abog sg+gen Abogai
Abog sg+nom Abogako
rIsxuf pl+acc rIsxufamen
rIsxuf pl+dat rIsxufamke
rIsxuf pl+gen rIsxufme
rIsxuf pl+nom rIsxufmeGo
rIsxuf sg+acc rIsxufan
rIsxuf sg+dat rIsxufaGe
rIsxuf sg+gen rIsxufa
rIsxuf sg+nom rIsxufaGo
Utber pl+acc Ubbermen
Utber pl+dat UbbermeGe
Utber pl+gen Ubberme
Utber pl+nom UbberameGo
Utber sg+acc Ubberan
Utber sg+dat UbberaGe
Utber sg+gen Ubbera
Utber sg+nom UbberaGo
One more example
• Initial paradigm … as before
• PHONRULE_WEIGHTING=0.80
• NUM_TEACHING_FORMS=1000
Behavior of agent 4195 at 300 “years”
Abog pl+acc Abogmeon
Abog pl+dat Abogmeke
Abog pl+gen Abogmei
Abog pl+nom Abogmeko
Abog sg+acc Abogaon
Abog sg+dat Abogake
Abog sg+gen Abogai
Abog sg+nom Abogako
66 paradigms covering 250 lexemes
Odeg pl+acc Odm
Odeg pl+dat Ô
Odeg pl+gen Odm
Odeg pl+nom Oxm
Odeg sg+acc O
Odeg sg+dat O
Odeg sg+gen O
Odeg sg+nom O
dugfIp pl+acc dikfIdm
dugfIp pl+dat dikfÎ
dugfIp pl+gen dikfIdm
dugfIp pl+nom dikfIxm
dugfIp sg+acc dikfI
dugfIp sg+dat dikfI
dugfIp sg+gen dikfI
dugfIp sg+nom dikfI
fApbof pl+acc fAbofdm
fApbof pl+dat fAbofm
fApbof pl+gen fAbofdm
fApbof pl+nom fAbofxm
fApbof sg+acc fAbof
fApbof sg+dat fAbof
fApbof sg+gen fAbof
fApbof sg+nom fAbof
unfEr pl+acc ûfEdm
unfEr pl+dat ûfÊ
unfEr pl+gen ûfEtm
unfEr pl+nom ûfExm
unfEr sg+acc ûfE
unfEr sg+dat ûfE
unfEr sg+gen ûfE
unfEr sg+nom ûfE
exgUp pl+acc exgUdm
exgUp pl+dat exgÛ
exgUp pl+gen exgUgm
exgUp pl+nom exgUxm
exgUp sg+acc exgU
exgUp sg+dat exgU
exgUp sg+gen exgU
exgUp sg+nom exgU
One final example
• NUMBER 'a' sg 0.6 'tu' du 0.1 'me' pl 0.3
• CASE 'ko' nom 0.4 'on' acc 0.3 'i' gen 0.2 'ke' dat 0.1
• PHONRULE_WEIGHTING=0.80
• NUM_TEACHING_FORMS=1000
Final example (some agent or other)
Abbus du+acc Abbustuon
Abbus du+dat Abbustuke
Abbus du+gen Abbustui
Abbus du+nom Abbustuko
Abbus pl+acc Abbusmeon
Abbus pl+dat Abbusmeke
Abbus pl+gen Abbusmei
Abbus pl+nom Abbusmeko
Abbus sg+acc Abbusaon
Abbus sg+dat Abbusake
Abbus sg+gen Abbusai
Abbus sg+nom Abbusako
Agsaf du+acc Aksaf
Agsaf du+dat AkstuG
Agsaf du+gen Aksaf
Agsaf du+nom Aksaf
Agsaf pl+acc Aksafm
Agsaf pl+dat Aksafm
Agsaf pl+gen Aksafm
Agsaf pl+nom Aksafm
Agsaf sg+acc Aksaf
Agsaf sg+dat Aksaf
Agsaf sg+gen Aksaf
Agsaf sg+nom Aksaf
mampEl du+acc mãpEl
mampEl du+dat mãptuG
mampEl du+gen mãpEl
mampEl du+nom mãpEl
mampEl pl+acc mãpElm
mampEl pl+dat mãpElrm
mampEl pl+gen mãpElm
mampEl pl+nom mãpElm
mampEl sg+acc mãpEl
mampEl sg+dat mãpEl
mampEl sg+gen mãpEl
mampEl sg+nom mãpEl
odEs du+acc odEs
odEs du+dat ottuG
odEs du+gen odEs
odEs du+nom oktuG
odEs pl+acc odEsm
odEs pl+dat odEsrm
odEs pl+gen odEsm
odEs pl+nom odEsk
odEs sg+acc odEs
odEs sg+dat odEs
odEs sg+gen odEs
odEs sg+nom odEs
rIndar du+acc rÎdar
rIndar du+dat rÎttuG
rIndar du+gen rÎdar
rIndar du+nom rÎktuG
rIndar pl+acc rÎdarm
rIndar pl+dat rÎdarm
rIndar pl+gen rÎdarm
rIndar pl+nom rÎdarm
rIndar sg+acc rÎdar
rIndar sg+dat rÎdar
rIndar sg+gen rÎdar
rIndar sg+nom rÎdar
171 paradigms covering 228 lexemes
Questions
• Are there too many paradigms?
• Is there too much irregularity?
How many paradigms can there be?
• Russian: "nouns belong to one of three declension patterns" (Wade, Terence (1992) A Comprehensive Russian Grammar. Oxford: Blackwell)
– Wade discusses many subclasses
• From Zaliznjak, A. (1987) Gramaticheskij slovar russkogo jazyka, Russki jazyk, Moscow: – at least 500 classes spread over 55,000 nouns
How irregular can things be? Hindi/Urdu Number Names
1 eik 21 ik-kees 41 ikta-lees 61 ik-shat 81 ik-si
2 dau 22 ba-ees 42 baya-lees 62 ba-shat 82 baya-si
3 teen 23 ta-ees 43 tainta-lees 63 tere-shat 83 tera-si
4 chaar 24 chau-bees 44 chawa-lees 64 chaun-shat 84 chaura-si
5 paanch 25 pach-chees 45 painta-lees 65 paen-shat 85 picha-si
6 chay 26 chab-bees 46 chaya-lees 66 sar-shat / chay-aa-shat 86 chaya-si
7 saath 27 satta-ees 47 santa-lees 67 sataath 87 sata-si
8 aath 28 attha-ees 48 arta-lees 68 athath 88 atha-si
9 nau 29 unat-tees 49 un-chas 69 unat-tar 89
10 dus 30 tees 50 pa-chas 70 sat-tar 90 navay
11 gyaa-raan 31 ikat-tees 51 ika-vun 71 ikat-tar 91 ikan-vay
12 baa-raan 32 bat-tees 52 ba-vun 72 bahat-tar 92 ban-vay
13 te-raan 33 tain-tees 53 tera-pun 73 tehat-tar 93 teran-vay
14 chau-daan 34 chaun-tees 54 chav-van 74 chohat-tar 94 chauran-vay
15 pand-raan 35 pan-tees 55 pach-pan 75 pagat-tar 95 pichan-vay
16 so-laan 36 chat-tees 56 chap-pan 76 chayat-tar 96 chiyan-vay
17 sat-raan 37 san-tees 57 sata-van 77 satat-tar 97 chatan-vay
18 attha-raan 38 ear-tees 58 atha-van 78 athat-tar 98 athan-vay
19 un-nees 39 unta-lees 59 un-shat 79 una-si 99 ninan-vay
20 bees 40 cha-lees 60 shaat 80 assi 100 saw
Future work
• More realistic learning
• Incorporate paradigm reduction and analogy mechanisms from Experiment 2
• Add other sources of variation, such as borrowing of other forms
• Develop evaluation metrics:– Can we go beyond “look Ma, it learns”?
Acknowledgments
• Center for Advanced Studies, for release time Fall 2007
• "The National Science Foundation through TeraGrid resources provided by the National Center for Supercomputing Applications"
• Google Research grant (for infrastructure originally associated with another project…)
• For helpful discussion/suggestions:
– Chen Li
– Shalom Lappin
– Juliette Blevins
– Les Gasser & the LEADS group
– Audience at UIUC Linguistics Seminar