(Preliminary)
Experiments in Morphological Evolution
Richard Sproat, University of Illinois at Urbana-Champaign
3rd Workshop on "Quantitative Investigations in Theoretical Linguistics" (QITL-3), Helsinki, 2-4 June 2008
Overview
• The explananda
• Previous work on evolutionary modeling
• Computational models and preliminary experiments
Phenomena
• How do paradigms arise?
– Why do words fall into different inflectional "equivalence classes"?
• Why do stem alternations arise?
• Why is there syncretism?
– Why are there "rules of referral"?
Stem alternations in Sanskrit
[Table: guna-grade vs. zero-grade stem alternants]
Examples from: Stump, Gregory (2001) Inflectional Morphology: A Theory of Paradigm Structure. Cambridge University Press.
Stem alternations in Sanskrit
[Table: vrddhi-grade alternants; lexeme-class-particular patterns]
morphomic (Aronoff, M. 1994. Morphology by Itself. MIT Press.)
Evolutionary Modeling (A tiny sample)
• Hare, M. and Elman, J. L. (1995) Learning and morphological change. Cognition, 56(1):61--98.
• Kirby, S. (1999) Function, Selection, and Innateness: The Emergence of Language Universals. Oxford: Oxford University Press.
• Nettle, D. (1999) "Using Social Impact Theory to simulate language change". Lingua, 108(2-3):95--117.
• de Boer, B. (2001) The Origins of Vowel Systems. Oxford: Oxford University Press.
• Niyogi, P. (2006) The Computational Nature of Language Learning and Evolution. Cambridge, MA: MIT Press.
Experiment 1: Rules of Referral
Rules of referral
• Stump, Gregory (1993) "On rules of referral". Language, 69(3), 449-479.
– (After Zwicky, Arnold (1985) "How to describe inflection." Berkeley Linguistics Society, 11, 372-386.)
Latin declensions
Are rules of referral interesting?
• Are they useful for the learner?
– Wouldn't the learner have heard instances of every paradigm?
• Are they historically interesting?
– Does morphological theory need mechanisms to explain why they occur?
Another example: Böğüstani nominal declension
[Table: cases Nom, Acc, Gen, Dat, Loc, Inst, Abl, Illat × numbers Sg, Du, Pl]
Böğüstani: a language of Uzbekistan
ISO 639-3: bgs. Population: 15,500 (1998 Durieux). Comments: Capsicum chinense and Coffea arabica farmers.
Monte Carlo simulation (generating Böğüstani)
• Select a re-use bias B
• For each language:
– Generate a set of vowels, consonants and affix templates:
• a, i, u, e
• n f r w B s x j D
• V, C, CV, VC
– Decide on p paradigms (minimum 3), r rows (minimum 2), c columns (minimum 2)
Monte Carlo simulation
• For each paradigm in the language:
– Iterate over (r, c):
• Let α be the previous affix stored for r: with probability B retain α in L
• Let β be the previous affix stored for c: with probability B retain β in L
• If L is non-empty, set (r, c) to a random choice from L
• Otherwise generate a new affix for (r, c)
• Store (r, c)'s affix for r and c
• Note that P(new-affix) = (1−B)²
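The procedure above can be sketched in Python. This is a reconstruction from the bullets, not the original code: the function names, the upper bounds on p, r and c, and the use of language-wide (rather than per-paradigm) row/column stores are my assumptions.

```python
import random

def new_affix(vowels, consonants, templates):
    """Realize a random template, e.g. 'CV' -> consonant + vowel."""
    return "".join(random.choice(vowels if ch == "V" else consonants)
                   for ch in random.choice(templates))

def generate_language(bias, vowels="aiue", consonants="nfrwBsxjD",
                      templates=("V", "C", "CV", "VC")):
    """Generate one language's paradigms with re-use bias `bias` (= B)."""
    p = random.randint(3, 8)   # minimum 3 paradigms (upper bound assumed)
    r = random.randint(2, 6)   # minimum 2 rows
    c = random.randint(2, 6)   # minimum 2 columns
    row_store, col_store = {}, {}   # last affix stored for each row/column
    paradigms = []
    for _ in range(p):
        table = {}
        for i in range(r):
            for j in range(c):
                pool = []   # the list L of the slide
                if i in row_store and random.random() < bias:
                    pool.append(row_store[i])   # retain alpha with prob. B
                if j in col_store and random.random() < bias:
                    pool.append(col_store[j])   # retain beta with prob. B
                # P(new affix) = (1 - B)^2 when both stores are non-empty
                affix = (random.choice(pool) if pool
                         else new_affix(vowels, consonants, templates))
                table[i, j] = affix
                row_store[i] = col_store[j] = affix
        paradigms.append(table)
    return paradigms
```

Running this with bias = 0.04 yields tables like the sample languages on the following slides, with occasional shared exponents across rows, columns and paradigms.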
Sample language: bias = 0.04
Consonants: x n p w j B t r s S m
Vowels: a i u e
Templates: V, C, CV, VC
Sample language: bias = 0.04
Consonants: n f r w B s x j D
Vowels: a i u e
Templates: V, C, CV, VC
Sample language: bias = 0.04
Consonants: r p j d G D
Vowels: a i u e o y O
Templates: V, C, CV, VC, CVC, VCV, CVCV, VCVC
Sample language: bias = 0.04
Consonants: D k S n b s l t w j B g G d
Vowels: a i u e
Templates: V, C, CV, VC
Results of Monte Carlo simulations (8000 runs, 5000 languages per run)
Interim conclusion
• Syncretism, including rules of referral, may arise as a chance byproduct of tendencies to reuse inflectional exponents --- and hence reduce the number of exponents needed in the system.
• Side question: is the amount of ambiguity among inflectional exponents statistically different from that among lexemes? (cf. Beard's Lexeme-Morpheme Base Morphology)
– Probably not, since inflectional exponents tend to be shorter, so the chances of collisions are much higher
Experiment 2: Stabilizing Multiple Paradigms in a Multiagent Network
Paradigm Reduction in Multi-agent Models with Scale-Free Networks
• Agents connected in a scale-free network
• Only connected agents communicate
• Agents are more likely to update forms from interlocutors they "trust"
• Each individual agent has pressure to simplify its morphology by collapsing exponents:
– Exponent collapse is picked to minimize the increase in paradigm entropy
– Paradigms may be simplified, removing distinctions and thus reducing paradigm entropy
– As the number of exponents decreases, so does the pressure to reduce
– Agents analogize paradigms to other words
Scale-free networks
• Connection degrees follow the Yule-Simon distribution:

p(k) = ρ · B(k, ρ + 1)

where for sufficiently large k:

p(k) ≈ ρ · Γ(ρ + 1) · k^−(ρ+1)

i.e. it reduces to Zipf's law (cf. Baayen, Harald (2000) Word Frequency Distributions. Springer.)
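Such heavy-tailed degree distributions arise from preferential attachment (the Barabási-Albert construction). The slides do not say how the networks were built, so the following is an illustrative sketch, not the model's actual code:

```python
import random

def preferential_attachment(n, m=1, seed=0):
    """Grow a scale-free graph: each new node links to m existing nodes
    chosen with probability proportional to their current degree."""
    rng = random.Random(seed)
    # small complete seed graph on m+1 nodes
    edges = [(i, j) for i in range(m + 1) for j in range(i)]
    stubs = [v for e in edges for v in e]   # each node repeated per edge end
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(stubs))  # degree-proportional choice
        for t in targets:
            edges.append((new, t))
            stubs.extend((new, t))
    return edges

# degree histogram for a 1000-node network, as in the plot that follows
degrees = {}
for a, b in preferential_attachment(1000, m=1, seed=42):
    degrees[a] = degrees.get(a, 0) + 1
    degrees[b] = degrees.get(b, 0) + 1
```

A few nodes end up as hubs with degrees far above the mean of about 2, while most nodes keep one or two connections, matching the power-law tail above.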
Scale-free vs. Random: 1000 nodes
Relevance of scale-free networks
• Social networks are scale-free
• Nodes with multiple connections seem to be relevant for language change.
– cf. James Milroy and Lesley Milroy (1985) "Linguistic change, social network and speaker innovation." Journal of Linguistics, 21:339-384.
Scale-free networks in the model
• Agents communicate individual forms to other agents
• When two agents differ on a form, one agent will update its form with a probability p proportional to how well connected the other agent is:
– p = MaxP × ConnectionDegree(agent)/MaxConnectionDegree
– (Similar to PageRank)
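In code the update decision is a one-liner; this is a sketch whose parameter names simply mirror the formula above:

```python
import random

def maybe_update(my_form, other_form, other_degree, max_degree, max_p):
    """Adopt the interlocutor's form with probability proportional to the
    interlocutor's connectedness: p = max_p * other_degree / max_degree."""
    p = max_p * other_degree / max_degree
    return other_form if random.random() < p else my_form
```

With max_p = 0.2, a maximally connected hub is copied from 20% of the time, while a leaf with one of 50 possible connections is copied from only 0.4% of the time.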
Paradigm entropy
• For exponents φ and morphological functions μ, define the Paradigm Entropy as:

H(μ | φ) = − Σ_φ p(φ) Σ_μ p(μ | φ) log p(μ | φ)

(NB: this is really just the conditional entropy)
• If each exponent is unambiguous, the paradigm entropy is 0
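Under the simplifying assumption that all paradigm cells are equiprobable (the actual simulation assigns each slot its own probability), the quantity can be computed as follows; the names are mine:

```python
from collections import defaultdict
from math import log2

def paradigm_entropy(cells):
    """H(mu | phi): uncertainty about the morphological function mu given
    the exponent phi, for a {function: exponent} paradigm table."""
    by_exponent = defaultdict(list)
    for mu, phi in cells.items():
        by_exponent[phi].append(mu)
    n = len(cells)
    # with equiprobable cells, each exponent phi contributes
    # p(phi) * log2(number of functions it expresses)
    return sum((len(mus) / n) * log2(len(mus))
               for mus in by_exponent.values())
```

An unambiguous paradigm scores 0; total syncretism of a two-cell paradigm scores 1 bit.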
Example
Syncretism tends to be most common in “rarer” parts of paradigm
Old Latin 1st/2nd Declensions
puella, –aī 'girl, maiden' f.
Singular Plural
Nominative puella puellai
Genitive puellās/-es/-aī puellōm/ -āsom
Dative puellai puellaīs/-eīs/ -abos
Accusative puellam puellā
Ablative puellād puellaīs/-eīs/ -abos
Vocative puella puellai
Locative puellā puellaīs/-eīs
campos, –oī 'field, plain' m.
saxom, –oī 'rock, stone' n.
Singular Plural Singular Plural
Nominative campos campoī saxom saxa
Genitive campoī campōm/ -ōsom saxoī saxōm/ -ōsom
Dative campoī campoīs saxoī saxoīs
Accusative campom campōs saxom saxa
Ablative campōd campoīs saxōd saxoīs/ -oes
Vocative campe campoī saxe saxoī
Locative campō campoīs saxō saxoīs/ -oes
Simulation
• 100 agents in a scale-free or random network
– Roughly 250 connections in either case
• 20 bases
• 5 "cases", 2 "numbers": each slot associated with a probability
• Max probability of updating one's form for a given slot given what another agent has is 0.2 or 0.5
• Probability of analogizing within one's own vocabulary is 0.01, 0.02 or 0.05
– Also a mode where we force analogy every 50 iterations
– Analogize to words within the same "analogy group" (4 such groups in the current simulation)
– Winner-takes-all strategy
• (Numbers in the titles of the ensuing plots are given as UpdateProb/AnalogyProb, e.g. 0.2/0.01)
• Run for 1000 iterations
Features of simulation
• At the nth iteration, compute:
– The paradigm distribution over agents for each word
• Paradigm purity is the proportion of agents holding the "winning paradigm"
– The number of distinct winning paradigms
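Both quantities are simple to compute from the per-agent paradigm assignments. A sketch, representing each agent's paradigm for a word as a hashable tuple of exponents:

```python
from collections import Counter

def purity_and_winner(paradigms):
    """Paradigm purity for one word: the share of agents holding the most
    common ('winning') paradigm, plus that winning paradigm itself."""
    winner, count = Counter(paradigms).most_common(1)[0]
    return count / len(paradigms), winner

def num_winning_paradigms(words):
    """Number of distinct winning paradigms across all words."""
    return len({purity_and_winner(ps)[1] for ps in words.values()})
```

For example, if three of four agents agree on a word's paradigm, its purity is 0.75.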
Scale-free Network: 0.2/0.01
Scale-free network: 0.5/0.05
Random network: 0.5/0.05
Scale-free network: 0.5/0.05, 5000 runs
Random network: 0.5/0.05, 5000 runs
Scale-free network: 0.5/0.00, 5000 runs: No analogy
Scale-free network: 0.5/0.00, 30,000 runs: No analogy
Sample final state
0.24 0.21 0.095 0.095 0.06 0.12 0.095 0.048 0.024 0.012
Adoption of acc/acc/acc/acc/acc/ACC/ACC/ACC/ACC/ACC in a 0.5/0.05 run
Interim conclusions
• Scale-free networks don't seem to matter: convergence behavior seems to be no different from a random network
– Is that a big surprise?
• Analogy matters
• Paradigm entropy (conditional entropy) might be a model for paradigm simplification
Experiment 3: Large-scale multi-agent evolutionary modeling with learning
(work in progress…)
Synopsis
• System is seeded with a grammar and a small number of agents
– Initial grammars all show an agglutinative pattern
– Each agent randomly selects a set of phonetic rules to apply to forms
– Agents are assigned to one of a small number of social groups
• 2 parents "beget" child agents.
– Children are exposed to a predetermined number of training forms combined from both parents
• Forms are presented proportional to their underlying “frequency”
– Children must learn to generalize to unseen slots for words
– Learning algorithm similar to:
• David Yarowsky and Richard Wicentowski (2000) "Minimally supervised morphological analysis by multimodal alignment." Proceedings of ACL-2000, Hong Kong, pages 207-216.
• Features include the last n characters of the input form, plus semantic class
– Learners select the optimal surface form to derive other forms from (optimal = requiring the simplest resulting ruleset – a Minimum Description Length criterion)
• Forms are periodically pooled among all agents and the n best forms are kept for each word and each slot
• Population grows, but is kept in check by “natural disasters” and a quasi-Malthusian model of resource limitations
– Agents age and die according to reasonably realistic mortality statistics
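The base-form selection step can be illustrated with a toy MDL criterion. This is not the actual learner (which uses Yarowsky-Wicentowski-style alignments and richer features); it only shows the idea of picking the base whose derivations need the simplest ruleset, here restricted to suffix substitution:

```python
def suffix_rule(src, dst):
    """Shortest suffix substitution turning src into dst."""
    i = 0
    while i < min(len(src), len(dst)) and src[i] == dst[i]:
        i += 1
    return (src[i:], dst[i:])

def pick_base(paradigm):
    """Choose the slot whose form derives all others with the fewest
    distinct suffix rules (a Minimum Description Length flavor)."""
    def ruleset_size(base_slot):
        base = paradigm[base_slot]
        return len({suffix_rule(base, form)
                    for slot, form in paradigm.items() if slot != base_slot})
    return min(paradigm, key=ruleset_size)
```

For a purely agglutinative paradigm every slot ties; stem alternations and fused exponents break the ties in favor of more predictive bases.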
Population growth, 300 “years”
Phonological rules
• c_assimilation
• c_lenition
• degemination
• final_cdel
• n_assimilation
• r_syllabification
• umlaut
• v_nasalization
• voicing_assimilation
• vowel_apocope
• vowel_coalescence
• vowel_syncope
K = [ptkbdgmnNfvTDszSZxGCJlrhX]
L = [wy]
V = [aeiouAEIOU&@0âêîôûÂÊÎÔÛãõÕ]
## Regressive voicing assimilation
b -> p / - _ #?[ptkfTsSxC]
d -> t / - _ #?[ptkfTsSxC]
g -> k / - _ #?[ptkfTsSxC]
D -> T / - _ #?[ptkfTsSxC]
z -> s / - _ #?[ptkfTsSxC]
Z -> S / - _ #?[ptkfTsSxC]
G -> x / - _ #?[ptkfTsSxC]
J -> C / - _ #?[ptkfTsSxC]
K = [ptkbdgmnNfvTDszSZxGCJlrhX]
L = [wy]
V = [aeiouAEIOU&@0âêîôûÂÊÎÔÛãõÕ]
[td] -> D / [aeiou&âêîôûã]#? _ #?[aeiou&âêîôûã]
[pb] -> v / [aeiou&âêîôûã]#? _ #?[aeiou&âêîôûã]
[gk] -> G / [aeiou&âêîôûã]#? _ #?[aeiou&âêîôûã]
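Rules in this notation compile naturally to regular-expression replacements. A sketch for the regressive-voicing-assimilation block above (not the original rule compiler; the left-context marker `-` in the notation is not modeled, and `#` is taken to be an optional morph boundary):

```python
import re

# target -> replacement pairs from the rules b -> p, d -> t, ...
DEVOICE = {"b": "p", "d": "t", "g": "k", "D": "T",
           "z": "s", "Z": "S", "G": "x", "J": "C"}
VOICELESS = "ptkfTsSxC"
# a voiced obstruent followed by an optional '#' and a voiceless obstruent
PATTERN = re.compile("[" + "".join(DEVOICE) + "](?=#?[" + VOICELESS + "])")

def voicing_assimilation(form):
    """Apply all eight devoicing rules in one left-to-right pass."""
    return PATTERN.sub(lambda m: DEVOICE[m.group(0)], form)
```

This is the pattern behind alternations like Agsaf ~ Aksaf in the final example later in the deck: g devoices to k before voiceless s.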
Example run
• Initial paradigm:
– Abog pl+acc Abogmeon
– Abog pl+dat Abogmeke
– Abog pl+gen Abogmei
– Abog pl+nom Abogmeko
– Abog sg+acc Abogaon
– Abog sg+dat Abogake
– Abog sg+gen Abogai
– Abog sg+nom Abogako
• NUMBER 'a' sg 0.7 'me' pl 0.3
• CASE 'ko' nom 0.4 'on' acc 0.3 'i' gen 0.2 'ke' dat 0.1
• PHONRULE_WEIGHTING=0.60
• NUM_TEACHING_FORMS=1500
Behavior of agent 4517 at 300 “years”
Abog pl+acc Abogmeon
Abog pl+dat Abogmeke
Abog pl+gen Abogmei
Abog pl+nom Abogmeko
Abog sg+acc Abogaon
Abog sg+dat Abogake
Abog sg+gen Abogai
Abog sg+nom Abogako
Abog pl+acc Abogmeô
Abog pl+dat Abogmeke
Abog pl+gen Abogmei
Abog pl+nom Abogmeko
Abog sg+acc Abogaô
Abog sg+dat Abogake
Abog sg+gen Abogai
Abog sg+nom Abogako
lArpux pl+acc lArpuxmeô
lArpux pl+dat lArpuxmeGe
lArpux pl+gen lArpuxmei
lArpux pl+nom lArpuxmeGo
lArpux sg+acc lArpuxaô
lArpux sg+dat lArpuxaGe
lArpux sg+gen lArpuxai
lArpux sg+nom lArpuxaGo
lIdrab pl+acc lIdravmeô
lIdrab pl+dat lIdrabmeke
lIdrab pl+gen lIdravmei
lIdrab pl+nom lIdrabmeGo
lIdrab sg+acc lIdravaô
lIdrab sg+dat lIdravaGe
lIdrab sg+gen lIdravai
lIdrab sg+nom lIdravaGo
59 paradigms covering 454 lexemes
Another run
• Initial paradigm:
– Adgar pl+acc Adgarmeon
– Adgar pl+dat Adgarmeke
– Adgar pl+gen Adgarmei
– Adgar pl+nom Adgarmeko
– Adgar sg+acc Adgaraon
– Adgar sg+dat Adgarake
– Adgar sg+gen Adgarai
– Adgar sg+nom Adgarako
• PHONRULE_WEIGHTING=0.80
• NUM_TEACHING_FORMS=1500
Behavior of agent 5061 at 300 “years”
109 paradigms covering 397 lexemes
Albir pl+acc Elbirmen
Albir pl+dat ElbirmeGe
Albir pl+gen Elbirm
Albir pl+nom ElbirmeGo
Albir sg+acc Elbiran
Albir sg+dat Elbira
Albir sg+gen Elbi
Albir sg+nom Elbira
Abog pl+acc Abogmeon
Abog pl+dat Abogmeke
Abog pl+gen Abogmei
Abog pl+nom Abogmeko
Abog sg+acc Abogaon
Abog sg+dat Abogake
Abog sg+gen Abogai
Abog sg+nom Abogako
rIsxuf pl+acc rIsxufamen
rIsxuf pl+dat rIsxufamke
rIsxuf pl+gen rIsxufme
rIsxuf pl+nom rIsxufmeGo
rIsxuf sg+acc rIsxufan
rIsxuf sg+dat rIsxufaGe
rIsxuf sg+gen rIsxufa
rIsxuf sg+nom rIsxufaGo
Utber pl+acc Ubbermen
Utber pl+dat UbbermeGe
Utber pl+gen Ubberme
Utber pl+nom UbberameGo
Utber sg+acc Ubberan
Utber sg+dat UbberaGe
Utber sg+gen Ubbera
Utber sg+nom UbberaGo
One more example
• Initial paradigm … as before
• PHONRULE_WEIGHTING=0.80
• NUM_TEACHING_FORMS=1000
Behavior of agent 4195 at 300 “years”
Abog pl+acc Abogmeon
Abog pl+dat Abogmeke
Abog pl+gen Abogmei
Abog pl+nom Abogmeko
Abog sg+acc Abogaon
Abog sg+dat Abogake
Abog sg+gen Abogai
Abog sg+nom Abogako
66 paradigms covering 250 lexemes
Odeg pl+acc Odm
Odeg pl+dat Ô
Odeg pl+gen Odm
Odeg pl+nom Oxm
Odeg sg+acc O
Odeg sg+dat O
Odeg sg+gen O
Odeg sg+nom O
dugfIp pl+acc dikfIdm
dugfIp pl+dat dikfÎ
dugfIp pl+gen dikfIdm
dugfIp pl+nom dikfIxm
dugfIp sg+acc dikfI
dugfIp sg+dat dikfI
dugfIp sg+gen dikfI
dugfIp sg+nom dikfI
fApbof pl+acc fAbofdm
fApbof pl+dat fAbofm
fApbof pl+gen fAbofdm
fApbof pl+nom fAbofxm
fApbof sg+acc fAbof
fApbof sg+dat fAbof
fApbof sg+gen fAbof
fApbof sg+nom fAbof
unfEr pl+acc ûfEdm
unfEr pl+dat ûfÊ
unfEr pl+gen ûfEtm
unfEr pl+nom ûfExm
unfEr sg+acc ûfE
unfEr sg+dat ûfE
unfEr sg+gen ûfE
unfEr sg+nom ûfE
exgUp pl+acc exgUdm
exgUp pl+dat exgÛ
exgUp pl+gen exgUgm
exgUp pl+nom exgUxm
exgUp sg+acc exgU
exgUp sg+dat exgU
exgUp sg+gen exgU
exgUp sg+nom exgU
One final example
• NUMBER 'a' sg 0.6 'tu' du 0.1 'me' pl 0.3
• CASE 'ko' nom 0.4 'on' acc 0.3 'i' gen 0.2 'ke' dat 0.1
• PHONRULE_WEIGHTING=0.80
• NUM_TEACHING_FORMS=1000
Final example (some agent or other)
Abbus du+acc Abbustuon
Abbus du+dat Abbustuke
Abbus du+gen Abbustui
Abbus du+nom Abbustuko
Abbus pl+acc Abbusmeon
Abbus pl+dat Abbusmeke
Abbus pl+gen Abbusmei
Abbus pl+nom Abbusmeko
Abbus sg+acc Abbusaon
Abbus sg+dat Abbusake
Abbus sg+gen Abbusai
Abbus sg+nom Abbusako
Agsaf du+acc Aksaf
Agsaf du+dat AkstuG
Agsaf du+gen Aksaf
Agsaf du+nom Aksaf
Agsaf pl+acc Aksafm
Agsaf pl+dat Aksafm
Agsaf pl+gen Aksafm
Agsaf pl+nom Aksafm
Agsaf sg+acc Aksaf
Agsaf sg+dat Aksaf
Agsaf sg+gen Aksaf
Agsaf sg+nom Aksaf
mampEl du+acc mãpEl
mampEl du+dat mãptuG
mampEl du+gen mãpEl
mampEl du+nom mãpEl
mampEl pl+acc mãpElm
mampEl pl+dat mãpElrm
mampEl pl+gen mãpElm
mampEl pl+nom mãpElm
mampEl sg+acc mãpEl
mampEl sg+dat mãpEl
mampEl sg+gen mãpEl
mampEl sg+nom mãpEl
odEs du+acc odEs
odEs du+dat ottuG
odEs du+gen odEs
odEs du+nom oktuG
odEs pl+acc odEsm
odEs pl+dat odEsrm
odEs pl+gen odEsm
odEs pl+nom odEsk
odEs sg+acc odEs
odEs sg+dat odEs
odEs sg+gen odEs
odEs sg+nom odEs
rIndar du+acc rÎdar
rIndar du+dat rÎttuG
rIndar du+gen rÎdar
rIndar du+nom rÎktuG
rIndar pl+acc rÎdarm
rIndar pl+dat rÎdarm
rIndar pl+gen rÎdarm
rIndar pl+nom rÎdarm
rIndar sg+acc rÎdar
rIndar sg+dat rÎdar
rIndar sg+gen rÎdar
rIndar sg+nom rÎdar
171 paradigms covering 228 lexemes
Questions
• Are there too many paradigms?
• Is there too much irregularity?
How many paradigms can there be?
• Russian: "nouns belong to one of three declension patterns" (Wade, Terence (1992) A Comprehensive Russian Grammar. Oxford: Blackwell)
– Wade discusses many subclasses
• From Zaliznjak, A. (1987) Gramaticheskij slovar russkogo jazyka, Russki jazyk, Moscow: – at least 500 classes spread over 55,000 nouns
How irregular can things be? Hindi/Urdu Number Names
1 eik 21 ik-kees 41 ikta-lees 61 ik-shat 81 ik-si
2 dau 22 ba-ees 42 baya-lees 62 ba-shat 82 baya-si
3 teen 23 ta-ees 43 tainta-lees 63 tere-shat 83 tera-si
4 chaar 24 chau-bees 44 chawa-lees 64 chaun-shat 84 chaura-si
5 paanch 25 pach-chees 45 painta-lees 65 paen-shat 85 picha-si
6 chay 26 chab-bees 46 chaya-lees 66 sar-shat / chay-aa-shat 86 chaya-si
7 saath 27 satta-ees 47 santa-lees 67 sataath 87 sata-si
8 aath 28 attha-ees 48 arta-lees 68 athath 88 atha-si
9 nau 29 unat-tees 49 un-chas 69 unat-tar 89
10 dus 30 tees 50 pa-chas 70 sat-tar 90 navay
11 gyaa-raan 31 ikat-tees 51 ika-vun 71 ikat-tar 91 ikan-vay
12 baa-raan 32 bat-tees 52 ba-vun 72 bahat-tar 92 ban-vay
13 te-raan 33 tain-tees 53 tera-pun 73 tehat-tar 93 teran-vay
14 chau-daan 34 chaun-tees 54 chav-van 74 chohat-tar 94 chauran-vay
15 pand-raan 35 pan-tees 55 pach-pan 75 pagat-tar 95 pichan-vay
16 so-laan 36 chat-tees 56 chap-pan 76 chayat-tar 96 chiyan-vay
17 sat-raan 37 san-tees 57 sata-van 77 satat-tar 97 chatan-vay
18 attha-raan 38 ear-tees 58 atha-van 78 athat-tar 98 athan-vay
19 un-nees 39 unta-lees 59 un-shat 79 una-si 99 ninan-vay
20 bees 40 cha-lees 60 shaat 80 assi 100 saw
Future work
• More realistic learning
• Incorporate paradigm reduction and analogy mechanisms from Experiment 2
• Add other sources of variation, such as borrowing of other forms
• Develop evaluation metrics:– Can we go beyond “look Ma, it learns”?
Acknowledgments
• Center for Advanced Studies, for release time Fall 2007
• "The National Science Foundation through TeraGrid resources provided by the National Center for Supercomputing Applications"
• Google Research grant (for infrastructure originally associated with another project…)
• For helpful discussion/suggestions:
– Chen Li
– Shalom Lappin
– Juliette Blevins
– Les Gasser & the LEADS group
– Audience at UIUC Linguistics Seminar