Models of Language Evolution: Part IV
Linguistic Coherence as Emergent Property

Matilde Marcolli
CS101: Mathematical and Computational Linguistics

Winter 2015

CS101 Win2015: Linguistics Language Evolution 4

Main Reference

Partha Niyogi, The computational nature of language learning and evolution, MIT Press, 2006.


Population Dynamics Model

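• following the previous model: languages $\mu_1, \ldots, \mu_n$ and linguistic evolution in the population modelled by the ODE

$$\dot{x}_j = \sum_i x_i f_i Q_{ij} - \phi\, x_j$$

$x_j = \alpha_j$ proportion of individuals speaking language $\mu_j$; $\phi = \sum_i x_i f_i$ is the average fitness, so that $\sum_j x_j = 1$ is preserved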

• matrix $Q$ measures the fidelity of the language map (how much deviation from teacher to learner)

• $f_i$ = fitness

$$f_i = \sum_j x_j F(\mu_i, \mu_j)$$


Assumptions

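• assuming as before that

$$Q_{ii} = q \quad \text{and} \quad Q_{ij} = \frac{1-q}{n-1} \ \text{ for } i \neq j$$

$$F(\mu_i, \mu_i) = 1 \quad \text{and} \quad F(\mu_i, \mu_j) = a \ \text{ for all } i \neq j$$

$$f_i = (1-a)\, x_i + a + f_0$$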

Threshold behavior depending on parameter q

for $q$ small the only stable critical point is the uniform distribution: all $x_j = 1/n$

bifurcation at some $q = q_1$: two new critical points $r_\pm$

• one-grammar solutions emerge where the majority of the population speaks one of the languages
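The threshold in $q$ can be explored numerically. The following is a minimal sketch that Euler-integrates the ODE of this section under the symmetric assumptions above. The values $n = 5$, $a = 0.5$, $f_0 = 0$, the step size, the number of steps, the initial condition, and the use of $\phi = \sum_i x_i f_i$ are illustrative assumptions, not values taken from the slides.

```python
import numpy as np

def simulate(q, n=5, a=0.5, f0=0.0, dt=0.01, steps=20000):
    """Euler-integrate dx_j/dt = sum_i x_i f_i Q_ij - phi x_j."""
    # Learning matrix: Q_ii = q, Q_ij = (1 - q) / (n - 1) for i != j.
    Q = np.full((n, n), (1.0 - q) / (n - 1))
    np.fill_diagonal(Q, q)
    # Start slightly biased toward language 0 (an arbitrary choice).
    x = np.full(n, 0.7 / (n - 1))
    x[0] = 0.3
    for _ in range(steps):
        f = (1.0 - a) * x + a + f0      # fitness f_i under the assumptions
        phi = np.dot(x, f)              # average fitness (assumed form of phi)
        x = x + dt * ((x * f) @ Q - phi * x)
    return x

uniform_like = simulate(q=1.0 / 5)   # all entries of Q equal: no fidelity
coherent = simulate(q=0.99)          # near-perfect learning
print(np.round(uniform_like, 3))
print(np.round(coherent, 3))
```

With $q = 1/n$ the learner's language is independent of the teacher's and the population relaxes to the uniform distribution; with $q$ close to 1 the initial bias toward language 0 is amplified into a one-grammar state.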


Without fitness

• Note: same equation with $f_i = f_0$ (without fitness function)

• would have $\phi = f_0 \sum_j x_j = f_0$

• equation would be

$$\dot{x}_j = f_0 \sum_i x_i Q_{ij} - f_0\, x_j$$

which becomes a linear system of ODEs

• only equilibrium solution at $x_j = 1/n$, uniform distribution

• no bifurcation and no emergent behavior creating language coherence: those are effects of the presence of the fitness function
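The linear case can be checked directly from the spectrum of the system matrix. The sketch below builds $A = f_0 (Q^T - I)$ for the symmetric $Q$ used above and compares its eigenvalues with the closed form $0$ (uniform direction) and $f_0\, n (q-1)/(n-1)$ (all other directions). The values of $n$, $q$, $f_0$ are illustrative assumptions.

```python
import numpy as np

n, q, f0 = 5, 0.7, 1.0   # illustrative values, not from the slides
Q = np.full((n, n), (1.0 - q) / (n - 1))
np.fill_diagonal(Q, q)

# Linear system dx/dt = A x with A = f0 * (Q^T - I); Q is symmetric here.
A = f0 * (Q.T - np.eye(n))
eigenvalues = np.sort(np.linalg.eigvalsh(A))

# Closed form: 0 on the uniform direction, f0 * n * (q - 1) / (n - 1) elsewhere.
expected_decay = f0 * n * (q - 1.0) / (n - 1)
uniform = np.full(n, 1.0 / n)
print(eigenvalues, expected_decay)
```

For $q < 1$ every non-uniform direction decays, so trajectories on the simplex converge to $x_j = 1/n$ regardless of the initial condition, which is why no bifurcation can occur.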


Social Learning

• this model was based on the assumption that the learner takes input only from one teacher (with the possibility of errors in reproduction encoded in $Q_{ij}$)

• consider again the other scenario where the learner's input is coming from the entire population

• given $n$ languages $L_1, \ldots, L_n$ assume a set of expressions is especially useful for language acquisition (triggers, cues, ...)

• this gives subsets $C_i \subseteq L_i$; assume $C_i \cap C_j = \emptyset$ for $i \neq j$ (these are unambiguous cues)

• speakers of $L_i$ produce sentences randomly with distribution $P_i$ and the likelihood of producing a cue is

$$a_i = P_i(C_i)$$

• simplifying assumption: all $a_i = a$ are the same


Case of two languages

• proportions $\alpha$, $1 - \alpha$ of speakers as functions of time: $x_1(t) = \alpha(t)$, $x_2(t) = 1 - \alpha(t)$

• cue-frequency based batch learner: $m = k_1 + k_2 + k_3$

$k_1$ sentences in input that are in $C_1$

$k_2$ in $C_2$

$k_3$ are not cues

• probability of $k_1 > k_2$

$$f_{1,a,m}(x_1, x_2) = \sum \binom{m}{k_1\, k_2\, k_3} (a\, x_1(t))^{k_1} (a\, x_2(t))^{k_2} (1-a)^{k_3}$$

sum over $(k_1, k_2, k_3)$ with $m = k_1 + k_2 + k_3$ and $k_1 > k_2$

• probability $f_{2,a,m}$ of $k_1 < k_2$: same, with sum over $(k_1, k_2, k_3)$ with $m = k_1 + k_2 + k_3$ and $k_1 < k_2$


• symmetric assumption $a_i = a$ gives $f_{2,a,m}(x_1, x_2) = f_{1,a,m}(x_2, x_1)$

• probability after $m$ inputs of the learner acquiring $L_1$

$$f_1 + \frac{1}{2}(1 - f_2 - f_1)$$

(if the cue counts tie, $k_1 = k_2$, including when no cues are received at all: 1/2 chance of one language or the other)

• population dynamics equation

$$x_1(t+1) = \frac{1}{2}\bigl(1 + f_{1,a,m}(x_1(t), x_2(t)) - f_{2,a,m}(x_1(t), x_2(t))\bigr)$$

• a fixed point at $x_1 = x_2 = 1/2$: uniform distribution of the population among the two languages
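The two-language update can be evaluated exactly by enumerating $(k_1, k_2, k_3)$. The following is a minimal sketch of the sums above; the function names (`multinomial`, `f1`, `f2`, `next_x1`) and the sample values of $a$ and $m$ are hypothetical choices made for illustration.

```python
from math import factorial

def multinomial(m, k1, k2, k3):
    """Multinomial coefficient m! / (k1! k2! k3!)."""
    return factorial(m) // (factorial(k1) * factorial(k2) * factorial(k3))

def f1(x1, x2, a, m):
    """Probability that strictly more cues come from L1 than from L2."""
    total = 0.0
    for k1 in range(m + 1):
        for k2 in range(m - k1 + 1):
            if k1 > k2:
                k3 = m - k1 - k2
                total += (multinomial(m, k1, k2, k3)
                          * (a * x1) ** k1 * (a * x2) ** k2 * (1 - a) ** k3)
    return total

def f2(x1, x2, a, m):
    """Probability of k1 < k2, obtained from f1 by the symmetry a_1 = a_2 = a."""
    return f1(x2, x1, a, m)

def next_x1(x1, a, m):
    """One generation of the population dynamics equation."""
    x2 = 1.0 - x1
    return 0.5 * (1.0 + f1(x1, x2, a, m) - f2(x1, x2, a, m))

print(next_x1(0.5, a=0.4, m=10))   # the uniform distribution is a fixed point
print(next_x1(0.7, a=0.4, m=10))
```

Since $f_1 + f_2 + P(k_1 = k_2) = 1$, the update is symmetric under swapping the two languages: `next_x1(x) + next_x1(1 - x)` equals 1 for every `x`.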


• if the number of inputs $m$ is small: $x_1 = x_2 = 1/2$ is the only fixed point (stable)

• for larger $m$ other fixed points appear (one language becomes dominant)

• for larger $m$ the uniform solution $x_1 = x_2 = 1/2$ becomes unstable

• the value of $m$ where the bifurcation occurs is a function of the parameter $a$

• can also keep $m$ fixed and vary $a$:

$a$ close to zero: only $x_1 = x_2 = 1/2$ (stable fixed point)

bifurcation when $a$ grows: new stable fixed points and $x_1 = x_2 = 1/2$ becomes unstable

bifurcation occurs at a value of $a$ dependent on $m$
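The two regimes described above can be seen by iterating the two-language map from a slightly biased start. This is a sketch only: the parameter pairs $(a, m) = (0.1, 5)$ and $(0.9, 30)$, the starting point $0.6$, and the number of generations are assumptions chosen to sit clearly on either side of the bifurcation.

```python
from math import factorial

def update(x1, a, m):
    """One generation: x1 -> (1 + f1 - f2) / 2 for the cue-counting learner."""
    diff = 0.0   # accumulates f1 - f2
    for k1 in range(m + 1):
        for k2 in range(m - k1 + 1):
            if k1 == k2:
                continue   # ties contribute equally to both languages
            k3 = m - k1 - k2
            coeff = factorial(m) // (factorial(k1) * factorial(k2) * factorial(k3))
            prob = coeff * (a * x1) ** k1 * (a * (1 - x1)) ** k2 * (1 - a) ** k3
            diff += prob if k1 > k2 else -prob
    return 0.5 * (1.0 + diff)

def long_run(x1, a, m, generations=200):
    """Iterate the map for a fixed number of generations."""
    for _ in range(generations):
        x1 = update(x1, a, m)
    return x1

few_cues = long_run(0.6, a=0.1, m=5)     # few cues per learner
many_cues = long_run(0.6, a=0.9, m=30)   # many cues per learner
print(few_cues, many_cues)
```

With few cues the initial bias is forgotten and the population returns to $1/2$; with many cues the bias is amplified until one language dominates.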


Stability of x1 = x2 = 1/2: more details

• derivative at the fixed point (in $x_1$, along $x_2 = 1 - x_1$)

$$f'_{1,a,m}(1/2, 1/2) = \sum_{k_1 > k_2} \binom{m}{k_1\, k_2\, k_3} a^{m-k_3} (1-a)^{k_3} (k_1 - k_2) \left(\frac{1}{2}\right)^{k_1+k_2-1}$$

similar for $f'_{2,a,m}$

• $f'_{1,a,m}(1/2, 1/2)|_{a=0} = 0$ so by continuity for small $a$ have

$$|f'_{1,a,m}(1/2, 1/2)| < 1$$

stability while in this range

• also see that when $a = 1$, for sufficiently large $m$ have $f'_{1,a,m}(1/2, 1/2)|_{a=1} > 1$, so in between it will cross the value 1: this is where the bifurcation occurs

• emergence of linguistic coherence in the population
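The derivative formula lends itself to a direct numerical check. The sketch below evaluates the sum and locates the value of $a$ at which it crosses 1 by bisection. The function names and the bisection tolerance are hypothetical; the bisection relies on the derivative being increasing in $a$, which is an assumption of this sketch and is not stated on the slides.

```python
from math import factorial

def derivative_at_half(a, m):
    """Evaluate f'_{1,a,m}(1/2, 1/2) from the sum over k1 > k2."""
    total = 0.0
    for k1 in range(1, m + 1):
        for k2 in range(min(k1, m - k1 + 1)):   # k2 < k1 and k1 + k2 <= m
            k3 = m - k1 - k2
            coeff = factorial(m) // (factorial(k1) * factorial(k2) * factorial(k3))
            total += (coeff * a ** (m - k3) * (1 - a) ** k3
                      * (k1 - k2) * 0.5 ** (k1 + k2 - 1))
    return total

def critical_a(m, tol=1e-10):
    """Bisection for the a in (0, 1) where the derivative equals 1."""
    if derivative_at_half(1.0, m) <= 1.0:
        raise ValueError("derivative does not exceed 1 at a = 1 for this m")
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if derivative_at_half(mid, m) < 1.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

for m in (3, 5, 10, 20):
    print(m, derivative_at_half(1.0, m), critical_a(m))
```

For $m = 1$ and $m = 2$ the derivative at $a = 1$ equals exactly 1, so "sufficiently large $m$" means $m \geq 3$ in this computation; the critical value of $a$ then decreases as $m$ grows.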


Case of n languages

• learner is exposed to a mixture of languages from the environment

• learner scans incoming data for cues and chooses the language from which the largest number of cues is received

• if multiple languages have the same number of cues: pick one among them randomly

• same simplifying assumption as before: $P_i(C_i) = a$ is the same for all languages


Algorithm

1 Count cues:
   $k_i$ = number of cues in $C_i$ out of $m$ inputs
   $k_{n+1}$ = number of non-cues (in any of the languages)
   $m = k_1 + \cdots + k_n + k_{n+1}$

2 Find maximal languages: languages $L_i$ with $k_i = \max_j k_j$; $I$ = set of indices of the maximal $L_i$

3 Choose language: if $|I| = 1$ choose that language; if $|I| > 1$ choose one language randomly in the set $I$ with probability $1/|I|$

4 Or, in the naive version of step 3: if $|I| > 1$, just choose a language randomly among all $n$ with probability $1/n$
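The steps above can be sketched as a small function. The encoding of the input is an assumption made for illustration: because the cue sets are disjoint, each input sentence is represented either by the index of the language whose cue it is, or by `None` for a non-cue. The names `choose_language` and `learn` are hypothetical.

```python
import random
from collections import Counter

def choose_language(cue_counts, rng, naive=False):
    """Steps 2-3: pick a language index from the cue counts k_1..k_n."""
    n = len(cue_counts)
    best = max(cue_counts)
    maximal = [i for i, k in enumerate(cue_counts) if k == best]   # the set I
    if len(maximal) == 1:
        return maximal[0]
    # Tie: uniform over I, or over all n languages in the naive version.
    return rng.randrange(n) if naive else rng.choice(maximal)

def learn(inputs, n, rng, naive=False):
    """Step 1: count cues, then choose. Each input is an index or None."""
    counts = Counter(i for i in inputs if i is not None)
    return choose_language([counts[i] for i in range(n)], rng, naive)

rng = random.Random(0)
chosen = learn([0, None, 2, 0, None, 1], n=3, rng=rng)
print(chosen)   # language 0 has the most cues
```

Passing an explicit `random.Random` instance keeps the tie-breaking reproducible; the `naive` flag switches between step 3 and the naive alternative.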


Population Dynamics in this model

• $P = \sum_i x_i(t) P_i$ probability with which input is generated

• $p_i = p_i(t) = a\, x_i(t)$ probability of receiving a cue from language $L_i$; $p_{n+1} = 1 - a$

• probability of receiving (strictly) more cues from language $L_1$ than from any other

$$F_{1,m,a}(x_1, \ldots, x_n) = \sum \binom{m}{k_1 \cdots k_{n+1}} p_1^{k_1} \cdots p_n^{k_n}\, p_{n+1}^{k_{n+1}}$$

sum over all $(k_1, \ldots, k_{n+1})$ with $m = k_1 + \cdots + k_{n+1}$ and $k_1 > k_j$ for all $j \neq 1$

• similar for other languages, with symmetry

$$F_{i,m,a}(\cdots, x_i, \cdots, x_j, \cdots) = F_{j,m,a}(\cdots, x_j, \cdots, x_i, \cdots)$$


• in this model, probability that the learner will choose $L_i$ after $m$ input data

$$f_{i,m,a}(x_1, \ldots, x_n) = F_{i,m,a}(x_1, \ldots, x_n) + \Bigl(1 - \sum_{j=1}^{n} F_{j,m,a}(x_1, \ldots, x_n)\Bigr) \frac{1}{n}$$

(with the naive version of the choice whenever no language has strictly the most cues, which includes the cue-less case)

• Recursion relation for the population distribution in the next generation

$$x_i(t+1) = f_{i,m,a}(x_1(t), \ldots, x_n(t))$$
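For small $n$ and $m$ the recursion can be evaluated exactly by enumerating all $(k_1, \ldots, k_{n+1})$. The following is a sketch under that brute-force assumption (the cost grows quickly with $n$ and $m$); the helper `compositions` and the sample distributions are hypothetical.

```python
from math import factorial

def compositions(total, parts):
    """Yield all tuples of `parts` non-negative integers summing to `total`."""
    if parts == 1:
        yield (total,)
        return
    for first in range(total + 1):
        for rest in compositions(total - first, parts - 1):
            yield (first,) + rest

def next_generation(x, a, m):
    """Map x(t) -> x(t+1) with x_i(t+1) = F_i + (1 - sum_j F_j) / n."""
    n = len(x)
    p = [a * xi for xi in x] + [1.0 - a]   # p_1..p_n and p_{n+1}
    F = [0.0] * n
    for ks in compositions(m, n + 1):
        cues = ks[:n]
        best = max(cues)
        if cues.count(best) != 1:
            continue                       # no strict winner: handled below
        coeff = factorial(m)
        prob = 1.0
        for k, pk in zip(ks, p):
            coeff //= factorial(k)         # builds the multinomial coefficient
            prob *= pk ** k
        F[cues.index(best)] += coeff * prob
    tie = 1.0 - sum(F)                     # naive choice: 1/n each
    return [Fi + tie / n for Fi in F]

uniform_next = next_generation([1 / 3, 1 / 3, 1 / 3], a=0.5, m=6)
skewed_next = next_generation([0.5, 0.3, 0.2], a=0.8, m=8)
print(uniform_next)
print(skewed_next)
```

The uniform distribution is mapped to itself, and the output always sums to 1, so `next_generation` is a map of the simplex to itself as required.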


Fixed Points

• $f = (f_{i,m,a})_{i=1}^{n}$ continuous map $f : \Delta^{n-1} \to \Delta^{n-1}$

• Results

1 f has finite number of fixed points: at most m2n

2 for small $m$ the only fixed point is $(\frac{1}{n}, \ldots, \frac{1}{n})$, stable

3 for fixed (sufficiently large) $m$ the number of fixed points varies with $a$: for small $a$ only one fixed point (uniform distribution); as $a$ increases, bifurcation: other fixed points arise

4 large values of $a \sim 1$: uniform distribution no longer stable, only the fixed points with one dominant language are


Language Learning and Statistical Physics

• these bifurcations and the emergence of linguistic coherence are reminiscent of the behavior of the Ising model and spin glass systems in Statistical Physics

• an ensemble of interacting components

• degree of interaction governed by a thermodynamic parameter $\beta \sim 1/T$, the inverse temperature

• these systems often exhibit phase transitions between different regimes, at some critical temperature $T = T_c$ (different states of matter, loss of magnetization, etc.)


Language Evolution in Locally Connected Societies

• two possible languages: $\{L_0, L_1\} = \{0, 1\}$

• graph $G$ representing linguistic agents and their interaction

each vertex $v \in V(G)$ has an associated random variable $X_v(t)$

$X_v(t) \in \{0, 1\}$: language of the agent occupying position $v$

$X_v(t+1)$: language occupying the same position at the next step (generation)

$$P(X_v(t+1) = 1) = g_{a,m}(\mu_v(t))$$

$$\mu_v = \frac{1}{\mathrm{val}(v)} \sum_{e \in E(G):\, \partial(e) = \{v, v'\}} X_{v'}(t)$$

• only nearest neighbor interaction considered


• as before assuming $a = P_i(C_i)$ is the same for both languages

• a possible choice for the function $g_{a,m} : [0, 1] \to [0, 1]$:

$$g(x) = \frac{1}{2} + \frac{1}{2}\bigl(f_{1,a,m}(x, 1-x) - f_{1,a,m}(1-x, x)\bigr)$$

with $f_{1,a,m}$ as before, the probability of a set of cues with $k_1 > k_2$

$$f_{1,a,m}(x, 1-x) = \sum \binom{m}{k_1\, k_2\, k_3} (a x)^{k_1} (a (1-x))^{k_2} (1-a)^{k_3}$$

sum over $(k_1, k_2, k_3)$ with $m = k_1 + k_2 + k_3$ and $k_1 > k_2$
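This choice of $g_{a,m}$ is easy to evaluate exactly. The sketch below implements it from the sum above and checks a few properties that follow from the formula; the function names and the sample values of $a$ and $m$ are illustrative assumptions.

```python
from math import comb

def f1(x1, x2, a, m):
    """Probability that strictly more cues come from the first language."""
    return sum(comb(m, k1) * comb(m - k1, k2)          # multinomial coefficient
               * (a * x1) ** k1 * (a * x2) ** k2 * (1 - a) ** (m - k1 - k2)
               for k1 in range(1, m + 1)
               for k2 in range(min(k1, m - k1 + 1)))   # k2 < k1, k1 + k2 <= m

def g(x, a, m):
    """Probability of acquiring L1 when a fraction x of neighbors speak L1."""
    return 0.5 + 0.5 * (f1(x, 1 - x, a, m) - f1(1 - x, x, a, m))

for x in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(x, g(x, a=0.3, m=4))
```

Note that $g(0) = \frac{1}{2}(1-a)^m > 0$ for $a < 1$: even when no neighbor speaks $L_1$, a learner who receives no cues at all still picks $L_1$ with probability 1/2.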


• study evolution of

$$\alpha_G(t) = \frac{1}{\#V(G)} \sum_{v \in V(G)} X_v(t)$$

fraction of $L_1$-speakers at time/generation $t$

• for a complete graph have all language users connected to all others: recover the model in which learning is from the whole community

• can consider asymptotic behaviors when the size of the graph becomes large, $\#V(G) = N \to \infty$

• can simplify the geometry by making special assumptions on the graph: e.g. a square lattice
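The locally connected model can be simulated directly. The following sketch uses a ring graph, which is an assumption made to keep the example short (the slides mention complete graphs and square lattices); the graph size, $a$, $m$, the seed, and the number of generations are likewise illustrative.

```python
import random
from math import comb

def f1(x1, x2, a, m):
    """Probability that strictly more cues come from the first language."""
    return sum(comb(m, k1) * comb(m - k1, k2)
               * (a * x1) ** k1 * (a * x2) ** k2 * (1 - a) ** (m - k1 - k2)
               for k1 in range(1, m + 1)
               for k2 in range(min(k1, m - k1 + 1)))

def g(x, a, m):
    return 0.5 + 0.5 * (f1(x, 1 - x, a, m) - f1(1 - x, x, a, m))

def step(neighbors, state, a, m, rng):
    """One generation: each vertex resamples its language from its neighbors."""
    new_state = {}
    for v, nbrs in neighbors.items():
        mu = sum(state[w] for w in nbrs) / len(nbrs)   # local average mu_v
        new_state[v] = 1 if rng.random() < g(mu, a, m) else 0
    return new_state

def alpha(state):
    """Fraction of L1-speakers, alpha_G(t)."""
    return sum(state.values()) / len(state)

N = 50
ring = {v: [(v - 1) % N, (v + 1) % N] for v in range(N)}   # assumed geometry
rng = random.Random(0)
state = {v: rng.randint(0, 1) for v in ring}
history = [alpha(state)]
for _ in range(20):
    state = step(ring, state, a=0.8, m=10, rng=rng)
    history.append(alpha(state))
print(history)
```

Replacing `ring` by a dictionary in which every vertex lists all other vertices gives the complete-graph case, where $\mu_v$ is essentially the population average.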


The Ising Model of spin systems on a graph G

• configurations of spins $s : V(G) \to \{\pm 1\}$

• magnetic field $B$ and correlation strength $J$: Hamiltonian

$$H(s) = -J \sum_{e \in E(G):\, \partial(e) = \{v, v'\}} s_v s_{v'} - B \sum_{v \in V(G)} s_v$$

• first term measures degree of alignment of nearby spins

• second term measures alignment of spins with the direction of the magnetic field
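As a small worked example of the Hamiltonian, the sketch below evaluates $H(s)$ on a triangle graph. The representation (a dictionary of spins and a list of edges) and the function name `ising_energy` are assumptions made for illustration.

```python
def ising_energy(spins, edges, J=1.0, B=0.0):
    """H(s) = -J * sum over edges of s_v s_v' - B * sum over vertices of s_v."""
    interaction = sum(spins[v] * spins[w] for v, w in edges)
    field = sum(spins.values())
    return -J * interaction - B * field

triangle = [(0, 1), (1, 2), (0, 2)]
aligned = {0: 1, 1: 1, 2: 1}
mixed = {0: 1, 1: -1, 2: 1}

print(ising_energy(aligned, triangle))          # -3.0: all three edges aligned
print(ising_energy(mixed, triangle))            # 1.0: two of three edges misaligned
print(ising_energy(aligned, triangle, B=0.5))   # -4.5: field term lowers energy
```

With $J > 0$ aligned neighbors lower the energy, and with $B > 0$ spins pointing up lower it further, which matches the reading of the two terms given above.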


Equilibrium Probability Distribution

• Partition Function $Z_G(\beta)$

$$Z_G(\beta) = \sum_{s: V(G) \to \{\pm 1\}} \exp(-\beta H(s))$$

• Probability distribution on the configuration space: Gibbs measure

$$P_{G,\beta}(s) = \frac{e^{-\beta H(s)}}{Z_G(\beta)}$$

• low energy states weigh most

• at low temperature (large $\beta$): ground state dominates; at higher temperature ($\beta$ small) higher energy states also contribute
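On a very small graph the partition function and the Gibbs measure can be computed by brute force over all $2^{\#V(G)}$ configurations. The sketch below does this for a 4-cycle, which is an illustrative choice; the function names and the values of $\beta$ are assumptions.

```python
from itertools import product
from math import exp

def energy(s, edges, J=1.0, B=0.0):
    """Ising Hamiltonian for a tuple of spins s indexed by vertex number."""
    return -J * sum(s[v] * s[w] for v, w in edges) - B * sum(s)

def gibbs(n_vertices, edges, beta, J=1.0, B=0.0):
    """Return Z and the Gibbs probability of every configuration."""
    configs = list(product((-1, 1), repeat=n_vertices))
    weights = [exp(-beta * energy(s, edges, J, B)) for s in configs]
    Z = sum(weights)
    return Z, {s: w / Z for s, w in zip(configs, weights)}

cycle = [(0, 1), (1, 2), (2, 3), (3, 0)]
for beta in (0.0, 0.5, 2.0):
    Z, P = gibbs(4, cycle, beta)
    ground = P[(1, 1, 1, 1)] + P[(-1, -1, -1, -1)]
    print(beta, Z, ground)
```

At $\beta = 0$ all 16 configurations are equally likely; as $\beta$ grows the two fully aligned ground states absorb almost all of the probability.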


Average Spin Magnetization

$$M_G(\beta) = \frac{1}{\#V(G)} \sum_{s: V(G) \to \{\pm 1\}} \sum_{v \in V(G)} s_v\, P_{G,\beta}(s)$$

• Free energy $F_G(\beta, B) = \log Z_G(\beta, B)$

$$M_G(\beta) = \frac{1}{\#V(G)} \frac{1}{\beta} \left. \frac{\partial F_G(\beta, B)}{\partial B} \right|_{B=0}$$

• thermodynamic limit: $\#V(G) = N \to \infty$

$$m(\beta) = \lim_{\#V(G) \to \infty} M_G(\beta)$$

• in these thermodynamic limits one needs to fix a way in which the geometry of the graph grows: if it is a lattice, just grow the size $N$ of the lattice; for other kinds of graphs, fix how smaller graphs are embedded in larger graphs
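The relation between the magnetization and the derivative of $\log Z$ can be verified numerically on a small graph. In the sketch below the derivative is approximated by a central finite difference and, as an assumption made so that the check is non-trivial, it is evaluated at a field $B \neq 0$ as well as at $B = 0$. The graph, $\beta$, and the step `h` are illustrative.

```python
from itertools import product
from math import exp, log

def log_Z_and_M(n, edges, beta, J, B):
    """Return log Z and the average spin per vertex under the Gibbs measure."""
    Z = 0.0
    weighted_spin = 0.0
    for s in product((-1, 1), repeat=n):
        H = -J * sum(s[v] * s[w] for v, w in edges) - B * sum(s)
        w = exp(-beta * H)
        Z += w
        weighted_spin += sum(s) * w
    return log(Z), weighted_spin / (Z * n)

def M_from_free_energy(n, edges, beta, J, B, h=1e-5):
    """(1 / n) (1 / beta) dF/dB with F = log Z, by central differences."""
    F_plus, _ = log_Z_and_M(n, edges, beta, J, B + h)
    F_minus, _ = log_Z_and_M(n, edges, beta, J, B - h)
    return (F_plus - F_minus) / (2 * h) / (beta * n)

cycle = [(0, 1), (1, 2), (2, 3), (3, 0)]
for B in (0.0, 0.3):
    _, direct = log_Z_and_M(4, cycle, beta=0.7, J=1.0, B=B)
    via_F = M_from_free_energy(4, cycle, beta=0.7, J=1.0, B=B)
    print(B, direct, via_F)
```

On any finite graph the magnetization at $B = 0$ vanishes by the symmetry $s \mapsto -s$, so a non-zero $m(\beta)$ can only arise in the limit $N \to \infty$.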


Ising Model on a 2-dimensional lattice

• $\exists$ critical temperature $T = T_c$ where a phase transition occurs

• for $T > T_c$ the equilibrium state has $m(T) = 0$ (computed with respect to the equilibrium Gibbs measure $P_{G,\beta}$)

• demagnetization: on average as many up as down spins

• for $T < T_c$ have $m(T) > 0$: spontaneous magnetization

• Warning: beware of thermodynamic limits!

• a lot of technical problems in these spin glass models go into how one takes these limits where the size $N$ of the graph goes to infinity, $N \to \infty$ (even for simple geometries like the lattice case)


A Spin Glass model of Language Learning

• a multilingual society = a graph $G$

• linguistic agents = vertices of the graph $V(G)$

• which agents interact with which others = edges $E(G)$

• possible languages = spin states
  - Ising models $\{\pm 1\}$: two languages model
  - Potts models $\{1, \ldots, q\}$: many languages model

• distribution of population across different languages = average magnetization

• previous analysis for “input from whole society” = mean field theory for the case of the Ising model on a complete graph


Syntactic Parameters and Ising/Potts Models

• a different view on how to use spin glass models for language evolution

• characterize the set of $n = 2^N$ languages $L_i$ by binary strings of $N$ syntactic parameters (Ising model)

• or by ternary strings (Potts model) if one takes values $\pm 1$ for parameters that are set and $0$ for parameters that are not defined in a certain language

• a system of $n$ interacting languages = graph $G$ with $n = \#V(G)$

• languages $L_i$ = vertices of the graph (thought of as, for instance, the language that occupies a certain geographic area)

• languages that have interaction with each other = edges $E(G)$ (geographical proximity, or high volume of exchange for other reasons)


[Figure: graph of language interaction (detail) from the Global Language Network of the MIT Media Lab, with interaction strengths $J_e$ on edges based on the number of book translations]


• if only one syntactic parameter, would have an Ising model on the graph $G$: configurations $s : V(G) \to \{\pm 1\}$ set the parameter at all the locations on the graph

• variable interaction energies along edges (some pairs of languages interact more than others)

• Hamiltonian with edge-dependent correlation strengths $J_e$ (written for $N$ parameters, with $s_{v,i}$ the value of the $i$-th parameter at vertex $v$)

$$H(s) = -\sum_{e \in E(G):\, \partial(e) = \{v, v'\}} \sum_{i=1}^{N} J_e\, s_{v,i}\, s_{v',i}$$

• if $N$ parameters, configurations

$$s = (s_1, \ldots, s_N) : V(G) \to \{\pm 1\}^N$$

• if all $N$ parameters are independent, then it would be like having $N$ non-interacting copies of an Ising model on the same graph $G$ (or $N$ independent choices of an initial state in an Ising model on $G$)


• an interesting problem in this model is the entailment of parameters: it is known that flipping certain syntactic parameters causes others to flip as well

• so in addition to the edge interaction, instead of the term in $H$ for alignment with an external magnetic field

$$-B \sum_{v \in V(G)} s_v$$

should have a term that favors alignment of entailed parameters

• set of parameters $P$ with $N = \#P$; subset $E \subset P \times P$ of entailments: pairs of parameters $(\Pi, \Pi')$ such that flipping $\Pi$ causes $\Pi'$ to flip as well

• term in the Hamiltonian favoring alignment of entailed parameters

$$-B \sum_{v \in V(G)} \sum_{(\Pi, \Pi') \in E} s_{v,\Pi}\, s_{v,\Pi'}$$
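Combining the edge interaction with the entailment term gives an energy that is straightforward to evaluate for a given assignment of parameters. The sketch below is one possible reading: the language labels `"A"` and `"B"`, the parameter names `"P1"` and `"P2"`, the edge strength, and the single entailment are all hypothetical and serve only to illustrate the formula.

```python
def parameter_energy(config, edge_strengths, entailments, B=1.0):
    """Edge term with strengths J_e plus the entailment term with coupling B.

    config[v] maps each parameter name to +1 or -1 for language v;
    edge_strengths maps a pair (v, w) to J_e;
    entailments is a list of pairs (p, p2) of parameter names.
    """
    interaction = 0.0
    for (v, w), J_e in edge_strengths.items():
        interaction += J_e * sum(config[v][p] * config[w][p] for p in config[v])
    entailed = sum(config[v][p] * config[v][p2]
                   for v in config for (p, p2) in entailments)
    return -interaction - B * entailed

edges = {("A", "B"): 2.0}          # hypothetical interaction strength
entail = [("P1", "P2")]            # hypothetical entailment P1 -> P2

disagree = {"A": {"P1": 1, "P2": 1}, "B": {"P1": 1, "P2": -1}}
agree = {"A": {"P1": 1, "P2": 1}, "B": {"P1": 1, "P2": 1}}

print(parameter_energy(disagree, edges, entail))
print(parameter_energy(agree, edges, entail))
```

The configuration in which both languages agree on both parameters, and the entailed parameters are aligned within each language, has the lower energy, so it carries more weight under the corresponding Gibbs measure.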

