Models of Language Evolution: Part IV
Linguistic Coherence as Emergent Property
Matilde Marcolli
CS101: Mathematical and Computational Linguistics
Winter 2015
CS101 Win2015: Linguistics Language Evolution 4
Main Reference
Partha Niyogi, The computational nature of language learning and evolution, MIT Press, 2006.
Population Dynamics Model
• following previous model: languages $\mu_1, \ldots, \mu_n$ and linguistic evolution in population modelled by ODE
$$\dot{x}_j = \sum_i x_i f_i Q_{ij} - \phi\, x_j$$
$x_j = \alpha_j$: proportion of individuals speaking language $\mu_j$
• matrix $Q$ measures fidelity of language map (how much deviation from teacher to learner)
• $f_i$ = fitness
$$f_i = \sum_j x_j F(\mu_i, \mu_j)$$
Assumptions
• assuming as before that
$$Q_{ii} = q \quad\text{and}\quad Q_{ij} = \frac{1-q}{n-1} \ \text{ for } i \neq j$$
$$F(\mu_i, \mu_i) = 1 \quad\text{and}\quad F(\mu_i, \mu_j) = a \ \text{ for all } i \neq j$$
$$f_i = (1-a)\,x_i + a + f_0$$
Threshold behavior depending on parameter $q$
  - for $q$ small only stable critical point is uniform distribution: all $x_j = 1/n$
  - bifurcation at some $q = q_1$: two new critical points $r_\pm$
• one-grammar solutions emerge where the majority of population speaks one of the languages
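The threshold behavior can be explored numerically. The following is a minimal sketch, not the method of the reference: it integrates the ODE with forward Euler under the assumptions on $Q$ and $F$ above. It assumes $\phi = \sum_i x_i f_i$ (average fitness, the choice that keeps $\sum_j x_j = 1$); the slides do not define $\phi$ explicitly. The function names, step size and parameter defaults are illustrative choices.

```python
def fitness(x, a, f0):
    # f_i = (1 - a) x_i + a + f0, from F(mu_i, mu_i) = 1 and F(mu_i, mu_j) = a
    return [(1 - a) * xi + a + f0 for xi in x]

def euler_step(x, q, a, f0, dt):
    """One forward-Euler step of dx_j/dt = sum_i x_i f_i Q_ij - phi x_j."""
    n = len(x)
    off = (1 - q) / (n - 1)                      # Q_ij for i != j
    f = fitness(x, a, f0)
    phi = sum(xi * fi for xi, fi in zip(x, f))   # assumption: phi = average fitness
    new_x = []
    for j in range(n):
        # sum_i x_i f_i Q_ij, split into the diagonal and off-diagonal parts
        inflow = q * x[j] * f[j] + off * (phi - x[j] * f[j])
        new_x.append(x[j] + dt * (inflow - phi * x[j]))
    return new_x

def simulate(x0, q, a=0.5, f0=0.0, dt=0.05, steps=20000):
    x = list(x0)
    for _ in range(steps):
        x = euler_step(x, q, a, f0, dt)
    return x

# q = 1/n makes Q uniform: the population relaxes to the uniform distribution
print(simulate([0.5, 0.3, 0.2], q=1/3))
# q = 1 (perfect learning): the initially most common language takes over
print(simulate([0.5, 0.25, 0.25], q=1.0))
```

Intermediate values of $q$ can be scanned with the same function to locate where the uniform state stops attracting; the location depends on $a$, $f_0$ and $n$.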
Without fitness
• Note: same equation with $f_i = f_0$ (without fitness function)
• would have $\phi = f_0 \sum_j x_j = f_0$
• equation would be
$$\dot{x}_j = f_0 \sum_i x_i Q_{ij} - f_0\, x_j$$
becomes a linear system of ODEs
• only equilibrium solution at $x_j = 1/n$, uniform distribution
• no bifurcation and no emergent behavior creating language coherence: those are effects of the presence of the fitness function
Social Learning
• this model was based on assumption that learner takes input only from one teacher (with the possibility of errors in reproduction encoded in $Q_{ij}$)
• consider again other scenario where learner's input is coming from the entire population
• given $n$ languages $L_1, \ldots, L_n$ assume a set of expressions is especially useful for language acquisition (triggers, cues, ...)
• this gives subsets $C_i \subseteq L_i$; assume $C_i \cap C_j = \emptyset$ for $i \neq j$ (these are unambiguous cues)
• speakers of $L_i$ produce sentences randomly with distribution $P_i$ and likelihood of producing a cue is
$$a_i = P_i(C_i)$$
• simplifying assumption: all $a_i = a$ same
Case of two languages
• proportions $\alpha$, $1-\alpha$ of speakers: function of time $x_1(t) = \alpha(t)$, $x_2(t) = 1 - \alpha(t)$
• cue-frequency based batch learner: $m = k_1 + k_2 + k_3$
  - $k_1$ sentences in input that are in $C_1$
  - $k_2$ in $C_2$
  - $k_3$ are not cues
• probability of $k_1 > k_2$
$$f_{1,a,m}(x_1, x_2) = \sum \binom{m}{k_1\; k_2\; k_3} (a\,x_1(t))^{k_1} (a\,x_2(t))^{k_2} (1-a)^{k_3}$$
sum over $(k_1, k_2, k_3)$ with $m = k_1 + k_2 + k_3$ and $k_1 > k_2$
• probability $f_{2,a,m}$ of $k_1 < k_2$, same with sum over $(k_1, k_2, k_3)$ with $m = k_1 + k_2 + k_3$ and $k_1 < k_2$
• symmetric assumption $a_i = a$ gives $f_{2,a,m}(x_1, x_2) = f_{1,a,m}(x_2, x_1)$
• probability after $m$ inputs of learner acquiring $L_1$
$$f_1 + \frac{1}{2}(1 - f_2 - f_1)$$
(if $k_1 = k_2$, in particular if no cues received at all: 1/2 chance of one language or other)
• population dynamics equation
$$x_1(t+1) = \frac{1}{2}\Bigl(1 + f_{1,a,m}(x_1(t), x_2(t)) - f_{2,a,m}(x_1(t), x_2(t))\Bigr)$$
• a fixed point at $x_1 = x_2 = 1/2$: uniform distribution of population among the two languages
• if number of inputs $m$ small: $x_1 = x_2 = 1/2$ is the only fixed point (stable)
• for larger $m$ other fixed points appear (one language becomes dominant)
• for larger $m$ uniform solution $x_1 = x_2 = 1/2$ becomes unstable
• the value of $m$ where bifurcation occurs is a function of parameter $a$
• can also keep $m$ fixed and vary $a$:
  - $a$ close to zero: only $x_1 = x_2 = 1/2$ (stable fixed point)
  - bifurcation when $a$ grows: new stable fixed points and $x_1 = x_2 = 1/2$ becomes unstable
  - bifurcation occurs at a value of $a$ dependent on $m$
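The two-language map is small enough to evaluate exactly. The sketch below (function names are illustrative, not from the reference) computes $f_{1,a,m}$ by direct summation over $(k_1, k_2, k_3)$, uses the symmetry $f_{2,a,m}(x_1,x_2) = f_{1,a,m}(x_2,x_1)$, and iterates the population dynamics equation.

```python
from math import factorial

def multinomial(m, k1, k2, k3):
    return factorial(m) // (factorial(k1) * factorial(k2) * factorial(k3))

def f1(x1, x2, a, m):
    """Probability that strictly more cues of L1 than of L2 arrive in m inputs."""
    total = 0.0
    for k1 in range(m + 1):
        # k2 runs over 0 .. min(k1 - 1, m - k1), so that k2 < k1 and k1 + k2 <= m
        for k2 in range(min(k1, m - k1 + 1)):
            k3 = m - k1 - k2
            total += (multinomial(m, k1, k2, k3)
                      * (a * x1) ** k1 * (a * x2) ** k2 * (1 - a) ** k3)
    return total

def next_x1(x1, a, m):
    """x1(t+1) = (1 + f1 - f2) / 2, with f2(x1, x2) = f1(x2, x1)."""
    x2 = 1 - x1
    return 0.5 * (1 + f1(x1, x2, a, m) - f1(x2, x1, a, m))

def iterate(x1, a, m, generations=200):
    for _ in range(generations):
        x1 = next_x1(x1, a, m)
    return x1

print(iterate(0.9, a=0.3, m=3))   # few cues: returns to the uniform state 1/2
print(iterate(0.6, a=1.0, m=3))   # every input is a cue: L1 takes over
```

For $m = 1$ the map reduces to $x_1 \mapsto \tfrac12 + a\,(x_1 - \tfrac12)$, a contraction toward $1/2$ whenever $a < 1$, which is the "small $m$" regime.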
Stability of $x_1 = x_2 = 1/2$: more details
• derivative at the fixed point
$$f'_{1,a,m}(1/2, 1/2) = \sum_{k_1 > k_2} \binom{m}{k_1\; k_2\; k_3}\, a^{m-k_3} (1-a)^{k_3} (k_1 - k_2) \left(\frac{1}{2}\right)^{k_1+k_2-1}$$
similar for $f'_{2,a,m}$
• $f'_{1,a,m}(1/2, 1/2)|_{a=0} = 0$ so by continuity for small $a$ have
$$|f'_{1,a,m}(1/2, 1/2)| < 1$$
stability while in this range
• also see that when $a = 1$, for sufficiently large $m$ have $f'_{1,a,m}(1/2, 1/2)|_{a=1} > 1$ so in between will cross value 1: where bifurcation occurs
• emergence of linguistic coherence in the population
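The crossing of the value 1 can be located numerically. This sketch evaluates the derivative formula above term by term and bisects in $a$ for fixed $m$. The bisection assumes the derivative crosses 1 only once on $[0,1]$, which holds in the $m = 3$ example but is not claimed in general; `slope_at_half` and `critical_a` are illustrative names.

```python
from math import factorial

def slope_at_half(a, m):
    """Derivative of f_{1,a,m} at x1 = x2 = 1/2, summed over k1 > k2."""
    total = 0.0
    for k1 in range(1, m + 1):
        for k2 in range(min(k1, m - k1 + 1)):   # k2 < k1, k1 + k2 <= m
            k3 = m - k1 - k2
            coeff = factorial(m) // (factorial(k1) * factorial(k2) * factorial(k3))
            total += (coeff * a ** (m - k3) * (1 - a) ** k3
                      * (k1 - k2) * 0.5 ** (k1 + k2 - 1))
    return total

def critical_a(m, tol=1e-10):
    """Bisect for the a in [0, 1] where the slope equals 1; None if it stays <= 1."""
    lo, hi = 0.0, 1.0
    if slope_at_half(hi, m) <= 1.0:
        return None
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if slope_at_half(mid, m) < 1.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

print(slope_at_half(1.0, 3))   # 1.5
print(critical_a(3))           # between 0.55 and 0.56
```

For $m = 3$ the sum collapses to $1.5a^3 - 3a^2 + 3a$, which equals $1.5$ at $a = 1$ and reaches 1 slightly above $a = 0.55$. For $m = 1$ and $m = 2$ the slope at $a = 1$ is exactly 1, so no bifurcation is found.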
Case of n languages
• learner is exposed to a mixture of languages from the environment
• learner scans incoming data for cues and chooses the language from which largest number of cues is received
• if multiple languages with same number of cues: pick one among them randomly
• same simplifying assumption as before: $P_i(C_i) = a$ same for all languages
Algorithm
1 Count cues
  - $k_i$ = number of cues in $C_i$ out of $m$ inputs
  - $k_{n+1}$ = number of non-cues (in any of the languages)
  - $m = k_1 + \cdots + k_n + k_{n+1}$
2 Find maximal languages: languages $L_i$ with $k_i = \max_j k_j$; $I$ = set of indices of maximal $L_i$
3 Choose language: if $|I| = 1$ choose that language; if $|I| > 1$ choose one language randomly in the set $I$ with probability $1/|I|$
4 Naive version (replacing step 3 when $|I| > 1$): just choose a language randomly among all $n$ with probability $1/n$
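The steps above can be sketched as follows. The input encoding is a hypothetical choice made for the example: each input is either the index of the language whose cue it is, or `None` for a non-cue. The function names are likewise illustrative.

```python
import random

def count_cues(inputs, n):
    """Step 1: count cues per language; None marks a non-cue."""
    counts = [0] * n
    for label in inputs:
        if label is not None:
            counts[label] += 1
    return counts

def choose_language(cue_counts, rng=random, naive=False):
    """Steps 2-4: pick a language index from the cue counts k_1..k_n."""
    best = max(cue_counts)
    maximal = [i for i, k in enumerate(cue_counts) if k == best]   # the set I
    if len(maximal) == 1:
        return maximal[0]
    if naive:
        return rng.randrange(len(cue_counts))   # uniform over all n languages
    return rng.choice(maximal)                  # uniform over the tied languages

counts = count_cues([0, None, 2, 0, None], n=3)
print(counts, choose_language(counts))          # [2, 0, 1] 0
```

Passing a seeded `random.Random` instance as `rng` makes the tie-breaking reproducible.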
Population Dynamics in this model
• $P = \sum_i x_i(t) P_i$ probability with which input is generated
• $p_i = p_i(t) = a\,x_i(t)$ probability of receiving a cue from language $L_i$; $p_{n+1} = 1 - a$
• probability of receiving (strictly) more cues from language $L_1$ than from any other
$$F_{1,m,a}(x_1, \ldots, x_n) = \sum \binom{m}{k_1 \cdots k_{n+1}}\, p_1^{k_1} \cdots p_n^{k_n}\, p_{n+1}^{k_{n+1}}$$
sum over all $(k_1, \ldots, k_{n+1})$ with $m = k_1 + \cdots + k_{n+1}$ and $k_1 > k_j$ for all $j \neq 1$
• similar for other languages with symmetry
$$F_{i,m,a}(\cdots, x_i, \cdots, x_j, \cdots) = F_{j,m,a}(\cdots, x_j, \cdots, x_i, \cdots)$$
• in this model, probability that learner will choose $L_i$ after $m$ input data
$$f_{i,m,a}(x_1, \ldots, x_n) = F_{i,m,a}(x_1, \ldots, x_n) + \Bigl(1 - \sum_{j=1}^{n} F_{j,m,a}(x_1, \ldots, x_n)\Bigr)\frac{1}{n}$$
(with naive version of choice in the cue-less case)
• Recursion relation for population distribution in next generation
$$x_i(t+1) = f_{i,m,a}(x_1(t), \ldots, x_n(t))$$
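For small $n$ and $m$ the recursion can be evaluated by brute-force enumeration of all count vectors. This is a minimal sketch under that assumption (the cost grows like $(m+1)^n$); the function names are illustrative, not from the reference.

```python
from itertools import product
from math import factorial

def strict_max_probs(x, a, m):
    """F_i for each i: probability that L_i receives strictly the most cues."""
    n = len(x)
    p = [a * xi for xi in x] + [1 - a]        # p_1..p_n and p_{n+1}
    F = [0.0] * n
    for ks in product(range(m + 1), repeat=n):
        rest = m - sum(ks)                    # k_{n+1}, the number of non-cues
        if rest < 0:
            continue
        weight = factorial(m)                 # builds the multinomial term
        for k, pk in zip(ks + (rest,), p):
            weight = weight * pk ** k / factorial(k)
        top = max(ks)
        if ks.count(top) == 1:                # unique maximal language
            F[ks.index(top)] += weight
    return F

def next_generation(x, a, m):
    """x_i(t+1) = F_i + (1 - sum_j F_j) / n, the naive tie-breaking rule."""
    F = strict_max_probs(x, a, m)
    leftover = (1 - sum(F)) / len(x)
    return [Fi + leftover for Fi in F]

print(next_generation([1/3, 1/3, 1/3], a=0.6, m=4))   # uniform stays uniform
print(next_generation([0.5, 0.3, 0.2], a=1.0, m=5))   # the majority language gains
```

In the second call the first language has $F_1 = 0.5$ and gains a third of the tie probability on top, so its share rises to about $0.59$.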
Fixed Points
• $f = (f_{i,m,a})_{i=1}^{n}$ continuous map $f : \Delta^{n-1} \to \Delta^{n-1}$
• Results
1 f has finite number of fixed points: at most m2n
2 for small $m$ only fixed point is $(\frac{1}{n}, \ldots, \frac{1}{n})$, stable
3 for fixed (sufficiently large) $m$ number of fixed points varies with $a$: for small $a$ only one fixed point (uniform distribution); as $a$ increases bifurcation: other fixed points arise
4 large values of $a \sim 1$: uniform distribution no longer stable, only the fixed points with one dominant language are stable
Language Learning and Statistical Physics
• these bifurcations and emergence of linguistic coherence are reminiscent of behavior of Ising model and spin glass systems in Statistical Physics
• an ensemble of interacting components
• degree of interaction governed by a thermodynamic parameter $\beta \sim 1/T$, inverse temperature
• these systems often exhibit phase transitions between different regimes, at some critical temperature $T = T_c$ (different states of matter, loss of magnetization, etc.)
Language Evolution in Locally Connected Societies
• two possible languages: $\{L_0, L_1\} = \{0, 1\}$
• Graph $G$ representing linguistic agents and their interaction
  - each vertex $v \in V(G)$ has an associated random variable $X_v(t)$
  - $X_v(t) \in \{0, 1\}$: language of agent occupying position $v$
  - $X_v(t+1)$: language occupying same position at next step (generation)
$$P(X_v(t+1) = 1) = g_{a,m}(\mu_v(t))$$
$$\mu_v = \frac{1}{\mathrm{val}(v)} \sum_{e \in E(G):\, \partial(e) = \{v, v'\}} X_{v'}(t)$$
• nearest neighbor interaction considered only
• as before assuming $a = P_i(C_i)$ same for both languages
• a possible choice for the function $g_{a,m} : [0, 1] \to [0, 1]$:
$$g(x) = \frac{1}{2} + \frac{1}{2}\bigl(f_{1,a,m}(x, 1-x) - f_{1,a,m}(1-x, x)\bigr)$$
with $f_{1,a,m}$ as before counting probability of set of cues $k_1 > k_2$
$$f_{1,a,m}(x, 1-x) = \sum \binom{m}{k_1\; k_2\; k_3} (a\,x)^{k_1} (a(1-x))^{k_2} (1-a)^{k_3}$$
sum over $(k_1, k_2, k_3)$ with $m = k_1 + k_2 + k_3$ and $k_1 > k_2$
• study evolution of
$$\alpha_G(t) = \frac{1}{\#V(G)} \sum_{v \in V(G)} X_v(t)$$
fraction of $L_1$-speakers at time/generation $t$
• for a complete graph have all language users connected to all others: recover model in which learning is from whole community
• can consider asymptotic behaviors when size of graph becomes large, $\#V(G) = N \to \infty$
• can simplify the geometry making special assumptions on the graph: e.g. a square lattice
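A minimal simulation sketch of the locally connected model follows. It assumes a synchronous update (all vertices resample at once), an adjacency-list dictionary for $G$, and a ring as the example graph; none of these choices is fixed by the slides, and the function names are illustrative.

```python
import random
from math import factorial

def f1(x1, x2, a, m):
    """Probability of strictly more L1 cues than L2 cues among m inputs."""
    total = 0.0
    for k1 in range(m + 1):
        for k2 in range(min(k1, m - k1 + 1)):   # k2 < k1, k1 + k2 <= m
            k3 = m - k1 - k2
            coeff = factorial(m) // (factorial(k1) * factorial(k2) * factorial(k3))
            total += coeff * (a * x1) ** k1 * (a * x2) ** k2 * (1 - a) ** k3
    return total

def g(x, a, m):
    return 0.5 + 0.5 * (f1(x, 1 - x, a, m) - f1(1 - x, x, a, m))

def step(state, neighbors, a, m, rng):
    """Each vertex draws its next language from g applied to its neighbors' mix."""
    new_state = {}
    for v, nbrs in neighbors.items():
        mu = sum(state[w] for w in nbrs) / len(nbrs)   # mu_v: local share of L1
        new_state[v] = 1 if rng.random() < g(mu, a, m) else 0
    return new_state

def ring(n):
    """Example graph (an assumption): cycle on n vertices, two neighbors each."""
    return {v: [(v - 1) % n, (v + 1) % n] for v in range(n)}

def alpha(state):
    """alpha_G: fraction of vertices speaking L1."""
    return sum(state.values()) / len(state)

rng = random.Random(0)
graph = ring(50)
state = {v: rng.randint(0, 1) for v in graph}
for _ in range(100):
    state = step(state, graph, a=0.8, m=5, rng=rng)
print(alpha(state))
```

With $a = 1$ a population in which everyone speaks the same language stays that way, since $g(1) = 1$ and $g(0) = 0$; for $a < 1$ a learner may receive no cues at all and then picks either language with probability $1/2$.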
The Ising Model of spin systems on a graph G
• configurations of spins $s : V(G) \to \{\pm 1\}$
• magnetic field $B$ and correlation strength $J$: Hamiltonian
$$H(s) = -J \sum_{e \in E(G):\, \partial(e) = \{v, v'\}} s_v s_{v'} \; - \; B \sum_{v \in V(G)} s_v$$
• first term measures degree of alignment of nearby spins
• second term measures alignment of spins with direction of magnetic field
Equilibrium Probability Distribution
• Partition Function $Z_G(\beta)$
$$Z_G(\beta) = \sum_{s : V(G) \to \{\pm 1\}} \exp(-\beta H(s))$$
• Probability distribution on the configuration space: Gibbs measure
$$P_{G,\beta}(s) = \frac{e^{-\beta H(s)}}{Z_G(\beta)}$$
• low energy states weigh most
• at low temperature (large $\beta$): ground state dominates; at higher temperature ($\beta$ small) higher energy states also contribute
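On a very small graph the partition function and Gibbs measure can be computed by enumerating all $2^{\#V(G)}$ configurations. The sketch below does this for a triangle; the edge-list representation and function names are illustrative choices.

```python
from itertools import product
from math import exp

def hamiltonian(spins, edges, J=1.0, B=0.0):
    """H(s) = -J * sum over edges of s_v s_v' - B * sum over vertices of s_v."""
    interaction = sum(spins[v] * spins[w] for v, w in edges)
    return -J * interaction - B * sum(spins)

def gibbs(n_vertices, edges, beta, J=1.0, B=0.0):
    """Return (Z, {configuration: probability}) by full enumeration."""
    weights = {s: exp(-beta * hamiltonian(s, edges, J, B))
               for s in product((-1, 1), repeat=n_vertices)}
    Z = sum(weights.values())
    return Z, {s: w / Z for s, w in weights.items()}

triangle = [(0, 1), (1, 2), (0, 2)]
Z, P = gibbs(3, triangle, beta=1.0)
print(Z)                  # 2 e^3 + 6 e^(-1)
print(P[(1, 1, 1)])       # an aligned (ground) state carries the largest weight
```

The two aligned configurations have $H = -3$ and the six others have $H = 1$, so raising $\beta$ concentrates the measure on the aligned pair.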
Average Spin Magnetization
$$M_G(\beta) = \frac{1}{\#V(G)} \sum_{s : V(G) \to \{\pm 1\}} \; \sum_{v \in V(G)} s_v\, P(s)$$
• Free energy $F_G(\beta, B) = \log Z_G(\beta, B)$
$$M_G(\beta) = \frac{1}{\#V(G)}\, \frac{1}{\beta} \left.\left(\frac{\partial F_G(\beta, B)}{\partial B}\right)\right|_{B=0}$$
• thermodynamic limit: $\#V(G) = N \to \infty$
$$m(\beta) = \lim_{\#V(G) \to \infty} M_G(\beta)$$
• in these thermodynamic limits need to fix a way in which the geometry of the graph grows: if it is a lattice, just grow size $N$ of lattice; for other kinds of graphs, fix how smaller graphs are embedded in larger graphs
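The identity between the Gibbs average and the $B$-derivative of $F_G = \log Z_G$ can be checked numerically on a small graph. This sketch uses full enumeration and a central finite difference at a general field value $B$; the 4-cycle, the step `h` and the function names are illustrative assumptions.

```python
from itertools import product
from math import exp, log

def energy(s, edges, J, B):
    return -J * sum(s[v] * s[w] for v, w in edges) - B * sum(s)

def log_partition(n, edges, beta, J, B):
    """F = log Z, following the convention of the slides."""
    return log(sum(exp(-beta * energy(s, edges, J, B))
                   for s in product((-1, 1), repeat=n)))

def magnetization_direct(n, edges, beta, J, B):
    """(1/N) * sum_s sum_v s_v P(s), with P the Gibbs measure."""
    Z = 0.0
    weighted = 0.0
    for s in product((-1, 1), repeat=n):
        w = exp(-beta * energy(s, edges, J, B))
        Z += w
        weighted += sum(s) * w
    return weighted / (Z * n)

def magnetization_from_free_energy(n, edges, beta, J, B, h=1e-5):
    """(1/N)(1/beta) dF/dB, approximated by a central difference in B."""
    dF = (log_partition(n, edges, beta, J, B + h)
          - log_partition(n, edges, beta, J, B - h)) / (2 * h)
    return dF / (n * beta)

square = [(0, 1), (1, 2), (2, 3), (3, 0)]
print(magnetization_direct(4, square, 0.5, 1.0, 0.3))
print(magnetization_from_free_energy(4, square, 0.5, 1.0, 0.3))
print(magnetization_direct(4, square, 0.5, 1.0, 0.0))
```

On any finite graph the value at $B = 0$ vanishes, because $s \mapsto -s$ leaves $H$ unchanged there; a nonzero $m(\beta)$ can only appear through the way the limit $N \to \infty$ is taken.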
Ising Model on a 2-dimensional lattice
• $\exists$ critical temperature $T = T_c$ where phase transition occurs
• for $T > T_c$ equilibrium state has $m(T) = 0$ (computed with respect to the equilibrium Gibbs measure $P_{G,\beta}$)
• demagnetization: on average as many up as down spins
• for $T < T_c$ have $m(T) > 0$: spontaneous magnetization
• Warning: beware of thermodynamic limits!
• a lot of technical problems in these spin glass models go into how one takes these limits where the size $N$ of the graph goes to infinity (even for simple geometries like lattice case)
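The slides do not give the value of $T_c$. As an outside illustration, the sketch below evaluates the classical closed forms for the nearest-neighbor square lattice at $B = 0$ (Onsager's critical temperature and the Onsager-Yang spontaneous magnetization), in units with Boltzmann constant $k_B = 1$, which is an assumption of the example: $T_c = 2J / \ln(1 + \sqrt{2})$ and $m(T) = \bigl(1 - \sinh(2J/T)^{-4}\bigr)^{1/8}$ for $T < T_c$.

```python
from math import log, sinh, sqrt

def critical_temperature(J=1.0):
    """Onsager's T_c for the square lattice, in units where k_B = 1."""
    return 2.0 * J / log(1.0 + sqrt(2.0))

def spontaneous_magnetization(T, J=1.0):
    """m(T) in the thermodynamic limit; zero at and above T_c. Requires T > 0."""
    if T >= critical_temperature(J):
        return 0.0
    inner = 1.0 - sinh(2.0 * J / T) ** -4
    return max(0.0, inner) ** 0.125      # clamp guards against rounding near T_c

print(critical_temperature())            # about 2.269
for T in (1.5, 2.0, 2.2, 2.5):
    print(T, spontaneous_magnetization(T))
```

The magnetization drops continuously to zero as $T$ approaches $T_c$ from below, which is the phase transition described above.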
A Spin Glass model of Language Learning
• a multilingual society = a graph $G$
• linguistic agents = vertices of the graph $V(G)$
• which agents interact with which others = edges $E(G)$
• possible languages = spin states
  - Ising models $\{\pm 1\}$: two languages model
  - Potts models $\{1, \ldots, q\}$: many languages model
• distribution of population across different languages = average magnetization
• previous analysis for "input from whole society" = mean field theory for case of Ising model on complete graph
Syntactic Parameters and Ising/Potts Models
• a different view on how to use spin glass models for language evolution
• characterize set of $n = 2^N$ languages $L_i$ by binary strings of $N$ syntactic parameters (Ising model)
• or by ternary strings (Potts model) if take values $\pm 1$ for parameters that are set and $0$ for parameters that are not defined in a certain language
• a system of $n$ interacting languages = graph $G$ with $n = \#V(G)$
• languages $L_i$ = vertices of the graph (thought of as, for instance, the language that occupies a certain geographic area)
• languages that have interaction with each other = edges $E(G)$ (geographical proximity, or high volume of exchange for other reasons)
[Figure: graph of language interaction (detail) from Global Language Network of MIT Media Lab, with interaction strengths $J_e$ on edges based on number of book translations]
• if only one syntactic parameter, would have an Ising model on the graph $G$: configurations $s : V(G) \to \{\pm 1\}$ set the parameter at all the locations on the graph
• variable interaction energies along edges (some pairs of languages interact more than others)
• correlation strengths $J_e$ now depend on the edge; interaction part of the Hamiltonian
$$H(s) = -\sum_{e \in E(G):\, \partial(e) = \{v, v'\}} \; \sum_{i=1}^{N} J_e\, s_{v,i}\, s_{v',i}$$
• if $N$ parameters, configurations
$$s = (s_1, \ldots, s_N) : V(G) \to \{\pm 1\}^N$$
• if all $N$ parameters are independent, then it would be like having $N$ non-interacting copies of an Ising model on the same graph $G$ (or $N$ independent choices of an initial state in an Ising model on $G$)
• an interesting problem in this model is the entailment of parameters: it is known that flipping certain syntactic parameters causes others to flip as well
• so in addition to the edge interaction, instead of alignment with external magnetic field term in $H$
$$-B \sum_{v \in V(G)} s_v$$
should have a term that favors alignment of entailed parameters
• set of parameters $P$ with $N = \#P$; subset $E \subset P \times P$ of entailments: pairs of parameters $(\Pi, \Pi')$ such that flipping $\Pi$ causes $\Pi'$ to flip as well
• term in Hamiltonian favoring alignments of entailed parameters
$$-B \sum_{v \in V(G)} \; \sum_{(\Pi, \Pi') \in E} s_{v,\Pi}\, s_{v,\Pi'}$$
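Putting the edge interaction and the entailment term together gives an energy that is straightforward to evaluate. The sketch below assumes a particular encoding chosen for the example: a dictionary from language (vertex) to a tuple of $\pm 1$ parameter values, a dictionary from edges to strengths $J_e$, and entailments as index pairs. The two-language data are made up for illustration.

```python
def parameter_hamiltonian(s, edge_strengths, entailments, B):
    """
    Energy = - sum_e sum_i J_e s_{v,i} s_{v',i}
             - B * sum_v sum_{(p, p2) in entailments} s_{v,p} s_{v,p2}

    s:              dict vertex -> tuple of N values in {+1, -1}
    edge_strengths: dict (v, w) -> J_e
    entailments:    list of (p, p2) parameter index pairs
    """
    n_params = len(next(iter(s.values())))
    edge_term = -sum(J * s[v][i] * s[w][i]
                     for (v, w), J in edge_strengths.items()
                     for i in range(n_params))
    entail_term = -B * sum(s[v][p] * s[v][p2]
                           for v in s
                           for (p, p2) in entailments)
    return edge_term + entail_term

# Hypothetical example: two neighboring languages, two parameters, 0 entails 1
edges = {("A", "B"): 2.0}
aligned = {"A": (1, 1), "B": (1, 1)}
mixed = {"A": (1, 1), "B": (1, -1)}
print(parameter_hamiltonian(aligned, edges, [(0, 1)], B=0.5))   # -5.0
print(parameter_hamiltonian(mixed, edges, [(0, 1)], B=0.5))     # 0.0
```

The aligned configuration has lower energy both because the neighbors agree on every parameter and because the entailed pair agrees within each language.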