Meaning as a Selective Pressure in the Evolution of Linguistic Structure
Mónica Tamariz [email protected]
Language as an Evolutionary System: A Multidisciplinary Approach. Edinburgh, 13 July 2010
LANGUAGE STRUCTURE
Social network structure and dynamics
Theory of mind; Conformity bias
Memory; Processing; Perception; Production
Parsimony; Minimum effort
Information/noise
Structure of meanings
Communication & thought
Human fitness
Natural selection
Overview
• Correlation: Form structure systematically reflects meaning structure (Synchronic corpus study)
• Causality: The structure of forms gradually comes to reflect meaning structure (Diachronic corpus of artificial miniature languages)
1. Form-meaning systematicity in the lexicon
2. The evolution of frequency distributions in language FORMS
3. FORMS and MEANINGS: The evolution of compositionality
4. Evolutionary dynamics
Systematicity
• The relationship between two spaces is systematic if the structure of one space reflects the structure of the other
• Therefore knowing about one structure provides us with some knowledge about the other
Systematicity in language
• Lexical items systematically reflect semantic meaning
• Word order, inflexion, derivation etc. reflect grammatical meaning
present   past      3rd person
talk      talked    talks
play      played    plays
study     studied   studies
Systematicity in the lexicon
Onomatopoeia & sound symbolism
Glisten Glitter Glow Glimmer
Snout Sniff Snore Snout Sniffle Snarl
A systematic lexicon
Words that sound similar tend to have similar meanings
Meaning space
Functions of systematicity:
• Help learn/understand new items
• Allow creativity and generalisation
Distributional similarity
Captures syntax & semantics (Landauer & Dumais 1997, MacDonald 2000)
Phonological similarity
[Figure: the words kasa, kita, lima and nene plotted in the two spaces]
Measuring systematicity
DATA
• A subset of the spoken part of the BNC (Shillcock et al. 2001)
1733 most frequent monosyllabic monomorphemic words
• Three subsets of a Spanish speech corpus (Tamariz 2008), words of freq >= 20
252 CVCV words
146 CVCCV words
148 CVCVCV words
Results
(Shillcock et al. 2001, Tamariz 2008)
Words that sound similar do tend to have similar meanings
However, too much systematicity may pose a problem for comprehension!
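A minimal sketch of how a form-meaning correlation of this kind could be computed, not necessarily the procedure used in the cited studies. It assumes edit distance as the measure of phonological dissimilarity, Euclidean distance between meaning vectors as the measure of meaning dissimilarity, and a permutation test for significance. The four word forms come from the example above; the meaning vectors are hypothetical and only serve to make the sketch runnable.

```python
import itertools
import math
import random


def levenshtein(a, b):
    """Edit distance between two strings (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def systematicity(forms, meanings, n_perm=1000, seed=0):
    """Correlate pairwise form distances with pairwise meaning distances.

    Returns the observed correlation and a one-tailed permutation p-value,
    obtained by shuffling which meaning goes with which form.
    """
    pairs = list(itertools.combinations(range(len(forms)), 2))
    form_d = [levenshtein(forms[i], forms[j]) for i, j in pairs]

    def corr(order):
        mean_d = [euclidean(meanings[order[i]], meanings[order[j]]) for i, j in pairs]
        return pearson(form_d, mean_d)

    observed = corr(list(range(len(forms))))
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_perm):
        order = list(range(len(forms)))
        rng.shuffle(order)
        if corr(order) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


forms = ["kasa", "kita", "lima", "nene"]
# Hypothetical meaning vectors (assumption, not data from the corpus studies).
meanings = [(0.0, 0.0), (0.0, 1.0), (2.0, 2.0), (4.0, 3.0)]
r, p = systematicity(forms, meanings)
print(f"r = {r:.3f}, p = {p:.3f}")
```

A positive correlation means that forms that are close in sound tend to be close in meaning. With only four words the p-value is not informative; the corpus studies work with hundreds of words.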
The phonological correlates of systematicity
Which elements of word form systematically relate to word meaning?
In CVCV, CVCCV and CVCVCV words, measure the correlation between meaning similarity and form similarity in terms of:
- Consonants, vowels, stress, ...
E.g. Do words that share the first consonant tend to have similar meanings?
Results: The impact of phonological parameters on systematicity
[Figure: impact of each phonological parameter (consonants, vowels, stress) in CVCV words. All values p < 0.01 except where stated.]
Impact > 0: Words sharing “tc” tend to have similar meanings
Impact < 0: Words sharing “v1” tend to have DISSIMILAR meanings
(Tamariz 2008)
Results: The impact of phonological parameters on systematicity
[Figure: impact of consonants, vowels and stress in CVCVCV, CVCCV and CVCV words. All values p < 0.01 except where stated.]
• Consonants, mostly positive impact
• Vowels, mostly negative impact
• Stressed vowel in the penultimate syllable, negative impact
• Other stress, positive impact
(Tamariz 2008)
One interpretation of results
The structure of the lexicon is an adaptation to two opposed pressures
Aspects of form selectively respond to pressures:
• To be systematic (for processing and learning): consonant structure, stress pattern
• To avoid ambiguities derived from systematicity: vowel structure, esp. stressed vowel in penultimate syllable
(For Spanish; other languages may have found different solutions to this conflict)
… and those pressures originate in the structure of meanings:
• Coarse-grained categories (consonants, stress)
• Fine-grained distinctions (stressed penultimate vowel)
Frequency distribution of all 1-, 2- and 3-grams in a set of words
RANDOM WORDS: risa tio suerte trabajo hotel enchufe caballo estudia joven dia fecha espana manos libro leo cuidado encanta azul autobus folleto imprimir toma
[Figure: x-axis: N-gram frequency]
(Tamariz, in prep)
Expect a power law distribution: Signature of natural languages, e.g. Zipf’s law, etc
VERB PARADIGM: amo amas ama ame amaste amo amaba amabas amaba amare amaras amara amaria amarias amaria amase amases amase amara amaras amara ame
[Figure: n-gram frequency distributions of the random words and of the verb paradigm. Frequency values marked on the plot: 2, 4, 6, 18, 24.]
Frequency signature of structure
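The n-gram counts behind such a distribution can be sketched as follows. This is an illustration under stated assumptions, not the author's procedure: n-grams are counted as tokens inside each word, without word-boundary markers, so the values need not match those marked on the plot. The function names `ngram_counts` and `frequency_spectrum` are hypothetical; the word list is the verb paradigm shown on the slide.

```python
from collections import Counter


def ngram_counts(words, max_n=3):
    """Count every 1-gram up to max_n-gram occurring inside the given words."""
    counts = Counter()
    for word in words:
        for n in range(1, max_n + 1):
            for i in range(len(word) - n + 1):
                counts[word[i:i + n]] += 1
    return counts


def frequency_spectrum(counts):
    """Map each frequency value to the number of distinct n-grams that have it."""
    return Counter(counts.values())


verb_paradigm = (
    "amo amas ama ame amaste amo amaba amabas amaba amare amaras amara "
    "amaria amarias amaria amase amases amase amara amaras amara ame"
).split()

counts = ngram_counts(verb_paradigm)
spectrum = frequency_spectrum(counts)

# Print the spectrum: frequency value, then how many n-grams occur that often.
for freq in sorted(spectrum):
    print(freq, spectrum[freq])
```

In a paradigm, the shared stem produces a few n-grams with very high counts, while a set of unrelated words spreads its counts more evenly. That contrast is what the spectrum makes visible.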
Frequency distribution of all 1-, 2- and 3-grams in a set of words
RANDOM WORDS: game order would that doing shit were of topping than don but and of don the it the is it eats for be
1000 NUMBER WORDS: twentytwo twentythree twentyfour twentyfive twentysix twentyseven twentyeight twentynine thirty thirtyone thirtytwo thirtythree thirtyfour thirtyfive thirtysix thirtyseven thirtyeight thirtynine forty fortyone fortytwo fortythree fortyfour
[Figure: n-gram frequency distributions of the two sets. Frequency values marked on the plot: 10, 80, 100, 300, 900, 1500.]
(Tamariz, in prep)
(aside)
• Each chemical element emits in specific frequencies
• Spectra used to identify elements
• Special subsets of a language show specific frequencies
• Spectra used to identify the quantitative structure of a sample?
- Tell decimal from other numeral systems?
- Classify morphological paradigms?
[Figure: emission spectra labelled Sun, H, He, Hg and U, next to an n-gram frequency plot with values 6, 18 and 24 marked]
• Maybe the result of adaptation… maybe not
• Look at the process of adaptation
Adaptation of linguistic form to meaning?
Evolution of the n-gram freq distr
• Data from Kirby, Cornish & Smith (2008)
• 8 diffusion chains of miniature artificial languages
• Distinct, structured meaning space:
9 SIMPLEX MEANINGS: 3 shapes, 3 colours, 3 motions
27 COMPLEX MEANINGS: All the possible combinations of the above, e.g. [images]
Example signals: kimako, koni, kanige, kuni, winige, komako
• Generation 0: random signals
• Total 10 generations
Kirby, Cornish & Smith (2008)
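The size of the meaning space follows directly from the three dimensions: 3 + 3 + 3 = 9 simplex meanings and 3 x 3 x 3 = 27 complex meanings. A short sketch that enumerates them is given below. The colour names are the ones glossed later in the deck; the shape and motion labels are placeholders, since the slides show them only as images.

```python
from itertools import product

# Colour names appear in the deck; shape and motion labels are hypothetical placeholders.
SHAPES = ("shape1", "shape2", "shape3")
COLOURS = ("black", "blue", "red")
MOTIONS = ("motion1", "motion2", "motion3")

# 9 simplex meanings: each value of each dimension on its own.
simplex_meanings = SHAPES + COLOURS + MOTIONS

# 27 complex meanings: every combination of one shape, one colour and one motion.
complex_meanings = list(product(SHAPES, COLOURS, MOTIONS))

print(len(simplex_meanings), len(complex_meanings))
```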
• We know the meaning space
• We know the whole history of the language
• One of the initial random languages: Chain 7, Gen 0 (Kirby, Cornish & Smith 2008)
kinimapi miwimi miwiniku pikuhemi mihe gepihemi nihepi wikima wimaku wikuki nipi pinipi kimaki winige
kunige wigemi nipikuge miniki kikumi wige kihemiwi pimikihe kinimage miki mahekuki hema gepinini
• A few generations later: Chain 7, Gen 10 (Kirby, Cornish & Smith 2008)
miniku tupin tupim miniku tupin tupim miniku tupin tupim poi poi poi poi poi poi poi poi poi
tuge tuge tuge tuge tuge tuge tuge tuge tuge
Evolution of the n-gram freq distr
Data from 4 different chains of languages
[Figure: three panels, Gen 0, Gen 3 and Gen 10. x-axis: N-gram freq (1 to 30); y-axis: Deviation from expected freq (-15 to 15). Peaks labelled in the Gen 10 panel: 9, 12, 15, 18, 26, 27, 28, 29.]
Signature of “x3” or “x9” structure
(Tamariz, in prep)
• One of the initial random languages: Chain 1, Gen 0, FILTERED CONDITION (prevents homonymy) (Kirby, Cornish & Smith 2008)
huhunigu wakiki nihu kekewa huwa kowagu muwapo wako muko kemuniwa pokikehu niguki komuhuke hukike
kokihuko powa hukeko kokeguke kihupo waguhuki koni kopo ponikiko kiwanike hukinimu pohumu kimu
• A few generations later: Chain 1, Gen 4 (Kirby, Cornish & Smith 2008)
winekuki winukuki wikekuki winekiko winekiko wikiko wineko wuneko wikeko kunkuki hunekuki kunekuki kunkiko hunekiko
kunekiko kuneko huneko kuneko ponekuki punekuki ponekuki pokiko puniko pokiko poneko puneko poneko
FILTERED CONDITION (prevents homonymy)
Evolution of the n-gram freq distr
Data from 4 different chains of languages
[Figure: three panels, Gen 0, Gen 5 and Gen 10. x-axis: N-gram freq (1 to 46); y-axis: Deviation from expected freq (-15 to 15). Peaks labelled in the Gen 10 panel: 9, 13, 18, 21, 22, 27, 33.]
Signature of “x9” structure
(Tamariz, in prep)
Evolution of the n-gram frequency distribution
[Figure: the two Gen 10 panels shown together, FILTERED CONDITION and UNFILTERED CONDITION. x-axis: N-gram freq; y-axis: Deviation from expected freq.]
The two conditions provide convergent evidence of adaptation of forms to the quantitative structure of the meaning space
(Tamariz, in prep)
Compositionality
“In a compositional system, the meaning of a complex form is a function of the meanings of the components of the form plus the rules used to combine them”
• Kirby, Cornish & Smith quantified increase in structure
• Intuitively compositional
• This knowledge helps segment linguistic forms into meaningful units
• Then we obtain the regularity of the mappings between each segment (word beginning/middle/end) and each meaning (shape/motion/colour)
Chain 1 Gen 4 (shown next to Chain 1 Gen 0), segmented:
wi ne kuki     wi nu kuki     wi ke kuki
wi ne kiko     wi ne kiko     wi kiko
wi ne ko       wu ne ko       wi ke ko
ku n kuki      hu ne kuki     ku ne kuki
ku n kiko      hu ne kiko     ku ne kiko
ku ne ko       hu ne ko       ku ne ko
po ne kuki     pu ne kuki     po ne kuki
po kiko        pu niko        po kiko
po ne ko       pu ne ko       po ne ko
(Cornish, Tamariz & Kirby, 2010)
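One way to put a number on the regularity of a segment-to-meaning mapping is an entropy-based score. The sketch below is an assumption for illustration and is not claimed to be the RegMap measure of the cited work: it multiplies two normalised terms, one for how predictable the segment is from the meaning and one for how predictable the meaning is from the segment. The function names are hypothetical. The word beginnings are those of the Chain 1 Gen 4 language, read row by row; assigning one colour to each block of nine words follows the gloss given later in the deck (wi = black, hu / ku = blue, po / pu = red) and is also an assumption.

```python
import math
from collections import Counter


def conditional_entropy(xs, ys):
    """H(X | Y) in bits, estimated from paired observations."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    y_counts = Counter(ys)
    return -sum(c / n * math.log2(c / y_counts[y]) for (x, y), c in joint.items())


def mapping_regularity(segments, meanings):
    """Score in [0, 1]; 1 means a perfectly regular one-to-one mapping."""
    def normalised(cond_h, n_values):
        # With a single possible value there is no uncertainty to normalise.
        return 1.0 if n_values < 2 else 1.0 - cond_h / math.log2(n_values)

    seg_given_meaning = normalised(conditional_entropy(segments, meanings), len(set(segments)))
    meaning_given_seg = normalised(conditional_entropy(meanings, segments), len(set(meanings)))
    return seg_given_meaning * meaning_given_seg


# Word beginnings of the 27 Chain 1 Gen 4 forms, in row order.
beginnings = (
    "wi wi wi wi wi wi wi wu wi "
    "ku hu ku ku hu ku ku hu ku "
    "po pu po po pu po po pu po"
).split()
# Assumed colour of each form: one colour per block of nine forms.
colours = ["black"] * 9 + ["blue"] * 9 + ["red"] * 9

score = mapping_regularity(beginnings, colours)
print(round(score, 3))
```

Under these assumptions every word beginning points to exactly one colour, so the colour is fully predictable from the segment; the score stays below 1 only because each colour is still expressed by two competing variants (wi/wu, ku/hu, po/pu).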
Compositionality
[Figure: three panels, WORD BEGINNING, WORD MIDDLE and WORD END. x-axis: Generation (0 to 10); y-axis: RegMap (0.0 to 1.0); one line each for SHAPE, MOTION and COLOUR.]
Integrate 9 graphs into a measure of compositionality
(Tamariz, in prep)
Compositionality
Division of labour: Adaptation of word segments to capture different meanings
[Figure: WHOLE LANGUAGE (3 SEGMENTS x 3 MEANINGS). x-axis: Generation (0 to 10); y-axis: Compositionality (0.0 to 0.7).]
N-gram frequencies
Neutral evolution or selection?
[Figures: frequency count (y-axis) by generation, 0 to 10 (x-axis), one line per unit. Panels include Second consonant, Third consonant, Final trigram, 1st three consonants and 2nd syllable. Legend entries are not recoverable.]
3 adaptive niches?
The evolution of meaningful segments
Word-initial bigrams, KCS’08 chain 1
[Figure: the word-initial bigrams in use at each generation, each labelled with its frequency. x-axis: Generations (0 to 10). Gen 0: po 4, ni 2, hu 5, ko 6, ki 3, wa 3, ke 2, mu 2. Gen 4: po 6, pu 3, hu 3, ku 6, wu 1, wi 8.]
(Cornish, Tamariz & Kirby, 2010)
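The word-initial bigram frequencies can be recomputed from the Chain 1 languages listed earlier in the deck. The sketch below does this for Gen 0 and Gen 4, the two generations whose full word lists appear in the slides; the helper name `initial_bigram_counts` is hypothetical.

```python
from collections import Counter


def initial_bigram_counts(words):
    """Count the first two characters of every word."""
    return Counter(word[:2] for word in words)


# Chain 1, Gen 0 (filtered condition), as listed in the deck.
gen0 = (
    "huhunigu wakiki nihu kekewa huwa kowagu muwapo wako muko kemuniwa "
    "pokikehu niguki komuhuke hukike kokihuko powa hukeko kokeguke kihupo "
    "waguhuki koni kopo ponikiko kiwanike hukinimu pohumu kimu"
).split()

# Chain 1, Gen 4, as listed in the deck.
gen4 = (
    "winekuki winukuki wikekuki winekiko winekiko wikiko wineko wuneko wikeko "
    "kunkuki hunekuki kunekuki kunkiko hunekiko kunekiko kuneko huneko kuneko "
    "ponekuki punekuki ponekuki pokiko puniko pokiko poneko puneko poneko"
).split()

for name, language in (("Gen 0", gen0), ("Gen 4", gen4)):
    print(name, initial_bigram_counts(language).most_common())
```

Eight different bigrams open the 27 words at Gen 0; at Gen 4 there are six, and they fall into three groups (wi/wu, ku/hu, po/pu).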
• Replication - memorability
• Variation - recombination
• Selection - 3 meaning “niches” - vs. directed mutation
Units of evolution
• Form frequency structure and form-meaning regularity reveal high-fitness (memorable, replicable) units
- Perceptually salient
- Distinct from each other
- Meaningful
wi ne kuki wi nu kuki wi ke kuki
wi ne kiko wi ne kiko wi kiko
wi ne ko wu ne ko wi ke ko
ku n kuki hu ne kuki ku ne kuki
ku n kiko hu ne kiko ku ne kiko
ku ne ko hu ne ko ku ne ko
po ne kuki pu ne kuki po ne kuki
po kiko pu niko po kiko
po ne ko pu ne ko po ne ko
Chain 1 Gen 4
wi = black; hu / ku = blue; po / pu = red
kuki = [image]; kiko = [image]; ko = [image]
Units of evolution
• Form frequency structure and form-meaning regularity reveal high-fitness (memorable, replicable) units (units of evolution with respect to meaning)
- Perceptually salient
- Distinct from each other
- Meaningful
In natural language:
Meanings                  Forms
                          Phonemes
Semantic, Grammatical     Words & Constructions
Social prestige           Intonation
Group identity            Accent
Attitude                  Volume
…                         …