Cognitive Modeling: How Humans Learn Complex Linguistic Systems
Lisa Pearl, UC Irvine
March 10, 2008
AIML Seminar Series, Center for Machine Learning & Intelligent Systems, UC Irvine
ML, AI, & Cognitive Modeling

Machine Learning: development of algorithms and techniques that allow machines to learn, motivated by the capabilities of computers.

Artificial Intelligence & Learning: development of algorithms and techniques that allow machines to learn like humans, motivated by human behavior.

Cognitive Modeling: development of models that allow understanding of how humans learn, attempting to simulate human behavior by using the techniques humans use.

Examples of cognitive modeling:
Extraction (word segmentation): Swingley 2005; Goldwater, Griffiths, & Johnson 2007
Categorization (phonemes): Vallabha et al. 2007
Semi-supervised learning (inductive biases in causation): Mansinghka et al. 2006
Cognitive Modeling of Language
Different problems: more and less easily discernible from data

Categorization/Clustering
Ex: What are the contrastive sounds of a language?

Vowel categories in English & Japanese (Vallabha et al. 2007)
Hypothesis space: 3 dimensions of variation
English relevant dimensions: 1 and 2
Japanese relevant dimensions: 2 and 3
Cognitive Modeling of Language
Different problems: more and less easily discernible from data

Extraction
Ex: Where are words in fluent speech?

Assumption from experimental work: the relevant unit of word segmentation for infants is the syllable (Gambell & Yang 2006; Swingley 2005).

"Who's afraid of the big bad wolf?"
Candidate segmentations of the syllable stream:
who 'sa frai dof the big bad wolf
who'sa fraidofthe big bad wolf
who 'sa fraid of the bigbadwolf
who's afraid of the big bad wolf (correct)
Cognitive Modeling of Language
Different problems: more and less easily discernible from data

Mapping
Ex: What are the word affixes that signal meaning (e.g. past tense in English)?

Regularity: blink~blinked, ping~pinged, confide~confided
Irregularity: drink~drank, sing~sang, hide~hid, think~thought
Cognitive Modeling of Language
Different problems: more and less easily discernible from data

Complex systems
Ex: What is the generative system that creates the observed (structured) data of language (ex: syntax, metrical phonology)?

Observable data: word order
Generative system: syntax

English: Subject Verb Object
German: Subject Verb t_Subject Object t_Verb
Kannada: Subject t_Object Verb Object
Cognitive Modeling of Language
Different problems: more and less easily discernible from data

Complex systems
Ex: What is the generative system that creates the observed (structured) data of language (ex: syntax, metrical phonology)?

Observable data: stress contour (EMphasis)
Generative system: metrical phonology

Candidate metrical analyses of "EMphasis":
EM pha sis   ( H L ) H
EM pha sis   ( S S ) S
EM pha sis   ( S S S )
EM pha sis   ( H L L )

Today's focus
Road Map
Introduction to complex linguistic systems: general problems; parametric systems; parametric metrical phonology
Learnability of complex linguistic systems: general learnability framework; case study: English metrical phonology (available data & associated woes; unconstrained probabilistic learning; constrained probabilistic learning)
Where next? Implications & Extensions
General Problems with Learning Complex Linguistic Systems

What children encounter: the output of the generative linguistic system (EMphasis).

What children must learn: the components of the system that combine to generate this observable output (EM pha sis):
Are syllables differentiated?
Are all syllables included?
Which syllable of a larger unit is stressed?

Why this is tricky: There is often a non-transparent relationship between the observable form of the data and the underlying system that produced it. It is hard to know what parameters of variation to consider. Moreover, data are often ambiguous, even if the parameters of variation are known.

Levels of abstract structure:
( H L ) H    EM pha sis
( S S S )    EM pha sis
General Problems with Learning Complex Linguistic Systems

A hypothesis for a language consists of a combination of generalizations about that language (a grammar). But this leads to a theoretically infinite hypothesis space:

Are syllables differentiated? {No, Yes-2 distinctions, Yes-3 distinctions, ...}
Are all syllables included? {Yes, No-not leftmost, No-not rightmost, ...}
Which syllable of a larger unit is stressed? {Leftmost, Rightmost, Second from Left, ...}
Rhyming matters? {No, Yes-every other, ...}

Observation: Languages only differ from each other in constrained ways. Not all generalizations are possible.

Idea: Children's hypotheses are constrained so they only consider generalizations that are possible in the world's languages:

Are syllables differentiated? {No, Yes-2 distinctions, Yes-3 distinctions}
Are all syllables included? {Yes, No-not leftmost, No-not rightmost}
Which syllable of a larger unit is stressed? {Leftmost, Rightmost}

Linguistic parameters = finite (if large) hypothesis space of possible grammars (Chomsky 1981; Halle & Vergnaud 1987)
Learning Parametric Linguistic Systems

Linguistic parameters give the benefit of a finite hypothesis space. Still, the hypothesis space can be quite large: assuming there are n binary parameters, there are 2^n core grammars to choose from (Clark 1994). The hypothesis space grows exponentially in the number of parameters.
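The 2^n combinatorics can be sketched directly. The parameter names below are illustrative stand-ins, not the deck's exact inventory:

```python
from itertools import product

# Each binary parameter doubles the number of candidate grammars.
parameters = {
    "quantity_sensitive": [False, True],
    "extrametricality": [False, True],
    "feet_from_left": [False, True],
    "bounded_feet": [False, True],
    "foot_head_left": [False, True],
}

# Enumerate every core grammar: one value choice per parameter.
grammars = [dict(zip(parameters, values))
            for values in product(*parameters.values())]

print(len(grammars))  # 2**5 = 32
```

With 9 parameter choices, as in the English case study later, the space already holds hundreds of candidate grammars.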
Parametric Metrical Phonology

Metrical phonology: what tells you to put the EMphasis on a particular SYLlable.

Process speakers use:
Basic input unit: syllables.
Larger units formed: metrical feet. The way these are formed varies from language to language. Only syllables in metrical feet can be stressed.
Stress assigned within metrical feet. The way this is done also varies from language to language.

Observable data: stress contour of the word
em pha sis -> (em pha) sis -> (EM pha) sis -> EMphasis

The system's parameters of variation are to be determined by the learner from the available data.

The metrical phonology system here: 5 main parameters, 4 sub-parameters (adapted from Dresher 1999 and Hayes 1995). Sub-parameters are options that become available if a main parameter takes a certain value. Most parameters involve metrical foot formation. All combine to generate the stress contour output.
A Brief Tour of Parametric Metrical Phonology

Are syllables differentiated?

No: the system is quantity-insensitive (QI).
  lu   di   crous
  CVV  CV   CCVC
  S    S    S

Yes: the system is quantity-sensitive (QS).
  Only allowed method: syllables differ by rime weight. (Syllable = onset + rime; rime = nucleus + coda. For "crous" /krəs/: onset "kr", rime "əs".)
  Only allowed number of divisions: 2, Heavy vs. Light. VV is always Heavy; V is always Light.
  (narrowing of hypothesis space)

  Option 1: VC Heavy (QS-VC-H)
    lu   di   crous
    CVV  CV   CCVC
    H    L    H

  Option 2: VC Light (QS-VC-L)
    lu   di   crous
    CVV  CV   CCVC
    H    L    L
A Brief Tour of Parametric Metrical Phonology

Are all syllables included in metrical feet?

Yes: the system has no extrametricality (Em-None).
  af  ter  noon
  VC  VC   VV
  L   L    H
  (   ...     )

No: the system has extrametricality (Em-Some).
  Only allowed number of exclusions: 1.
  Only allowed exclusions: Leftmost or Rightmost syllable.
  (narrowing of hypothesis space)

  Leftmost syllable excluded (Em-Left):
    a  gen  da
    V  VC   V
    L  H    L
    a  ( ... )

  Rightmost syllable excluded (Em-Right):
    lu  di  crous
    VV  V   VC
    H   L   H
    ( ... )  crous
A Brief Tour of Parametric Metrical Phonology

What direction are metrical feet constructed from? Two logical options.

From the left: metrical feet are constructed from the left edge of the word (Ft Dir Left).
From the right: metrical feet are constructed from the right edge of the word (Ft Dir Right).

  lu  di  crous
  VV  V   VC
  Ft Dir Left:   ( H  L  H
  Ft Dir Right:    H  L  H )
A Brief Tour of Parametric Metrical Phonology

Are metrical feet unrestricted in size?

Yes: Metrical feet are unrestricted, delimited only by Heavy syllables if there are any (Unbounded).
  Ft Dir Left:   L L L H L  ->  ( L L L )( H L )
  Ft Dir Right:  L L L H L  ->  ( L L L H )( L )
  Ft Dir Left/Right, no Heavy syllables: S S S S S  ->  ( S S S S S )

No: Metrical feet are restricted (Bounded). (narrowing of hypothesis space)

  The size is restricted to 2 options: 2 or 3.
    2 units per foot (Bounded-2), Ft Dir Left: x x x x  ->  ( x x )( x x )
    3 units per foot (Bounded-3), Ft Dir Left: x x x x  ->  ( x x x )( x )

  The counting units are restricted to 2 options: syllables or moras.
    Count by syllables (Bounded-Syllabic), Ft Dir Left, Bounded-2:
      H L L H  ->  ( H L )( L H )
    Count by moras (Bounded-Moraic), Ft Dir Left, Bounded-2:
      Moras (unit of weight): H = 2 moras (xx), L = 1 mora (x)
      H L L H  ->  ( H )( L L )( H )

  Compare: the same word H L L H is footed ( H L )( L H ) when counting syllables, but ( H )( L L )( H ) when counting moras.
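The Bounded-2 syllable/mora contrast above can be sketched in code. This is an illustrative simplification, assuming greedy left-to-right foot building, not the deck's full system:

```python
def feet_bounded2(word, unit):
    """Group syllables into 2-unit feet from the left (Ft Dir Left, Bounded-2).

    unit='syllable': every syllable counts as 1 unit.
    unit='mora':     H counts as 2 moras, L as 1 mora.
    """
    size = {"H": 2, "L": 1} if unit == "mora" else {"H": 1, "L": 1}
    feet, current, count = [], [], 0
    for syll in word:
        current.append(syll)
        count += size[syll]
        if count >= 2:               # foot is full: close it
            feet.append(tuple(current))
            current, count = [], 0
    if current:                      # leftover material forms a final foot
        feet.append(tuple(current))
    return feet

word = ["H", "L", "L", "H"]
print(feet_bounded2(word, "syllable"))  # [('H', 'L'), ('L', 'H')]
print(feet_bounded2(word, "mora"))      # [('H',), ('L', 'L'), ('H',)]
```

The two calls reproduce the slide's comparison: identical weight strings, different footings depending on the counting unit.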
A Brief Tour of Parametric Metrical Phonology

Within a metrical foot, which syllable is stressed? Two options (another hypothesis space restriction).

Leftmost: stress the leftmost syllable of each foot (Ft Hd Left): ( H )( L L )( H ), with stress on the first syllable of each foot.
Rightmost: stress the rightmost syllable of each foot (Ft Hd Right): ( H )( L L )( H ), with stress on the last syllable of each foot.
Generating a Stress Contour

Process the speaker uses to generate the stress contour of "emphasis":

em pha sis (VC CV CVC)

Are syllables differentiated? Yes. VC syllables are Heavy: H L H
Are any syllables extrametrical? Yes. The rightmost syllable is not included in a metrical foot: H L ( ... ) excludes the final H
Which direction are feet constructed from? From the right: H L ) H
Are feet unrestricted? No. 2 syllables per foot: ( H L ) H
Which syllable of the foot is stressed? Leftmost: ( H L ) H -> EM pha sis

Result: EMphasis.

Learner's task: figure out which parameter values were used to generate this contour.
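The five steps above can be run as a toy generator. Everything here is a simplification for illustration: the syllable-shape test and the hard-coded parameter values (QS with VC Heavy, Em-Right, Ft Dir Right, Bounded-2, Bounded-Syllabic, Ft Hd Left) come from the walkthrough, and the function is not a general implementation of the system:

```python
def stress_contour(syllables):
    """Toy generator fixed to the walkthrough's parameter values:
    QS (VC Heavy), Em-Right, Ft Dir Right, Bounded-2 (syllabic), Ft Hd Left."""
    # Step 1: quantity sensitivity -- VC and VV rimes are Heavy.
    # (Shown for step 1; with syllabic counting, weight does not change foot size here.)
    weights = ["H" if s in ("VC", "VV", "CVC", "CVV", "CCVC") else "L"
               for s in syllables]
    # Step 2: extrametricality (Em-Right) -- exclude the rightmost syllable.
    footable = list(range(len(syllables) - 1))
    # Steps 3-4: build 2-syllable feet from the right edge.
    feet, i = [], len(footable)
    while i > 0:
        feet.insert(0, footable[max(0, i - 2):i])
        i -= 2
    # Step 5: Ft Hd Left -- stress the leftmost syllable of each foot.
    stressed = {foot[0] for foot in feet}
    return ["1" if i in stressed else "0" for i in range(len(syllables))]

print(stress_contour(["VC", "CV", "CVC"]))  # ['1', '0', '0'] -> EM pha sis
```

Running it on "em pha sis" (VC CV CVC) yields stress on the first syllable only, matching the slide's derivation of EMphasis.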
Road Map
Introduction to complex linguistic systems: general problems; parametric systems; parametric metrical phonology
Learnability of complex linguistic systems: general learnability framework; case study: English metrical phonology (available data & associated woes; unconstrained probabilistic learning; constrained probabilistic learning)
Where next? Implications & Extensions
Choosing among grammars

Human learning seems to be gradual and somewhat robust to noise, so it needs some probabilistic learning component.

Since grammars are parameterized, the child can make use of this information to constrain the hypothesis space: learn over parameters, not entire sets of parameter values (probabilistic learning over parameter values).
A caveat about learning parameters separately

Parameters are system components that combine together to generate output. The choice of one parameter may influence the choice of subsequent parameters.

Point: The order in which parameters are set may determine whether they are set correctly from the data. (Dresher 1999)
The learning framework: 3 components

(1) Hypothesis space: the competing parameter values, each pair starting equiprobable (0.5 vs. 0.5).
(2) Data: the input data points the learner encounters.
(3) Update procedure: after processing data, the parameter value probabilities shift away from 0.5 (e.g. 0.3 vs. 0.7, 0.6 vs. 0.4).
Key point for cognitive modeling: psychological plausibility

Any probabilistic update procedure must, at the very least, be incremental/online. Why? Humans (especially human children) don't have infinite memory. It is unlikely that human children can hold a whole corpus worth of data in their minds for analysis later on. Models that do this are AI (not cognitive modeling): they can simulate human behavior, but not necessarily the way humans produce it (ex: Foraker et al. 2007, Goldwater et al. 2007).
Two psychologically plausible probabilistic update procedures

Naïve Parameter Learner (NParLearner), Yang (2002):
Probabilistic generation & testing of parameter value combinations (incremental).
Hypothesis update: linear reward-penalty (Bush & Mosteller 1951).

Bayesian Learner (BayesLearner):
Probabilistic generation & testing of parameter value combinations (incremental).
Hypothesis update: Bayesian updating (Chew 1971: binomial distribution).
Case study: English metrical phonology

Adult English system values: QS, QS-VC-H, Em-Some, Em-Right, Ft Dir Right, Bounded, Bounded-2, Bounded-Syllabic, Ft Hd Left

Estimate of child input: caretaker speech to children between the ages of 6 months and 2 years (CHILDES [Brent & Bernstein-Ratner corpora]: MacWhinney 2000). Total words: 540,505; mean length of utterance: 3.5.

Words were parsed into syllables using the MRC Psycholinguistic Database (Wilson 1988) and assigned likely stress contours using the American English CALLHOME database of telephone conversation (Canavan et al. 1997).
English Data

[Figure: proportions of the English input data compatible with each parameter value, e.g. QI vs. QS, Em-None vs. Em-Some, Ft Dir Left vs. Ft Dir Rt, Bounded vs. Unbounded, Bounded-2 vs. Bounded-3, Bounded-Syl vs. Bounded-Mor, Ft Hd Left.]
Case study: English metrical phonology

English is a non-trivial language, full of exceptions. The data are noisy: 27% of the data are incompatible with the correct English grammar on at least one parameter value.

Adult English system values: QS, QS-VC-H, Em-Some, Em-Right, Ft Dir Right, Bounded, Bounded-2, Bounded-Syllabic, Ft Hd Left
Exceptions instantiate: QI, QS-VC-L, Em-None, Ft Dir Left, Unbounded, Bounded-3, Bounded-Moraic, Ft Hd Right

Hard - therefore interesting!
Probabilistic learning for English
Probabilistic generation and testing of parameter values (Yang 2002)

For each parameter, the learner associates a probability with each of the competing parameter values. Initially all are equiprobable:

QI = 0.5            QS = 0.5
QSVCL = 0.5         QSVCH = 0.5
Em-Some = 0.5       Em-None = 0.5
Em-Left = 0.5       Em-Right = 0.5
Ft Dir Left = 0.5   Ft Dir Rt = 0.5
Bounded = 0.5       Unbounded = 0.5
Bounded-2 = 0.5     Bounded-3 = 0.5
Bounded-Syl = 0.5   Bounded-Mor = 0.5
Ft Hd Left = 0.5    Ft Hd Rt = 0.5

For each data point encountered (e.g. AFterNOON), the learner probabilistically generates a set of parameter values (a grammar): QI or QS? ... if QS, QSVCL or QSVCH? Em-None or Em-Some? ...

Sample generated grammar: QS, QSVCL, Em-None, Ft Dir Right, Bounded, Bounded-2, Bounded-Syl, Ft Hd Right
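A minimal sketch of this generation step. It assumes independent binary parameters with illustrative names, and models only one dependency (the QSVCL/QSVCH sub-parameter is dropped when QI is chosen), which is a simplification of the full system:

```python
import random

# Probability of the first-listed value of each binary parameter.
probs = {
    "QS": 0.5, "QSVCH": 0.5, "Em-Some": 0.5, "Em-Right": 0.5,
    "Ft Dir Rt": 0.5, "Bounded": 0.5, "Bounded-2": 0.5,
    "Bounded-Syl": 0.5, "Ft Hd Left": 0.5,
}
alternatives = {
    "QS": "QI", "QSVCH": "QSVCL", "Em-Some": "Em-None",
    "Em-Right": "Em-Left", "Ft Dir Rt": "Ft Dir Left",
    "Bounded": "Unbounded", "Bounded-2": "Bounded-3",
    "Bounded-Syl": "Bounded-Mor", "Ft Hd Left": "Ft Hd Rt",
}

def sample_grammar():
    """Draw one parameter value per parameter, weighted by current probabilities."""
    grammar = {}
    for value, p in probs.items():
        grammar[value] = value if random.random() < p else alternatives[value]
    # Sub-parameter dependency: QSVCL/QSVCH only exists if QS was chosen.
    if grammar["QS"] == "QI":
        del grammar["QSVCH"]
    return list(grammar.values())

print(sample_grammar())
```

Each call produces one candidate grammar, like the sample grammar on the slide; as the probabilities move away from 0.5, the sampled grammars concentrate on the favored values.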
Probabilistic learning for English
The learner then uses this grammar to generate a stress contour for theobserved data point.
Probabilistic generation and testing of parameter values (Yang 2002)
AFterNOON
QSQS, , QSVCLQSVCL, , Em-NoneEm-None, Ft Dir RightFt Dir Right, BoundedBounded, Bounded-2Bounded-2, Bounded-SylBounded-Syl, Ft Hd RightFt Hd Right
If the generated stress contour matches the observed stresscontour, the grammar successfully “parses” the data point. Allparticipating parameter values are rewarded.
((LL) ) ((L L HH))
AF ter NOON
VC CVC CVVC
reward all
Probabilistic learning for English
The learner then uses this grammar to generate a stress contour for theobserved data point.
Probabilistic generation and testing of parameter values (Yang 2002)
AFterNOONQSQS, , QSVCLQSVCL, , Em-NoneEm-None,Ft Dir RightFt Dir Right, BoundedBounded,Bounded-2Bounded-2, Bounded-SylBounded-Syl,Ft Hd RightFt Hd Right
((LL) ) ((L L HH))
AF ter NOON VC CVC CVVC
If the generated stress contour does not match the observed stress contour, thegrammar does not successfully “parse” the data point. All participatingparameter values are punished.
reward all
QS, QSVCL, Em-None, Ft Dir Left, Bounded, Bounded-2, Bounded-Syl, Ft Hd Right
(L L) (H)
af TER NOON
VC CVC CVVC
punish all
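The generate-and-test loop above can be sketched in Python. This is a hypothetical illustration, not Yang's (2002) implementation: `generate_contour`, `reward`, and `punish` are assumed callbacks, and the parameter grid is abbreviated.

```python
import random

# Abbreviated parameter grid: each parameter opposes two values;
# the stored probability belongs to the first value of the pair.
probs = {("QI", "QS"): 0.5, ("Em-None", "Em-Some"): 0.5,
         ("Ft Dir Left", "Ft Dir Right"): 0.5, ("Ft Hd Left", "Ft Hd Right"): 0.5}

def sample_grammar(probs):
    """Probabilistically choose one value per parameter."""
    return {pair: (pair[0] if random.random() < p else pair[1])
            for pair, p in probs.items()}

def learn_one(word, observed_contour, generate_contour, reward, punish):
    """Sample a grammar, generate a contour for the word, and test it
    against the observed contour, rewarding or punishing all
    participating parameter values."""
    grammar = sample_grammar(probs)
    matched = generate_contour(word, grammar) == observed_contour
    for pair, value in grammar.items():
        (reward if matched else punish)(pair, value)
    return matched
```

Every value that participated in generation is updated together, which is what makes a single ambiguous data point reward or punish whole bundles of parameter values at once.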
Probabilistic learning for English
Probabilistic generation and testing of parameter values (Yang 2002)
Update parameter value probabilities
NParLearner (Yang 2002): Linear Reward-Penalty
Learning rate γ: small = small changes; large = large changes
Parameter values v1 vs. v2
reward v1: pv1 = pv1 + γ(1 − pv1);  pv2 = 1 − pv1
punish v1: pv1 = (1 − γ)pv1;  pv2 = 1 − pv1
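The Linear Reward-Penalty update can be written out directly (a minimal sketch; the function names are mine, the formulas are the slide's):

```python
def lrp_reward(p_v1, gamma):
    """Reward value v1: move its probability toward 1 by learning rate gamma."""
    p_v1 = p_v1 + gamma * (1 - p_v1)
    return p_v1, 1 - p_v1  # (p_v1, p_v2)

def lrp_punish(p_v1, gamma):
    """Punish value v1: shrink its probability toward 0 by learning rate gamma."""
    p_v1 = (1 - gamma) * p_v1
    return p_v1, 1 - p_v1  # (p_v1, p_v2)
```

A small γ (e.g. 0.01) lets each data point nudge the probabilities only slightly; a large γ lets single data points move them substantially.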
BayesLearner: Bayesian update of binomial distribution (Chew 1971)
pv = (α + 1 + successes) / (α + β + 2 + total data seen)
Parameter value v
reward: successes + 1;  punish: successes + 0
Parameters α, β:
α = β: initial bias at p = 0.5
α, β < 1: initial bias toward endpoints (p = 0.0, 1.0)
here: α = β = 0.5
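The BayesLearner's estimate follows directly from the slide's formula (a sketch; the function name is mine, the defaults α = β = 0.5 are the slide's):

```python
def bayes_prob(successes, total_seen, alpha=0.5, beta=0.5):
    """Probability of parameter value v after Bayesian updating of a
    binomial distribution: pv = (alpha + 1 + successes) /
    (alpha + beta + 2 + total_seen)."""
    return (alpha + 1 + successes) / (alpha + beta + 2 + total_seen)
```

With α = β = 0.5 the initial estimate (no data yet) is p = 0.5, and a value rewarded on most of the data seen drifts toward 1.0.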
Probabilistic learning for English
Probabilistic generation and testing of parameter values (Yang 2002)
Update parameter value probabilities
After learning: expect probabilities of parameter values to converge near endpoints (above/below some threshold).
QI = 0.3   QS = 0.7
QSVCL = 0.6   QSVCH = 0.4
Em-Some = 0.1   Em-None = 0.9
…
Once set, a parameter value is always used during generation,since its probability is 1.0.
Em-None = 1.0
QI/QS? … if QS, QSVCL or QSVCH? Em-None …
QS, QSVCL, Em-None, Ft Dir Right, Bounded, Bounded-2, Bounded-Syl, Ft Hd Right
Probabilistic learning for English
Goal: Converge on English values after learning period is over.
Learning Period Length: 1,160,000 words (based on estimates of words heard in a 6-month period, using Akhtar et al. (2004)).
QS, QSVCH, Em-Some, Em-Right, Ft Dir Right, Bounded, Bounded-2, Bounded-Syllabic, Ft Hd Left
Model                              Success rate (1000 runs)
NParLearner, 0.01 ≤ γ ≤ 0.05       1.2%
BayesLearner                       0.0%
Examples of incorrect target grammars
NParLearner: Em-None, Ft Hd Left, Unb, Ft Dir Left, QI; QS, Em-None, QSVCH, Ft Dir Rt, Ft Hd Left, B-Mor, Bounded, Bounded-2
BayesLearner: QS, Em-Some, Em-Right, QSVCH, Ft Hd Left, Ft Dir Rt, Unb; Bounded, B-Syl, QI, Ft Hd Left, Em-None, Ft Dir Left, B-2
Probabilistic learning for English: Modifications
Probabilistic generation and testing of parameter values (Yang 2002)
Update parameter value probabilities
Batch-learning (for very small batch sizes): smooth out some of the irregularities in the data.
Implementation (Yang 2002):
Success = increase parameter value’s batch counter by 1
Failure = decrease parameter value’s batch counter by 1
Invoke the update procedure (Linear Reward-Penalty or Bayesian Updating) when batch limit b is reached. Then, reset the parameter’s batch counters.
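One reading of this batch scheme, sketched as a wrapper around either update procedure (class and method names are illustrative; in particular, I assume the limit is reached when the counter hits ±b):

```python
class BatchCounter:
    """Accumulate +1 (success) / -1 (failure) outcomes for one parameter
    value and invoke the update procedure only when the batch limit b is
    reached, then reset the counter."""
    def __init__(self, b, update):
        self.b = b
        self.update = update  # called with +1 (net success) or -1 (net failure)
        self.counter = 0

    def success(self):
        self.counter += 1
        self._maybe_update()

    def failure(self):
        self.counter -= 1
        self._maybe_update()

    def _maybe_update(self):
        if abs(self.counter) == self.b:
            self.update(1 if self.counter > 0 else -1)
            self.counter = 0  # reset the batch counter
```

Because successes and failures cancel inside a batch, isolated noisy data points never trigger an update on their own.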
Probabilistic generation and testing of parameter values (Yang 2002)
Update parameter value probabilities + Batch Learning
NParLearner (Yang 2002): Linear Reward-Penalty
Invoke when the batch counter for pv1 or pv2 equals b.
Parameter values v1 vs. v2
reward v1: pv1 = pv1 + γ(1 − pv1);  pv2 = 1 − pv1
punish v1: pv1 = (1 − γ)pv1;  pv2 = 1 − pv1
BayesLearner: Bayesian update of binomial distribution (Chew 1971)
pv = (α + 1 + successes) / (α + β + 2 + total data seen)
Parameter value v
reward: successes + 1;  punish: successes + 0
Invoke when the batch counter for pv1 or pv2 equals b.
Note: total data seen + 1
Probabilistic learning for English

Model                                               Success rate (1000 runs)
NParLearner, 0.01 ≤ γ ≤ 0.05                        1.2%
BayesLearner                                        0.0%
NParLearner + Batch, 0.01 ≤ γ ≤ 0.05, 2 ≤ b ≤ 10    0.8%
BayesLearner + Batch, 2 ≤ b ≤ 10                    1.0%
Probabilistic learning for English: Modifications
Probabilistic generation and testing of parameter values (Yang 2002)
Update parameter value probabilities + Batch Learning
Learner bias: metrical phonology relies in part on knowledge of rhythmical properties of the language.
Human infants may already have knowledge of Ft Hd Left (Jusczyk, Cutler, & Redanz 1993) and QS (Turk, Jusczyk, & Gerken 1995).
Build this bias into the model: set probability of QS = Ft Hd Left = 1.0. These will always be chosen during generation.
QS … QSVCL or QSVCH? … Ft Hd Left
QS, QSVCL, Em-None, Ft Dir Right, Bounded, Bounded-2, Bounded-Syl, Ft Hd Left
Probabilistic learning for English

Model                                                      Success rate (1000 runs)
NParLearner, 0.01 ≤ γ ≤ 0.05                               1.2%
BayesLearner                                               0.0%
NParLearner + Batch, 0.01 ≤ γ ≤ 0.05, 2 ≤ b ≤ 10           0.8%
BayesLearner + Batch, 2 ≤ b ≤ 10                           1.0%
NParLearner + Batch + Bias, 0.01 ≤ γ ≤ 0.05, 2 ≤ b ≤ 10    5.0%
BayesLearner + Batch + Bias, 2 ≤ b ≤ 10                    1.0%
The best isn’t so great.
Where else can we modify?
(1) Hypothesis space
(2) Data
(3) Update procedure
[Diagram: input data points d shift each parameter’s value probabilities from 0.5/0.5 toward endpoint values (e.g. 0.3/0.7, 0.6/0.4)]
Linear Reward-Penalty, Bayesian, Batch…
Prior knowledge, biases: QS, Ft Hd Left known…
What about the data the learner uses?
Data Intake Filtering: “Selective Learning”
“Equal Opportunity” Intuition: Use all available data to uncover a full range of systematicity, and allow the probabilistic model enough data to converge.
[Diagram: under “equal opportunity”, the intake is all of the input data]
“Selective” Intuition: Use the really good data only.
One instantiation of “really good” = highly informative.
One instantiation of “highly informative” = data viewed by the learner as unambiguous (Fodor, 1998; Dresher, 1999; Lightfoot, 1999; Pearl & Weinberg, 2007).
Data intake filter
[Diagram: the input data are filtered down to a smaller intake]
Practical matters: Feasibility of unambiguous data
“It is unlikely that any example … would show the effect of only a single parameter value; rather, each example is the result of the interaction of several different principles and parameters”
(S S) (S) af ter noon
AFterNOON
(L L) (H) af ter noon
(L) (L H) af ter noon
Clark 1994
Existence?
Even if unambiguous data existed, how could a child identify them?
Identification?
What’s the same here, other than the output?
Existence? Depends on data set (empirically determined).
Identification?
Identifying unambiguous data: Cues (Dresher 1999; Lightfoot 1999): heuristic pattern-matching to the observable form of the data. Cues are available for each parameter value, known already by the learner.
S…S af ter noon Em-None
Parsing (Fodor 1998; Sakas & Fodor 2001): extract necessary parameter values from all successful parses of a data point.
(QI, Em-None, Ft Dir Left, Ft Hd Left, B, B-2, B-Syl)
(QS, QSVCL, Em-None, Ft Dir Left, Ft Hd Left, B, B-2, B-Syl)
af ter noon
Shared values: Em-None, Ft Dir Left, Ft Hd Left, Bounded, Bounded-2, Bounded-Syl
Both operate over a single data point at a time:compatible with incremental learning
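The parsing approach can be sketched as a set intersection over the successful parses of a data point (a hypothetical illustration; the parse sets in the usage mirror the afternoon example):

```python
def unambiguous_values(parses):
    """Given every successful parse of a data point, each represented as
    the set of parameter values it uses, return only the values required
    by all parses: these are what the data point unambiguously supports."""
    if not parses:
        return set()
    required = set(parses[0])
    for parse in parses[1:]:
        required &= set(parse)
    return required
```

For the two afternoon parses, the intersection keeps Em-None, Ft Dir Left, Ft Hd Left, B, B-2, and B-Syl, and discards the QI/QS values on which the parses disagree.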
Probabilistic learning from unambiguous data (Pearl 2008)
Each parameter has 2 values.
Advantage in data: How much more unambiguous data there is for one value over the other in the data distribution.
Assumption (Yang 2002): The value with the greater advantage will be the one a probabilistic learner will converge on over time.
[Diagram: of the two intakes of unambiguous data drawn from the input, the larger one has the advantage]
Allows us to be fairly agnostic about the exact nature of the probabilistic learning, provided it has this behavior.
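Under that assumption, predicting the learner's outcome for a parameter reduces to comparing unambiguous-data counts (a sketch; the function name and counts are illustrative):

```python
def advantage_winner(counts):
    """Given unambiguous-data counts for the two values of one parameter,
    return the value with the advantage, i.e. the one the probabilistic
    learner is assumed to converge on; return None on a tie."""
    (v1, n1), (v2, n2) = counts.items()
    if n1 == n2:
        return None
    return v1 if n1 > n2 else v2
```

This is what makes the analysis tractable: the advantage can be computed from the data distribution alone, without simulating any particular update rule.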
Probabilistic learning from unambiguous data (Pearl 2008)
The order in which parameters are set may determine if they are set correctly from the data.
Dresher 1999
Parsing:
Group 1: QS, Ft Hd Left, Bounded
Group 2: Ft Dir Right, QS-VC-Heavy
Group 3: Em-Some, Em-Right, Bounded-2, Bounded-Syl
The parameters are freely ordered w.r.t. each other within each group.
Cues:
(a) QS-VC-Heavy before Em-Right
(b) Em-Right before Bounded-Syl
(c) Bounded-2 before Bounded-Syl
The rest of the parameters are freely ordered w.r.t. each other.
Success guaranteed as long as parameter-setting order constraints are followed.
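Checking whether a proposed parameter-setting order obeys such “X before Y” constraints is straightforward (a sketch; the constraint list encodes the cue-based constraints (a)-(c) above):

```python
def respects_order(order, constraints):
    """Return True iff every (earlier, later) constraint is satisfied by
    the positions of the parameters in the proposed setting order."""
    pos = {param: i for i, param in enumerate(order)}
    return all(pos[earlier] < pos[later] for earlier, later in constraints)

# Cue-based constraints (a)-(c):
CUE_CONSTRAINTS = [("QS-VC-Heavy", "Em-Right"),
                   ("Em-Right", "Bounded-Syl"),
                   ("Bounded-2", "Bounded-Syl")]
```

Any order satisfying the constraints is acceptable; the constraints define a partial order, not a unique sequence.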
Road Map
Introduction to complex linguistic systems: General problems; Parametric systems; Parametric metrical phonology
Learnability of complex linguistic systems: General learnability framework; Case study: English metrical phonology (Available data & associated woes; Unconstrained probabilistic learning; Constrained probabilistic learning)
Where next? Implications & Extensions
Where we are now
Cognitive modeling: aimed at understanding how humans solve problems, generating human behavior by using psychologically plausible methods.
Language: learning complex systems is difficult. Success comes from integrating biases into probabilistic learning models.
Bias on hypothesis space: linguistic parameters already known, some values already known.
Bias on data: interpretive bias to use highly informative data.
[Diagram: input filtered to intake; parameter-value probabilities shift toward endpoints (e.g. 0.7/0.3, 0.8/0.2)]
Where we can go
(1) Interpretive bias: How successful on other difficult learning cases (noisy data sets, other complex systems)? Are there other methods of implementing interpretive biases that lead to successful learning? How necessary is an interpretive bias? Are there cleverer probabilistic learning methods that can succeed?
+ biases?
(2) Hypothesis space bias: Is it possible to infer the correct parameters of variation given less structured information a priori (e.g. larger units than syllables are required)? [Model Selection]
+ fewer biases?
(3) Informing AI/ML: Can we import the necessary biases for learning complex systems into language applications (e.g. speech generation)?
necessary biases
The big idea
Complex linguistic systems may well require something beyond probabilistic methods in order to be learned, and learned as well as humans learn them.
What this likely is: learner biases in hypothesis space and data intake (how to deploy probabilistic learning).
What we can do: take insights from cognitive modeling and apply them to problems in artificial intelligence and machine learning, & vice versa.
Thank You
Amy Weinberg, Jeff Lidz, Bill Idsardi, Charles Yang, Bill Sakas, Janet Fodor
The audiences at
University of California, Los Angeles Linguistics Department
University of Southern California Linguistics Department
BUCLD 32
UC Irvine Language Learning Group
UC Irvine Department of Cognitive Sciences
CUNY Psycholinguistics Supper Club
UDelaware Linguistics Department
Yale Linguistics Department
UMaryland Cognitive Neuroscience of Language Lab