[TACL] Modeling Word Forms Using Latent Underlying Morphs and Phonology
Ryan Cotterell and Nanyun Peng and Jason Eisner
What is Phonology?

Orthography: cat
Phonology:   [kæt]
Phonetics:   [acoustic waveform]

• Phonology explains regular sound patterns
• Not phonetics, which deals with acoustics
Q: What do phonologists do?
A: They find sound patterns in sets of words!
A Phonological Exercise

        1P Pres. Sg.  3P Pres. Sg.  Past Tense  Past Part.
TALK    [tɔk]         [tɔks]        [tɔkt]      [tɔkt]
THANK   [θeɪŋk]       [θeɪŋks]      [θeɪŋkt]    [θeɪŋkt]
HACK    [hæk]         [hæks]        [hækt]      [hækt]
CRACK                 [kɹæks]       [kɹækt]
SLAP    [slæp]                      [slæpt]

Factor the table into stems and suffixes:

Stems:    /tɔk/ /θeɪŋk/ /hæk/ /kɹæk/ /slæp/
Suffixes: /Ø/ /s/ /t/ /t/

Prediction! The factorization fills in the unobserved cells:

CRACK   [kɹæk]        [kɹæks]       [kɹækt]     [kɹækt]
SLAP    [slæp]        [slæps]       [slæpt]     [slæpt]

A Model of Phonology

tɔk + s → (concatenate) → tɔks   "talks"
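The stem-and-suffix factorization above behaves like a toy matrix completion: rows are stems, columns are suffixes, and an unseen cell is predicted by plain concatenation. A minimal runnable sketch (the dictionaries and the `predict` function are illustrative, not from the paper):

```python
# Toy "matrix completion" for the exercise: stems are rows, suffixes are
# columns, and an unseen cell is predicted by plain concatenation.
stems = {"TALK": "tɔk", "CRACK": "kɹæk", "SLAP": "slæp"}
suffixes = {"1sg": "", "3sg": "s", "past": "t", "part": "t"}

def predict(lexeme, slot):
    # Underlying form = stem + suffix (no phonology yet).
    return stems[lexeme] + suffixes[slot]

# Cells that were missing from the partial table:
print(predict("CRACK", "1sg"))  # kɹæk
print(predict("SLAP", "3sg"))   # slæps
```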
A Phonological Exercise

Adding CODE, BAT, and EAT breaks pure concatenation. Observed forms:

        1P Pres. Sg.  3P Pres. Sg.  Past Tense  Past Part.
CODE                  [koʊdz]       [koʊdɪd]
BAT     [bæt]                       [bætɪd]
EAT     [it]                        [eɪt]       [itən]

With suffixes /Ø/ /s/ /t/ /t/ and stems /koʊd/ /bæt/ /it/, we would expect:

• [koʊdz]: z instead of s
• [koʊdɪd], [bætɪd]: ɪd instead of t
• [eɪt]: eɪt instead of it + t
A Model of Phonology

koʊd + s → koʊd#s → (stochastic phonology) → koʊdz   "codes"

rizaign + ation → rizaign#ation → (stochastic phonology) → rεzɪgneɪʃn   "resignation"

Modeling word forms using latent underlying morphs and phonology. Cotterell et al., TACL 2015.
Generative Phonology

• A system that generates exactly those attested forms
• Primary research program in phonology since the 1950s
• Example: [rezɪɡneɪʃən] "resignation" and [rizainz] "resigns"
Why this matters

• Linguists hand-engineer phonological grammars
• Linguistically interesting: can we create an automated phonologist?
• Cognitively interesting: can we model how babies learn phonology?
• "Engineeringly" interesting: can we analyze and generate words we haven't heard before? (i.e., matrix completion for large vocabularies)
A Probability Model

• Describes the generating process of the observed surface words:
  – We model each morph M(a) as an IID sample from a probability distribution Mφ(m)
  – We model the surface form S(u) as a sample from a conditional distribution Sθ(s | u)

The Generative Story

• The process of generating a surface word:
  – Sample the parameters φ and θ from priors
  – For each abstract morpheme a ∈ A, sample the morph M(a) ∼ Mφ
  – Whenever a new abstract word w = a1, a2, ··· must be pronounced for the first time, construct its underlying form u by concatenating the morphs M(a1), M(a2), ···, and sample the surface word S(u) ∼ Sθ(· | u)
  – Reuse this S(u) in future
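The generative story above can be sketched as runnable code. This is a toy stand-in: the morph distribution, the boundary symbol handling, and the one-rule "phonology" are invented for illustration; the real model uses learned distributions Mφ and Sθ.

```python
import random

# Toy generative story: sample a morph per abstract morpheme (reused
# thereafter), concatenate with "#", and apply a stand-in phonology.
morph_dist = {"RESIGN": ["rizaign"], "S": ["s", "z"]}
morphs = {}   # M(a): one morph per abstract morpheme, sampled once
surface = {}  # S(u): one surface form per underlying form, sampled once

def sample_morph(a):
    if a not in morphs:
        morphs[a] = random.choice(morph_dist[a])
    return morphs[a]

def phonology(u):
    # Stand-in for S_theta(s | u): drop the boundary, voice s after n.
    s = u.replace("#", "")
    if s.endswith("ns"):
        s = s[:-1] + "z"
    return s

def pronounce(word):
    u = "#".join(sample_morph(a) for a in word)  # concatenate morphs
    if u not in surface:
        surface[u] = phonology(u)                # generate S(u) once...
    return surface[u]                            # ...and reuse it

print(pronounce(("RESIGN", "S")))  # rizaignz
```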
Why Probability?

• A language's morphology and phonology are deterministic
• Advantages:
  – Soft models admit efficient learning and inference
  – Quantification of irregularity ("sing" and "sang")
• Our use is orthogonal to phonologists' use of probability, e.g., to explain gradient phenomena
Phonology as an Edit Process

• The underlying (upper) string r i z a i g n s is rewritten into the surface (lower) string one edit action at a time
• Each action is conditioned on three contexts: the upper left context, the upper right context, and the lower left context

COPY r         → r
COPY i         → r i
COPY z         → r i z
COPY a         → r i z a
COPY i         → r i z a i
DEL g (g → ɛ)  → r i z a i
COPY n         → r i z a i n
SUB s → z      → r i z a i n z

• At each step the action is drawn from a probability distribution, e.g.:

Action    Prob
DEL       .75
COPY      .01
SUB(A)    .05
SUB(B)    .03
...       ...
INS(A)    .02
INS(B)    .01
...       ...

• The probabilities are a function of weights and feature functions over the transduction: the upper string, the surface form, and the contexts

Phonological Attributes

• Each phone carries binary attributes (+ and -)
• Faithfulness features: EDIT(g, ɛ), EDIT(+cons, ɛ), EDIT(+voiced, ɛ)
• Markedness features: BIGRAM(a, i), BIGRAM(-high, -low), BIGRAM(+back, -back)
• Inspired by Optimality Theory, a popular constraint-based phonology formalism
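The edit process above can be sketched as a small log-linear model over actions. Everything here is illustrative: the feature names, the weights, and the greedy left-to-right decoding are invented; the paper's model is a full stochastic contextual edit transducer with learned weights.

```python
import math

# Toy contextual edit model: a log-linear distribution over edit actions,
# with hypothetical features and hand-picked weights.
weights = {"faith_copy": 2.0, "del_g": 3.0, "sub_s_z_after_n": 2.5}

def features(action, upper_char, lower_left):
    f = []
    if action == "COPY":
        f.append("faith_copy")                      # faithfulness: keep the phone
    if action == "DEL" and upper_char == "g":
        f.append("del_g")                           # allow g -> epsilon
    if action == ("SUB", "z") and upper_char == "s" and lower_left.endswith("n"):
        f.append("sub_s_z_after_n")                 # voice s after n
    return f

def action_dist(actions, upper_char, lower_left):
    scores = {a: math.exp(sum(weights[k] for k in features(a, upper_char, lower_left)))
              for a in actions}
    z = sum(scores.values())
    return {a: s / z for a, s in scores.items()}

def apply_edits(upper):
    # Greedy decode: pick the most probable action at each position.
    out = ""
    for ch in upper:
        dist = action_dist(["COPY", "DEL", ("SUB", "z")], ch, out)
        best = max(dist, key=dist.get)
        if best == "COPY":
            out += ch
        elif best == ("SUB", "z"):
            out += "z"
        # DEL appends nothing
    return out

print(apply_edits("rizaigns"))  # rizainz
```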
Outline

• A generative model for phonology
  – Generative Phonology
  – A Probabilistic Model
  – Stochastic Edit Process for Phonology
• Inference and Learning
  – A Hill Climbing Example
  – EM Algorithm with Finite-State Operations
• Evaluation and Results
A Generative Model of Phonology

• A Directed Graphical Model of the lexicon

Morph URs:               rizajgn, z, dæmn, eɪʃən
Word URs (concatenated): rizajgnz, dæmnz, rizajgneɪʃən, dæmneɪʃən
Observed surface forms:  rizˈajnz, dˈæmz, rˌɛzɪgnˈeɪʃən, dˌæmnˈeɪʃən

Graphical models are flexible

gə + liːb + t → gəliːbt → surface gəliːpt   "geliebt" (German: loved)

• Matrix completion: each word built from one stem (row) + one suffix (column). WRONG
• Graphical model: a word can be built from any # of morphemes (parents). RIGHT
A Generative Model of Phonology

• A Directed Graphical Model of the lexicon
• Observed surface forms: dˈæmz, rizˈajnz, rˌɛzɪgnˈeɪʃən

(Approximate) Inference

• MCMC – Bouchard-Côté (2007)
• Belief Propagation – Dreyer and Eisner (2009)
• Expectation Propagation – Cotterell and Eisner (2015)
• Dual Decomposition – Peng et al. (2015)

Distribution over the surface form:

Form          Prob
dæmeɪʃən      .80
dæmneɪʃən     .10
dæmineɪʃən    .001
dæmiineɪʃən   .0001
...           ...
chomsky       .000001
...           ...

Encoded as a weighted finite-state automaton.
Discovering the Underlying Forms = Inference in a Graphical Model

The underlying forms (????) are latent; only the surface forms dˈæmz, rizˈajnz, rˌɛzɪgnˈeɪʃən are observed.
Belief Propagation (BP) in a Nutshell

Variables: morph URs (dæmn, eɪʃən, z, rizajgn) and word URs (dæmnz, dæmneɪʃən, rizajgnz, rizajgneɪʃən). Observed surface forms: dˈæmz, dˌæmnˈeɪʃən, rizˈajnz, rˌɛzɪgnˈeɪʃən.

• Factor-to-variable messages
• Variable-to-factor messages
• Messages are encoded as finite-state machines
• The point-wise product (finite-state intersection) of a variable's incoming messages yields its marginal belief

Distribution over underlying forms:

UR         Prob
rizajgnz   .95
rezajnz    .02
rezigz     .02
rezgz      .0001
...        ...
chomsky    .000001
...        ...
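The point-wise product step can be sketched with toy dict-valued messages. In the real model each message is a weighted finite-state acceptor over infinitely many candidate strings; here the supports and probabilities are made up.

```python
# Toy BP belief computation: multiply a variable's incoming messages
# pointwise and renormalize to get its marginal belief.
msg_from_factor1 = {"rizajgnz": 0.8, "rizajnz": 0.2}
msg_from_factor2 = {"rizajgnz": 0.6, "rizajnz": 0.4}

def pointwise_product(*messages):
    # Intersection of supports (finite-state intersection, in the real model).
    support = set.intersection(*(set(m) for m in messages))
    belief = {u: 1.0 for u in support}
    for m in messages:
        for u in support:
            belief[u] *= m[u]
    z = sum(belief.values())
    return {u: b / z for u, b in belief.items()}

belief = pointwise_product(msg_from_factor1, msg_from_factor2)
print(max(belief, key=belief.get))  # most probable underlying form
```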
Training the Model

• Trained with EM (Dempster et al., 1977)
• E-Step: finite-state belief propagation
• M-Step: train the stochastic phonology with gradient descent
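The EM loop can be illustrated on a deliberately tiny problem: one latent choice (does the stem's UR contain a g?) and one phonology parameter (the probability that g deletes). The data, candidate URs, and likelihood model below are all invented for illustration; the paper's E-step runs finite-state BP and its M-step uses gradient descent.

```python
# Toy EM: latent stem UR (with or without "g") plus one phonology
# parameter p_del = P(g deletes). Invented data for illustration.
obs = ["rizajnz", "rɛzɪgneɪʃən"]  # surface forms sharing the stem

def likelihood(ur_has_g, surface, p_del):
    if ur_has_g:
        # g either deletes or survives on the surface
        return p_del if "g" not in surface else 1 - p_del
    return 0.0 if "g" in surface else 1.0

def em(iters=10, p_del=0.9):
    prior_has_g = 0.5
    for _ in range(iters):
        # E-step: posterior over the latent stem UR given all observations
        lik_g, lik_nog = prior_has_g, 1 - prior_has_g
        for s in obs:
            lik_g *= likelihood(True, s, p_del)
            lik_nog *= likelihood(False, s, p_del)
        post_g = lik_g / (lik_g + lik_nog)
        # M-step: p_del = expected deletions / expected opportunities
        deletions = post_g * sum("g" not in s for s in obs)
        opportunities = post_g * len(obs)
        p_del = deletions / opportunities
    return post_g, p_del

post_g, p_del = em()
print(round(post_g, 3), round(p_del, 3))  # 1.0 0.5
```

The surface g in "rɛzɪgneɪʃən" forces the posterior onto the g-bearing UR, after which the deletion probability settles at the fraction of forms where g fails to surface.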
Datasets

• Experiments on 7 languages from different families:
  – English (CELEX)
  – Dutch (CELEX)
  – German (CELEX)
  – Maori (Kenstowicz)
  – Tangale (Kenstowicz)
  – Indonesian (Kenstowicz)
  – Catalan (Kenstowicz)
A Generative Model of Phonology

How do you pronounce this word?

The model predicts the surface form of an unseen word: dæmn + eɪʃən → UR dæmneɪʃən → predicted surface dˌæmnˈeɪʃən.
Evaluation

• Metrics (lower is always better):
  – 1-best error rate (did we get it right?)
  – cross-entropy (what probability did we give the right answer?)
  – expected edit-distance (how far away on average are we?)
  – average each metric over many training-test splits
• Comparisons:
  – Lower bound: phonology as noisy concatenation
  – Upper bound: oracle URs from linguists
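The three metrics can be computed directly from a predicted distribution over surface forms. The distribution below reuses the slide's toy numbers (truncated to three candidates); the helper names are mine.

```python
import math

# The three evaluation metrics, applied to a toy predicted distribution
# over surface forms against the gold pronunciation.
pred = {"dæmeɪʃən": 0.80, "dæmneɪʃən": 0.10, "dæmineɪʃən": 0.001}
gold = "dæmneɪʃən"

def one_best_error(pred, gold):
    # 1 if the highest-probability form is wrong, else 0.
    return 0.0 if max(pred, key=pred.get) == gold else 1.0

def cross_entropy(pred, gold):
    # Bits assigned to the correct answer (lower is better).
    return -math.log2(pred.get(gold, 1e-12))

def edit_distance(a, b):
    # Standard Levenshtein distance, single-row DP.
    row = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        prev, row[0] = row[0], i
        for j, cb in enumerate(b, 1):
            prev, row[j] = row[j], min(row[j] + 1, row[j - 1] + 1,
                                       prev + (ca != cb))
    return row[len(b)]

def expected_edit_distance(pred, gold):
    # Average distance to gold under the predicted distribution.
    return sum(p * edit_distance(s, gold) for s, p in pred.items())

print(one_best_error(pred, gold))                    # 1.0: the 1-best is wrong
print(round(cross_entropy(pred, gold), 2))           # 3.32
print(round(expected_edit_distance(pred, gold), 3))  # 0.801
```

Note how the metrics disagree: the 1-best is wrong, yet the expected edit distance is under one phone, because the mass sits on near-miss forms.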
Exploring the Evaluation Metrics

Distribution over the surface form:

Form          Prob
dæmeɪʃən      .80
dæmneɪʃən     .10
dæmineɪʃən    .001
dæmiineɪʃən   .0001
...           ...
chomsky       .000001
...           ...

• 1-best error rate: is the 1-best correct?
• Cross-entropy: what is the probability of the correct answer?
• Expected edit distance: how close am I on average?
• Average over many training-test splits
German Results

(Error bars via bootstrap resampling)

CELEX Results

Phonological Exercise Results
Conclusion

• We presented a novel framework for computational phonology
• New datasets for research in the area
• A fair evaluation strategy for phonological learners

Fin

Thank you for your attention!
Gold UR Recovery