Methods in Unsupervised Dependency Parsing
Mohammad Sadegh Rasooli
Candidacy Exam
Department of Computer Science
Columbia University
April 1st, 2016
Overview
1. Introduction: Dependency Grammar; Dependency Parsing
2. Fully Unsupervised Parsing Models: Unsupervised Parsing; Dependency Model with Valence (DMV); Common Learning Algorithms for DMV; Discussion
3. Syntactic Transfer Models: Approaches in Syntactic Transfer; Direct Syntactic Transfer; Annotation Projection; Discussion
4. Conclusion
Dependency Grammar
- A formal grammar introduced by [Tesniere, 1959], inspired by valency theory in chemistry
[Figure: molecular structure diagram illustrating valency in chemistry]
- In a dependency tree, each word has exactly one parent but may have any number of dependents
- Benefit: explicit representation of syntactic roles
Example: "Economic news had little effect on financial markets ."
[Dependency tree with arcs labeled sbj, obj, nmod, pc, and punc]
Dependency Parsing
- State-of-the-art parsing models are very accurate
- Requirement: large amounts of annotated trees
- ≤ 50 treebanks available; ≈ 7,000 languages without any treebank
- Treebank development is an expensive and time-consuming task
  - Five years of work for the Penn Chinese Treebank [Hwa et al., 2005]
- Unsupervised dependency parsing is an alternative approach when no treebank is available
Unsupervised Parsing
- Goal: develop an accurate parser without annotated data
- Common assumptions:
  - Part-of-speech (POS) information is available
  - Raw text is available
Initial Attempts
- The seminal works of [Carroll and Charniak, 1992] and [Paskin, 2002] tried different techniques and achieved interesting results
- Their models could not beat the baseline of attaching every word to the next word
DMV: the First Breakthrough
- The dependency model with valence (DMV) [Klein and Manning, 2004] was the first model to beat the baseline
- Most later papers extend the DMV, either in the inference method or in the parameter definition
The Dependency Model with Valence
- Input x, output y; p(x, y | θ) = P(y(0) | $, θ)
- θ_c: dependency attachment parameters
- θ_s: parameters for stopping to take more dependents
- adj(j): true iff x_j is adjacent to its parent
- dep_dir(j): the set of dependents of x_j in direction dir

Recursive calculation:

P(y(i) | x_i, θ) = ∏_{dir ∈ {←, →}} θ_s(stop | x_i, dir, [dep_dir(i) ≠ ∅])
                     × ∏_{j ∈ y_dir(i)} (1 − θ_s(stop | x_i, dir, adj(j)))
                                        × θ_c(x_j | x_i, dir) × P(y(j) | x_j, θ)
Mohammad Sadegh Rasooli Methods in Unsupervised Dependency Parsing
IntroductionFully Unsupervised Parsing Models
Syntactic Transfer ModelsConclusion
Unsupervised ParsingDepndency Model with Valence (DMV)Common Learning Algorithms for DMVDiscussion
DMV: A Running Example
POS sequence: ROOT PRN VB DT NN

P(y(0)) = θ_c(VB | ROOT, →) × P(y(2) | VB, θ)

P(y(2) | VB, θ) = θ_s(stop | VB, ←, true) × (1 − θ_s(stop | VB, ←, false))
                  × θ_c(PRN | VB, ←) × P(y(1) | PRN, θ)
                  × θ_s(stop | VB, →, true) × (1 − θ_s(stop | VB, →, false))
                  × θ_c(NN | VB, →) × P(y(4) | NN, θ)

P(y(1) | PRN, θ) = θ_s(stop | PRN, ←, false) × θ_s(stop | PRN, →, false)

P(y(4) | NN, θ) = θ_s(stop | NN, ←, true) × (1 − θ_s(stop | NN, ←, false))
                  × θ_c(DT | NN, ←) × P(y(3) | DT, θ)
                  × θ_s(stop | NN, →, false)

P(y(3) | DT, θ) = θ_s(stop | DT, ←, false) × θ_s(stop | DT, →, false)
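To make the factorization concrete, here is the same computation with toy parameter values (all numbers hypothetical: every stop probability 0.5, every attachment probability 0.3):

```python
s, c = 0.5, 0.3   # toy values: every theta_s(stop|...) = s, every theta_c(...) = c

p_y1 = s * s                                   # P(y(1)|PRN): stop left, stop right
p_y3 = s * s                                   # P(y(3)|DT)
p_y4 = s * (1 - s) * c * p_y3 * s              # P(y(4)|NN): attach DT on the left, stop right
p_y2 = (s * (1 - s) * c * p_y1) * (s * (1 - s) * c * p_y4)  # P(y(2)|VB): attach PRN and NN
p_y0 = c * p_y2                                # P(y(0)): attach VB under ROOT
print(p_y0)                                    # a small but non-zero probability
```

With real parameters the individual factors differ per tag and direction, but the structure of the product is exactly the one above.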
DMV: Parameter Estimation
- Parameters are estimated from occurrence counts, e.g.

  θ_c(w_j | w_i, →) = count(w_i → w_j) / Σ_{w′ ∈ V} count(w_i → w′)

- In an unsupervised setting, we can use dynamic programming (the inside-outside algorithm [Lari and Young, 1990]) to estimate the model parameters θ
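In the supervised case this count-and-normalize estimate is a few lines of code. A sketch with a hypothetical toy corpus of rightward head → child attachments:

```python
from collections import Counter

# Toy corpus: observed (head, child) attachments in the -> direction.
pairs = [("VB", "NN"), ("VB", "NN"), ("VB", "PP"), ("NN", "PP")]

attach = Counter(pairs)                  # count(w_i -> w_j)
totals = Counter(h for h, _ in pairs)    # sum of counts over all children of w_i

def theta_c(child, head):
    """theta_c(w_j | w_i, ->) by count normalization."""
    return attach[(head, child)] / totals[head]

print(theta_c("NN", "VB"))   # 2 of the 3 VB attachments go to NN
```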
Problems with DMV
- Training the DMV is a non-convex optimization problem
  - A local optimum is not necessarily a global optimum
  - Very sensitive to initialization
- Encoding constraints is not supported by the original model
- Lack of expressiveness
  - Low supervised accuracy (an upper bound)
- Needs inductive bias
  - Post-processing the DMV output by fixing the determiner-noun direction (DET ← NOUN) gave a huge improvement [Klein and Manning, 2004]
Extensions to DMV
- Changing the learning algorithm from EM
  - Contrastive estimation [Smith and Eisner, 2005]
  - Bayesian models [Headden III et al., 2009, Cohen and Smith, 2009a, Blunsom and Cohn, 2010, Naseem et al., 2010, Marecek and Straka, 2013]
- The local optima problem
  - Switching between different objectives [Spitkovsky et al., 2013]
- Lack of expressiveness
  - Lexicalization [Headden III et al., 2009]
  - Parameter tying [Cohen and Smith, 2009b, Headden III et al., 2009]
  - Tree substitution grammars [Blunsom and Cohn, 2010]
  - Reranking with a richer model [Le and Zuidema, 2015]
Extensions to DMV
- Inductive bias
  - Adding constraints
    - Posterior regularization [Gillenwater et al., 2010]
    - Forcing unambiguity [Tu and Honavar, 2012]
    - Universal knowledge [Naseem et al., 2010]
  - Stop probability estimation from raw text [Marecek and Straka, 2013]
- Alternatives to DMV
  - A convex objective based on the convex hull of plausible trees [Grave and Elhadad, 2015]
Common Learning Algorithms for DMV
- Expectation maximization (EM) [Dempster et al., 1977]
- Posterior regularization (PR) [Ganchev et al., 2010]
- Variational Bayes (VB) [Beal, 2003]
- PR + VB [Naseem et al., 2010]
Expectation Maximization (EM) Algorithm
- Start with initial parameters θ(t) at iteration t = 1
- Repeat until θ(t) ≈ θ(t+1):
  - E step: compute the posterior probabilities

    ∀i = 1…N, ∀y ∈ Y(x_i):
    q_i^(t)(y) ← p_θ(t)(y | x_i) = p_θ(t)(x_i, y) / Σ_{y′ ∈ Y(x_i)} p_θ(t)(x_i, y′)

  - M step: maximize over the parameter values θ

    θ(t+1) ← argmax_θ Σ_{i=1}^{N} Σ_{y ∈ Y(x_i)} q_i^(t)(y) log p_θ(x_i, y)

  - t ← t + 1
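The loop above can be written down for any model with a discrete latent variable. A minimal sketch, using a two-component categorical mixture as a stand-in for the parsing model (data and initial parameters are toy values I made up):

```python
import math

# Toy observations; the latent y is which of two classes emitted each symbol.
data = ["a"] * 30 + ["b"] * 10 + ["c"] * 40 + ["d"] * 20
X, Y = ["a", "b", "c", "d"], [0, 1]

pi = {0: 0.5, 1: 0.5}                               # p(y)
em = {0: {"a": 0.4, "b": 0.3, "c": 0.2, "d": 0.1},  # p(x | y)
      1: {"a": 0.1, "b": 0.2, "c": 0.3, "d": 0.4}}

loglik = []
for t in range(50):
    # E step: q_i(y) = p(y | x_i) = p(x_i, y) / sum_y' p(x_i, y')
    q, ll = [], 0.0
    for x in data:
        joint = {y: pi[y] * em[y][x] for y in Y}
        z = sum(joint.values())
        q.append({y: joint[y] / z for y in Y})
        ll += math.log(z)
    loglik.append(ll)
    # M step: maximize by normalizing expected counts.
    for y in Y:
        wy = sum(qi[y] for qi in q)
        pi[y] = wy / len(data)
        for x in X:
            em[y][x] = sum(qi[y] for qi, xi in zip(q, data) if xi == x) / wy
```

Each iteration is guaranteed not to decrease the data log-likelihood; for the DMV, the E-step posteriors come from the inside-outside algorithm rather than this explicit enumeration.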
Expectation Maximization (EM) Algorithm

Another interpretation of the E step [Neal and Hinton, 1998]:

q^(t) ← argmin_q KL(q(Y) || p_θ(t)(Y | X))
Expectation Maximization (EM) Algorithm

M step: for a categorical distribution, the optimal parameters are obtained by normalizing expected counts:

θ(t+1)(y | x) = Σ_{i=1}^{N} q_i^(t)(y | x) / Σ_{y′} Σ_{i=1}^{N} q_i^(t)(y′ | x)
Posterior Regularization
- Encodes prior knowledge as constraints
- Affects only the E step; the M step remains unchanged
Posterior Regularization
Original objective:

q^(t) ← argmin_q KL(q(Y) || p_θ(t)(Y | X))

Modified objective:

q^(t) ← argmin_q KL(q(Y) || p_θ(t)(Y | X)) + σ Σ_i b_i
         s.t. ||E_q[φ_i(X, Y)]||_β ≤ b_i

σ is the regularization coefficient and b_i is the proposed numerical constraint for sentence i.
Posterior Regularization Constraints
Modified objective:

q^(t) ← argmin_q KL(q(Y) || p_θ(t)(Y | X)) + σ Σ_i b_i

Types of constraints:
- The number of unique child-head tag pairs in a sentence (fewer is better) [Gillenwater et al., 2010]
- The number of pre-defined linguistic rules preserved in a tree (more is better) [Naseem et al., 2010]
- The information entropy of the sentence (lower is better) [Tu and Honavar, 2012]
Variational Bayes
- A Bayesian model that encodes prior information
- Affects only the M step; the E step remains unchanged
Variational Bayes
M step (EM):

θ(t+1)(y | x) = Σ_{i=1}^{N} q_i^(t)(y | x) / Σ_{y′} Σ_{i=1}^{N} q_i^(t)(y′ | x)

Modified M step in VB:

θ(t+1)(y | x) = F(α_y + Σ_{i=1}^{N} q_i^(t)(y | x)) / F(Σ_{y′} (α_{y′} + Σ_{i=1}^{N} q_i^(t)(y′ | x)))

α is the prior, F(v) = e^{Ψ(v)}, and Ψ is the digamma function.
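The modified M step only swaps plain normalization for digamma-weighted normalization. A sketch, approximating Ψ numerically as the derivative of log-gamma; the expected counts and prior below are toy values of my own:

```python
import math

def digamma(v, h=1e-6):
    """Psi(v) = d/dv log Gamma(v), via a central difference on math.lgamma."""
    return (math.lgamma(v + h) - math.lgamma(v - h)) / (2.0 * h)

def F(v):
    """F(v) = exp(Psi(v)); roughly v - 0.5 for large v, which favors sparsity."""
    return math.exp(digamma(v))

def vb_m_step(counts, alpha):
    """theta(y) = F(alpha_y + c_y) / F(sum_y' (alpha_y' + c_y'))."""
    denom = F(sum(alpha[y] + c for y, c in counts.items()))
    return {y: F(alpha[y] + c) / denom for y, c in counts.items()}

counts = {"left": 7.2, "right": 2.8}    # expected counts from the E step (toy)
alpha = {"left": 0.25, "right": 0.25}   # symmetric Dirichlet prior (toy)
theta = vb_m_step(counts, alpha)
print(theta)
```

Note that the resulting weights no longer sum to one: exp(Ψ(·)) discounts each count by roughly 0.5, so low-count events are penalized more heavily than under plain EM normalization.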
VB + PR
- Makes use of both methods [Naseem et al., 2010]:
  - E step as in PR
  - M step as in VB
Discussion
- Significant improvements? Yes!
- Satisfying performance? No!
  - Mostly optimized for English
  - Accuracy far below that of a supervised model
Unsupervised Parsing Improvement Over Time
[Bar chart: unlabeled dependency accuracy on WSJ test data]

- Random baseline: 30.1
- Adjacent-word baseline: 33.6
- DMV [Klein and Manning, 2004]: 35.9
- 2008 [Cohen et al., 2008]: 40.5
- 2009 [Cohen and Smith, 2009a]: 41.4
- 2010 [Blunsom and Cohn, 2010]: 55.7
- 2011 [Spitkovsky et al., 2011]: 59.1
- 2012 [Spitkovsky et al., 2012]: 61.2
- 2013 [Spitkovsky et al., 2013]: 64.4
- 2015 [Le and Zuidema, 2015]: 66.2
- DMV with supervised training: 76.3
- Fully supervised parser: 94.4

Note: 15 minutes of programming to write down rules gives ≈ 60% accuracy!
Mohammad Sadegh Rasooli Methods in Unsupervised Dependency Parsing
Syntactic Transfer Models

I Transfer learning: learn on a problem X and apply the result to a similar (but not identical) problem Y
I Challenges: feature mismatch, domain mismatch, and lack of sufficient similarity between the two problems
I Syntactic transfer: learn parsers for languages L1 … Lm and use them for parsing language Lm+1
I Challenges: mismatch in lexical features, differences in word order
Approaches in Syntactic Transfer

I Direct transfer: train directly on treebanks for languages L1 … Lm and apply the resulting parser to language Lm+1
I Annotation projection: use parallel data and project supervised parse trees from a source language Ls to the target language through word alignments
I Treebank translation: build an SMT system, translate source treebanks into the target language, and train on the translated treebank [Tiedemann et al., 2014]
Direct Syntactic Transfer

I A supervised parser gets input x and outputs the best tree y∗, using lexical features φ^(l)(x, y) and unlexicalized features φ^(p)(x, y):

    y∗(x) = argmax_{y ∈ Y(x)}  θ_l · φ^(l)(x, y) + θ_p · φ^(p)(x, y)

I A direct transfer model cannot make use of lexical features.
I Direct delexicalized transfer only uses unlexicalized features [Cohen et al., 2011, McDonald et al., 2011]
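The scoring rule above can be illustrated with a minimal arc-factored sketch. This is not any particular parser's implementation; the feature templates and weights are invented for illustration, and dropping the lexical block of features is what "delexicalized" amounts to:

```python
# Sketch of the arc-factored scoring rule above: a linear model over
# lexical and unlexicalized (POS-based) features. In direct delexicalized
# transfer, the lexical weights are unavailable, so only POS features fire.
# Feature names and weights are illustrative, not from any real parser.

def arc_features(words, tags, head, dep, lexicalized=True):
    """Features for a candidate dependency head -> dep."""
    feats = [
        f"pos:{tags[head]}->{tags[dep]}",                               # phi^(p)
        f"pos-dir:{tags[head]}->{tags[dep]}:{'R' if head < dep else 'L'}",
    ]
    if lexicalized:                                                     # phi^(l)
        feats.append(f"word:{words[head]}->{words[dep]}")
    return feats

def score_arc(weights, words, tags, head, dep, lexicalized=True):
    return sum(weights.get(f, 0.0)
               for f in arc_features(words, tags, head, dep, lexicalized))

# Toy weights: POS features transfer across languages, word features do not.
weights = {"pos:VERB->NOUN": 2.0, "word:had->news": 1.5}
words = ["news", "had"]
tags = ["NOUN", "VERB"]

full = score_arc(weights, words, tags, head=1, dep=0)                    # 3.5
delex = score_arc(weights, words, tags, head=1, dep=0, lexicalized=False)  # 2.0
```

The delexicalized score depends only on POS tags, which is why a model trained on source-language treebanks can be applied unchanged to a target language with the same tag set.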
Direct Delexicalized Transfer: Pros and Cons
Pros
I Simplicity: can employ any supervised parser
I More accurate than fully unsupervised models
Cons
I No treatment of word order differences
I Lack of lexical features
Addressing Problems in Direct Delexicalized Transfer
Addressing problems in direct delexicalized transfer
I Word order difference
I Lack of lexical features
The World Atlas of Language Structures (WALS)

I The World Atlas of Language Structures (WALS) [Dryer and Haspelmath, 2013] is a large database of structural (phonological, grammatical, lexical) properties of nearly 3,000 languages
Selective Sharing: Addressing the Word Order Problem

I Use typological features, such as the subject–verb order, for each source and target language.
I In addition to the original parameters, share typological features across languages that have specific orderings in common
I Added features: original features conjoined with each typological feature
I Discriminative models with selective sharing achieve very high accuracies [Tackstrom et al., 2013, Zhang and Barzilay, 2015]
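The feature-conjunction idea can be sketched in a few lines. This is a simplified illustration, not the exact feature templates of the cited papers, and the WALS-style property names and values below are made up:

```python
# Illustrative sketch of selective sharing: conjoin each base parsing
# feature with a typological (WALS-style) property of the language, so
# that weights are shared only among languages that agree on that
# property. Property names/values here are hypothetical.

def conjoin_with_typology(base_features, typology):
    shared = []
    for feat in base_features:
        for prop, value in sorted(typology.items()):
            shared.append(f"{feat}&{prop}={value}")
    return base_features + shared

base = ["pos:VERB->NOUN"]
typology_en = {"subject-verb-order": "SV", "adposition": "prep"}
feats_en = conjoin_with_typology(base, typology_en)
# Any other language with subject-verb-order=SV fires the same conjoined
# feature, so its learned weight transfers to that language.
```

The original (unconjoined) features are kept as well, matching the "in addition to the original parameters" point above.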
Addressing the Lack of Lexical Features

I Using bilingual dictionaries to transfer lexical features [Durrett et al., 2012, Xiao and Guo, 2015]
I Creating cross-lingual word representations
  I without parallel text [Duong et al., 2015]
  I using parallel text [Zhang and Barzilay, 2015, Guo et al., 2016]
I Successful models build cross-lingual word representations from parallel text
I Could we leverage more if we have parallel text?
  I Yes!
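The dictionary-based idea can be sketched very simply: map target words to source-language words before feature extraction, so that lexical features learned on the source treebank can still fire. The tiny dictionary below is purely illustrative, and real systems must handle ambiguity and coverage far more carefully:

```python
# Hedged sketch of dictionary-based lexical transfer: replace each target
# word by a source translation (when the bilingual dictionary covers it)
# so a source-trained lexicalized feature extractor remains usable.
# The dictionary entries are illustrative only.

de_en = {"Haus": "house", "politischen": "political"}

def translate_for_features(target_words, dictionary):
    """Map target words to source words; uncovered words become <unk>."""
    return [dictionary.get(w, "<unk>") for w in target_words]

print(translate_for_features(["das", "politischen", "Haus"], de_en))
# ['<unk>', 'political', 'house']
```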
Annotation Projection

I Steps in annotation projection
  1 Prepare bitext
  2 Align bitext
  3 Parse source sentences with a supervised parser
  4 Project dependencies
  5 Train on the projected dependencies
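The projection step (step 4) can be sketched as follows, assuming steps 1–3 have produced a source parse and a word alignment. The data structures are simplified stand-ins (1-to-1 alignment links only); real systems must deal with many-to-many and missing links:

```python
# Minimal sketch of dependency projection: copy each source arc onto the
# target sentence wherever both endpoints are aligned. The resulting
# target tree may be partial.

def project_tree(source_tree, alignment):
    """Project source dependencies onto the target sentence.

    source_tree: dict mapping source dependent index -> head index (-1 = root)
    alignment:   dict mapping source index -> target index (1-to-1 links only)
    Returns a partial target tree: target dependent -> target head.
    """
    target_tree = {}
    for s_dep, s_head in source_tree.items():
        if s_dep in alignment:
            if s_head == -1:
                target_tree[alignment[s_dep]] = -1
            elif s_head in alignment:
                target_tree[alignment[s_dep]] = alignment[s_head]
    return target_tree

# Toy example: "the house" ~ "das Haus"; house is the root, the <- house.
source_tree = {0: 1, 1: -1}   # the -> house, house -> ROOT
alignment = {0: 0, 1: 1}      # the~das, house~Haus
print(project_tree(source_tree, alignment))   # {0: 1, 1: -1}
```

Step 5 then trains any supervised parser on the projected (possibly partial) trees.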
Projecting Dependencies from Parallel Data

[Figure: a worked example shown in five stages over the sentence pair below: prepare bitext; align bitext (e.g. via Giza++); parse the source sentence with a supervised parser; project the dependencies; train on the projected dependencies.]

The political priorities must be set by this House and the MEPs . ROOT
Die politischen Prioritäten müssen von diesem Parlament und den Europaabgeordneten abgesteckt werden . ROOT
Practical Problems

I Most translations are not word-to-word
  I Partial alignments
I Alignment errors
I Supervised parsers are not perfect
I Differences in syntactic behavior across languages
Approaches in Annotation Projection

I Post-processing alignments with rules and filtering sparse trees [Hwa et al., 2005]
I Use projected dependencies as constraints in posterior regularization [Ganchev et al., 2009]
I Use projected dependencies to lexicalize a direct model [McDonald et al., 2011]
I Entropy regularization on projected trees [Ma and Xia, 2014]
I Start with fully projected trees and self-train on partial trees [Rasooli and Collins, 2015]
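The last idea, training first on densely projected sentences and then relaxing to partial ones, can be sketched as a density filter. This is a simplified illustration in the spirit of [Rasooli and Collins, 2015], not their exact procedure; the threshold values are illustrative:

```python
# Sketch of a density-style filter: start with sentences whose projected
# trees are fully specified, then lower the threshold to also include
# sentences with partial projections for self-training.

def projection_density(n_words, projected_tree):
    """Fraction of words that received a projected head."""
    return len(projected_tree) / n_words

def select_sentences(corpus, min_density):
    """corpus: list of (n_words, projected_tree) pairs."""
    return [s for s in corpus if projection_density(*s) >= min_density]

corpus = [(4, {0: 1, 1: -1, 2: 1, 3: 1}),   # fully projected (density 1.0)
          (4, {1: -1, 3: 1})]               # partial (density 0.5)

full_only = select_sentences(corpus, min_density=1.0)   # 1 sentence
relaxed = select_sentences(corpus, min_density=0.5)     # 2 sentences
```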
Discussion

I Significant improvements?
  I Yes!
I Satisfying performance?
  I Yes!
I Mostly optimized for resource-rich languages
Unsupervised Parsing Best Models Comparison

[Figure: bar chart of average unlabeled dependency accuracy on 6 EU languages. Unsupervised: 56.1 [Grave and Elhadad, 2015]; Direct transfer: 77.8 [Ammar et al., 2016]; Annotation projection: 82.2 [Rasooli and Collins, 2015]; Supervised: 87.5.]
Conclusion

I Read 28+ papers about
  I Unsupervised dependency parsing
  I Direct cross-lingual transfer of dependency parsers
  I Annotation projection for cross-lingual transfer
I It seems that more effort may decrease the need for new treebanks!
Thanks
Thanks a lot
Danke sehr
References I

Ammar, W., Mulcaire, G., Ballesteros, M., Dyer, C., and Smith, N. A. (2016). One parser, many languages. arXiv preprint arXiv:1602.01595.

Beal, M. J. (2003). Variational algorithms for approximate Bayesian inference. PhD thesis, University of London, London.

Blunsom, P. and Cohn, T. (2010). Unsupervised induction of tree substitution grammars for dependency parsing. In Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing, pages 1204–1213, Cambridge, MA. Association for Computational Linguistics.

Carroll, G. and Charniak, E. (1992). Two experiments on learning probabilistic dependency grammars from corpora. Department of Computer Science, Univ.
References II

Cohen, S. B., Das, D., and Smith, N. A. (2011). Unsupervised structure prediction with non-parallel multilingual guidance. In Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing, pages 50–61, Edinburgh, Scotland, UK. Association for Computational Linguistics.

Cohen, S. B., Gimpel, K., and Smith, N. A. (2008). Logistic normal priors for unsupervised probabilistic grammar induction. In Advances in Neural Information Processing Systems, pages 321–328.

Cohen, S. B. and Smith, N. A. (2009a). Shared logistic normal distributions for soft parameter tying in unsupervised grammar induction. In Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, NAACL '09, pages 74–82, Stroudsburg, PA, USA. Association for Computational Linguistics.
References III

Cohen, S. B. and Smith, N. A. (2009b). Shared logistic normal distributions for soft parameter tying in unsupervised grammar induction. In Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, pages 74–82. Association for Computational Linguistics.

Dempster, A. P., Laird, N. M., and Rubin, D. B. (1977). Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B (Methodological), pages 1–38.

Dryer, M. S. and Haspelmath, M., editors (2013). WALS Online. Max Planck Institute for Evolutionary Anthropology, Leipzig.

Duong, L., Cohn, T., Bird, S., and Cook, P. (2015). Cross-lingual transfer for unsupervised dependency parsing without parallel data. In Proceedings of the Nineteenth Conference on Computational Natural Language Learning, pages 113–122, Beijing, China. Association for Computational Linguistics.
References IV

Durrett, G., Pauls, A., and Klein, D. (2012). Syntactic transfer using a bilingual lexicon. In Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, pages 1–11, Jeju Island, Korea. Association for Computational Linguistics.

Ganchev, K., Gillenwater, J., and Taskar, B. (2009). Dependency grammar induction via bitext projection constraints. In Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP, pages 369–377, Suntec, Singapore. Association for Computational Linguistics.

Ganchev, K., Graca, J., Gillenwater, J., and Taskar, B. (2010). Posterior regularization for structured latent variable models. The Journal of Machine Learning Research, 11:2001–2049.
References V

Gillenwater, J., Ganchev, K., Graca, J., Pereira, F., and Taskar, B. (2010). Sparsity in dependency grammar induction. In Proceedings of the ACL 2010 Conference Short Papers, pages 194–199. Association for Computational Linguistics.

Grave, E. and Elhadad, N. (2015). A convex and feature-rich discriminative approach to dependency grammar induction. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), pages 1375–1384, Beijing, China. Association for Computational Linguistics.

Guo, J., Che, W., Yarowsky, D., Wang, H., and Liu, T. (2016). A representation learning framework for multi-source transfer parsing. In The Thirtieth AAAI Conference on Artificial Intelligence (AAAI-16), Phoenix, Arizona, USA.
References VI

Headden III, W. P., Johnson, M., and McClosky, D. (2009). Improving unsupervised dependency parsing with richer contexts and smoothing. In Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, pages 101–109, Boulder, Colorado. Association for Computational Linguistics.

Hwa, R., Resnik, P., Weinberg, A., Cabezas, C., and Kolak, O. (2005). Bootstrapping parsers via syntactic projection across parallel texts. Natural Language Engineering, 11(03):311–325.

Klein, D. and Manning, C. D. (2004). Corpus-based induction of syntactic structure: Models of dependency and constituency. In Proceedings of the 42nd Annual Meeting on Association for Computational Linguistics, ACL '04, Stroudsburg, PA, USA. Association for Computational Linguistics.
References VII

Lari, K. and Young, S. J. (1990). The estimation of stochastic context-free grammars using the inside-outside algorithm. Computer Speech & Language, 4(1):35–56.

Le, P. and Zuidema, W. (2015). Unsupervised dependency parsing: Let's use supervised parsers. In Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 651–661, Denver, Colorado. Association for Computational Linguistics.

Ma, X. and Xia, F. (2014). Unsupervised dependency parsing with transferring distribution via parallel guidance and entropy regularization. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1337–1348, Baltimore, Maryland. Association for Computational Linguistics.
References VIII
Mareček, D. and Straka, M. (2013). Stop-probability estimates computed on a large corpus improve unsupervised dependency parsing. In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 281–290, Sofia, Bulgaria. Association for Computational Linguistics.
McDonald, R., Petrov, S., and Hall, K. (2011). Multi-source transfer of delexicalized dependency parsers. In Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing, pages 62–72, Edinburgh, Scotland, UK. Association for Computational Linguistics.
Naseem, T., Chen, H., Barzilay, R., and Johnson, M. (2010). Using universal linguistic knowledge to guide grammar induction. In Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing, pages 1234–1244, Cambridge, MA. Association for Computational Linguistics.
References IX
Neal, R. M. and Hinton, G. E. (1998). A view of the EM algorithm that justifies incremental, sparse, and other variants. In Learning in Graphical Models, pages 355–368. Springer.
Paskin, M. A. (2002). Grammatical bigrams. Advances in Neural Information Processing Systems, 14(1):91–97.
Rasooli, M. S. and Collins, M. (2015). Density-driven cross-lingual transfer of dependency parsers. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, pages 328–338, Lisbon, Portugal. Association for Computational Linguistics.
Smith, N. A. and Eisner, J. (2005). Guiding unsupervised grammar induction using contrastive estimation. In Proceedings of the IJCAI Workshop on Grammatical Inference Applications, pages 73–82.
References X
Spitkovsky, V. I., Alshawi, H., Chang, A. X., and Jurafsky, D. (2011). Unsupervised dependency parsing without gold part-of-speech tags. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, pages 1281–1290. Association for Computational Linguistics.
Spitkovsky, V. I., Alshawi, H., and Jurafsky, D. (2012). Bootstrapping dependency grammar inducers from incomplete sentence fragments via austere models. In ICGI, pages 189–194.
Spitkovsky, V. I., Alshawi, H., and Jurafsky, D. (2013). Breaking out of local optima with count transforms and model recombination: A study in grammar induction. In Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, pages 1983–1995, Seattle, Washington, USA. Association for Computational Linguistics.
Täckström, O., McDonald, R., and Nivre, J. (2013). Target language adaptation of discriminative transfer parsers. Transactions of the Association for Computational Linguistics.
References XI
Tesnière, L. (1959). Éléments de syntaxe structurale. Librairie C. Klincksieck.
Tiedemann, J., Agić, Ž., and Nivre, J. (2014). Treebank translation for cross-lingual parser induction. In Proceedings of the Eighteenth Conference on Computational Natural Language Learning, pages 130–140, Ann Arbor, Michigan. Association for Computational Linguistics.
Tu, K. and Honavar, V. (2012). Unambiguity regularization for unsupervised learning of probabilistic grammars. In Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, pages 1324–1334. Association for Computational Linguistics.
References XII
Xiao, M. and Guo, Y. (2015). Annotation projection-based representation learning for cross-lingual dependency parsing. In Proceedings of the Nineteenth Conference on Computational Natural Language Learning, pages 73–82, Beijing, China. Association for Computational Linguistics.
Zhang, Y. and Barzilay, R. (2015). Hierarchical low-rank tensors for multilingual transfer parsing. In Conference on Empirical Methods in Natural Language Processing (EMNLP), Lisbon, Portugal.