Recent Advances in Dependency Parsing

Tutorial, EACL, April 27th, 2014

Ryan McDonald (1), Joakim Nivre (2)

(1) Google Inc., USA/UK. E-mail: ryanmcd@google.com

(2) Uppsala University, Sweden. E-mail: joakim.nivre@lingfil.uu.se

Recent Advances in Dependency Parsing 1(54)

Introduction

Overview of the Tutorial

- Introduction to Dependency Parsing (Joakim)

- Graph-based parsing post-2008 (Ryan)

- Transition-based parsing post-2008 (Joakim)

- Summary and final thoughts (Ryan)

Transition-Based Dependency Parsing

Configuration: (S, B, A)

Initial: ([ ], [0, 1, ..., n], { })
Terminal: (S, [ ], A)

Shift: (S, i|B, A) ⇒ (S|i, B, A)

Reduce: (S|i, B, A) ⇒ (S, B, A)

Right-Arc(k): (S|i, j|B, A) ⇒ (S|i|j, B, A ∪ {(i, j, k)})
Left-Arc(k): (S|i, j|B, A) ⇒ (S, j|B, A ∪ {(j, i, k)})

⇔

[Figure: dependency tree for "Economic news had little effect on financial markets ." (POS tags: adj noun verb adj noun prep adj noun .) with arc labels amod, nsubj, dobj, amod, prep, pmod, amod, p]

Overview

- Improved learning and inference
  - Beam search and structured prediction
  - Dynamic programming
  - Easy-first parsing
  - Dynamic oracles

- Non-projective parsing
  - Online reordering
  - Multiplanar parsing

- Joint morphological and syntactic analysis

Transition-Based Dependency Parsing

The Basic Idea

- Define a transition system for dependency parsing

- Learn a model for scoring possible transitions

- Parse by searching for the optimal transition sequence

Arc-Eager Transition System [Nivre 2003]

Configuration: (S, B, A) [S = Stack, B = Buffer, A = Arcs]

Initial: ([ ], [0, 1, ..., n], { })
Terminal: (S, [ ], A)

Shift: (S, i|B, A) ⇒ (S|i, B, A)

Reduce: (S|i, B, A) ⇒ (S, B, A)    if h(i, A)

Right-Arc(k): (S|i, j|B, A) ⇒ (S|i|j, B, A ∪ {(i, j, k)})
Left-Arc(k): (S|i, j|B, A) ⇒ (S, j|B, A ∪ {(j, i, k)})    if ¬h(i, A) ∧ i ≠ 0

Notation: S|i = stack with top i and remainder S
          j|B = buffer with head j and remainder B
          h(i, A) = i has a head in A
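The four arc-eager transitions can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the `Config` class and the function names are assumptions introduced here, tokens are represented by their indices (0 = ROOT), and the preconditions are checked with assertions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Config:
    stack: tuple      # S, the top of the stack is the last element
    buffer: tuple     # B, the head of the buffer is the first element
    arcs: frozenset   # A, a set of (head, dependent, label) triples


def has_head(i, arcs):
    """h(i, A): True if token i already has a head in A."""
    return any(dep == i for (_, dep, _) in arcs)


def shift(c):
    return Config(c.stack + (c.buffer[0],), c.buffer[1:], c.arcs)


def reduce_(c):
    i = c.stack[-1]
    assert has_head(i, c.arcs), "Reduce requires h(i, A)"
    return Config(c.stack[:-1], c.buffer, c.arcs)


def right_arc(c, k):
    i, j = c.stack[-1], c.buffer[0]
    return Config(c.stack + (j,), c.buffer[1:], c.arcs | {(i, j, k)})


def left_arc(c, k):
    i, j = c.stack[-1], c.buffer[0]
    assert i != 0 and not has_head(i, c.arcs), "Left-Arc requires no head and i != 0"
    return Config(c.stack[:-1], c.buffer, c.arcs | {(j, i, k)})


# Toy input (an assumption): 0 = ROOT, 1 = "news", 2 = "had"
c = Config((), (0, 1, 2), frozenset())
c = shift(c)                # stack (0,), buffer (1, 2)
c = shift(c)                # stack (0, 1), buffer (2,)
c = left_arc(c, "nsubj")    # adds (2, 1, nsubj), pops 1
c = right_arc(c, "root")    # adds (0, 2, root), pushes 2
```

The final configuration has an empty buffer, so it is terminal, and `c.arcs` holds the two arcs of the toy tree.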

Example Transition Sequence

Each line shows the configuration reached after applying the transition named on the left:

(initial)          [ROOT]S [Economic, news, had, little, effect, on, financial, markets, .]B
Shift              [ROOT, Economic]S [news, had, little, effect, on, financial, markets, .]B
Left-Arc(amod)     [ROOT]S [news, had, little, effect, on, financial, markets, .]B
Shift              [ROOT, news]S [had, little, effect, on, financial, markets, .]B
Left-Arc(nsubj)    [ROOT]S [had, little, effect, on, financial, markets, .]B
Right-Arc(root)    [ROOT, had]S [little, effect, on, financial, markets, .]B
Shift              [ROOT, had, little]S [effect, on, financial, markets, .]B
Left-Arc(amod)     [ROOT, had]S [effect, on, financial, markets, .]B
Right-Arc(dobj)    [ROOT, had, effect]S [on, financial, markets, .]B
Right-Arc(prep)    [ROOT, had, effect, on]S [financial, markets, .]B
Shift              [ROOT, had, effect, on, financial]S [markets, .]B
Left-Arc(amod)     [ROOT, had, effect, on]S [markets, .]B
Right-Arc(pmod)    [ROOT, had, effect, on, markets]S [.]B
Reduce             [ROOT, had, effect, on]S [.]B
Reduce             [ROOT, had, effect]S [.]B
Reduce             [ROOT, had]S [.]B
Right-Arc(p)       [ROOT, had, .]S [ ]B

[Figure: dependency tree for "ROOT Economic news had little effect on financial markets ." (POS tags: adj noun verb adj noun prep adj noun .), built up one arc at a time; arc labels amod, nsubj, dobj, amod, prep, pmod, amod, p, root]

Arc-Standard Transition System [Nivre 2004]

Configuration: (S, B, A) [S = Stack, B = Buffer, A = Arcs]

Initial: ([ ], [0, 1, ..., n], { })
Terminal: ([0], [ ], A)

Shift: (S, i|B, A) ⇒ (S|i, B, A)

Right-Arc(k): (S|i|j, B, A) ⇒ (S|i, B, A ∪ {(i, j, k)})
Left-Arc(k): (S|i|j, B, A) ⇒ (S|j, B, A ∪ {(j, i, k)})    if i ≠ 0
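For comparison with arc-eager, the arc-standard transitions can be sketched the same way. This is an illustrative sketch under assumptions: configurations are plain `(stack, buffer, arcs)` tuples, tokens are indices with 0 = ROOT, and the function names are introduced here. Both arc transitions operate on the two topmost stack items, so a token can only be attached to its head after it has collected all of its own dependents.

```python
def shift(c):
    stack, buffer, arcs = c
    return (stack + [buffer[0]], buffer[1:], arcs)


def right_arc(c, k):
    # (S|i|j, B, A) => (S|i, B, A + {(i, j, k)})
    stack, buffer, arcs = c
    i, j = stack[-2], stack[-1]
    return (stack[:-1], buffer, arcs | {(i, j, k)})


def left_arc(c, k):
    # (S|i|j, B, A) => (S|j, B, A + {(j, i, k)}), only if i != 0
    stack, buffer, arcs = c
    i, j = stack[-2], stack[-1]
    assert i != 0, "ROOT cannot become a dependent"
    return (stack[:-2] + [j], buffer, arcs | {(j, i, k)})


def is_terminal(c):
    stack, buffer, _ = c
    return stack == [0] and not buffer


# Toy input (an assumption): 0 = ROOT, 1 = "news", 2 = "had", 3 = "effect"
c = ([], [0, 1, 2, 3], set())
c = shift(c)
c = shift(c)
c = shift(c)                 # stack [0, 1, 2]
c = left_arc(c, "nsubj")     # had -> news, stack [0, 2]
c = shift(c)                 # stack [0, 2, 3]
c = right_arc(c, "dobj")     # had -> effect, stack [0, 2]
c = right_arc(c, "root")     # ROOT -> had, stack [0]
```

Note that the root arc is added last here, whereas the arc-eager system can add it as soon as the verb reaches the head of the buffer.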

Greedy Inference

- Given an oracle o that correctly predicts the next transition o(c), parsing is deterministic:

  Parse(w1, ..., wn)
    c ← ([ ]S, [0, 1, ..., n]B, { })
    while Bc ≠ [ ]
      t ← o(c)
      c ← t(c)
    return G = ({0, 1, ..., n}, Ac)

- Complexity given by upper bound on number of transitions

- Parsing in O(n) time for the arc-eager transition system
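The Parse loop above translates almost line by line into Python. The sketch below is illustrative only: `chain_oracle` is a hypothetical stand-in for a real oracle (it shifts ROOT and then attaches every token to its left neighbour), arcs are unlabeled, and only the two transitions that this toy oracle uses are defined.

```python
def shift(c):
    stack, buffer, arcs = c
    return (stack + [buffer[0]], buffer[1:], arcs)


def right_arc(c):
    stack, buffer, arcs = c
    return (stack + [buffer[0]], buffer[1:], arcs | {(stack[-1], buffer[0])})


def chain_oracle(c):
    """Hypothetical toy oracle: returns the transition to apply in c."""
    stack, _, _ = c
    return right_arc if stack else shift


def parse(n, oracle):
    c = ([], list(range(n + 1)), set())   # initial configuration
    steps = 0
    while c[1]:                           # while the buffer is non-empty
        t = oracle(c)
        c = t(c)
        steps += 1
    return c[2], steps


arcs, steps = parse(3, chain_oracle)
```

The linear bound follows from the transition system itself: each of the n + 1 tokens is moved from the buffer to the stack at most once (Shift or Right-Arc) and popped from the stack at most once (Reduce or Left-Arc), so no arc-eager derivation exceeds 2(n + 1) transitions.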

From Oracles to Classifiers

- An oracle can be approximated by a (linear) classifier:

  o(c) = argmax_t w · f(c, t)

- History-based feature representation f(c, t)

- Weight vector w learned from treebank data
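A minimal sketch of the classifier o(c) = argmax_t w · f(c, t) with sparse indicator features is shown below. The three feature templates, the hand-set weights, and the helper names are assumptions made for illustration; in practice w would be learned from treebank data.

```python
def features(c, t):
    """f(c, t): sparse indicator features pairing transition t with the stack top and buffer head."""
    stack, buffer = c
    s0 = stack[-1] if stack else "<none>"
    b0 = buffer[0] if buffer else "<none>"
    return {
        f"t={t}|s0={s0}": 1.0,
        f"t={t}|b0={b0}": 1.0,
        f"t={t}|s0={s0}|b0={b0}": 1.0,
    }


def score(w, feats):
    """Dot product w . f for sparse dictionaries."""
    return sum(w.get(name, 0.0) * value for name, value in feats.items())


def predict(w, c, transitions):
    """o(c) = argmax over candidate transitions of w . f(c, t)."""
    return max(transitions, key=lambda t: score(w, features(c, t)))


# Hand-set weights (assumed values, for illustration only)
w = {"t=Left-Arc|s0=news|b0=had": 2.0, "t=Shift|b0=had": 0.5}
c = (["ROOT", "news"], ["had", "little"])
best = predict(w, c, ["Shift", "Reduce", "Left-Arc", "Right-Arc"])
```

With these weights Left-Arc scores 2.0, Shift scores 0.5, and the remaining transitions score 0, so `best` is `"Left-Arc"`.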

Feature Representation

- Features over input tokens relative to S and B

- Features over the (partial) dependency graph defined by A

- Features over the (partial) transition sequence

Configuration Features

[ROOT, had, effect]S [on, financial, markets, .]B

[Figure: partial dependency graph over "ROOT Economic news had little effect on financial markets ." (POS tags: ROOT adj noun verb adj noun prep adj noun .) with the arcs built so far, labeled amod, nsubj, dobj, amod, root]

pos(S2) = ROOT      word(S2) = ROOT
pos(S1) = verb      word(S1) = had
pos(S0) = noun      word(S0) = effect
pos(B0) = prep      word(B0) = on
pos(B1) = adj       word(B1) = financial
pos(B2) = noun      word(B2) = markets

dep(S1) = root
dep(lc(S1)) = nsubj
dep(rc(S1)) = dobj
dep(S0) = dobj
dep(lc(S0)) = amod
dep(rc(S0)) = NIL

t_{i-1} = Right-Arc(dobj)
t_{i-2} = Left-Arc(amod)
t_{i-3} = Shift
t_{i-4} = Right-Arc(root)
t_{i-5} = Left-Arc(nsubj)
t_{i-6} = Shift

- Feature representation unconstrained by parsing algorithm
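The configuration features listed on this slide can be computed with a few lines of Python. The sketch below is an illustration under assumptions: tokens are indices into `WORDS` and `POS`, `lc` and `rc` are taken to mean the leftmost and rightmost dependent, and the function names are introduced here. Transition-history features are omitted.

```python
WORDS = ["ROOT", "Economic", "news", "had", "little", "effect",
         "on", "financial", "markets", "."]
POS = ["ROOT", "adj", "noun", "verb", "adj", "noun", "prep", "adj", "noun", "."]


def dep(i, arcs):
    """Label of the arc entering token i, or NIL if i is None or has no head yet."""
    return next((k for h, d, k in arcs if d == i), "NIL")


def lc(i, arcs):
    """Leftmost dependent of i that precedes i, or None."""
    return min((d for h, d, _ in arcs if h == i and d < i), default=None)


def rc(i, arcs):
    """Rightmost dependent of i that follows i, or None."""
    return max((d for h, d, _ in arcs if h == i and d > i), default=None)


def extract(stack, buffer, arcs):
    feats = {}
    for n, s in enumerate(reversed(stack[-3:])):     # S0, S1, S2
        feats[f"pos(S{n})"] = POS[s]
        feats[f"word(S{n})"] = WORDS[s]
    for n, b in enumerate(buffer[:3]):               # B0, B1, B2
        feats[f"pos(B{n})"] = POS[b]
        feats[f"word(B{n})"] = WORDS[b]
    for n, s in enumerate(reversed(stack[-2:])):     # S0, S1
        feats[f"dep(S{n})"] = dep(s, arcs)
        feats[f"dep(lc(S{n}))"] = dep(lc(s, arcs), arcs)
        feats[f"dep(rc(S{n}))"] = dep(rc(s, arcs), arcs)
    return feats


# The configuration shown on the slide: [ROOT, had, effect]S [on, financial, markets, .]B
arcs = {(2, 1, "amod"), (3, 2, "nsubj"), (0, 3, "root"),
        (5, 4, "amod"), (3, 5, "dobj")}
feats = extract([0, 3, 5], [6, 7, 8, 9], arcs)
```

Because the features are read directly off the configuration, adding a new template does not require any change to the parsing algorithm.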

Local Learning

- Given a treebank:
  - Reconstruct oracle transition sequence for each sentence
  - Construct training data set D = {(c, t) | o(c) = t}
  - Maximize accuracy of local predictions o(c) = t

- Any (unstructured) classifier will do (SVMs are popular)

- Training is local and restricted to oracle configurations

Greedy, Local, Transition-Based Parsing

- Advantages:
  - Highly efficient parsing – linear time complexity with constant-time oracles and transitions
  - Rich history-based feature representations – no rigid constraints from inference algorithm

- Drawback:
  - Sensitive to search errors and error propagation due to greedy inference and local learning

- The major question in transition-based parsing has been how to improve learning and inference, while maintaining high efficiency and rich feature models

Improved Learning and Inference

Beam Search

- Maintain the k best hypotheses [Johansson and Nugues 2006]:

  Parse(w1, ..., wn)
    Beam ← {([ ]S, [0, 1, ..., n]B, { })}
    while ∃c ∈ Beam [Bc ≠ [ ]]
      foreach c ∈ Beam
        foreach t
          Add(t(c), NewBeam)
      Beam ← Top(k, NewBeam)
    return G = ({0, 1, ..., n}, A_Top(1, Beam))

- Note:
  - Score(c_0, ..., c_m) = \sum_{i=1}^{m} w · f(c_{i-1}, t_i)
  - Simple combination of locally normalized classifier scores
  - Marginal gains in accuracy
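The beam-search loop above can be sketched in Python as follows. Several details are assumptions made to keep the sketch runnable: arcs are unlabeled, hypotheses whose buffer is already empty are carried over unchanged, and `gold_score` is a hypothetical stand-in for the learned local score w · f(c, t) that simply rewards arcs from a fixed toy tree.

```python
import heapq


def legal(c):
    """Arc-eager transitions permitted in c = (stack, buffer, arcs)."""
    stack, buffer, arcs = c
    if not buffer:
        return []
    moves = ["SH"]
    if stack:
        top = stack[-1]
        moves.append("RA")
        if any(d == top for _, d in arcs):
            moves.append("RE")      # top already has a head
        elif top != 0:
            moves.append("LA")      # top has no head and is not ROOT
    return moves


def apply(c, t):
    stack, buffer, arcs = c
    if t == "SH":
        return (stack + (buffer[0],), buffer[1:], arcs)
    if t == "RE":
        return (stack[:-1], buffer, arcs)
    if t == "RA":
        return (stack + (buffer[0],), buffer[1:], arcs | {(stack[-1], buffer[0])})
    return (stack[:-1], buffer, arcs | {(buffer[0], stack[-1])})   # "LA"


def beam_parse(n, score, k):
    beam = [(0.0, ((), tuple(range(n + 1)), frozenset()))]
    while any(c[1] for _, c in beam):            # some hypothesis is unfinished
        new_beam = []
        for total, c in beam:
            if not c[1]:                         # finished: carry over unchanged
                new_beam.append((total, c))
                continue
            for t in legal(c):
                new_beam.append((total + score(c, t), apply(c, t)))
        beam = heapq.nlargest(k, new_beam, key=lambda item: item[0])
    return max(beam, key=lambda item: item[0])


GOLD = {(0, 2), (2, 1)}   # toy tree (an assumption): ROOT -> 2, 2 -> 1


def gold_score(c, t):
    """Hypothetical local score: +1 for adding a gold arc, -1 for a wrong arc, 0 otherwise."""
    stack, buffer, _ = c
    if t == "RA":
        arc = (stack[-1], buffer[0])
    elif t == "LA":
        arc = (buffer[0], stack[-1])
    else:
        return 0.0
    return 1.0 if arc in GOLD else -1.0


best_score, best = beam_parse(2, gold_score, k=4)
```

The score of a hypothesis is the sum of its local transition scores, matching the Score formula in the note above. For a fixed k, the loop performs a number of iterations linear in n.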

Structured Prediction

- Parsing as structured prediction [Zhang and Clark 2008]:
  - Minimize loss over entire transition sequence
  - Use beam search to find highest-scoring sequence

- Factored feature representations:

  f(c_0, ..., c_m) = \sum_{i=1}^{m} f(c_{i-1}, t_i)

- Online learning from oracle transition sequences:
  - Structured perceptron [Collins 2002]
  - Early update [Collins and Roark 2004]
  - Max-violation update [Huang et al. 2012]
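The factored representation makes the structured perceptron update easy to state in code: the global feature vector of a sequence is the sum of its local feature vectors, and the update adds the oracle sequence's features and subtracts the predicted sequence's features. The sketch below is illustrative; `local_features` is a hypothetical placeholder, and configurations are abbreviated to strings.

```python
from collections import Counter


def local_features(c, t):
    """Hypothetical f(c, t): a single indicator pairing configuration and transition."""
    return Counter({f"{c}|{t}": 1})


def global_features(sequence):
    """f(c_0, ..., c_m) = sum over i of f(c_{i-1}, t_i), for a list of (c, t) pairs."""
    total = Counter()
    for c, t in sequence:
        total.update(local_features(c, t))
    return total


def perceptron_update(w, gold_seq, pred_seq):
    """w <- w + f(gold sequence) - f(predicted sequence), in place."""
    w.update(global_features(gold_seq))
    w.subtract(global_features(pred_seq))


w = Counter()
gold = [("c0", "SH"), ("c1", "LA"), ("c2", "RA")]
pred = [("c0", "SH"), ("c1", "SH"), ("c2'", "RA")]   # diverges at the second step
perceptron_update(w, gold, pred)
```

Features on the shared prefix cancel, so only the steps from the point of divergence onward change the weights. With early update, both sequences would be truncated at the step where the oracle sequence falls off the beam before this update is applied.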

Beam Size and Training Iterations

[Zhang and Clark 2008]

The Best of Two Worlds?

- Like graph-based dependency parsing (MSTParser):
  - Global learning – minimize loss over entire sentence
  - Non-greedy search – accuracy increases with beam size

- Like (old school) transition-based parsing (MaltParser):
  - Highly efficient – complexity still linear for fixed beam size
  - Rich features – no constraints from parsing algorithm

Precision by Dependency Length

[Figure: precision (y axis, 0.4 to 0.9) by dependency length (x axis, 2 to 14) for MST, Malt, and ZPar]

[Zhang and Nivre 2012]

Even Richer Feature Models

                ZPar     Malt
Baseline        92.18    89.37
+distance       +0.07    –0.14
+valency        +0.24     0.00
+unigrams       +0.40    –0.29
+third-order    +0.18     0.00
+label set      +0.07    +0.06
Extended        93.14    89.00

[Zhang and Nivre 2011, Zhang and Nivre 2012]

- Adding graph-based features may require special techniques [Zhang and Clark 2008, Bohnet and Kuhn 2012]

Dynamic Programming

- If beam search reduces search errors, why not exact inference?

- Dynamic programming for transition-based parsers:
  - Using a graph-structured stack [Huang and Sagae 2010]
  - Using push-computations [Kuhlmann et al. 2011]

- Adds constraints on feature representations

Deduction System for Arc-Eager Parsing

Items: [i^b, j] ⇔ (S, i|B, A) ⇒* (S|i, j|B′, A′)

       b = 1 if h(i) ∈ A′, 0 otherwise

Goal: [0^0, n + 1]

Axiom: [0^0, 1]

Rules:
  Shift:     [i^b, j] ⇒ [j^0, j + 1]
  Reduce:    [i^b, m] ∧ [m^1, j] ⇒ [i^b, j]
  Right-Arc: [i^b, j] ⇒ [j^1, j + 1]
  Left-Arc:  [i^b, m] ∧ [m^0, j] ⇒ [i^b, j]

[Kuhlmann et al. 2011]

Theory and Practice

- Theoretical results:
  - Arc-eager parsing in O(n^3) time (cf. Eisner)
  - Arc-standard parsing in O(n^5) time (cf. CKY)

- In practice:
  - Results hold only for very simplistic feature models
  - Practical implementations use beam search
  - Benefits from ambiguity packing

[Figure 5: Speed comparisons between DP and non-DP, with beam size b ranging from 2 to 16 for DP and from 2 to 64 for non-DP. Speed is measured by avg. parsing time (secs) per sentence on the x axis. With the same level of search quality or parsing accuracy, DP (at b=16) is about 4.8 times faster than non-DP (at b=64) with the full model in plots (a)-(b), or about 8 times faster with the simplified edge-factored model in plot (c). Plot (d) shows the (roughly linear) correlation between parsing accuracy and search quality (avg. model score). Panels: (a) search quality vs. time (full model); (b) parsing accuracy vs. time (full model); (c) search quality vs. time (edge-factored model); (d) correlation between parsing (y) and search (x).]

[Figure 6: DP searches over a forest of exponentially many trees, which also produces better and longer k-best lists with higher oracles, while non-DP only explores b trees allowed in the beam (b = 16 here). Panels: (a) sizes of search spaces (number of trees explored vs. sentence length); (b) oracle precision on dev (vs. k).]

[Huang and Sagae 2010]

The Need for Speed

- Beam search helps but slows down the parser

- Dynamic programming in addition constrains feature model

- What can we do to maintain the highest speed?
  - Easy-first parsing – give up left-to-right incremental search
  - Dynamic oracles – learn how to recover from errors

- These two ideas can be combined

Easy-First Non-Directional Parsing

- Process dependencies from easy to hard (not left to right) and from local to global (bottom up) [Goldberg and Elhadad 2010]

Configuration: (L, A) [L = List, A = Arcs]

Initial: ([0, 1, ..., n], { })
Terminal: ([0], A)

Attach-Right(i, k): ([v_1, ..., v_m], A) ⇒ ([v_1, ..., v_{i-1}, v_{i+1}, ..., v_m], A ∪ {(v_{i+1}, v_i, k)})
Attach-Left(i, k): ([v_1, ..., v_m], A) ⇒ ([v_1, ..., v_i, v_{i+2}, ..., v_m], A ∪ {(v_i, v_{i+1}, k)})

Parsing Algorithm

- Given an oracle o that selects the highest-confidence transition o(c), parsing is deterministic:

  Parse(w1, ..., wn)
    c ← ([0, 1, ..., n], { })
    while length(Lc) > 1
      t ← o(c)
      c ← t(c)
    return G = ({0, 1, ..., n}, Ac)

- Number of possible transitions grows with sentence length

- Parsing in O(n log n) time with priority heap
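A minimal sketch of the easy-first loop is given below. It is deliberately naive: every candidate attachment is rescored at every step, so it does not reach the O(n log n) bound, which relies on a priority heap. Arcs are unlabeled, and `gold_score` is a hypothetical scoring function that prefers gold arcs whose dependent is already complete; a real parser would use a learned model here.

```python
def easy_first_parse(n, score):
    pending = list(range(n + 1))          # the list L, with 0 = ROOT
    arcs = set()
    while len(pending) > 1:
        candidates = []
        for i in range(len(pending) - 1):
            left, right = pending[i], pending[i + 1]
            if left != 0:                 # ROOT is never removed from the list
                candidates.append((score(pending, arcs, right, left), "attach_right", i))
            candidates.append((score(pending, arcs, left, right), "attach_left", i))
        _, action, i = max(candidates, key=lambda item: item[0])
        left, right = pending[i], pending[i + 1]
        if action == "attach_right":      # head is the right neighbour, remove v_i
            arcs.add((right, left))
            del pending[i]
        else:                             # head is the left neighbour, remove v_{i+1}
            arcs.add((left, right))
            del pending[i + 1]
    return arcs


# Toy tree (an assumption) for "ROOT a brown fox jumped with joy", tokens 0..6
GOLD = {(3, 1), (3, 2), (4, 3), (4, 5), (5, 6), (0, 4)}


def gold_score(pending, arcs, head, dep):
    """Hypothetical scorer: 1.0 for a gold arc whose dependent has all its own dependents."""
    complete = all((h, d) in arcs for (h, d) in GOLD if h == dep)
    return 1.0 if (head, dep) in GOLD and complete else -1.0


result = easy_first_parse(6, gold_score)
```

On this input the loop attaches "brown" and "a" to "fox" first, then "fox" to "jumped", then "joy" to "with", and only then the remaining attachments, which illustrates the bottom-up, non-left-to-right order.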

Parsing Example

[Figure 1 from Goldberg and Elhadad 2010: Parsing the sentence "a brown fox jumped with joy". Rounded arcs represent possible actions. The figure shows six successive states of the pending list with the score of each possible action. The actions chosen are (1) ATTACHRIGHT(2), (2) ATTACHRIGHT(1), (3) ATTACHRIGHT(1), (4) ATTACHLEFT(2), (5) ATTACHLEFT(1), ending in (6) a single tree rooted at "jumped".]

[Goldberg and Elhadad 2010]

Oracles Revisited

- How do we train the easy-first parser?

- Recall our training procedure for greedy parsers:
  - Reconstruct oracle transition sequence for each sentence
  - Construct training data set D = {(c, t) | o(c) = t}
  - Maximize accuracy of local predictions o(c) = t

- Presupposes a unique optimal transition for each configuration
- Does not make sense for the easy-first parser
- Turns out to be a bad idea in general

Online Learning with a Conventional Oracle

Learn({T_1, ..., T_N})
  w ← 0.0
  for i in 1..K
    for j in 1..N
      c ← ([ ], [0, 1, ..., n_j], { })
      while Bc ≠ [ ]
        t* ← argmax_t w · f(c, t)
        t_o ← o(c, T_j)
        if t* ≠ t_o
          w ← w + f(c, t_o) − f(c, t*)
        c ← t_o(c)
  return w

- Oracle o(c, T_j) returns the optimal transition for c and T_j
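The Learn procedure can be made runnable with a small amount of scaffolding. The sketch below is an illustration under several assumptions: arcs are unlabeled, each treebank entry is a pair (n, gold arcs), the argmax ranges over legal transitions only, and the feature function uses a single template (transition, stack top, buffer head) over token indices, which is far poorer than a realistic feature model.

```python
from collections import defaultdict


def legal(c):
    stack, buffer, arcs = c
    moves = ["SH"]
    if stack:
        top = stack[-1]
        moves.append("RA")
        if any(d == top for _, d in arcs):
            moves.append("RE")
        elif top != 0:
            moves.append("LA")
    return moves


def apply(c, t):
    stack, buffer, arcs = c
    if t == "SH":
        return (stack + (buffer[0],), buffer[1:], arcs)
    if t == "RE":
        return (stack[:-1], buffer, arcs)
    if t == "RA":
        return (stack + (buffer[0],), buffer[1:], arcs | {(stack[-1], buffer[0])})
    return (stack[:-1], buffer, arcs | {(buffer[0], stack[-1])})   # "LA"


def oracle(c, gold):
    """Conventional (static) arc-eager oracle for a gold set of (head, dependent) arcs."""
    stack, buffer, _ = c
    if stack:
        s, b = stack[-1], buffer[0]
        if (b, s) in gold:
            return "LA"
        if (s, b) in gold:
            return "RA"
        if any((v, b) in gold or (b, v) in gold for v in range(s)):
            return "RE"
    return "SH"


def feats(c, t):
    stack, buffer, _ = c
    s0 = stack[-1] if stack else None
    return [f"{t}|s0={s0}|b0={buffer[0]}"]


def score(w, c, t):
    return sum(w[f] for f in feats(c, t))


def learn(treebank, K):
    w = defaultdict(float)
    for _ in range(K):
        for n, gold in treebank:
            c = ((), tuple(range(n + 1)), frozenset())
            while c[1]:
                t_star = max(legal(c), key=lambda t: score(w, c, t))
                t_o = oracle(c, gold)
                if t_star != t_o:                 # perceptron update on a mistake
                    for f in feats(c, t_o):
                        w[f] += 1.0
                    for f in feats(c, t_star):
                        w[f] -= 1.0
                c = apply(c, t_o)                 # always follow the oracle
    return w


def greedy_parse(n, w):
    c = ((), tuple(range(n + 1)), frozenset())
    while c[1]:
        c = apply(c, max(legal(c), key=lambda t: score(w, c, t)))
    return set(c[2])


# "ROOT Economic news had little effect on financial markets ." as tokens 0..9
GOLD = {(2, 1), (3, 2), (0, 3), (5, 4), (3, 5), (5, 6), (6, 8), (8, 7), (3, 9)}
w = learn([(9, GOLD)], K=2)
parsed = greedy_parse(9, w)
```

Because the configuration is always advanced with the oracle transition, the model only ever sees configurations on the oracle path, which is the limitation that the following slides address.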

Conventional Oracle for Arc-Eager Parsing

o(c, T) =
  Left-Arc     if top(S_c) ← first(B_c) in T
  Right-Arc    if top(S_c) → first(B_c) in T
  Reduce       if ∃v < top(S_c) : v ↔ first(B_c) in T
  Shift        otherwise

- Correct:
  - Derives T in a configuration sequence C_{o,T} = c_0, ..., c_m

- Problems:
  - Deterministic: Ignores other derivations of T
  - Incomplete: Valid only for configurations in C_{o,T}
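The four-case definition above maps directly onto a short Python function. The sketch below is illustrative: arcs are unlabeled (head, dependent) pairs, the gold tree is the one from the earlier "Economic news" example with tokens numbered 0 to 9, and the derivation starts from the formal initial configuration with an empty stack, so the first Shift moves ROOT onto the stack.

```python
def apply(c, t):
    stack, buffer, arcs = c
    if t == "SH":
        return (stack + (buffer[0],), buffer[1:], arcs)
    if t == "RE":
        return (stack[:-1], buffer, arcs)
    if t == "RA":
        return (stack + (buffer[0],), buffer[1:], arcs | {(stack[-1], buffer[0])})
    return (stack[:-1], buffer, arcs | {(buffer[0], stack[-1])})   # "LA"


def static_oracle(c, gold):
    """o(c, T): the single transition prescribed for configuration c and gold tree T."""
    stack, buffer, _ = c
    if stack:
        s, b = stack[-1], buffer[0]
        if (b, s) in gold:                 # top(S) <- first(B)
            return "LA"
        if (s, b) in gold:                 # top(S) -> first(B)
            return "RA"
        if any((v, b) in gold or (b, v) in gold for v in range(s)):
            return "RE"                    # some v < top(S) is linked to first(B)
    return "SH"


def oracle_sequence(n, gold):
    c = ((), tuple(range(n + 1)), frozenset())
    seq = []
    while c[1]:
        t = static_oracle(c, gold)
        seq.append(t)
        c = apply(c, t)
    return seq, c


# "ROOT Economic news had little effect on financial markets ." as tokens 0..9
GOLD = {(2, 1), (3, 2), (0, 3), (5, 4), (3, 5), (5, 6), (6, 8), (8, 7), (3, 9)}
seq, final = oracle_sequence(9, GOLD)
```

The function returns exactly one transition per configuration and is only meaningful for configurations from which the gold tree is still reachable, which corresponds to the two problems listed above.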

Oracle Parse

Transitions (applied so far)          Stack                   Buffer
(none)                                [ ]                     [ROOT, He, sent, her, a, letter, .]
SH                                    [ROOT]                  [He, sent, her, a, letter, .]
SH-RA                                 [ROOT, He]              [sent, her, a, letter, .]
SH-RA-LA                              [ROOT]                  [sent, her, a, letter, .]
SH-RA-LA-SH                           [ROOT, sent]            [her, a, letter, .]
SH-RA-LA-SH-RA                        [ROOT, sent, her]       [a, letter, .]
SH-RA-LA-SH-RA-SH                     [ROOT, sent, her, a]    [letter, .]
SH-RA-LA-SH-RA-SH-LA                  [ROOT, sent, her]       [letter, .]
SH-RA-LA-SH-RA-SH-LA-RE               [ROOT, sent]            [letter, .]
SH-RA-LA-SH-RA-SH-LA-RE-RA            [ROOT, sent, letter]    [.]
SH-RA-LA-SH-RA-SH-LA-RE-RA-RE         [ROOT, sent]            [.]
SH-RA-LA-SH-RA-SH-LA-RE-RA-RE-RA      [ROOT, sent, .]         [ ]

Arcs (all six are in A once the sequence is complete):
ROOT -root-> sent
He <-sbj- sent
sent -iobj-> her
a <-det- letter
sent -dobj-> letter
sent -p-> .

[Figure: dependency tree for "ROOT He sent her a letter ." (POS tags: ROOT pron verb pron det noun .) with arc labels root, nsubj, iobj, det, dobj, p]

Non-Determinism

Transitions:
SH-RA-LA-SH-RA-SH-LA-RE-RA-RE-RA    (oracle sequence)
SH-RA-LA-SH-RA-RE-SH-LA-RA-RE-RA    (alternative sequence, built up step by step below)

Alternative sequence (applied so far)    Stack                   Buffer
SH-RA-LA-SH-RA                           [ROOT, sent, her]       [a, letter, .]
SH-RA-LA-SH-RA-RE                        [ROOT, sent]            [a, letter, .]
SH-RA-LA-SH-RA-RE-SH                     [ROOT, sent, a]         [letter, .]
SH-RA-LA-SH-RA-RE-SH-LA                  [ROOT, sent]            [letter, .]
SH-RA-LA-SH-RA-RE-SH-LA-RA               [ROOT, sent, letter]    [.]
SH-RA-LA-SH-RA-RE-SH-LA-RA-RE            [ROOT, sent]            [.]
SH-RA-LA-SH-RA-RE-SH-LA-RA-RE-RA         [ROOT, sent, .]         [ ]

Arcs (all six are in A once the sequence is complete):
ROOT -root-> sent
He <-sbj- sent
sent -iobj-> her
a <-det- letter
sent -dobj-> letter
sent -p-> .

[Figure: dependency tree for "ROOT He sent her a letter ." (POS tags: ROOT pron verb pron det noun .) with arc labels root, nsubj, iobj, det, dobj, p]

Recent Advances in Dependency Parsing 31(54)

Improved Learning and Inference

Non-Optimality

Oracle transitions: SH-RA-LA-SH-RA-SH-LA-RE-RA-RE-RA

After the prefix SH-RA-LA-SH, the parser shifts "her" instead of attaching it to "sent". Two continuations are shown; [k/6] is the number of the 6 gold arcs recovered.

Continuation 1:

Transitions                          Stack                          Buffer
SH-RA-LA-SH                          [ROOT, sent]                   [her, a, letter, .]
SH-RA-LA-SH-SH                       [ROOT, sent, her]              [a, letter, .]
SH-RA-LA-SH-SH-SH                    [ROOT, sent, her, a]           [letter, .]
SH-RA-LA-SH-SH-SH-LA                 [ROOT, sent, her]              [letter, .]
SH-RA-LA-SH-SH-SH-LA-SH              [ROOT, sent, her, letter]      [.]
SH-RA-LA-SH-SH-SH-LA-SH-SH [3/6]     [ROOT, sent, her, letter, .]   [ ]

Continuation 2:

Transitions                             Stack                  Buffer
SH-RA-LA-SH-SH-SH-LA                    [ROOT, sent, her]      [letter, .]
SH-RA-LA-SH-SH-SH-LA-LA                 [ROOT, sent]           [letter, .]
SH-RA-LA-SH-SH-SH-LA-LA-RA              [ROOT, sent, letter]   [.]
SH-RA-LA-SH-SH-SH-LA-LA-RA-RE           [ROOT, sent]           [.]
SH-RA-LA-SH-SH-SH-LA-LA-RA-RE-RA [5/6]  [ROOT, sent, .]        [ ]

Arcs:
ROOT -root-> sent
He <-sbj- sent
a <-det- letter
her <-?- letter
sent -dobj-> letter
sent -p-> .

ROOT  He    sent  her   a    letter  .
ROOT  pron  verb  pron  det  noun    .
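
The [3/6] and [5/6] scores can be checked with a small replay. This is a minimal sketch, not the tutorial's code: it assumes unlabeled arc-eager transitions, integer word positions (ROOT = 0, He = 1, sent = 2, her = 3, a = 4, letter = 5, . = 6), and a start at the configuration with stack [ROOT, sent] and buffer [her, a, letter, .]. The names `apply_transition`, `replay` and `correct` are illustrative, and preconditions are not checked.

```python
GOLD = {(0, 2), (2, 1), (2, 3), (5, 4), (2, 5), (2, 6)}  # (head, dependent)

def apply_transition(t, stack, buffer, arcs):
    """Apply one unlabeled arc-eager transition in place."""
    if t == "SH":
        stack.append(buffer.pop(0))
    elif t == "RE":
        stack.pop()
    elif t == "RA":
        arcs.add((stack[-1], buffer[0]))
        stack.append(buffer.pop(0))
    elif t == "LA":
        arcs.add((buffer[0], stack.pop()))
    else:
        raise ValueError(t)

def replay(transitions):
    # Start after ROOT -> sent and He <- sent have already been built.
    stack, buffer, arcs = [0, 2], [3, 4, 5, 6], {(0, 2), (2, 1)}
    for t in transitions.split("-"):
        apply_transition(t, stack, buffer, arcs)
    return arcs

def correct(arcs):
    """Number of gold arcs present in the derived arc set."""
    return len(arcs & GOLD)

print(correct(replay("SH-SH-LA-SH-SH")), "/", len(GOLD))        # 3 / 6
print(correct(replay("SH-SH-LA-LA-RA-RE-RA")), "/", len(GOLD))  # 5 / 6
print(correct(replay("RA-SH-LA-RE-RA-RE-RA")), "/", len(GOLD))  # 6 / 6
```

The third call replays the oracle continuation from the same configuration. Once "her" has been shifted, the best any continuation can reach is 5 of the 6 gold arcs.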

Recent Advances in Dependency Parsing 32(54)

Improved Learning and Inference

Dynamic Oracles

• Optimality:
    • A transition is optimal if the best tree remains reachable
    • Best tree = argmin_{T′} L(T, T′)

• Oracle:
    • Boolean function o(c, t, T) = true if t is optimal for c and T
    • Non-deterministic: more than one transition can be optimal
    • Complete: correct for all configurations

• New problem:
    • How do we know which trees are reachable?

Recent Advances in Dependency Parsing 33(54)

Improved Learning and Inference

Reachability for Arcs and Trees

• Arc reachability:
    • An arc w_i → w_j is reachable in c iff w_i → w_j ∈ A_c, or w_i ∈ S_c ∪ B_c and w_j ∈ B_c (same for w_i ← w_j)

• Tree reachability:
    • A (projective) tree T is reachable in c iff every arc in T is reachable in c

• Arc-decomposable systems [Goldberg and Nivre 2013]:
    • Tree reachability reduces to arc reachability
    • Holds for some transition systems but not all
        • Arc-eager and easy-first are arc-decomposable
        • Arc-standard is not decomposable
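
The reachability conditions above translate almost directly into code. This is a minimal sketch under stated assumptions: arcs are unlabeled (head, dependent) pairs over integer positions (ROOT = 0, He = 1, sent = 2, her = 3, a = 4, letter = 5, . = 6), and `arc_reachable` and `tree_reachable` are illustrative names. It implements the condition exactly as worded on the slide and does not check that a word keeps a single head.

```python
def arc_reachable(arc, stack, buffer, arcs):
    """Slide condition: the arc is already in A_c, or its left word is in
    S_c or B_c and its right word is still in B_c (either arc direction)."""
    if arc in arcs:
        return True
    left, right = min(arc), max(arc)
    return (left in stack or left in buffer) and right in buffer

def tree_reachable(tree, stack, buffer, arcs):
    """Arc-decomposable reading: every arc of the tree must be reachable."""
    return all(arc_reachable(a, stack, buffer, arcs) for a in tree)

GOLD = {(0, 2), (2, 1), (2, 3), (5, 4), (2, 5), (2, 6)}
built = {(0, 2), (2, 1)}

# Stack [ROOT, sent], buffer [her, a, letter, .]: the gold tree is reachable.
print(tree_reachable(GOLD, [0, 2], [3, 4, 5, 6], built))   # True
# After shifting "her", the arc sent -> her can no longer be built.
print(tree_reachable(GOLD, [0, 2, 3], [4, 5, 6], built))   # False
```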

Recent Advances in Dependency Parsing 34(54)

Improved Learning and Inference

Oracles for Arc-Decomposable Systems

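\[
o(c, t, T) =
\begin{cases}
\text{true} & \text{if } [R(c) - R(t(c))] \cap T = \emptyset \\
\text{false} & \text{otherwise}
\end{cases}
\]

where $R(c) \equiv \{a \mid a \text{ is an arc reachable in } c\}$

Arc-Eager

\[
o(c, \mathrm{LA}, T) =
\begin{cases}
\text{false} & \text{if } \exists w \in B_c : s \leftrightarrow w \in T \ (\text{except } s \leftarrow b) \\
\text{true} & \text{otherwise}
\end{cases}
\]

\[
o(c, \mathrm{RA}, T) =
\begin{cases}
\text{false} & \text{if } \exists w \in S_c : w \leftrightarrow b \in T \ (\text{except } s \rightarrow b) \\
\text{true} & \text{otherwise}
\end{cases}
\]

\[
o(c, \mathrm{RE}, T) =
\begin{cases}
\text{false} & \text{if } \exists w \in B_c : s \rightarrow w \in T \\
\text{true} & \text{otherwise}
\end{cases}
\]

\[
o(c, \mathrm{SH}, T) =
\begin{cases}
\text{false} & \text{if } \exists w \in S_c : w \leftrightarrow b \in T \\
\text{true} & \text{otherwise}
\end{cases}
\]

Notation: s = node on top of the stack S
          b = first node in the buffer B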
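
The general definition, [R(c) − R(t(c))] ∩ T = ∅, can be evaluated directly by comparing the reachable gold arcs before and after a transition. This is a minimal sketch under stated assumptions: unlabeled (head, dependent) arcs over integer positions, arc-eager transitions with their usual preconditions, and a reachability test that adds one check not spelled out on the slides (a word that already has a head cannot receive another). All function names are illustrative.

```python
def legal(t, stack, buffer, arcs):
    headed = {d for _, d in arcs}
    if t == "SH":
        return bool(buffer)
    if t == "RE":
        return bool(stack) and stack[-1] in headed
    if t == "RA":
        return bool(stack) and bool(buffer)
    # LA: the stack top must not be ROOT (0) and must not have a head yet
    return bool(stack) and bool(buffer) and stack[-1] != 0 and stack[-1] not in headed

def apply_transition(t, stack, buffer, arcs):
    """Return the successor configuration without modifying the input."""
    if t == "SH":
        return stack + buffer[:1], buffer[1:], arcs
    if t == "RE":
        return stack[:-1], buffer, arcs
    if t == "RA":
        return stack + buffer[:1], buffer[1:], arcs | {(stack[-1], buffer[0])}
    if t == "LA":
        return stack[:-1], buffer, arcs | {(buffer[0], stack[-1])}
    raise ValueError(t)

def reachable(arc, stack, buffer, arcs):
    if arc in arcs:
        return True
    if any(d == arc[1] for _, d in arcs):   # dependent already has another head
        return False
    left, right = min(arc), max(arc)
    return (left in stack or left in buffer) and right in buffer

def oracle(t, stack, buffer, arcs, gold):
    """True if no gold arc reachable in c becomes unreachable in t(c)."""
    before = {a for a in gold if reachable(a, stack, buffer, arcs)}
    after = apply_transition(t, stack, buffer, arcs)
    return all(reachable(a, *after) for a in before)

def optimal_transitions(stack, buffer, arcs, gold):
    return {t for t in ("SH", "RE", "RA", "LA")
            if legal(t, stack, buffer, arcs) and oracle(t, stack, buffer, arcs, gold)}

GOLD = {(0, 2), (2, 1), (2, 3), (5, 4), (2, 5), (2, 6)}  # He sent her a letter .
# Stack [ROOT, sent, her], buffer [a, letter, .]: Shift and Reduce are both optimal.
print(optimal_transitions([0, 2, 3], [4, 5, 6], {(0, 2), (2, 1), (2, 3)}, GOLD))
# Stack [ROOT, sent], buffer [her, a, letter, .]: only Right-Arc keeps sent -> her.
print(optimal_transitions([0, 2], [3, 4, 5, 6], {(0, 2), (2, 1)}, GOLD))
```

The first configuration is the point where the two sequences on the Non-Determinism slide diverge. The second is the point where shifting "her" costs an arc on the Non-Optimality slide.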

Recent Advances in Dependency Parsing 35(54)

Improved Learning and Inference

Online Learning with a Dynamic Oracle

Learn({T_1, ..., T_N})
    w ← 0.0
    for i in 1..K
        for j in 1..N
            c ← ([ ]_S, [w_1, ..., w_{n_j}]_B, { })
            while B_c ≠ [ ]
                t* ← argmax_t w · f(c, t)
                t_o ← argmax_{t ∈ {t | o(c, t, T_j)}} w · f(c, t)
                if t* ≠ t_o
                    w ← w + f(c, t_o) − f(c, t*)
                c ← choice(t_o(c), t*(c))
    return w

• Ambiguity: use model score to break ties
• Exploration: follow model prediction even if not optimal
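
The training loop above can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' implementation: unlabeled arc-eager transitions, a toy feature function over the POS tags of the stack top and buffer front, and a cost function written from the cost-based formulation of the arc-eager dynamic oracle as usually described for Goldberg and Nivre (2012). Taking the minimum-cost transitions, rather than strictly zero-cost ones, is a safeguard added for this sketch. All names are hypothetical.

```python
import random

TRANSITIONS = ("SH", "RE", "RA", "LA")

def legal(t, stack, buffer, arcs):
    headed = {d for _, d in arcs}
    if t == "SH":
        return bool(buffer)
    if t == "RE":
        return bool(stack) and stack[-1] in headed
    if t == "RA":
        return bool(stack) and bool(buffer)
    return bool(stack) and bool(buffer) and stack[-1] != 0 and stack[-1] not in headed

def cost(t, stack, buffer, arcs, gold):
    """Gold arcs (head, dependent) that become unreachable when t is applied."""
    headed = {d for _, d in arcs}
    free = [k for k in stack if k not in headed]  # stack words still lacking a head
    b = buffer[0]
    if t == "SH":
        return sum((k, b) in gold for k in stack) + sum((b, k) in gold for k in free)
    s = stack[-1]
    if t == "RE":
        return sum((s, k) in gold for k in buffer)
    if t == "LA":
        return sum((s, k) in gold or ((k, s) in gold and k != b) for k in buffer)
    return (sum((k, b) in gold for k in stack + buffer if k != s)
            + sum((b, k) in gold for k in free))

def apply_transition(t, stack, buffer, arcs):
    if t == "SH":
        stack.append(buffer.pop(0))
    elif t == "RE":
        stack.pop()
    elif t == "RA":
        arcs.add((stack[-1], buffer[0]))
        stack.append(buffer.pop(0))
    else:  # LA
        arcs.add((buffer[0], stack.pop()))

def features(t, stack, buffer, tags):
    s = tags[stack[-1]] if stack else "NONE"
    b = tags[buffer[0]] if buffer else "NONE"
    return [(t, "s", s), (t, "b", b), (t, "sb", s, b)]

def score(w, feats):
    return sum(w.get(f, 0.0) for f in feats)

def learn(data, epochs=10, seed=0):
    rng, w = random.Random(seed), {}
    for _ in range(epochs):
        for tags, gold in data:
            stack, buffer, arcs = [], list(range(len(tags))), set()
            while buffer:
                options = [t for t in TRANSITIONS if legal(t, stack, buffer, arcs)]
                costs = {t: cost(t, stack, buffer, arcs, gold) for t in options}
                optimal = [t for t in options if costs[t] == min(costs.values())]
                feats = {t: features(t, stack, buffer, tags) for t in options}
                pred = max(options, key=lambda t: score(w, feats[t]))   # t*
                best = max(optimal, key=lambda t: score(w, feats[t]))   # t_o
                if pred != best:
                    for f in feats[best]:
                        w[f] = w.get(f, 0.0) + 1.0
                    for f in feats[pred]:
                        w[f] = w.get(f, 0.0) - 1.0
                # Exploration: sometimes follow the model even when it is wrong.
                apply_transition(rng.choice([best, pred]), stack, buffer, arcs)
    return w

def parse(w, tags):
    stack, buffer, arcs = [], list(range(len(tags))), set()
    while buffer:
        options = [t for t in TRANSITIONS if legal(t, stack, buffer, arcs)]
        t = max(options, key=lambda t: score(w, features(t, stack, buffer, tags)))
        apply_transition(t, stack, buffer, arcs)
    return arcs
```

A call such as `learn([(tags, gold)])` with `tags = ["ROOT", "pron", "verb", "pron", "det", "noun", "."]` and the gold arcs of the example sentence returns a weight dictionary that `parse(w, tags)` can use for greedy parsing.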

Recent Advances in Dependency Parsing 36(54)

Improved Learning and Inference

[Goldberg and Nivre 2012]

Recent Advances in Dependency Parsing 37(54)

Improved Learning and Inference

Ambiguity and Exploration

• Lessons from dynamic oracles:
    • Do not hide spurious ambiguity from the parser; exploit it
    • Let the parser explore the consequences of its own mistakes

• Related work:
    • Bootstrapping [Choi and Palmer 2011]
    • Selectional branching [Choi and McCallum 2013]
    • Non-monotonic parsing [Honnibal et al. 2013]
    • Dynamic parsing strategy [Sartorio et al. 2013]

Recent Advances in Dependency Parsing 38(54)

Improved Learning and Inference

Summary: Learning and Inference

• Beam search and structured prediction:
    • Explores a larger search space at training and parsing time
    • Can be combined with dynamic programming

• Dynamic oracles:
    • Explores a larger search space only at training time
    • Can be combined with selectional branching and with flexible transition systems (easy-first, dynamic, non-monotonic)

Recent Advances in Dependency Parsing 39(54)

Non-Projective Parsing

Non-Projective Parsing

• So far only projective parsing models

• Non-projective parsing harder even with greedy inference
    • Non-projective: n(n − 1) arcs to consider, O(n^2)
    • Projective: at most 2(n − 1) arcs to consider, O(n)

• Also harder to construct dynamic oracles
    • Conjecture: arc-decomposability presupposes projectivity

Recent Advances in Dependency Parsing 40(54)

Non-Projective Parsing

Previous Approaches

• Pseudo-projective parsing [Nivre and Nilsson 2005]
    • Preprocess training data, post-process parser output
    • Approximate encoding with incomplete coverage
    • Relatively high precision but low recall

• Extended arc transitions [Attardi 2006]
    • Transitions that add arcs between non-adjacent subtrees
    • Upper bound on arc degree (limited to local relations)
    • Exact dynamic programming algorithm [Cohen et al. 2011]

• List-based algorithms [Covington 2001, Nivre 2007]
    • Consider all word pairs instead of adjacent subtrees
    • Increases parsing complexity (and training time)
    • Improved accuracy and efficiency by adding “projective transitions” [Choi and Palmer 2011]

Recent Advances in Dependency Parsing 41(54)

Non-Projective Parsing

Novel Approaches

• Online reordering [Nivre 2009, Nivre et al. 2009]:
    • Reorder words during parsing to make tree projective
    • Add a special transition for swapping adjacent words
    • Quadratic time in the worst case but linear in the best case

• Multiplanar parsing [Gómez-Rodríguez and Nivre 2010]:
    • Factor dependency trees into k planes without crossing arcs
    • Use k stacks to parse each plane separately
    • Linear time parsing with constant k

Recent Advances in Dependency Parsing 42(54)

Non-Projective Parsing

Projectivity and Word Order

• Projectivity is a property of a dependency tree only in relation to a particular word order
    • Words can always be reordered to make the tree projective
    • Given a dependency tree T = (V, A, <), let the projective order <p be the order defined by an inorder traversal of T with respect to < [Veselá et al. 2004]

Word:             ROOT  A    hearing  is    scheduled  on    the  issue  today  .
POS:              ROOT  det  noun     verb  verb       prep  det  noun   adv    .
Position in <p:   0     1    2        6     7          3     4    5      8      9

[Figure: dependency tree with arc labels root, det, aux, nsubj, prep, pobj, det, tmod, p]
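
The projective order can be computed with a short recursive traversal. This is a minimal sketch, assuming unlabeled (head, dependent) arcs over integer positions (ROOT = 0, A = 1, ..., . = 9). The arc set is a reading of the example tree consistent with the transition sequences shown later in the tutorial, and `projective_order` is an illustrative name.

```python
def projective_order(n, arcs):
    """Inorder traversal: left dependents, then the head, then right dependents."""
    children = {i: [] for i in range(n)}
    for head, dep in arcs:
        children[head].append(dep)
    order = []

    def visit(node):
        for c in sorted(c for c in children[node] if c < node):
            visit(c)
        order.append(node)
        for c in sorted(c for c in children[node] if c > node):
            visit(c)

    visit(0)  # start at ROOT
    return order

WORDS = ["ROOT", "A", "hearing", "is", "scheduled", "on", "the", "issue", "today", "."]
ARCS = {(0, 4), (2, 1), (4, 3), (4, 2), (2, 5), (5, 7), (7, 6), (4, 8), (4, 9)}

order = projective_order(len(WORDS), ARCS)
print([WORDS[i] for i in order])
# ['ROOT', 'A', 'hearing', 'on', 'the', 'issue', 'is', 'scheduled', 'today', '.']
print([order.index(i) for i in range(len(WORDS))])
# [0, 1, 2, 6, 7, 3, 4, 5, 8, 9]
```

The second printed list gives each word's position in <p and matches the row of numbers under the example sentence.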

Recent Advances in Dependency Parsing 43(54)

Non-Projective Parsing

Transition System for Online Reordering

Configuration: (S, B, A)   [S = Stack, B = Buffer, A = Arcs]

Initial:   ([ ], [0, 1, ..., n], { })
Terminal:  ([0], [ ], A)

Shift:         (S, i|B, A) ⇒ (S|i, B, A)
Right-Arc(k):  (S|i|j, B, A) ⇒ (S|i, B, A ∪ {(i, j, k)})
Left-Arc(k):   (S|i|j, B, A) ⇒ (S|j, B, A ∪ {(j, i, k)})    if i ≠ 0
Swap:          (S|i|j, B, A) ⇒ (S|j, i|B, A)                 if 0 < i < j

• Transition-based parsing with two interleaved processes:
    1. Sort words into projective order <p
    2. Build tree T by connecting adjacent subtrees

• T is projective with respect to <p but not (necessarily) <
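
The transition system above is small enough to simulate directly. This is a minimal sketch, assuming integer word positions (ROOT = 0, A = 1, ..., . = 9) and labeled arcs stored as (head, dependent, label) triples. The transition sequence is one derivation of the tree for "A hearing is scheduled on the issue today ." written for this sketch, and the names `step`, `run` and `SEQUENCE` are illustrative.

```python
def step(config, transition, label=None):
    """One transition of the reordering system; config = (stack, buffer, arcs)."""
    stack, buffer, arcs = config
    if transition == "SH":
        assert buffer, "Shift needs a non-empty buffer"
        return stack + buffer[:1], buffer[1:], arcs
    i, j = stack[-2], stack[-1]
    if transition == "RA":
        return stack[:-1], buffer, arcs | {(i, j, label)}
    if transition == "LA":
        assert i != 0, "ROOT cannot become a dependent"
        return stack[:-2] + [j], buffer, arcs | {(j, i, label)}
    if transition == "SW":
        assert 0 < i < j, "Swap applies only to words still in their original order"
        return stack[:-2] + [j], [i] + buffer, arcs
    raise ValueError(transition)

def run(n, sequence):
    config = ([], list(range(n)), frozenset())
    for transition, label in sequence:
        config = step(config, transition, label)
    return config

SEQUENCE = [
    ("SH", None), ("SH", None), ("SH", None), ("LA", "det"),
    ("SH", None), ("SH", None), ("LA", "aux"),
    ("SH", None), ("SH", None), ("SH", None), ("LA", "det"), ("RA", "pobj"),
    ("SW", None),                      # move "scheduled" back to the buffer
    ("RA", "prep"), ("SH", None), ("LA", "nsubj"),
    ("SH", None), ("RA", "tmod"), ("SH", None), ("RA", "p"), ("RA", "root"),
]

stack, buffer, arcs = run(10, SEQUENCE)
print(stack, buffer)   # [0] []
print(sorted(arcs))
```

The single Swap is what lets "on the issue" attach to "hearing" even though "is scheduled" intervenes in the surface order.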

Recent Advances in Dependency Parsing 44(54)

Non-Projective Parsing

Example Transition Sequence

Step  Transition        Stack                                        Buffer                                                        Arc added
0     (initial)         [ ]                                          [ROOT, A, hearing, is, scheduled, on, the, issue, today, .]
1     Shift             [ROOT]                                       [A, hearing, is, scheduled, on, the, issue, today, .]
2     Shift             [ROOT, A]                                    [hearing, is, scheduled, on, the, issue, today, .]
3     Shift             [ROOT, A, hearing]                           [is, scheduled, on, the, issue, today, .]
4     Left-Arc(det)     [ROOT, hearing]                              [is, scheduled, on, the, issue, today, .]                     A <-det- hearing
5     Shift             [ROOT, hearing, is]                          [scheduled, on, the, issue, today, .]
6     Shift             [ROOT, hearing, is, scheduled]               [on, the, issue, today, .]
7     Left-Arc(aux)     [ROOT, hearing, scheduled]                   [on, the, issue, today, .]                                    is <-aux- scheduled
8     Shift             [ROOT, hearing, scheduled, on]               [the, issue, today, .]
9     Shift             [ROOT, hearing, scheduled, on, the]          [issue, today, .]
10    Shift             [ROOT, hearing, scheduled, on, the, issue]   [today, .]
11    Left-Arc(det)     [ROOT, hearing, scheduled, on, issue]        [today, .]                                                    the <-det- issue
12    Right-Arc(pobj)   [ROOT, hearing, scheduled, on]               [today, .]                                                    on -pobj-> issue
13    Swap              [ROOT, hearing, on]                          [scheduled, today, .]
14    Right-Arc(prep)   [ROOT, hearing]                              [scheduled, today, .]                                         hearing -prep-> on
15    Shift             [ROOT, hearing, scheduled]                   [today, .]
16    Left-Arc(nsubj)   [ROOT, scheduled]                            [today, .]                                                    hearing <-nsubj- scheduled
17    Shift             [ROOT, scheduled, today]                     [.]
18    Right-Arc(tmod)   [ROOT, scheduled]                            [.]                                                           scheduled -tmod-> today
19    Shift             [ROOT, scheduled, .]                         [ ]
20    Right-Arc(p)      [ROOT, scheduled]                            [ ]                                                           scheduled -p-> .
21    Right-Arc(root)   [ROOT]                                       [ ]                                                           ROOT -root-> scheduled

ROOT  A    hearing  is    scheduled  on    the  issue  today  .
ROOT  det  noun     verb  verb       prep  det  noun   adv    .

Recent Advances in Dependency Parsing 45(54)

Non-Projective Parsing

Analysis

• Correctness:
    • Sound and complete for the class of non-projective trees

• Complexity for greedy or beam search parsing:
    • Quadratic running time in the worst case
    • Linear running time in the average case

• Works well with beam search and structured prediction

              Czech           German
              LAS    UAS      LAS    UAS
Projective    80.8   86.3     86.2   88.5
Reordering    83.9   89.1     88.7   90.9

[Bohnet and Nivre 2012]

Recent Advances in Dependency Parsing 46(54)

Non-Projective Parsing

Multiplanarity

• Multiplanarity is based on the notion of planarity:
    • A dependency graph is planar if it has no crossing arcs
    • A dependency graph is k-planar if it can be decomposed into (at most) k planar graphs [Yli-Jyrä 2003]

• In most treebanks, well over 99% of the trees are at most 2-planar [Gómez-Rodríguez and Nivre 2010]

• We can parse k-planar graphs in linear time using k stacks
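
Planarity and 2-planarity can be tested from the crossing relation alone: a graph is 2-planar exactly when the graph whose nodes are arcs and whose edges are crossings can be 2-coloured. This is a minimal sketch under stated assumptions: arcs are unlabeled (head, dependent) pairs over integer positions, two arcs cross when their spans strictly interleave, and `crosses`, `is_planar` and `two_planes` are illustrative names. It is not the plane-assignment procedure of any particular parser.

```python
from collections import deque

def crosses(a, b):
    """True if the spans of arcs a and b strictly interleave."""
    (l1, r1), (l2, r2) = sorted(a), sorted(b)
    return l1 < l2 < r1 < r2 or l2 < l1 < r2 < r1

def is_planar(arcs):
    arcs = list(arcs)
    return not any(crosses(a, b) for i, a in enumerate(arcs) for b in arcs[i + 1:])

def two_planes(arcs):
    """Split arcs into two crossing-free planes, or return None if impossible."""
    arcs = list(arcs)
    plane = {}
    for start in arcs:
        if start in plane:
            continue
        plane[start] = 0
        queue = deque([start])
        while queue:                      # breadth-first 2-colouring
            a = queue.popleft()
            for b in arcs:
                if crosses(a, b):
                    if b not in plane:
                        plane[b] = 1 - plane[a]
                        queue.append(b)
                    elif plane[b] == plane[a]:
                        return None
    return ([a for a in arcs if plane[a] == 0],
            [a for a in arcs if plane[a] == 1])

# "A hearing is scheduled on the issue today ." with ROOT = 0
ARCS = {(0, 4), (2, 1), (4, 3), (4, 2), (2, 5), (5, 7), (7, 6), (4, 8), (4, 9)}
print(is_planar(ARCS))                            # False: hearing -> on crosses other arcs
print(two_planes(ARCS) is not None)               # True: the tree is 2-planar
print(two_planes([(0, 3), (1, 4), (2, 5)]))       # None: three mutually crossing arcs
```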

Recent Advances in Dependency Parsing 47(54)

Non-Projective Parsing

1-Planar Transition System

Configuration: (S, B, A)   [S = Stack, B = Buffer, A = Arcs]

Initial:   ([ ], [0, 1, ..., n], { })
Terminal:  (S, [ ], A)

Shift:         (S, i|B, A) ⇒ (S|i, B, A)
Reduce:        (S|i, B, A) ⇒ (S, B, A)
Right-Arc(k):  (S|i, j|B, A) ⇒ (S|i, j|B, A ∪ {(i, j, k)})    if ¬h(j, A)
Left-Arc(k):   (S|i, j|B, A) ⇒ (S|i, j|B, A ∪ {(j, i, k)})    if ¬h(i, A) ∧ i ≠ 0

• Similar to the arc-eager system except:
    • Reduce does not require popped node to have a head
    • Left-Arc/Right-Arc do not affect S or B

Recent Advances in Dependency Parsing 48(54)

Non-Projective Parsing

2-Planar Transition System

Configuration: (S1, S2, B, A)   [S1 = Stack 1, S2 = Stack 2]

Initial:   ([ ], [ ], [0, 1, ..., n], { })
Terminal:  (S1, S2, [ ], A)

Shift:         (S1, S2, i|B, A) ⇒ (S1|i, S2|i, B, A)
Reduce:        (S1|i, S2, B, A) ⇒ (S1, S2, B, A)
Right-Arc(k):  (S1|i, S2, j|B, A) ⇒ (S1|i, S2, j|B, A ∪ {(i, j, k)})    if ¬h(j, A)
Left-Arc(k):   (S1|i, S2, j|B, A) ⇒ (S1|i, S2, j|B, A ∪ {(j, i, k)})    if ¬h(i, A) ∧ i ≠ 0
Switch:        (S1, S2, B, A) ⇒ (S2, S1, B, A)

• Similar to 1-planar system except:
    • Shift pushes a node to both stacks
    • Left-Arc/Right-Arc/Reduce only affect S1
    • Switch swaps S1 and S2
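
The 2-planar transitions can be simulated in the same style. This is a minimal sketch, assuming integer word positions (ROOT = 0, A = 1, hearing = 2, is = 3, scheduled = 4, ...) and arcs stored as (head, dependent, label) triples. The names `step`, `has_head` and `PREFIX`, and the abbreviation "SW" for Switch, are illustrative. The replayed prefix covers the first eleven transitions for "A hearing is scheduled on the issue today .".

```python
def has_head(node, arcs):
    return any(dep == node for _, dep, _ in arcs)

def step(config, transition, label=None):
    """One transition of the 2-planar system; config = (s1, s2, buffer, arcs)."""
    s1, s2, buffer, arcs = config
    if transition == "SH":                      # push onto both stacks
        assert buffer, "Shift needs a non-empty buffer"
        return s1 + buffer[:1], s2 + buffer[:1], buffer[1:], arcs
    if transition == "RE":                      # pop the active stack only
        return s1[:-1], s2, buffer, arcs
    if transition == "SW":                      # Switch: exchange the two stacks
        return s2, s1, buffer, arcs
    i, j = s1[-1], buffer[0]
    if transition == "RA":
        assert not has_head(j, arcs)
        return s1, s2, buffer, arcs | {(i, j, label)}
    if transition == "LA":
        assert i != 0 and not has_head(i, arcs)
        return s1, s2, buffer, arcs | {(j, i, label)}
    raise ValueError(transition)

PREFIX = [
    ("SH", None), ("SH", None), ("LA", "det"), ("RE", None),
    ("SH", None), ("SH", None), ("LA", "aux"), ("RE", None),
    ("LA", "nsubj"), ("RE", None), ("RA", "root"),
]

config = ([], [], list(range(10)), frozenset())
for transition, label in PREFIX:
    config = step(config, transition, label)

s1, s2, buffer, arcs = config
print(s1, s2, buffer)   # [0] [0, 1, 2, 3] [4, 5, 6, 7, 8, 9]
print(sorted(arcs))
```

Because arcs never change the stacks or the buffer, every Left-Arc here is followed by an explicit Reduce on S1, while S2 keeps all shifted words for later use.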

Recent Advances in Dependency Parsing 49(54)

Non-Projective Parsing

Example Transition Sequence

Step  Transition        S1                    S2                       Buffer                                                        Arc added
0     (initial)         [ ]                   [ ]                      [ROOT, A, hearing, is, scheduled, on, the, issue, today, .]
1     Shift             [ROOT]                [ROOT]                   [A, hearing, is, scheduled, on, the, issue, today, .]
2     Shift             [ROOT, A]             [ROOT, A]                [hearing, is, scheduled, on, the, issue, today, .]
3     Left-Arc(det)     [ROOT, A]             [ROOT, A]                [hearing, is, scheduled, on, the, issue, today, .]            A <-det- hearing
4     Reduce            [ROOT]                [ROOT, A]                [hearing, is, scheduled, on, the, issue, today, .]
5     Shift             [ROOT, hearing]       [ROOT, A, hearing]       [is, scheduled, on, the, issue, today, .]
6     Shift             [ROOT, hearing, is]   [ROOT, A, hearing, is]   [scheduled, on, the, issue, today, .]
7     Left-Arc(aux)     [ROOT, hearing, is]   [ROOT, A, hearing, is]   [scheduled, on, the, issue, today, .]                         is <-aux- scheduled
8     Reduce            [ROOT, hearing]       [ROOT, A, hearing, is]   [scheduled, on, the, issue, today, .]
9     Left-Arc(nsubj)   [ROOT, hearing]       [ROOT, A, hearing, is]   [scheduled, on, the, issue, today, .]                         hearing <-nsubj- scheduled
10    Reduce            [ROOT]                [ROOT, A, hearing, is]   [scheduled, on, the, issue, today, .]
11    Right-Arc(root)   [ROOT]                [ROOT, A, hearing, is]   [scheduled, on, the, issue, today, .]                         ROOT -root-> scheduled

ROOT  A    hearing  is    scheduled  on    the  issue  today  .
ROOT  det  noun     verb  verb       prep  det  noun   adv    .

Recent Advances in Dependency Parsing 50(54)

Non-Projective Parsing

Example Transition Sequence

[ROOT, scheduled]S1 [on, the, issue, today, .]B

[ROOT, A, hearing, is, scheduled]S2

ROOT A hearing is scheduled on the issue today .ROOT det noun verb verb prep det noun adv .

root

det aux

nsubj

Recent Advances in Dependency Parsing 50(54)

Non-Projective Parsing

Example Transition Sequence

[ROOT, A, hearing, is, scheduled]S1 [on, the, issue, today, .]B

[ROOT, scheduled]S2

ROOT A hearing is scheduled on the issue today .ROOT det noun verb verb prep det noun adv .

root

det aux

nsubj

Recent Advances in Dependency Parsing 50(54)

Non-Projective Parsing

Example Transition Sequence

[ROOT, A, hearing, is]S1 [on, the, issue, today, .]B

[ROOT, scheduled]S2

ROOT A hearing is scheduled on the issue today .ROOT det noun verb verb prep det noun adv .

root

det aux

nsubj

Recent Advances in Dependency Parsing 50(54)

Non-Projective Parsing

Example Transition Sequence

[ROOT, A, hearing]S1 [on, the, issue, today, .]B

[ROOT, scheduled]S2

ROOT A hearing is scheduled on the issue today .ROOT det noun verb verb prep det noun adv .

root

det aux

nsubj

Recent Advances in Dependency Parsing 50(54)

Non-Projective Parsing

Example Transition Sequence

[ROOT, A, hearing]S1 [on, the, issue, today, .]B

[ROOT, scheduled]S2

ROOT A hearing is scheduled on the issue today .ROOT det noun verb verb prep det noun adv .

root

det aux

nsubj

prep

Recent Advances in Dependency Parsing 50(54)

Non-Projective Parsing

Example Transition Sequence

[ROOT, A, hearing, on]S1 [the, issue, today, .]B

[ROOT, scheduled, on]S2

ROOT A hearing is scheduled on the issue today .ROOT det noun verb verb prep det noun adv .

root

det aux

nsubj

prep

Recent Advances in Dependency Parsing 50(54)

Non-Projective Parsing

Example Transition Sequence

[ROOT, A, hearing, on, the]S1 [issue, today, .]B

[ROOT, scheduled, on, the]S2

ROOT A hearing is scheduled on the issue today .ROOT det noun verb verb prep det noun adv .

root

det aux

nsubj

prep

Recent Advances in Dependency Parsing 50(54)

Non-Projective Parsing

Example Transition Sequence

[ROOT, A, hearing, on, the]S1 [issue, today, .]B

[ROOT, scheduled, on, the]S2

ROOT A hearing is scheduled on the issue today .ROOT det noun verb verb prep det noun adv .

root

det aux

nsubj

prep

det

Recent Advances in Dependency Parsing 50(54)

Non-Projective Parsing

Example Transition Sequence

[ROOT, A, hearing, on]S1 [issue, today, .]B

[ROOT, scheduled, on, the]S2

ROOT A hearing is scheduled on the issue today .ROOT det noun verb verb prep det noun adv .

root

det aux

nsubj

prep

det

Recent Advances in Dependency Parsing 50(54)

Non-Projective Parsing

Example Transition Sequence

[ROOT, A, hearing, on]S1 [issue, today, .]B

[ROOT, scheduled, on, the]S2

ROOT A hearing is scheduled on the issue today .ROOT det noun verb verb prep det noun adv .

root

det aux

nsubj

prep

pobj

det

Recent Advances in Dependency Parsing 50(54)

Non-Projective Parsing

Example Transition Sequence

[ROOT, A, hearing, on, issue]S1 [today, .]B

[ROOT, scheduled, on, the, issue]S2

ROOT A hearing is scheduled on the issue today .ROOT det noun verb verb prep det noun adv .

root

det aux

nsubj

prep

pobj

det

Recent Advances in Dependency Parsing 50(54)

Non-Projective Parsing

Example Transition Sequence

[ROOT, scheduled, on, the, issue]S1 [today, .]B

[ROOT, A, hearing, on, issue]S2

ROOT A hearing is scheduled on the issue today .ROOT det noun verb verb prep det noun adv .

root

det aux

nsubj

prep

pobj

det

Recent Advances in Dependency Parsing 50(54)

Non-Projective Parsing

Example Transition Sequence

[ROOT, scheduled, on, the]S1 [today, .]B

[ROOT, A, hearing, on, issue]S2

ROOT A hearing is scheduled on the issue today .ROOT det noun verb verb prep det noun adv .

root

det aux

nsubj

prep

pobj

det

Recent Advances in Dependency Parsing 50(54)

Non-Projective Parsing

Example Transition Sequence

[ROOT, scheduled, on]S1 [today, .]B

[ROOT, A, hearing, on, issue]S2

ROOT A hearing is scheduled on the issue today .ROOT det noun verb verb prep det noun adv .

root

det aux

nsubj

prep

pobj

det

Recent Advances in Dependency Parsing 50(54)

Non-Projective Parsing

Example Transition Sequence

[ROOT, scheduled]S1 [today, .]B

[ROOT, A, hearing, on, issue]S2

ROOT A hearing is scheduled on the issue today .ROOT det noun verb verb prep det noun adv .

root

det aux

nsubj

prep

pobj

det

Recent Advances in Dependency Parsing 50(54)

Non-Projective Parsing

Example Transition Sequence

[ROOT, scheduled]S1 [today, .]B

[ROOT, A, hearing, on, issue]S2

ROOT A hearing is scheduled on the issue today .ROOT det noun verb verb prep det noun adv .

root

det aux

nsubj

prep

pobj

det

tmod

Recent Advances in Dependency Parsing 50(54)

Non-Projective Parsing

Example Transition Sequence

[ROOT, scheduled, today]S1 [.]B

[ROOT, A, hearing, on, issue, today]S2

ROOT A hearing is scheduled on the issue today .ROOT det noun verb verb prep det noun adv .

root

det aux

nsubj

prep

pobj

det

tmod

Recent Advances in Dependency Parsing 50(54)

Non-Projective Parsing

Example Transition Sequence

[ROOT, scheduled]S1 [.]B

[ROOT, A, hearing, on, issue, today]S2

ROOT A hearing is scheduled on the issue today .ROOT det noun verb verb prep det noun adv .

root

det aux

nsubj

prep

pobj

det

tmod

Recent Advances in Dependency Parsing 50(54)

Non-Projective Parsing

Example Transition Sequence

[ROOT, scheduled]S1 [.]B

[ROOT, A, hearing, on, issue, today]S2

ROOT A hearing is scheduled on the issue today .ROOT det noun verb verb prep det noun adv .

root

det aux

nsubj

prep

pobj

det

tmod

p

Recent Advances in Dependency Parsing 50(54)

Non-Projective Parsing

Example Transition Sequence

[ROOT, scheduled, .]S1 [ ]B

[ROOT, A, hearing, on, issue, today, .]S2

ROOT A hearing is scheduled on the issue today .ROOT det noun verb verb prep det noun adv .

root

det aux

nsubj

prep

pobj

det

tmod

p

Recent Advances in Dependency Parsing 50(54)
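To see why this sentence needs two planes, one can check which arcs of the example tree cross. The sketch below is an illustration, not part of the tutorial: it assumes word positions 0 (ROOT) to 9 (the final period), takes the arc endpoints as read off the example sequence, and uses the hypothetical helper names `crosses` and `two_plane_assignment`. A set of arcs is 2-planar exactly when its crossing graph can be coloured with two colours, so the function attempts such a colouring.

```python
from itertools import combinations

def crosses(a, b):
    """True if two arcs, given as (head, dependent) positions, cross."""
    (l1, r1), (l2, r2) = sorted(a), sorted(b)
    return l1 < l2 < r1 < r2 or l2 < l1 < r2 < r1

def two_plane_assignment(arcs):
    """Return {arc: plane} with planes 0/1, or None if two planes do not suffice."""
    arcs = list(arcs)
    conflicts = {a: [] for a in arcs}
    for a, b in combinations(arcs, 2):
        if crosses(a, b):
            conflicts[a].append(b)
            conflicts[b].append(a)
    plane = {}
    for start in arcs:
        if start in plane:
            continue
        plane[start] = 0
        todo = [start]
        while todo:                       # colour one connected component
            a = todo.pop()
            for b in conflicts[a]:
                if b not in plane:
                    plane[b] = 1 - plane[a]
                    todo.append(b)
                elif plane[b] == plane[a]:
                    return None           # odd cycle of crossings
    return plane

# (head, dependent) pairs for "ROOT A hearing is scheduled on the issue today ."
ARCS = [(2, 1), (4, 3), (4, 2), (0, 4), (2, 5), (7, 6), (5, 7), (4, 8), (4, 9)]
planes = two_plane_assignment(ARCS)
print([a for a, p in planes.items() if p == 1])   # [(2, 5)]
```

Only the prep arc from "hearing" to "on" crosses other arcs (root, tmod and p), so it has to live on its own plane. Plane assignments are not unique: in the transition sequence the det and pobj arcs under "on" are also built while the second stack is active.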

Joint Morphological and Syntactic Analysis

Morphology and Syntax

- Morphological analysis in dependency parsing:
  - Crucially assumed as input, not predicted by the parser
  - Pipeline approach may lead to error propagation
  - Most PCFG-based parsers at least predict their own tags

- Recent interest in joint models for morphology and syntax:
  - Graph-based [McDonald 2006, Lee et al. 2011, Li et al. 2011]
  - Transition-based [Hatori et al. 2011, Bohnet and Nivre 2012]

- Can improve both morphology and syntax

Joint Morphological and Syntactic Analysis

Transition System for Morphology and Syntax

Configuration: (S, B, M, A) [M = Morphology]

Initial: ([ ], [0, 1, ..., n], { }, { })
Terminal: ([0], [ ], M, A)

Shift(m): (S, i|B, M, A) ⇒ (S|i, B, M ∪ {(i, m)}, A)

Right-Arc(k): (S|i|j, B, M, A) ⇒ (S|i, B, M, A ∪ {(i, j, k)})
Left-Arc(k): (S|i|j, B, M, A) ⇒ (S|j, B, M, A ∪ {(j, i, k)})   i ≠ 0

Swap: (S|i|j, B, M, A) ⇒ (S|j, i|B, M, A)   0 < i < j

- Transition-based parsing with three interleaved processes:
  - Assign morphology when words are shifted onto the stack
  - Optionally sort words into projective order <p
  - Build dependency tree T by connecting adjacent subtrees
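The joint system can be sketched in Python as pure functions over a configuration tuple (S, B, M, A). This is a minimal illustration under stated assumptions: the function names are hypothetical, M is stored as a dict from node to morphological tag rather than a set of pairs, and a word that is shifted again after a Swap simply has its tag recorded again, since the slide does not say how that case is handled.

```python
def initial(n):
    """(S, B, M, A) for nodes 0..n, where 0 is the artificial root."""
    return [], list(range(n + 1)), {}, set()

def shift(cfg, m):
    S, B, M, A = cfg
    i = B[0]
    return S + [i], B[1:], {**M, i: m}, A      # assign morphology m to i

def right_arc(cfg, k):
    S, B, M, A = cfg
    i, j = S[-2], S[-1]
    return S[:-1], B, M, A | {(i, j, k)}       # i is head, j is popped

def left_arc(cfg, k):
    S, B, M, A = cfg
    i, j = S[-2], S[-1]
    if i == 0:
        raise ValueError("Left-Arc requires i != 0")
    return S[:-2] + [j], B, M, A | {(j, i, k)} # j is head, i is popped

def swap(cfg):
    S, B, M, A = cfg
    i, j = S[-2], S[-1]
    if not 0 < i < j:
        raise ValueError("Swap requires 0 < i < j")
    return S[:-2] + [j], [i] + B, M, A         # i goes back to the buffer

def is_terminal(cfg):
    S, B, _, _ = cfg
    return S == [0] and not B

# Toy non-projective tree over nodes 0..3 with arcs 0→2, 2→3, 3→1
# (tags and labels are made up for the illustration).
cfg = initial(3)
cfg = shift(cfg, "ROOT")
cfg = shift(cfg, "N")
cfg = shift(cfg, "V")
cfg = swap(cfg)              # move node 1 back so that 2 precedes it
cfg = shift(cfg, "N")        # node 1 again, same tag
cfg = shift(cfg, "ADJ")
cfg = left_arc(cfg, "dep")   # 3 → 1
cfg = right_arc(cfg, "dep")  # 2 → 3
cfg = right_arc(cfg, "root") # 0 → 2
print(is_terminal(cfg), sorted(cfg[3]))
```

The Swap step reorders nodes 1 and 2 so that every arc connects adjacent stack items, which is how an arc-standard style system can still build the crossing arcs 0→2 and 3→1.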


Joint Morphological and Syntactic Analysis

Parsing Richly Inflected Languages

- Full morphological analysis: lemma + postag + features
  - Beam search and structured prediction
  - Parser selects from k best tags + features
  - Rule-based morphology provides additional features

- Evaluation metrics:
  - PM = morphology (postag + features)
  - LAS = labeled attachment score

            Czech        Finnish      German       Hungarian    Russian
            PM    LAS    PM    LAS    PM    LAS    PM    LAS    PM    LAS
Pipeline    93.0  83.1   88.8  79.9   89.1  91.8   96.1  88.4   92.6  87.4
Joint       94.4  83.5   91.6  82.5   91.2  92.1   97.4  89.1   95.1  88.0

[Bohnet et al. 2013]
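The two metrics in the table can be illustrated with a short Python sketch. It assumes one common reading of the definitions: LAS is the percentage of tokens whose predicted head and label both match the gold standard, and PM is the percentage whose postag and feature string both match. The `Token` record, the toy sentence and all its values are invented for the example; details such as punctuation handling in the reported scores are not covered here.

```python
from collections import namedtuple

# Hypothetical per-token record: head index, dependency label, postag, features.
Token = namedtuple("Token", "head label postag feats")

def las(gold, pred):
    """Labeled attachment score: head and label both correct, in percent."""
    hits = sum(g.head == p.head and g.label == p.label for g, p in zip(gold, pred))
    return 100.0 * hits / len(gold)

def pm(gold, pred):
    """Morphology accuracy: postag and features both correct, in percent."""
    hits = sum(g.postag == p.postag and g.feats == p.feats for g, p in zip(gold, pred))
    return 100.0 * hits / len(gold)

gold = [
    Token(2, "det", "DET", "Definite=Ind"),
    Token(4, "nsubj", "NOUN", "Number=Sing"),
    Token(4, "aux", "VERB", "Tense=Pres"),
    Token(0, "root", "VERB", "VerbForm=Part"),
]
pred = [
    Token(2, "det", "DET", "Definite=Def"),    # wrong features
    Token(3, "nsubj", "NOUN", "Number=Sing"),  # wrong head
    Token(4, "aux", "AUX", "Tense=Pres"),      # wrong postag
    Token(0, "root", "VERB", "VerbForm=Part"),
]
print(las(gold, pred), pm(gold, pred))   # 75.0 50.0
```

A token counts towards PM only if the whole morphological analysis is right, which is why a single wrong feature costs as much as a wrong tag.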


Conclusion

Summary

- Transition-based parsing:
  - Efficient parsing using heuristic inference
  - Unconstrained history-based feature models

- Recent advances in synergy:
  - Beam search and structured prediction
  - Easy-first parsing and dynamic oracles
  - Online reordering for non-projective trees
  - Joint morphological and syntactic analysis

References and Further Reading

- Giuseppe Attardi. 2006. Experiments with a multilanguage non-projective dependency parser. In Proceedings of the 10th Conference on Computational Natural Language Learning (CoNLL), pages 166–170.

- Bernd Bohnet and Jonas Kuhn. 2012. The best of both worlds – a graph-based completion model for transition-based parsers. In Proceedings of the 13th Conference of the European Chapter of the Association for Computational Linguistics (EACL), pages 77–87.

- Bernd Bohnet and Joakim Nivre. 2012. A transition-based system for joint part-of-speech tagging and labeled non-projective dependency parsing. In Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, pages 1455–1465.

- Bernd Bohnet, Joakim Nivre, Igor Boguslavsky, Richárd Farkas, Filip Ginter, and Jan Hajič. 2013. Joint morphological and syntactic analysis for richly inflected languages. Transactions of the Association for Computational Linguistics, 1:415–428.

- Jinho D. Choi and Andrew McCallum. 2013. Transition-based dependency parsing with selectional branching. In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1052–1062.

- Jinho D. Choi and Martha Palmer. 2011. Getting the most out of transition-based dependency parsing. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pages 687–692.

- Shay B. Cohen, Carlos Gómez-Rodríguez, and Giorgio Satta. 2011. Exact inference for generative probabilistic non-projective dependency parsing. In Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing, pages 1234–1245.

- Michael Collins and Brian Roark. 2004. Incremental parsing with the perceptron algorithm. In Proceedings of the 42nd Annual Meeting of the Association for Computational Linguistics (ACL), pages 112–119.

- Michael Collins. 2002. Discriminative training methods for hidden Markov models: Theory and experiments with perceptron algorithms. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 1–8.

- Michael A. Covington. 2001. A fundamental algorithm for dependency parsing. In Proceedings of the 39th Annual ACM Southeast Conference, pages 95–102.

- Yoav Goldberg and Michael Elhadad. 2010. An efficient algorithm for easy-first non-directional dependency parsing. In Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL HLT), pages 742–750.

- Yoav Goldberg and Joakim Nivre. 2012. A dynamic oracle for arc-eager dependency parsing. In Proceedings of COLING 2012, pages 959–976.

- Yoav Goldberg and Joakim Nivre. 2013. Training deterministic parsers with non-deterministic oracles. Transactions of the Association for Computational Linguistics, 1:403–414.

- Carlos Gómez-Rodríguez and Joakim Nivre. 2010. A transition-based parser for 2-planar dependency structures. In Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, pages 1492–1501.

- Jun Hatori, Takuya Matsuzaki, Yusuke Miyao, and Jun’ichi Tsujii. 2011. Incremental joint POS tagging and dependency parsing in Chinese. In Proceedings of the 5th International Joint Conference on Natural Language Processing (IJCNLP), pages 1216–1224.

- Matthew Honnibal, Yoav Goldberg, and Mark Johnson. 2013. A non-monotonic arc-eager transition system for dependency parsing. In Proceedings of the Seventeenth Conference on Computational Natural Language Learning, pages 163–172.

- Liang Huang and Kenji Sagae. 2010. Dynamic programming for linear-time incremental parsing. In Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics (ACL), pages 1077–1086.

- Liang Huang, Suphan Fayong, and Yang Guo. 2012. Structured perceptron with inexact search. In Proceedings of the 2012 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 142–151.

- Richard Johansson and Pierre Nugues. 2006. Investigating multilingual dependency parsing. In Proceedings of the Tenth Conference on Computational Natural Language Learning (CoNLL), pages 206–210.

- Marco Kuhlmann, Carlos Gómez-Rodríguez, and Giorgio Satta. 2011. Dynamic programming algorithms for transition-based dependency parsers. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics (ACL), pages 673–682.

- John Lee, Jason Naradowsky, and David A. Smith. 2011. A discriminative model for joint morphological disambiguation and dependency parsing. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics (ACL), pages 885–894.

- Zhenghua Li, Min Zhang, Wanxiang Che, Ting Liu, Wenliang Chen, and Haizhou Li. 2011. Joint models for Chinese POS tagging and dependency parsing. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 1180–1191.

- Ryan McDonald. 2006. Discriminative Training and Spanning Tree Algorithms for Dependency Parsing. Ph.D. thesis, University of Pennsylvania.

- Joakim Nivre and Jens Nilsson. 2005. Pseudo-projective dependency parsing. In Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL), pages 99–106.

- Joakim Nivre, Marco Kuhlmann, and Johan Hall. 2009. An improved oracle for dependency parsing with online reordering. In Proceedings of the 11th International Conference on Parsing Technologies (IWPT’09), pages 73–76.

- Joakim Nivre. 2003. An efficient algorithm for projective dependency parsing. In Gertjan Van Noord, editor, Proceedings of the 8th International Workshop on Parsing Technologies (IWPT), pages 149–160.

- Joakim Nivre. 2004. Incrementality in deterministic dependency parsing. In Frank Keller, Stephen Clark, Matthew Crocker, and Mark Steedman, editors, Proceedings of the Workshop on Incremental Parsing: Bringing Engineering and Cognition Together (ACL), pages 50–57.

- Joakim Nivre. 2007. Incremental non-projective dependency parsing. In Proceedings of Human Language Technologies: The Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL-HLT), pages 396–403.

- Joakim Nivre. 2009. Non-projective dependency parsing in expected linear time. In Proceedings of the 47th Annual Meeting of the Association for Computational Linguistics (ACL), pages 351–359.

- Francesco Sartorio, Giorgio Satta, and Joakim Nivre. 2013. A transition-based dependency parser using a dynamic parsing strategy. In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 135–144.

- Kateřina Veselá, Jiří Havelka, and Eva Hajičová. 2004. Condition of projectivity in the underlying dependency structures. In Proceedings of the 20th International Conference on Computational Linguistics (COLING), pages 289–295.

- Anssi Yli-Jyrä. 2003. Multiplanarity – a model for dependency structures in treebanks. In Proceedings of the Second Workshop on Treebanks and Linguistic Theories (TLT), pages 189–200.

- Yue Zhang and Stephen Clark. 2008. A tale of two parsers: Investigating and combining graph-based and transition-based dependency parsing. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 562–571.

- Yue Zhang and Joakim Nivre. 2011. Transition-based parsing with rich non-local features. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics (ACL), pages 188–193.

- Yue Zhang and Joakim Nivre. 2012. Analyzing the effect of global learning and beam-search on transition-based dependency parsing. In Proceedings of COLING 2012: Posters, pages 1391–1400.