(Proposal: How to Create Presentation Materials) Toshiba Confidential, TOSHIBA OF EUROPE LTD.
History of Major NLP Products & Services in Toshiba
1978 "JW-10": Japanese word processor
1985 "ASTRANSAC EJ": English-to-Japanese MT system
1989 "ASTRANSAC JE": Japanese-to-English MT system
1995 "The 翻訳": PC MT system (Internet & personal)
1996 "News Watch": information filtering service
1999 "Fresh Eye": Internet search engine/portal
2001 "KnowledgeMeister": KM support system
2005 Chinese-Japanese translation service
2006 "KnowledgeMeister - Succeed"
Hideki Hirakawa, Toshiba of Europe Ltd.
Integrated Use of Phrase Structure Forest and Dependency Forest in Preference Dependency Grammar (PDG)
29 January, 2008
Agenda
Phrase Structure and Dependency Structure Analysis
Overview of the Preference Dependency Grammar (PDG)
Packed Shared Data Structure “Dependency Forest”
Evaluation of Dependency Forest
Conclusion
Phrase Structure (PS) and Dependency Structure (DS)
Two major syntactic representation schemes
Phrase Structure (PS)
[Figure: phrase structure tree for "time fly like an arrow" with the node labels s, np, vp, pp, n, v, pre, det]
Information explicitly expressed by PS:
- Phrases (non-terminal nodes)
- Structural categories (non-terminal labels)
Dependency Structure (DS)
[Figure: dependency tree for "time fly like an arrow" with the arc labels sub, vpp, pre, det]
Information explicitly expressed by DS:
- Head-dependent relations (directed arcs)
- Functional categories (arc labels)
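The two representations can be written down side by side as plain data. The sketch below is only an illustration: the exact tree shape and the attachment of each arc are assumptions read off the node and arc labels on the slide, and the helper names are hypothetical.

```python
# Phrase structure: nested (label, child, ...) tuples; a pre-terminal holds one word.
# The tree shape is an assumption based on the labels shown on the slide.
ps_tree = ("s",
           ("np", ("n", "time")),
           ("vp", ("v", "fly"),
                  ("pp", ("pre", "like"),
                         ("np", ("det", "an"), ("n", "arrow")))))

# Dependency structure: (arc label, dependent, head) triples over the words.
ds_arcs = [
    ("sub", "time", "fly"),
    ("vpp", "like", "fly"),
    ("pre", "arrow", "like"),
    ("det", "an", "arrow"),
]

def nonterminal_labels(tree):
    """Collect the structural categories (phrase labels) that a PS tree makes explicit."""
    label, *children = tree
    if len(children) == 1 and isinstance(children[0], str):
        return []  # pre-terminal (POS tag over a word), not a phrase
    labels = [label]
    for child in children:
        labels.extend(nonterminal_labels(child))
    return labels

def arc_labels(arcs):
    """Collect the functional categories (arc labels) that a DS makes explicit."""
    return [label for label, _dependent, _head in arcs]

print(nonterminal_labels(ps_tree))
print(arc_labels(ds_arcs))
```

The phrase labels have no counterpart in the arc list, and the arc labels have no counterpart in the tree, which is the sense in which the two schemes express different information.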
Constituency and dependency describe different dimensions.
A phrase-structure tree (PST) is closely related to a derivation, whereas a dependency tree rather describes the product of a process of derivation.
Constituency and dependency are not adversaries, they are complementary notions. Using them together we can overcome the problems that each notion has individually.
Formal & Computational Aspects of Dependency Grammar [Kruijff 02]
Relation between PS (Constituency) and DS
Integrated Use of Phrase and Dependency Structures
Phrase structure analysis
- Lexicalized PCFG: lexical information (including dependency relations) improves PS analysis accuracy (e.g. Charniak 1997; Collins 1999; Bikel 2004)
- Use of dependency relations as discriminative features of maximum entropy phrase structure parsers (e.g. HPSG parser (Oepen 2002), reranking parser (Charniak and Johnson 2005))
- Use of another independent shallow dependency parser (Sagae et al. 2007)
Dependency analysis
- Almost no use of phrase structure information (Kakari-uke parsers, MSTParser (McDonald 2005), MaltParser (Nivre 2004))
Integration requires mapping
- Integrating PS and DS requires a mapping between the two structures of a sentence, because a sentence analyzer cannot combine linguistic information from the two structures without a correspondence between them.
Mapping between PS and DS (previous research)
- Conversion between PS and DS based on heuristics: Phrase Structure Tree (PST) → Dependency Tree (DT) [Collins 99], DT → PST [Xia&Palmer 00] ⇒ used for measuring parse accuracy, treebank creation, etc.
- Grammar equivalence: [Gaifman 65] and [Abney 94] studied the equivalence relation between PSG (CFG) and DG (Tesnière-model DG) ⇒ DG is strongly equivalent to only a subclass of CFG*1
- Structure mapping based on packed shared data structures:
  - Partial structure mapping framework based on the Syntactic Graph [Seo&Simmons 89], which creates mappings between PSTs and DTs based on partial structure mapping rules (described later) ⇒ the syntactic graph generates inappropriate mappings [Hirakawa 06]
  - Complete mapping based on the "Dependency Forest" ⇒ integrated use of PS and DS (described later)
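As an illustration of the heuristic PST → DT conversion mentioned above, the following sketch picks one head child per phrase and attaches the heads of the other children to it. The head table is a hypothetical toy, not the head rules of [Collins 99], and the tree shape is an assumption.

```python
# Hypothetical head table: phrase label -> label of the child that supplies the head word.
HEAD_CHILD = {"s": "vp", "vp": "v", "np": "n", "pp": "pre"}

PS_TREE = ("s",
           ("np", ("n", "time")),
           ("vp", ("v", "fly"),
                  ("pp", ("pre", "like"),
                         ("np", ("det", "an"), ("n", "arrow")))))

def to_dependencies(tree, arcs):
    """Return the head word of `tree`; append (dependent, head) pairs to `arcs`."""
    label, *children = tree
    if isinstance(children[0], str):      # pre-terminal: the word is its own head
        return children[0]
    heads = [(child[0], to_dependencies(child, arcs)) for child in children]
    head_idx = next(i for i, (lab, _w) in enumerate(heads) if lab == HEAD_CHILD[label])
    head_word = heads[head_idx][1]
    # Every non-head child attaches its own head word to the phrase head.
    arcs.extend((w, head_word) for i, (_lab, w) in enumerate(heads) if i != head_idx)
    return head_word

arcs = []
root = to_dependencies(PS_TREE, arcs)
print(root, arcs)
```

A conversion of this kind maps one tree to one tree. It says nothing about how whole sets of trees correspond, which is the gap the packed shared structures address.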
Basic Sentence Analysis Model
[Figure: a sentence is mapped into an interpretation space whose interpretations are marked ◎ (correct), ○ (plausible) or × (implausible)]
Generation Knowledge: generates all possible interpretations
Interpretation Space: prescribed by the interpretation description scheme
Constraint Knowledge: rejection of interpretations (accept / reject)
Preference Knowledge: preference order of interpretations (◎ > ○ > ×)
Optimum Interpretation Extraction: selects the optimum interpretation (◎)
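The model can be read as a generate, filter, rank pipeline. The sketch below is a toy under stated assumptions: the lexicon, the constraint and the scores are all hypothetical and only show how the four kinds of knowledge fit together.

```python
from itertools import product

def analyse(sentence, generate, constraints, preference):
    """Generate all interpretations, reject those violating a constraint, return the best."""
    candidates = generate(sentence)
    accepted = [c for c in candidates if all(ok(c) for ok in constraints)]
    return max(accepted, key=preference) if accepted else None

# Hypothetical generation knowledge: every POS assignment allowed by a toy lexicon.
LEXICON = {"time": ["n", "v"], "flies": ["n", "v"]}

def generate(sentence):
    words = sentence.split()
    return [tuple(zip(words, tags)) for tags in product(*(LEXICON[w] for w in words))]

# Hypothetical constraint knowledge: exactly one verb per sentence.
def has_exactly_one_verb(interpretation):
    return sum(tag == "v" for _word, tag in interpretation) == 1

# Hypothetical preference knowledge: a score per tag sequence.
SCORES = {("n", "v"): 0.9, ("v", "n"): 0.4}

def preference(interpretation):
    return SCORES.get(tuple(tag for _word, tag in interpretation), 0.0)

best = analyse("time flies", generate, [has_exactly_one_verb], preference)
print(best)
```

Here the interpretation space is a flat list, which only works for toy inputs; the packed shared data structures introduced later exist to avoid enumerating it.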
Example (1): Probabilistic Context Free Grammar (PCFG)
Generation Knowledge: CFG rules
Interpretation Space: phrase structures (parse trees)
Constraint Knowledge: no constraints
Preference Knowledge: probabilities of the CFG rules (◎ > ○ > ×)
Optimum Interpretation Extraction: the Viterbi algorithm
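For the PCFG instance of the model, a minimal Viterbi-style CKY sketch is given below. The grammar is a hypothetical toy in Chomsky normal form with made-up probabilities; it only shows how rule probabilities act as preference knowledge and how the best tree is extracted.

```python
# Hypothetical toy PCFG in Chomsky normal form (probabilities are made up).
LEXICAL = {("NP", "time"): 0.5, ("NP", "flies"): 0.5,
           ("VP", "flies"): 1.0, ("V", "time"): 1.0}
BINARY = {("S", ("NP", "VP")): 0.8,   # declarative reading
          ("S", ("V", "NP")): 0.2}    # imperative reading

def viterbi_cky(words, lexical, binary, start="S"):
    """Return (probability, tree) of the most probable parse, or None if there is none."""
    n = len(words)
    best = {}  # (i, j, label) -> (probability, backpointer)
    for i, w in enumerate(words):
        for (label, word), p in lexical.items():
            if word == w:
                best[i, i + 1, label] = (p, w)
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):
                for (parent, (left, right)), p in binary.items():
                    if (i, k, left) in best and (k, j, right) in best:
                        prob = p * best[i, k, left][0] * best[k, j, right][0]
                        if prob > best.get((i, j, parent), (0.0, None))[0]:
                            best[i, j, parent] = (prob, (k, left, right))

    def build(i, j, label):
        _p, back = best[i, j, label]
        if isinstance(back, str):
            return (label, back)
        k, left, right = back
        return (label, build(i, k, left), build(k, j, right))

    if (0, n, start) not in best:
        return None
    return best[0, n, start][0], build(0, n, start)

result = viterbi_cky(["time", "flies"], LEXICAL, BINARY)
print(result)
```

Both readings of "time flies" are generated; the declarative one wins only because of the assumed rule probabilities.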
Basic Sentence Analysis Model of PDG
Multilevel Packed Shared Data Connection Model
PK: Preference Knowledge, CK: Constraint Knowledge, GK: Generation Knowledge, IS: Interpretation Space
(a) NLA system with a multilevel interpretation space
(b) Packed shared data structure and interpretation mapping
(c) Interpretations are externalizations of the lower-level interpretations
[Figure: a sentence passes through the interpretation spaces IS1, IS2 and IS3 (level 1, level 2 and level 3 interpretations), each with its own GK, PK and CK and linked to the next by a mapping; optimum interpretation extraction selects the optimum interpretation (◎). Callouts: 1. Data Structure, 2. Optimum Solution Search]
PDG Implementation Model (data structure)
WPP = Word POS Pair; phrase structure forest (PSF) = (packed shared) parse forest
[Figure: for the sentence "Time flies", the morphological layer holds the WPP trellis (time/n, time/v, fly/v, fly/n), which packs all WPP sequences. The syntactic layer holds the phrase structure forest (np, vp, s, root), which packs all PSTs, and the dependency forest (arcs sub, obj, top), which packs all DTs. Interpretation mappings link a WPP sequence (time/n fly/v), a PST (root, s, np vp) and a DT (time/n attached to fly/v by sub, fly/v as top); the optimum dependency tree is selected from the DTs.]
PDG is an all-pair dependency analysis method with a three-level architecture that uses three packed shared data structures.
Integrated use of the PS and DS levels in the syntactic layer.
Comparison with other dependency analysis methods
CDG: Constraint Dependency Grammar, MSTParser: Maximum Spanning Tree Parser
[Figure: comparison of CDG, MSTParser and PDG across the morphology level, the PS level and the DS level, from sentence to optimum interpretation (◎). Labels in the figure: all morphological interpretations; 1-best morphological interpretation; no CFG grammar; CFG filtering; all PS interpretations; all DS interpretations; all interpretations with no POS ambiguities; well-formed interpretations; combinatorial explosion; over-pruning.]
PDG Implementation Model (optimum solution search)
Integration of preference knowledge: preference scores based on the multilevel data structures are integrated into scores on the dependency forest (DF).
[Figure: for the sentence "Time flies", the WPP sequence score (WPP trellis, morphological layer), the phrase structure score (PS forest) and the dependency score (dependency forest, syntactic layer) are combined by score integration. The optimum solution search (graph branch algorithm) then extracts the optimum dependency tree: time/n attached to fly/v by sub, with fly/v as top.]
PDG Analysis Flow
[Figure: Sentence → Forest Generation (extended chart parser; dependency forest generation) → WPP trellis, PS forest, dependency forest → Scoring (preference score integration) → scored dependency forest and co-occurrence score matrix → Optimum Tree Search (based on CM and PM) → the optimum tree]
Partial Structure Mapping Method [Seo&Simmons 89]
Grammar rule: partial structure mapping rule = headed CFG rule + partial dependency tree
[Figure: the headed CFG rule Y/wh → X1/w1 … Xh/wh … Xi/wi … Xn/wn is paired with a partial dependency tree in which w1, …, wi, …, wn depend on the head word wh through the arcs d1, …, di, …, dn. The parser maps a sentence to a packed shared phrase structure (phrase structure forest), which stands for a set of phrase structure trees, and to a packed shared dependency structure (syntactic graph), which stands for a set of dependency trees; a mapping links the two.]
Syntactic Graph
Packed Shared Data Structure for Dependency Trees
Encompasses all dependency trees corresponding to phrase structure trees in the parse forest for a sentence
[Figure: syntactic graph for "Time flies like an arrow". Nodes are WPPs: [0,time,n], [0,time,v], [1,fly,v], [1,fly,n], [2,like,p], [2,like,v], [3,an,det], [4,arrow,n]. Arcs are dependency relations.]
[Figure: exclusion matrix over arcs 1 to 13; the cell positions are not recoverable from the extracted text.]
Completeness and Soundness of the syntactic graph
Definitions
Completeness: for every parse tree in the forest, there is a syntactic reading from the syntactic graph that is structurally equivalent to that parse tree.
∀ PST (phrase structure tree) ∃ DT (dependency tree): PST corresponds to DT
Soundness: for every syntactic reading from the syntactic graph, there is a parse tree in the forest that is structurally equivalent to that syntactic reading.
∀ DT (dependency tree) ∃ PST (phrase structure tree): PST corresponds to DT
Problem of the syntactic graph: violation of soundness [Hirakawa 06]
[Figure: mapping between the phrase structure trees (PT) in the phrase structure forest and the dependency trees (DT) in the syntactic graph. Completeness holds, but soundness fails because some dependency trees have no corresponding phrase structure tree.]
Example of the violation of soundness
[Figure: (a), (b) and (c) are dependency trees for "Tokyo taxi driver call center" that correspond to phrase structure trees; (d) is a further reading produced by the syntactic graph. Arcs in the graph: nc-1, nc-2, nc-3, nj-4, nj-5, nc-6, nj-7, rt-8.]
The syntactic graph for (a), (b) and (c) also generates (d), which has no corresponding phrase structure tree in the phrase structure forest.
[Figure: syntactic graph and exclusion matrix (EM) over arcs 1 to 8; the cell positions are not recoverable from the extracted text.]
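One way to see how such an extra reading can arise is to model only the pairwise part of the check. In the sketch below the assignment of arcs to the trees (a), (b), (c) and (d) is an assumption reconstructed from the figure residue, arc endpoints are omitted, and the pairwise "allowed" set stands in for the complement of the exclusion matrix.

```python
from itertools import combinations

# Arc sets per tree: an assumed reading of the figure, not confirmed by the source.
trees = {
    "a": frozenset({"nc-1", "nc-2", "nc-6", "nj-7", "rt-8"}),
    "b": frozenset({"nc-1", "nc-3", "nj-5", "nc-6", "rt-8"}),
    "c": frozenset({"nc-2", "nc-3", "nj-4", "nc-6", "rt-8"}),
}

def pairwise_allowed(trees):
    """Arc pairs that occur together in at least one tree (i.e. pairs not excluded)."""
    allowed = set()
    for arcs in trees.values():
        allowed.update(frozenset(pair) for pair in combinations(arcs, 2))
    return allowed

def passes_pairwise_check(arcs, allowed):
    """True if no two arcs of the candidate reading are mutually excluded."""
    return all(frozenset(pair) in allowed for pair in combinations(arcs, 2))

allowed = pairwise_allowed(trees)
d = frozenset({"nc-1", "nc-2", "nc-3", "nc-6", "rt-8"})

print(passes_pairwise_check(d, allowed))  # every pair of (d) co-occurs in some tree
print(d in trees.values())                # yet (d) is none of the original trees
```

Under these assumptions each pair of arcs in (d) is licensed by a different tree, so a purely pairwise exclusion test cannot reject the combination.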
Dependency Forest [Hirakawa 06]
Packed Shared Data Structure for Dependency Trees
Dependency Forest (DF) = Dependency Graph (DG) + Co-occurrence Matrix (CM)
CM: defines the arc co-occurrence relation (equivalent arcs are allowed in a DF)
[Figure: dependency forest for "Time flies like an arrow." Dependency graph nodes: 0,time/n; 0,time/v; 1,fly/v; 1,fly/n; 2,like/p; 2,like/v; 3,an/det; 4,arrow/n; root. Arcs: nc2, obj4, det14, pre15, obj16, vpp18, npp19, vpp20, sub23, sub24, rt29, rt31, rt32 (the figure residue also shows obj25, which does not appear in the matrix). The co-occurrence matrix over these 13 arcs marks co-occurring pairs with ○; its cell positions are not recoverable from the extracted text.]
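A minimal sketch of the DG + CM idea follows, using the two-word "Time flies" example from the earlier slides. The arc IDs are hypothetical, and the well-formedness test used here (one arc per word position, all chosen arcs pairwise co-occurring in the CM) is an assumed simplification, not a statement of the PDG definition.

```python
from dataclasses import dataclass
from itertools import combinations, product

ROOT = (-1, "root", "x")

@dataclass(frozen=True)
class Arc:
    arc_id: int   # hypothetical ID
    label: str
    dep: tuple    # dependent WPP node: (position, word, pos)
    head: tuple   # governor WPP node, or ROOT

# Dependency graph for "Time flies" (arc IDs are made up for this sketch).
ARCS = [
    Arc(1, "sub", (0, "time", "n"), (1, "fly", "v")),
    Arc(2, "obj", (1, "fly", "n"), (0, "time", "v")),
    Arc(3, "top", (1, "fly", "v"), ROOT),
    Arc(4, "top", (0, "time", "v"), ROOT),
]

# Co-occurrence matrix as a set of unordered arc-ID pairs.
CM = {frozenset((1, 3)), frozenset((2, 4))}

def well_formed_trees(arcs, cm, n_words):
    """Enumerate arc sets with one arc per word position whose arcs all co-occur."""
    by_position = [[a for a in arcs if a.dep[0] == i] for i in range(n_words)]
    trees = []
    for choice in product(*by_position):
        if all(frozenset((a.arc_id, b.arc_id)) in cm for a, b in combinations(choice, 2)):
            trees.append(choice)
    return trees

trees = well_formed_trees(ARCS, CM, n_words=2)
for tree in trees:
    print([(a.label, a.dep[1] + "/" + a.dep[2]) for a in tree])
```

The CM is what rules out mixing arcs from incompatible readings, for example sub (which needs fly/v) with obj (which needs fly/n).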
Features of the Dependency Forest
Mapping is assured (phrase structure tree ⇔ dependency tree) → usable for multilevel packed shared data connection model
High flexibility in describing constraints
ex. non-projective dependency structure*1
*1: a dependency structure violating at least one of the following projectivity conditions: "no cross dependency exists", "no dependency covers the top node"
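The two projectivity conditions in the footnote can be checked mechanically. The sketch below assumes arcs are given as (dependent position, head position) pairs and that the top node's own root arc is left out; the function name is hypothetical.

```python
def is_projective(arcs, top_pos):
    """Check both conditions: no crossing dependencies, no dependency covering the top node."""
    spans = [tuple(sorted(arc)) for arc in arcs]
    # Condition: no dependency covers the top node.
    if any(left < top_pos < right for left, right in spans):
        return False
    # Condition: no two dependencies cross.
    for i, (l1, r1) in enumerate(spans):
        for l2, r2 in spans[i + 1:]:
            if l1 < l2 < r1 < r2 or l2 < l1 < r2 < r1:
                return False
    return True

# "She saw the cat": sub(0->1), det(2->3), obj(3->1); top node "saw" at position 1.
print(is_projective([(0, 1), (2, 3), (3, 1)], top_pos=1))

# "She saw the cat curiously which-was-Persian": adds adv(4->1) and rel(5->3),
# treating the relative clause as a single node at position 5.
print(is_projective([(0, 1), (2, 3), (3, 1), (4, 1), (5, 3)], top_pos=1))
```

In the second call the adv arc (1 to 4) and the rel arc (3 to 5) cross, so the structure is non-projective.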
Generation Flow of Phrase Structure Forest and Dependency Forest
[Figure: PDG analysis process alongside the PDG data structures. Processes: morphological analysis (with the dictionary), chart parsing (with the extended CFG), DF extraction, DF reduction, optimum solution search. Data structures: input sentence, WPP trellis, parse forest, initial dependency forest, dependency forest, dependency tree. The steps are numbered (1) to (4) in the figure.]
PDG Grammar Rule
Extended CFG rule with a phrase head and a mapping to dependency structure
y/Xh → x1/X1, ..., xn/Xn : [arc(arcname1, Xi, Xj), ..., arc(arcnamen-1, Xk, Xl)]
- Rewriting rule part: y/Xh → x1/X1, ..., xn/Xn (a CFG rule extended with variables)
  Xi: variable; Xh: phrase head, one of X1, ..., Xn
- Dependency structure part: [arc(arcname1, Xi, Xj), ..., arc(arcnamen-1, Xk, Xl)]
  a dependency tree with nodes X1, ..., Xn and top node Xh
ex. vp/V → v/V, np/NP, pp/PP : [arc(obj,NP,V), arc(vpp,PP,V)]
[Figure: for "see a girl in the forest", the phrase structure vp/V(=see/v) → v/V(=see/v) np/NP(=girl/n) pp/PP(=in/pre) maps to the dependency structure in which NP (= girl/n) depends on V (= see/v) via obj and PP (= in/pre) depends on V via vpp.]
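The example rule can be instantiated by binding each variable to the head of the matching constituent. The sketch below is a hypothetical encoding of the rule format, not the parser's internal representation.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Rule:
    lhs: tuple    # (category, head variable), e.g. ("vp", "V")
    rhs: tuple    # ((category, variable), ...)
    arcs: tuple   # ((arc name, dependent variable, head variable), ...)

# vp/V → v/V, np/NP, pp/PP : [arc(obj,NP,V), arc(vpp,PP,V)]
VP_RULE = Rule(
    lhs=("vp", "V"),
    rhs=(("v", "V"), ("np", "NP"), ("pp", "PP")),
    arcs=(("obj", "NP", "V"), ("vpp", "PP", "V")),
)

def apply_rule(rule, constituent_heads):
    """Bind the rule variables to constituent heads; return (phrase head, instantiated arcs)."""
    binding = {var: head for (_cat, var), head in zip(rule.rhs, constituent_heads)}
    arcs = [(name, binding[dep], binding[head]) for name, dep, head in rule.arcs]
    return binding[rule.lhs[1]], arcs

# "see a girl in the forest": constituent heads are see/v, girl/n and in/pre.
head, arcs = apply_rule(VP_RULE, ["see/v", "girl/n", "in/pre"])
print(head, arcs)
```

The phrase head returned for the vp is what a higher rule would bind to its own variable, so the dependency tree grows bottom-up with the phrase structure.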
Standard Chart Parsing: Structure of Standard Edge
Input: a cat chases … (input positions 0 1 2 3)
Lexical edges: <0,1, det → [a] ・>, <1,2, n → [cat] ・>, <2,3, v → [chase] ・>
Inactive edge: <0,2, np → det noun ・>
Active edge: <0,2, s → np ・ vp pp>
Edge structure, for <0,2, s → np ・ vp pp>: start position 0, end position 2, head category s, found constituents np, remaining constituents vp pp
Structure of PDG Edge
Input: a cat chases (input positions 0 1 2 3)
PDG single edge = standard edge + phrase head + dependency structure (tree)
Two extensions to the standard edge structure:
(1) Mapping to dependency structure
(2) Packing of inactive edges: a PDG (packed) edge is a set of sharable PDG single edges
Lexical edges: <0,1, det → [a] ・ : [a-det-0]>, <1,2, n → [cat] ・ : [cat-n-1]>, <2,3, v → [chase] ・ : [chase-v-2]>
Inactive edge: <0,2, np/[cat-n-1] → det/[a-det-0] noun/[cat-n-1] ・ : arc(det,[a-det-0],[cat-n-1])>
Active edge: <0,2, s/V → np/[cat-n-1] ・ vp/V pp/PP : [arc(obj,[cat-n-1],V), arc(vpp,PP,V)]>
Generation of Phrase Structure Forest and Initial Dependency Forest
・ Bottom-up chart parser using the Agenda
・ Terminates when the Agenda becomes empty
[Figure: the chart holds active and inactive edges; the agenda is empty (φ) at termination. Example inactive edges:
<Eroot root → [s1 s2] : [ds1 ds2]>
<E1 s1 → [[np1 vp1]] : [ds11]>
<E2 s2 → …>
<E3 np1 → [[det1 n1]] : [ds31]>
<E4 vp1 → [[v1 np2] [v1 np3 pp1]] : [ds41 ds42]>]
Phrase Structure Forest: the set of inactive edges reachable from the root edge
Initial Dependency Graph: the set of arcs in the PS forest, e.g.
arc(root-17,[like]-v-2,[root]-x), arc(root-24,[flies]-v-1,[root]-x), arc(root-27,[time]-v-0,[root]-x), arc(sub-16,[flies]-n-1,[like]-v-2), arc(nc-4,[time]-n-0,[flies]-n-1), arc(obj-14,[arrow]-n-4,[like]-v-2), …
Initial Co-occurrence Matrix: set by the conditions CM1 to CM3, illustrated for the edge <E1 s1 → [[np1 vp1 pp1]] : [Arc1, Arc2]>, whose constituent np1 (edge E3) governs Arc3, .. and whose constituent vp1 (edge E4) governs Arc8, Arc9, ..
CM1: between arcs in the DS
CM2: between arcs in the DS and arcs governed by the constituents
CM3: between arcs governed by different constituents
[Figure: co-occurrence matrix over A1, A2, A3, A4, A8, A9 with the cells set by CM1, CM2 and CM3 marked ○; the cell positions are not recoverable from the extracted text.]
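The three setting conditions can be sketched as one function applied per edge. The grouping of arcs below (A1, A2 in the edge's DS; A3, A4 governed by one constituent; A8, A9 by another) is an assumption based on the matrix labels, and the function name is hypothetical.

```python
from itertools import combinations, product

def set_cooccurrence(cm, ds_arcs, governed):
    """Mark co-occurring arc pairs for one edge.

    ds_arcs:  arcs in the edge's own dependency structure
    governed: one collection of arcs per constituent of the edge
    """
    def mark(a, b):
        cm.add(frozenset((a, b)))

    for a, b in combinations(ds_arcs, 2):            # CM1: between arcs in the DS
        mark(a, b)
    for a in ds_arcs:                                # CM2: DS arcs vs. governed arcs
        for arcs in governed:
            for b in arcs:
                mark(a, b)
    for left, right in combinations(governed, 2):    # CM3: different constituents
        for a, b in product(left, right):
            mark(a, b)

cm = set()
set_cooccurrence(cm, ["A1", "A2"], [["A3", "A4"], ["A8", "A9"]])
print(len(cm))
print(frozenset(("A3", "A4")) in cm)
```

Pairs inside a single constituent, such as A3 and A4, are not touched here; in this sketch they would be set when the constituent's own edge is processed.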
[Figure: initial dependency forest for "Time flies like an arrow", shown with the phrase structure forest. Phrase structure forest edges include np (103, 123, 133, 178, 197), vp (184, 189, 195, 201), pp (188), s (186, 191, 196) and the root edge, over the WPP nodes [0,time,n], [0,time,v], [1,fly,v], [1,fly,n], [2,like,p], [2,like,v], [3,an,det], [4,arrow,n]. The initial co-occurrence matrix covers arcs 2, 24, 4, 25, 23, 19, 18, 20, 14, 16, 15, 31, 29, 32; its cell positions are not recoverable from the extracted text.]
Reduction of the Initial Dependency Forest
[Figure: part of the initial dependency forest before reduction (arcs nc2, obj4, obj25, vpp18, npp19, sub23, sub24 over the nodes 0,time/n; 0,time/v; 1,fly/v; 1,fly/n) with its co-occurrence matrix over arcs 2, 24, 4, 25, 23, 19, 18, 20, 14, 16, 15, 31, 29, 32, and the same part after reduction, where obj25 no longer appears and the matrix covers arcs 2, 24, 4, 23, 19, 18, 20, 14, 16, 15, 31, 29, 32. The matrix cell positions are not recoverable from the extracted text.]
Equivalent arc: generated from two grammar rules
vp/V → v/V,np/NP : [arc(obj,NP,V)]
vp/V → v/V,np/NP,pp/PP : [arc(obj,NP,V), arc(vpp,PP,V)]
Reduction: two or more equivalent arcs are merged into one arc, provided that this does not increase the number of generalized dependency trees in the dependency forest.
Completeness and Soundness of the Dependency Forest
Completeness: all phrase structure trees in the parse forest have corresponding dependency trees in the dependency forest.
∀ PT (phrase structure tree) ∃ DT (dependency tree): dep_tree(PT) = DT
Soundness: every phrase structure tree corresponding to a dependency tree in the dependency forest exists in the phrase structure forest.
∀ DT (dependency tree) ∃ PT (phrase structure tree): dep_tree(PT) = DT
[Figure: mapping between phrase structure trees (PT) in the phrase structure forest and dependency trees (DT) in the dependency forest; the correspondence is 1:N in general.]
The completeness and soundness of the dependency forest are assured [Hirakawa 06].
Evaluation of the Dependency Forest Framework
Analysis of prototypical ambiguous sentences
1 to N / N to 1 correspondence between phrase structure tree/trees and dependency trees/tree
Generation of Non-projective dependency tree
Grammar for Ambiguous Sentences
Grammar rules for typical ambiguities (PP-attachment, coordination, be-verb usage)
=========== s/Sentence ===========
(R1) s/VP → np/NP,vp/VP : [arc(sub,NP,VP)] % Declarative sentence
(R2) s/VP → vp/VP : [] % Imperative sentence
========= np/Noun phrase ========
(R3) np/N → n/N : [] % Single noun
(R4) np/N2 → n/N1,n/N2 : [arc(nc,N1,N2)] % Compound noun
(R5) np/N → det/DET,n/N : [arc(det,DET,N)]
(R6) np/NP → np/NP,pp/PP : [arc(npp,PP,NP)] % Prepositional phrase attachment
(R7) np/N → ving/V,n/N : [arc(adjs,V,N)] % Adjectival usage (subject)
(R8) np/N → ving/V,n/N : [arc(adjo,V,N)] % Adjectival usage (object)
(R9) np/V → ving/V,np/NP : [arc(obj,NP,V)] % Gerund phrase
(R10) np/V → ving/V,np/NP,pp/PP : [arc(obj,NP,V),arc(vpp,PP,V)] % Gerund phrase with PP
(R11) np/NP → np/NP0,and/AND,np/NP : [arc(and,NP0,NP),arc(cnj,AND,NP0)] % Coordination (and)
(R12) np/NP → np/NP0,or/OR,np/NP : [arc(or,NP0,NP),arc(cnj,OR,NP0)] % Coordination (or)
========= vp/Verb phrase ========
(R13) vp/V → v/V : [] % Intransitive verb
(R14) vp/V → v/V,np/NP : [arc(obj,NP,V)] % Transitive verb
(R15) vp/V → be/BE,ving/V,np/NP : [arc(obj,NP,V),arc(prg,BE,V)] % Progressive
(R16) vp/BE → be/BE,np/NP : [arc(dsc,NP,BE)] % Copular
(R17) vp/VP → vp/VP,pp/PP : [arc(vpp,PP,VP)] % PP-attachment
(R18) vp/VP → adv/ADV,vp/VP : [arc(adv,ADV,VP)] % Adverb modification
(R19) vp/V → v/V,np/NP,adv/ADV,relc/RELP : [arc(obj,NP,V),arc(adv,ADV,V),arc(rel,RELP,NP)] % Non-projective pattern
======== pp/Prepositional phrase ========
(R20) pp/P → pre/P,np/NP : [arc(pre,NP,P)]
PP-attachment Ambiguity
Input sentence: I saw a girl with a telescope in the forest.
Five well-formed dependency trees
[Figure: dependency forest. Nodes: 0,I : [i]-n-0; 1,saw : [saw]-v-1; 2,a : [a]-det-2; 3,girl : [girl]-n-3; 4,with : [with]-pre-4; 5,a : [a]-det-5; 6,telescope : [telescope]-n-6; 7,in : [in]-pre-7; 8,the : [the]-det-8; 9,forest : [forest]-n-9; root : [root]-x-root.
Arcs as labelled in the figure: det4,0; det11,0; det42,0; sub33,20; obj6,20; vpp16,15; vpp27,5; npp14,10; npp29,5; npp26,5; pre12,10; pre24,10; root23,0.
The co-occurrence matrix over arcs 33, 4, 6, 14, 16, 11, 12, 29, 26, 27, 23, 24, 42 marks co-occurring pairs with ○. Its blank cells, annotated "Crossing" and "Single role" in the figure, are not recoverable from the extracted text.]
Coordination Scope Ambiguity
Input sentence: Earth and Moon or Jupiter and Ganymede.
Five well-formed dependency trees
[Figure: dependency forest. Nodes: 0,earth : [earth]-n-0; 1,and : [and]-and-1; 2,moon : [moon]-n-2; 3,or : [or]-or-3; 4,jupiter : [jupiter]-n-4; 5,and : [and]-and-5; 6,ganymede : [ganymede]-n-6; root : [root]-x-root.
Arcs as labelled in the figure: and12,10; and25,20; cnj2,0; or9,4; cnj6,0; or22,3; cnj14,0; and18,12; root26,0; and14,5.
The co-occurrence matrix over arcs 25, 12, 4, 2, 22, 9, 6, 18, 14, 26 marks co-occurring pairs with ○. Its blank cells, annotated "Crossing" and "Single role" in the figure, are not recoverable from the extracted text.]
Structural Interpretation Ambiguity and PP-attachment Ambiguity
Input sentence: My hobby is watching birds with telescope
Ten well-formed dependency trees
[Figure: dependency forest. Nodes: 0,my : [my]-det-0; 1,hobby : [hobby]-n-1; 2,is : [is]-be-2; 3,watching : [watching]-ving-3; 4,birds : [birds]-n-4; 5,with : [with]-pre-5; 6,telescope : [telescope]-n-6; root : [root]-x-root.
Arcs as labelled in the figure: det1,0; sub35,1; sub38,10; prg2,10; adj4,12; dsc33,8; dsc36,10; obj6,15; sub5,5; npp23,5; npp27,3; vpp24,7; pre22,0; root44,0; root41,0.
The co-occurrence matrix over arcs 1, 38, 35, 2, 4, 33, 36, 6, 5, 23, 27, 24, 22, 44, 41 marks co-occurring pairs with ○; its cell positions are not recoverable from the extracted text.]
N to 1 Correspondence from PSTs to One DT (1)
Spurious ambiguity (Eisner96), (Noro05)
(R17) vp/VP → vp/VP,pp/PP : [arc(vpp,PP,VP)] % PP-attachment
(R18) vp/VP → adv/ADV,vp/VP : [arc(adv,ADV,VP)] % Adverb modification
[Figure: two phrase structure trees for "She curiously saw a cat in the forest", obtained by applying the rules in the order R17 → R18 and R18 → R17, map to one dependency tree in which "curiously" (adv) and "in the forest" (vpp) both depend on "saw".]
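The N to 1 correspondence can be checked on the two derivations directly. The sketch below encodes only R17 and R18, uses "saw" and "in" as stand-ins for the heads of "saw a cat" and "in the forest", and relies on a hypothetical derivation format.

```python
# rule name -> (arc name, index of the dependent child, index of the head child)
RULES = {
    "R17": ("vpp", 1, 0),   # vp/VP → vp/VP, pp/PP : [arc(vpp,PP,VP)]
    "R18": ("adv", 0, 1),   # vp/VP → adv/ADV, vp/VP : [arc(adv,ADV,VP)]
}

def derive(node):
    """Return (head word, set of arcs) for a derivation node.

    A node is either a head word (str) or (rule name, left child, right child).
    """
    if isinstance(node, str):
        return node, frozenset()
    rule, *children = node
    arc_name, dep_i, head_i = RULES[rule]
    results = [derive(child) for child in children]
    head, dep = results[head_i][0], results[dep_i][0]
    arcs = results[0][1] | results[1][1] | {(arc_name, dep, head)}
    return head, arcs

# Two phrase structure trees for "curiously saw a cat in the forest".
pst_1 = ("R18", "curiously", ("R17", "saw", "in"))   # R17 applied first, then R18
pst_2 = ("R17", ("R18", "curiously", "saw"), "in")   # R18 applied first, then R17

print(derive(pst_1))
print(derive(pst_2))
```

Both derivations produce the same head and the same two arcs, so the difference between the two phrase structure trees leaves no trace in the dependency tree.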
N to 1 Correspondence from PSTs to One DT (2)
Modification scope problem (Mel'čuk88)
A dependency structure is ambiguous in modification scope when a head word has dependents on both its right-hand side and its left-hand side.
ex. Earth and Jupiter in Solar System.
[Figure: two phrase structure trees over the same constituents (np "Earth", cnj "and", np "Jupiter", pp "in Solar System") that differ in where the pp attaches map to a single dependency tree. Nodes: 0,Earth; 1,and; 2,Jupiter; 3,in; 4,Solar System; root. Arcs as labelled in the figure: and4,20; npp8,0; pre7,0; cnj2,0; root12,0. In the co-occurrence matrix over arcs 4, 2, 8, 7, 12 every pair of distinct arcs co-occurs.]
・ Introduction of "Grouping" (coordination and operator words, e.g. not, only) [Mel'čuk88]
・ Japanese has no modification scope problem because it has no right-to-left dependency.
Generation of Non-projective Dependency Tree
Grammar rule for non-projective dependency tree
(R19) vp/V → v/V,np/NP,adv/ADV,relc/RELP : [arc(obj,NP,V),arc(adv,ADV,V),arc(rel,RELP,NP)]
Input sentence: She saw the cat curiously which was Persian*1
*1: Artificial example for showing the rule applicability
[Figure: dependency forest. Nodes: 0,She; 1,saw; 2,the; 3,cat; 4,curiously; 5,which was Persian; root. Arcs as labelled in the figure: det4,0; sub12,20; obj6,20; adv10,15; rel11,10; root14,0. In the co-occurrence matrix over arcs 12, 4, 6, 10, 11, 14 every pair of distinct arcs co-occurs.]
Conclusion
Dependency forest is a packed shared data structure
- Bridge between phrase structure and dependency structure; usable for the Multilevel Packed Shared Data Connection Model of PDG
- High flexibility in describing constraints
Future work
Extension of the framework for the modification scope problem (Grouping)
Real-world system implementation