Learning for Semantic Parsing Using Statistical Syntactic Parsing Techniques
Ruifang Ge, Ph.D. Final Defense
Supervisor: Raymond J. Mooney
Machine Learning Group, Department of Computer Science, The University of Texas at Austin
Semantic Parsing
Semantic parsing: transforming natural language (NL) sentences into completely formal meaning representations (MRs).
Sample application domains where MRs are directly executable by another computer system to perform some task:
- CLang: RoboCup Coach Language
- Geoquery: a database query application
CLang (RoboCup Coach Language)
In the RoboCup Coach competition, teams compete to coach simulated soccer players. The coaching instructions are given in a formal language called CLang.
Coach (NL): "If our player 2 has the ball, then position our player 5 in the midfield."
CLang MR: ((bowner (player our {2})) (do (player our {5}) (pos (midfield))))
Semantic parsing maps the NL instruction to the CLang MR.
GeoQuery: A Database Query Application
Query application for a U.S. geography database [Zelle & Mooney, 1996].
User (NL): "What are the rivers in Texas?"
Query MR: answer(x1, (river(x1), loc(x1,x2), equal(x2,stateid(texas))))
Database answer: Angelina, Blanco, …
Motivation for Semantic Parsing
Theoretically, it answers the question of how people interpret language.
Practical applications:
- Question answering
- Natural language interfaces
- Knowledge acquisition
- Reasoning
Motivating Example
Semantic parsing is a compositional process. Sentence structures are needed for building meaning representations.
NL: "If our player 2 has the ball, our player 4 should stay in our half."
MR: ((bowner (player our {2})) (do (player our {4}) (pos (half our))))
(bowner: ball owner; pos: position)
Syntax-Based Approaches
Meaning composition follows the tree structure of a syntactic parse: the meaning of a constituent is composed from the meanings of its sub-constituents.
- Hand-built approaches (Woods, 1970; Warren & Pereira, 1982)
- Learned approaches: Miller et al. (1996), for conceptually simple sentences; Zettlemoyer & Collins (2005), using hand-built Combinatory Categorial Grammar (CCG) template rules
Example
MR: bowner(player(our,2))
Use the structure of a syntactic parse:
(S (NP (PRP$ our) (NN player) (CD 2))
   (VP (VB has) (NP (DT the) (NN ball))))
Example: assign semantic concepts to words
(S (NP (PRP$-our our) (NN-player(_,_) player) (CD-2 2))
   (VP (VB-bowner(_) has) (NP (DT-null the) (NN-null ball))))
Example: compose meaning for the internal nodes
(S (NP-player(our,2) (PRP$-our our) (NN-player(_,_) player) (CD-2 2))
   (VP (VB-bowner(_) has) (NP (DT-null the) (NN-null ball))))
Example: compose meaning for the internal nodes (continued)
(S (NP-player(our,2) (PRP$-our our) (NN-player(_,_) player) (CD-2 2))
   (VP-bowner(_) (VB-bowner(_) has) (NP-null (DT-null the) (NN-null ball))))
Example: compose meaning for the internal nodes (completed)
(S-bowner(player(our,2))
   (NP-player(our,2) (PRP$-our our) (NN-player(_,_) player) (CD-2 2))
   (VP-bowner(_) (VB-bowner(_) has) (NP-null (DT-null the) (NN-null ball))))
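To make the walkthrough concrete, here is a minimal runnable sketch of this bottom-up composition. The Node class, the tuple encoding of predicates with "_" for unfilled slots, and the left-to-right slot filling are illustrative assumptions, not the thesis implementation.

```python
# Sketch: bottom-up meaning composition over a syntactic parse.
# A predicate is a tuple like ("player", "_", "_") with "_" for open slots.

class Node:
    def __init__(self, label, sem=None, children=()):
        self.label = label            # syntactic label, e.g. "NP"
        self.sem = sem                # word meaning at leaves; None = null
        self.children = list(children)

def compose(node):
    """Fill the open slots of the head child's predicate with the
    meanings of its siblings, left to right."""
    if not node.children:
        return node.sem
    sems = [compose(c) for c in node.children]
    # head = the child whose predicate still has unfilled slots
    head = next((s for s in sems if isinstance(s, tuple) and "_" in s), None)
    if head is None:                  # e.g. "the ball": no predicate below
        return next((s for s in sems if s is not None), None)
    args = [s for s in sems if s is not head and s is not None]
    pred, slots = head[0], list(head[1:])
    for i, slot in enumerate(slots):
        if slot == "_" and args:
            slots[i] = args.pop(0)
    return (pred,) + tuple(slots)

np = Node("NP", children=[Node("PRP$", "our"),
                          Node("NN", ("player", "_", "_")),
                          Node("CD", 2)])
vp = Node("VP", children=[Node("VB", ("bowner", "_")),
                          Node("NP", None)])
print(compose(Node("S", children=[np, vp])))
# -> ('bowner', ('player', 'our', 2))
```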
Semantic Grammars
Non-terminals in a semantic grammar correspond to semantic concepts in application domains.
- Hand-built approaches (Hendrix et al., 1978)
- Learned approaches: Tang & Mooney (2001), Kate & Mooney (2006), Wong & Mooney (2006)
Example
MR: bowner(player(our,2))
Semantic-grammar parse, using the rule bowner → player "has the ball":
(bowner (player (our our) player (2 2)) has the ball)
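For contrast, a semantic grammar can be executed directly, since its non-terminals are domain concepts rather than syntactic categories. Below is a minimal sketch with a hypothetical four-rule grammar mirroring the slide; the recursive-descent matcher is illustrative only.

```python
# Sketch: a tiny semantic grammar; non-terminals are domain concepts.
GRAMMAR = {
    "BOWNER": [["PLAYER", "has", "the", "ball"]],
    "PLAYER": [["TEAM", "player", "UNUM"]],
    "TEAM":   [["our"]],
    "UNUM":   [["2"]],
}

def derive(symbol, tokens):
    """Return (tree, remaining_tokens) if `symbol` derives a prefix of
    `tokens`, else None. Terminals must match words exactly."""
    if symbol not in GRAMMAR:
        return (symbol, tokens[1:]) if tokens and tokens[0] == symbol else None
    for rhs in GRAMMAR[symbol]:
        rest, kids = tokens, []
        for sym in rhs:
            step = derive(sym, rest)
            if step is None:
                break
            sub, rest = step
            if sym in GRAMMAR:        # keep only concept subtrees
                kids.append(sub)
        else:
            return (symbol.lower(), kids or list(rhs)), rest
    return None

print(derive("BOWNER", "our player 2 has the ball".split()))
# (('bowner', [('player', [('team', ['our']), ('unum', ['2'])])]), [])
```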
Thesis Contributions
Introduce two novel syntax-based approaches to semantic parsing that are:
- Theoretically well-founded in computational semantics (Blackburn & Bos, 2005)
- A great opportunity to leverage the significant progress made in statistical syntactic parsing (Collins, 1997; Charniak & Johnson, 2005; Huang, 2008)
Thesis Contributions
- SCISSOR: a novel integrated syntactic-semantic parser
- SYNSEM: exploits an existing syntactic parser to produce disambiguated parse trees that drive compositional meaning composition
- An investigation of when knowledge of syntax can help
Representing Semantic Knowledge in a Meaning Representation Language Grammar (MRLG)

Production                       Predicate
CONDITION → (bowner PLAYER)      P_BOWNER
PLAYER → (player TEAM {UNUM})    P_PLAYER
UNUM → 2                         P_UNUM
TEAM → our                       P_OUR

Assumes the meaning representation language (MRL) is defined by an unambiguous context-free grammar. Each production rule introduces a single predicate into the MRL, and the parse of an MR gives its predicate-argument structure.
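A small sketch of putting an MRLG to work: a generic s-expression reader parses the MR, and the production-to-predicate table from the slide recovers its predicate-argument structure. The reader and the output encoding are illustrative assumptions.

```python
# Sketch: map a CLang MR to its predicate-argument structure.
PREDICATE = {"bowner": "P_BOWNER", "player": "P_PLAYER",
             "our": "P_OUR", "2": "P_UNUM"}

def tokenize(mr):
    return (mr.replace("(", " ( ").replace(")", " ) ")
              .replace("{", " ").replace("}", " ").split())

def read(tokens):
    """Parse one s-expression from `tokens` into nested lists."""
    tok = tokens.pop(0)
    if tok == "(":
        node = []
        while tokens[0] != ")":
            node.append(read(tokens))
        tokens.pop(0)                 # drop the closing ")"
        return node
    return tok

def predicates(tree):
    """One predicate per production used in the MR parse."""
    if isinstance(tree, list):
        head, args = tree[0], tree[1:]
        return (PREDICATE[head], [predicates(a) for a in args])
    return (PREDICATE[tree], [])

mr = "(bowner (player our {2}))"
print(predicates(read(tokenize(mr))))
# ('P_BOWNER', [('P_PLAYER', [('P_OUR', []), ('P_UNUM', [])])])
```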
SCISSOR: Semantic Composition that Integrates Syntax and Semantics to get Optimal Representations
- Integrated syntactic-semantic parsing: allows syntax and semantics to be used simultaneously to obtain an accurate combined syntactic-semantic analysis
- A statistical parser is used to generate a semantically augmented parse tree (SAPT)
SAPT
Non-terminals now have both syntactic and semantic labels; a node's semantic label dominates the predicates in its subtree.
(S-P_BOWNER
   (NP-P_PLAYER (PRP$-P_OUR our) (NN-P_PLAYER player) (CD-P_UNUM 2))
   (VP-P_BOWNER (VB-P_BOWNER has) (NP-NULL (DT-NULL the) (NN-NULL ball))))
Reading the MR off this SAPT gives: P_BOWNER(P_PLAYER(P_OUR,P_UNUM))
Extending Collins' (1997) Syntactic Parsing Model
- Find the SAPT with the maximum probability
- A lexicalized head-driven syntactic parsing model
- Extend the parsing model to generate semantic labels simultaneously with syntactic labels
Why Extend Collins' (1997) Syntactic Parsing Model?
It is suitable for incorporating semantic knowledge:
- Head dependencies model predicate-argument relations
- Syntactic subcategorization models the set of arguments that a predicate appears with
The Bikel (2004) implementation is easily extendable.
Parser Implementation
- Supervised training on annotated SAPTs is just frequency counting (see the sketch below)
- Testing uses a variant of the standard CKY chart-parsing algorithm
- Details are in the thesis
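A sketch of the "frequency counting" half of training, assuming SAPTs are encoded as nested tuples (label, child, child, ...); relative-frequency rule probabilities fall out of a single counting pass. This is illustrative, not the thesis code.

```python
from collections import Counter, defaultdict

def estimate_rule_probs(sapts):
    """Relative-frequency estimates P(rhs | lhs) from annotated trees.
    Each tree is a nested tuple: (label, child, child, ...)."""
    counts = defaultdict(Counter)
    def visit(tree):
        if isinstance(tree, tuple):
            label, kids = tree[0], tree[1:]
            rhs = tuple(k[0] if isinstance(k, tuple) else k for k in kids)
            counts[label][rhs] += 1
            for k in kids:
                visit(k)
    for t in sapts:
        visit(t)
    return {lhs: {rhs: n / sum(c.values()) for rhs, n in c.items()}
            for lhs, c in counts.items()}

tree = ("S-P_BOWNER",
        ("NP-P_PLAYER", "our", "player", "2"),
        ("VP-P_BOWNER", "has", ("NP-NULL", "the", "ball")))
print(estimate_rule_probs([tree])["S-P_BOWNER"])
# {('NP-P_PLAYER', 'VP-P_BOWNER'): 1.0}
```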
Smoothing
Each label in a SAPT is the combination of a syntactic label and a semantic label, which increases data sparsity. The parameters are therefore broken down:
P_h(H | P, w) = P_h(H_syn, H_sem | P, w) = P_h(H_syn | P, w) × P_h(H_sem | P, w, H_syn)
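A toy sketch of this factored estimate, assuming head-generation events are available as (parent, head word, H_syn, H_sem) tuples; both factors are estimated here by simple relative frequency, whereas the real parser also interpolates with back-off levels.

```python
from collections import Counter

# Toy head-generation events: (parent, head_word, H_syn, H_sem)
events = [("S", "has", "VP", "P_BOWNER"),
          ("S", "has", "VP", "P_BOWNER"),
          ("S", "has", "VP", "NULL")]

c_syn    = Counter(((p, w), syn) for p, w, syn, _ in events)
c_ctx    = Counter((p, w) for p, w, _, _ in events)
c_sem    = Counter(((p, w, syn), sem) for p, w, syn, sem in events)
c_synctx = Counter((p, w, syn) for p, w, syn, _ in events)

def p_head(syn, sem, parent, word):
    """P_h(H_syn, H_sem | P, w) = P_h(H_syn | P, w) * P_h(H_sem | P, w, H_syn)."""
    p1 = c_syn[((parent, word), syn)] / c_ctx[(parent, word)]
    p2 = c_sem[((parent, word, syn), sem)] / c_synctx[(parent, word, syn)]
    return p1 * p2

print(p_head("VP", "P_BOWNER", "S", "has"))   # 1.0 * (2/3) ≈ 0.667
```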
Experimental Corpora
- CLang (Kate, Wong & Mooney, 2005): 300 pieces of coaching advice, 22.52 words per sentence
- Geoquery (Zelle & Mooney, 1996): 880 queries on a geography database, 7.48 words per sentence; MRLs: Prolog and FunQL
Prolog vs. FunQL (Wong, 2007)
NL: "What are the rivers in Texas?"
Prolog: answer(x1, (river(x1), loc(x1,x2), equal(x2,stateid(texas)))) (x1: the rivers; x2: Texas)
FunQL: answer(river(loc_2(stateid(texas))))
Logical forms are widely used as MRLs in computational semantics and support reasoning.
Prolog vs. FunQL (Wong, 2007)
Prolog subgoals may appear in flexible order; FunQL imposes a strict order. This gives better generalization on Prolog.
Experimental Methodology
Standard 10-fold cross validation.
Correctness:
- CLang: the output exactly matches the correct MR
- Geoquery: the output retrieves the same answers as the correct MR
Metrics (a computation sketch follows below):
- Precision: % of the returned MRs that are correct
- Recall: % of NLs with their MRs correctly returned
- F-measure: harmonic mean of precision and recall
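A minimal sketch of computing these metrics, assuming each test NL is summarized by a pair (returned MR or None, is_correct):

```python
def evaluate(results):
    """results: list of (mr_returned_or_None, is_correct) per test NL."""
    returned = [ok for mr, ok in results if mr is not None]
    precision = sum(returned) / len(returned)          # correct / returned
    recall = sum(ok for _, ok in results) / len(results)  # correct / all NLs
    f = 2 * precision * recall / (precision + recall)  # harmonic mean
    return precision, recall, f

# 4 sentences: 3 MRs returned, 2 of them correct
print(evaluate([("mr1", True), ("mr2", True), ("mr3", False), (None, False)]))
# (0.666..., 0.5, 0.571...)
```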
Compared Systems
- COCKTAIL (Tang & Mooney, 2001): deterministic, inductive logic programming
- WASP (Wong & Mooney, 2006): semantic grammar, machine translation; λ-WASP extends it to handle logical forms
- KRISP (Kate & Mooney, 2006): semantic grammar, string kernels
- Z&C (Zettlemoyer & Collins, 2007): syntax-based, combinatory categorial grammar (CCG); uses a hand-built lexicon for Geoquery and manual CCG template rules
- LU (Lu et al., 2008): semantic grammar, generative parsing model
Results on CLang

System    Precision  Recall  F-measure
SCISSOR   89.5       73.7    80.8
WASP      88.9       61.9    73.0
KRISP     85.2       61.9    71.7
LU        82.4       57.7    67.8
COCKTAIL  (memory overflow)
Z&C       (not reported)

(LU: F-measure after reranking is 74.4%)
Results on Geoquery

System    Precision  Recall  F-measure  MRL
SCISSOR   92.1       72.3    81.0       FunQL
WASP      87.2       74.8    80.5       FunQL
KRISP     93.3       71.7    81.1       FunQL
LU        86.2       81.8    84.0       FunQL
COCKTAIL  89.9       79.4    84.3       Prolog
λ-WASP    92.0       86.6    89.2       Prolog
Z&C       95.5       83.2    88.9       Prolog

(LU: F-measure after reranking is 85.2%)
Results on Geoquery (FunQL)

System    Precision  Recall  F-measure
SCISSOR   92.1       72.3    81.0
WASP      87.2       74.8    80.5
KRISP     93.3       71.7    81.1
LU        86.2       81.8    84.0

(LU: F-measure after reranking is 85.2%)
SCISSOR is competitive with the other systems.
Why Knowledge of Syntax Does Not Help Here
- Geoquery sentences are short (7.48 words per sentence), so sentence structure can be feasibly learned from NLs paired with MRs
- The gain from knowledge of syntax is weighed against the loss of flexibility
Limitation of Using Prior Knowledge of Syntax
MR: answer(smallest(state(all)))
Traditional syntactic analysis brackets the sentence as [What state] [is the smallest], which is non-isomorphic with the MR.
A semantic grammar instead brackets it as [What [state is the smallest]], giving a syntactic structure isomorphic with the MR, and hence better generalization.
Why Prior Knowledge of Syntax Does Not Help Here
- Geoquery sentences are short (7.48 words per sentence), so sentence structure can be feasibly learned from NLs paired with MRs
- The gain from knowledge of syntax is weighed against the loss of flexibility
- LU vs. WASP and KRISP: LU uses a decomposed model for the semantic grammar
Detailed CLang Results on Sentence Length
[Chart: F-measure by sentence length; bins 0-10 words (7% of sentences), 11-20 (33%), 21-30 (46%), 31-40 (13%)]
SCISSOR Summary
- An integrated syntactic-semantic parsing approach
- Learns accurate semantic interpretations by utilizing the SAPT annotations
- Knowledge of syntax improves performance on long sentences
SYNSEM Motivation
- SCISSOR requires extra SAPT annotation for training, and must learn both syntax and semantics from the same limited training corpus
- High-performance syntactic parsers trained on existing large corpora are already available (Collins, 1997; Charniak & Johnson, 2005)
SCISSOR Requires SAPT Annotation
(S-P_BOWNER
   (NP-P_PLAYER (PRP$-P_OUR our) (NN-P_PLAYER player) (CD-P_UNUM 2))
   (VP-P_BOWNER (VB-P_BOWNER has) (NP-NULL (DT-NULL the) (NN-NULL ball))))
Annotating SAPTs is time consuming. Automate it!
Part I: Syntactic Parse
Use a statistical syntactic parser:
(S (NP (PRP$ our) (NN player) (CD 2))
   (VP (VB has) (NP (DT the) (NN ball))))
Part II: Word Meanings
Use a word alignment model (Wong & Mooney, 2006):
our → P_OUR, player → P_PLAYER, 2 → P_UNUM, has → P_BOWNER, the → NULL, ball → NULL
Learning a Semantic Lexicon
- IBM Model 5 word alignment (GIZA++)
- Keep the top 5 word/predicate alignments for each training example (see the sketch below)
- Assume each word alignment and syntactic parse defines a possible SAPT for composing the correct MR
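A sketch of the "top 5 alignments" step. The brute-force enumeration and the toy lexical score below stand in for the IBM Model 5 probabilities that GIZA++ would supply; only the top-k selection mirrors the slide.

```python
import heapq
from itertools import permutations

def top_alignments(words, preds, score, k=5):
    """Keep the k best one-to-one word/predicate alignments.
    An alignment maps predicate i to word position a[i]."""
    cands = permutations(range(len(words)), len(preds))
    return heapq.nlargest(k, cands, key=lambda a: score(words, preds, a))

# Stand-in lexical table t(word | predicate); unknown pairs get a floor.
T = {("P_OUR", "our"): .9, ("P_PLAYER", "player"): .8, ("P_UNUM", "2"): .7}
def score(words, preds, a):
    s = 1.0
    for i, j in enumerate(a):
        s *= T.get((preds[i], words[j]), 0.01)
    return s

words = "our player 2 has the ball".split()
preds = ["P_OUR", "P_PLAYER", "P_UNUM"]
for a in top_alignments(words, preds, score, k=2):
    print([(preds[i], words[j]) for i, j in enumerate(a)])
# best: [('P_OUR', 'our'), ('P_PLAYER', 'player'), ('P_UNUM', '2')]
```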
Introducing λ Variables
λ variables in semantic labels mark missing arguments (a1: the first argument):
(S (NP (P_OUR our) (NP (λa1λa2P_PLAYER player) (P_UNUM 2)))
   (VP (λa1P_BOWNER has) (NP (NULL the) (NULL ball))))
Part III: Internal Semantic Labels
How do we choose the dominant predicate at each internal node?
The MR parse gives the target predicate-argument structure: P_BOWNER(P_PLAYER(P_OUR, P_UNUM)).
Learning Semantic Composition Rules
At the node covering "player 2", combine λa1λa2P_PLAYER and P_UNUM:
λa1λa2P_PLAYER + P_UNUM → {λa1P_PLAYER, a2 = c2}   (c2: child 2)
Learning Semantic Composition Rules
Applying λa1λa2P_PLAYER + P_UNUM → {λa1P_PLAYER, a2 = c2} labels the node covering "player 2" with λa1P_PLAYER.
Learning Semantic Composition Rules
At the NP covering "our player 2", combine P_OUR and λa1P_PLAYER:
P_OUR + λa1P_PLAYER → {P_PLAYER, a1 = c1}
Learning Semantic Composition Rules
At the VP covering "has the ball", λa1P_BOWNER combines with the NULL-labeled NP, so the VP is labeled λa1P_BOWNER.
Learning Semantic Composition Rules
Finally, at the S node, combine P_PLAYER and λa1P_BOWNER:
P_PLAYER + λa1P_BOWNER → {P_BOWNER, a1 = c1}
This completes the MR, P_BOWNER(P_PLAYER(P_OUR, P_UNUM)); a sketch of rule application follows below.
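A compact sketch of what an acquired rule does at parse time. A semantic label is encoded here as (predicate, open λ-slots, filled arguments); this encoding and the apply_rule helper are illustrative assumptions rather than the thesis data structures.

```python
def apply_rule(head, arg, slot):
    """Compose two children under a rule {result, a<slot> = child}:
    `head` is the λ-bearing meaning, `arg` fills its argument `slot`."""
    pred, slots, filled = head
    assert slot in slots, "rule only applies if the slot is still open"
    return (pred, [s for s in slots if s != slot], {**filled, slot: arg})

# λa1λa2P_PLAYER + P_UNUM -> {λa1P_PLAYER, a2=c2}
player = ("P_PLAYER", ["a1", "a2"], {})
step1 = apply_rule(player, ("P_UNUM", [], {}), "a2")
# P_OUR + λa1P_PLAYER -> {P_PLAYER, a1=c1}
step2 = apply_rule(step1, ("P_OUR", [], {}), "a1")
print(step2)
# ('P_PLAYER', [], {'a2': ('P_UNUM', [], {}), 'a1': ('P_OUR', [], {})})
```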
Ensuring Meaning Composition
MR: answer(smallest(state(all)))
A traditional syntactic parse of "What state is the smallest" is non-isomorphic with this MR.
Ensuring Meaning Composition
Non-isomorphism between the NL parse and the MR parse arises from various linguistic phenomena, from the machine translation between NL and MRL, and from the use of automated syntactic parses.
Solution: introduce macro-predicates that combine multiple predicates, ensuring that the MR can be composed using a syntactic parse and word alignment.
SYNSEM Overview
Before training and testing: a syntactic parser maps each training/test sentence S to a syntactic parse tree T.
Training: given the training set {(S, T, MR)} and the unambiguous CFG of the MRL, semantic knowledge acquisition produces a semantic lexicon and composition rules; parameter estimation then produces a probabilistic parsing model.
Testing: semantic parsing maps an input sentence S, with its parse T, to an output MR.
Parameter Estimation
- Apply the learned semantic knowledge to all training examples to generate possible SAPTs
- Use a standard maximum-entropy model similar to those of Zettlemoyer & Collins (2005) and Wong & Mooney (2006)
- Training finds parameters that (approximately) maximize the sum of the conditional log-likelihood of the training set, including syntactic parses
- This is an incomplete-data problem, since the SAPTs are hidden variables
Features
- Lexical unigram features: the number of times a word is assigned a given predicate
- Lexical bigram features: the number of times a word is assigned a given predicate, conditioned on its previous/subsequent word
- Rule features: the number of times a composition rule is applied in a derivation
A feature-extraction sketch follows below.
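A sketch of the feature extractor these bullets describe, assuming a derivation is summarized by its word/predicate assignments and the composition rules it applied (a hypothetical representation):

```python
from collections import Counter

def extract_features(words, preds, rules):
    """Feature counts for one derivation. preds[i] is the predicate
    assigned to words[i] ('NULL' for no meaning)."""
    f = Counter()
    for i, (w, p) in enumerate(zip(words, preds)):
        f[("uni", w, p)] += 1                          # unigram: w assigned p
        prev = words[i - 1] if i > 0 else "<s>"
        nxt = words[i + 1] if i + 1 < len(words) else "</s>"
        f[("bi_prev", prev, w, p)] += 1                # ...given previous word
        f[("bi_next", nxt, w, p)] += 1                 # ...given next word
    for r in rules:
        f[("rule", r)] += 1                            # composition rule used
    return f

feats = extract_features("our player 2".split(),
                         ["P_OUR", "P_PLAYER", "P_UNUM"],
                         ["PLAYER+UNUM", "OUR+PLAYER"])
print(feats[("uni", "player", "P_PLAYER")])            # 1
```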
Handling Logical Forms
Prolog example: "What are the rivers in Texas?"
MR: answer(x1, (river(x1), loc(x1,x2), equal(x2,stateid(texas))))
To handle shared logical variables, use lambda calculus (v: variable):
λv1P_ANSWER(x1)
(λv1P_RIVER(x1), λv1λv2P_LOC(x1,x2), λv1P_EQUAL(x2))
Prolog Example
MR: answer(x1, (river(x1), loc(x1,x2), equal(x2,stateid(texas))))
Start from a syntactic parse:
(SBARQ (WHNP What)
   (SQ (VBP are) (NP (NP the rivers) (PP (IN in) (NP Texas)))))
Prolog Example
Add predicates to words:
What → λv1λa1P_ANSWER, are → NULL, rivers → λv1P_RIVER, in → λv1λv2P_LOC, Texas → λv1P_EQUAL
Prolog Example
Learn a rule with variable unification at the PP node:
λv1λv2P_LOC(x1,x2) + λv1P_EQUAL(x2) → λv1P_LOC
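A sketch of the variable unification behind this rule: the two λ-terms are conjoined on the shared logical variable x2, leaving x1 as the one variable still bound by λ. The term encoding is an illustrative assumption.

```python
def unify_on(shared, f, g):
    """Conjoin two λ-terms by unifying `shared` (e.g. 'x2'): the result
    keeps f's remaining λ-bound variable and closes off `shared`."""
    pred_f, vars_f = f
    pred_g, vars_g = g
    assert shared in vars_f and shared in vars_g, "no shared variable to unify"
    remaining = [v for v in vars_f if v != shared]
    return ("(" + pred_f + " & " + pred_g + ")", remaining)

loc = ("loc(x1,x2)", ["x1", "x2"])                 # λv1λv2 P_LOC
equal = ("equal(x2,stateid(texas))", ["x2"])       # λv1 P_EQUAL
print(unify_on("x2", loc, equal))
# ('(loc(x1,x2) & equal(x2,stateid(texas)))', ['x1'])  i.e. λv1 P_LOC
```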
Syntactic Parsers (Bikel, 2004)
- Trained on WSJ only: CLang (SYN0) F-measure = 82.15%; Geoquery (SYN0) F-measure = 76.44%
- Trained on WSJ plus in-domain sentences: CLang (SYN20, 20 sentences) F-measure = 88.21%; Geoquery (SYN40, 40 sentences) F-measure = 91.46%
- Gold-standard syntactic parses (GOLDSYN)
Questions
Q1. Can SYNSEM produce accurate semantic interpretations?
Q2. Can more accurate treebank syntactic parsers produce more accurate semantic parsers?
Q3. Does it also improve on long sentences?
Q4. Does it improve on limited training data, thanks to the prior knowledge from large treebanks?
Q5. Can it handle syntactic errors?
Results on CLang

System    Precision  Recall  F-measure
GOLDSYN   84.7       74.0    79.0   (SYNSEM)
SYN20     85.4       70.0    76.9   (SYNSEM)
SYN0      87.0       67.0    75.7   (SYNSEM)
SCISSOR   89.5       73.7    80.8   (SAPTs)
WASP      88.9       61.9    73.0
KRISP     85.2       61.9    71.7
LU        82.4       57.7    67.8

(LU: F-measure after reranking is 74.4%)
GOLDSYN > SYN20 > SYN0
Questions
Q1. Can SYNSEM produce accurate semantic interpretations? [yes]
Q2. Can more accurate treebank syntactic parsers produce more accurate semantic parsers? [yes]
Q3. Does it also improve on long sentences?
Detailed CLang Results on Sentence Length
[Chart: F-measure by sentence length; bins 0-10 words (7%), 11-20 (33%), 21-30 (46%), 31-40 (13%)]
Prior knowledge + flexibility + syntactic errors = ?
Questions
Q1. Can SYNSEM produce accurate semantic interpretations? [yes]
Q2. Can more accurate treebank syntactic parsers produce more accurate semantic parsers? [yes]
Q3. Does it also improve on long sentences? [yes]
Q4. Does it improve on limited training data, thanks to the prior knowledge from large treebanks?
Results on CLang (training size = 40)

System    Precision  Recall  F-measure
GOLDSYN   61.1       35.7    45.1   (SYNSEM)
SYN20     57.8       31.0    40.4   (SYNSEM)
SYN0      53.5       22.7    31.9   (SYNSEM)
SCISSOR   85.0       23.0    36.2   (SAPTs)
WASP      88.0       14.4    24.7
KRISP     68.35      20.0    31.0

The quality of the syntactic parser is critically important!
Questions
Q1. Can SYNSEM produce accurate semantic interpretations? [yes]
Q2. Can more accurate treebank syntactic parsers produce more accurate semantic parsers? [yes]
Q3. Does it also improve on long sentences? [yes]
Q4. Does it improve on limited training data, thanks to the prior knowledge from large treebanks? [yes]
Q5. Can it handle syntactic errors?
Handling Syntactic Errors
- Training ensures meaning composition even from syntactic parses containing errors
- For test NLs that generate correct MRs, the F-measures of their syntactic parses are: SYN0: 85.5%; SYN20: 91.2%
- Example: "If DR2C7 is true then players 2, 3, 7 and 8 should pass to player 4"
Questions
Q1. Can SYNSEM produce accurate semantic interpretations? [yes]
Q2. Can more accurate treebank syntactic parsers produce more accurate semantic parsers? [yes]
Q3. Does it also improve on long sentences? [yes]
Q4. Does it improve on limited training data, thanks to the prior knowledge from large treebanks? [yes]
Q5. Is it robust to syntactic errors? [yes]
Results on Geoquery (Prolog)

System    Precision  Recall  F-measure
GOLDSYN   91.9       88.2    90.0   (SYNSEM)
SYN40     90.2       86.9    88.5   (SYNSEM)
SYN0      81.8       79.0    80.4   (SYNSEM)
COCKTAIL  89.9       79.4    84.3
λ-WASP    92.0       86.6    89.2
Z&C       95.5       83.2    88.9

SYN0 does not perform well; all other recent systems perform competitively.
SYNSEM Summary
- Exploits an existing syntactic parser to drive the meaning composition process
- Prior knowledge of syntax improves performance on long sentences
- Prior knowledge of syntax improves performance with limited training data
- Robust to syntactic errors
Discriminative Reranking for Semantic Parsing
- Adapts the global features used for reranking syntactic parses to semantic parsing
- Improvement on CLang
- No improvement on Geoquery, where sentences are short and global features are less likely to help
Future Work
Improve SCISSOR:
- Discriminative SCISSOR (Finkel et al., 2008)
- Handling logical forms
- SCISSOR without extra annotation (Klein & Manning, 2002, 2004)
Improve SYNSEM:
- Utilize syntactic parsers with improved accuracy and in other syntactic formalisms
Future Work
- Utilize wide-coverage semantic representations (Curran et al., 2007) for better generalization over syntactic variations
- Utilize semantic role labeling (Gildea & Palmer, 2002), which provides a layer of correlated semantic information
Conclusions
- SCISSOR: a novel integrated syntactic-semantic parser
- SYNSEM: exploits an existing syntactic parser to produce disambiguated parse trees that drive compositional meaning composition
- Both produce accurate semantic interpretations
- Using knowledge of syntax improves performance on long sentences
- SYNSEM also improves performance with limited training data