Learning for Semantic Parsing Using Statistical Syntactic Parsing Techniques
Ruifang Ge, Ph.D. Final Defense
Supervisor: Raymond J. Mooney
Machine Learning Group, Department of Computer Science, The University of Texas at Austin
Semantic Parsing
Semantic parsing: transforming natural language (NL) sentences into completely formal meaning representations (MRs).
Sample application domains where MRs are directly executable by another computer system to perform some task:
- CLang: RoboCup Coach Language
- Geoquery: a database query application
CLang (RoboCup Coach Language)
In the RoboCup Coach competition, teams compete to coach simulated soccer players. The coaching instructions are given in a formal language called CLang.
Coach (NL): "If our player 2 has the ball, then position our player 5 in the midfield."
CLang MR: ((bowner (player our {2})) (do (player our {5}) (pos (midfield))))
Semantic parsing maps the NL instruction to the CLang MR.
GeoQuery: A Database Query Application
Query application for a U.S. geography database [Zelle & Mooney, 1996].
User (NL): "What are the rivers in Texas?"
Query MR: answer(x1, (river(x1), loc(x1,x2), equal(x2,stateid(texas))))
Database answer: Angelina, Blanco, …
Motivation for Semantic Parsing
Theoretically, it answers the question of how people interpret language.
Practical applications:
- Question answering
- Natural language interfaces
- Knowledge acquisition
- Reasoning
Motivating Example
Semantic parsing is a compositional process. Sentence structures are needed for building meaning representations.
NL: "If our player 2 has the ball, our player 4 should stay in our half."
MR: ((bowner (player our {2})) (do (player our {4}) (pos (half our))))
(bowner: ball owner; pos: position)
Syntax-Based Approaches
Meaning composition follows the tree structure of a syntactic parse: the meaning of a constituent is composed from the meanings of its sub-constituents.
- Hand-built approaches (Woods, 1970; Warren & Pereira, 1982)
- Learned approaches: Miller et al. (1996), for conceptually simple sentences; Zettlemoyer & Collins (2005), using hand-built Combinatory Categorial Grammar (CCG) template rules
Example
MR: bowner(player(our,2))
Use the structure of a syntactic parse:
(S (NP (PRP$ our) (NN player) (CD 2))
   (VP (VB has) (NP (DT the) (NN ball))))
Example: assign semantic concepts to words
(S (NP (PRP$-our our) (NN-player(_,_) player) (CD-2 2))
   (VP (VB-bowner(_) has) (NP (DT-null the) (NN-null ball))))
Example: compose meaning for the internal nodes
(S (NP-player(our,2) (PRP$-our our) (NN-player(_,_) player) (CD-2 2))
   (VP (VB-bowner(_) has) (NP (DT-null the) (NN-null ball))))
Example: compose meaning for the internal nodes (continued)
(S (NP-player(our,2) (PRP$-our our) (NN-player(_,_) player) (CD-2 2))
   (VP-bowner(_) (VB-bowner(_) has) (NP-null (DT-null the) (NN-null ball))))
Example: compose meaning for the internal nodes (completed)
(S-bowner(player(our,2))
   (NP-player(our,2) (PRP$-our our) (NN-player(_,_) player) (CD-2 2))
   (VP-bowner(_) (VB-bowner(_) has) (NP-null (DT-null the) (NN-null ball))))
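To make the walkthrough concrete, here is a minimal runnable sketch of this bottom-up composition. The Node class, the tuple encoding of predicates with "_" for unfilled slots, and the left-to-right slot filling are illustrative assumptions, not the thesis implementation.

```python
# Sketch: bottom-up meaning composition over a syntactic parse.
# A predicate is a tuple like ("player", "_", "_") with "_" for open slots.

class Node:
    def __init__(self, label, sem=None, children=()):
        self.label = label            # syntactic label, e.g. "NP"
        self.sem = sem                # word meaning at leaves; None = null
        self.children = list(children)

def compose(node):
    """Fill the open slots of the head child's predicate with the
    meanings of its siblings, left to right."""
    if not node.children:
        return node.sem
    sems = [compose(c) for c in node.children]
    # head = the child whose predicate still has unfilled slots
    head = next((s for s in sems if isinstance(s, tuple) and "_" in s), None)
    if head is None:                  # e.g. "the ball": no predicate below
        return next((s for s in sems if s is not None), None)
    args = [s for s in sems if s is not head and s is not None]
    pred, slots = head[0], list(head[1:])
    for i, slot in enumerate(slots):
        if slot == "_" and args:
            slots[i] = args.pop(0)
    return (pred,) + tuple(slots)

np = Node("NP", children=[Node("PRP$", "our"),
                          Node("NN", ("player", "_", "_")),
                          Node("CD", 2)])
vp = Node("VP", children=[Node("VB", ("bowner", "_")),
                          Node("NP", None)])
print(compose(Node("S", children=[np, vp])))
# -> ('bowner', ('player', 'our', 2))
```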
Semantic Grammars
Non-terminals in a semantic grammar correspond to semantic concepts in application domains.
- Hand-built approaches (Hendrix et al., 1978)
- Learned approaches: Tang & Mooney (2001), Kate & Mooney (2006), Wong & Mooney (2006)
Example
MR: bowner(player(our,2))
Semantic-grammar parse, using the rule bowner → player "has the ball":
(bowner (player (our our) player (2 2)) has the ball)
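For contrast, a semantic grammar can be executed directly, since its non-terminals are domain concepts rather than syntactic categories. Below is a minimal sketch with a hypothetical four-rule grammar mirroring the slide; the recursive-descent matcher is illustrative only.

```python
# Sketch: a tiny semantic grammar; non-terminals are domain concepts.
GRAMMAR = {
    "BOWNER": [["PLAYER", "has", "the", "ball"]],
    "PLAYER": [["TEAM", "player", "UNUM"]],
    "TEAM":   [["our"]],
    "UNUM":   [["2"]],
}

def derive(symbol, tokens):
    """Return (tree, remaining_tokens) if `symbol` derives a prefix of
    `tokens`, else None. Terminals must match words exactly."""
    if symbol not in GRAMMAR:
        return (symbol, tokens[1:]) if tokens and tokens[0] == symbol else None
    for rhs in GRAMMAR[symbol]:
        rest, kids = tokens, []
        for sym in rhs:
            step = derive(sym, rest)
            if step is None:
                break
            sub, rest = step
            if sym in GRAMMAR:        # keep only concept subtrees
                kids.append(sub)
        else:
            return (symbol.lower(), kids or list(rhs)), rest
    return None

print(derive("BOWNER", "our player 2 has the ball".split()))
# (('bowner', [('player', [('team', ['our']), ('unum', ['2'])])]), [])
```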
Thesis Contributions
Introduce two novel syntax-based approaches to semantic parsing that are:
- Theoretically well-founded in computational semantics (Blackburn & Bos, 2005)
- A great opportunity to leverage the significant progress made in statistical syntactic parsing (Collins, 1997; Charniak & Johnson, 2005; Huang, 2008)
Thesis Contributions
- SCISSOR: a novel integrated syntactic-semantic parser
- SYNSEM: exploits an existing syntactic parser to produce disambiguated parse trees that drive compositional meaning composition
- An investigation of when knowledge of syntax can help
Representing Semantic Knowledge in a Meaning Representation Language Grammar (MRLG)

Production                       Predicate
CONDITION → (bowner PLAYER)      P_BOWNER
PLAYER → (player TEAM {UNUM})    P_PLAYER
UNUM → 2                         P_UNUM
TEAM → our                       P_OUR

Assumes the meaning representation language (MRL) is defined by an unambiguous context-free grammar. Each production rule introduces a single predicate into the MRL, and the parse of an MR gives its predicate-argument structure.
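A small sketch of putting an MRLG to work: a generic s-expression reader parses the MR, and the production-to-predicate table from the slide recovers its predicate-argument structure. The reader and the output encoding are illustrative assumptions.

```python
# Sketch: map a CLang MR to its predicate-argument structure.
PREDICATE = {"bowner": "P_BOWNER", "player": "P_PLAYER",
             "our": "P_OUR", "2": "P_UNUM"}

def tokenize(mr):
    return (mr.replace("(", " ( ").replace(")", " ) ")
              .replace("{", " ").replace("}", " ").split())

def read(tokens):
    """Parse one s-expression from `tokens` into nested lists."""
    tok = tokens.pop(0)
    if tok == "(":
        node = []
        while tokens[0] != ")":
            node.append(read(tokens))
        tokens.pop(0)                 # drop the closing ")"
        return node
    return tok

def predicates(tree):
    """One predicate per production used in the MR parse."""
    if isinstance(tree, list):
        head, args = tree[0], tree[1:]
        return (PREDICATE[head], [predicates(a) for a in args])
    return (PREDICATE[tree], [])

mr = "(bowner (player our {2}))"
print(predicates(read(tokenize(mr))))
# ('P_BOWNER', [('P_PLAYER', [('P_OUR', []), ('P_UNUM', [])])])
```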
SCISSOR: Semantic Composition that Integrates Syntax and Semantics to get Optimal Representations
- Integrated syntactic-semantic parsing: allows syntax and semantics to be used simultaneously to obtain an accurate combined syntactic-semantic analysis
- A statistical parser is used to generate a semantically augmented parse tree (SAPT)
SAPT
Non-terminals now have both syntactic and semantic labels; a node's semantic label dominates the predicates in its subtree.
(S-P_BOWNER
   (NP-P_PLAYER (PRP$-P_OUR our) (NN-P_PLAYER player) (CD-P_UNUM 2))
   (VP-P_BOWNER (VB-P_BOWNER has) (NP-NULL (DT-NULL the) (NN-NULL ball))))
Reading the MR off this SAPT gives: P_BOWNER(P_PLAYER(P_OUR,P_UNUM))
Extending Collins' (1997) Syntactic Parsing Model
- Find the SAPT with the maximum probability
- A lexicalized head-driven syntactic parsing model
- Extend the parsing model to generate semantic labels simultaneously with syntactic labels
Why Extend Collins' (1997) Syntactic Parsing Model?
It is suitable for incorporating semantic knowledge:
- Head dependencies model predicate-argument relations
- Syntactic subcategorization models the set of arguments that a predicate appears with
The Bikel (2004) implementation is easily extendable.
Parser Implementation
- Supervised training on annotated SAPTs is just frequency counting (see the sketch below)
- Testing uses a variant of the standard CKY chart-parsing algorithm
- Details are in the thesis
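A sketch of the "frequency counting" half of training, assuming SAPTs are encoded as nested tuples (label, child, child, ...); relative-frequency rule probabilities fall out of a single counting pass. This is illustrative, not the thesis code.

```python
from collections import Counter, defaultdict

def estimate_rule_probs(sapts):
    """Relative-frequency estimates P(rhs | lhs) from annotated trees.
    Each tree is a nested tuple: (label, child, child, ...)."""
    counts = defaultdict(Counter)
    def visit(tree):
        if isinstance(tree, tuple):
            label, kids = tree[0], tree[1:]
            rhs = tuple(k[0] if isinstance(k, tuple) else k for k in kids)
            counts[label][rhs] += 1
            for k in kids:
                visit(k)
    for t in sapts:
        visit(t)
    return {lhs: {rhs: n / sum(c.values()) for rhs, n in c.items()}
            for lhs, c in counts.items()}

tree = ("S-P_BOWNER",
        ("NP-P_PLAYER", "our", "player", "2"),
        ("VP-P_BOWNER", "has", ("NP-NULL", "the", "ball")))
print(estimate_rule_probs([tree])["S-P_BOWNER"])
# {('NP-P_PLAYER', 'VP-P_BOWNER'): 1.0}
```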
Smoothing
Each label in a SAPT is the combination of a syntactic label and a semantic label, which increases data sparsity. The parameters are therefore broken down:
P_h(H | P, w) = P_h(H_syn, H_sem | P, w) = P_h(H_syn | P, w) × P_h(H_sem | P, w, H_syn)
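A toy sketch of this factored estimate, assuming head-generation events are available as (parent, head word, H_syn, H_sem) tuples; both factors are estimated here by simple relative frequency, whereas the real parser also interpolates with back-off levels.

```python
from collections import Counter

# Toy head-generation events: (parent, head_word, H_syn, H_sem)
events = [("S", "has", "VP", "P_BOWNER"),
          ("S", "has", "VP", "P_BOWNER"),
          ("S", "has", "VP", "NULL")]

c_syn    = Counter(((p, w), syn) for p, w, syn, _ in events)
c_ctx    = Counter((p, w) for p, w, _, _ in events)
c_sem    = Counter(((p, w, syn), sem) for p, w, syn, sem in events)
c_synctx = Counter((p, w, syn) for p, w, syn, _ in events)

def p_head(syn, sem, parent, word):
    """P_h(H_syn, H_sem | P, w) = P_h(H_syn | P, w) * P_h(H_sem | P, w, H_syn)."""
    p1 = c_syn[((parent, word), syn)] / c_ctx[(parent, word)]
    p2 = c_sem[((parent, word, syn), sem)] / c_synctx[(parent, word, syn)]
    return p1 * p2

print(p_head("VP", "P_BOWNER", "S", "has"))   # 1.0 * (2/3) ≈ 0.667
```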
Experimental Corpora
- CLang (Kate, Wong & Mooney, 2005): 300 pieces of coaching advice, 22.52 words per sentence
- Geoquery (Zelle & Mooney, 1996): 880 queries on a geography database, 7.48 words per sentence; MRLs: Prolog and FunQL
Prolog vs. FunQL (Wong, 2007)
NL: "What are the rivers in Texas?"
Prolog: answer(x1, (river(x1), loc(x1,x2), equal(x2,stateid(texas)))) (x1: the rivers; x2: Texas)
FunQL: answer(river(loc_2(stateid(texas))))
Logical forms are widely used as MRLs in computational semantics and support reasoning.
Prolog vs. FunQL (Wong, 2007)
Prolog subgoals may appear in flexible order; FunQL imposes a strict order. This gives better generalization on Prolog.
Experimental Methodology
Standard 10-fold cross validation.
Correctness:
- CLang: the output exactly matches the correct MR
- Geoquery: the output retrieves the same answers as the correct MR
Metrics (a computation sketch follows below):
- Precision: % of the returned MRs that are correct
- Recall: % of NLs with their MRs correctly returned
- F-measure: harmonic mean of precision and recall
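A minimal sketch of computing these metrics, assuming each test NL is summarized by a pair (returned MR or None, is_correct):

```python
def evaluate(results):
    """results: list of (mr_returned_or_None, is_correct) per test NL."""
    returned = [ok for mr, ok in results if mr is not None]
    precision = sum(returned) / len(returned)          # correct / returned
    recall = sum(ok for _, ok in results) / len(results)  # correct / all NLs
    f = 2 * precision * recall / (precision + recall)  # harmonic mean
    return precision, recall, f

# 4 sentences: 3 MRs returned, 2 of them correct
print(evaluate([("mr1", True), ("mr2", True), ("mr3", False), (None, False)]))
# (0.666..., 0.5, 0.571...)
```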
Compared Systems
- COCKTAIL (Tang & Mooney, 2001): deterministic, inductive logic programming
- WASP (Wong & Mooney, 2006): semantic grammar, machine translation; λ-WASP extends it to handle logical forms
- KRISP (Kate & Mooney, 2006): semantic grammar, string kernels
- Z&C (Zettlemoyer & Collins, 2007): syntax-based, combinatory categorial grammar (CCG); uses a hand-built lexicon for Geoquery and manual CCG template rules
- LU (Lu et al., 2008): semantic grammar, generative parsing model
Results on CLang

System    Precision  Recall  F-measure
SCISSOR   89.5       73.7    80.8
WASP      88.9       61.9    73.0
KRISP     85.2       61.9    71.7
LU        82.4       57.7    67.8
COCKTAIL  (memory overflow)
Z&C       (not reported)

(LU: F-measure after reranking is 74.4%)
Results on Geoquery

System    Precision  Recall  F-measure  MRL
SCISSOR   92.1       72.3    81.0       FunQL
WASP      87.2       74.8    80.5       FunQL
KRISP     93.3       71.7    81.1       FunQL
LU        86.2       81.8    84.0       FunQL
COCKTAIL  89.9       79.4    84.3       Prolog
λ-WASP    92.0       86.6    89.2       Prolog
Z&C       95.5       83.2    88.9       Prolog

(LU: F-measure after reranking is 85.2%)
Results on Geoquery (FunQL)

System    Precision  Recall  F-measure
SCISSOR   92.1       72.3    81.0
WASP      87.2       74.8    80.5
KRISP     93.3       71.7    81.1
LU        86.2       81.8    84.0

(LU: F-measure after reranking is 85.2%)
SCISSOR is competitive with the other systems.
Why Knowledge of Syntax Does Not Help Here
- Geoquery sentences are short (7.48 words per sentence), so sentence structure can be feasibly learned from NLs paired with MRs
- The gain from knowledge of syntax is weighed against the loss of flexibility
Limitation of Using Prior Knowledge of Syntax
MR: answer(smallest(state(all)))
Traditional syntactic analysis brackets the sentence as [What state] [is the smallest], which is non-isomorphic with the MR.
A semantic grammar instead brackets it as [What [state is the smallest]], giving a syntactic structure isomorphic with the MR, and hence better generalization.
Why Prior Knowledge of Syntax Does Not Help Here
- Geoquery sentences are short (7.48 words per sentence), so sentence structure can be feasibly learned from NLs paired with MRs
- The gain from knowledge of syntax is weighed against the loss of flexibility
- LU vs. WASP and KRISP: LU uses a decomposed model for the semantic grammar
Detailed CLang Results on Sentence Length
[Chart: F-measure by sentence length; bins 0-10 words (7% of sentences), 11-20 (33%), 21-30 (46%), 31-40 (13%)]
SCISSOR Summary
- An integrated syntactic-semantic parsing approach
- Learns accurate semantic interpretations by utilizing the SAPT annotations
- Knowledge of syntax improves performance on long sentences
SYNSEM Motivation
- SCISSOR requires extra SAPT annotation for training, and must learn both syntax and semantics from the same limited training corpus
- High-performance syntactic parsers trained on existing large corpora are already available (Collins, 1997; Charniak & Johnson, 2005)
SCISSOR Requires SAPT Annotation
(S-P_BOWNER
   (NP-P_PLAYER (PRP$-P_OUR our) (NN-P_PLAYER player) (CD-P_UNUM 2))
   (VP-P_BOWNER (VB-P_BOWNER has) (NP-NULL (DT-NULL the) (NN-NULL ball))))
Annotating SAPTs is time consuming. Automate it!
Part I: Syntactic Parse
Use a statistical syntactic parser:
(S (NP (PRP$ our) (NN player) (CD 2))
   (VP (VB has) (NP (DT the) (NN ball))))
Part II: Word Meanings
Use a word alignment model (Wong & Mooney, 2006):
our → P_OUR, player → P_PLAYER, 2 → P_UNUM, has → P_BOWNER, the → NULL, ball → NULL
Learning a Semantic Lexicon
- IBM Model 5 word alignment (GIZA++)
- Keep the top 5 word/predicate alignments for each training example (see the sketch below)
- Assume each word alignment and syntactic parse defines a possible SAPT for composing the correct MR
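A sketch of the "top 5 alignments" step. The brute-force enumeration and the toy lexical score below stand in for the IBM Model 5 probabilities that GIZA++ would supply; only the top-k selection mirrors the slide.

```python
import heapq
from itertools import permutations

def top_alignments(words, preds, score, k=5):
    """Keep the k best one-to-one word/predicate alignments.
    An alignment maps predicate i to word position a[i]."""
    cands = permutations(range(len(words)), len(preds))
    return heapq.nlargest(k, cands, key=lambda a: score(words, preds, a))

# Stand-in lexical table t(word | predicate); unknown pairs get a floor.
T = {("P_OUR", "our"): .9, ("P_PLAYER", "player"): .8, ("P_UNUM", "2"): .7}
def score(words, preds, a):
    s = 1.0
    for i, j in enumerate(a):
        s *= T.get((preds[i], words[j]), 0.01)
    return s

words = "our player 2 has the ball".split()
preds = ["P_OUR", "P_PLAYER", "P_UNUM"]
for a in top_alignments(words, preds, score, k=2):
    print([(preds[i], words[j]) for i, j in enumerate(a)])
# best: [('P_OUR', 'our'), ('P_PLAYER', 'player'), ('P_UNUM', '2')]
```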
Introducing λ Variables
λ variables in semantic labels mark missing arguments (a1: the first argument):
(S (NP (P_OUR our) (NP (λa1λa2P_PLAYER player) (P_UNUM 2)))
   (VP (λa1P_BOWNER has) (NP (NULL the) (NULL ball))))
Part III: Internal Semantic Labels
How do we choose the dominant predicate at each internal node?
The MR parse gives the target predicate-argument structure: P_BOWNER(P_PLAYER(P_OUR, P_UNUM)).
Learning Semantic Composition Rules
At the node covering "player 2", combine λa1λa2P_PLAYER and P_UNUM:
λa1λa2P_PLAYER + P_UNUM → {λa1P_PLAYER, a2 = c2}   (c2: child 2)
Learning Semantic Composition Rules
Applying λa1λa2P_PLAYER + P_UNUM → {λa1P_PLAYER, a2 = c2} labels the node covering "player 2" with λa1P_PLAYER.
Learning Semantic Composition Rules
At the NP covering "our player 2", combine P_OUR and λa1P_PLAYER:
P_OUR + λa1P_PLAYER → {P_PLAYER, a1 = c1}
Learning Semantic Composition Rules
At the VP covering "has the ball", λa1P_BOWNER combines with the NULL-labeled NP, so the VP is labeled λa1P_BOWNER.
Learning Semantic Composition Rules
Finally, at the S node, combine P_PLAYER and λa1P_BOWNER:
P_PLAYER + λa1P_BOWNER → {P_BOWNER, a1 = c1}
This completes the MR, P_BOWNER(P_PLAYER(P_OUR, P_UNUM)); a sketch of rule application follows below.
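A compact sketch of what an acquired rule does at parse time. A semantic label is encoded here as (predicate, open λ-slots, filled arguments); this encoding and the apply_rule helper are illustrative assumptions rather than the thesis data structures.

```python
def apply_rule(head, arg, slot):
    """Compose two children under a rule {result, a<slot> = child}:
    `head` is the λ-bearing meaning, `arg` fills its argument `slot`."""
    pred, slots, filled = head
    assert slot in slots, "rule only applies if the slot is still open"
    return (pred, [s for s in slots if s != slot], {**filled, slot: arg})

# λa1λa2P_PLAYER + P_UNUM -> {λa1P_PLAYER, a2=c2}
player = ("P_PLAYER", ["a1", "a2"], {})
step1 = apply_rule(player, ("P_UNUM", [], {}), "a2")
# P_OUR + λa1P_PLAYER -> {P_PLAYER, a1=c1}
step2 = apply_rule(step1, ("P_OUR", [], {}), "a1")
print(step2)
# ('P_PLAYER', [], {'a2': ('P_UNUM', [], {}), 'a1': ('P_OUR', [], {})})
```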
Ensuring Meaning Composition
MR: answer(smallest(state(all)))
A traditional syntactic parse of "What state is the smallest" is non-isomorphic with this MR.
Ensuring Meaning Composition
Non-isomorphism between the NL parse and the MR parse arises from various linguistic phenomena, from the machine translation between NL and MRL, and from the use of automated syntactic parses.
Solution: introduce macro-predicates that combine multiple predicates, ensuring that the MR can be composed using a syntactic parse and word alignment.
SYNSEM Overview
Before training and testing: a syntactic parser maps each training/test sentence S to a syntactic parse tree T.
Training: given the training set {(S, T, MR)} and the unambiguous CFG of the MRL, semantic knowledge acquisition produces a semantic lexicon and composition rules; parameter estimation then produces a probabilistic parsing model.
Testing: semantic parsing maps an input sentence S, with its parse T, to an output MR.
Parameter Estimation
- Apply the learned semantic knowledge to all training examples to generate possible SAPTs
- Use a standard maximum-entropy model similar to those of Zettlemoyer & Collins (2005) and Wong & Mooney (2006)
- Training finds parameters that (approximately) maximize the sum of the conditional log-likelihood of the training set, including syntactic parses
- This is an incomplete-data problem, since the SAPTs are hidden variables
Features
- Lexical unigram features: the number of times a word is assigned a given predicate
- Lexical bigram features: the number of times a word is assigned a given predicate, conditioned on its previous/subsequent word
- Rule features: the number of times a composition rule is applied in a derivation
A feature-extraction sketch follows below.
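A sketch of the feature extractor these bullets describe, assuming a derivation is summarized by its word/predicate assignments and the composition rules it applied (a hypothetical representation):

```python
from collections import Counter

def extract_features(words, preds, rules):
    """Feature counts for one derivation. preds[i] is the predicate
    assigned to words[i] ('NULL' for no meaning)."""
    f = Counter()
    for i, (w, p) in enumerate(zip(words, preds)):
        f[("uni", w, p)] += 1                          # unigram: w assigned p
        prev = words[i - 1] if i > 0 else "<s>"
        nxt = words[i + 1] if i + 1 < len(words) else "</s>"
        f[("bi_prev", prev, w, p)] += 1                # ...given previous word
        f[("bi_next", nxt, w, p)] += 1                 # ...given next word
    for r in rules:
        f[("rule", r)] += 1                            # composition rule used
    return f

feats = extract_features("our player 2".split(),
                         ["P_OUR", "P_PLAYER", "P_UNUM"],
                         ["PLAYER+UNUM", "OUR+PLAYER"])
print(feats[("uni", "player", "P_PLAYER")])            # 1
```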
Handling Logical Forms
Prolog example: "What are the rivers in Texas?"
MR: answer(x1, (river(x1), loc(x1,x2), equal(x2,stateid(texas))))
To handle shared logical variables, use lambda calculus (v: variable):
λv1P_ANSWER(x1)
(λv1P_RIVER(x1), λv1λv2P_LOC(x1,x2), λv1P_EQUAL(x2))
Prolog Example
MR: answer(x1, (river(x1), loc(x1,x2), equal(x2,stateid(texas))))
Start from a syntactic parse:
(SBARQ (WHNP What)
   (SQ (VBP are) (NP (NP the rivers) (PP (IN in) (NP Texas)))))
Prolog Example
Add predicates to words:
What → λv1λa1P_ANSWER, are → NULL, rivers → λv1P_RIVER, in → λv1λv2P_LOC, Texas → λv1P_EQUAL
Prolog Example
Learn a rule with variable unification at the PP node:
λv1λv2P_LOC(x1,x2) + λv1P_EQUAL(x2) → λv1P_LOC
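A sketch of the variable unification behind this rule: the two λ-terms are conjoined on the shared logical variable x2, leaving x1 as the one variable still bound by λ. The term encoding is an illustrative assumption.

```python
def unify_on(shared, f, g):
    """Conjoin two λ-terms by unifying `shared` (e.g. 'x2'): the result
    keeps f's remaining λ-bound variable and closes off `shared`."""
    pred_f, vars_f = f
    pred_g, vars_g = g
    assert shared in vars_f and shared in vars_g, "no shared variable to unify"
    remaining = [v for v in vars_f if v != shared]
    return ("(" + pred_f + " & " + pred_g + ")", remaining)

loc = ("loc(x1,x2)", ["x1", "x2"])                 # λv1λv2 P_LOC
equal = ("equal(x2,stateid(texas))", ["x2"])       # λv1 P_EQUAL
print(unify_on("x2", loc, equal))
# ('(loc(x1,x2) & equal(x2,stateid(texas)))', ['x1'])  i.e. λv1 P_LOC
```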
Syntactic Parsers (Bikel, 2004)
- Trained on WSJ only: CLang (SYN0) F-measure = 82.15%; Geoquery (SYN0) F-measure = 76.44%
- Trained on WSJ plus in-domain sentences: CLang (SYN20, 20 sentences) F-measure = 88.21%; Geoquery (SYN40, 40 sentences) F-measure = 91.46%
- Gold-standard syntactic parses (GOLDSYN)
Questions
Q1. Can SYNSEM produce accurate semantic interpretations?
Q2. Can more accurate treebank syntactic parsers produce more accurate semantic parsers?
Q3. Does it also improve on long sentences?
Q4. Does it improve on limited training data, thanks to the prior knowledge from large treebanks?
Q5. Can it handle syntactic errors?
Results on CLang

System    Precision  Recall  F-measure
GOLDSYN   84.7       74.0    79.0   (SYNSEM)
SYN20     85.4       70.0    76.9   (SYNSEM)
SYN0      87.0       67.0    75.7   (SYNSEM)
SCISSOR   89.5       73.7    80.8   (SAPTs)
WASP      88.9       61.9    73.0
KRISP     85.2       61.9    71.7
LU        82.4       57.7    67.8

(LU: F-measure after reranking is 74.4%)
GOLDSYN > SYN20 > SYN0
Questions
Q1. Can SYNSEM produce accurate semantic interpretations? [yes]
Q2. Can more accurate treebank syntactic parsers produce more accurate semantic parsers? [yes]
Q3. Does it also improve on long sentences?
Detailed CLang Results on Sentence Length
[Chart: F-measure by sentence length; bins 0-10 words (7%), 11-20 (33%), 21-30 (46%), 31-40 (13%)]
Prior knowledge + flexibility + syntactic errors = ?
Questions
Q1. Can SYNSEM produce accurate semantic interpretations? [yes]
Q2. Can more accurate treebank syntactic parsers produce more accurate semantic parsers? [yes]
Q3. Does it also improve on long sentences? [yes]
Q4. Does it improve on limited training data, thanks to the prior knowledge from large treebanks?
Results on CLang (training size = 40)

System    Precision  Recall  F-measure
GOLDSYN   61.1       35.7    45.1   (SYNSEM)
SYN20     57.8       31.0    40.4   (SYNSEM)
SYN0      53.5       22.7    31.9   (SYNSEM)
SCISSOR   85.0       23.0    36.2   (SAPTs)
WASP      88.0       14.4    24.7
KRISP     68.35      20.0    31.0

The quality of the syntactic parser is critically important!
Questions
Q1. Can SYNSEM produce accurate semantic interpretations? [yes]
Q2. Can more accurate treebank syntactic parsers produce more accurate semantic parsers? [yes]
Q3. Does it also improve on long sentences? [yes]
Q4. Does it improve on limited training data, thanks to the prior knowledge from large treebanks? [yes]
Q5. Can it handle syntactic errors?
Handling Syntactic Errors
- Training ensures meaning composition even from syntactic parses containing errors
- For test NLs that generate correct MRs, the F-measures of their syntactic parses are: SYN0: 85.5%; SYN20: 91.2%
- Example: "If DR2C7 is true then players 2, 3, 7 and 8 should pass to player 4"
Questions
Q1. Can SYNSEM produce accurate semantic interpretations? [yes]
Q2. Can more accurate treebank syntactic parsers produce more accurate semantic parsers? [yes]
Q3. Does it also improve on long sentences? [yes]
Q4. Does it improve on limited training data, thanks to the prior knowledge from large treebanks? [yes]
Q5. Is it robust to syntactic errors? [yes]
Results on Geoquery (Prolog)

System    Precision  Recall  F-measure
GOLDSYN   91.9       88.2    90.0   (SYNSEM)
SYN40     90.2       86.9    88.5   (SYNSEM)
SYN0      81.8       79.0    80.4   (SYNSEM)
COCKTAIL  89.9       79.4    84.3
λ-WASP    92.0       86.6    89.2
Z&C       95.5       83.2    88.9

SYN0 does not perform well; all other recent systems perform competitively.
SYNSEM Summary
- Exploits an existing syntactic parser to drive the meaning composition process
- Prior knowledge of syntax improves performance on long sentences
- Prior knowledge of syntax improves performance with limited training data
- Robust to syntactic errors
Discriminative Reranking for Semantic Parsing
- Adapts the global features used for reranking syntactic parses to semantic parsing
- Improvement on CLang
- No improvement on Geoquery, where sentences are short and global features are less likely to help
Future Work
Improve SCISSOR:
- Discriminative SCISSOR (Finkel et al., 2008)
- Handling logical forms
- SCISSOR without extra annotation (Klein & Manning, 2002, 2004)
Improve SYNSEM:
- Utilize syntactic parsers with improved accuracy and in other syntactic formalisms
Future Work
- Utilize wide-coverage semantic representations (Curran et al., 2007) for better generalization over syntactic variations
- Utilize semantic role labeling (Gildea & Palmer, 2002), which provides a layer of correlated semantic information
Conclusions
- SCISSOR: a novel integrated syntactic-semantic parser
- SYNSEM: exploits an existing syntactic parser to produce disambiguated parse trees that drive compositional meaning composition
- Both produce accurate semantic interpretations
- Using knowledge of syntax improves performance on long sentences
- SYNSEM also improves performance with limited training data