Page 1: A Kernel-based Approach to Learning Semantic Parsers

Machine Learning Group
Department of Computer Sciences
University of Texas at Austin

A Kernel-based Approach to Learning Semantic Parsers

November 21, 2005

Rohit J. Kate Doctoral Dissertation Proposal Supervisor: Raymond J. Mooney

Page 2: A Kernel-based Approach to Learning Semantic Parsers

2

Outline

• Semantic Parsing

• Related Work

• Background on Kernel-based Methods

• Completed Research

• Proposed Research

• Conclusions

Page 3: A Kernel-based Approach to Learning Semantic Parsers

3

Semantic Parsing

• Semantic Parsing: transforming natural language (NL) sentences into complete, computer-executable meaning representations (MRs)

• Importance of semantic parsing
  – natural language communication with computers
  – insights into human language acquisition

• Example application domains
  – CLang: RoboCup Coach Language
  – Geoquery: a database query application

Page 4: A Kernel-based Approach to Learning Semantic Parsers

4

CLang: RoboCup Coach Language

• In the RoboCup Coach competition, teams compete to coach simulated soccer players

• The coaching instructions are given in a formal language called CLang

Example:
  Coach (NL): If our player 4 has the ball, our player 4 should shoot.
  CLang (via semantic parsing): ((bowner our {4}) (do our {4} shoot))

Page 5: A Kernel-based Approach to Learning Semantic Parsers

5

Geoquery: A Database Query Application

• Query application for a U.S. geography database containing about 800 facts [Zelle & Mooney, 1996]

Example:
  User (NL): Which rivers run through the states bordering Texas?
  Query (via semantic parsing): answer(traverse_2(next_to(stateid(‘texas’))))

Page 6: A Kernel-based Approach to Learning Semantic Parsers

6

Learning Semantic Parsers

• Assume meaning representation languages (MRLs) have deterministic context-free grammars
  – true for almost all computer languages
  – MRs can be parsed unambiguously

Page 7: A Kernel-based Approach to Learning Semantic Parsers

7

NL: Which rivers run through the states bordering Texas?

MR: answer(traverse_2(next_to(stateid(‘texas’))))

Parse tree of MR:

Non-terminals: ANSWER, RIVER, TRAVERSE_2, STATE, NEXT_TO, STATEID

Terminals: answer, traverse_2, next_to, stateid, ‘texas’

Productions: ANSWER → answer(RIVER), RIVER → TRAVERSE_2(STATE), STATE → NEXT_TO(STATE), STATE → STATEID, TRAVERSE_2 → traverse_2, NEXT_TO → next_to, STATEID → ‘texas’

[Parse tree of the MR:
 ANSWER → answer (RIVER)
  RIVER → TRAVERSE_2 (STATE)
   TRAVERSE_2 → traverse_2   STATE → NEXT_TO (STATE)
    NEXT_TO → next_to   STATE → STATEID
     STATEID → stateid ‘texas’]

Page 8: A Kernel-based Approach to Learning Semantic Parsers

8

Learning Semantic Parsers

• Assume meaning representation languages (MRLs) have deterministic context-free grammars
  – true for almost all computer languages
  – MRs can be parsed unambiguously

• Training data consists of NL sentences paired with their MRs

• Induce a semantic parser which can map novel NL sentences to their correct MRs

• The learning problem differs from syntactic parsing, where the training data has trees annotated over the NL sentences

Page 9: A Kernel-based Approach to Learning Semantic Parsers

9

Outline

• Semantic Parsing

• Related Work

• Background on Kernel-based Methods

• Completed Research

• Proposed Research

• Conclusions

Page 10: A Kernel-based Approach to Learning Semantic Parsers

10

Related Work: CHILL [Zelle & Mooney, 1996]

• Uses Inductive Logic Programming (ILP) to induce a semantic parser

• Learns rules to control actions of a deterministic shift-reduce parser

• Processes the sentence one word at a time, making a hard parsing decision each time

• Brittle, and ILP techniques do not scale to large corpora

Page 11: A Kernel-based Approach to Learning Semantic Parsers

11

Related Work: SILT [Kate, Wong & Mooney, 2005]

• Transformation rules associate NL patterns with MRL templates

• NL patterns matched in the sentence are replaced by the MRL templates

• By the end of parsing, NL sentence gets transformed into its MR

• Two versions: string patterns and syntactic tree patterns

Example:
  NL pattern: our left [3] penalty area
  MRL template: AREA → (left (penalty-area our))

Page 12: A Kernel-based Approach to Learning Semantic Parsers

12

Related Work: SILT contd.

Weaknesses of SILT:

• Hard-matching transformation rules are brittle. E.g., the NL pattern our left [3] penalty area fails to match:
  “our left penalty area”
  “our left side of penalty area”
  “left of our penalty area”
  “our ah.. left penalty area”

• Parsing is done deterministically, which is less robust than probabilistic parsing

Page 13: A Kernel-based Approach to Learning Semantic Parsers

13

Related Work: WASP [Wong, 2005]

• Based on Synchronous Context-free Grammars

• Uses the machine translation technique of statistical word alignment to find good transformation rules

• Builds a maximum entropy model for parsing

• The transformation rules are hard-matching

Page 14: A Kernel-based Approach to Learning Semantic Parsers

14

Related Work: SCISSOR [Ge & Mooney, 2005]

• Based on a fairly standard approach to compositional semantics [Jurafsky and Martin, 2000]

• A statistical parser is used to generate a semantically augmented parse tree (SAPT)
  – augments Collins’ head-driven model 2 (Bikel’s implementation, 2004) to incorporate semantic labels

• Translates the SAPT into a complete formal meaning representation

[Example SAPT for “our player 2 has the ball”:
 S-bowner over NP-player (PRP$-team our, NN-player player, CD-unum 2) and VP-bowner (VB-bowner has, NP-null (DT-null the, NN-null ball))]

Page 15: A Kernel-based Approach to Learning Semantic Parsers

15

Related Work: Zettlemoyer & Collins [2005]

• Uses the Combinatory Categorial Grammar (CCG) formalism to learn a statistical semantic parser

• Generates CCG lexicon relating NL words to semantic types through general hand-built template rules

• Uses maximum entropy model for compacting this lexicon and doing probabilistic CCG parsing

Page 16: A Kernel-based Approach to Learning Semantic Parsers

16

Outline

• Semantic Parsing

• Related Work

• Background on Kernel-based Methods

• Completed Research

• Proposed Research

• Conclusions

Page 17: A Kernel-based Approach to Learning Semantic Parsers

17

Traditional Machine Learning with Structured Data

[Diagram: Examples → (feature engineering, with information loss) → Feature Vectors → Machine Learning Algorithm]

Page 18: A Kernel-based Approach to Learning Semantic Parsers

18

Kernel-based Machine Learning with Structured Data

[Diagram: Examples → kernel computations (an implicit mapping to a potentially infinite number of features) → Kernelized Machine Learning Algorithm]

Page 19: A Kernel-based Approach to Learning Semantic Parsers

19

Kernel Functions

• A kernel K is a similarity function over domain X which maps any two objects x, y in X to their similarity score K(x,y)

• For any x1, x2, …, xn in X, the n-by-n Gram matrix (K(xi, xj))ij must be symmetric and positive semidefinite; the kernel then computes the dot product of implicit feature vectors in some high-dimensional feature space

• Machine learning algorithms which use the data only to compute similarity can be kernelized (e.g. Support Vector Machines, Nearest Neighbor etc.)
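To make the Gram-matrix condition concrete, here is a minimal NumPy sketch (an illustration added for this writeup, not part of the original slides) that checks symmetry and positive semidefiniteness numerically:

    import numpy as np

    def is_valid_gram_matrix(K, tol=1e-8):
        """Check that a Gram matrix is symmetric and positive semidefinite."""
        if not np.allclose(K, K.T, atol=tol):
            return False
        # A PSD matrix has no eigenvalue below zero (up to tolerance).
        return bool(np.all(np.linalg.eigvalsh(K) >= -tol))

    # A linear kernel K(x, y) = x . y always yields a valid Gram matrix.
    X = np.random.rand(5, 3)
    print(is_valid_gram_matrix(X @ X.T))  # True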

Page 20: A Kernel-based Approach to Learning Semantic Parsers

20

String Subsequence Kernel

• Define kernel between two strings as the number of common subsequences between them [Lodhi et al., 2002]

• All possible subsequences become the implicit feature vectors and the kernel computes their dot-products

s = “left side of our penalty area”

t = “our left penalty area”

K(s,t) = ?

Page 21: A Kernel-based Approach to Learning Semantic Parsers

21

String Subsequence Kernel


s = “left side of our penalty area”

t = “our left penalty area”

u = left

K(s,t) = 1+?

Page 22: A Kernel-based Approach to Learning Semantic Parsers

22

String Subsequence Kernel


s = “left side of our penalty area”

t = “our left penalty area”

u = our

K(s,t) = 2+?

Page 23: A Kernel-based Approach to Learning Semantic Parsers

23

String Subsequence Kernel


s = “left side of our penalty area”

t = “our left penalty area”

u = penalty

K(s,t) = 3+?

Page 24: A Kernel-based Approach to Learning Semantic Parsers

24

String Subsequence Kernel


s = “left side of our penalty area”

t = “our left penalty area”

u = area

K(s,t) = 4+?

Page 25: A Kernel-based Approach to Learning Semantic Parsers

25

String Subsequence Kernel


s = “left side of our penalty area”

t = “our left penalty area”

u = left penalty

K(s,t) = 5+?

Page 26: A Kernel-based Approach to Learning Semantic Parsers

26

String Subsequence Kernel


s = “left side of our penalty area”

t = “our left penalty area”

K(s,t) = 11
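A minimal word-level sketch of this computation (my own illustration; Lodhi et al.’s actual kernel also downweights gaps, as on the “Better String Subsequence Kernel” slides at the end):

    def common_subseq_kernel(s, t):
        # C[i][j] = number of common subsequences of s[:i] and t[:j],
        # counting the empty subsequence once; distinct occurrences are
        # counted separately, matching the slide's total of 11.
        s, t = s.split(), t.split()
        n, m = len(s), len(t)
        C = [[1] * (m + 1) for _ in range(n + 1)]
        for i in range(1, n + 1):
            for j in range(1, m + 1):
                C[i][j] = C[i - 1][j] + C[i][j - 1] - C[i - 1][j - 1]
                if s[i - 1] == t[j - 1]:
                    C[i][j] += C[i - 1][j - 1]
        return C[n][m] - 1  # exclude the empty subsequence

    print(common_subseq_kernel("left side of our penalty area",
                               "our left penalty area"))  # 11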

Page 27: A Kernel-based Approach to Learning Semantic Parsers

27

Normalized String Subsequence Kernel

• Normalize the kernel (range [0,1]) to remove any bias due to different string lengths

• Lodhi et al. [2002] give an O(n|s||t|) algorithm for computing the string subsequence kernel

• Used for Text Categorization [Lodhi et al, 2002] and Information Extraction [Bunescu & Mooney, 2005b]

K_normalized(s,t) = K(s,t) / √(K(s,s) · K(t,t))
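In code, the normalization is a one-liner over any base kernel (a sketch reusing common_subseq_kernel from above):

    import math

    def normalized_kernel(K, s, t):
        # Divide out the self-similarities so the result lies in [0, 1].
        return K(s, t) / math.sqrt(K(s, s) * K(t, t))

    print(normalized_kernel(common_subseq_kernel,
                            "left side of our penalty area",
                            "our left penalty area"))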

Page 28: A Kernel-based Approach to Learning Semantic Parsers

28

Support Vector Machines

• Mapping data to high-dimensional feature spaces can lead to overfitting of training data (“curse of dimensionality”)

• Support Vector Machines (SVMs) are known to be resistant to this overfitting

Page 29: A Kernel-based Approach to Learning Semantic Parsers

29

SVMs: Maximum Margin

• Given positive and negative examples, SVMs find a separating hyperplane such that the margin ρ between the closest examples is maximized

• Maximizing the margin is good according to intuition and PAC theory

[Diagram: separating hyperplane with margin ρ]

Page 30: A Kernel-based Approach to Learning Semantic Parsers

30

SVMs: Probability Estimates

• Probability estimate of a point belonging to a class can be obtained using its distance from the hyperplane [Platt, 1999]
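For illustration only (the slides do not specify KRISP’s SVM package), scikit-learn exposes exactly this Platt-style estimate:

    import numpy as np
    from sklearn.svm import SVC

    rng = np.random.RandomState(0)
    X = rng.randn(40, 2)
    y = (X[:, 0] + X[:, 1] > 0).astype(int)

    # probability=True fits a sigmoid [Platt, 1999] mapping the signed
    # distance from the separating hyperplane to a class probability.
    clf = SVC(kernel="linear", probability=True).fit(X, y)
    print(clf.predict_proba(X[:3]))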

Page 31: A Kernel-based Approach to Learning Semantic Parsers

31

Why Kernel-based Approach to Learning Semantic Parsers?

• Natural language sentences are structured
• Natural languages are flexible; there are various ways to express the same semantic concept

CLang MR: (left (penalty-area our))

NL: our left penalty area

our left side of penalty area

left side of our penalty area

left of our penalty area

our penalty area towards the left side

our ah.. left penalty area

Page 32: A Kernel-based Approach to Learning Semantic Parsers

32

Why Kernel-based Approach to Learning Semantic Parsers?

our left side of penalty area

our left penalty area
left side of our penalty area

left of our penalty area

our penalty area towards the left side

our ah.. left penalty area

opponent’s right penalty area

our right midfield

right side of our penalty area

Kernel methods can robustly capture the range of NL contexts.

Page 33: A Kernel-based Approach to Learning Semantic Parsers

33

Outline

• Semantic Parsing

• Related Work

• Background on Kernel-based Methods

• Completed Research

• Proposed Research

• Conclusions

Page 34: A Kernel-based Approach to Learning Semantic Parsers

34

KRISP: Kernel-based Robust Interpretation by Semantic Parsing

• Learns a semantic parser from NL sentences paired with their respective MRs, given the MRL grammar

• Productions of the MRL are treated like semantic concepts

• An SVM classifier is trained for each production, using the string subsequence kernel

• These classifiers are used to compositionally build MRs of the sentences

Page 35: A Kernel-based Approach to Learning Semantic Parsers

35

Overview of KRISP

[Flow diagram.
 Training: NL sentences with MRs + the MRL grammar → collect positive and negative examples → train string-kernel-based SVM classifiers → semantic parser; the best semantic derivations (correct and incorrect) are fed back to collect the next round of examples.
 Testing: novel NL sentences → semantic parser → best MRs.]


Page 37: A Kernel-based Approach to Learning Semantic Parsers

37

Overview of KRISP’s Semantic Parsing

• We first define the semantic derivation of an NL sentence
• We define the probability of a semantic derivation
• Semantic parsing of an NL sentence then means finding its most probable semantic derivation
• It is straightforward to obtain the MR from a semantic derivation

Page 38: A Kernel-based Approach to Learning Semantic Parsers

38

Semantic Derivation of an NL Sentence

Which rivers run through the states bordering Texas?

MR parse with non-terminals on the nodes:

[The MR parse tree from slide 7, with the non-terminals ANSWER, RIVER, TRAVERSE_2, STATE, NEXT_TO and STATEID at the internal nodes]

Page 39: A Kernel-based Approach to Learning Semantic Parsers

39

Semantic Derivation of an NL Sentence

Which rivers run through the states bordering Texas?

MR parse with productions on the nodes:

ANSWER → answer(RIVER)
 RIVER → TRAVERSE_2(STATE)
  TRAVERSE_2 → traverse_2   STATE → NEXT_TO(STATE)
   NEXT_TO → next_to   STATE → STATEID
    STATEID → ‘texas’

Page 40: A Kernel-based Approach to Learning Semantic Parsers

40

Semantic Derivation of an NL Sentence

Which rivers run through the states bordering Texas?

Semantic Derivation: each node covers an NL substring:

ANSWER → answer(RIVER)
 RIVER → TRAVERSE_2(STATE)
  TRAVERSE_2 → traverse_2   STATE → NEXT_TO(STATE)
   NEXT_TO → next_to   STATE → STATEID
    STATEID → ‘texas’

Page 41: A Kernel-based Approach to Learning Semantic Parsers

41

Semantic Derivation of an NL Sentence

Which rivers run through the states bordering Texas?

Semantic Derivation: each node contains a production and the substring of the NL sentence it covers (word positions 1–9):

(ANSWER → answer(RIVER), [1..9])
 (RIVER → TRAVERSE_2(STATE), [1..9])
  (TRAVERSE_2 → traverse_2, [1..4])   (STATE → NEXT_TO(STATE), [5..9])
   (NEXT_TO → next_to, [5..7])   (STATE → STATEID, [8..9])
    (STATEID → ‘texas’, [8..9])

Page 42: A Kernel-based Approach to Learning Semantic Parsers

42

Semantic Derivation of an NL Sentence

Through the states that border Texas which rivers run?

Substrings in the NL sentence may appear in a different order:

ANSWER → answer(RIVER)
 RIVER → TRAVERSE_2(STATE)
  TRAVERSE_2 → traverse_2   STATE → NEXT_TO(STATE)
   NEXT_TO → next_to   STATE → STATEID
    STATEID → ‘texas’

Page 43: A Kernel-based Approach to Learning Semantic Parsers

43

Semantic Derivation of an NL Sentence

Through the states that border Texas which rivers run?

Nodes are allowed to permute the child productions relative to the original MR parse (word positions 1–10):

(ANSWER → answer(RIVER), [1..10])
 (RIVER → TRAVERSE_2(STATE), [1..10])
  (STATE → NEXT_TO(STATE), [1..6])   (TRAVERSE_2 → traverse_2, [7..10])
   (NEXT_TO → next_to, [1..5])   (STATE → STATEID, [6..6])
    (STATEID → ‘texas’, [6..6])

Page 44: A Kernel-based Approach to Learning Semantic Parsers

44

Probability of a Semantic Derivation

• Let Pπ(s[i..j]) be the probability that production π covers the substring s[i..j]

• E.g., P_{NEXT_TO → next_to}(“the states bordering”), for the node (NEXT_TO → next_to, [5..7]) covering words 5–7

• Obtained from the string-kernel-based SVM classifier trained for each production π

• Probability of a semantic derivation D:

  P(D) = ∏_{(π, [i..j]) ∈ D} Pπ(s[i..j])
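The product over nodes is easy to state in code. A hedged sketch (Node and the classifier interface P are hypothetical names for illustration, not KRISP’s actual data structures):

    from dataclasses import dataclass, field
    from typing import Callable, List, Tuple

    @dataclass
    class Node:
        production: str                # e.g. "NEXT_TO -> next_to"
        span: Tuple[int, int]          # word indices [i..j] it covers
        children: List["Node"] = field(default_factory=list)

    def derivation_probability(node, P):
        # P(D) = product over all nodes (pi, [i..j]) in D of P_pi(s[i..j]),
        # where P is backed by the per-production SVM classifiers.
        p = P(node.production, node.span)
        for child in node.children:
            p *= derivation_probability(child, P)
        return p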

Page 45: A Kernel-based Approach to Learning Semantic Parsers

45

Computing the Most Probable Semantic Derivation

• The task of semantic parsing is to find the most probable semantic derivation

• Let E_{n,s[i..j]} (a partial derivation) denote any subtree of a derivation tree with n as the LHS non-terminal of the root production, covering sentence s from index i to j

• Example of E_{STATE,s[5..9]}, over “the states bordering Texas?” (words 5–9):

  (STATE → NEXT_TO(STATE), [5..9])
   (NEXT_TO → next_to, [5..7])   (STATE → STATEID, [8..9])
    (STATEID → ‘texas’, [8..9])

• The full derivation D is then E_{ANSWER,s[1..|s|]}

Page 46: A Kernel-based Approach to Learning Semantic Parsers

46

Computing the Most Probable Semantic Derivation contd.

• Let E*_{STATE,s[5..9]} denote the most probable partial derivation among all E_{STATE,s[5..9]}

• This is computed recursively: for each production π = n → n₁…n_t in the grammar G and each partition (p₁,…,p_t) of the substring s[i..j] among the RHS non-terminals, build the candidate tree and keep the most probable one:

  E*_{n,s[i..j]} = makeTree( argmax_{π = n → n₁…n_t ∈ G, (p₁,…,p_t) ∈ partition(s[i..j], t)} Pπ(s[i..j]) · ∏_{k=1..t} P(E*_{n_k,p_k}) )

E*_{STATE,s[5..9]} over “the states bordering Texas?” (words 5–9), rooted at (STATE → NEXT_TO(STATE), [5..9])

Page 47: A Kernel-based Approach to Learning Semantic Parsers

47

Computing the Most Probable Semantic Derivation contd.

[The recursion repeated; the children E*_{NEXT_TO,s[i..j]} and E*_{STATE,s[i..j]} are tried under E*_{STATE,s[5..9]}.]

Page 48: A Kernel-based Approach to Learning Semantic Parsers

48

Computing the Most Probable Semantic Derivation contd.

[Candidate partition tried: E*_{NEXT_TO,s[5..5]} and E*_{STATE,s[6..9]}.]

Page 49: A Kernel-based Approach to Learning Semantic Parsers

49

Computing the Most Probable Semantic Derivation contd.

[Candidate partition tried: E*_{NEXT_TO,s[5..6]} and E*_{STATE,s[7..9]}.]

Page 50: A Kernel-based Approach to Learning Semantic Parsers

50

Computing the Most Probable Semantic Derivation contd.

[Candidate partition tried: E*_{NEXT_TO,s[5..7]} and E*_{STATE,s[8..9]}.]

Page 51: A Kernel-based Approach to Learning Semantic Parsers

51

Computing the Most Probable Semantic Derivation contd.

[Candidate partition tried: E*_{NEXT_TO,s[5..8]} and E*_{STATE,s[9..9]}.]

Page 52: A Kernel-based Approach to Learning Semantic Parsers

52

Computing the Most Probable Semantic Derivation contd.

[The recurrence repeated, with the argmax shown ranging over both the productions and the partitions (p₁,…,p_t) ∈ partition(s[i..j], t).]

Page 53: A Kernel-based Approach to Learning Semantic Parsers

53

Computing the Most Probable Semantic Derivation contd.

[Final animation frame of the recurrence.]

Page 54: A Kernel-based Approach to Learning Semantic Parsers

54

Computing the Most Probable Semantic Derivation contd.

• Implemented by extending Earley’s [1970] context-free grammar parsing algorithm

• Predicts subtrees top-down and completes them bottom-up

• Dynamic programming algorithm which generates and compactly stores each subtree once

• Extended because:
  – the probability of a production depends on which substring of the sentence it covers
  – leaves are not terminals but substrings of words

Page 55: A Kernel-based Approach to Learning Semantic Parsers

55

Computing the Most Probable Semantic Derivation contd.

• Does a greedy approximate search with beam width ω and returns the ω most probable derivations it finds

• Uses a threshold θ to prune low probability trees
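The following is a schematic sketch of the recurrence with memoization over (non-terminal, span); it is my simplification, exhaustive rather than beam-pruned, and not the proposal’s extended-Earley implementation (grammar and P are hypothetical interfaces):

    def partitions(i, j, t):
        # All ways to split span [i..j] into t contiguous ordered parts.
        if t == 1:
            yield ((i, j),)
            return
        for m in range(i, j):
            for rest in partitions(m + 1, j, t - 1):
                yield ((i, m),) + rest

    def best_derivation(n, i, j, grammar, P, memo=None):
        # E*_{n,s[i..j]}: most probable partial derivation rooted at
        # non-terminal n covering words i..j. grammar[n] is a list of
        # (production_id, [RHS non-terminals]); P(prod, i, j) is the
        # classifier probability that prod covers s[i..j].
        memo = {} if memo is None else memo
        if (n, i, j) in memo:
            return memo[(n, i, j)]
        best = (0.0, None)
        for prod, rhs in grammar[n]:
            for parts in (partitions(i, j, len(rhs)) if rhs else [()]):
                p = P(prod, i, j)
                children = []
                for child_nt, (ci, cj) in zip(rhs, parts):
                    cp, ct = best_derivation(child_nt, ci, cj, grammar, P, memo)
                    p *= cp
                    children.append(ct)
                if p > best[0]:
                    best = (p, (prod, (i, j), children))
        memo[(n, i, j)] = best
        return best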

Page 56: A Kernel-based Approach to Learning Semantic Parsers

56

Overview of KRISP

[The overview flow diagram repeated, now labeling the trained classifiers Pπ(s[i..j]) that the semantic parser uses.]

Page 57: A Kernel-based Approach to Learning Semantic Parsers

57

KRISP’s Training Algorithm

• Takes NL sentences paired with their respective MRs as input

• Obtains MR parses
• Proceeds in iterations
• In the first iteration, for every production π:

– Call those sentences positives whose MR parses use that production

– Call the remaining sentences negatives

Page 58: A Kernel-based Approach to Learning Semantic Parsers

58

KRISP’s Training Algorithm contd.

STATE → NEXT_TO(STATE)

Positives:
• which rivers run through the states bordering texas?
• what is the most populated state bordering oklahoma ?
• what is the largest city in states that border california ?

Negatives:
• what state has the highest population ?
• what states does the delaware river run through ?
• which states have cities named austin ?
• what is the lowest point of the state with the largest area ?

→ String-kernel-based SVM classifier → P_{STATE → NEXT_TO(STATE)}(s[i..j])

First Iteration

Page 59: A Kernel-based Approach to Learning Semantic Parsers

59

KRISP’s Training Algorithm contd.

• Using these classifiers Pπ(s[i..j]), obtain the ω best semantic derivations of each training sentence

• Some of these derivations will give the correct MR (call these correct derivations); others will give incorrect MRs (incorrect derivations)

• For the next iteration, collect positives from the most probable correct derivation

• Collect negatives from incorrect derivations with higher probability than the most probable correct derivation
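Schematically, the iterative loop looks like this (a hedged sketch: initial_examples, train_svms, best_derivations and the example-collection methods are hypothetical interfaces standing in for the steps described on these slides; derivations are assumed sorted by decreasing probability):

    def train_krisp(sentences, mrs, grammar, iterations=5, beam=20):
        # Iteration 1: a sentence is a positive for production pi iff
        # its MR parse uses pi; otherwise it is a negative.
        examples = initial_examples(sentences, mrs, grammar)
        classifiers = train_svms(examples)
        for _ in range(iterations - 1):
            for s, mr in zip(sentences, mrs):
                derivs = best_derivations(s, grammar, classifiers, beam)
                correct = [d for d in derivs if d.mr == mr]
                if not correct:
                    continue  # (a correct derivation can be forced; see slide 105)
                best_correct = correct[0]
                # Positives come from the most probable correct derivation.
                examples.add_positives(best_correct)
                # Negatives come from incorrect derivations ranked above it.
                for d in derivs:
                    if d is best_correct:
                        break
                    examples.add_negatives(d, best_correct)
            classifiers = train_svms(examples)
        return classifiers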

Page 60: A Kernel-based Approach to Learning Semantic Parsers

60

KRISP’s Training Algorithm contd.

Most probable correct derivation:

Which rivers run through the states bordering Texas?

(ANSWER → answer(RIVER), [1..9])
 (RIVER → TRAVERSE_2(STATE), [1..9])
  (TRAVERSE_2 → traverse_2, [1..4])   (STATE → NEXT_TO(STATE), [5..9])
   (NEXT_TO → next_to, [5..7])   (STATE → STATEID, [8..9])
    (STATEID → ‘texas’, [8..9])

Page 61: A Kernel-based Approach to Learning Semantic Parsers

61

KRISP’s Training Algorithm contd.

Most probable correct derivation: Collect positive examples

[The derivation repeated from the previous slide; each node contributes a positive example pairing its production with the substring it covers.]

Page 62: A Kernel-based Approach to Learning Semantic Parsers

62

KRISP’s Training Algorithm contd.

Incorrect derivation with probability greater than the most probable correct derivation:

Which rivers run through the states bordering Texas?

(ANSWER → answer(RIVER), [1..9])
 (RIVER → TRAVERSE_2(STATE), [1..9])
  (TRAVERSE_2 → traverse_2, [1..7])   (STATE → STATEID, [8..9])
   (STATEID → ‘texas’, [8..9])

Incorrect MR: answer(traverse_2(stateid(‘texas’)))

Page 63: A Kernel-based Approach to Learning Semantic Parsers

63

KRISP’s Training Algorithm contd.

Which rivers run through the states bordering Texas?

Most probable correct derivation:

(ANSWER → answer(RIVER), [1..9])
 (RIVER → TRAVERSE_2(STATE), [1..9])
  (TRAVERSE_2 → traverse_2, [1..4])   (STATE → NEXT_TO(STATE), [5..9])
   (NEXT_TO → next_to, [5..7])   (STATE → STATEID, [8..9])
    (STATEID → ‘texas’, [8..9])

Incorrect derivation:

(ANSWER → answer(RIVER), [1..9])
 (RIVER → TRAVERSE_2(STATE), [1..9])
  (TRAVERSE_2 → traverse_2, [1..7])   (STATE → STATEID, [8..9])
   (STATEID → ‘texas’, [8..9])

Traverse both trees in breadth-first order till the first nodes where their productions differ are found.


Page 68: A Kernel-based Approach to Learning Semantic Parsers

68

KRISP’s Training Algorithm contd.

[Both derivations repeated from the previous slide.]

Mark the words under these nodes.


Page 70: A Kernel-based Approach to Learning Semantic Parsers

70

KRISP’s Training Algorithm contd.

[Both derivations repeated from the previous slide.]

Consider all the productions covering the marked words. Collect negatives for productions which cover any marked word in the incorrect derivation but not in the correct derivation.


Page 72: A Kernel-based Approach to Learning Semantic Parsers

72

KRISP’s Training Algorithm contd.

STATE → NEXT_TO(STATE)

Positives:
• the states bordering texas?
• state bordering oklahoma ?
• states that border california ?
• states which share border
• next to state of iowa

Negatives:
• what state has the highest population ?
• what states does the delaware river run through ?
• which states have cities named austin ?
• what is the lowest point of the state with the largest area ?
• which rivers run through states bordering

→ String-kernel-based SVM classifier → P_{STATE → NEXT_TO(STATE)}(s[i..j])

Next Iteration

Page 73: A Kernel-based Approach to Learning Semantic Parsers

73

KRISP’s Training Algorithm contd.

• In the next iteration, SVM classifiers are trained with the new positive examples and the accumulated negative examples

• Iterate a specified number of times

Page 74: A Kernel-based Approach to Learning Semantic Parsers

74

Experimental Corpora

• CLang
  – 300 randomly selected pieces of coaching advice from the log files of the 2003 RoboCup Coach Competition
  – 22.52 words on average in NL sentences
  – 13.42 tokens on average in MRs

• Geo250 [Zelle & Mooney, 1996]
  – 250 queries for the U.S. geography database
  – 6.76 words on average in NL sentences
  – 6.20 tokens on average in MRs

• Geo880 [Tang & Mooney, 2001]
  – superset of Geo250 with 880 queries
  – 7.48 words on average in NL sentences
  – 6.47 tokens on average in MRs

Page 75: A Kernel-based Approach to Learning Semantic Parsers

75

Experimental Methodology

• Evaluated using standard 10-fold cross validation
• Correctness

– CLang: output exactly matches the correct representation

– Geoquery: the resulting query retrieves the same answer as the correct representation

• Metrics

Precision = (Number of correct MRs) / (Number of test sentences with complete output MRs)

Recall = (Number of correct MRs) / (Number of test sentences)
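In code the two metrics are straightforward (a sketch; correct() stands in for the domain’s correctness test above):

    def precision_recall(outputs, golds, correct):
        # outputs[i] is the parser's best MR for test sentence i,
        # or None when no complete MR was produced.
        produced = [(o, g) for o, g in zip(outputs, golds) if o is not None]
        n_correct = sum(1 for o, g in produced if correct(o, g))
        return n_correct / len(produced), n_correct / len(outputs)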

Page 76: A Kernel-based Approach to Learning Semantic Parsers

76

Experimental Methodology contd.

• Compared systems:
  – SILT [Kate, Wong & Mooney, 2005]
  – WASP [Wong, 2005]
  – SCISSOR [Ge & Mooney, 2005]
  – CHILL
    • with the COCKTAIL ILP algorithm [Tang & Mooney, 2001]
  – Zettlemoyer & Collins (2005)
    • different experimental setup (600 training, 280 testing examples)
    • results available only for the Geo880 corpus
  – Geobase
    • hand-built NL interface [Borland International, 1988]
    • results available only for Geo250

Page 77: A Kernel-based Approach to Learning Semantic Parsers

77

Experimental Methodology contd.

• KRISP gives probabilities for its semantic derivations, which are taken as confidences of the MRs

• We plot precision-recall curves by sorting the best MR of each sentence by confidence and computing precision at every recall value

• WASP and SCISSOR also output confidences, so we show their precision-recall curves as well

• Results of the other systems are shown as points on the precision-recall graphs

Page 78: A Kernel-based Approach to Learning Semantic Parsers

78

Results on CLang

[Precision-recall curves on CLang]

CHILL gives 49.2% precision and 12.67% recall with 160 examples and can’t be run beyond that.

(Chart note: requires more annotation on the corpus)

Page 79: A Kernel-based Approach to Learning Semantic Parsers

79

Results on Geo250

Page 80: A Kernel-based Approach to Learning Semantic Parsers

80

Results on Geo880

Page 81: A Kernel-based Approach to Learning Semantic Parsers

81

Results on Multilingual Geo250

• We have the Geo250 corpus translated into Japanese, Spanish and Turkish

• KRISP is directly applicable to other languages

Page 82: A Kernel-based Approach to Learning Semantic Parsers

82

Results on Multilingual Geo250

Page 83: A Kernel-based Approach to Learning Semantic Parsers

83

Outline

• Semantic Parsing

• Related Work

• Background on Kernel-based Methods

• Completed Research

• Proposed Research
  – Short term
  – Long term

• Conclusions

Page 84: A Kernel-based Approach to Learning Semantic Parsers

84

Short Term: Exploiting Natural Language Syntax

• KRISP currently uses only the words of the sentence and their order

• Semantic interpretation depends largely on NL syntax; exploiting it should help semantic parsing

• We already have syntactic annotations on our corpora, used in SILT-tree and SCISSOR

• Existing syntactic parsers can be trained on our corpora in addition to WSJ [Bikel, 2004]

Page 85: A Kernel-based Approach to Learning Semantic Parsers

85

Exploiting Natural Language Syntax contd.

• The most natural extension of KRISP is to use a syntactic-tree kernel in place of the string kernel

• Syntactic-tree kernel
  – introduced by Collins & Duffy [2001]
  – K(x,y) = number of subtrees common between x and y

[Two example syntactic trees:
 x: (NP (NP (JJ left) (NN side)) (PP (IN of) (NP (DT the) (NN midfield))))
 y: (NP (NP (JJ left) (NN side)) (PP (IN of) (NP (PRP$ our) (NN penalty) (NN area))))]

K(x,y) = ?


Page 89: A Kernel-based Approach to Learning Semantic Parsers

89

Exploiting Natural Language Syntax contd.

[The two trees repeated while the common subtrees are counted one by one.]

K(x,y) = 8
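A compact sketch of the Collins & Duffy recurrence over node pairs (trees represented as (label, children) tuples; note that conventions differ on whether bare leaves or preterminals count as subtrees, so a given convention may tally the slide’s example differently):

    def tree_kernel(x, y):
        # K(x, y) = sum over node pairs of C(n1, n2), where C counts the
        # common tree fragments rooted at that pair of nodes.
        def nodes(t):
            out, stack = [], [t]
            while stack:
                n = stack.pop()
                out.append(n)
                stack.extend(n[1])
            return out

        def C(n1, n2):
            # Zero unless the same production expands both nodes.
            same = (n1[0] == n2[0] and
                    [c[0] for c in n1[1]] == [c[0] for c in n2[1]])
            if not same or not n1[1]:
                return 0
            prod = 1
            for c1, c2 in zip(n1[1], n2[1]):
                prod *= 1 + C(c1, c2)
            return prod

        return sum(C(a, b) for a in nodes(x) for b in nodes(y))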

Page 90: A Kernel-based Approach to Learning Semantic Parsers

90

Exploiting Natural Language Syntax contd.

• Often the syntactic information needed is present in dependency trees; full syntactic trees are not necessary

• Dependency trees capture the most important functional relationships between words

• Various dependency tree kernels have been used successfully for doing Information Extraction [Zelenko, Aone & Richardella, 2003], [Cumby & Roth, 2003], [Culotta & Sorenson, 2004], [Bunescu & Mooney, 2005a]

Page 91: A Kernel-based Approach to Learning Semantic Parsers

91

Short Term: Noisy NL Sentences

• If users interact with the semantic parser through speech, noise can enter in many ways [Zue & Glass, 2000]:
  – speech recognition errors
  – interjections (um’s and ah’s)
  – environment noise (door slams, phone rings etc.)
  – out-of-domain words and ill-formed utterances

• In KRISP, the presence of extra or corrupted words may decrease kernel values, but it does not block semantic parsing outright

• KRISP should hence be more robust to noise than systems with hard-matching rules like SILT and WASP, or systems doing complete syntactic-semantic parsing like SCISSOR

Page 92: A Kernel-based Approach to Learning Semantic Parsers

92

Noisy NL Sentences contd.

• We plan to do preliminary experiments by artificially corrupting our existing corpora

• Then we plan to obtain, and experiment on, a real-world noisy corpus

Page 93: A Kernel-based Approach to Learning Semantic Parsers

93

Short Term: Committees of Semantic Parsers

System    Correct CLang MRs out of 300
KRISP     178
WASP      185
SCISSOR   232

• A good indication that forming a committee of these parsers will improve performance:

Committee             Upper bound on correct MRs
KRISP+WASP            223
KRISP+SCISSOR         253
WASP+SCISSOR          246
KRISP+WASP+SCISSOR    259

Page 94: A Kernel-based Approach to Learning Semantic Parsers

94

Committees of Semantic Parsers contd.

Two general approaches to combining parse trees [Henderson & Brill, 1999]:

• Parser switching: learn which parser works best on which types of sentences

• Parse hybridization: look into the output MRs and combine their best components
  – particularly useful when none of the parsers generates a complete MR

Prior work is specific to combining syntactic parses. We plan to explore these two general approaches for combining MRs.

Page 95: A Kernel-based Approach to Learning Semantic Parsers

95

Long Term: Non-parallel Training Corpus

• So far, training data contained NL sentences aligned with their respective MRs

• In some domains, many NL sentences and semantic MRs may be available but not aligned
  – e.g., in the RoboCup commentary task [Binsted et al., 2000], NL sentences and symbolic descriptions of events are available but not aligned

• Referential ambiguity: which NL description refers to which symbolic description?

• In our present work we resolve which portion of the sentence refers to which production of the MR parse

• The same approach could be extended one level higher

Page 96: A Kernel-based Approach to Learning Semantic Parsers

96

Non-parallel Training Corpus contd.

• Let the training corpus be {(Mi, Si) | i = 1..N}, where each Mi is a set of MRs and each Si is a set of NL sentences

• Initially align every MR in Mi with every NL sentence in Si, for i = 1..N

• Use KRISP’s training algorithm to learn classifiers

• Find the best alignment between the MRs and NL sentences in each (Mi, Si) by semantic parsing with these classifiers

• Repeat until the alignments don’t change

• We plan first to do preliminary experiments by artificially making our corpus non-parallel and extracting the alignments, and then to test on a real-world corpus
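A hedged sketch of this alternating scheme (train_krisp_classifiers, parse_probability and best_matching are hypothetical helpers standing in for the steps above):

    def learn_from_nonparallel(corpus, grammar, max_rounds=10):
        # corpus: list of (Ms, Ss) pairs; Ms a set of MRs, Ss sentences.
        # Start by pairing every MR in Ms with every sentence in Ss.
        alignments = [[(m, s) for m in Ms for s in Ss] for Ms, Ss in corpus]
        classifiers = None
        for _ in range(max_rounds):
            classifiers = train_krisp_classifiers(alignments, grammar)
            new_alignments = []
            for Ms, Ss in corpus:
                # Re-align using parse probabilities under the current
                # classifiers, keeping the best MR-sentence matching.
                scores = {(m, s): parse_probability(s, m, classifiers)
                          for m in Ms for s in Ss}
                new_alignments.append(best_matching(Ms, Ss, scores))
            if new_alignments == alignments:
                break  # alignments no longer change
            alignments = new_alignments
        return classifiers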

Page 97: A Kernel-based Approach to Learning Semantic Parsers

97

Long Term: Complex Relation Extraction

• Bunescu & Mooney [2005b] use a string-based kernel to extract the binary relation “protein-protein interaction” from text

• This can be viewed as learning for an MRL grammar with only one production

INTERACTION → PROTEIN PROTEIN

• A complex relation is an n-ary relation among n typed entities [McDonald et al., 2005]
  – for example, (person, job, company):
    NL sentence: John Smith is the CEO of Inc. Corp.
    Extraction: (John Smith, CEO, Inc. Corp.)

Page 98: A Kernel-based Approach to Learning Semantic Parsers

98

Complex Relation Extraction contd.

John Smith is the CEO of Inc. Corp.
  (person, job)   (job, company)
       (person, job, company)

KRISP should be applicable to extracting complex relations by treating the n-ary relation as a higher-level production composed of lower-level productions.

Page 99: A Kernel-based Approach to Learning Semantic Parsers

99

Conclusions

• KRISP: a new kernel-based approach to learning semantic parsers

• String-kernel-based SVM classifiers trained for each MRL production

• Classifiers used to compositionally build complete MRs of NL sentences

• Evaluated on two real-world corpora
  – performs better than deterministic rule-based systems
  – performs comparably to recent statistical systems

• Proposed work: exploit NL syntax, form committees and broaden application domains

Page 100: A Kernel-based Approach to Learning Semantic Parsers

100

Thank You!

Questions??

Page 101: A Kernel-based Approach to Learning Semantic Parsers

101

Extra: Dealing with Constants

• The MRL grammar may contain productions corresponding to constants in the domain:
  STATEID → ‘new york’   RIVERID → ‘colorado’
  NUM → ‘2’   STRING → ‘DR4C10’

• The user can specify these as constant productions, giving their NL substrings

• Classifiers are not learned for these productions
• A matching substring’s probability is taken as 1
• If n constant productions share the same substring, each gets probability 1/n:
  STATEID → ‘colorado’   RIVERID → ‘colorado’
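In code, constant productions simply bypass the learned classifiers (a sketch; constant_table is a hypothetical user-supplied mapping):

    def constant_probability(production, substring, constant_table):
        # constant_table maps an NL substring to the set of constant
        # productions the user declared for it.
        prods = constant_table.get(substring, set())
        if production not in prods:
            return 0.0
        # Probability 1 for a unique match, split as 1/n when n constant
        # productions share the substring (e.g. STATEID -> 'colorado'
        # and RIVERID -> 'colorado').
        return 1.0 / len(prods)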

Page 102: A Kernel-based Approach to Learning Semantic Parsers

102

Extra: Better String Subsequence Kernel

• Subsequences with gaps should be downweighted
• A decay factor λ in the range (0,1] penalizes gaps
• All subsequences are the implicit features, and the penalties are the feature values

s = “left side of our penalty area”

t = “our left penalty area”

u = left penalty

K(s,t) = 4+?

Page 103: A Kernel-based Approach to Learning Semantic Parsers

103

Extra: Better String Subsequence Kernel

s = “left side of our penalty area”

t = “our left penalty area”

u = left penalty

K(s,t) = 4 + λ³·λ⁰ + ?   (gap of 3 in s ⇒ λ³; gap of 0 in t ⇒ λ⁰)

Page 104: A Kernel-based Approach to Learning Semantic Parsers

104

Extra: Better String Subsequence Kernel

s = “left side of our penalty area”

t = “our left penalty area”

K(s,t) = 4 + 3λ + 3λ³ + λ⁵
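A minimal O(|s|²|t|²) sketch of the gap-weighted version (each common subsequence contributes λ raised to its interior gaps in s times λ raised to its interior gaps in t; Lodhi et al. [2002] give the faster O(n|s||t|) dynamic program, and exact weighting conventions vary):

    def gap_weighted_kernel(s, t, lam):
        s, t = s.split(), t.split()
        n, m = len(s), len(t)
        # E[i][j]: weighted count of common subsequences whose final
        # matched pair is exactly (s[i], t[j]).
        E = [[0.0] * m for _ in range(n)]
        for i in range(n):
            for j in range(m):
                if s[i] != t[j]:
                    continue
                E[i][j] = 1.0  # the single word s[i] itself
                for a in range(i):
                    for b in range(j):
                        E[i][j] += (lam ** (i - a - 1)
                                    * lam ** (j - b - 1) * E[a][b])
        return sum(map(sum, E))

    # With lam = 1 gaps are free and the plain count is recovered:
    print(gap_weighted_kernel("left side of our penalty area",
                              "our left penalty area", 1.0))  # 11.0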

Page 105: A Kernel-based Approach to Learning Semantic Parsers

105

Extra: KRISP’s Training Algorithm contd.

• What if none of the ω most probable derivations of a sentence is correct?

• The extended Earley’s algorithm can be forced to produce only correct derivations by making sure every subtree it generates exists in the correct MR parse

Page 106: A Kernel-based Approach to Learning Semantic Parsers

106

Extra: N-best MRs for Geo880

Page 107: A Kernel-based Approach to Learning Semantic Parsers

107

Extra: KRISP’s Average Running Times

Corpus   Average Training Time (min)   Average Testing Time (min)
Geo250   1.44                          0.05
Geo880   18.1                          0.65
CLang    58.85                         3.18

Average running times per fold, in minutes, taken by KRISP.

Page 108: A Kernel-based Approach to Learning Semantic Parsers

108

Extra: KRISP’s Learning PR Curves on CLang

Page 109: A Kernel-based Approach to Learning Semantic Parsers

109

Extra: KRISP’s Learning PR Curves on Geo250

Page 110: A Kernel-based Approach to Learning Semantic Parsers

110

Extra: KRISP’s Learning PR Curves on Geo880

Page 111: A Kernel-based Approach to Learning Semantic Parsers

111

Extra: Experimental Methodology

• Correctness
  – CLang: output exactly matches the correct representation
  – Geoquery: the resulting query retrieves the same answer as the correct representation

If the ball is in our penalty area, all our players except player 4 should stay in our half.

Correct: ((bpos (penalty-area our)) (do (player-except our{4}) (pos (half our))))
Output:  ((bpos (penalty-area opp)) (do (player-except our{4}) (pos (half our))))

Page 112: A Kernel-based Approach to Learning Semantic Parsers

112

Extra: Formal Language Grammar

NL: If our player 4 has the ball, our player 4 should shoot.CLang: ((bowner our {4}) (do our {4} shoot)) CLang Parse:

• Non-terminals: RULE, CONDITION, ACTION, …
• Terminals: bowner, our, 4, …
• Productions: RULE → CONDITION DIRECTIVE, DIRECTIVE → do TEAM UNUM ACTION, ACTION → shoot, …

[CLang parse tree: RULE over CONDITION (bowner TEAM(our) UNUM(4)) and DIRECTIVE (do TEAM(our) UNUM(4) ACTION(shoot))]

