The Leaf Projection Path View of Parse Trees: Exploring String Kernels for HPSG Parse Selection

The Leaf Projection Path View of Parse Trees: Exploring String Kernels for HPSG Parse Selection Kristina Toutanova, Penka Markova, Christopher Manning Computer Science Department Stanford University
Transcript
Page 1: The Leaf Projection Path View of Parse Trees: Exploring String Kernels for HPSG Parse Selection

The Leaf Projection Path View of Parse Trees: Exploring String Kernels for HPSG Parse Selection

Kristina Toutanova, Penka Markova, Christopher Manning

Computer Science Department, Stanford University

Page 2: The Leaf Projection Path View of Parse Trees: Exploring String Kernels for HPSG Parse Selection

Motivation: the task

“I would like to meet with you again on Monday”

Input: a sentence

Classify to one of the possible parses

Focus on discriminating among parses

Page 3: The Leaf Projection Path View of Parse Trees: Exploring String Kernels for HPSG Parse Selection

Motivation: traditional representation of parse trees

Features are pieces of local rule productions, with grandparenting

When using plain context free rules most features make no reference to the input string – naive for a discriminative model!

Lexicalization with the head word introduces more connection to the input

[Figure: parse tree lexicalized with head words, e.g. “meet”, “to”, “on”]

Page 4: The Leaf Projection Path View of Parse Trees: Exploring String Kernels for HPSG Parse Selection

Motivation: traditional representation of parse trees

All subtrees representation: features are (a restricted kind of) subtrees of the original tree

One must choose features or discount larger trees

Page 5: The Leaf Projection Path View of Parse Trees: Exploring String Kernels for HPSG Parse Selection

General idea: representation

Provides a broader view of tree contexts

Increases the connection to the input string (words)

Captures non-head dependencies, as in “more careful than his sister” (Bod 98)

Trees are lists of leaf projection paths

The non-head path is included in addition to the head path

Each node is lexicalized with all words dominated by it

Trees must be binarized

Page 6: The Leaf Projection Path View of Parse Trees: Exploring String Kernels for HPSG Parse Selection

General idea: tree kernels

Often only a kernel (a similarity measure) between trees is necessary for ML algorithms.

Measure the similarity between trees by the similarity between projection paths of common words/pos tags in the trees.

Page 7: The Leaf Projection Path View of Parse Trees: Exploring String Kernels for HPSG Parse Selection

General idea: tree kernels from string kernels

Measures of similarity between sequences (strings) have been developed for many domains.

Use string kernels between projection paths and combine them into a tree kernel via a convolution.

This gives rise to interesting features and to more global modeling of the syntactic environment of words.

[Figure: SIM, the similarity between two projection paths of “meet” (sequences of S, VP, VP-NF nodes), illustrated across two trees]

Page 8: The Leaf Projection Path View of Parse Trees: Exploring String Kernels for HPSG Parse Selection

Overview

HPSG syntactic analyses representation

Illustration of the leaf projection paths representation

Comparison to traditional rule representation: experimental results

Tree kernels from string kernels on projection paths

Experimental results

Page 9: The Leaf Projection Path View of Parse Trees: Exploring String Kernels for HPSG Parse Selection

HPSG tree representation: derivation trees

[Figure: derivation tree for “let us plan on that”, with rule-name nodes IMPER, HCOMP, LET_V1, US, PLAN_ON_V2, ON, THAT_DEIX]

HPSG (Head-Driven Phrase Structure Grammar) is a lexicalized, unification-based grammar; we use the ERG grammar of English.

Node labels are rule names such as head-complement and head-adjunct.

The inventory of rules is larger than in traditional HPSG grammars.

Full HPSG signs can be recovered from the derivation trees using the grammar.

We use annotated derivation trees as the main representation for disambiguation.

Page 10: The Leaf Projection Path View of Parse Trees: Exploring String Kernels for HPSG Parse Selection

HPSG tree representation: annotation of nodes

[Figure: the same derivation tree, with each node annotated with the value of synsem.local.cat.head]

Annotation with the value of synsem.local.cat.head: its values are a small set of part-of-speech tags, e.g. verb on the verbal nodes and prep* on the ON node.

Page 11: The Leaf Projection Path View of Parse Trees: Exploring String Kernels for HPSG Parse Selection

HPSG tree representation: syntactic word classes

The word classes are around 500 types in the HPSG type hierarchy. They encode detailed syntactic information, including e.g. subcategorization.

Our representation heavily uses word classes to back off from words.

[Figure: lexical item ids (LET_V1, US, PLAN_ON_V2, ON, THAT_DEIX) and word class types (v_sorb, n_pers_pro, v_empty_prep_intrans, p_reg, n_deictic_pro) for “let us plan on that”]

Page 12: The Leaf Projection Path View of Parse Trees: Exploring String Kernels for HPSG Parse Selection

Leaf projection paths representation

[Figure: the derivation tree with head annotations and word classes for “let us plan on that”]

The tree is represented as a list of paths from the words to the top.

The paths are keyed by words and corresponding word classes.

The head and non-head paths are treated separately.

[Example: head path of “let” (word class v_sorb): START, LET_V1:verb, HCOMP:verb, HCOMP:verb, IMPER:verb, END; non-head path of “let”: START, END]

Page 13: The Leaf Projection Path View of Parse Trees: Exploring String Kernels for HPSG Parse Selection

Leaf projection paths representation

[Figure: the same derivation tree]

The tree is represented as a list of paths from the words to the top.

The paths are keyed by words and corresponding word classes.

The head and non-head paths are treated separately.

[Example: head path of “plan” (word class v_empty_prep_intrans): START, PLAN_ON:verb, HCOMP:verb, END; non-head path of “plan”: START, HCOMP:verb, IMPER:verb, END]

Page 14: The Leaf Projection Path View of Parse Trees: Exploring String Kernels for HPSG Parse Selection

Leaf projection paths representation

[Figure: the same derivation tree]

Local rules can be recovered by annotating nodes with sister and parent categories.

Now extract features from this representation for discriminative models.

Page 15: The Leaf Projection Path View of Parse Trees: Exploring String Kernels for HPSG Parse Selection

Overview

HPSG syntactic analyses representation

Illustration of the leaf projection paths representation

Comparison to traditional rule representation: experimental results

Tree kernels from string kernels on projection paths

Experimental results

Page 16: The Leaf Projection Path View of Parse Trees: Exploring String Kernels for HPSG Parse Selection

Machine learning task setup

Given $m$ training sentences: sentence $s_i$ has $p_i$ possible analyses, giving training examples $(s_i, \Phi(t_{i,1}), \ldots, \Phi(t_{i,p_i}))$, where $t_{i,1}$ is the correct analysis.

Learn a parameter vector $w$ and choose for a test sentence the tree $t$ with the maximum score $w \cdot \Phi(t)$.

Linear models, e.g. (Collins 00).

Page 17: The Leaf Projection Path View of Parse Trees: Exploring String Kernels for HPSG Parse Selection

Choosing the parameter vector

Previous formulations (Collins 01; Shen and Joshi 03):

$$\min_{w,\,\xi} \ \frac{1}{2}\, w \cdot w \;+\; C \sum_{i,j} \xi_{i,j}$$

$$\text{s.t. } \forall i, j:\ \ w \cdot \big(\Phi(t_{i,1}) - \Phi(t_{i,j})\big) \geq 1 - \xi_{i,j}, \qquad \xi_{i,j} \geq 0$$

We solve this problem using SVMLight for ranking. For all models we extract all features from the kernel’s feature map and solve the problem with a linear kernel.
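One standard way to realize such a ranking formulation (a hedged sketch, not the paper's actual SVMLight pipeline) is to turn each constraint into a difference vector Phi(t_i1) - Phi(t_ij), which any linear SVM can then separate from the origin.

```python
# Build the difference vectors w must separate: for each sentence,
# subtract every competitor's feature vector from the correct tree's.
# Feature vectors are sparse dicts; candidates[0] is the correct tree.

def difference_vectors(candidates):
    correct = candidates[0]
    diffs = []
    for other in candidates[1:]:
        keys = set(correct) | set(other)
        diffs.append({k: correct.get(k, 0.0) - other.get(k, 0.0)
                      for k in keys})
    return diffs
```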

Page 18: The Leaf Projection Path View of Parse Trees: Exploring String Kernels for HPSG Parse Selection

The leaf projection paths view versus the context free rule view

Goals:

Compare context-free rule models to projection path models

Evaluate the usefulness of non-head paths

Models

Projection paths:

Bi-gram model on projection paths (2PP)

Bi-gram model on head projection paths only (2HeadPP)

Context-free rules:

Joint rule model (J-Rule)

Independent rule model (I-Rule)

Page 19: The Leaf Projection Path View of Parse Trees: Exploring String Kernels for HPSG Parse Selection

The leaf projection paths view versus the context free rule view

2PP has as features bi-grams from the projection paths. Features of 2PP including the node HCOMP:

[Figure: the derivation tree with head annotations and word classes]

plan (head path): [v_empty_prep_intrans, PLAN_ON_V2, HCOMP, head], [v_empty_prep_intrans, HCOMP, END, head]
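Extraction of these bi-gram features can be sketched as follows; the tuple encoding and the START/END padding are assumptions that mirror the bracketed features above.

```python
# Extract 2PP-style bi-gram features from one projection path, keyed
# by the word class and a head/non-head flag (illustrative encoding).

def bigram_features(word_class, path, head=True):
    """path: node labels from the word to the root, padded with START/END."""
    flag = "head" if head else "non-head"
    return [(word_class, a, b, flag) for a, b in zip(path, path[1:])]

feats = bigram_features("v_empty_prep_intrans",
                        ["START", "PLAN_ON_V2", "HCOMP", "END"])
# includes ("v_empty_prep_intrans", "PLAN_ON_V2", "HCOMP", "head")
```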


Page 21: The Leaf Projection Path View of Parse Trees: Exploring String Kernels for HPSG Parse Selection

The leaf projection paths view versus the context free rule view

2PP has as features bi-grams from the projection paths. Features of 2PP including the node HCOMP:

[Figure: the derivation tree with head annotations and word classes]

on (non-head path): [p_reg, START, HCOMP, non-head], [p_reg, HCOMP, HCOMP, non-head]


Page 23: The Leaf Projection Path View of Parse Trees: Exploring String Kernels for HPSG Parse Selection

The leaf projection paths view versus the context free rule view

2PP has as features bi-grams from the projection paths. Features of 2PP including the node HCOMP:

[Figure: the derivation tree with head annotations and word classes]

that (non-head path): [n_deictic_pro, HCOMP, HCOMP, non-head], [n_deictic_pro, HCOMP, HCOMP, non-head]


Page 25: The Leaf Projection Path View of Parse Trees: Exploring String Kernels for HPSG Parse Selection

The leaf projection paths view versus the context free rule view

I-Rule has as features edges of the tree, annotated with the word class of the child and head vs. non-head information.

Features of I-Rule including the node HCOMP:

[Figure: the derivation tree with head annotations and word classes]

[v_empty_prep_intrans, PLAN_ON_V2, HCOMP, head]

Page 26: The Leaf Projection Path View of Parse Trees: Exploring String Kernels for HPSG Parse Selection

The leaf projection paths view versus the context free rule view

I-Rule has as features edges of the tree, annotated with the word class of the child and head vs. non-head information.

Features of I-Rule including the node HCOMP:

[p_reg, HCOMP, HCOMP, non-head]

[Figure: the derivation tree with head annotations and word classes]

Page 27: The Leaf Projection Path View of Parse Trees: Exploring String Kernels for HPSG Parse Selection

The leaf projection paths view versus the context free rule view

I-Rule has as features edges of the tree, annotated with the word class of the child and head vs. non-head information.

Features of I-Rule including the node HCOMP:

[v_empty_prep_intrans, HCOMP, HCOMP, non-head]

[Figure: the derivation tree with head annotations and word classes]

Page 28: The Leaf Projection Path View of Parse Trees: Exploring String Kernels for HPSG Parse Selection

Comparison results

Redwoods corpus: 3829 ambiguous sentences; average number of words 7.8; average ambiguity 10.8. 10-fold cross-validation; we report exact match accuracy.

[Chart: exact match accuracy: 2HeadPP 80.14, J-Rule 80.99, I-Rule 81.07, 2PP 82.70]

Non-head paths are useful (13% relative error reduction from head-only).

The bi-gram model on projection paths performs better than a very similar local rule based model.

Page 29: The Leaf Projection Path View of Parse Trees: Exploring String Kernels for HPSG Parse Selection

Overview

HPSG syntactic analyses representation

Illustration of the leaf projection paths representation

Comparison to traditional rule representation: experimental results

Tree kernels from string kernels on projection paths

Experimental results

Page 30: The Leaf Projection Path View of Parse Trees: Exploring String Kernels for HPSG Parse Selection

String kernels on projection paths

We looked at a bi-gram model on projection paths (2PP).

This is a special case of a string kernel (n-gram kernel).

We could use more general string kernels on projection paths: existing ones that handle non-contiguous substrings or more complex matching of nodes.

It is straightforward to combine them into tree kernels.

Page 31: The Leaf Projection Path View of Parse Trees: Exploring String Kernels for HPSG Parse Selection

Formal representation of parse trees

A tree $t$ is represented as a list of keyed strings: $t = [(key_1, x_1), \ldots, (key_m, x_m)]$

key1 = let (head), x1 = “START LET_V1:verb HCOMP:verb HCOMP:verb IMPER:verb END”

key2 = v_sorb (head), x2 = x1

key3 = let (non-head), x3 = “START END”

key4 = v_sorb (non-head), x4 = x3

[Figure: the head and non-head projection paths of “let”]
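The keyed-string representation can be sketched directly from the example; the data-structure choice here (a list of (key, string) pairs, with the head/non-head flag folded into the key) is an assumption.

```python
# Build the keyed projection paths contributed by one word: each path
# appears once keyed by the word itself and once by its word class.

def keyed_paths(word, word_class, head_path, non_head_path):
    return [
        ((word, "head"), head_path),
        ((word_class, "head"), head_path),
        ((word, "non-head"), non_head_path),
        ((word_class, "non-head"), non_head_path),
    ]

t = keyed_paths(
    "let", "v_sorb",
    "START LET_V1:verb HCOMP:verb HCOMP:verb IMPER:verb END",
    "START END")
```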

Page 32: The Leaf Projection Path View of Parse Trees: Exploring String Kernels for HPSG Parse Selection

Tree kernels using string kernels on projection paths

$t = [(key_1, x_1), \ldots, (key_m, x_m)]$, \quad $t' = [(key'_1, x'_1), \ldots, (key'_n, x'_n)]$

$$KT(t, t') = \sum_{i=1}^{m} \sum_{j=1}^{n} KP\big((key_i, x_i), (key'_j, x'_j)\big)$$

$$KP\big((key, x), (key', y)\big) = \begin{cases} K(x, y), & \text{if } key = key' \\ 0, & \text{otherwise} \end{cases}$$
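A minimal sketch of this convolution, with a trivial 1-gram kernel standing in for K (the actual models use the richer string kernels described later):

```python
from collections import Counter

def string_kernel(x, y):
    """Trivial 1-gram kernel: dot product of node-label counts."""
    cx, cy = Counter(x.split()), Counter(y.split())
    return sum(n * cy[s] for s, n in cx.items())

def tree_kernel(t, t_prime):
    """KT: sum the string kernel over all pairs of equally-keyed paths."""
    return sum(string_kernel(x, y)
               for key, x in t
               for key2, y in t_prime
               if key == key2)
```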

Page 33: The Leaf Projection Path View of Parse Trees: Exploring String Kernels for HPSG Parse Selection

String kernels overview

Define string kernels by their feature map $\Phi$ from strings to vectors indexed by feature indices.

Example: 1-gram kernel. For the node sequence “START LET_V1 HCOMP HCOMP IMPER END”:

$\Phi = (\text{START: } 1,\ \text{LET\_V1: } 1,\ \text{HCOMP: } 2,\ \text{IMPER: } 1,\ \text{END: } 1)$
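The 1-gram feature map is just a count of node labels; a Counter reproduces the example vector above:

```python
from collections import Counter

def unigram_phi(path):
    """1-gram feature map: each node label's occurrence count."""
    return Counter(path.split())

phi = unigram_phi("START LET_V1 HCOMP HCOMP IMPER END")
# phi == {"START": 1, "LET_V1": 1, "HCOMP": 2, "IMPER": 1, "END": 1}
```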

Page 34: The Leaf Projection Path View of Parse Trees: Exploring String Kernels for HPSG Parse Selection

Repetition kernel

General idea: improve on the 1-gram kernel by better handling repeated symbols.

“He eats chocolate from Belgium with fingers.” The head path of “eats” under high attachment is (NP PP PP NP).

Rather than the feature for PP having twice as much weight, there should be a separate feature indicating that there are two PPs.

The feature space is indexed by strings $a, aa, aaa, \ldots$ of repeated symbols; there are two discount factors, $\lambda_1$ for gaps and $\lambda_2$ for letters.

Example: if $\lambda_1 = \lambda_2 = .5$, then $\phi_{PP}(\text{NP PP PP NP}) = 1$ and $\phi_{PP\,PP}(\text{NP PP PP NP}) = .5$.

Page 35: The Leaf Projection Path View of Parse Trees: Exploring String Kernels for HPSG Parse Selection

The Repetition kernel versus 1-gram and 2-gram

[Chart: accuracy: 1-gram (44,278 features) 82.21; Repetition (52,994 features) 83.59; 2-gram (104,331 features) 84.15]

Repetition achieves 7.8% error reduction from 1-gram

Page 36: The Leaf Projection Path View of Parse Trees: Exploring String Kernels for HPSG Parse Selection

Other string kernels

So far: 1-gram, 2-gram, repetition

Next: allow general discontinuous n-grams (restricted subsequence kernel)

Also: allow partial matching (wildcard kernel, allowing a wild-card character in the n-gram features; the wildcard matches any character)

(Lodhi et al. 02; Leslie and Kuang 03)

Page 37: The Leaf Projection Path View of Parse Trees: Exploring String Kernels for HPSG Parse Selection

Restricted subsequence kernel

Parameters: $k$, the maximum size of the feature n-gram; $g$, the maximum span in the string; $\lambda_1$, the gap penalty; and $\lambda_2$, the letter penalty.

Example, for the node sequence “START LET_V1 HCOMP HCOMP IMPER END” with $k=2$, $g=5$, $\lambda_1 = .5$, $\lambda_2 = 1$:

1-gram features: START 1, LET_V1 1, HCOMP 2, IMPER 1, END 1

2-gram features include $\phi_{\text{HCOMP IMPER}} = 1.5$ (one contiguous occurrence plus one with a gap of one, discounted by $\lambda_1$) and $\phi_{\text{START END}} = 0$ (its span of 6 exceeds $g = 5$).
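For k = 2 the feature map can be sketched by brute force (a hedged illustration; efficient computation uses dynamic programming as in Lodhi et al. 02):

```python
from collections import defaultdict

def subseq2_phi(tokens, g=5, lam1=0.5, lam2=1.0):
    """Feature map of the restricted subsequence kernel for pairs (k=2):
    each pair (i, j) with span j - i + 1 <= g contributes
    lam1 ** gap * lam2 ** 2 to the feature (tokens[i], tokens[j])."""
    phi = defaultdict(float)
    for i in range(len(tokens)):
        for j in range(i + 1, min(len(tokens), i + g)):
            phi[(tokens[i], tokens[j])] += lam1 ** (j - i - 1) * lam2 ** 2
    return dict(phi)

phi = subseq2_phi("START LET_V1 HCOMP HCOMP IMPER END".split())
# phi[("HCOMP", "IMPER")] == 1.5; ("START", "END") is absent (span 6 > 5)
```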

Page 38: The Leaf Projection Path View of Parse Trees: Exploring String Kernels for HPSG Parse Selection

Varying the string kernels on word class keyed paths

[Chart: accuracy: 1-gram (13K features) 81.43; 2-gram (37K) 82.70; subseq(2,3,.50,2) (81K) 83.22; subseq(2,3,.25,2) (81K) 83.48; subseq(2,4,.50,2) (102K) 83.29; subseq(3,5,.25,2) (416K) 83.06]

Page 39: The Leaf Projection Path View of Parse Trees: Exploring String Kernels for HPSG Parse Selection

Varying the string kernels on word class keyed paths

[Chart: the same accuracies as on the previous slide]

Increasing the amount of discontinuity or adding larger n-grams did not help

Page 40: The Leaf Projection Path View of Parse Trees: Exploring String Kernels for HPSG Parse Selection

Adding word keyed paths

The kernel for word keyed paths was fixed to 2-gram + repetition.

[Chart: accuracy with word-class-keyed paths only vs. word classes + words: subseq(2,3,.5,2): 83.22 vs 84.96; subseq(2,3,.25,2): 83.48 vs 84.75; subseq(2,4,.5,2): 83.29 vs 84.40]

Best previous result from a single classifier: 82.7 (mostly local rule based). Relative error reduction is 13%.

Page 41: The Leaf Projection Path View of Parse Trees: Exploring String Kernels for HPSG Parse Selection

Other models and model combination

Many features are available in the HPSG signs. A single model is likely to over-fit when given too many features. To better use the additional information, we train several classifiers and combine them by voting.

[Chart: accuracy: best single model 84.96; model combination 85.4]

Best previous result from voting classifiers is 84.23% (Osborne & Baldridge 04)
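The voting combination can be sketched as a simple majority vote over each classifier's top parse (an illustrative scheme; the exact voting rule is not detailed on the slide):

```python
from collections import Counter

def vote(top_choices):
    """Return the parse picked by the most classifiers
    (ties break by first appearance)."""
    return Counter(top_choices).most_common(1)[0][0]

winner = vote(["tree_1", "tree_2", "tree_1"])
# winner == "tree_1"
```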

Page 42: The Leaf Projection Path View of Parse Trees: Exploring String Kernels for HPSG Parse Selection

Conclusions and future work

Summary

We presented a new representation of parse trees leading to a tree kernel.

It allows the modeling of more global tree contexts as well as greater lexicalization.

We demonstrated gains from applying existing string kernels on projection paths, and new kernels useful for the domain (the Repetition kernel).

The major gains were due to the representation.

Future work

Other sequence kernels better suited for the task

Feature selection: which words / word classes deserve better modeling of their leaf paths

Other corpora

