Page 1: Support Vector Machines and Kernel Methods for Robust Semantic NL Processing (twiki.di.uniroma1.it/pub/NLP/WebHome/ML4NLP2.pdf)

Support Vector Machines and Kernel Methods for Robust

Semantic NL Processing

Roberto Basili(1), Alessandro Moschitti(2)

(1) DISP, Università di Roma Tor Vergata, (2) Università di Trento

Page 2:

Overview

• Theory and practice of Support Vector Machines

• Kernel for HLTs

– Tree Kernels

• Semantic Role Labeling

– Linear Features

– The role of Syntax

Page 3:

Predicate and Arguments

[Parse tree: [S [NP [N Paul]] [VP [V gives] [NP [D a] [N lecture]] [PP [IN in] [N Rome]]]]; "gives" is the predicate, "Paul" is Arg. 0, "a lecture" is Arg. 1, "in Rome" is Arg. M.]

• The syntax-semantics mapping

• Different semantic annotations (e.g. PropBank vs. FrameNet)

Page 4:

Linking syntax to semantics

[Parse tree: [S [NP [N Police]] [VP [V arrested] [NP [Det the] [N man]] [PP [IN for] [N shoplifting]]]]; the frame is Arrest, with frame elements Authority = "Police", Suspect = "the man", Offense = "shoplifting".]

• Police arrested the man for shoplifting

Page 5:

Semantics in NLP: Resources

• Lexicalized models

– PropBank

– NomBank

• FrameNet

– Inspired by frame semantics

– Frames are lexicalized prototypes for real-world situations

– Participants are called frame elements (roles)

Page 6:

Generative vs. Discriminative Learning in NLP

• Generative models (e.g. HMMs) require:

– The design of a model of visible and hidden variables

– The definition of laws of association between hidden and visible variables

– Robust estimation methods from the available samples

• Limitations:

– Lack of precise generative models for language phenomena

– Data sparseness: most language phenomena are simply too rare for robust estimation, even in large samples

Page 7:

Generative vs. Discriminative Learning

• Discriminative models are not tied to any model (i.e. to a specific association among the problem variables)

• They learn to discriminate negative from positive evidence without building an explicit model of the target property

• They derive useful evidence from the training data only through the observed individual features, by optimizing some function of the recognition task (e.g. the error)

• Examples of discriminative models are the perceptrons (i.e. linear classifiers)

Page 8:

Linear Classifiers (1)

A hyperplane has equation:

  f(x) = ⟨w, x⟩ + b,  with w, x ∈ Rⁿ and b ∈ R

• x is the vector of the instance to be classified

• w is the hyperplane gradient

Classification function:

  h(x) = sign(f(x))

Page 9:

Linear Classifiers (2)

• Computationally simple.

• Basic idea: select a hypothesis that makes no mistakes over the training set.

• The separating function is equivalent to a neural net with just one neuron (a perceptron)

Page 10:

A neuron

Page 11:

Perceptron

  h(x) = sgn( Σ_{i=1..n} w_i x_i + b )

Page 12:

Geometric Margin

Page 13:

Geometrical margin vs. training set margin

[Figure: the two classes of points (x and o), the geometric margin of a single example x_i, and the margin of the hyperplane over the whole training set.]

Geometric margin vs. data points in the training set

Page 14:

Maximal margin vs other margins

Page 15:

Perceptron: on-line algorithm

  w_0 = 0; b_0 = 0; k = 0; R = max_{1≤i≤l} ||x_i||
  repeat
    for i = 1 to l
      if y_i (⟨w_k, x_i⟩ + b_k) ≤ 0 then      (classification error)
        w_{k+1} = w_k + η y_i x_i              (adjustments)
        b_{k+1} = b_k + η y_i R²
        k = k + 1
      endif
    endfor
  until no error is found
  return (k, w_k, b_k)
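The algorithm above can be sketched in plain Python (a minimal illustration: the learning rate is fixed at η = 1, since it only rescales the hyperplane, and the toy data and epoch cap are assumptions of this example):

```python
# On-line primal perceptron: repeat over the data, update on every mistake.

def perceptron_train(X, y, max_epochs=100):
    """Returns (k, w, b), where k counts the number of updates made."""
    n = len(X[0])
    w = [0.0] * n
    b = 0.0
    k = 0
    R = max(sum(x_j ** 2 for x_j in x) ** 0.5 for x in X)  # R = max ||x_i||
    for _ in range(max_epochs):
        errors = False
        for x, y_i in zip(X, y):
            # Mistake: y_i (<w, x> + b) <= 0
            if y_i * (sum(w_j * x_j for w_j, x_j in zip(w, x)) + b) <= 0:
                w = [w_j + y_i * x_j for w_j, x_j in zip(w, x)]
                b += y_i * R ** 2
                k += 1
                errors = True
        if not errors:          # one full clean pass: converged
            break
    return k, w, b

# Toy linearly separable data.
X = [[1.0, 1.0], [0.0, 1.0], [3.0, 3.0], [2.0, 3.0]]
y = [-1, -1, +1, +1]
k, w, b = perceptron_train(X, y)
sign = lambda v: 1 if v > 0 else -1
assert all(sign(sum(wj * xj for wj, xj in zip(w, x)) + b) == t
           for x, t in zip(X, y))
```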

Page 16:

Perceptron: the hyperplane coefficients

Page 17:

The on-line perceptron algorithm: mistakes and adjustments (1)

Page 18:

The on-line perceptron algorithm: mistakes and adjustments (2)

Page 19:

Duality

The decision function of linear classifiers can be written as follows:

  h(x) = sgn(⟨w, x⟩ + b) = sgn( Σ_{j=1..l} α_j y_j ⟨x_j, x⟩ + b )

as well as the adjustment function:

  if y_i ( Σ_{j=1..l} α_j y_j ⟨x_j, x_i⟩ + b ) ≤ 0 then α_i = α_i + 1

The learning rate impacts only the re-scaling of the hyperplanes and does not influence the algorithm, so η = 1 can be used.

Training data only appear in the scalar products!

Page 20:

Which hyperplane?

Page 21:

Maximum Margin Hyperplanes

[Figure: two separating hyperplanes in the (Var1, Var2) plane, each shown with its margin.]

IDEA: select the hyperplane that maximizes the margin

Page 22:

Support Vectors

[Figure: the maximal-margin hyperplane in the (Var1, Var2) plane; the points lying on the margin are the support vectors.]

Page 23:

How to get the maximum margin?

[Figure: the hyperplanes ⟨w, x⟩ + b = k, ⟨w, x⟩ + b = 0 and ⟨w, x⟩ + b = −k in the (Var1, Var2) plane.]

The geometric margin is:  2k / ||w||

Optimization problem:

  max  2k / ||w||
  subject to:  ⟨w, x⟩ + b ≥ k,  if x is a positive example
               ⟨w, x⟩ + b ≤ −k, if x is a negative example

Page 24:

The optimization problem

• The optimal hyperplane satisfies:

– Minimize  (1/2) ⟨w, w⟩ = (1/2) ||w||²

– Under:  y_i (⟨w, x_i⟩ + b) ≥ 1,  i = 1, ..., l

• The dual problem is simpler

Page 25:

Dual optimization problem

Page 26:

Some consequences

• Lagrange constraints:

  w = Σ_{i=1..l} α_i y_i x_i    Σ_{i=1..l} α_i y_i = 0

• Karush-Kuhn-Tucker constraints:

  α_i [ y_i (⟨w, x_i⟩ + b) − 1 ] = 0,  i = 1, ..., l

• The support vectors are the x_i with non-null α_i, i.e. such that y_i (⟨w, x_i⟩ + b) = 1. They lie on the frontier

• b is derived from any support vector x_i:  b = y_i − ⟨w, x_i⟩

Page 27:

Non linearly separable training data

[Figure: the hyperplanes ⟨w, x⟩ + b = 1, ⟨w, x⟩ + b = 0 and ⟨w, x⟩ + b = −1 in the (Var1, Var2) plane, with margin 1/||w|| and a misclassified point at distance ξ_i.]

Slack variables ξ_i are introduced: mistakes are allowed, and the optimization function is penalized.

Page 28:

Soft Margin SVMs

[Figure: as before, with a slack ξ_i measuring how much each example violates the margin.]

New constraints:

  y_i (⟨w, x_i⟩ + b) ≥ 1 − ξ_i,  ∀ x_i
  ξ_i ≥ 0

Objective function:

  min  (1/2) ||w||² + C Σ_i ξ_i

C is the trade-off between margin and errors.
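The objective above can be illustrated with a small subgradient-descent sketch on its equivalent hinge-loss form (this is not the dual QP that SVM solvers actually use; the data, learning rate and epoch count are assumptions of this example):

```python
# Subgradient descent on (1/2)||w||^2 + C * sum_i max(0, 1 - y_i(<w,x_i> + b)),
# the unconstrained form of the soft-margin objective, with xi_i as the hinge.

def soft_margin_sgd(X, y, C=1.0, lr=0.01, epochs=500):
    n = len(X[0])
    w = [0.0] * n
    b = 0.0
    for _ in range(epochs):
        for x, y_i in zip(X, y):
            margin = y_i * (sum(wj * xj for wj, xj in zip(w, x)) + b)
            if margin < 1:   # slack xi_i > 0: the hinge term is active
                w = [wj - lr * (wj - C * y_i * xj) for wj, xj in zip(w, x)]
                b += lr * C * y_i
            else:            # xi_i = 0: only the regularizer acts
                w = [wj * (1 - lr) for wj in w]
    return w, b

# Symmetric toy data, so the optimal bias is near 0.
X = [[-2.0, -1.0], [-1.0, -2.0], [2.0, 1.0], [1.0, 2.0]]
y = [-1, -1, +1, +1]
w, b = soft_margin_sgd(X, y)
assert all((sum(wj * xj for wj, xj in zip(w, x)) + b > 0) == (t > 0)
           for x, t in zip(X, y))
```

Raising C makes margin violations more expensive (toward the hard-margin solution); C = 0 ignores the data entirely, as the next slide notes.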

Page 29:

Dual optimization problem

Page 30:

Soft Margin Support Vector Machines

  min  (1/2) ||w||² + C Σ_i ξ_i
  subject to:  y_i (⟨w, x_i⟩ + b) ≥ 1 − ξ_i,  ξ_i ≥ 0

• The algorithm tries to keep ξ_i at 0 while maximizing the margin 1/||w||

• Note: the algorithm does not minimize the number of errors (an NP-complete problem) but just minimizes the sum of the distances ξ_i from the margin hyperplanes

• If C → ∞, the solution is the hard-margin one

• If C = 0 we get w = 0; in fact it is always possible to satisfy y_i b ≥ 1 − ξ_i

• As C grows the number of errors decreases; the error goes to 0 when C → ∞ (i.e. the hard-margin formulation)

Page 31:

Robustness: Soft vs. Hard Margin SVMs

[Figure: the same data in the (Var1, Var2) plane, separated by a Soft Margin SVM (left, absorbing an outlier with slack ξ_i) and by a Hard Margin SVM (right, whose hyperplane ⟨w, x⟩ + b = 0 is dragged by the outlier).]

Soft Margin SVM vs. Hard Margin SVM

Page 32:

Soft vs. Hard Margin SVMs

• A Soft-Margin SVM always has a solution

• A Soft-Margin SVM is more robust with respect to odd training examples

– Insufficient vocabularies

– Highly ambiguous linguistic features

• A Hard-Margin SVM requires no parameter

Page 33:

References

• R. Basili and A. Moschitti, Automatic Text Categorization: From Information Retrieval to Support Vector Learning, Aracne Editrice, ISBN 88-548-0292-1, 2005.

• C. J. C. Burges, A Tutorial on Support Vector Machines for Pattern Recognition. URL: http://www.umiacs.umd.edu/~joseph/support-vector-machines4.pdf

• E. D. Sontag, The Vapnik-Chervonenkis Dimension and the Learning Capability of Neural Nets. URL: http://www.math.rutgers.edu/~sontag/FTP_DIR/vc-expo.pdf

• S. A. Goldman (Washington University, St. Louis, Missouri), Computational Learning Theory. URL: http://www.learningtheory.org/articles/COLTSurveyArticle.ps

• N. Cristianini and J. Shawe-Taylor, An Introduction to Support Vector Machines (and Other Kernel-Based Learning Methods), Cambridge University Press.

• V. N. Vapnik, The Nature of Statistical Learning Theory, Springer Verlag, 1999.

Page 34:
Page 35:

Semantic Role Labeling @ UTV

• An important application of SVMs is Semantic Role Labeling with respect to PropBank or FrameNet

• In the UTV system, a cascade of classification steps is applied:

– Predicate detection

– Boundary recognition

– Argument categorization (Local models)

– Reranking (Joint models)

• Input: a sentence and its parse trees

Page 36:

Linking syntax to semantics

[Parse tree: [S [NP [N Police]] [VP [V arrested] [NP [Det the] [N man]] [PP [IN for] [N shoplifting]]]]; the frame is Arrest, with frame elements Authority = "Police", Suspect = "the man", Offense = "shoplifting".]

• Police arrested the man for shoplifting

Page 37:

Motivations

• Modeling syntax in natural language learning tasks is complex, e.g.:

– Semantic role relations within predicate-argument structures, and

– Question classification

• Tree kernels are a natural way to exploit syntactic information from sentence parse trees

– useful to engineer novel and complex features

• How do different tree kernels impact on different parsing paradigms and different tasks?

• Are they efficient in practical applications?

Page 38:

Tree kernels: Outline

• Tree kernel types

– The SubSet Tree (SST) kernel

– The Partial Tree (PT) kernel

Page 39:

The Collins and Duffy’s Tree Kernel (called SST in [Vishwanathan and Smola, 2002])

[Figure: the tree [VP [V gives] [NP [D a] [N talk]]] and some of its subset-tree fragments, e.g. [VP V [NP [D a] N]], [VP V [NP D N]], [VP V NP], [VP V].]

Page 40:

The overall fragment set

[Figure: the full SST fragment set of [VP [V gives] [NP [D a] [N talk]]]: each fragment expands every node either with its complete production or not at all, e.g. [NP [D a] [N talk]], [NP D N], [D a], [N talk], [VP [V gives] NP], [VP V NP], [VP V], ...]

Page 41:

Explicit feature space

  x = (0, ..., 1, ..., 1, ..., 1, ..., 0, ..., 0, ..., 1, ..., 1, ..., 0, ...)

where each dimension indicates the presence of one fragment in the tree, e.g. [NP [D a] [N talk]], [NP [D a] N], [NP D N], [VP V [NP [D a] [N talk]]], ...

• ⟨x1, x2⟩ counts the number of common substructures

Page 42:

Implicit Representation

  ⟨x1, x2⟩ = ⟨φ(T1), φ(T2)⟩ = K(T1, T2) = Σ_{n1 ∈ T1} Σ_{n2 ∈ T2} Δ(n1, n2)

Page 43:

Implicit Representation

• [Collins and Duffy, ACL 2002] evaluate K in O(n²):

  Δ(n1, n2) = 0, if the productions at n1 and n2 are different
  Δ(n1, n2) = 1, if n1 and n2 are pre-terminals (with the same production)
  Δ(n1, n2) = Π_{j=1..nc(n1)} ( 1 + Δ(ch(n1, j), ch(n2, j)) ), otherwise

  K(T1, T2) = ⟨φ(T1), φ(T2)⟩ = Σ_{n1 ∈ T1} Σ_{n2 ∈ T2} Δ(n1, n2)
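The recursion above can be checked on a toy pair of trees (a minimal sketch; the nested-tuple encoding of trees is an assumption of this example, not from the slides):

```python
# Toy Collins & Duffy Delta recursion. Trees are nested tuples:
# (label, child, ...); leaves are bare strings.

def production(node):
    # The production at a node, e.g. ('NP', 'D', 'N').
    return (node[0],) + tuple(c if isinstance(c, str) else c[0]
                              for c in node[1:])

def is_preterminal(node):
    return len(node) == 2 and isinstance(node[1], str)

def delta(n1, n2):
    if production(n1) != production(n2):
        return 0                     # different productions
    if is_preterminal(n1):
        return 1                     # same pre-terminal production
    out = 1
    for c1, c2 in zip(n1[1:], n2[1:]):
        out *= 1 + delta(c1, c2)     # product over the shared children
    return out

def nodes(t):
    yield t
    for c in t[1:]:
        if not isinstance(c, str):
            yield from nodes(c)

def tree_kernel(t1, t2):
    # K(T1, T2) = sum over all node pairs of Delta(n1, n2).
    return sum(delta(a, b) for a in nodes(t1) for b in nodes(t2))

np1 = ('NP', ('D', 'a'), ('N', 'talk'))
np2 = ('NP', ('D', 'a'), ('N', 'lecture'))
# The two NPs share 3 fragments: [NP D N], [NP [D a] N], [D a].
assert tree_kernel(np1, np2) == 3
```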

Page 44:

Weighting

• A decay factor λ penalizes large structures:

  Δ(n1, n2) = λ, if n1 and n2 are pre-terminals
  Δ(n1, n2) = λ Π_{j=1..nc(n1)} ( 1 + Δ(ch(n1, j), ch(n2, j)) ), otherwise

• Normalization:

  K'(T1, T2) = K(T1, T2) / √( K(T1, T1) × K(T2, T2) )

Page 45:

Partial Tree Kernel

• By adding two decay factors we obtain the Partial Tree (PT) kernel

Page 46:

SRL Demo

• Kernel-based system for SRL over raw texts

• ... based on the Charniak parser

• Adopts the PropBank standard but has also been applied to FrameNet

Page 47:
Page 48:
Page 49:
Page 50:
Page 51:
Page 52:
Page 53:

Automatic Predicate Argument Extraction

• Boundary Detection

– One binary classifier

• Argument Type Classification

– Multi-classification problem

– n binary classifiers (ONE-vs-ALL)

– Select the argument with maximum score
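The ONE-vs-ALL step above reduces to an argmax over the binary classifiers' decision values (a trivial sketch; the role labels and score values are invented stand-ins for the n SVM outputs):

```python
# ONE-vs-ALL multi-classification: n binary scorers, pick the argmax.

def classify_argument(scores):
    """scores: dict mapping a role label to its binary classifier score."""
    return max(scores, key=scores.get)

# Hypothetical decision values for one candidate argument node.
scores = {'Arg0': -0.3, 'Arg1': 1.2, 'ArgM-LOC': 0.4}
assert classify_argument(scores) == 'Arg1'
```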

Page 54:

Typical standard flat features (Gildea & Jurafsky, 2002)

• Phrase Type of the argument

• Parse Tree Path, between the predicate and the argument

• Head word

• Predicate Word

• Position

• Voice

Page 55:

Features for the linear kernel in SRL

Page 56:

An example

[Parse tree: [S [NP [N Paul]] [VP [V delivers] [NP [D a] [N talk]] [PP [IN in] [N Rome]]]]; "delivers" is the predicate, "a talk" is Arg. 1.]

Features extracted for Arg. 1:

• Phrase Type: NP

• Predicate Word: delivers

• Head Word: talk

• Parse Tree Path: the path from the predicate node to the argument node

• Voice: Active

• Position: Right

Page 57:

Flat features (Linear Kernel)

• Each example is associated with a vector of 6 feature types (PT, PTP, HW, PW, P, V)

• The dot product ⟨x, z⟩ counts the number of features in common

  x = (0, ..., 1, ..., 0, ..., 1, ..., 0, ..., 1, ..., 0, ..., 1, ..., 0, ..., 1, ..., 1)
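With sparse binary vectors, that dot product is just the size of the intersection of the active-feature sets (a sketch; the feature strings are illustrative, not the system's actual encoding):

```python
# Linear kernel over sparse binary features: count the shared features.

def linear_kernel(f1, f2):
    return len(f1 & f2)

# Two hypothetical argument instances, one feature per type.
x = {'PT=NP', 'PW=deliver', 'HW=talk', 'PTP=NP>VP>V',
     'Voice=Active', 'Pos=Right'}
z = {'PT=NP', 'PW=give', 'HW=talk', 'PTP=NP>VP>V',
     'Voice=Active', 'Pos=Left'}
# PT, HW, PTP and Voice agree, so the kernel value is 4.
assert linear_kernel(x, z) == 4
```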

Page 58:

Automatic Predicate Argument Extraction

Deriving positive/negative examples. Given a sentence and a predicate p:

1. Derive the sentence parse tree

2. For each node pair <Np, Nx>:

   a. Extract a feature representation set F

   b. If Nx exactly covers Arg-i, F is one of its positive examples

   c. Otherwise, F is a negative example
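Steps 2a-2c can be sketched as a loop over candidate nodes (a toy illustration: the (label, start, end) node encoding and the span test stand in for the real feature extraction):

```python
# Label each parse-tree node positive if it exactly covers a gold argument
# span, negative otherwise.

def covered_span(node):
    """Token span (start, end) covered by a (label, start, end) toy node."""
    return (node[1], node[2])

def derive_examples(nodes, argument_spans):
    examples = []
    for node in nodes:
        label = +1 if covered_span(node) in argument_spans else -1
        examples.append((node, label))
    return examples

# "Paul delivers a talk": NP(0,0) = "Paul", NP(2,3) = "a talk".
nodes = [('S', 0, 3), ('NP', 0, 0), ('VP', 1, 3), ('NP', 2, 3), ('V', 1, 1)]
args = {(0, 0), (2, 3)}   # gold argument spans
examples = derive_examples(nodes, args)
assert [lab for _, lab in examples] == [-1, +1, -1, +1, -1]
```

Note the class imbalance this produces: most nodes do not cover an argument, which is why boundary detection is trained as its own binary step.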

Page 59:

Argument Classification Accuracy

[Figure: learning curves of accuracy (from 0.75 to 0.88) vs. percentage of training data (0 to 100%), comparing the Linear, ST, SST and PT kernels.]

Page 60:

SRL in FrameNet: Results

Page 61:

FrameNet SRL: best results

• Best system [Erk & Padó, 2006]

– 0.855 Precision, 0.669 Recall

– 0.751 F1

• Trento (+RTV) system (Coppola, PhD thesis, 2009)

Page 62:

Conclusions

• Kernel-based learning is very useful in NLP, thanks to the possibility of embedding similarity measures for highly structured data

– Sequences

– Trees

• Tree kernels are a natural way to introduce syntactic information in natural language learning

– Very useful when little knowledge is available about the target problem

– They alleviate manual feature engineering (predicate knowledge)

• Different forms of syntactic information require different tree kernels

– The Collins and Duffy kernel (SST) is useful for constituent parsing

– The new Partial Tree kernel is useful for dependency parsing

Page 63:

Conclusions (2)

• Experiments on SRL and QC show that:

– PT and SST are efficient and very fast

– Accuracy is higher when the proper kernel is used for the target task

• Open research issues are:

– Proper kernel design for the different tasks

– The combination of syntagmatic kernels with semantic ones

• An example is the WordNet-based kernel in (Basili et al., CoNLL 2005)

Page 64:

Tree-kernel: References

• Available over the Web:

– A. Moschitti, A Study on Convolution Kernels for Shallow Semantic Parsing. In Proceedings of the 42nd Conference of the Association for Computational Linguistics (ACL 2004), Barcelona, Spain, 2004.

– A. Moschitti, Efficient Convolution Kernels for Dependency and Constituent Syntactic Trees. In Proceedings of the 17th European Conference on Machine Learning, Berlin, Germany, 2006.

– M. Collins and N. Duffy, New Ranking Algorithms for Parsing and Tagging: Kernels over Discrete Structures, and the Voted Perceptron. In Proceedings of ACL 2002.

– S. V. N. Vishwanathan and A. J. Smola, Fast Kernels on Strings and Trees. In Proceedings of Neural Information Processing Systems, 2002.

