
Ivan Titov — Inducing Semantic Representations from Text with Little or No Supervision

Transcript

Inducing Semantic Representations from Text with Little or No Supervision

Ivan Titov
ILLC, University of Amsterdam

2

Contributors…

Alex Klementiev (now at Amazon)
Ehsan Khoddam (U. Amsterdam)
Ashutosh Modi (Saarland Uni)
Diego Marcheggiani (CNR, Pisa)

Why semantic representations?

3

Question Answering about knowledge in a collection of biomedical

publications:

Question: What does cyclosporin A suppress?

Answer: expression of EGR-2

Sentence: As with EGR-3 , expression of EGR-2 was blocked by cyclosporin A .

Question: What inhibits tnf-alpha?

Answer: IL -10

Sentence: Our previous studies in human monocytes have demonstrated that interleukin ( IL )

-10 inhibits lipopolysaccharide ( LPS ) -stimulated production of inflammatory

cytokines , IL-1 beta , IL-6 , IL-8 , and tumor necrosis factor alpha by blocking gene

transcription .

We need to abstract away from specific syntactic and lexical

realizations

Other applications

4

Shown beneficial in:

Machine Translation [Wu and Fung, 2009; Liu and Gildea, 2010, Gao and Vogel, 2011, …]

Dialogue Systems [Basili et al., 2009; van der Plas et al., 2011]

Predicting whether one text fragment logically follows from another, called

Textual Entailment [Sammons et al., 2009]

Authorship attribution [Hedegaard and Simonsen, 2011]

(among others)

Even though we are not yet very good at predicting these

representations

A more direct application – machine reading: extracting knowledge from texts into the form of a knowledge base

Outline

5

Task: inducing semantic representations without labeled data

Signals: exploiting signals implicit in unlabeled data

Model and Inference: a hierarchical Bayesian model defining the

process of joint generation of semantic, syntactic and lexical

representations

Evaluation:

results on a human-annotated corpus

a question-answering task from the bio-medical domain

Shortcomings? Other ideas?

Reconstruction error minimization (REM) framework for semantic parser

induction

From Syntax to Semantics

6

Robust syntactic parsers [Collins 1999, Charniak 2001, Petrov and Klein 2006, McDonald 2005, Titov and Henderson 2007] are available for dozens of languages

However, syntactic analyses are a long way from representing the meaning of

sentences

In other words, they do not specify the underlying predicate argument structure

Specifically, they do not define Who did What to Whom (and

How, Where, When, Why, …)

Frame Semantics

7

A semantic frame [Fillmore 1968] is a conceptual structure describing a situation, object, or event along with associated properties and participants

Example: CLOSURE / OPENING frame

Jack opened the lock with a paper clip

Semantic Roles (aka Frame Elements):

AGENT – an initiator/doer in the event [Who?]

PATIENT - an affected entity [to Whom / to What?]

INSTRUMENT – the entity manipulated to accomplish the goal

Other roles for CLOSURE/OPENING frame: BENEFICIARY, FASTENER, DEGREE,

CIRCUMSTANCES, MANIPULATOR, PORTAL, …

Syntactic-Semantic Interface

8

Though syntactic and lexical representations are often predictive of

the predicate argument structure, this relation is far from trivial:

(1) John broke the window                  (4) John busted the window
(2) The window broke                       (5) The window was destroyed by John
(3) The window was broken by John          (6) John tore down the window

Semantic Roles:

AGENT – an initiator/doer in the event [Who?]

PATIENT - an affected entity [to Whom / to What?]

The same relation is encoded by different predicates (incl. a multiword expression) in (4)–(6), and through alternations of the same predicate in (1)–(3)

Supervised learning of semantic representations is challenging:

datasets provide low coverage, are domain-specific and available

only for a few languages

Our task

9

Semantics is encoded by semantic dependency graphs [Johansson, 2008]

Arguments often evoke their own frames

Arguments and predicates often expressed by multiword expressions

Induce these representations automatically from unannotated texts

[Figure: two semantic dependency graphs. "Mary wore an evening dress from Cardin": wore evokes the Wearing frame (Wearer: Mary/Person, Clothing: dress/Garment, Creator: Cardin/Brand, Style: evening/Occasion); "Peter the Great gave an order to build the castle": gave an order evokes the Request frame (Speaker: Peter the Great/Person, Message: build/Construction, whose Created Entity is the castle/Buildings). A simpler variant, "Mary wore a dress", is shown with only the Wearer and Clothing roles.]

For simplicity we assume that all arguments evoke frames

[Figure: the same graphs with frame and role labels replaced by induced cluster indices]


Outline

11

Task: inducing semantic representations without labeled data

Signals: exploiting signals implicit in unlabeled data

Model and Inference: a hierarchical Bayesian model defining the

process of joint generation of semantic, syntactic and lexical

representations

Evaluation:

results on a human-annotated corpus

a question-answering task from the bio-medical domain

Shortcomings? Other ideas?

Reconstruction error minimization (REM) framework for semantic parser

induction

Induction of Frame-Semantic Information

12

The semantic induction task involves 3 sub-tasks

Construction of a transformed syntactic dependency graph (~ argument

identification)

[Figure: the transformed dependency graph for "Peter the Great gave an order to build a wooden fortified castle", with frames Person, Request, Construction, Being_Protected, Material and Buildings over the identified arguments]

Induction of Frame-Semantic Information

13

The semantic induction task involves 3 sub-tasks

Construction of a transformed syntactic dependency graph (~ argument

identification)

Induction of frames (and clusters of arguments)

[Figure: the same graph with induced frames and argument clusters — Request (gave an order) with Speaker and Message roles, Construction (build) with a Created Entity role, and Person, Buildings, Being_Protected and Material frames over the arguments, linked via Material and Type roles]

Induction of Frame-Semantic Information

14

The semantic induction task involves 3 sub-tasks

Construction of a transformed syntactic dependency graph (~ argument

identification)

Induction of frames (and clusters of arguments)

Role induction

We model these sub-tasks jointly within our probabilistic model. This differs from most previous work, where each subtask was tackled in isolation and handled with a simple heuristic or a simple classifier.

Outline

15

Task: inducing semantic representations without labeled data

Signals: exploiting signals implicit in unlabeled data

induction of semantic classes (frames and argument clusters)

induction of semantic roles

Model and Inference: a hierarchical Bayesian model defining the

process of joint generation of semantic, syntactic and lexical

representations

Evaluation:

results on a human-annotated corpus

a question-answering task from the bio-medical domain

Shortcomings? Other ideas?

Reconstruction error minimization (REM) framework for semantic parser

induction

Induction of Semantic Classes: Definition

16

Induction of frames and induction of argument clusters – the same task

We will refer to both of them as semantic classes

Induction of semantic classes involves:

Clustering of lexemes with similar meaning

break, bust, destroy should be clustered together

Detection of multiword expressions, i.e. expressions which are not (sufficiently) compositional

these include idiomatic expressions, terminology, proper nouns, …

E.g., hold a victory over, red herring. Later, they can be clustered with atomic ones, e.g., win + held a victory over

Now we will discuss three signals of semantic relatedness that we encode in our model to induce the clusterings

Induction of Semantic Classes: Signal 1

17

Selectional preferences: roles for related predicates are filled with similar

arguments.

Top argument fillers from the PropBank dataset

to wear:

WEARER role: lawyer, employee, French, judge, woman, …

CLOTHING role: hat, uniform, suit, nothing, clothes, …

to dress (in):

WEARER role: woman, attendant, employee, defendant, investigator, …

CLOTHING role: uniform, jeans, shirts, leather, …

In this definition, we implicitly rely on the notion of roles but they are also latent

In fact, we model a distribution over (latent) semantic classes

(Latent classes behind these fillers: PEOPLE, ANIMALS, … for the WEARER role; GARMENT, HEADWEAR, … for the CLOTHING role)

joint learning of all semantic classes and roles may be

beneficial
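As an illustration of this signal (not part of the original model), here is a minimal Python sketch comparing the filler sets of two predicates for one role; the toy counts come from the PropBank examples above, and in the actual model the comparison happens through shared distributions over latent semantic classes rather than raw lexemes:

```python
from collections import Counter
import math

# Toy CLOTHING-role fillers for "to wear" and "to dress (in)", from the examples above.
wear_clothing = Counter(["hat", "uniform", "suit", "nothing", "clothes"])
dress_clothing = Counter(["uniform", "jeans", "shirts", "leather"])

def cosine(c1: Counter, c2: Counter) -> float:
    """Cosine similarity between two bags of argument fillers."""
    dot = sum(c1[w] * c2[w] for w in c1)
    norm1 = math.sqrt(sum(v * v for v in c1.values()))
    norm2 = math.sqrt(sum(v * v for v in c2.values()))
    return dot / (norm1 * norm2)

# Related predicates share fillers for corresponding roles, so the similarity is non-zero;
# unrelated predicate/role pairs would score close to zero.
print(cosine(wear_clothing, dress_clothing))
```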

Induction of Semantic Classes: Signal 2

18

Inverse of the previous one: similar arguments fill slots for similar sets of

predicates

From the PropBank dataset:

shirt:

an argument for predicates: dress, wear, buy, display, ..

uniform:

an argument for predicates: wear, have, dress, don, …

Again, we model not a distribution over lexemes but a distribution over (latent)

semantic classes

WEARING, COMMERCE_BUY, …

WEARING, POSSESSION, …

Induction of Semantic Classes: Signal 3

19

Levin classes, i.e. groups of expressions (esp. verbs) representing a similar mapping between syntactic and semantic roles, are more likely to be semantically related [Levin, 1993]:

Verbs of Transfer of a Message (teach, show, read, …)

Dative alternation:

John taught linguistics to the students

John taught the students linguistics

Constructions:

John taught the students that …

Again, relies on latent semantic roles

Not sufficient on its own but can be a useful signal

Only coarse modeling of this signal

Some of the manually produced Levin classes are not semantically coherent [Levin, 1993]

Outline

20

Task: inducing semantic representations without labeled data

Signals: exploiting signals implicit in unlabeled data

induction of semantic classes (frames and argument clusters)

induction of semantic roles

Model and Inference: a hierarchical Bayesian model defining the

process of joint generation of semantic, syntactic and lexical

representations

Evaluation:

results on a human-annotated corpus

a question-answering task from the bio-medical domain

Shortcomings? Other ideas?

Reconstruction error minimization (REM) framework for semantic parser

induction


Induction of Semantic Roles: Definition

21

[Figure: three occurrences of the Teaching frame — "John taught linguistics to the students", "Dave taught the students machine learning", "The attendants were taught how to fly" — with their arguments marked but not yet assigned to roles]

After argument and semantic class identification we know where the arguments are, but we do not know their semantic roles

This step can be regarded as clustering argument occurrences for a given semantic class

The search space is huge – in realistic datasets frequent semantic classes appear tens of thousands of times

Induction of Semantic Roles: Definition

22

We need to “color” them

[Figure: the same three Teaching sentences with each argument colored by its role (Role 1, Role 2, Role 3), so that, e.g., the teacher arguments share one color across sentences]

Argument Keys

23

We identify arg occurrences with syntactic signatures (argument

keys)

E.g., some simple alternations like locative preposition drop

Argument keys are designed so that they mostly map to a single role

Instead of clustering occurrences we cluster argument keys

Here, we would cluster ACTIVE:RIGHT:OBJ and ACTIVE:RIGHT:PMOD_up

together

More complex alternations require multiple pairs of argument keys to be clustered

[Figure: the locative preposition drop alternation — "Mary climbed up the mountain" (mountain has key ACTIVE:RIGHT:PMOD_up) and "Mary climbed the mountain" (mountain has key ACTIVE:RIGHT:OBJ), both evoking the Motion frame with Mary as Role 1 and the mountain as Role 2]
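A minimal sketch of how such argument keys might be assembled from a dependency analysis; the Token fields and helper are illustrative assumptions, not a particular parser's API:

```python
from dataclasses import dataclass

# Builds argument keys like ACTIVE:RIGHT:OBJ or ACTIVE:RIGHT:PMOD_up from toy dependency info.

@dataclass
class Token:
    form: str
    index: int       # position in the sentence
    deprel: str      # syntactic function relative to its head (e.g. SBJ, OBJ, PMOD)
    prep: str = ""   # preposition, if the argument attaches via one

def argument_key(predicate_index: int, predicate_voice: str, arg: Token) -> str:
    position = "LEFT" if arg.index < predicate_index else "RIGHT"
    relation = arg.deprel + (f"_{arg.prep}" if arg.prep else "")
    return f"{predicate_voice}:{position}:{relation}"

# "Mary climbed up the mountain" vs "Mary climbed the mountain"
print(argument_key(2, "ACTIVE", Token("mountain", 5, "PMOD", "up")))  # ACTIVE:RIGHT:PMOD_up
print(argument_key(2, "ACTIVE", Token("mountain", 4, "OBJ")))         # ACTIVE:RIGHT:OBJ
```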

Induction of Semantic Roles: Signals

24

Selectional preferences:

Two argument keys are likely to correspond to the same role if the corresponding sets of arguments are similar

Duplicate roles are unlikely to occur

E.g., coloring "John taught linguistics to the students" so that two different arguments receive the same role (Role 2) is a bad idea (conjunctions are handled differently)

Modeling assumptions

25

(1) For roles, the distribution over classes of argument fillers is sparse

(2) Each semantic class can be verbalized as a sparse distribution over lexemes or syntactic tree fragments

(3) Semantically-similar predicates have the same linking between syntax and semantics

(4) The same semantic role rarely appears twice

(5) Argument key clusterings for different predicates are related

(Titov and Klementiev, ACL 2011, EACL 2012)

How to encode this in a statistical model?

Outline

26

Task: inducing semantic representations without labeled data

Signals: exploiting signals implicit in unlabeled data

Model and Inference: a hierarchical Bayesian model defining the

process of joint generation of semantic, syntactic and lexical

representations

Evaluation:

results on a human-annotated corpus

a question-answering task from the bio-medical domain

Shortcomings? Other ideas?

Reconstruction error minimization (REM) framework for semantic parser

induction

Inducing Semantics

27

Given a (large) collection of sentences annotated with (transformed) syntactic dependencies, e.g. "Mary wore an evening dress from Cardin", we want to induce its semantics, i.e. its segmentation and clustering

[Figure: the example sentences from earlier slides ("Peter the Great gave an order to build the castle", "Mary wore an evening dress from Cardin") shown with their induced frames and roles]

Inducing Semantics

28–29

[Figure: the same sentences with frame and role labels replaced by induced cluster indices]

Induction with a Generative Model

30

Define a family of generative models encoding our assumptions: p(semantics, syntax | θ), where θ are the model parameters

In the prior probability over parameters, p(θ), we encode our beliefs — we use it to encode sparsity of distributions: semantic classes can be expressed only in a small number of ways

We want to find the maximum a posteriori semantics given the observable data
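The formulas on this slide were images; a hedged LaTeX reconstruction from the surrounding prose (y is the latent semantic structure, x the observed syntactic/lexical realization, θ the model parameters):

```latex
% Maximum a posteriori semantics given the observable data (hedged reconstruction):
\hat{y} \;=\; \arg\max_{y}\; p(y \mid x), \qquad
p(y \mid x) \;\propto\; \int p(x, y \mid \theta)\, p(\theta)\, d\theta
```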

The (Simplified) Model

31–41

The generative story, built up incrementally on slides 31–41 and illustrated with the running example "Peter the Great gave an order to build a fortified castle" (frames Request, Person, Construction, Buildings, Protected; roles Speaker, Message, Created Entity, Type; argument keys such as ACTIVE:LEFT:SBJ and ACTIVE:RIGHT:OBJ):

for each sentence:
    c_root ∼ θ_root                     # draw the semantic class for the root
    GenSemClass(c_root)                 # continue generation from the root

GenSemClass(c):
    s ∼ φ_c                             # draw the synt/lex realization of the class
                                        # (we use hierarchical Dirichlet processes to represent
                                        #  distributions over tree fragments)
    for each role t = 1, …, T:
        if [n ∼ ψ_c,t] = 1:             # at least one argument for this role?
            GenArgument(c, t)           # draw the first argument
            while [n ∼ ψ+_c,t] = 1:
                GenArgument(c, t)       # draw more arguments

GenArgument(c, t):
    a_c,t ∼ φ_c,t                       # draw the argument key
    c_c,t ∼ θ_c,t                       # draw the semantic class for the argument
    GenSemClass(c_c,t)                  # recurse
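A minimal Python sketch of this generative story, with the HDP-based distributions over tree fragments and argument keys replaced by toy categorical distributions; the class names, probabilities and vocabularies are illustrative assumptions, not the actual model:

```python
import random

# Toy stand-ins for the model's distributions: phi_c (realizations), theta_{c,t} (argument
# classes), phi_{c,t} (argument keys). All entries are made up for illustration.
REALIZATIONS = {
    "Request": ["gave an order", "requested"],
    "Person": ["Peter the Great", "Mary"],
    "Construction": ["build"],
}
ROLES = {"Request": ["Speaker", "Message"], "Person": [], "Construction": []}
ARG_CLASSES = {("Request", "Speaker"): ["Person"], ("Request", "Message"): ["Construction"]}
ARG_KEYS = {("Request", "Speaker"): ["ACTIVE:LEFT:SBJ"], ("Request", "Message"): ["ACTIVE:RIGHT:OBJ"]}

def gen_sem_class(c, depth=0):
    s = random.choice(REALIZATIONS[c])              # s ~ phi_c: synt/lex realization
    print("  " * depth + f"{c}: {s}")
    for t in ROLES[c]:                              # for each role t = 1..T
        if random.random() < 0.9:                   # [n ~ psi_{c,t}] = 1: at least one argument
            gen_argument(c, t, depth)               # draw the first argument
            while random.random() < 0.2:            # [n ~ psi+_{c,t}] = 1: draw more arguments
                gen_argument(c, t, depth)

def gen_argument(c, t, depth):
    a = random.choice(ARG_KEYS[(c, t)])             # a_{c,t} ~ phi_{c,t}: argument key
    c_arg = random.choice(ARG_CLASSES[(c, t)])      # c_{c,t} ~ theta_{c,t}: class of the argument
    print("  " * depth + f"  role {t} via {a}:")
    gen_sem_class(c_arg, depth + 2)                 # recurse

c_root = "Request"                                  # c_root ~ theta_root
gen_sem_class(c_root)                               # generate the whole structure from the root
```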

Under the hood …

42

(1) For roles, the distribution over classes of argument fillers is sparse

We use a sparse prior, Hierarchical Dirichlet Processes [Teh et al, 05]

(2) Each semantic class can be verbalized as a sparse distribution over lexemes or syntactic tree fragments

Sparse priors over syntactic trees (as in Bayesian TSGs [Cohn et al. 07])

(3) Semantically-similar predicates have the same linking between syntax and semantics

We use sparse Dirichlet priors to encode the linking

(4) The same semantic role rarely appears twice

Use a non-symmetric Dirichlet prior for the corresponding geometric distribution

(5) Argument key clusterings for different predicates are related

Induce a shared weighted graph used in a (distance-dependent) Chinese Restaurant Process [Blei and Frazier, 11] prior for each clustering

(Titov and Klementiev, ACL 2011, EACL 2012)

Previous approaches induce roles for each predicate independently

These clusterings define permissible alternations

But many alternations are shared across verbs

Can we share this information across verbs?

Joint learning of roles across predicates

43

Dative alternation:

John gave the book to Mary   vs   John gave Mary the book
Mike threw the ball to me    vs   Mike threw me the ball

[Figure: per-predicate matrices over argument keys (ACT:LEFT:SBJ, ACT:RIGHT:OBJ, PASS:LEFT:SBJ, PASS:RIGHT:LGS-by, …)]

Idea: keep track of how likely a pair of argument keys is to be clustered together

Define a similarity matrix (or similarity graph) over argument keys

A Bayesian model for role labeling

44–47

[Figure, repeated with incremental highlighting on slides 44–47: similarity matrices over argument keys (ACT:LEFT:SBJ, ACT:RIGHT:OBJ, PASS:LEFT:SBJ, PASS:RIGHT:LGS-by, …) for individual predicates such as open and overtake. The highlighted cell is the similarity score between PASS:LEFT:SBJ and ACT:RIGHT:OBJ; the similarity graph is shared across predicates]

A formal way to encode this: dd-CRP

48

Can use a CRP to define a prior on the partition of argument keys:

The first customer (argument key) sits at the first table (role)

The m-th customer sits at a table according to: p(existing table k) ∝ n_k (its current number of customers), p(new table) ∝ α — this encodes rich-get-richer dynamics but not much more than that

An extension is the distance-dependent CRP (dd-CRP):

The m-th customer chooses a customer j to sit with according to: p(sit with j) ∝ the similarity between customers m and j (read off the shared similarity graph), p(sit alone) ∝ α; tables are the connected components of these links

[Figure: the state of the restaurant once m−1 customers are seated (customers 1–7 at tables), alongside the entire similarity graph]

Footnote: marginal invariance, BNP view
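A minimal Python sketch of the two seating rules (toy similarity function and concentration parameter; names are illustrative, not the actual implementation):

```python
import random

def crp_table(table_sizes, alpha):
    """Plain CRP: the m-th customer joins table k with prob ~ n_k, or opens a new table with prob ~ alpha."""
    weights = list(table_sizes) + [alpha]
    r = random.uniform(0, sum(weights))
    for k, w in enumerate(weights):
        r -= w
        if r <= 0:
            return k                      # index len(table_sizes) means "open a new table"
    return len(table_sizes)

def ddcrp_link(m, similarity, alpha):
    """dd-CRP: the m-th customer links to an earlier customer j with prob ~ similarity(m, j),
    or to itself with prob ~ alpha; clusters are the connected components of the link graph."""
    weights = [similarity(m, j) for j in range(m)] + [alpha]
    r = random.uniform(0, sum(weights))
    for j, w in enumerate(weights):
        r -= w
        if r <= 0:
            return j                      # j == m means "sit alone"
    return m

# Argument keys related by a shared alternation (e.g. ACT:RIGHT:OBJ and PASS:LEFT:SBJ) would get
# a high similarity, so they tend to be linked into the same role cluster.
toy_similarity = lambda m, j: 1.0
print(crp_table([3, 1], alpha=0.5), ddcrp_link(4, toy_similarity, alpha=0.5))
```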

Inference

49

We use an iterative sampling algorithm for inference

On every step, the sampler attempts a random change in the labeling of the latent semantic representation.

Roughly, it keeps the relabeling if the probability increases, and rejects it otherwise.

Inference is challenging as the search space is huge

We define the following types of moves (‘relabelings’)

Role-Syntax alignment

Choose a new clustering of argument keys for a frame

Split – Merge

Merge 2 semantic classes together or split one class in two

Compose-Decompose

Compose fragments of the syntactic tree to form a new realization, or split a fragment

break + bust

held + a victory = held a victory

See an alternative sampler for our model by Rabinovich and Ghahramani

(NIPS LS '14)
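A minimal sketch of the search loop described above; the move set is taken from the slide, but the state and the scoring function are placeholders (the real model scores full semantic, syntactic and lexical structures):

```python
import random

MOVES = ["role_syntax_alignment", "split_merge", "compose_decompose"]

def log_prob(state):
    # Placeholder objective: in the actual model this is the joint probability of the semantic,
    # syntactic and lexical representations under the hierarchical Bayesian model.
    return -sum(abs(v - 0.5) for v in state.values())

def propose(state, move):
    candidate = dict(state)
    candidate[move] = random.random()     # toy "relabeling" of the part touched by the move
    return candidate

def search(state, steps=1000):
    for _ in range(steps):
        move = random.choice(MOVES)       # pick a move type (role-syntax alignment, split-merge, ...)
        candidate = propose(state, move)
        if log_prob(candidate) >= log_prob(state):   # keep the relabeling if the probability increases
            state = candidate
    return state

print(search({m: 0.0 for m in MOVES}))
```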

Outline

50

Task: inducing semantic representations without labeled data

Signals: exploiting signals implicit in unlabeled data

Model and Inference: a hierarchical Bayesian model defining the

process of joint generation of semantic, syntactic and lexical

representations

Evaluations:

results on a human-annotated corpus

a question-answering task from the bio-medical domain

Shortcomings? Other ideas?

Reconstruction error minimization (REM) framework for semantic parser

induction

[Bar chart (F1, y-axis 70–90): Logistic, GraphPart, SplitMerge, MonoBayes and the SyntF baseline]

Benchmark Dataset: PropBank (CoNLL 08)

51

Evaluation of semantic role induction

Purity measures the degree to which each induced role contains arguments

sharing the same gold (“true”) role

Collocation evaluates the degree to

which arguments with the same gold

roles are assigned to a single induced role

[Illustration: gold roles vs. induced roles for the purity/collocation definitions. Chart: F1, the harmonic mean of PU and CO, for state-of-the-art approaches, the optimal deterministic mapping from syntactic relations, and our model]

The improvement is large for this problem and we are the first

to outperform the syntactic baseline by a substantial margin
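A minimal sketch of the clustering measures used here. It is a simplification: purity and collocation are computed over all (gold role, induced role) pairs at once, whereas the published evaluation aggregates per predicate; the toy pairs are made up:

```python
from collections import Counter, defaultdict

def purity_collocation_f1(pairs):
    """pairs: list of (gold_role, induced_role) for argument occurrences."""
    n = len(pairs)
    by_induced, by_gold = defaultdict(list), defaultdict(list)
    for gold, induced in pairs:
        by_induced[induced].append(gold)
        by_gold[gold].append(induced)
    # Purity: each induced role is credited with its most frequent gold role.
    pu = sum(Counter(golds).most_common(1)[0][1] for golds in by_induced.values()) / n
    # Collocation: each gold role is credited with its most frequent induced role.
    co = sum(Counter(inds).most_common(1)[0][1] for inds in by_gold.values()) / n
    f1 = 2 * pu * co / (pu + co)
    return pu, co, f1

pairs = [("A0", 1), ("A0", 1), ("A1", 2), ("A1", 1), ("A2", 3)]
print(purity_collocation_f1(pairs))
```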

Benchmark Dataset: PropBank (CoNLL 08)

52

Looking into the induced graph encoding ‘priors’ over the clustering of argument keys, the most highly ranked pairs encode (or partially encode):

Passivization

Near-equivalence of subordinating conjunctions and prepositions

E.g., whether and if, encoded as (ACTIVE:RIGHT:OBJ_if, ACTIVE:RIGHT:OBJ_whether)

Benefactive alternation

Martha carved a doll for the baby
Martha carved the baby a doll

Dative alternation

I gave the book to Mary
I gave Mary the book

Recovery of unnecessary splits introduced by argument keys

Application-based Evaluation

53

Question Answering about knowledge in a corpus of biomedical

abstracts

Dataset: 1,999 biomedical abstracts from the Genia corpus (Kim et al, 2003)

Examples of induced semantic classes:

Class  Variations
1      motif, sequence, regulatory element, response element, element, dna sequence
2      donor, individual, subject
3      important, essential, critical
4      dose, concentration
5      activation, transcriptional activation, transactivation
6      b cell, t lymphocyte, thymocyte, b lymphocyte, t cell, t-cell line, human lymphocyte, t-lymphocyte
7      indicate, reveal, document, suggest, demonstrate
8      augment, abolish, inhibit, convert, cause, abrogate, modulate, block, decrease, reduce, diminish, suppress, up-regulate, impair, reverse, enhance
9      confirm, assess, examine, study, evaluate, test, resolve, determine, investigate
10     nf-kappab, nf-kappa b, nfkappab, nf-kb

(Class 6 is roughly blood cells; class 8 is roughly a “cause change of position on a scale” frame)

Application-based Evaluation

54

Question Answering about knowledge in a corpus of biomedical

abstracts

Example questions and answers:

Question: What does cyclosporin A suppress?

Answer: expression of EGR-2

Sentence: As with EGR-3 , expression of EGR-2 was blocked by cyclosporin A .

Question: What inhibits tnf-alpha?

Answer: IL -10

Sentence: Our previous studies in human monocytes have demonstrated that interleukin ( IL )

-10 inhibits lipopolysaccharide ( LPS ) -stimulated production of inflammatory

cytokines , IL-1 beta , IL-6 , IL-8 , and tumor necrosis factor alpha by blocking gene

transcription .

Application-based Evaluation

55

Question Answering about knowledge in a corpus of biomedical

abstracts

More than 55% of mistakes are due to overly coarse clustering in 3 semantic classes (antonymy / hyponymy)

[Chart: QA performance of this work vs. keyword matching vs. standard information extraction methods]

So far …

56

We proposed a method for learning semantics with no supervision

Joint induction of multiword expressions, semantic classes and roles

Substantially outperforms alternatives (pipelines / heuristic approaches /

…)

Extensions:

Semi-supervised learning [COLING 2012]

Cross-lingual extensions [ACL 2012,…]

Induction of `scripts' (how frames are organized into scenarios) [EACL

2014]

But …

[more details in ACL '11, EACL '12]

Unsupervised frame and role induction

The models rely on very restricted sets of features

not very effective in the semi-supervised set-up, and not very appropriate for languages with freer word order than English

… over-rely on syntax

not going to induce, e.g., "X sent Y = Y is a shipment from X"

… use language-specific priors

a substantial drop in performance if no adaptation

… not (quite) appropriate for inference

not only is there no inference model, but opposites and antonyms (e.g., increase + decrease) are typically grouped together; the induced granularity is often problematic; …

In contrast to supervised methods for frame-semantic parsing / semantic role labeling

How can we induce frames in a less restrictive feature-rich framework

and tackle other challenges along the way?

Outline

58

Task: inducing semantic representations without labeled data

Signals: exploiting signals implicit in unlabeled data

Model and Inference: a hierarchical Bayesian model defining the

process of joint generation of semantic, syntactic and lexical

representations

Evaluation:

results on a human-annotated corpus

a question-answering task from the bio-medical domain

Issues? Other ideas?

Reconstruction error minimization (REM) framework for semantic parser

induction

Preliminary experiments

Feature-rich models of semantic frames

[Titov and Khoddam, '14]

Consider a frame realization: Assault(Agent: police, Patient: demonstrator, Instrument: baton) for the sentence "The police charged... "

For simplicity: focus on frame and role labeling (no identification + one frame per sentence)

How can we define a feature-rich model for unsupervised induction of roles and frames?

[Figure, repeated on the following slides: a feature representation x of "The police charged... " is mapped by a feature-rich model p(r, f | x, w) to a hidden semantic frame prediction, Assault(Agent: police, Patient: demonstrator, Instrument: baton) (= encoding); an "argument prediction" model p(a_i | a_-i, r, f, θ) then reconstructs a hidden argument such as demonstrator (= reconstruction)]

Any existing supervised role labeler would do for the encoding component

Argument reconstruction

Hypothesis: semantic roles and frames are the latent representation which helps to reconstruct arguments

What do the components look like, and how do we estimate them jointly?

Reconstruction-error minimization

Neural autoencoders [Hinton '99, Vincent et al. 08]: input x ∈ R^m → encoding → latent representation y ∈ R^p → reconstruction → reconstructed input x̃ ∈ R^m, trained to minimize the reconstruction error, e.g., the squared error between x and x̃

but

… applicable not only to neural models

… reconstruction and encoding components can belong to different model families

… no need to reconstruct the entire input

See Ammar et al. (NIPS 2014) and also Daumé (ICML 09)
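A minimal linear-autoencoder sketch of the encode/reconstruct/error loop (toy dimensions and data, squared reconstruction error; purely illustrative, not a component of the frame-induction model):

```python
import numpy as np

rng = np.random.default_rng(0)
m, p = 20, 5                      # input dimension m, latent dimension p
X = rng.normal(size=(100, m))     # toy "feature representations"
W_enc = rng.normal(size=(m, p)) * 0.1
W_dec = rng.normal(size=(p, m)) * 0.1

lr = 0.01
for _ in range(200):
    Y = X @ W_enc                 # encoding: latent representation y
    X_hat = Y @ W_dec             # reconstruction x~
    err = X_hat - X               # reconstruction error
    # gradient steps on the squared error for both components
    W_dec -= lr * (Y.T @ err) / len(X)
    W_enc -= lr * (X.T @ (err @ W_dec.T)) / len(X)

print(float(np.mean((X @ W_enc @ W_dec - X) ** 2)))   # final mean squared reconstruction error
```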

Argument reconstruction

[In the figure: the reconstruction component is a tensor factorization; the encoding component is a (structured) linear model]

Component 1: argument reconstruction

Distributed vectors:

- encode semantic properties of argument a (may encode, e.g., that demonstrators are similar to protestors)

- encode expectations about the other arguments given that a is assigned to role r of frame f (if the Agent of Assault is the police, then the Patient can be demonstrators or protestors)

The reconstruction model ('softmax') predicts each argument from the others, using a role-specific projection matrix

Parallels to work on relation modeling (e.g., Bordes et al., '11), distributional semantics (e.g., Mikolov et al., '13) or (coupled) tensor factorization (e.g., Yilmaz et al., '11)

Intuitively, score argument tuples according to the factorization (see the sketch below)
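As referenced above, a minimal Python sketch of this kind of factorized reconstruction scorer: arguments get distributed vectors, each (frame, role) pair gets a projection matrix, and a hidden argument is predicted with a softmax over candidate fillers. The dimensions, vocabularies and the exact bilinear form are illustrative assumptions, not the model of Titov and Khoddam:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8
args = {w: rng.normal(size=d) for w in ["police", "demonstrator", "protestor", "baton"]}
# One projection matrix per (frame, role); here only the roles of the Assault frame.
proj = {("Assault", r): rng.normal(size=(d, d)) for r in ["Agent", "Patient", "Instrument"]}

def score(frame, role_i, cand, observed):
    """Score candidate filler `cand` for role_i given the other observed (role, argument) pairs."""
    u = proj[(frame, role_i)] @ args[cand]
    return sum(u @ (proj[(frame, r)] @ args[a]) for r, a in observed)

def reconstruct(frame, role_i, observed, vocab):
    """Softmax over candidate fillers for the hidden argument of role_i."""
    scores = np.array([score(frame, role_i, c, observed) for c in vocab])
    p = np.exp(scores - scores.max())
    return dict(zip(vocab, p / p.sum()))

observed = [("Agent", "police"), ("Instrument", "baton")]
print(reconstruct("Assault", "Patient", observed, list(args)))
```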

Component 2: frame + role prediction

The role and frame labeling model p(r, f | x, w) works over a feature-rich representation x encoding the syntax-semantics interface

It can be any model as long as the role and frame posteriors can be computed (or approximated) — this covers the majority of supervised SRL models; we used (Johansson and Nugues, '08)

Joint learning

For every structure, we aim to optimize the expectation of the argument prediction quality given roles and frames

Not very tractable in this exact form, so the usual 'tricks' are needed:

'mean field': substituting posterior means instead of marginalizing

'negative sampling' (as, e.g., in Mikolov et al. '13) instead of the full 'softmax'

Training can be quite efficient as all models are linear (or

bilinear)
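The objective on the slide was an image; a hedged LaTeX reconstruction from the prose (x is the feature representation of a structure, a_1..a_N its arguments, w the encoder parameters, θ the reconstruction parameters):

```latex
% Expected argument-prediction quality under the encoder (hedged reconstruction):
\max_{w,\,\theta}\; \sum_{x}\; \mathbb{E}_{p(r,f \mid x,\, w)}
    \Big[\, \sum_{i=1}^{N} \log p\big(a_i \mid a_{-i},\, r,\, f,\, \theta\big) \Big]
```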

Outline

69

Task: inducing semantic representations without labeled data

Signals: exploiting signals implicit in unlabeled data

Model and Inference: a hierarchical Bayesian model defining the

process of joint generation of semantic, syntactic and lexical

representations

Evaluation:

results on a human-annotated corpus

a question-answering task from the bio-medical domain

Issues? Other ideas?

Reconstruction error minimization (REM) framework for semantic parser

induction

Preliminary experiments

Experiments: only role induction

[Figure: the encode/reconstruct diagram with the frame replaced by the predicate itself (v): encoder p(r | x, w), reconstruction p(a_i | a_-i, r, v, θ), for charge(Agent: police, Patient: demonstrator, Instrument: baton)]

Evaluate on a dataset annotated with roles (PropBank for En, SALSA for De)

Compare against previous models evaluated in this set-up

use clustering evaluation measures (purity, collocation, F1)

No frame induction – use the predicate itself (v)

We replicate the previous evaluation: the datasets are fairly small (e.g., ~ 90,000 predicate-argument structures for English), which may not be the optimal set-up for our expressive model

English (F1)

[Bar chart: F1 on English for the previous approaches evaluated in the same setting (listed below), the optimal deterministic mapping from syntactic relations, and the feature-rich model]

Logistic: Lang and Lapata ('10)
GraphP: Lang and Lapata ('11a)
Linking: Fürstenau and Rambow ('12)
Aggl: Lang and Lapata ('11b)
Order: Garg and Henderson ('12)
Aggl+: Lang and Lapata ('14)
Bayes: Titov and Klementiev ('12)

Performs on par with the best methods (without language-specific priors)

Induces fewer roles than most other approaches, but under certain regimes roles start to capture verb senses

A new framework for inducing frames

allowing us to combine ideas from relation modeling and supervised role labeling

more data, more languages, other factorizations, …

The framework naturally supports:

Semi-supervised learning

In principle, the reconstruction objective can be easily extended with the conditional likelihood objective on labeled data

Learning for inference

Modeling entities as arguments and coupling factorizations across relations

Document level reasoning and disambiguation

Again, factorizations provide a natural framework here (entity representations are specific to a document)

REM framework — see also Titov and Khoddam '14: http://arxiv.org/abs/1412.2812

Conclusions

73

We know that rule-based semantic parsing is not the way to go

But (fully) supervised open-domain semantic parsing is not very promising either

What kind of resources can we leverage?

Un-annotated text?

Ontologies?

Linking between the two?

What kind of models should we use?

Generative (Bayesian) models?

Feature-rich models?

Thank you!

Special thanks to Dipanjan Das, Mike Kozhevnikov, Alexis Palmer, Manfred Pinkal, …

… to YOU for inviting me here!

Funding:

Google Focused Award on Natural Language Understanding 2013

Google Faculty Research Award 2011

German Research Foundation (DFG) / MMCI

