Inducing Semantic Representations from
Text
with Little or No Supervision
Ivan Titov
ILLC, University of Amsterdam
2
Contributors:
Alex Klementiev (now at Amazon)
Ehsan Khoddam (U. Amsterdam)
Ashutosh Modi (Saarland Uni)
Diego Marcheggiani (CNR, Pisa)
Why semantic representations?
3
Question Answering about knowledge in a collection of biomedical
publications:
Question: What does cyclosporin A suppress?
Answer: expression of EGR-2
Sentence: As with EGR-3 , expression of EGR-2 was blocked by cyclosporin A .
Question: What inhibits tnf-alpha?
Answer: IL -10
Sentence: Our previous studies in human monocytes have demonstrated that interleukin ( IL )
-10 inhibits lipopolysaccharide ( LPS ) -stimulated production of inflammatory
cytokines , IL-1 beta , IL-6 , IL-8 , and tumor necrosis factor alpha by blocking gene
transcription .
We need to abstract away from specific syntactic and lexical
realizations
Other applications
4
Shown beneficial in:
Machine Translation [Wu and Fung, 2009; Liu and Gildea, 2010; Gao and Vogel, 2011; …]
Dialogue Systems [Basili et al., 2009; van der Plas et al., 2011]
Predicting whether one text fragment logically follows from another, called
Textual Entailment [Sammons et al., 2009]
Authorship attribution [Hedegaard and Simonsen, 2011]
(among others)
Even though we are not yet very good at predicting these
representations
A more direct application – machine reading: extracting knowledge from
texts into the form of a knowledge base
Outline
5
Task: inducing semantic representations without labeled data
Signals: exploiting signals implicit in unlabeled data
Model and Inference: a hierarchical Bayesian model defining the
process of joint generation of semantic, syntactic and lexical
representations
Evaluation:
results on a human-annotated corpus
a question-answering task from the bio-medical domain
Shortcomings? Other ideas?
Reconstruction error minimization (REM) framework for semantic parser
induction
From Syntax to Semantics
6
Robust syntactic parsers [Collins 1999, Charniak 2001, Petrov and Klein 2006, McDonald
2005, Titov and Henderson 2007] available for tens of languages
However, syntactic analyses are a long way from representing the meaning of
sentences
In other words, they do not specify the underlying predicate argument structure
Specifically, they do not define Who did What to Whom (and
How, Where, When, Why, …)
Frame Semantics
7
A semantic frame [Fillmore 1968] is a conceptual structure describing a situation, object, or event along with associated properties and participants
Example: CLOSURE / OPENING frame
Jack opened the lock with a paper clip
Semantic Roles (aka Frame Elements):
AGENT – an initiator/doer in the event [Who?]
PATIENT – an affected entity [to Whom / to What?]
INSTRUMENT – the entity manipulated to accomplish the goal
Other roles for CLOSURE/OPENING frame: BENEFICIARY, FASTENER, DEGREE,
CIRCUMSTANCES, MANIPULATOR, PORTAL, …
Syntactic-Semantic Interface
8
Though syntactic and lexical representations are often predictive of
the predicate argument structure, this relation is far from trivial:
(1) John broke the window
(2) The window broke
(3) The window was broken by John
(4) John busted the window
(5) The window was destroyed by John
(6) John tore down the window
Semantic Roles:
AGENT – an initiator/doer in the event [Who?]
PATIENT – an affected entity [to Whom / to What?]
(1)-(3) show alternations: the same relation is realized by different syntactic configurations
(4)-(6) show that the same relation is encoded by different predicates (incl. a multiword expression)
Supervised learning of semantic representations is challenging:
datasets provide low coverage, are domain-specific, and are available
only for a few languages
Our task
9
Semantics is encoded by semantic dependency graphs [Johansson, 2008]
Arguments often evoke their own frames
Arguments and predicates often expressed by multiword expressions
Induce these representations automatically from unannotated texts
Examples:
"Mary wore an evening dress from Cardin"
WEARING frame (evoked by "wore"): Wearer = Mary (PERSON), Clothing = "an evening dress from Cardin"
The Clothing argument evokes the GARMENT frame: Style = "evening" (OCCASION), Creator = "Cardin" (BRAND)
"Mary wore a dress"
WEARING frame: Wearer = Mary, Clothing = "a dress" (GARMENT)
"Peter the Great gave an order to build the castle"
REQUEST frame (evoked by the multiword expression "gave an order"): Speaker = "Peter the Great" (PERSON), Message = "to build the castle"
The Message evokes the CONSTRUCTION frame: Created Entity = "castle" (BUILDINGS)
For simplicity we assume that all arguments evoke frames
In the model itself, frames, semantic classes and roles are induced clusters identified by indices (e.g., class 12, role 5) rather than by human-readable labels
Outline
11
Task: inducing semantic representations without labeled data
Signals: exploiting signals implicit in unlabeled data
Model and Inference: a hierarchical Bayesian model defining the
process of joint generation of semantic, syntactic and lexical
representations
Evaluation:
results on a human-annotated corpus
a question-answering task from the bio-medical domain
Shortcomings? Other ideas?
Reconstruction error minimization (REM) framework for semantic parser
induction
Induction of Frame-Semantic Information
12
The semantic induction task involves 3 sub-tasks
Construction of a transformed syntactic dependency graph (~ argument
identification)
Example: "Peter the Great gave an order to build a wooden fortified castle"
Induced semantic classes: PERSON ("Peter the Great"), REQUEST ("gave an order"), CONSTRUCTION ("build"), MATERIAL ("wooden"), BEING_PROTECTED ("fortified"), BUILDINGS ("castle")
Induction of Frame-Semantic Information
13
The semantic induction task involves 3 sub-tasks
Construction of a transformed syntactic dependency graph (~ argument
identification)
Induction of frames (and clusters of arguments)
Example (continued), now with induced roles linking the frames:
REQUEST(Speaker = PERSON, Message = CONSTRUCTION)
CONSTRUCTION(Created Entity = BUILDINGS)
BUILDINGS(Type = BEING_PROTECTED, Material = MATERIAL)
Induction of Frame-Semantic Information
14
The semantic induction task involves 3 sub-tasks
Construction of a transformed syntactic dependency graph (~ argument
identification)
Induction of frames (and clusters of arguments)
Role induction
We model these sub-tasks jointly within our probabilistic model
This differs from most previous work, where each sub-task was tackled in
isolation and argument identification was handled with a simple heuristic
or a simple classifier
Outline
15
Task: inducing semantic representations without labeled data
Signals: exploiting signals implicit in unlabeled data
induction of semantic classes (frames and argument clusters)
induction of semantic roles
Model and Inference: a hierarchical Bayesian model defining the
process of joint generation of semantic, syntactic and lexical
representations
Evaluation:
results on a human-annotated corpus
a question-answering task from the bio-medical domain
Shortcomings? Other ideas?
Reconstruction error minimization (REM) framework for semantic parser
induction
Induction of Semantic Classes: Definition
16
Induction of frames and induction of argument clusters are the same task
We will refer to both of them as semantic classes
Induction of semantic classes involves:
Clustering of lexemes with similar meaning
break, bust, destroy should be clustered together
Detection of multiword expressions, i.e. expressions which are not (sufficiently)
compositional
these include idiomatic expressions, terminology, proper nouns, …
E.g., hold a victory over, red herring
Later, multiword expressions can be clustered with atomic ones, e.g., win + held a victory over
Next, we discuss 3 signals of semantic relatedness that we
encode in our model to induce the clusterings
Induction of Semantic Classes: Signal 1
17
Selectional preferences: roles for related predicates are filled with similar
arguments.
Top argument fillers from the PropBank dataset
to wear:
WEARER role: lawyer, employee, French, judge, woman, …
CLOTHING role: hat, uniform, suit, nothing, clothes, …
to dress (in):
WEARER role: woman, attendant, employee, defendant, investigator, …
CLOTHING role: uniform, jeans, shirts, leather, …
In this definition, we implicitly rely on the notion of roles, but roles are also latent:
joint learning of all semantic classes and roles may be beneficial
In fact, we model a distribution over (latent) semantic classes rather than over lexemes:
for the WEARER role: PEOPLE, ANIMALS, …
for the CLOTHING role: GARMENT, HEADWEAR, …
Induction of Semantic Classes: Signal 2
18
Inverse of the previous one: similar arguments fill slots for similar sets of
predicates
from PropBank dataset
shirt:
an argument for predicates: dress, wear, buy, display, ..
uniform:
an argument for predicates: wear, have, dress, don, …
Again, we model not a distribution over lexemes but a distribution over (latent)
semantic classes:
for shirt: WEARING, COMMERCE_BUY, …
for uniform: WEARING, POSSESSION, …
Induction of Semantic Classes: Signal 3
19
Levin classes [Levin, 1993]: groups of expressions (esp. verbs) that exhibit
similar mappings between syntactic and semantic roles are more likely to
be semantically related
Example: verbs of Transfer of a Message (teach, show, read, …)
Dative alternation:
John taught linguistics to the students
John taught the students linguistics
Constructions:
John taught the students that …
Again, this relies on latent semantic roles
Not sufficient on its own but can be a useful signal: some manually produced
Levin classes are not semantically coherent [Levin, 1993], so we only model
this signal coarsely
Outline
20
Task: inducing semantic representations without labeled data
Signals: exploiting signals implicit in unlabeled data
induction of semantic classes (frames and argument clusters)
induction of semantic roles
Model and Inference: a hierarchical Bayesian model defining the
process of joint generation of semantic, syntactic and lexical
representations
Evaluation:
results on a human-annotated corpus
a question-answering task from the bio-medical domain
Shortcomings? Other ideas?
Reconstruction error minimization (REM) framework for semantic parser
induction
Induction of Semantic Roles: Definition
21
Examples, all evoking the TEACHING frame:
John taught linguistics to the students
Dave taught the students machine learning
The attendants were taught how to fly
After argument and semantic class identification we know where the arguments are, but we do not know their semantic roles
This step can be regarded as clustering of argument occurrences for a given semantic class
The search space is huge: in realistic datasets, frequent semantic classes appear tens of thousands of times
Induction of Semantic Roles: Definition
22
We need to "color" the argument occurrences with roles, e.g.:
John (Role 1) taught linguistics (Role 2) to the students (Role 3)   [TEACHING]
Dave (Role 1) taught the students (Role 3) machine learning (Role 2)   [TEACHING]
The attendants (Role 3) were taught how to fly (Role 2)   [TEACHING]
Argument Keys
23
We identify argument occurrences with syntactic signatures (argument
keys)
Argument keys are designed so as to map mostly to a single role
Instead of clustering occurrences, we cluster argument keys
E.g., some simple alternations, like locative preposition drop:
Mary (Role 1) climbed up the mountain (Role 2)   [MOTION; key ACTIVE:RIGHT:PMOD_up]
Mary (Role 1) climbed the mountain (Role 2)   [MOTION; key ACTIVE:RIGHT:OBJ]
Here, we would cluster ACTIVE:RIGHT:OBJ and ACTIVE:RIGHT:PMOD_up
together
More complex alternations require multiple pairs of argument keys to be clustered
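The key format used in these examples (voice, position relative to the predicate, and syntactic relation, with the preposition appended for prepositional arguments) can be sketched as a small string template; the helper itself is hypothetical:

```python
def argument_key(voice, position, relation, preposition=None):
    """Build a syntactic signature ("argument key") for an argument occurrence.

    voice: "ACTIVE" or "PASSIVE"; position: "LEFT" or "RIGHT" of the
    predicate; relation: the syntactic relation label.  For prepositional
    arguments the preposition is appended, so "climbed up the mountain"
    and "climbed the mountain" get distinct keys that the model may later
    cluster into a single role.
    """
    key = f"{voice}:{position}:{relation}"
    if preposition is not None:
        key += f"_{preposition}"
    return key

# The two MOTION examples above:
k1 = argument_key("ACTIVE", "RIGHT", "PMOD", "up")  # ACTIVE:RIGHT:PMOD_up
k2 = argument_key("ACTIVE", "RIGHT", "OBJ")         # ACTIVE:RIGHT:OBJ
```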
Induction of Semantic Roles: Signals
24
Selectional preferences:
Two argument keys are likely to correspond to the same role if the corresponding sets of arguments are similar
Duplicate roles are unlikely to occur
E.g., coloring "John (Role 1) taught linguistics (Role 2) to the students (Role 2)"
[TEACHING], with the same role assigned twice, is a bad idea
(Conjunctions are handled differently)
Role 2
Modeling assumptions
25
(1) For roles, the distribution over classes of argument fillers is sparse
(2) Each semantic class can be verbalized as a sparse distribution over lexemes or syntactic tree fragments
(3) Semantically-similar predicates have the same linking between syntax and semantics
(4) The same semantic role rarely appears twice
(5) Argument key clusterings for different predicates are related
(Titov and Klementiev, ACL 2011, EACL 2012)
How to encode this in a statistical model?
Outline
26
Task: inducing semantic representations without labeled data
Signals: exploiting signals implicit in unlabeled data
Model and Inference: a hierarchical Bayesian model defining the
process of joint generation of semantic, syntactic and lexical
representations
Evaluation:
results on a human-annotated corpus
a question-answering task from the bio-medical domain
Shortcomings? Other ideas?
Reconstruction error minimization (REM) framework for semantic parser
induction
Inducing Semantics
27
Given a (large) collection of sentences annotated with
(transformed) syntactic dependencies ("Mary wore an evening dress from Cardin", …)
We want to induce the semantics, i.e. the segmentation
and clustering
[figure: the dependency-annotated sentences paired with their induced frame-semantic analyses; in the model, frames and roles are represented as induced cluster indices rather than human-readable labels]
Induction with a Generative Model
30
Define a family of generative models encoding our assumptions: P(x, y | θ),
where x is the observable data, y is the latent semantics and θ are the model parameters
In the prior probability over parameters P(θ), we encode our beliefs;
in particular, we use it to encode sparsity of distributions: semantic
classes can be expressed only in a small number of ways
We want to find the maximum a posteriori semantics given the
observable data: argmax_y P(y | x)
The (Simplified) Model

for each sentence:
    c_root ∼ θ_root                      # draw semantic class for root
    GenSemClass(c_root)

GenSemClass(c):
    s ∼ φ_c                              # draw synt/lex realization
    for each role t = 1, …, T:
        if [n ∼ ψ_{c,t}] = 1:            # at least one argument?
            GenArgument(c, t)            # draw first argument
            while [n ∼ ψ⁺_{c,t}] = 1:    # draw more arguments
                GenArgument(c, t)

GenArgument(c, t):
    a_{c,t} ∼ φ_{c,t}                    # draw argument key
    c_{c,t} ∼ θ_{c,t}                    # draw semantic class for arg
    GenSemClass(c_{c,t})                 # recurse, continuing generation

We use hierarchical Dirichlet processes to represent the distributions
over tree fragments

Running example: "Peter the Great gave an order to build a fortified castle"
the root class REQUEST is drawn and realized as "gave an order"
for the Speaker role, the argument key ACTIVE:LEFT:SBJ and the class PERSON are drawn, realized as "Peter the Great"
for the Message role, the argument key ACTIVE:RIGHT:OBJ and the class CONSTRUCTION are drawn, realized as "build"
generation recurses into CONSTRUCTION: Created Entity = "castle" (BUILDINGS, key ACTIVE:RIGHT:OBJ), with Type = "fortified" (PROTECTED, key -:LEFT:NMOD)
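The generative story can be sketched in Python with fixed toy parameters standing in for the HDP/Dirichlet draws; all distributions below are hypothetical and chosen deterministic so the recursion is easy to follow:

```python
import random

def sample(dist):
    """Draw an outcome from a {outcome: probability} dictionary."""
    r, acc = random.random(), 0.0
    for outcome, p in dist.items():
        acc += p
        if r < acc:
            return outcome
    return outcome  # guard against floating-point underflow

# Toy, deterministic parameters (hypothetical stand-ins for the HDP draws).
theta_root = {"Request": 1.0}
phi = {"Request": {"gave an order": 1.0}, "Person": {"Peter the Great": 1.0}}
roles = {"Request": ["Speaker"], "Person": []}
psi = {("Request", "Speaker"): 1.0}        # P(at least one argument)
psi_plus = {("Request", "Speaker"): 0.0}   # P(one more argument)
arg_keys = {("Request", "Speaker"): {"ACTIVE:LEFT:SBJ": 1.0}}
theta = {("Request", "Speaker"): {"Person": 1.0}}

def gen_argument(c, t, out):
    out.append(sample(arg_keys[(c, t)]))       # draw argument key
    gen_sem_class(sample(theta[(c, t)]), out)  # draw class for arg, recurse

def gen_sem_class(c, out):
    out.append((c, sample(phi[c])))            # draw synt/lex realization
    for t in roles[c]:
        if random.random() < psi[(c, t)]:      # at least one argument?
            gen_argument(c, t, out)
            while random.random() < psi_plus[(c, t)]:  # draw more arguments
                gen_argument(c, t, out)

generated = []
gen_sem_class(sample(theta_root), generated)
# generated: [('Request', 'gave an order'), 'ACTIVE:LEFT:SBJ',
#             ('Person', 'Peter the Great')]
```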
Under the hood …
42
(1) For roles, the distribution over classes of argument fillers is sparse
We use a sparse prior: hierarchical Dirichlet processes [Teh et al., 05]
(2) Each semantic class can be verbalized as a sparse distribution over lexemes or syntactic tree fragments
Sparse priors over syntactic trees (as in Bayesian TSGs [Cohn et al., 07])
(3) Semantically-similar predicates have the same linking between syntax and semantics
We use sparse Dirichlet priors to encode the linking
(4) The same semantic role rarely appears twice
We use a non-symmetric Dirichlet prior for the corresponding geometric distribution
(5) Argument key clusterings for different predicates are related
We induce a shared weighted graph used in a (distance-dependent) Chinese Restaurant Process [Blei and Frazier, 11] prior for each clustering
(Titov and Klementiev, ACL 2011, EACL 2012)
Previous approaches induce roles for each predicate independently
These clusterings define permissible alternations, but many alternations
are shared across verbs
Can we share this information across verbs?
Joint learning of roles across predicates
43
Dative alternation:
John gave the book to Mary   vs   John gave Mary the book
Mike threw the ball to me    vs   Mike threw me the ball
[figure: per-predicate clusterings of argument keys such as ACT:LEFT:SBJ, ACT:RIGHT:OBJ, PASS:LEFT:SBJ, PASS:RIGHT:LGS-by]
Idea: keep track of how likely a pair of argument keys is to be
clustered together
Define a similarity matrix (or similarity graph) over argument keys
A Bayesian model for role labeling
44
[figure: a shared similarity graph over argument keys (ACT:LEFT:SBJ, ACT:RIGHT:OBJ, PASS:LEFT:SBJ, PASS:RIGHT:LGS-by, …); edges such as the similarity score between PASS:LEFT:SBJ and ACT:RIGHT:OBJ inform the per-predicate argument-key clusterings, e.g., for "open" and "overtake"]
A formal way to encode this: dd-CRP
48
We can use a CRP to define a prior on the partition of argument keys:
The first customer (argument key) sits at the first table (role)
The m-th customer, given the state of the restaurant once m−1 customers
are seated, sits at table k with probability proportional to n_k (the number
of customers already at table k), and at a new table with probability
proportional to α
This encodes rich-get-richer dynamics, but not much more than that
An extension is the distance-dependent CRP (dd-CRP):
The m-th customer chooses a customer j to sit with according to
p(c_m = j) ∝ f(s_{mj}), where s_{mj} is the similarity between customers
m and j in the entire similarity graph, and sits alone with probability
proportional to α; tables are the connected components of these links
Footnote: marginal invariance, BNP view
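A sequential sketch of the dd-CRP link-drawing step; restricting each customer to link only to previously seated customers (or to itself) is a simplifying assumption, and the similarity values are illustrative:

```python
import random

def ddcrp_links(similarity, alpha, rng=random):
    """Sample customer links under a (sequential) distance-dependent CRP.

    similarity[m][j] is the non-negative similarity between customers m
    and j from the shared graph.  Customer m links to an earlier customer
    j with probability proportional to similarity[m][j], and to itself
    (starting a new table) with probability proportional to alpha.  Tables
    -- the connected components of the links -- are the induced roles.
    """
    n = len(similarity)
    links = []
    for m in range(n):
        weights = [similarity[m][j] for j in range(m)] + [alpha]
        total = sum(weights)
        r, acc = rng.random() * total, 0.0
        for j, w in enumerate(weights):
            acc += w
            if r < acc:
                break
        links.append(j if j < m else m)  # last slot = self-link (new table)
    return links
```

With all similarities zero, every customer self-links and each argument key starts its own role; increasing a similarity entry makes the corresponding pair more likely to share a table.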
Inference
49
We use an iterative sampling algorithm for inference
On every step, the sampler attempts a random change in the labeling of the
latent semantic representation
Roughly, it keeps the relabeling if the probability increases, and rejects it
otherwise
Inference is challenging as the search space is huge
We define the following types of moves ('relabelings'):
Role-Syntax alignment
Choose a new clustering of argument keys for a frame
Split-Merge
Merge 2 semantic classes together (e.g., break + bust) or split one class in two
Compose-Decompose
Compose fragments of a syntactic tree to form a new realization
(e.g., held + a victory = held a victory) or split a fragment
See an alternative sampler for our model by Rabinovich and Ghahramani
(NIPS LS '14)
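The accept/reject rule can be sketched as follows; the greedy flag matches the rough description above, and the non-greedy branch is the standard Metropolis-style relaxation (this helper is illustrative, not the actual sampler):

```python
import math
import random

def keep_relabeling(current_logp, proposal_logp, greedy=False, rng=random):
    """Decide whether to keep a proposed relabeling of the latent structure.

    Greedy variant: keep the move only if the (log-)probability increases.
    Metropolis-style variant: also accept a worse move with probability
    exp(proposal_logp - current_logp), which helps escape local optima.
    """
    if greedy:
        return proposal_logp > current_logp
    if proposal_logp >= current_logp:
        return True
    return rng.random() < math.exp(proposal_logp - current_logp)
```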
Outline
50
Task: inducing semantic representations without labeled data
Signals: exploiting signals implicit in unlabeled data
Model and Inference: a hierarchical Bayesian model defining the
process of joint generation of semantic, syntactic and lexical
representations
Evaluations:
results on a human-annotated corpus
a question-answering task from the bio-medical domain
Shortcomings? Other ideas?
Reconstruction error minimization (REM) framework for semantic parser
induction
[figure: F1 scores in the 70-90 range for LLogistic, GraphPart, SplitMerge, MonoBayes, our model, and SyntF]
Benchmark Dataset: PropBank (CoNLL 08)
51
Evaluation of semantic role induction
Purity measures the degree to which each induced role contains arguments
sharing the same gold (“true”) role
Collocation evaluates the degree to
which arguments with the same gold
roles are assigned to a single induced role
Both measures compare gold ("true") roles with induced roles; we report
F1, the harmonic mean of PU and CO
The comparison includes state-of-the-art approaches, our model, and SyntF,
a baseline using the optimal deterministic mapping from syntactic relations
The improvement is large for this problem, and we are the first
to outperform the syntactic baseline by a substantial margin
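Purity and collocation as defined above can be computed directly; a small self-contained sketch (the occurrence ids and role labels are illustrative):

```python
from collections import Counter, defaultdict

def purity_collocation_f1(induced, gold):
    """PU, CO, and their harmonic mean F1 for an induced role clustering.

    `induced` and `gold` map each argument occurrence to an induced / gold
    role label.  Purity credits each induced role with its dominant gold
    label; collocation credits each gold role with its dominant induced
    role; both are averaged over all occurrences.
    """
    n = len(gold)
    by_induced, by_gold = defaultdict(Counter), defaultdict(Counter)
    for occ, g in gold.items():
        by_induced[induced[occ]][g] += 1
        by_gold[g][induced[occ]] += 1
    pu = sum(c.most_common(1)[0][1] for c in by_induced.values()) / n
    co = sum(c.most_common(1)[0][1] for c in by_gold.values()) / n
    return pu, co, 2 * pu * co / (pu + co)

gold = {1: "A0", 2: "A0", 3: "A1", 4: "A1"}
induced = {1: "r1", 2: "r1", 3: "r1", 4: "r2"}
pu, co, f1 = purity_collocation_f1(induced, gold)  # 0.75, 0.75, 0.75
```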
Benchmark Dataset: PropBank (CoNLL 08)
52
Looking into the induced graph encoding 'priors' over clustering
argument keys, the most highly ranked pairs encode (or
partially encode):
Passivization
Near-equivalence of subordinating conjunctions and prepositions
E.g., whether and if, encoded as the pair (ACTIVE:RIGHT:OBJ_if,
ACTIVE:RIGHT:OBJ_whether)
Benefactive alternation
Martha carved a doll for the baby
Martha carved the baby a doll
Dative alternation
I gave the book to Mary
I gave Mary the book
Recovery of unnecessary splits introduced by argument keys
Application-based Evaluation
53
Question Answering about knowledge in a corpus of biomedical
abstracts
Dataset: 1,999 biomedical abstracts from the Genia corpus (Kim et al., 2003)
Examples of induced semantic classes:

Class  Variations
1      motif, sequence, regulatory element, response element, element, dna sequence
2      donor, individual, subject
3      important, essential, critical
4      dose, concentration
5      activation, transcriptional activation, transactivation
6      b cell, t lymphocyte, thymocyte, b lymphocyte, t cell, t-cell line, human lymphocyte, t-lymphocyte   [blood cells]
7      indicate, reveal, document, suggest, demonstrate
8      augment, abolish, inhibit, convert, cause, abrogate, modulate, block, decrease, reduce, diminish, suppress, up-regulate, impair, reverse, enhance   [roughly a "cause change of position on a scale" frame]
9      confirm, assess, examine, study, evaluate, test, resolve, determine, investigate
10     nf-kappab, nf-kappa b, nfkappab, nf-kb
Application-based Evaluation
54
Question Answering about knowledge in a corpus of biomedical
abstracts
Example questions and answers:
Question: What does cyclosporin A suppress?
Answer: expression of EGR-2
Sentence: As with EGR-3 , expression of EGR-2 was blocked by cyclosporin A .
Question: What inhibits tnf-alpha?
Answer: IL -10
Sentence: Our previous studies in human monocytes have demonstrated that interleukin ( IL )
-10 inhibits lipopolysaccharide ( LPS ) -stimulated production of inflammatory
cytokines , IL-1 beta , IL-6 , IL-8 , and tumor necrosis factor alpha by blocking gene
transcription .
Application-based Evaluation
55
Question Answering about knowledge in a corpus of biomedical
abstracts
More than 55% of mistakes are due to overly coarse clustering in
3 semantic classes (antonymy / hyponymy)
[figure: QA accuracy for keyword matching, standard information extraction methods, and this work]
So far …
56
We proposed a method for learning semantics with no supervision
Joint induction of multiword expressions, semantic classes and roles
Substantially outperforms alternatives (pipelines / heuristic approaches /
…)
Extensions:
Semi-supervised learning [COLING 2012]
Cross-lingual extensions [ACL 2012,…]
Induction of `scripts' (how frames are organized into scenarios) [EACL
2014]
But … [more details in ACL '11, EACL '12]
Unsupervised frame and role induction models (in contrast to supervised
methods for frame-semantic parsing / semantic role labeling) …
… rely on very restricted sets of features
not very effective in the semi-supervised set-up, and not very appropriate for languages with freer word order than English
… over-rely on syntax
not going to induce, e.g., "X sent Y = Y is a shipment from X"
… use language-specific priors
a substantial drop in performance if there is no adaptation
… are not (quite) appropriate for inference
not only are there no inference models, but opposites and antonyms (e.g., increase + decrease) are typically grouped together; induced granularity is often problematic; …
How can we induce frames in a less restrictive feature-rich framework
and tackle other challenges along the way?
Outline
58
Task: inducing semantic representations without labeled data
Signals: exploiting signals implicit in unlabeled data
Model and Inference: a hierarchical Bayesian model defining the
process of joint generation of semantic, syntactic and lexical
representations
Evaluation:
results on a human-annotated corpus
a question-answering task from the bio-medical domain
Issues? Other ideas?
Reconstruction error minimization (REM) framework for semantic parser
induction
Preliminary experiments [Titov and Khoddam, '14]
Feature-rich models of semantic frames
Consider a frame realization
For simplicity: focus on frame and role labeling (no identification + one frame per sentence)
Feature-rich models of semantic frames [Titov and Khoddam, '14]
Consider a frame realization
How can we define a feature-rich model for
unsupervised induction of roles and frames?
For simplicity: focus on frame and role labeling (no identification + one frame per sentence)
Feature representation x of "The police charged... "
Semantic role prediction (= Encoding): a feature-rich model p(r, f | x, w)
Hidden semantics: Assault(Agent: police, Patient: demonstrator, Instrument: baton)
Argument prediction (= Reconstruction): an "argument prediction" model p(a_i | a_{−i}, r, f, θ), e.g., reconstructing "demonstrator"
Consider a frame realization
Any existing
supervised role labeler
would do
Hypothesis: semantic roles and frames are the latent
representation which helps to reconstruct arguments
[Titov and Khoddam, '14]
Argument reconstruction [Titov and Khoddam, '14]
Consider a frame realization
What do the components look like, and how do we estimate them jointly?
Reconstruction-error minimization
Neural autoencoders [Hinton, '99; Vincent et al., '08]:
Input x ∈ R^m → Encoding → latent representation y ∈ R^p → Reconstruction → reconstructed input x̃ ∈ R^m
trained to minimize the reconstruction error, e.g., ||x − x̃||²
but …
… the idea is applicable not only to neural models
… the reconstruction and encoding components can belong to different model families
… there is no need to reconstruct the entire input
See Ammar et al. (NIPS 2014) and also Daumé (ICML 09)
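A minimal reconstruction-error computation for a linear autoencoder, using plain lists so the sketch stays self-contained; the squared-error loss is one standard choice:

```python
def reconstruction_error(x, W_enc, W_dec):
    """Squared reconstruction error of a linear autoencoder.

    Encoding: y = W_enc x (a p-dimensional latent representation);
    reconstruction: x_tilde = W_dec y.  Training would adjust W_enc and
    W_dec to minimize this error over a dataset.
    """
    y = [sum(w * xi for w, xi in zip(row, x)) for row in W_enc]        # encode
    x_tilde = [sum(w * yi for w, yi in zip(row, y)) for row in W_dec]  # decode
    return sum((xt - xi) ** 2 for xt, xi in zip(x_tilde, x))

I = [[1.0, 0.0], [0.0, 1.0]]
zero = [[0.0, 0.0], [0.0, 0.0]]
perfect = reconstruction_error([1.0, 2.0], I, I)  # 0.0: identity round-trip
bad = reconstruction_error([1.0, 2.0], I, zero)   # 5.0: everything lost
```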
Argument reconstruction [Titov and Khoddam, '14]
Consider a frame realization:
the semantic role prediction (encoding) component p(r, f | x, w) is a (structured) linear model
the argument prediction (reconstruction) component p(a_i | a_{−i}, r, f, θ) is a tensor factorization
Feature representation of "The police charged... " ( )
Semantic role prediction
( = Encoding)
Assault(Agent: police, Patient: demonstrator, Instrument: baton)
demonstrator
Argument prediction
( = Reconstruction)Hidden
p(r , f |x, w)Feature-rich model
"Argument prediction" model
p(ai |a− i , r , f ,✓)
x
Distributed vectors:
- encode semantic properties of argument a
- encode expectations about other argument given that a
is assigned to role r of frame f
The reconstruction model ('softmax'):
Component 1: argument reconstruction
May encode that
demonstrators are
similar to protestors
If Agent of Assault is the
police, then Patient can be
demonstrators or protestors
A role-specific
projection matrix
Feature representation of "The police charged... " ( )
Semantic role prediction
( = Encoding)
Assault(Agent: police, Patient: demonstrator, Instrument: baton)
demonstrator
Argument prediction
( = Reconstruction)Hidden
p(r , f |x, w)Feature-rich model
"Argument prediction" model
p(ai |a− i , r , f ,✓)
x
Component 1: argument reconstruction
10
Parallels to work on relation modeling (e.g., Bordes et al. '11), distributional semantics (e.g., Mikolov et al. '13), and (coupled) tensor factorization (e.g., Yılmaz et al. '11)
Intuitively, score argument tuples according to the factorization:
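The factorization itself was an equation image on the slide. The following is a hedged reconstruction from the surrounding annotations ("distributed vectors", "role-specific projection matrix", softmax); the published parameterization may differ in details such as frame-specific projections:

```latex
p(a_i \mid a_{-i}, r, f, \theta)
  = \frac{\exp\big(\sum_{j \neq i} (C_{r_i} u_{a_i})^\top C_{r_j} u_{a_j}\big)}
         {\sum_{a'} \exp\big(\sum_{j \neq i} (C_{r_i} u_{a'})^\top C_{r_j} u_{a_j}\big)}
```

where $u_a \in \mathbb{R}^d$ is the distributed vector (embedding) of argument $a$ and $C_r$ is the role-specific projection matrix.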
Component 2: frame + role prediction
The role and frame labeling model: it can be any model, as long as the role and frame posteriors can be computed (or approximated)
A feature-rich representation encoding the syntax-semantics interface
The majority of supervised SRL models qualify; we used (Johansson and Nugues, '08)
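The encoding component is a standard feature-rich log-linear labeler; here is a minimal numpy sketch of a softmax role posterior p(r | x, w) (the feature vector and weights are toy placeholders, not the Johansson and Nugues '08 feature set):

```python
import numpy as np

def role_posterior(x, w):
    """Softmax posterior p(r | x, w) over roles for one argument.

    x : feature vector of the syntax-semantics interface, shape (n_features,)
    w : weight matrix, one row of weights per role, shape (n_roles, n_features)
    """
    scores = w @ x
    scores -= scores.max()        # subtract max for numerical stability
    e = np.exp(scores)
    return e / e.sum()

# toy example: 3 roles, 4 binary features
w = np.array([[ 1.0, 0.0, 0.5, 0.0],
              [ 0.0, 2.0, 0.0, 0.0],
              [-1.0, 0.0, 0.0, 1.0]])
x = np.array([1.0, 0.0, 1.0, 0.0])
post = role_posterior(x, w)       # sums to 1; role 0 scores highest here
```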
Joint learning
For every structure, we aim to optimize the expectation of the argument-prediction quality given roles and frames:
This is not very tractable in its exact form; the usual 'tricks' are needed:
'mean field': substituting posterior means instead of marginalizing
'negative sampling' (as, e.g., in Mikolov et al. '13) instead of the 'softmax'
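The objective on this slide was also an equation image; the following is a hedged reconstruction consistent with the surrounding text (expected argument-prediction log-likelihood under the encoder's posterior over roles and frames):

```latex
\max_{w,\,\theta} \;\sum_{\text{structures}} \;
  \mathbb{E}_{p(r, f \mid x, w)}
  \Big[ \sum_i \log p(a_i \mid a_{-i}, r, f, \theta) \Big]
```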
Training can be quite efficient, as all the models are linear (or bilinear)
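The 'negative sampling' trick mentioned above can be sketched as follows: instead of normalizing over the whole argument vocabulary ('softmax'), push up the score of the observed argument and push down the scores of a few sampled negatives (a generic illustration; in the model the scores would come from the bilinear factorization):

```python
import numpy as np

def neg_sampling_loss(score_true, scores_neg):
    """Logistic negative-sampling loss (cf. Mikolov et al. '13):
    -log sigma(s_true) - sum_k log sigma(-s_neg_k)."""
    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))
    return float(-np.log(sigmoid(score_true))
                 - np.sum(np.log(sigmoid(-scores_neg))))

# loss is small when the true argument clearly outscores the negatives
confident = neg_sampling_loss(5.0, np.array([-5.0, -5.0]))
uncertain = neg_sampling_loss(0.0, np.array([0.0, 0.0]))
```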
Outline
69
Task: inducing semantic representations without labeled data
Signals: exploiting signals implicit in unlabeled data
Model and Inference: a hierarchical Bayesian model defining the
process of joint generation of semantic, syntactic and lexical
representations
Evaluation:
results on a human-annotated corpus
a question-answering task from the bio-medical domain
Issues? Other ideas?
Reconstruction error minimization (REM) framework for semantic parser
induction
Preliminary experiments
Feature representation of "The police charged..." (x)
charge(Agent: police, Patient: demonstrator, Instrument: baton)
Semantic role prediction: feature-rich model p(r | x, w)
Argument prediction: "argument prediction" model p(a_i | a_{-i}, r, v, θ)
Hidden argument: demonstrator
Experiments: only role induction
Evaluate on a dataset annotated with roles (PropBank for En, SALSA for De)
Compare against previous models evaluated in this set-up
use clustering evaluation measures (purity, collocation, F1)
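The clustering measures can be made precise with a short sketch (Lang and Lapata-style definitions: purity averages each induced cluster's overlap with its best-matching gold role, collocation averages each gold role's overlap with its best-matching cluster, and F1 is their harmonic mean):

```python
from collections import Counter, defaultdict

def purity_collocation_f1(induced, gold):
    """induced, gold: equal-length label sequences, one per argument instance."""
    n = len(gold)
    by_cluster = defaultdict(Counter)   # induced cluster -> gold role counts
    by_role = defaultdict(Counter)      # gold role -> induced cluster counts
    for c, g in zip(induced, gold):
        by_cluster[c][g] += 1
        by_role[g][c] += 1
    pu = sum(max(cnt.values()) for cnt in by_cluster.values()) / n
    co = sum(max(cnt.values()) for cnt in by_role.values()) / n
    return pu, co, 2 * pu * co / (pu + co)

pu, co, f1 = purity_collocation_f1(
    induced=[0, 0, 1, 1, 1],
    gold=["A0", "A1", "A1", "A1", "A2"])
# pu = 0.6, co = 0.8
```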
No frame induction – use
the predicate itself (v)
We replicate the previous evaluation set-up: the datasets are fairly small (e.g., ~90,000 predicate-argument structures for English)
This may not be the optimal set-up for our expressive model
English (F1) [bar chart; exact scores not recoverable here]
The feature-rich model (ours)
Optimal deterministic mapping from syntactic relations (baseline)
Previous approaches evaluated in the same setting:
Logistic: Lang and Lapata ('10)
GraphP: Lang and Lapata ('11a)
Linking: Fürstenau and Rambow ('12)
Aggl: Lang and Lapata ('11b)
Order: Garg and Henderson ('12)
Aggl+: Lang and Lapata ('14)
Bayes: Titov and Klementiev ('12)
Performs on par with the best methods (without language-specific priors)
Induces fewer roles than most other approaches, but under certain regimes the roles start to capture verb senses
A new framework for inducing frames
allowing us to combine ideas from relation modeling and supervised role labeling
more data, more languages, other factorizations, …
The framework naturally supports:
Semi-supervised learning
In principle, the reconstruction objective can be easily extended with the conditional likelihood objective on labeled data
Learning for inference
Modeling entities as arguments and coupling factorizations across relations
Document-level reasoning and disambiguation
Again, factorizations provide a natural framework here (entity representations are specific to a document)
REM framework: see also Titov and Khoddam '14:
http://arxiv.org/abs/1412.2812
Conclusions
73
We know that rule-based semantic parsing is not the way to go
But (fully) supervised open-domain semantic parsing is not very promising either
What kind of resources can we leverage?
Un-annotated text?
Ontologies?
Linking between the two?
…
What kind of models should we use?
Generative (Bayesian) models?
Feature-rich models?
…