Inducing Semantic Representations from
Text
with Little or No Supervision
Ivan Titov
ILLC, University of Amsterdam
2
Contributors:
Alex Klementiev (now at Amazon)
Ehsan Khoddam (U. Amsterdam)
Ashutosh Modi (Saarland Uni)
Diego Marcheggiani (CNR, Pisa)
Why semantic representations?
3
Question Answering about knowledge in a collection of biomedical
publications:
Question: What does cyclosporin A suppress?
Answer: expression of EGR-2
Sentence: As with EGR-3 , expression of EGR-2 was blocked by cyclosporin A .
Question: What inhibits tnf-alpha?
Answer: IL -10
Sentence: Our previous studies in human monocytes have demonstrated that interleukin ( IL )
-10 inhibits lipopolysaccharide ( LPS ) -stimulated production of inflammatory
cytokines , IL-1 beta , IL-6 , IL-8 , and tumor necrosis factor alpha by blocking gene
transcription .
We need to abstract away from specific syntactic and lexical
realizations
Other applications
4
Shown beneficial in:
Machine Translation [Wu and Fung, 2009; Liu and Gildea, 2010; Gao and Vogel, 2011; …]
Dialogue Systems [Basili et al., 2009; van der Plas et al., 2011]
Predicting whether one text fragment logically follows from another, called
Textual Entailment [Sammons et al., 2009]
Authorship attribution [Hedegaard and Simonsen, 2011]
(among others)
Even though we are not yet very good at predicting these
representations
A more direct application – machine reading: extracting knowledge from
texts into the form of a knowledge base
Outline
5
Task: inducing semantic representations without labeled data
Signals: exploiting signals implicit in unlabeled data
Model and Inference: a hierarchical Bayesian model defining the
process of joint generation of semantic, syntactic and lexical
representations
Evaluation:
results on a human-annotated corpus
a question-answering task from the bio-medical domain
Shortcomings? Other ideas?
Reconstruction error minimization (REM) framework for semantic parser
induction
From Syntax to Semantics
6
Robust syntactic parsers [Collins 1999, Charniak 2001, Petrov and Klein 2006, McDonald
2005, Titov and Henderson 2007] available for tens of languages
However, syntactic analyses are a long way from representing the meaning of
sentences
In other words, they do not specify the underlying predicate argument structure
Specifically, they do not define Who did What to Whom (and
How, Where, When, Why, …)
Frame Semantics
7
A semantic frame [Fillmore 1968] is a conceptual structure describing a situation, object, or event along with associated properties and participants
Example: CLOSURE / OPENING frame
Jack opened the lock with a paper clip
Semantic Roles (aka Frame Elements):
AGENT – an initiator/doer in the event [Who?]
PATIENT – an affected entity [to Whom / to What?]
INSTRUMENT – the entity manipulated to accomplish the goal
Other roles for CLOSURE/OPENING frame: BENEFICIARY, FASTENER, DEGREE,
CIRCUMSTANCES, MANIPULATOR, PORTAL, …
Syntactic-Semantic Interface
8
Though syntactic and lexical representations are often predictive of
the predicate argument structure, this relation is far from trivial:
(1) John broke the window
(2) The window broke
(3) The window was broken by John
(4) John busted the window
(5) The window was destroyed by John
(6) John tore down the window
Semantic Roles:
AGENT – an initiator/doer in the event [Who?]
PATIENT – an affected entity [to Whom / to What?]
(1)-(3) show alternations: the same relation is realized by different syntactic configurations
(4)-(6) show that the same relation is encoded by different predicates (incl. a multiword expression)
Supervised learning of semantic representations is challenging:
datasets provide low coverage, are domain-specific, and are available
only for a few languages
Our task
9
Semantics is encoded by semantic dependency graphs [Johansson, 2008]
Arguments often evoke their own frames
Arguments and predicates often expressed by multiword expressions
Induce these representations automatically from unannotated texts
Examples:
"Mary wore an evening dress from Cardin"
WEARING frame (evoked by "wore"): Wearer = Mary (PERSON), Clothing = "an evening dress from Cardin"
The Clothing argument evokes the GARMENT frame: Style = "evening" (OCCASION), Creator = "Cardin" (BRAND)
"Mary wore a dress"
WEARING frame: Wearer = Mary, Clothing = "a dress" (GARMENT)
"Peter the Great gave an order to build the castle"
REQUEST frame (evoked by the multiword expression "gave an order"): Speaker = "Peter the Great" (PERSON), Message = "to build the castle"
The Message evokes the CONSTRUCTION frame: Created Entity = "castle" (BUILDINGS)
For simplicity we assume that all arguments evoke frames
In the model itself, frames, semantic classes and roles are induced clusters identified by indices (e.g., class 12, role 5) rather than by human-readable labels
Outline
11
Task: inducing semantic representations without labeled data
Signals: exploiting signals implicit in unlabeled data
Model and Inference: a hierarchical Bayesian model defining the
process of joint generation of semantic, syntactic and lexical
representations
Evaluation:
results on a human-annotated corpus
a question-answering task from the bio-medical domain
Shortcomings? Other ideas?
Reconstruction error minimization (REM) framework for semantic parser
induction
Induction of Frame-Semantic Information
12
The semantic induction task involves 3 sub-tasks
Construction of a transformed syntactic dependency graph (~ argument
identification)
Example: "Peter the Great gave an order to build a wooden fortified castle"
Induced semantic classes: PERSON ("Peter the Great"), REQUEST ("gave an order"), CONSTRUCTION ("build"), MATERIAL ("wooden"), BEING_PROTECTED ("fortified"), BUILDINGS ("castle")
Induction of Frame-Semantic Information
13
The semantic induction task involves 3 sub-tasks
Construction of a transformed syntactic dependency graph (~ argument
identification)
Induction of frames (and clusters of arguments)
Example (continued), now with induced roles linking the frames:
REQUEST(Speaker = PERSON, Message = CONSTRUCTION)
CONSTRUCTION(Created Entity = BUILDINGS)
BUILDINGS(Type = BEING_PROTECTED, Material = MATERIAL)
Induction of Frame-Semantic Information
14
The semantic induction task involves 3 sub-tasks
Construction of a transformed syntactic dependency graph (~ argument
identification)
Induction of frames (and clusters of arguments)
Role induction
We model these sub-tasks jointly within our probabilistic model
This differs from most previous work, where each sub-task was tackled in
isolation and argument identification was handled with a simple heuristic
or a simple classifier
Outline
15
Task: inducing semantic representations without labeled data
Signals: exploiting signals implicit in unlabeled data
induction of semantic classes (frames and argument clusters)
induction of semantic roles
Model and Inference: a hierarchical Bayesian model defining the
process of joint generation of semantic, syntactic and lexical
representations
Evaluation:
results on a human-annotated corpus
a question-answering task from the bio-medical domain
Shortcomings? Other ideas?
Reconstruction error minimization (REM) framework for semantic parser
induction
Induction of Semantic Classes: Definition
16
Induction of frames and induction of argument clusters are the same task
We will refer to both of them as semantic classes
Induction of semantic classes involves:
Clustering of lexemes with similar meaning
break, bust, destroy should be clustered together
Detection of multiword expressions, i.e. expressions which are not (sufficiently)
compositional
these include idiomatic expressions, terminology, proper nouns, …
E.g., hold a victory over, red herring
Later, multiword expressions can be clustered with atomic ones, e.g., win + held a victory over
Next, we discuss 3 signals of semantic relatedness that we
encode in our model to induce the clusterings
Induction of Semantic Classes: Signal 1
17
Selectional preferences: roles for related predicates are filled with similar
arguments.
Top argument fillers from the PropBank dataset
to wear:
WEARER role: lawyer, employee, French, judge, woman, …
CLOTHING role: hat, uniform, suit, nothing, clothes, …
to dress (in):
WEARER role: woman, attendant, employee, defendant, investigator, …
CLOTHING role: uniform, jeans, shirts, leather, …
In this definition, we implicitly rely on the notion of roles, but roles are also latent:
joint learning of all semantic classes and roles may be beneficial
In fact, we model a distribution over (latent) semantic classes rather than over lexemes:
for the WEARER role: PEOPLE, ANIMALS, …
for the CLOTHING role: GARMENT, HEADWEAR, …
Induction of Semantic Classes: Signal 2
18
Inverse of the previous one: similar arguments fill slots for similar sets of
predicates
from PropBank dataset
shirt:
an argument for predicates: dress, wear, buy, display, ..
uniform:
an argument for predicates: wear, have, dress, don, …
Again, we model not a distribution over lexemes but a distribution over (latent)
semantic classes:
for shirt: WEARING, COMMERCE_BUY, …
for uniform: WEARING, POSSESSION, …
Induction of Semantic Classes: Signal 3
19
Levin classes [Levin, 1993]: groups of expressions (esp. verbs) that exhibit
similar mappings between syntactic and semantic roles are more likely to
be semantically related
Example: verbs of Transfer of a Message (teach, show, read, …)
Dative alternation:
John taught linguistics to the students
John taught the students linguistics
Constructions:
John taught the students that …
Again, this relies on latent semantic roles
Not sufficient on its own but can be a useful signal: some manually produced
Levin classes are not semantically coherent [Levin, 1993], so we only model
this signal coarsely
Outline
20
Task: inducing semantic representations without labeled data
Signals: exploiting signals implicit in unlabeled data
induction of semantic classes (frames and argument clusters)
induction of semantic roles
Model and Inference: a hierarchical Bayesian model defining the
process of joint generation of semantic, syntactic and lexical
representations
Evaluation:
results on a human-annotated corpus
a question-answering task from the bio-medical domain
Shortcomings? Other ideas?
Reconstruction error minimization (REM) framework for semantic parser
induction
Induction of Semantic Roles: Definition
21
Examples, all evoking the TEACHING frame:
John taught linguistics to the students
Dave taught the students machine learning
The attendants were taught how to fly
After argument and semantic class identification we know where the arguments are, but we do not know their semantic roles
This step can be regarded as clustering of argument occurrences for a given semantic class
The search space is huge: in realistic datasets, frequent semantic classes appear tens of thousands of times
Induction of Semantic Roles: Definition
22
We need to "color" the argument occurrences with roles, e.g.:
John (Role 1) taught linguistics (Role 2) to the students (Role 3)   [TEACHING]
Dave (Role 1) taught the students (Role 3) machine learning (Role 2)   [TEACHING]
The attendants (Role 3) were taught how to fly (Role 2)   [TEACHING]
Argument Keys
23
We identify argument occurrences with syntactic signatures (argument
keys)
Argument keys are designed so as to map mostly to a single role
Instead of clustering occurrences, we cluster argument keys
E.g., some simple alternations, like locative preposition drop:
Mary (Role 1) climbed up the mountain (Role 2)   [MOTION; key ACTIVE:RIGHT:PMOD_up]
Mary (Role 1) climbed the mountain (Role 2)   [MOTION; key ACTIVE:RIGHT:OBJ]
Here, we would cluster ACTIVE:RIGHT:OBJ and ACTIVE:RIGHT:PMOD_up
together
More complex alternations require multiple pairs of argument keys to be clustered
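The key format used in these examples (voice, position relative to the predicate, and syntactic relation, with the preposition appended for prepositional arguments) can be sketched as a small string template; the helper itself is hypothetical:

```python
def argument_key(voice, position, relation, preposition=None):
    """Build a syntactic signature ("argument key") for an argument occurrence.

    voice: "ACTIVE" or "PASSIVE"; position: "LEFT" or "RIGHT" of the
    predicate; relation: the syntactic relation label.  For prepositional
    arguments the preposition is appended, so "climbed up the mountain"
    and "climbed the mountain" get distinct keys that the model may later
    cluster into a single role.
    """
    key = f"{voice}:{position}:{relation}"
    if preposition is not None:
        key += f"_{preposition}"
    return key

# The two MOTION examples above:
k1 = argument_key("ACTIVE", "RIGHT", "PMOD", "up")  # ACTIVE:RIGHT:PMOD_up
k2 = argument_key("ACTIVE", "RIGHT", "OBJ")         # ACTIVE:RIGHT:OBJ
```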
Induction of Semantic Roles: Signals
24
Selectional preferences:
Two argument keys are likely to correspond to the same role if the corresponding sets of arguments are similar
Duplicate roles are unlikely to occur
E.g., coloring "John (Role 1) taught linguistics (Role 2) to the students (Role 2)"
[TEACHING], with the same role assigned twice, is a bad idea
(Conjunctions are handled differently)
Role 2
Modeling assumptions
25
(1) For roles, the distribution over classes of argument fillers is sparse
(2) Each semantic class can be verbalized as a sparse distribution over lexemes or syntactic tree fragments
(3) Semantically-similar predicates have the same linking between syntax and semantics
(4) The same semantic role rarely appears twice
(5) Argument key clusterings for different predicates are related
(Titov and Klementiev, ACL 2011, EACL 2012)
How to encode this in a statistical model?
Outline
26
Task: inducing semantic representations without labeled data
Signals: exploiting signals implicit in unlabeled data
Model and Inference: a hierarchical Bayesian model defining the
process of joint generation of semantic, syntactic and lexical
representations
Evaluation:
results on a human-annotated corpus
a question-answering task from the bio-medical domain
Shortcomings? Other ideas?
Reconstruction error minimization (REM) framework for semantic parser
induction
Inducing Semantics
27
Given a (large) collection of sentences annotated with
(transformed) syntactic dependencies ("Mary wore an evening dress from Cardin", …)
We want to induce the semantics, i.e. the segmentation
and clustering
[figure: the dependency-annotated sentences paired with their induced frame-semantic analyses; in the model, frames and roles are represented as induced cluster indices rather than human-readable labels]
Induction with a Generative Model
30
Define a family of generative models encoding our assumptions: P(x, y | θ),
where x is the observable data, y is the latent semantics and θ are the model parameters
In the prior probability over parameters P(θ), we encode our beliefs;
in particular, we use it to encode sparsity of distributions: semantic
classes can be expressed only in a small number of ways
We want to find the maximum a posteriori semantics given the
observable data: argmax_y P(y | x)
The (Simplified) Model

for each sentence:
    c_root ∼ θ_root                      # draw semantic class for root
    GenSemClass(c_root)

GenSemClass(c):
    s ∼ φ_c                              # draw synt/lex realization
    for each role t = 1, …, T:
        if [n ∼ ψ_{c,t}] = 1:            # at least one argument?
            GenArgument(c, t)            # draw first argument
            while [n ∼ ψ⁺_{c,t}] = 1:    # draw more arguments
                GenArgument(c, t)

GenArgument(c, t):
    a_{c,t} ∼ φ_{c,t}                    # draw argument key
    c_{c,t} ∼ θ_{c,t}                    # draw semantic class for arg
    GenSemClass(c_{c,t})                 # recurse, continuing generation

We use hierarchical Dirichlet processes to represent the distributions
over tree fragments

Running example: "Peter the Great gave an order to build a fortified castle"
the root class REQUEST is drawn and realized as "gave an order"
for the Speaker role, the argument key ACTIVE:LEFT:SBJ and the class PERSON are drawn, realized as "Peter the Great"
for the Message role, the argument key ACTIVE:RIGHT:OBJ and the class CONSTRUCTION are drawn, realized as "build"
generation recurses into CONSTRUCTION: Created Entity = "castle" (BUILDINGS, key ACTIVE:RIGHT:OBJ), with Type = "fortified" (PROTECTED, key -:LEFT:NMOD)
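The generative story can be sketched in Python with fixed toy parameters standing in for the HDP/Dirichlet draws; all distributions below are hypothetical and chosen deterministic so the recursion is easy to follow:

```python
import random

def sample(dist):
    """Draw an outcome from a {outcome: probability} dictionary."""
    r, acc = random.random(), 0.0
    for outcome, p in dist.items():
        acc += p
        if r < acc:
            return outcome
    return outcome  # guard against floating-point underflow

# Toy, deterministic parameters (hypothetical stand-ins for the HDP draws).
theta_root = {"Request": 1.0}
phi = {"Request": {"gave an order": 1.0}, "Person": {"Peter the Great": 1.0}}
roles = {"Request": ["Speaker"], "Person": []}
psi = {("Request", "Speaker"): 1.0}        # P(at least one argument)
psi_plus = {("Request", "Speaker"): 0.0}   # P(one more argument)
arg_keys = {("Request", "Speaker"): {"ACTIVE:LEFT:SBJ": 1.0}}
theta = {("Request", "Speaker"): {"Person": 1.0}}

def gen_argument(c, t, out):
    out.append(sample(arg_keys[(c, t)]))       # draw argument key
    gen_sem_class(sample(theta[(c, t)]), out)  # draw class for arg, recurse

def gen_sem_class(c, out):
    out.append((c, sample(phi[c])))            # draw synt/lex realization
    for t in roles[c]:
        if random.random() < psi[(c, t)]:      # at least one argument?
            gen_argument(c, t, out)
            while random.random() < psi_plus[(c, t)]:  # draw more arguments
                gen_argument(c, t, out)

generated = []
gen_sem_class(sample(theta_root), generated)
# generated: [('Request', 'gave an order'), 'ACTIVE:LEFT:SBJ',
#             ('Person', 'Peter the Great')]
```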
Under the hood …
42
(1) For roles, the distribution over classes of argument fillers is sparse
We use a sparse prior: hierarchical Dirichlet processes [Teh et al., 05]
(2) Each semantic class can be verbalized as a sparse distribution over lexemes or syntactic tree fragments
Sparse priors over syntactic trees (as in Bayesian TSGs [Cohn et al., 07])
(3) Semantically-similar predicates have the same linking between syntax and semantics
We use sparse Dirichlet priors to encode the linking
(4) The same semantic role rarely appears twice
We use a non-symmetric Dirichlet prior for the corresponding geometric distribution
(5) Argument key clusterings for different predicates are related
We induce a shared weighted graph used in a (distance-dependent) Chinese Restaurant Process [Blei and Frazier, 11] prior for each clustering
(Titov and Klementiev, ACL 2011, EACL 2012)
Previous approaches induce roles for each predicate independently
These clusterings define permissible alternations, but many alternations
are shared across verbs
Can we share this information across verbs?
Joint learning of roles across predicates
43
Dative alternation:
John gave the book to Mary   vs   John gave Mary the book
Mike threw the ball to me    vs   Mike threw me the ball
[figure: per-predicate clusterings of argument keys such as ACT:LEFT:SBJ, ACT:RIGHT:OBJ, PASS:LEFT:SBJ, PASS:RIGHT:LGS-by]
Idea: keep track of how likely a pair of argument keys is to be
clustered together
Define a similarity matrix (or similarity graph) over argument keys
A Bayesian model for role labeling
44
[figure: a shared similarity graph over argument keys (ACT:LEFT:SBJ, ACT:RIGHT:OBJ, PASS:LEFT:SBJ, PASS:RIGHT:LGS-by, …); edges such as the similarity score between PASS:LEFT:SBJ and ACT:RIGHT:OBJ inform the per-predicate argument-key clusterings, e.g., for "open" and "overtake"]
A formal way to encode this: dd-CRP
48
We can use a CRP to define a prior on the partition of argument keys:
The first customer (argument key) sits at the first table (role)
The m-th customer, given the state of the restaurant once m−1 customers
are seated, sits at table k with probability proportional to n_k (the number
of customers already at table k), and at a new table with probability
proportional to α
This encodes rich-get-richer dynamics, but not much more than that
An extension is the distance-dependent CRP (dd-CRP):
The m-th customer chooses a customer j to sit with according to
p(c_m = j) ∝ f(s_{mj}), where s_{mj} is the similarity between customers
m and j in the entire similarity graph, and sits alone with probability
proportional to α; tables are the connected components of these links
Footnote: marginal invariance, BNP view
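A sequential sketch of the dd-CRP link-drawing step; restricting each customer to link only to previously seated customers (or to itself) is a simplifying assumption, and the similarity values are illustrative:

```python
import random

def ddcrp_links(similarity, alpha, rng=random):
    """Sample customer links under a (sequential) distance-dependent CRP.

    similarity[m][j] is the non-negative similarity between customers m
    and j from the shared graph.  Customer m links to an earlier customer
    j with probability proportional to similarity[m][j], and to itself
    (starting a new table) with probability proportional to alpha.  Tables
    -- the connected components of the links -- are the induced roles.
    """
    n = len(similarity)
    links = []
    for m in range(n):
        weights = [similarity[m][j] for j in range(m)] + [alpha]
        total = sum(weights)
        r, acc = rng.random() * total, 0.0
        for j, w in enumerate(weights):
            acc += w
            if r < acc:
                break
        links.append(j if j < m else m)  # last slot = self-link (new table)
    return links
```

With all similarities zero, every customer self-links and each argument key starts its own role; increasing a similarity entry makes the corresponding pair more likely to share a table.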
Inference
49
We use an iterative sampling algorithm for inference
On every step, the sampler attempts a random change in the labeling of the
latent semantic representation
Roughly, it keeps the relabeling if the probability increases, and rejects it
otherwise
Inference is challenging as the search space is huge
We define the following types of moves ('relabelings'):
Role-Syntax alignment
Choose a new clustering of argument keys for a frame
Split-Merge
Merge 2 semantic classes together (e.g., break + bust) or split one class in two
Compose-Decompose
Compose fragments of a syntactic tree to form a new realization
(e.g., held + a victory = held a victory) or split a fragment
See an alternative sampler for our model by Rabinovich and Ghahramani
(NIPS LS '14)
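The accept/reject rule can be sketched as follows; the greedy flag matches the rough description above, and the non-greedy branch is the standard Metropolis-style relaxation (this helper is illustrative, not the actual sampler):

```python
import math
import random

def keep_relabeling(current_logp, proposal_logp, greedy=False, rng=random):
    """Decide whether to keep a proposed relabeling of the latent structure.

    Greedy variant: keep the move only if the (log-)probability increases.
    Metropolis-style variant: also accept a worse move with probability
    exp(proposal_logp - current_logp), which helps escape local optima.
    """
    if greedy:
        return proposal_logp > current_logp
    if proposal_logp >= current_logp:
        return True
    return rng.random() < math.exp(proposal_logp - current_logp)
```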
Outline
50
Task: inducing semantic representations without labeled data
Signals: exploiting signals implicit in unlabeled data
Model and Inference: a hierarchical Bayesian model defining the
process of joint generation of semantic, syntactic and lexical
representations
Evaluations:
results on a human-annotated corpus
a question-answering task from the bio-medical domain
Shortcomings? Other ideas?
Reconstruction error minimization (REM) framework for semantic parser
induction
[figure: F1 scores in the 70-90 range for LLogistic, GraphPart, SplitMerge, MonoBayes, our model, and SyntF]
Benchmark Dataset: PropBank (CoNLL 08)
51
Evaluation of semantic role induction
Purity measures the degree to which each induced role contains arguments
sharing the same gold (“true”) role
Collocation evaluates the degree to
which arguments with the same gold
roles are assigned to a single induced role
Both measures compare gold ("true") roles with induced roles; we report
F1, the harmonic mean of PU and CO
The comparison includes state-of-the-art approaches, our model, and SyntF,
a baseline using the optimal deterministic mapping from syntactic relations
The improvement is large for this problem, and we are the first
to outperform the syntactic baseline by a substantial margin
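Purity and collocation as defined above can be computed directly; a small self-contained sketch (the occurrence ids and role labels are illustrative):

```python
from collections import Counter, defaultdict

def purity_collocation_f1(induced, gold):
    """PU, CO, and their harmonic mean F1 for an induced role clustering.

    `induced` and `gold` map each argument occurrence to an induced / gold
    role label.  Purity credits each induced role with its dominant gold
    label; collocation credits each gold role with its dominant induced
    role; both are averaged over all occurrences.
    """
    n = len(gold)
    by_induced, by_gold = defaultdict(Counter), defaultdict(Counter)
    for occ, g in gold.items():
        by_induced[induced[occ]][g] += 1
        by_gold[g][induced[occ]] += 1
    pu = sum(c.most_common(1)[0][1] for c in by_induced.values()) / n
    co = sum(c.most_common(1)[0][1] for c in by_gold.values()) / n
    return pu, co, 2 * pu * co / (pu + co)

gold = {1: "A0", 2: "A0", 3: "A1", 4: "A1"}
induced = {1: "r1", 2: "r1", 3: "r1", 4: "r2"}
pu, co, f1 = purity_collocation_f1(induced, gold)  # 0.75, 0.75, 0.75
```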
Benchmark Dataset: PropBank (CoNLL 08)
52
Looking into the induced graph encoding 'priors' over clustering
argument keys, the most highly ranked pairs encode (or
partially encode):
Passivization
Near-equivalence of subordinating conjunctions and prepositions
E.g., whether and if, encoded as the pair (ACTIVE:RIGHT:OBJ_if,
ACTIVE:RIGHT:OBJ_whether)
Benefactive alternation
Martha carved a doll for the baby
Martha carved the baby a doll
Dative alternation
I gave the book to Mary
I gave Mary the book
Recovery of unnecessary splits introduced by argument keys
Application-based Evaluation
53
Question Answering about knowledge in a corpus of biomedical
abstracts
Dataset: 1,999 biomedical abstracts from the Genia corpus (Kim et al., 2003)
Examples of induced semantic classes:

Class  Variations
1      motif, sequence, regulatory element, response element, element, dna sequence
2      donor, individual, subject
3      important, essential, critical
4      dose, concentration
5      activation, transcriptional activation, transactivation
6      b cell, t lymphocyte, thymocyte, b lymphocyte, t cell, t-cell line, human lymphocyte, t-lymphocyte   [blood cells]
7      indicate, reveal, document, suggest, demonstrate
8      augment, abolish, inhibit, convert, cause, abrogate, modulate, block, decrease, reduce, diminish, suppress, up-regulate, impair, reverse, enhance   [roughly a "cause change of position on a scale" frame]
9      confirm, assess, examine, study, evaluate, test, resolve, determine, investigate
10     nf-kappab, nf-kappa b, nfkappab, nf-kb
Application-based Evaluation
54
Question Answering about knowledge in a corpus of biomedical
abstracts
Example questions and answers:
Question: What does cyclosporin A suppress?
Answer: expression of EGR-2
Sentence: As with EGR-3 , expression of EGR-2 was blocked by cyclosporin A .
Question: What inhibits tnf-alpha?
Answer: IL -10
Sentence: Our previous studies in human monocytes have demonstrated that interleukin ( IL )
-10 inhibits lipopolysaccharide ( LPS ) -stimulated production of inflammatory
cytokines , IL-1 beta , IL-6 , IL-8 , and tumor necrosis factor alpha by blocking gene
transcription .
Application-based Evaluation
55
Question Answering about knowledge in a corpus of biomedical
abstracts
More than 55% of mistakes are due to overly coarse clustering in
3 semantic classes (antonymy / hyponymy)
[figure: QA accuracy for keyword matching, standard information extraction methods, and this work]
So far …
56
We proposed a method for learning semantics with no supervision
Joint induction of multiword expressions, semantic classes and roles
Substantially outperforms alternatives (pipelines / heuristic approaches /
…)
Extensions:
Semi-supervised learning [COLING 2012]
Cross-lingual extensions [ACL 2012,…]
Induction of `scripts' (how frames are organized into scenarios) [EACL
2014]
But … [more details in ACL '11, EACL '12]
Unsupervised frame and role induction models (in contrast to supervised
methods for frame-semantic parsing / semantic role labeling) …
… rely on very restricted sets of features
not very effective in the semi-supervised set-up, and not very appropriate for languages with freer word order than English
… over-rely on syntax
not going to induce, e.g., "X sent Y = Y is a shipment from X"
… use language-specific priors
a substantial drop in performance if there is no adaptation
… are not (quite) appropriate for inference
not only are there no inference models, but opposites and antonyms (e.g., increase + decrease) are typically grouped together; induced granularity is often problematic; …
How can we induce frames in a less restrictive feature-rich framework
and tackle other challenges along the way?
Outline
58
Task: inducing semantic representations without labeled data
Signals: exploiting signals implicit in unlabeled data
Model and Inference: a hierarchical Bayesian model defining the
process of joint generation of semantic, syntactic and lexical
representations
Evaluation:
results on a human-annotated corpus
a question-answering task from the bio-medical domain
Issues? Other ideas?
Reconstruction error minimization (REM) framework for semantic parser
induction
Preliminary experiments [Titov and Khoddam, '14]
Feature-rich models of semantic frames
Consider a frame realization
For simplicity: focus on frame and role labeling (no identification + one frame per sentence)
Feature-rich models of semantic frames [Titov and Khoddam, '14]
Consider a frame realization
How can we define a feature-rich model for
unsupervised induction of roles and frames?
For simplicity: focus on frame and role labeling (no identification + one frame per sentence)
Feature representation x of "The police charged... "
Semantic role prediction (= Encoding): a feature-rich model p(r, f | x, w)
Hidden semantics: Assault(Agent: police, Patient: demonstrator, Instrument: baton)
Argument prediction (= Reconstruction): an "argument prediction" model p(a_i | a_{−i}, r, f, θ), e.g., reconstructing "demonstrator"
Consider a frame realization
Any existing
supervised role labeler
would do
Hypothesis: semantic roles and frames are the latent
representation which helps to reconstruct arguments
[Titov and Khoddam, '14]
Argument reconstruction [Titov and Khoddam, '14]
Consider a frame realization
What do the components look like, and how do we estimate them jointly?
Reconstruction-error minimization
Neural autoencoders [Hinton, '99; Vincent et al., '08]:
Input x ∈ R^m → Encoding → latent representation y ∈ R^p → Reconstruction → reconstructed input x̃ ∈ R^m
trained to minimize the reconstruction error, e.g., ||x − x̃||²
but …
… the idea is applicable not only to neural models
… the reconstruction and encoding components can belong to different model families
… there is no need to reconstruct the entire input
See Ammar et al. (NIPS 2014) and also Daumé (ICML 09)
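A minimal reconstruction-error computation for a linear autoencoder, using plain lists so the sketch stays self-contained; the squared-error loss is one standard choice:

```python
def reconstruction_error(x, W_enc, W_dec):
    """Squared reconstruction error of a linear autoencoder.

    Encoding: y = W_enc x (a p-dimensional latent representation);
    reconstruction: x_tilde = W_dec y.  Training would adjust W_enc and
    W_dec to minimize this error over a dataset.
    """
    y = [sum(w * xi for w, xi in zip(row, x)) for row in W_enc]        # encode
    x_tilde = [sum(w * yi for w, yi in zip(row, y)) for row in W_dec]  # decode
    return sum((xt - xi) ** 2 for xt, xi in zip(x_tilde, x))

I = [[1.0, 0.0], [0.0, 1.0]]
zero = [[0.0, 0.0], [0.0, 0.0]]
perfect = reconstruction_error([1.0, 2.0], I, I)  # 0.0: identity round-trip
bad = reconstruction_error([1.0, 2.0], I, zero)   # 5.0: everything lost
```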
Argument reconstruction [Titov and Khoddam, '14]
Consider a frame realization:
the semantic role prediction (encoding) component p(r, f | x, w) is a (structured) linear model
the argument prediction (reconstruction) component p(a_i | a_{−i}, r, f, θ) is a tensor factorization
Feature representation of "The police charged... " ( )
Semantic role prediction
( = Encoding)
Assault(Agent: police, Patient: demonstrator, Instrument: baton)
demonstrator
Argument prediction
( = Reconstruction)Hidden
p(r , f |x, w)Feature-rich model
"Argument prediction" model
p(ai |a− i , r , f ,✓)
x
Distributed vectors:
- encode semantic properties of argument a
- encode expectations about other argument given that a
is assigned to role r of frame f
The reconstruction model ('softmax'):
Component 1: argument reconstruction
May encode that
demonstrators are
similar to protestors
If Agent of Assault is the
police, then Patient can be
demonstrators or protestors
A role-specific
projection matrix
Feature representation of "The police charged... " ( )
Semantic role prediction
( = Encoding)
Assault(Agent: police, Patient: demonstrator, Instrument: baton)
demonstrator
Argument prediction
( = Reconstruction)Hidden
p(r , f |x, w)Feature-rich model
"Argument prediction" model
p(ai |a− i , r , f ,✓)
x
Component 1: argument reconstruction
10
Parallels to work on relation modeling (e.g., Bordes et al. '11), distributional semantics (e.g., Mikolov et al. '13), and (coupled) tensor factorization (e.g., Yılmaz et al. '11)
Intuitively, score argument tuples according to the factorization:
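The factorization itself was an equation image on the slide. The following is a hedged reconstruction from the surrounding annotations ("distributed vectors", "role-specific projection matrix", softmax); the published parameterization may differ in details such as frame-specific projections:

```latex
p(a_i \mid a_{-i}, r, f, \theta)
  = \frac{\exp\big(\sum_{j \neq i} (C_{r_i} u_{a_i})^\top C_{r_j} u_{a_j}\big)}
         {\sum_{a'} \exp\big(\sum_{j \neq i} (C_{r_i} u_{a'})^\top C_{r_j} u_{a_j}\big)}
```

where $u_a \in \mathbb{R}^d$ is the distributed vector (embedding) of argument $a$ and $C_r$ is the role-specific projection matrix.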
Component 2: frame + role prediction
The role and frame labeling model: it can be any model, as long as the role and frame posteriors can be computed (or approximated)
A feature-rich representation encoding the syntax-semantics interface
The majority of supervised SRL models qualify; we used (Johansson and Nugues, '08)
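The encoding component is a standard feature-rich log-linear labeler; here is a minimal numpy sketch of a softmax role posterior p(r | x, w) (the feature vector and weights are toy placeholders, not the Johansson and Nugues '08 feature set):

```python
import numpy as np

def role_posterior(x, w):
    """Softmax posterior p(r | x, w) over roles for one argument.

    x : feature vector of the syntax-semantics interface, shape (n_features,)
    w : weight matrix, one row of weights per role, shape (n_roles, n_features)
    """
    scores = w @ x
    scores -= scores.max()        # subtract max for numerical stability
    e = np.exp(scores)
    return e / e.sum()

# toy example: 3 roles, 4 binary features
w = np.array([[ 1.0, 0.0, 0.5, 0.0],
              [ 0.0, 2.0, 0.0, 0.0],
              [-1.0, 0.0, 0.0, 1.0]])
x = np.array([1.0, 0.0, 1.0, 0.0])
post = role_posterior(x, w)       # sums to 1; role 0 scores highest here
```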
Joint learning
For every structure, we aim to optimize the expectation of the argument-prediction quality given roles and frames:
This is not very tractable in its exact form; the usual 'tricks' are needed:
'mean field': substituting posterior means instead of marginalizing
'negative sampling' (as, e.g., in Mikolov et al. '13) instead of the 'softmax'
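The objective on this slide was also an equation image; the following is a hedged reconstruction consistent with the surrounding text (expected argument-prediction log-likelihood under the encoder's posterior over roles and frames):

```latex
\max_{w,\,\theta} \;\sum_{\text{structures}} \;
  \mathbb{E}_{p(r, f \mid x, w)}
  \Big[ \sum_i \log p(a_i \mid a_{-i}, r, f, \theta) \Big]
```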
Training can be quite efficient, as all the models are linear (or bilinear)
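The 'negative sampling' trick mentioned above can be sketched as follows: instead of normalizing over the whole argument vocabulary ('softmax'), push up the score of the observed argument and push down the scores of a few sampled negatives (a generic illustration; in the model the scores would come from the bilinear factorization):

```python
import numpy as np

def neg_sampling_loss(score_true, scores_neg):
    """Logistic negative-sampling loss (cf. Mikolov et al. '13):
    -log sigma(s_true) - sum_k log sigma(-s_neg_k)."""
    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))
    return float(-np.log(sigmoid(score_true))
                 - np.sum(np.log(sigmoid(-scores_neg))))

# loss is small when the true argument clearly outscores the negatives
confident = neg_sampling_loss(5.0, np.array([-5.0, -5.0]))
uncertain = neg_sampling_loss(0.0, np.array([0.0, 0.0]))
```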
Outline
69
Task: inducing semantic representations without labeled data
Signals: exploiting signals implicit in unlabeled data
Model and Inference: a hierarchical Bayesian model defining the
process of joint generation of semantic, syntactic and lexical
representations
Evaluation:
results on a human-annotated corpus
a question-answering task from the bio-medical domain
Issues? Other ideas?
Reconstruction error minimization (REM) framework for semantic parser
induction
Preliminary experiments
Feature representation of "The police charged..." (x)
charge(Agent: police, Patient: demonstrator, Instrument: baton)
Semantic role prediction: feature-rich model p(r | x, w)
Argument prediction: "argument prediction" model p(a_i | a_{-i}, r, v, θ)
Hidden argument: demonstrator
Experiments: only role induction
Evaluate on a dataset annotated with roles (PropBank for En, SALSA for De)
Compare against previous models evaluated in this set-up
use clustering evaluation measures (purity, collocation, F1)
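The clustering measures can be made precise with a short sketch (Lang and Lapata-style definitions: purity averages each induced cluster's overlap with its best-matching gold role, collocation averages each gold role's overlap with its best-matching cluster, and F1 is their harmonic mean):

```python
from collections import Counter, defaultdict

def purity_collocation_f1(induced, gold):
    """induced, gold: equal-length label sequences, one per argument instance."""
    n = len(gold)
    by_cluster = defaultdict(Counter)   # induced cluster -> gold role counts
    by_role = defaultdict(Counter)      # gold role -> induced cluster counts
    for c, g in zip(induced, gold):
        by_cluster[c][g] += 1
        by_role[g][c] += 1
    pu = sum(max(cnt.values()) for cnt in by_cluster.values()) / n
    co = sum(max(cnt.values()) for cnt in by_role.values()) / n
    return pu, co, 2 * pu * co / (pu + co)

pu, co, f1 = purity_collocation_f1(
    induced=[0, 0, 1, 1, 1],
    gold=["A0", "A1", "A1", "A1", "A2"])
# pu = 0.6, co = 0.8
```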
No frame induction – use
the predicate itself (v)
We replicate the previous evaluation set-up: the datasets are fairly small (e.g., ~90,000 predicate-argument structures for English)
This may not be the optimal set-up for our expressive model
English (F1) [bar chart; exact scores not recoverable here]
The feature-rich model (ours)
Optimal deterministic mapping from syntactic relations (baseline)
Previous approaches evaluated in the same setting:
Logistic: Lang and Lapata ('10)
GraphP: Lang and Lapata ('11a)
Linking: Fürstenau and Rambow ('12)
Aggl: Lang and Lapata ('11b)
Order: Garg and Henderson ('12)
Aggl+: Lang and Lapata ('14)
Bayes: Titov and Klementiev ('12)
Performs on par with the best methods (without language-specific priors)
Induces fewer roles than most other approaches, but under certain regimes the roles start to capture verb senses
A new framework for inducing frames
allowing us to combine ideas from relation modeling and supervised role labeling
more data, more languages, other factorizations, …
The framework naturally supports:
Semi-supervised learning
In principle, the reconstruction objective can be easily extended with the conditional likelihood objective on labeled data
Learning for inference
Modeling entities as arguments and coupling factorizations across relations
Document-level reasoning and disambiguation
Again, factorizations provide a natural framework here (entity representations are specific to a document)
REM framework: see also Titov and Khoddam '14:
http://arxiv.org/abs/1412.2812
Conclusions
73
We know that rule-based semantic parsing is not the way to go
But (fully) supervised open-domain semantic parsing is not very promising either
What kind of resources can we leverage?
Un-annotated text?
Ontologies?
Linking between the two?
…
What kind of models should we use?
Generative (Bayesian) models?
Feature-rich models?
…