Compositional Lexical Semantics for Natural Language Inference
Thesis Defense: Ellie Pavlick
Department of Computer and Information Science University of Pennsylvania
what is the population of new york city?
“What is Saturday afternoon’s forecast?”
“Will it be sunny this weekend in Miami?”
“What’s the weather going to be like this weekend?”
“Is it going to be nice out on Saturday?”
how many people live in nyc?
new york city population
how big is new york city? how crowded is ny?
number of residents of nyc
Human language is highly variable.
In leaked audio, Clinton talks about Sanders supporters “living in basement”
hacked audio: in hacked fundraiser recording • in leaked recording • in audio from hacked email • privately
mocks • said • insults • characterizes • comments on • gives frank take on • slams • calls • knocks • describes
bernie supporters • millennials • sanders supporters • young voters • bernie sanders supporters • bernie kids • bernie fans
losers who live in their parents' basements • basement dwellers • frustrated basement-dwellers • basement-dwellers & baristas
Hillary • Hillary Clinton • HRC
How do we know when two different expressions in natural language have the same meaning?
How do we know when two similar expressions in natural language have a different meaning?
In leaked audio, Clinton talks about Sanders supporters “living in parents’ basement”
Logical Inference: in leaked recording = in leaked audio; in hacked fundraiser recording ⊂ in leaked recording ⊂ privately
Common Sense Inference: living in parents’ basement = basement-dwellers
Stylistics
Natural Language Inference (aka Recognizing Textual Entailment)
premise: In leaked audio, Clinton talks about Sanders supporters living in basement
hypothesis: Hillary Clinton privately slams millennials as basement-dwellers
p entails h if “typically, a human reading p would infer that h is most likely true.”
The PASCAL Recognising Textual Entailment Challenge. Dagan et al. (2006)
Introduction
Lexical Entailment: Adding Semantics to Data-Driven Paraphrasing. Pavlick et al. ACL (2015)
Semantic Containment:
Modifier-Noun Composition: Compositional Entailment in Adjective Nouns. Pavlick and Callison-Burch. ACL (2016); So-Called Non-Subsective Adjectives. Pavlick and Callison-Burch. *SEM (2016)
Class-Instance Identification: Fine-Grained Class Extraction via Modifier Composition. Pavlick and Pasca. ACL (2017)
Summary and Future Work
[Running example: composer ⊂ artist; American composer ⊂ composer; Charles Mingus ∈ American composer]
Natural Language Inference
premise: In leaked audio, Clinton talks about Sanders supporters living in basement
hypothesis: Hillary Clinton privately slams millennials as basement-dwellers
Equivalence: lives in basement / is a basement-dweller
Forward Entailment: in leaked audio ⇒ privately
Reverse Entailment: slams ⇒ talks about
Independent: Sanders supporters / millennials
Exclusion: at a press conference / privately (premise variant: At a press conference, Clinton talks about Sanders supporters living in basement)
Equivalence x ⟺ y
Reverse Entailment x ⇒ y
Forward Entailment y ⇒ x
Independence x ⇏ y ⋀ y ⇏ x
Exclusion x ⇒ ¬y ⋀ y ⇒ ¬x
[Venn diagrams: cat = feline (equivalence); cat ⊂ animal (entailment); cat ∩ pet overlap (independence); cat and dog disjoint within animal (exclusion)]
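To make the five relations concrete, here is a toy Python sketch (not from the thesis) that models each word's extension as a set of entities and classifies a pair by set comparison; the vocabulary and extensions are invented for illustration.

```python
# Toy sketch: the five entailment relations, modeled over word extensions
# (sets of entities). All extensions below are fabricated examples.

def relation(x: set, y: set) -> str:
    """Classify the pair (x, y) by comparing their extensions."""
    if x == y:
        return "equivalence"         # cat <-> feline
    if x < y:
        return "forward entailment"  # cat => animal
    if x > y:
        return "reverse entailment"  # animal <= cat
    if x & y:
        return "independence"        # cat vs. pet: overlap, neither contains the other
    return "exclusion"               # cat vs. dog: disjoint

cat    = {"eddy", "tom"}
feline = {"eddy", "tom"}
animal = {"eddy", "tom", "rex", "dumbo"}
pet    = {"eddy", "rex", "goldie"}
dog    = {"rex"}

print(relation(cat, feline))  # equivalence
print(relation(cat, animal))  # forward entailment
print(relation(cat, pet))     # independence
print(relation(cat, dog))     # exclusion
```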
Lexical Semantics Resources
WordNet: a hand-built semantic hierarchy. [Fragment: act, communicate, address, harangue, rant, perform, practice, walk through scrimmage, relay, talk about, descant]
WordNet. Fellbaum (1998)
Bilingual Pivoting: aligned parallel text links English phrases through shared foreign translations, e.g.:
“ahogados a la playa …” / “get washed up on beaches …”
“… fünf Landwirte festgenommen , weil …” / “… 5 farmers were thrown into jail in Ireland …”
“… oder wurden gefoltert , festgenommen …” / “… or have been tortured , imprisoned …”
The shared translation festgenommen links “thrown into jail” ≈ “imprisoned”.
Paraphrasing with bilingual parallel corpora. Bannard and Callison-Burch (2005)
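A minimal sketch of the pivoting computation, assuming the standard formulation p(e2|e1) ≈ Σ_f p(e2|f)·p(f|e1) from Bannard and Callison-Burch (2005); the tiny phrase table below is fabricated for illustration.

```python
from collections import defaultdict

# Bilingual pivoting sketch: two English phrases are scored as paraphrases
# to the extent they share foreign translations. Aligned pairs are made up.
aligned_pairs = [
    ("thrown into jail", "festgenommen"),
    ("imprisoned", "festgenommen"),
    ("imprisoned", "inhaftiert"),
    ("arrested", "festgenommen"),
]

e_given_f = defaultdict(lambda: defaultdict(int))  # counts of English given foreign
f_given_e = defaultdict(lambda: defaultdict(int))  # counts of foreign given English
for e, f in aligned_pairs:
    e_given_f[f][e] += 1
    f_given_e[e][f] += 1

def paraphrase_prob(e1: str, e2: str) -> float:
    """p(e2 | e1) ~= sum over foreign pivots f of p(e2 | f) * p(f | e1)."""
    total = sum(f_given_e[e1].values())
    score = 0.0
    for f, count in f_given_e[e1].items():
        p_f_given_e1 = count / total
        p_e2_given_f = e_given_f[f][e2] / sum(e_given_f[f].values())
        score += p_e2_given_f * p_f_given_e1
    return score

print(paraphrase_prob("thrown into jail", "imprisoned"))  # shares "festgenommen"
```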
Lexical Semantics Resources, and the signal each provides:
WordNet: x ⇒ y ⋀ y ⇏ x
Bilingual Pivoting: x shares some translation with y
Vector Space Models: x appears in similar contexts as y
Lexical Semantics Resources
Example: automatically extracted paraphrases for “talk about” (big but noisy):
talk about ≈ speak • discuss • tell • say • mention • about • mean • spoken • address • said • discussion • refer • talking • raise • alone • debate • chat • told • argued • feel • maintain • comment • make • add • approach • cause • deliberations • ask • added • tackle • bet • betcha • treat • communicate • described • know • to • stated • deal • topic • subject • express • see • highlight • consider • question • touch • sound • noted • nurture • explain • job • issue • relate • sustain • insert • causing • confront • time • covered • put • will • cite • advocate • indicate • please • regard • hear • kidding • read • dispute • give • say nothing • say nothing of • is done • doesn’t say • don’t speak
WordNet is precise but small; data-driven models are big but noisy.
Can we build lexical entailment resources automatically and at scale, while maintaining WordNet-level precision and interpretability?
The Paraphrase Database
PPDB: The Paraphrase Database. Ganitkevitch et al. (2013)
Goal: label each paraphrase pair as Equivalence • Entailment • Exclusion • Independent
Distributional Signals of Semantics
Monolingual Contextual Similarities: Lin and Pantel, 2001 (Alberta); Mikolov et al., 2013 (Google); Pennington et al., 2014 (Stanford)
“…converted from classical work to abstract expressionism after hearing Russian composer Igor Stravinsky's “Rite of Spring”…”
“…South African contemporary artist, with abstract expressionism work featuring key aesthetics of the most sought after artists…”
Strengths: separates related from unrelated pairs (dad/father vs. dad/lychee). Weaknesses: cannot separate equivalence from mere relatedness (dad/father vs. dad/mom).
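A small sketch of the contextual-similarity signal: words are compared by the cosine of their context (embedding) vectors. The 4-dimensional vectors below are made up for illustration; real models use hundreds of dimensions.

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity between two context vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# Fabricated low-dimensional context vectors.
dad    = np.array([0.9, 0.8, 0.1, 0.0])
father = np.array([0.8, 0.9, 0.2, 0.0])
mom    = np.array([0.9, 0.7, 0.2, 0.1])
lychee = np.array([0.0, 0.1, 0.9, 0.8])

print(cosine(dad, father))  # high: the signal catches related pairs
print(cosine(dad, lychee))  # low: unrelated words are far apart
print(cosine(dad, mom))     # also high: the signal cannot tell
                            # "similar" apart from "equivalent"
```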
Bilingual Translational Similarity: Bannard and Callison-Burch, 2005 (Edinburgh); Kok and Brockett, 2010 (MSR); Ganitkevitch et al., 2013 (Hopkins)
“…the directive include the extension to the period of protection for composers…” / “…la directive comprennent la prolongation de la durée de protection pour les artistes…”
“…to favour the position of artists who have to travel throughout the community…” / “…favoriser la position des artistes qui doivent voyager à travers la communauté…”
Strengths: separates dad/father from dad/mom (they translate differently). Weaknesses: misses pairs like dad/parent that rarely share translations (dad/parent vs. dad/lychee).
Lexico-Syntactic Patterns: Hearst, 1992 (Berkeley); Snow et al., 2006 (Stanford); Movshovitz-Attias and Cohen, 2015 (CMU)
“How do composers and other artists survive and work in today's musical theatre scene?”
“As Luciano Berio did in his “Recital for Cathy”, creative artists such as composers, theatre directors, choreographers, video artists or even circus …”
Strengths: detects hypernymy (dad/parent vs. dad/lychee). Weaknesses: sparse; pairs like dad/father rarely co-occur in such patterns.
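A rough sketch of pattern-based extraction in the spirit of Hearst (1992); the two regex templates and the sentences are simplified stand-ins for the full pattern inventory.

```python
import re

# Two classic Hearst-style templates: "Y such as X" and "X and other Y"
# both signal that X is an instance/hyponym of class Y.
PATTERNS = [
    re.compile(r"(?P<cls>\w+) such as (?P<inst>\w+)"),
    re.compile(r"(?P<inst>\w+) and other (?P<cls>\w+)"),
]

sentences = [
    "How do composers and other artists survive in musical theatre?",
    "creative artists such as composers, theatre directors, choreographers",
]

for s in sentences:
    for pat in PATTERNS:
        for m in pat.finditer(s):
            print(f"{m.group('inst')} IS-A {m.group('cls')}")
# prints "composers IS-A artists" once per matching pattern
```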
Signal | Strengths | Weaknesses
Contextual Similarities | dad/father vs. dad/lychee | dad/father vs. dad/mom
Bilingual Translations | dad/father vs. dad/mom | dad/parent vs. dad/lychee
Lexico-Syntactic Patterns | dad/parent vs. dad/lychee | dad/father vs. dad/lychee
Logistic Regression
features x = [Bilingual Translations; Contextual Similarities; Lexico-Syntactic Patterns], weights w1, w2, w3
P(y | x) = 1 / (1 + e^{-w·x})
output: [P(equivalent), P(entailment), P(exclusion), P(independent)]
Predict a probability distribution over entailment relations, based on all of the data-driven signals available.
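A hedged sketch of such a classifier, assuming scikit-learn's LogisticRegression and a fabricated three-feature representation (translation overlap, context cosine, pattern count); the training pairs and labels below are invented for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# One feature vector per paraphrase pair, concatenating the three
# distributional signals; labels are the entailment relations.
X = np.array([
    # [translation overlap, context cosine, hearst-pattern count]
    [0.9, 0.95, 0],   # e.g. "talk about" / "discuss"
    [0.1, 0.85, 3],   # e.g. "cat" / "animal"
    [0.0, 0.80, 0],   # e.g. "dad" / "mom"
    [0.0, 0.05, 0],   # e.g. "dad" / "lychee"
])
y = ["equivalent", "entailment", "exclusion", "independent"]

clf = LogisticRegression(max_iter=1000).fit(X, y)

# The model outputs a probability distribution over relations for a new pair.
probs = clf.predict_proba([[0.8, 0.9, 0]])[0]
for label, p in zip(clf.classes_, probs):
    print(f"P({label}) = {p:.2f}")
```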
Can we build a resource like WordNet automatically, at scale, and without loss of precision?
Improving End-to-End RTE
p entails h if typically, a human reading p would infer that h is most likely true.
p = “A man is having a conversation.” h = “Some women are talking.” → No
Logical forms:
A man is having a conversation: ∃x1,x2,x3 (man(x1) ⋀ have(x2) ⋀ conversation(x3) ⋀ agent(x2,x1) ⋀ patient(x2,x3))
Some women are talking: ∃x1,x2 (talk(x1) ⋀ woman(x2) ⋀ agent(x1,x2))
Knowledge axioms:
∀x (man(x) ⇒ ¬woman(x))
∀x,h,c,t (have(h) ⋀ conversation(c) ⋀ talk(t) ⋀ agent(h,x) ⇒ agent(t,x))
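As an illustration of the proof-based setup (not the exact system used in the thesis), the sketch below feeds hand-written logical forms and the two axioms to NLTK's resolution prover; since nothing makes woman(x) derivable from the premise, the hypothesis is not provable and the system answers No.

```python
from nltk.sem import Expression
from nltk.inference import ResolutionProver

read = Expression.fromstring

# Hand-written logical forms for premise and hypothesis.
premise = read(r"exists x1 x2 x3.(man(x1) & have(x2) & conversation(x3)"
               r" & agent(x2,x1) & patient(x2,x3))")
hypothesis = read(r"exists x1 x2.(talk(x1) & woman(x2) & agent(x1,x2))")

# Lexical knowledge axioms, mirroring the slide.
axioms = [
    read(r"all x.(man(x) -> -woman(x))"),
    read(r"all x h c t.((have(h) & conversation(c) & talk(t) & agent(h,x))"
         r" -> agent(t,x))"),
]

# False: no proof exists, i.e. the hypothesis is not entailed.
print(ResolutionProver().prove(hypothesis, [premise] + axioms))
```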
Improving End-to-End RTE: Performance (F1 Score)
No Axioms: 0.49
Using WordNet: 0.61
Using PPDB: 0.66
Human Oracle: 0.66
Semantic Containment: Modifier-Noun Composition
Compositional Entailment in Adjective Nouns. Pavlick and Callison-Burch. ACL (2016); So-Called Non-Subsective Adjectives. Pavlick and Callison-Burch. *SEM (2016)
Non-Compositional Semantics
[Nested classes: 1950s American jazz composer ⊂ American composer ⊂ composer ⊂ artist]
⟦modifier1 modifier2 … modifierk noun⟧
With N nouns and M modifiers, there are O(N·M^k) distinct class labels; for a two-modifier phrase like “American jazz composer”, that is already ~270,000,000,000,000 classes (see the back-of-the-envelope sketch below).
Problem #1: scalability
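A back-of-the-envelope check on the scalability claim; the vocabulary sizes here are assumptions chosen to reproduce the slide's figure, not numbers from the thesis.

```python
# With N head nouns and M modifiers, there are on the order of N * M**k
# class labels with k modifiers. Vocabulary sizes below are assumed.
N = 65_000   # head nouns (assumption)
M = 65_000   # modifiers (assumption)
k = 2        # "American jazz composer" = 2 modifiers + 1 noun

print(f"{N * M**k:.2e}")  # ~2.75e14, matching the slide's ~270 trillion
```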
Non-Compositional Semantics
Most of these labels are never observed: “composer” is frequent, but the full phrase “1950s American jazz composer” may never appear in any corpus.
Problem #2: sparsity
Non-Compositional Semantics
If each phrase is atomic, knowing American composer ⊂ composer tells us nothing about American actor ⊂ actor, American author ⊂ author, or American singer ⊂ singer.
Problem #3: generalizability
Compositional Semantics
American composer = American ∩ composer
Semantic Containment: what does “American” contribute to “American composer”?
Class-Instance Identification: which entities fall in the resulting class?
Classes of Modifiers
Subsective (MH ⇒ H): American composer ⇒ composer
Plain Non-Subsective (MH ⇏ H): alleged criminal ⇏ criminal
Privative (MH ⇒ ¬H): fake gun ⇒ not a gun
Equivalence (MH ⟺ H): It is her favorite book in the entire world.
Reverse Entailment (MH ⇒ H ⋀ H ⇏ MH): She is an American composer.
Forward Entailment (MH ⇏ H ⋀ H ⇒ MH): She is the president’s potential successor.
Independence (MH ⇏ H ⋀ H ⇏ MH): She is the alleged hacker.
Exclusion (MH ⇒ ¬H ⋀ H ⇒ ¬MH): She is a former senator.
Natural Language Inference
Eddy is a domestic cat. / Eddy is a cat. (domestic cat ⊂ cat)
In strict logic, the unmodified sentence does not entail the modified one:
Eddy is a cat sitting on the ground looking out through a clear door screen. ⇏ Eddy is a domestic cat sitting on the ground looking out through a clear door screen.
But p entails h if typically, a human reading p would infer that h is most likely true, and under this definition annotators judge:
Eddy is a cat sitting on the ground looking out through a clear door screen. ⇒ Eddy is a domestic cat sitting on the ground looking out through a clear door screen.
What types of inference rules govern human inferences in practice?
What, if any, generalizations can be made to aid systems in performing natural language inference?
Human Annotation of MH Compositions
For each modifier-head pair, annotators answer two questions, e.g. for Eddy is a (domestic) cat:
MH ⇒ H? and H ⇒ MH?
MH ⇒ H | H ⇒ MH | Relation | Example
Yes | Yes | Equivalence | It is her favorite book in the entire world.
Yes | Unk | Reverse Entailment | Eddy is a gray cat.
Unk | Yes | Forward Entailment | She is the president’s potential successor.
Unk | Unk | Independence | She is the alleged hacker.
No | No | Exclusion | She is a former senator.
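The two-question scheme determines the relation mechanically; a small sketch whose mapping and examples mirror the table above (judgment strings and the fallback label are illustrative):

```python
# Map the pair of human judgments ("does MH imply H?", "does H imply MH?"),
# each Yes/Unk/No, to an entailment relation.
RELATIONS = {
    ("yes", "yes"): "equivalence",         # her favorite book (in the entire world)
    ("yes", "unk"): "reverse entailment",  # gray cat => cat
    ("unk", "yes"): "forward entailment",  # potential successor <= successor
    ("unk", "unk"): "independence",        # alleged hacker
    ("no",  "no"):  "exclusion",           # former senator
}

def relation(mh_implies_h: str, h_implies_mh: str) -> str:
    return RELATIONS.get((mh_implies_h, h_implies_mh), "undefined")

print(relation("yes", "unk"))  # reverse entailment
print(relation("yes", "no"))   # undefined: a combination the theory doesn't name
```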
[Pie chart, news text: Equivalence 62% • Reverse Entailment 23% • Independence, Forward Entailment, Exclusion/Undefined 7% / 7% / 1%]
Reverse Entailment corresponds to normal subsective modifiers; the dominant Equivalence share means annotators judged that the noun entails the modifier; Exclusion would mean the noun contradicts the modifier.
[Relation distributions by genre, Equivalence / Reverse Entailment / remainder:
News: 62% / 23% / 7%, 7%, 1%
Images: 87% / 7% / 4%, 2%
Literature: 66% / 26% / 5%, 2%
Debate Forums: 53% / 31% / 9%, 6%, 1%]
H ⇒ MH? Annotators frequently answered yes, e.g.:
The deadly attack killed at least 12 civilians. The new series will premiere in January. A woman rides a bike on an outdoor trail through a field.
I simply love the actual experience of being one with the ocean and the life in it. The entire bill is now subject to approval by the parliament.
Greenberg also was put under investigation for his crucial role at the company.
Entities are assumed to be real and relevant.
Entities are assumed to be prototypical.
Empirical Analysis
H ⇒ ¬MH?
In theory, a privative pair like gun / fake gun satisfies both H ⇒ ¬MH and MH ⇒ ¬H.
Undefined Relations: in practice, some pairs satisfy MH ⇒ H (like subsective modifiers) together with H ⇒ ¬MH (like privative modifiers), a combination the five-relation inventory does not name.
MH ⇒ H | H ⇒ MH | Relation | Example
Yes | Yes | Equivalence | It is her favorite book in the entire world.
Yes | Unk | Reverse Entailment | Eddy is a gray cat.
Unk | Yes | Forward Entailment | She is the president’s potential successor.
Unk | Unk | Independence | She is the alleged hacker.
No | No | Exclusion | She is a former senator.
Yes | No | Undefined | ????? (MH ⇒ H together with H ⇒ ¬MH)
Undefined Relations
MH ⇒ H: Bush travels Monday to Michigan to remark on the Japanese economy. ⇒ Bush travels Monday to Michigan to remark on the economy.
H ⇒ ¬MH: read in context, “the economy” picks out the U.S. economy, so the unmodified sentence contradicts the modified one.
Classes of Modifiers Revisited
Subsective (MH ⇒ H): American composer / composer. Plain Non-Subsective (MH ⇏ H): alleged criminal / criminal. Privative (MH ⇒ ¬H): fake gun / gun.
[Expected, if the theoretical classes were predictive: Subsective → Equivalence and Reverse Entailment (50% / 50%); Plain Non-Subsective → 100% Independence; Privative → 100% Exclusion.]
[Observed distributions over Equivalence / Reverse Entailment / Independence / Forward Entailment / Exclusion / Undefined: Subsective: 54%, 19%, 14%, 7%, 5%; Plain Non-Subsective: 67%, 28%, 4%, 1%; Privative: 37%, 28%, 16%, 16%, 3%, 1%.]
Privative Modifiers
H ⇒ ¬MH? Wilson signed off to pay the debts to the company. / Wilson signed off to pay the debts to the fictitious company.
MH ⇒ H? Wilson signed off to pay the debts to the fictitious company. ⇒ Wilson signed off to pay the debts to the company.
Generalizations based on the class of the modifier lead to incorrect predictions more often than not.
Modern Inference Systems
p entails h if typically, a human reading p would infer that h is most likely true.
p = “The crowd roared.” h = “The enthusiastic crowd roared.” → annotators answer Yes.
[Bar chart: accuracy of RTE systems; all fall in a narrow band between 85.3 and 87.3 (values 86.8, 86.6, 87.3, 86.6, 85.3, 86, 85.3, 85.3). Systems: Random Guessing • Transformation-based, Stern and Dagan (2012) • Bag of Words • Logistic Regression, Magnini et al. (2014) • Bag of Vectors • RNN • LSTM • LSTM + Transfer, Bowman et al. (2015). They span partially proof-based, supervised learning, and deep learning approaches.]
Correct representation is difficult to capture explicitly and is currently not being learned implicitly.
Discussion
“The [enthusiastic] crowd roared.” can be viewed through several lenses:
Set Containment: enthusiastic crowd ⊂ crowd
Language Modeling: The ___ roared. → P(enthusiastic), P(silent), P(imaginary)
Word Sense Disambiguation: enthusiastic crowd vs. silent crowd vs. imaginary crowd
Reference: crowd → real • human • making noise; enthusiastic crowd → real • human • making noise • excited/happy • clapping • yelling
Two problems remain: assigning intrinsic meaning to modifiers, and determining whether they hold for individual entities.
Semantic Containment: Class-Instance Identification
Fine-Grained Class Extraction via Modifier Composition. Pavlick and Pasca. ACL (2017)
Compositional Semantics
American composer = American ∩ composer
Can we assign intrinsic meaning to modifiers, in such a way that we can determine whether the modifier holds for individual entities in practice?
Step 1: Modifier Interpretation. Determine the properties entailed by the modifier in the context of the head:
American jazz composer → born in America • influential in America • prolific while in America • a product of America • lived in America • visited America • popular in America
Step 2: Class-Instance Identification. Determine, for a specific instance, whether the necessary properties hold:
“Mingus's intricate, complex, compositions in the genres of jazz and classical music illustrate his ability to be dynamic in both the strings and the swing. Mingus truly was a product of America in all its historic complexities. His mother, Harriet, was half black and half Chinese, and his father, Charles Sr., was half black and half Swedish, making Mingus a true reflection of the hybrid nature of our divided nation…”
Modifier Interpretation
American composer → composer * America, where * is an unknown relation
Candidate property paraphrases, with corpus counts: ⟨composer from America, 3702⟩ ⟨composer born in America, 1389⟩ ⟨composer popular in America, 1292⟩ ⟨composer active in America, 2041⟩
A logistic regression, P(Y|X) = 1 / (1 + e^{Xβ}), rescores and ranks the candidates:
⟨composer born in America, 0.94⟩ ⟨composer from America, 0.93⟩ ⟨composer active in America, 0.52⟩ ⟨composer popular in America, 0.45⟩
Modifier Interpretation produces good results…
American composer → born in America; American company → based in America; American novel → written in America
…but not perfect:
child actor → has child; risk manager → takes risks; machine gun → used by machine
Class-Instance Identification
Weighted modifier interpretations for American composer: ⟨___ born in America, 0.94⟩ ⟨___ from America, 0.93⟩ ⟨___ active in America, 0.52⟩ ⟨___ popular in America, 0.45⟩
Candidate instances (* is a composer): J.S. Bach, Charles Mingus, John Cage, W.A. Mozart
For each candidate, count how often each interpretation is attested in text (“J.S. Bach born in America”, “J.S. Bach from America”, …) and take the weighted sum:
Confidence = 0.94 × 21 + 0.93 × 34 + 0.52 × 329 + 0.45 × 4,043
Class-Instance Identification
Instance | American composer | jazz composer
JS Bach | 0.21 | 0.04
Charles Mingus | 0.89 | 0.93
John Cage | 0.96 | 0.52
WA Mozart | 0.19 | 0.13
Libby Larsen | 0.72 | 0.24
Duke Ellington | 0.76 | 0.97
Palestrina | 0.04 | 0.03
Ludwig van Beethoven | 0.09 | 0.12
Morton Feldman | 0.88 | 0.31
Frederick Chopin | 0.33 | 0.32
Barack Obama | 0.14 | 0.35
Herbie Hancock | 0.62 | 0.95
American jazz composer (sum of the two columns), ranked: Charles Mingus 1.82 • Duke Ellington 1.73 • Herbie Hancock 1.57 • John Cage 1.48 • Morton Feldman 1.19 • Libby Larsen 0.96 • Frederick Chopin 0.65 • Barack Obama 0.49 • WA Mozart 0.32 • JS Bach 0.25 • Ludwig van Beethoven 0.21 • Palestrina 0.07
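A sketch of the scoring scheme implied by the slides: each interpretation is a property phrase with a weight, and an instance's confidence for the class is the weighted sum of how often each property is attested with that instance. The weights mirror the example above; the instance-property co-occurrence counts are fabricated.

```python
# Weighted interpretations for the class "American composer" (from the slides).
interpretations = {
    "born in America":    0.94,
    "from America":       0.93,
    "active in America":  0.52,
    "popular in America": 0.45,
}

# counts[instance][property] = times "<instance> <property>" is seen in text
# (fabricated numbers for illustration).
counts = {
    "Charles Mingus": {"born in America": 12, "from America": 30,
                       "active in America": 110, "popular in America": 800},
    "J.S. Bach":      {"born in America": 0, "from America": 1,
                       "active in America": 3, "popular in America": 40},
}

def confidence(instance: str) -> float:
    """Weighted sum of attested interpretations for one instance."""
    obs = counts.get(instance, {})
    return sum(w * obs.get(prop, 0) for prop, w in interpretations.items())

# Multi-modifier classes ("American jazz composer") are scored by summing
# per-modifier confidences, one set of interpretations per modifier.
for inst in counts:
    print(inst, round(confidence(inst), 2))
```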
Reconstructing Wikipedia
10
900
901
902
903
904
905
906
907
908
909
910
911
912
913
914
915
916
917
918
919
920
921
922
923
924
925
926
927
928
929
930
931
932
933
934
935
936
937
938
939
940
941
942
943
944
945
946
947
948
949
950
951
952
953
954
955
956
957
958
959
960
961
962
963
964
965
966
967
968
969
970
971
972
973
974
975
976
977
978
979
980
981
982
983
984
985
986
987
988
989
990
991
992
993
994
995
996
997
998
999
EACL 2017 Submission ***. Confidential review copy. DO NOT DISTRIBUTE.
(a) Uniform random sample. (b) Weighted random sample.
Figure 3: ROC curves for selected methods. Given a list of instances associated with confidence scores,ROC curves show the relationship between the number of true positives and the number of false positivesthat are retained by setting various threshold confidence values. The curve becomes linear once allremaining instances in the list have the same score (e.g., 0) as this makes it impossible to choose athreshold which adds true positives to the list without also including all remaining false positives.
sentences. Thus, it can provide non-zero scoresfor many more candidate instances. This enablesthe proposed methods to achieve a better trade-o↵ between extracting true positives versus falsepositives, than the baseline models do.
Uniform WeightedAUC Rec. AUC Recall
Baseline 0.55 0.23 0.53 0.28Hearst 0.56 0.03 0.52 0.02Hearst\ 0.57 0.04 0.53 0.02ModsH 0.68 0.08 0.60 0.06ModsI 0.71 0.09 0.65 0.09Hearst\+ModsH 0.70 0.09 0.61 0.08Hearst\+ModsI 0.73 0.10 0.66 0.10
Table 7: Recall of instances listed on Wikipediacategory pages. “Rec” is the recall against the en-tire set of instances appearing on the Wikipediapages. AUC captures the tradeo↵ between truepositives and false positives (see Figure 3).
7 Conclusion
We have presented an approach to IsA extractionwhich takes advantage of the compositionality ofnatural language. Existing approaches often treatclass labels as atomic units which must be observedin full in order to be populated with instances. Asa result, current methods are not able to handlethe infinite number of classes describable in natu-ral language, most of which never appear in text.Our method works by reasoning about each modi-fier in the label individually, in terms of the prop-erties that it implies about the instances. Thisapproach allows us to harness information that isspread across multiple sentences, and results in asignificant increase in the number of fine-grainedclasses which we are able to populate.
TODO: Add two or three lines of future work.
TODO: Break some of the longer sentences containing “which” or “that”.
Reconstructing Wikipedia
Best Proposed Compositional Method: AUC = 0.73
Best Existing Non-Compositional Method (Lexico-Syntactic Patterns): AUC = 0.57
Introduction
Lexical Entailment
    Adding Semantics to Data-Driven Paraphrasing. Pavlick et al. ACL (2015)
Modifier-Noun Composition
    Compositional Entailment in Adjective-Nouns. Pavlick and Callison-Burch. ACL (2016)
    So-Called Non-Subsective Adjectives. Pavlick and Callison-Burch. *SEM (2016)
Class-Instance Identification
    Fine-Grained Class Extraction via Modifier Composition. Pavlick and Pasca. ACL (2017)
    American composer → Charles Mingus
Semantic Containment
Summary and Future Work
Equivalence • Forward Entailment • Reverse Entailment • Independent • Exclusion
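For reference, the five relations on this slide written as a small Python enum; the example pairs in the comments are standard illustrations, not drawn from the thesis data.

# The five basic lexical entailment relations named above.
from enum import Enum

class Relation(Enum):
    EQUIVALENCE = "="          # couch = sofa
    FORWARD_ENTAILMENT = "⊏"   # crow ⊏ bird
    REVERSE_ENTAILMENT = "⊐"   # bird ⊐ crow
    INDEPENDENT = "#"          # hungry # hippo
    EXCLUSION = "|"            # cat | dog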
[Bar chart: inference accuracy with lexical entailment axioms from different sources, on a 0.0–0.7 scale. With no axioms, accuracy is 0.49; adding axioms from PPDB or WordNet raises it to 0.61–0.66, on par with a human oracle at 0.66.]
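A minimal sketch of how entailment axioms can be read off WordNet, one of the axiom sources compared in the chart above. It assumes nltk with the WordNet corpus downloaded (nltk.download("wordnet")); the function name is hypothetical.

# Sketch: treat hypernymy in WordNet as forward lexical entailment.
from nltk.corpus import wordnet as wn

def forward_entails(w1, w2):
    """True if some sense of w1 has some sense of w2 among its hypernyms."""
    targets = set(wn.synsets(w2))
    for sense in wn.synsets(w1):
        if targets & set(sense.closure(lambda s: s.hypernyms())):
            return True
    return False

print(forward_entails("crow", "bird"))  # True:  crow ⊏ bird
print(forward_entails("bird", "crow"))  # False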
[Pie charts: distribution of entailment relations by adjective class — Subsective: 67%, 28%, 4%, 1%; Plain Non-Subsective: 54%, 19%, 14%, 7%, 5%; Privative: 37%, 28%, 16%, 16%, 3%, 1%.]
[Bar chart on a 20–100 scale comparing entailment systems: Random Guessing; Transformation-based (Stern and Dagan, 2012); Bag-of-Words Logistic Regression; Magnini et al. (2014); Bag of Vectors; RNN; LSTM; LSTM + Transfer (Bowman et al., 2015). The plotted scores all fall between 85.3 and 87.3.]
[Diagram: the class “American composers” contained within the class “composers”.]
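The containment pictured here follows from treating a subsective modifier as set intersection, as in the tiny sketch below (sets invented for illustration); the privative adjectives in the pie charts above are precisely the cases where this assumption fails.

# Sketch: subsective modification as set intersection, so the modified
# class is always contained in the unmodified one.
composers = {"Charles Mingus", "Duke Ellington", "JS Bach", "WA Mozart"}
american = {"Charles Mingus", "Duke Ellington", "Barack Obama"}

american_composers = composers & american
assert american_composers <= composers   # American composers ⊂ composers
print(sorted(american_composers))        # ['Charles Mingus', 'Duke Ellington']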
Future Directions
The crowd roared.
enthusiastic crowd
real • human • making noise • excited/happy
excited/happy • making noise
clapping • yelling • human
Future Directions
The red circle.
Future Directions: “common sense knowledge”
What is it? World knowledge? Pragmatics?
How do we represent it? Distributional? Symbolic? Triple stores? Probability distributions?
How is it learned? Is it distributional? Is text enough?
When/how is it accessed? What can be precomputed? What happens at “runtime”?
Thank you! Questions!