
Compositional Lexical Semantics for Natural Language Inference

Thesis Defense Ellie Pavlick

Department of Computer and Information Science University of Pennsylvania

what is the population of new york city?

“What is Saturday afternoon’s forecast?”

“Will it be sunny this weekend in Miami?”

“What’s the weather going to be like this

weekend?”

how many people live in nyc?

new york city population

how big is new york city? how crowded is ny?

number of residents of nyc

“Is it going to be nice out on Saturday?”

Human language is highly variable.

In leaked audio, Clinton talks about Sanders supporters “living in basement”

in hacked fundraiser recording • in leaked recording • in audio from hacked email • privately • hacked audio:

mocks • said • insults • characterizes • comments on • gives frank take on • slams • calls • knocks • describes

bernie supporters • millennials • sanders supporters • young voters • bernie sanders supporters • bernie kids • bernie fans

losers who live in their parents' basements • basement dwellers • frustrated basement-dwellers • basement-dwellers & baristas

Hillary • Hillary Clinton • HRC


How do we know when two different expressions in natural language have the same meaning?

How do we know when two similar expressions in natural language have a different meaning?

Logical Inference

In leaked audio, Clinton talks about Sanders supporters “living in basement”

in hacked fundraiser recording ⊂ in leaked recording = in leaked audio ⊂ privately

Common Sense Inference

“living in parents’ basement” = “basement-dwellers”

Stylistics

In leaked audio, Clinton talks about Sanders supporters “living in parents’ basement”

Natural Language Inference
(aka Recognizing Textual Entailment)

premise: In leaked audio, Clinton talks about Sanders supporters living in basement

hypothesis: Hillary Clinton privately slams millennials as basement-dwellers

p entails h if “typically, a human reading p would infer that h is most likely true.”

The PASCAL Recognising Textual Entailment Challenge. Dagan et al. (2006)

Introduction · Lexical Entailment · Semantic Containment · Class-Instance Identification (Modifier-Noun Composition) · Summary and Future Work

Adding Semantics to Data-Driven Paraphrasing. Pavlick et al. ACL (2015)
Compositional Entailment in Adjective-Nouns. Pavlick and Callison-Burch. ACL (2016)
So-Called Non-Subsective Adjectives. Pavlick and Callison-Burch. *SEM (2016)
Fine-Grained Class Extraction via Modifier Composition. Pavlick and Pasca. ACL (2017)

Modifier-Noun Composition

artist ⊃ composer ⊃ American composer ∋ Charles Mingus

Natural Language Inference

premise: In leaked audio, Clinton talks about Sanders supporters living in basement
hypothesis: Hillary Clinton privately slams millennials as basement-dwellers

Word- and phrase-level relations between premise and hypothesis:

Equivalence: lives in basement / is a basement-dweller
Forward Entailment: in leaked audio / privately
Reverse Entailment: talks about / slams
Independence: Sanders supporters / millennials
Exclusion: at a press conference / privately (with the premise changed to “At a press conference, Clinton talks about Sanders supporters living in basement”)

Equivalence: x ⟺ y
Forward Entailment: x ⇒ y
Reverse Entailment: y ⇒ x
Independence: x ⇏ y ⋀ y ⇏ x
Exclusion: x ⇒ ¬y ⋀ y ⇒ ¬x
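The five relations above can be made concrete by modeling each word as the set of things it denotes. This is a hypothetical sketch (not from the talk): the relation between two words falls out of how their extensions overlap.

```python
# Hypothetical sketch: the five basic entailment relations, modeled
# extensionally over toy sets of entities (the sets are invented).

def relation(x, y):
    """Classify the pair of extensions (x, y) into one of five relations."""
    if x == y:
        return "equivalence"         # x <=> y
    if x < y:                        # proper subset
        return "forward entailment"  # x => y
    if x > y:                        # proper superset
        return "reverse entailment"  # y => x
    if x.isdisjoint(y):
        return "exclusion"           # x => not y
    return "independence"            # neither entails the other

animal = {"cat1", "cat2", "dog1", "fish1"}
cat, feline = {"cat1", "cat2"}, {"cat1", "cat2"}
dog = {"dog1"}
pet = {"cat1", "dog1"}               # overlaps cats without containing them

print(relation(cat, feline))   # equivalence
print(relation(cat, animal))   # forward entailment
print(relation(animal, cat))   # reverse entailment
print(relation(cat, pet))      # independence
print(relation(cat, dog))      # exclusion
```

The toy sets mirror the cat/feline/animal/pet/dog examples used on the slides.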

Equivalence: cat / feline
Forward Entailment: cat / animal
Reverse Entailment: animal / cat
Independence: cat / pet
Exclusion: cat / dog

Lexical Semantics Resources

WordNet (verb hierarchy): act, communicate, address, harangue, rant, perform, practice, walk through, scrimmage, relay, talk about, descant

WordNet. Fellbaum (1998)


Bilingual Pivoting

…ahogados a la playa… / …get washed up on beaches…
…fünf Landwirte, weil… / …5 farmers were in Ireland…
…oder wurden gefoltert… / …or have been tortured…
…festgenommen… / …thrown into jail…
…festgenommen… / …imprisoned…


Paraphrasing with bilingual parallel corpora. Bannard and Callison-Burch (2005)
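The pivoting idea in Bannard and Callison-Burch (2005) scores two English phrases as paraphrases when they share foreign translations: p(e2 | e1) = Σ_f p(f | e1) · p(e2 | f). A minimal numeric sketch, with invented alignment counts (the German pivot "festgenommen" echoes the example above):

```python
# Hypothetical sketch of bilingual pivoting: two English phrases are
# paraphrase candidates if they share foreign translations.
# The alignment counts below are invented for illustration.

# toy phrase-aligned counts: (english, foreign) -> count
counts = {
    ("thrown into jail", "festgenommen"): 4,
    ("imprisoned", "festgenommen"): 6,
    ("imprisoned", "inhaftiert"): 3,
    ("arrested", "festgenommen"): 8,
}

def p_f_given_e(f, e):
    total = sum(c for (e2, _), c in counts.items() if e2 == e)
    return counts.get((e, f), 0) / total

def p_e_given_f(e, f):
    total = sum(c for (_, f2), c in counts.items() if f2 == f)
    return counts.get((e, f), 0) / total

def paraphrase_prob(e1, e2):
    """p(e2 | e1) = sum over foreign pivots f of p(f | e1) * p(e2 | f)."""
    pivots = {f for (_, f) in counts}
    return sum(p_f_given_e(f, e1) * p_e_given_f(e2, f) for f in pivots)

print(round(paraphrase_prob("thrown into jail", "imprisoned"), 3))  # 0.333
```

All of "thrown into jail"'s mass goes through "festgenommen", and a third of "festgenommen"'s mass comes back out as "imprisoned".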

Vector Space Models

WordNet: x ⇒ y ⋀ y ⇏ x
Bilingual Pivoting: x shares some translation with y
Vector Space Models: x appears in similar contexts as y

Data-driven paraphrases of “talk about”:

talk about ≈ speak · discuss · tell · say · mention · about · mean · spoken · address · said · discussion · refer · talking · raise · alone · debate · chat · told · argued · feel · maintain · comment · make · add · approach · cause · deliberations · ask · added · tackle · bet · betcha · treat · communicate · described · know · to · stated · deal · topic · subject · express · see · highlight · consider · question · touch · sound · noted · nurture · explain · job · issue · relate · sustain · insert · causing · confront · time · covered · put · will · cite · advocate · indicate · please · regard · hear · kidding · read · dispute · give · say nothing · say nothing of · is done · doesn’t say · don’t speak

WordNet: Precise but Small
Data-Driven Models: Big but Noisy

Can we build lexical entailment resources automatically and at scale…


…while maintaining WordNet-level precision and interpretability?


The Paraphrase Database

PPDB: The Paraphrase Database. Ganitkevitch et al. (2013)


Equivalence · Entailment · Exclusion · Independent


Distributional Signals of Semantics

Monolingual Contextual Similarities
Lin and Pantel, 2001 (Alberta) · Mikolov et al., 2013 (Google) · Pennington et al., 2014 (Stanford)
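Contextual similarity compares words by the contexts they occur in, typically as cosine similarity between context-count vectors. A hypothetical sketch with invented vectors, anticipating the dad/father/mom/lychee examples:

```python
# Hypothetical sketch: contextual similarity as cosine between count
# vectors of co-occurring context words (all vectors are invented).
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

# context dimensions: ["hug", "tell", "fruit", "tree"]
vec = {
    "dad":    [5, 4, 0, 0],
    "father": [4, 5, 0, 1],
    "mom":    [5, 4, 0, 0],
    "lychee": [0, 0, 6, 3],
}

# related words share contexts ...
print(cosine(vec["dad"], vec["father"]) > cosine(vec["dad"], vec["lychee"]))  # True
# ... but so do words that merely play similar roles
print(cosine(vec["dad"], vec["mom"]))  # 1.0
```

This already hints at the weakness charted below: "dad" and "mom" occur in near-identical contexts, so contextual similarity alone cannot separate them.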

…converted from classical work to abstract expressionism after hearing Russian composer Igor Stravinsky's "Rite of Spring”…

…South African contemporary artist, with abstract expressionism work featuring key aesthetics of the most sought after artists…

Contextual Similarities

Strengths: dad/father vs. dad/lychee
Weaknesses: dad/father vs. dad/mom

Distributional Signals of Semantics

Bilingual Translational Similarity
Bannard and Callison-Burch, 2005 (Edinburgh) · Kok and Brockett, 2010 (MSR) · Ganitkevitch et al., 2013 (Hopkins)

…the directive include the extension to the period of protection for composers…
…la directive comprennent la prolongation de la durée de protection pour les artistes…

…to favour the position of artists who have to travel throughout the community…
…favoriser la position des artistes qui doivent voyager à travers la communauté…

Bilingual Translations

Strengths: dad/father vs. dad/mom
Weaknesses: dad/parent vs. dad/lychee

Distributional Signals of Semantics

Lexico-Syntactic Patterns
Hearst, 1992 (Berkeley) · Snow et al., 2006 (Stanford) · Movshovitz-Attias and Cohen, 2015 (CMU)

How do composers and other artists survive and work in today's musical theatre scene?

As Luciano Berio did in his “Recital for Cathy”, creative artists such as composers, theatre directors, choreographers, video artists or even circus ...
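Hearst-style patterns match surface templates like "Y such as X" or "X and other Y" that signal an is-a relation. A hypothetical regex sketch over the example sentences (single-word noun phrases only, for simplicity):

```python
# Hypothetical sketch of Hearst-style lexico-syntactic patterns:
# regexes over raw text that signal an is-a relation between nouns.
import re

PATTERNS = [
    # "Y such as X" -> (X, Y)
    (re.compile(r"(\w+) such as (\w+)"), lambda m: (m.group(2), m.group(1))),
    # "X and other Y" -> (X, Y)
    (re.compile(r"(\w+) and other (\w+)"), lambda m: (m.group(1), m.group(2))),
]

def extract_isa(text):
    """Return (hyponym, hypernym) pairs matched by any pattern."""
    pairs = []
    for pattern, build in PATTERNS:
        for m in pattern.finditer(text):
            pairs.append(build(m))
    return pairs

print(extract_isa("creative artists such as composers were present"))
print(extract_isa("how do composers and other artists survive"))
```

Both example sentences yield the pair (composers, artists), which is exactly the class-containment fact the talk wants to learn.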

Lexico-Syntactic Patterns

Strengths: dad/parent vs. dad/lychee
Weaknesses: dad/father vs. dad/lychee

Logistic Regression

x = [ Bilingual Translations ; Contextual Similarities ; Lexico-Syntactic Patterns ]

P(relation | x) = 1 / (1 + e^(−w·x)), with one weight vector w per relation, yielding
[ P(equivalent), P(entailment), P(exclusion), P(independent) ]

Predict a probability distribution over entailment relations… based on all of the data-driven signals available.
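The classifier described above can be sketched numerically: score each relation with its own weight vector over the combined signals, then normalize the scores into a probability distribution. The feature values and weights here are invented for illustration, and the normalization is written as a softmax over the four relation scores:

```python
# Hypothetical numeric sketch of the relation classifier: one weight
# vector per relation over the combined distributional signals,
# normalized into a probability distribution. All numbers are invented.
import math

RELATIONS = ["equivalent", "entailment", "exclusion", "independent"]

# features: [bilingual translation, contextual similarity, lexico-syntactic]
weights = {
    "equivalent":  [2.0, 1.0, -1.0],
    "entailment":  [0.5, 0.5, 2.0],
    "exclusion":   [-1.0, 1.5, -0.5],
    "independent": [-2.0, -1.0, -1.0],
}

def predict(x):
    """Softmax-normalized distribution over entailment relations."""
    scores = {r: sum(w * xi for w, xi in zip(weights[r], x)) for r in RELATIONS}
    z = sum(math.exp(s) for s in scores.values())
    return {r: math.exp(s) / z for r, s in scores.items()}

# a word pair with strong translation and contextual evidence
dist = predict([0.8, 0.9, 0.1])
print({r: round(p, 3) for r, p in dist.items()})
print(max(dist, key=dist.get))  # equivalent
```

With these invented weights, strong bilingual and contextual evidence pushes the mass toward "equivalent"; a different feature mix would favor a different relation.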


Can we build a resource like WordNet automatically, at scale,

and without loss of precision?

Improving End-to-End RTE

p entails h if typically, a human reading p would infer that h is

most likely true.

Improving End-to-End RTE

p entails h if typically, a human reading p would infer that h is

most likely true.

p = “A man is having a conversation.” h = “Some women are talking.”

Improving End-to-End RTE

p entails h if typically, a human reading p would infer that h is

most likely true.

p = “A man is having a conversation.” h = “Some women are talking.”

No

x1

man(x1)

x2 x3

patient(x2,x3) agent(x2,x1) have(x2) conversation(x3)

A man is having a conversation. Some women are talking.

x1 x2

agent(x1,x2) talk(x1) woman(x2)

Improving End-to-End RTE

x1

man(x1)

x2 x3

patient(x2,x3) agent(x2,x1) have(x2) conversation(x3)

A man is having a conversation. Some women are talking.

x1 x2

agent(x1,x2) talk(x1) woman(x2)

∀x(man(x)⇒¬woman(x))

Improving End-to-End RTE

x1

man(x1)

x2 x3

patient(x2,x3) agent(x2,x1) have(x2) conversation(x3)

A man is having a conversation. Some women are talking.

x1 x2

agent(x1,x2) talk(x1) woman(x2)

∀x,h,c,t(have(h)⋀conversation(c)⋀talk(t) ⋀agent(h,x)⇒agent(t,x))
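The role these lexical axioms play in the proof can be sketched with a toy forward-chaining checker. This is only an illustration of the idea, not the actual proof system used in the thesis; the ground literals and the two hand-written rules below stand in for what the axiom-generation step would produce.

```python
# Toy illustration (not the actual prover): premises and hypotheses as sets
# of ground literals, lexical axioms as ground rewrite rules.

def entails(premise, hypothesis, axioms):
    """Forward-chain axioms over the premise literals, then report whether
    the hypothesis is derived, contradicted, or neither."""
    derived = set(premise)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in axioms:  # axiom: lhs -> rhs, both ground literals
            if lhs in derived and rhs not in derived:
                derived.add(rhs)
                changed = True
    if any(("not " + lit) in derived for lit in hypothesis):
        return "contradiction"
    return "entails" if hypothesis <= derived else "unknown"

# p = "A man is having a conversation."  h = "Some women are talking."
premise = {"man(x1)", "have(x2)", "conversation(x3)", "agent(x2,x1)"}
hypothesis = {"woman(x1)", "talk(x2)", "agent(x2,x1)"}
axioms = [
    ("man(x1)", "not woman(x1)"),  # forall x: man(x) => not woman(x)
    ("have(x2)", "talk(x2)"),      # having a conversation => talking
]
print(entails(premise, hypothesis, axioms))
```

With the man/woman axiom in place the pair comes out as a contradiction rather than an entailment, mirroring the "No" answer on the slide.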

Improving End-to-End RTE

Improving End-to-End RTE

[Bar chart: Performance (F1 Score), y-axis 0.0-0.7. No Axioms: 0.49; Using PPDB: 0.66]

Improving End-to-End RTE

[Bar chart: Performance (F1 Score). No Axioms: 0.49; Using PPDB: 0.66; Using WordNet: 0.61]

Improving End-to-End RTE

[Bar chart: Performance (F1 Score). No Axioms: 0.49; Using PPDB: 0.66; Using WordNet: 0.61; Human Oracle: 0.66]

Lexical Entailment

Semantic Containment

Summary and Future Work

Class-Instance Identification

Adding Semantics to Data-Driven Paraphrasing. Pavlick et al. ACL (2015)

Compositional Entailment in Adjective Nouns. Pavlick and Callison-Burch. ACL (2016) So-Called Non-Subsective Adjectives. Pavlick and Callison-Burch. *SEM (2016)

Fine-Grained Class Extraction via Modifier Composition. Pavlick and Pasca. ACL (2017)

Modifier-Noun Composition

artist

composer

Introduction

Lexical Entailment

Semantic Containment

Summary and Future Work

Class-Instance Identification

Adding Semantics to Data-Driven Paraphrasing. Pavlick et al. ACL (2015)

Compositional Entailment in Adjective Nouns. Pavlick and Callison-Burch. ACL (2016) So-Called Non-Subsective Adjectives. Pavlick and Callison-Burch. *SEM (2016)

Fine-Grained Class Extraction via Modifier Composition. Pavlick and Pasca. ACL (2017)

Modifier-Noun Composition

American composer

Introduction

artist

Non-Compositional Semantics

artist

composer

Non-Compositional Semantics

artist

American composer composer

Non-Compositional Semantics

artist

American composer 1950s American jazz composer

composer

Non-Compositional Semantics

⟦modifier1 modifier2 … modifierk noun⟧

Non-Compositional Semantics

O(NM^k)

Non-Compositional Semantics

American jazz composer

~270,000,000,000,000

O(NM^k)

Non-Compositional Semantics

American jazz composer

~270,000,000,000,000

O(NM^k)

Non-Compositional Semantics

Problem #1: scalability
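To put rough numbers on the scalability problem: if class labels are memorized as atomic strings, the label space grows multiplicatively with the noun vocabulary and exponentially with the number of modifier slots (reading the slide's complexity as N·M^k). The vocabulary sizes below are invented, illustrative numbers, not counts from the thesis.

```python
# Illustrative only: N head nouns, M modifiers, up to k modifier slots.
N, M, k = 20_000, 30_000, 3  # made-up vocabulary sizes

labels = N * M ** k          # labels with exactly k modifiers
print(f"{labels:.1e} distinct class labels")
```

Even with modest vocabularies the count runs to hundreds of quadrillions, which is why almost all of these labels will never be observed verbatim in text.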

Non-Compositional Semantics

“composer”

Non-Compositional Semantics

“1950s American jazz composer”

Non-Compositional Semantics

“1950s American jazz composer”

Problem #2: sparsity

American composer

composer

Non-Compositional Semantics

American composer

composer

Non-Compositional Semantics

American actor

actor

American composer

composer

Non-Compositional Semantics

American actor

actor

American author

author

American composer

composer

Non-Compositional Semantics

American actor

actor

American author

author

American singer

singer


American composer

composer

Non-Compositional Semantics

American actor

actor

American author

author

American singer

singer

Problem #3: generalizability

American composer

composer

Compositional Semantics

[Diagram: “American composer” as the intersection of “American” and “composer”]

Semantic Containment

American composer

Compositional Semantics

American

Class-Instance Identification

composer

Lexical Entailment

Semantic Containment

Summary and Future Work

Class-Instance Identification

Adding Semantics to Data-Driven Paraphrasing. Pavlick et al. ACL (2015)

Compositional Entailment in Adjective Nouns. Pavlick and Callison-Burch. ACL (2016) So-Called Non-Subsective Adjectives. Pavlick and Callison-Burch. *SEM (2016)

Fine-Grained Class Extraction via Modifier Composition. Pavlick and Pasca. ACL (2017)

Modifier-Noun Composition

American composer

Introduction

Lexical Entailment

Semantic Containment

Summary and Future Work

Class-Instance Identification

Adding Semantics to Data-Driven Paraphrasing. Pavlick et al. ACL (2015)

Compositional Entailment in Adjective Nouns. Pavlick and Callison-Burch. ACL (2016) So-Called Non-Subsective Adjectives. Pavlick and Callison-Burch. *SEM (2016)

Fine-Grained Class Extraction via Modifier Composition. Pavlick and Pasca. ACL (2017)

Modifier-Noun Composition

composer

American composer

Introduction

Classes of Modifiers

American composer

composer

Classes of Modifiers

Subsective

MH ⇒ H

American composer

composer

Classes of Modifiers

Subsective

criminal

alleged criminal

Plain Non-Subsective

MH ⇒ H MH ⇏ H

American composer

composer

Classes of Modifiers

Subsective (MH ⇒ H): American composer ⇒ composer

Plain Non-Subsective (MH ⇏ H): alleged criminal ⇏ criminal

Privative (MH ⇒ ¬H): fake gun ⇒ ¬gun

Equivalence (MH ⟺ H): It is her favorite book in the entire world.

Reverse Entailment (MH ⇒ H ⋀ H ⇏ MH): She is an American composer.

Forward Entailment (MH ⇏ H ⋀ H ⇒ MH): She is the president’s potential successor.

Independence (MH ⇏ H ⋀ H ⇏ MH): She is the alleged hacker.

Exclusion (MH ⇒ ¬H ⋀ H ⇒ ¬MH): She is a former senator.

Eddy is a cat.

Natural Language Inference

Eddy is a cat.

Eddy is a domestic cat.

Natural Language Inference

Eddy is a cat.

Eddy is a domestic cat.

cat

domestic cat

Natural Language Inference

Eddy is a cat.

Eddy is a domestic cat.

cat

domestic cat

Natural Language Inference

Eddy is a domestic cat sitting on the ground looking out through a clear door screen.

Eddy is a cat sitting on the ground looking out through a clear door screen.

Natural Language Inference

Eddy is a domestic cat sitting on the ground looking out through a clear door screen.

Eddy is a cat sitting on the ground looking out through a clear door screen.

p entails h if typically, a human reading p would infer that h is

most likely true.

Natural Language Inference

Eddy is a domestic cat sitting on the ground looking out through a clear door screen.

Eddy is a cat sitting on the ground looking out through a clear door screen.

p entails h if typically, a human reading p would infer that h is

most likely true.

Natural Language Inference

Natural Language Inference

Eddy is a domestic cat sitting on the ground looking out through a clear door screen.

Eddy is a cat sitting on the ground looking out through a clear door screen.

What types of inference rules

govern human inferences in practice?

p entails h if typically, a human reading p would infer that h is

most likely true.

Natural Language Inference

p entails h if typically, a human reading p would infer that h is

most likely true.

Eddy is a domestic cat sitting on the ground looking out through a clear door screen.

Eddy is a cat sitting on the ground looking out through a clear door screen.

What, if any, generalizations can be made to aid systems in performing natural language inference?

What types of inference rules

govern human inferences in practice?

Human Annotation of MH Compositions

Human Annotation of MH Compositions

Eddy is a cat.

Eddy is a domestic cat.

H ⇒ MH?

Human Annotation of MH Compositions

Eddy is a cat.

Eddy is a domestic cat.

MH ⇒ H?

Relation   | MH ⇒ H? | H ⇒ MH? | Example
Equiv.     | Yes     | Yes     | It is her favorite book in the entire world.
Rev. Ent.  | Yes     | Unk     | Eddy is a gray cat.
For. Ent.  | Unk     | Yes     | She is the president’s potential successor.
Indep.     | Unk     | Unk     | She is the alleged hacker.
Excl.      | No      | No      | She is a former senator.
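The two crowd judgments can be mapped mechanically onto the relation labels. The sketch below assumes annotators answer Yes/Unk/No and routes any pair outside the five-row table into an "undefined" bucket; it is a restatement of the table, not the annotation pipeline itself.

```python
# Sketch: combine the two human judgments into one of the basic relations.
RELATIONS = {
    ("Yes", "Yes"): "equivalence",
    ("Yes", "Unk"): "reverse entailment",
    ("Unk", "Yes"): "forward entailment",
    ("Unk", "Unk"): "independence",
    ("No",  "No"):  "exclusion",
}

def relation(mh_entails_h, h_entails_mh):
    # Pairs not in the table (e.g. Yes/No) are "undefined".
    return RELATIONS.get((mh_entails_h, h_entails_mh), "undefined")

print(relation("Yes", "Unk"))  # e.g. "Eddy is a gray cat."
print(relation("Yes", "No"))   # falls outside the five classes
```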

[Pie chart: relations assigned to MH compositions (62%, 23%, 7%, 7%, 1%); legend: Equivalence, Reverse Entailment, Independence, Forward Entailment, Exclusion, Undefined]

normal subsective modifiers

noun entails modifier?

noun contradicts modifier?

[Pie charts by genre; legend: Equivalence, Reverse Entailment, Independence, Forward Entailment, Exclusion, Undefined. News: 62%, 23%, 7%, 7%, 1%. Images: 87%, 7%, 4%, 2%. Literature: 66%, 26%, 5%, 2%. Debate Forums: 53%, 31%, 9%, 6%, 1%.]


H ⇒ MH?


The deadly attack killed at least 12 civilians.

The new series will premiere in January.

A woman rides a bike on an outdoor trail through a field.

H ⇒ MH?


I simply love the actual experience of being one with the ocean and the life in it.

The entire bill is now subject to approval by the parliament.

H ⇒ MH?

Greenberg also was put under investigation for his crucial role at the company.

H ⇒ MH?

Entities are assumed to be real and relevant.

H ⇒ MH?

Entities are assumed to be real and relevant.

H ⇒ MH?

Entities are assumed to be prototypical.

Empirical Analysis


H ⇒ ¬MH?

gun fake gun

H ⇒ ¬MH?

H MH

H ⇒ ¬MH?

H ⇒ ¬MH MH ⇒ ¬H

H MH

H ⇒ ¬MH?

H ⇒ ¬MH MH ⇒ ¬H

Undefined Relations

H

Undefined Relations

MH ⇒ H

MH

(like subsective)

Undefined Relations

H ⇒ ¬MH (like privative)

H MH

Relation   | MH ⇒ H? | H ⇒ MH?       | Example
Equiv.     | Yes     | Yes           | It is her favorite book in the entire world.
Rev. Ent.  | Yes     | Unk           | Eddy is a gray cat.
For. Ent.  | Unk     | Yes           | She is the president’s potential successor.
Indep.     | Unk     | Unk           | She is the alleged hacker.
Excl.      | No      | No            | She is a former senator.
Undef.     | Yes     | No (H ⇒ ¬MH) | ?????

Undefined Relations

Bush travels Monday to Michigan to remark on the economy.

Bush travels Monday to Michigan to remark on the Japanese economy.

Bush travels Monday to Michigan to remark on the Japanese economy.

MH ⇒ H

Undefined Relations

Bush travels Monday to Michigan to remark on the economy.

criminal

alleged criminal

gun

fake gun

American composer

composer

Classes of Modifiers Revisited

Subsective: MH ⇒ H • Plain Non-Subsective: MH ⇏ H • Privative: MH ⇒ ¬H

Classes of Modifiers Revisited

[Pie charts of annotated relations per modifier class (Subsective: MH ⇒ H; Plain Non-Subsective: MH ⇏ H; Privative: MH ⇒ ¬H); legend: Equivalence, Reverse Entailment, Independence, Forward Entailment, Exclusion, Undefined]

H ⇒ ¬MH

Privative Modifiers

Wilson signed off to pay the debts to the company.

Wilson signed off to pay the debts to the fictitious company.

MH ⇒ H

Wilson signed off to pay the debts to the fictitious company.

Wilson signed off to pay the debts to the company.

Privative Modifiers


Classes of Modifiers Revisited

Subsective: MH ⇒ H • Plain Non-Subsective: MH ⇏ H • Privative: MH ⇒ ¬H

Classes of Modifiers Revisited

Subsective: MH ⇒ H • Plain Non-Subsective: MH ⇏ H • Privative: MH ⇒ ¬H

Generalizations based on the class of the modifier lead to incorrect predictions more often than not.

Modern Inference Systems

p entails h if typically, a human reading p would infer that h is

most likely true.

Modern Inference Systems

p = “The crowd roared.”

h = “The enthusiastic crowd roared.”

p entails h if typically, a human reading p would infer that h is

most likely true.

Modern Inference Systems

Yes

p = “The crowd roared.” h = “The enthusiastic crowd roared.”

p entails h if typically, a human reading p would infer that h is

most likely true.

Modern Inference Systems

[Bar chart: Accuracy (y-axis 20-100) of RTE systems: Random Guessing; Transformation-based, Stern and Dagan (2012); Bag of Words Logistic Regression, Magnini et al. (2014); Bag of Vectors; RNN; LSTM; LSTM + Transfer, Bowman et al. (2015). Scores shown: 86.8, 86.6, 87.3, 86.6, 85.3, 86, 85.3, 85.3.]


Partially proof-based

Modern Inference Systems


Partially proof-based

Supervised Learning

Modern Inference Systems


Partially proof-based

Supervised Learning

Deep Learning

Modern Inference Systems


Correct representation is difficult to capture explicitly and it is not currently being learned implicitly.

Modern Inference Systems


Discussion

Discussion

The crowd roared.

Discussion

The enthusiastic crowd roared.

enthusiastic crowd

crowd

Discussion

enthusiastic crowd

crowd

Set Containment

Discussion

Set Containment

Discussion

The crowd roared.

Discussion

The ___ crowd roared.

P(enthusiastic) P(silent) P(imaginary)

Language Modeling

Discussion

Word Sense Disambiguation

The ___ crowd roared.

enthusiastic crowd

silent crowd

imaginary crowd

Discussion

The crowd roared.

Reference

enthusiastic crowd

Discussion

The crowd roared.

enthusiastic crowd: real • human • making noise • excited/happy

excited/happy • making noise • clapping • yelling • human

Assigning intrinsic

meaning to modifiers…


Determining whether they hold for individual

entities

Lexical Entailment

Semantic Containment

Summary and Future Work

Class-Instance Identification

Adding Semantics to Data-Driven Paraphrasing. Pavlick et al. ACL (2015)

Compositional Entailment in Adjective Nouns. Pavlick and Callison-Burch. ACL (2016) So-Called Non-Subsective Adjectives. Pavlick and Callison-Burch. *SEM (2016)

Fine-Grained Class Extraction via Modifier Composition. Pavlick and Pasca. ACL (2017)

Modifier-Noun Composition

composer

American composer

Introduction

Lexical Entailment

Semantic Containment

Summary and Future Work

Class-Instance Identification

Adding Semantics to Data-Driven Paraphrasing. Pavlick et al. ACL (2015)

Compositional Entailment in Adjective Nouns. Pavlick and Callison-Burch. ACL (2016) So-Called Non-Subsective Adjectives. Pavlick and Callison-Burch. *SEM (2016)

Fine-Grained Class Extraction via Modifier Composition. Pavlick and Pasca. ACL (2017)

Modifier-Noun Composition

American composer

Charles Mingus

Introduction

Compositional Semantics

[Diagram: “American composer” as the intersection of “American” and “composer”]

Can we assign intrinsic meaning to modifiers…

…in such a way that we can determine whether the modifier holds for individual entities in practice?

Can we assign intrinsic meaning to modifiers…

Step 1: Modifier Interpretation

Determine the properties entailed by the modifier in the context of the head

American jazz composer

born in America • influential in America • prolific while in America • a product of America • lived in America • visited America • popular in America

Step 1: Modifier Interpretation

Determine the properties entailed by the modifier in the context of the head

born in America • influential in America • prolific while in America • a product of America • lived in America • visited America • popular in America

Step 2: Class-Instance Identification

Determine, for a specific instance, whether the necessary properties hold…

“Mingus's intricate, complex compositions in the genres of jazz and classical music illustrate his ability to be dynamic in both the strings and the swing. Mingus truly was a product of America in all its historic complexities. His mother, Harriet, was half black and half Chinese, and his father, Charles Sr., was half black and half Swedish, making Mingus a true reflection of the hybrid nature of our divided nation…”

American jazz composer

Modifier Interpretation

American composer

American composer

Modifier Interpretation

composer * America

American composer

Modifier Interpretation

composer * America

composer from America composer born in America composer popular in America composer active in America

American composer

Modifier Interpretation

composer * America

⟨composer from America, 3702⟩ ⟨composer born in America, 1389⟩ ⟨composer popular in America, 1292⟩ ⟨composer active in America, 2041⟩

American composer

Modifier Interpretation

composer * America

⟨composer from America, 3702⟩ ⟨composer born in America, 1389⟩ ⟨composer popular in America, 1292⟩ ⟨composer active in America, 2041⟩

P(Y|X) = 1 / (1 + e^(Xβ))

American composer

Modifier Interpretation

composer * America

⟨composer from America, 0.93⟩ ⟨composer born in America, 0.94⟩ ⟨composer popular in America, 0.45⟩ ⟨composer active in America, 0.52⟩

P(Y|X) = 1 / (1 + e^(Xβ))

American composer

Modifier Interpretation

composer * America

⟨composer born in America, 0.94⟩ ⟨composer from America, 0.93⟩ ⟨composer active in America, 0.52⟩ ⟨composer popular in America, 0.45⟩

P(Y|X) = 1 / (1 + e^(Xβ))
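A minimal version of the scoring step: each candidate interpretation's evidence is turned into a probability with a logistic model (written here in the standard form with a negated exponent). The feature set and the weights below are invented for illustration; only the raw counts come from the earlier slide.

```python
import math

def score(features, beta):
    """Logistic model: P(paraphrase is a valid interpretation | features)."""
    z = sum(f * b for f, b in zip(features, beta))
    return 1.0 / (1.0 + math.exp(-z))

# hypothetical features: [log(extraction count), pattern indicator]
candidates = {
    "composer born in America":    [math.log(1389), 1.0],
    "composer popular in America": [math.log(1292), 0.0],
}
beta = [0.2, 1.3]  # made-up weights
for phrase, x in sorted(candidates.items()):
    print(phrase, round(score(x, beta), 2))
```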

Modifier Interpretation

American composer born in America

American company based in America

American novel written in America

Produces good

results…

Modifier Interpretation

child actor has child

risk manager takes risks

machine gun used by machine

…but not perfect.

Class-Instance Identification

Class-Instance Identification

American composer

⟨___ born in America, 0.94⟩ ⟨___ from America, 0.93⟩ ⟨___ active in America, 0.52⟩ ⟨___ popular in America, 0.45⟩

Weighted modifier interpretations

composer * America

American composer

* is a composer

J.S. Bach Charles Mingus John Cage W.A. Mozart

Candidate instances

Class-Instance Identification

⟨___ born in America, 0.94⟩ ⟨___ from America, 0.93⟩ ⟨___ active in America, 0.52⟩ ⟨___ popular in America, 0.45⟩

“J.S. Bach born in America”

J.S. Bach Charles Mingus John Cage W.A. Mozart

Class-Instance Identification

American composer

⟨___ born in America, 0.94⟩ ⟨___ from America, 0.93⟩ ⟨___ active in America, 0.52⟩ ⟨___ popular in America, 0.45⟩

Confidence = 0.94 × 21 + 0.93 × 34 + 0.52 × 329 + 0.45 × 4,043

“J.S. Bach from America”

J.S. Bach Charles Mingus John Cage W.A. Mozart

Class-Instance Identification

⟨___ born in America, 0.94⟩ ⟨___ from America, 0.93⟩ ⟨___ active in America, 0.52⟩ ⟨___ popular in America, 0.45⟩

American composer

Confidence = 0.94 × 21 + 0.93 × 34 + 0.52 × 329 + 0.45 × 4,043

“J.S. Bach active in America”

J.S. Bach Charles Mingus John Cage W.A. Mozart

Class-Instance Identification

⟨___ born in America, 0.94⟩ ⟨___ from America, 0.93⟩ ⟨___ active in America, 0.52⟩ ⟨___ popular in America, 0.45⟩

American composer

Confidence = 0.94 × 21 + 0.93 × 34 + 0.52 × 329 + 0.45 × 4,043

“J.S. Bach popular in America”

J.S. Bach Charles Mingus John Cage W.A. Mozart

Confidence = 0.94 × 21 + 0.93 × 34 + 0.52 × 329 + 0.45 × 4,043

Class-Instance Identification

⟨___ born in America, 0.94⟩ ⟨___ from America, 0.93⟩ ⟨___ active in America, 0.52⟩ ⟨___ popular in America, 0.45⟩

American composer
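The confidence computation is just a dot product between the interpretation weights and per-instance attestation counts. A sketch, with the weights and counts taken from the confidence formula on the slide:

```python
# weights from the modifier-interpretation step
WEIGHTS = {
    "born in America":    0.94,
    "from America":       0.93,
    "active in America":  0.52,
    "popular in America": 0.45,
}

def confidence(counts):
    """counts: paraphrase -> how often '<instance> <paraphrase>' is attested."""
    return sum(w * counts.get(p, 0) for p, w in WEIGHTS.items())

counts = {"born in America": 21, "from America": 34,
          "active in America": 329, "popular in America": 4043}
print(round(confidence(counts), 2))
```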

Class-Instance Identification

                       American composer   jazz composer
JS Bach                0.21                0.04
Charles Mingus         0.89                0.93
John Cage              0.96                0.52
WA Mozart              0.19                0.13
Libby Larsen           0.72                0.24
Duke Ellington         0.76                0.97
Palestrina             0.04                0.03
Ludwig van Beethoven   0.09                0.12
Morton Feldman         0.88                0.31
Frederick Chopin       0.33                0.32
Barack Obama           0.14                0.35
Herbie Hancock         0.62                0.95

Class-Instance Identification: American jazz composer

JS Bach                0.25
Charles Mingus         1.82
John Cage              1.48
WA Mozart              0.32
Libby Larsen           0.96
Duke Ellington         1.73
Palestrina             0.07
Ludwig van Beethoven   0.21
Morton Feldman         1.19
Frederick Chopin       0.65
Barack Obama           0.49
Herbie Hancock         1.57

Class-Instance Identification: American jazz composer

Charles Mingus         1.82
Duke Ellington         1.73
Herbie Hancock         1.57
John Cage              1.48
Morton Feldman         1.19
Libby Larsen           0.96
Frederick Chopin       0.65
Barack Obama           0.49
WA Mozart              0.32
JS Bach                0.25
Ludwig van Beethoven   0.21
Palestrina             0.07
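Multi-modifier classes are scored compositionally: each modifier contributes its own score for an instance, and the sums are ranked. The per-modifier numbers below are copied from the slides for a few instances; the dictionaries themselves are just a convenient illustration.

```python
# per-modifier class scores for a handful of instances (from the slides)
american = {"Charles Mingus": 0.89, "Duke Ellington": 0.76, "John Cage": 0.96,
            "Herbie Hancock": 0.62, "JS Bach": 0.21}
jazz     = {"Charles Mingus": 0.93, "Duke Ellington": 0.97, "John Cage": 0.52,
            "Herbie Hancock": 0.95, "JS Bach": 0.04}

# score for "American jazz composer" = score(American) + score(jazz)
combined = {name: round(american[name] + jazz[name], 2) for name in american}
for name, s in sorted(combined.items(), key=lambda kv: -kv[1]):
    print(name, s)
```

Charles Mingus comes out on top with 1.82, matching the ranked list above.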

Reconstructing Wikipedia


EACL 2017 Submission ***. Confidential review copy. DO NOT DISTRIBUTE.

(a) Uniform random sample. (b) Weighted random sample.

Figure 3: ROC curves for selected methods. Given a list of instances associated with confidence scores, ROC curves show the relationship between the number of true positives and the number of false positives that are retained by setting various threshold confidence values. The curve becomes linear once all remaining instances in the list have the same score (e.g., 0), as this makes it impossible to choose a threshold which adds true positives to the list without also including all remaining false positives.
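The ROC construction the caption describes can be sketched directly: sort instances by confidence, sweep the threshold downward, and record (false positives, true positives) at each step. The tiny instance list here is invented for illustration.

```python
def roc_points(scored):
    """scored: list of (confidence, is_true_instance) pairs."""
    scored = sorted(scored, key=lambda x: -x[0])  # sweep threshold downward
    tp = fp = 0
    points = [(0, 0)]
    for _, is_true in scored:
        if is_true:
            tp += 1
        else:
            fp += 1
        points.append((fp, tp))
    return points

demo = [(0.9, True), (0.8, True), (0.6, False), (0.4, True), (0.2, False)]
print(roc_points(demo))
```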

sentences. Thus, it can provide non-zero scores for many more candidate instances. This enables the proposed methods to achieve a better trade-off between extracting true positives versus false positives than the baseline models do.

                 Uniform          Weighted
                 AUC    Rec.      AUC    Rec.
Baseline         0.55   0.23      0.53   0.28
Hearst           0.56   0.03      0.52   0.02
Hearst\          0.57   0.04      0.53   0.02
ModsH            0.68   0.08      0.60   0.06
ModsI            0.71   0.09      0.65   0.09
Hearst\+ModsH    0.70   0.09      0.61   0.08
Hearst\+ModsI    0.73   0.10      0.66   0.10

Table 7: Recall of instances listed on Wikipedia category pages. “Rec.” is the recall against the entire set of instances appearing on the Wikipedia pages. AUC captures the tradeoff between true positives and false positives (see Figure 3).

7 Conclusion

We have presented an approach to IsA extraction which takes advantage of the compositionality of natural language. Existing approaches often treat class labels as atomic units which must be observed in full in order to be populated with instances. As a result, current methods are not able to handle the infinite number of classes describable in natural language, most of which never appear in text. Our method works by reasoning about each modifier in the label individually, in terms of the properties that it implies about the instances. This approach allows us to harness information that is spread across multiple sentences, and results in a significant increase in the number of fine-grained classes which we are able to populate.

TODO: Add two or three lines of future work.

TODO: Break some of the longer sentences containing “which” or “that”.


Reconstructing Wikipedia

10

900

901

902

903

904

905

906

907

908

909

910

911

912

913

914

915

916

917

918

919

920

921

922

923

924

925

926

927

928

929

930

931

932

933

934

935

936

937

938

939

940

941

942

943

944

945

946

947

948

949

950

951

952

953

954

955

956

957

958

959

960

961

962

963

964

965

966

967

968

969

970

971

972

973

974

975

976

977

978

979

980

981

982

983

984

985

986

987

988

989

990

991

992

993

994

995

996

997

998

999

EACL 2017 Submission ***. Confidential review copy. DO NOT DISTRIBUTE.

(a) Uniform random sample. (b) Weighted random sample.

Figure 3: ROC curves for selected methods. Given a list of instances associated with confidence scores,ROC curves show the relationship between the number of true positives and the number of false positivesthat are retained by setting various threshold confidence values. The curve becomes linear once allremaining instances in the list have the same score (e.g., 0) as this makes it impossible to choose athreshold which adds true positives to the list without also including all remaining false positives.

sentences. Thus, it can provide non-zero scoresfor many more candidate instances. This enablesthe proposed methods to achieve a better trade-o↵ between extracting true positives versus falsepositives, than the baseline models do.

Uniform WeightedAUC Rec. AUC Recall

Baseline 0.55 0.23 0.53 0.28Hearst 0.56 0.03 0.52 0.02Hearst\ 0.57 0.04 0.53 0.02ModsH 0.68 0.08 0.60 0.06ModsI 0.71 0.09 0.65 0.09Hearst\+ModsH 0.70 0.09 0.61 0.08Hearst\+ModsI 0.73 0.10 0.66 0.10

Table 7: Recall of instances listed on Wikipediacategory pages. “Rec” is the recall against the en-tire set of instances appearing on the Wikipediapages. AUC captures the tradeo↵ between truepositives and false positives (see Figure 3).

7 Conclusion

We have presented an approach to IsA extractionwhich takes advantage of the compositionality ofnatural language. Existing approaches often treatclass labels as atomic units which must be observedin full in order to be populated with instances. Asa result, current methods are not able to handlethe infinite number of classes describable in natu-ral language, most of which never appear in text.Our method works by reasoning about each modi-fier in the label individually, in terms of the prop-erties that it implies about the instances. Thisapproach allows us to harness information that isspread across multiple sentences, and results in asignificant increase in the number of fine-grainedclasses which we are able to populate.

TODO: Add two or three lines of futurework.

TODO: Break some of the longer sen-tences containing “which” or “that”.


Reconstructing Wikipedia
Best Proposed Compositional Method: AUC = 0.73
Best Existing Non-Compositional Method (Lexico-Syntactic Patterns): AUC = 0.57


EACL 2017 Submission ***. Confidential review copy. DO NOT DISTRIBUTE.



Outline: Introduction • Lexical Entailment • Semantic Containment • Class-Instance Identification • Summary and Future Work

Adding Semantics to Data-Driven Paraphrasing. Pavlick et al. ACL (2015)
Compositional Entailment in Adjective Nouns. Pavlick and Callison-Burch. ACL (2016)
So-Called Non-Subsective Adjectives. Pavlick and Callison-Burch. *SEM (2016)
Fine-Grained Class Extraction via Modifier Composition. Pavlick and Pasca. ACL (2017)

Modifier-Noun Composition

American composer

Charles Mingus


Equivalence • Forward Entailment • Reverse Entailment • Independent • Exclusion
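These relations can be read as set relations between the denotations of two terms; a minimal sketch (the denotation sets and word pairs are toy examples chosen for illustration):

```python
# The basic entailment relations, viewed as set relations between
# the denotations of two terms. Example denotations are toy sets.

def relation(x, y):
    """Classify the entailment relation between denotation sets x and y."""
    if x == y:
        return "equivalence"         # couch = sofa
    if x < y:
        return "forward entailment"  # crow < bird
    if x > y:
        return "reverse entailment"  # bird > crow
    if not (x & y):
        return "exclusion"           # cat vs. dog (disjoint)
    return "independent"             # hungry vs. hippo (overlap, neither contains)

animals = {"crow", "sparrow", "cat"}
birds = {"crow", "sparrow"}
cats = {"cat"}
print(relation(birds, animals))  # forward entailment
print(relation(cats, birds))     # exclusion
```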


(Bar chart — lexical entailment results with different axiom sources: No Axioms, Using PPDB, Using WordNet, Human Oracle; scores of 0.49, 0.61, 0.66, and 0.66 on a 0.0–0.7 axis.)


(Pie charts — distribution of entailment relations by adjective type: Plain Non-Subsective (54%, 19%, 14%, 7%, 5%), Subsective (67%, 28%, 4%, 1%), Privative (37%, 28%, 16%, 16%, 3%, 1%).)


(Bar chart — accuracy of entailment systems: Random Guessing; Transformation-based (Stern and Dagan, 2012); Bag of Words Logistic Regression (Magnini et al., 2014); Bag of Vectors; RNN; LSTM; LSTM + Transfer (Bowman et al., 2015); scores between 85.3 and 87.3 on a 20–100 axis.)


(Diagram: "American composers" as a subset of "composers".)







Future Directions

"The crowd roared."
crowd: real, human, making noise, excited/happy, clapping, yelling
enthusiastic crowd: excited/happy, making noise

"The red circle."

"Common sense knowledge"
What is it? World knowledge? Pragmatics?
How do we represent it? Distributional? Symbolic? Triple stores? Probability distributions?
How is it learned? Is it distributional? Is text enough?
When/how is it accessed? What can be precomputed? What happens at "runtime"?

Thank you! Questions!