
1

Textual Entailment as a Framework for Applied Semantics

Ido Dagan Bar-Ilan University, Israel

Joint work with: Oren Glickman, Idan Szpektor, Roy Bar Haim, Maayan Geffet, Moshe Koppel, Efrat Marmorshtein (Bar-Ilan University); Shachar Mirkin (Hebrew University, Israel); Hristo Tanev, Bernardo Magnini, Alberto Lavelli, Lorenza Romano (ITC-irst, Italy); Bonaventura Coppola, Milen Kouylekov (University of Trento and ITC-irst, Italy); Danilo Giampiccolo (CELCT, Italy); Dan Roth (UIUC)

2

Applied Semantics for Text Understanding/Reading

• Understanding text meaning refers to the semantic level of language

• An applied computational framework for semantics is needed

• Such a common framework is still missing

3

Desiderata for Modeling Framework

• A framework for a target level of language processing should provide:

1) A generic module for applications
2) A unified paradigm for investigating language phenomena
3) A unified knowledge representation

• Most semantics research is scattered – WSD, NER, SRL, lexical semantic relations… (in contrast with, e.g., syntax)
– The dominating approach: interpretation

4

Outline

• The textual entailment task – what and why?
• Evaluation – PASCAL RTE Challenges
• Modeling approach:

– Knowledge acquisition

– Inference (briefly)

– Application example

• An alternative framework for investigating semantics

5

Natural Language and Meaning

[Diagram: many-to-many mapping between Language and Meaning – Ambiguity (one expression, several meanings) and Variability (one meaning, several expressions)]

6

Variability of Semantic Expression

Model variability as relations between text expressions:
• Equivalence: expr1 ⇔ expr2 (paraphrasing)
• Entailment: expr1 ⇒ expr2 – the general case
– Incorporates inference as well

[Diagram: alternative phrasings connected by entailment – "Dow ends up", "Dow climbs 255", "The Dow Jones Industrial Average closed up 255", "Stock market hits a record high", "Dow gains 255 points"]

7

Typical Application Inference

Example (QA):
Question: Who bought Overture? >> Expected answer form: X bought Overture
Text: "Overture's acquisition by Yahoo" entails the hypothesized answer "Yahoo bought Overture"

• Similar for IE: X buy Y
• Similar for "semantic" IR: t: Overture was bought …
• Summarization (multi-document) – identify redundant info
• MT evaluation (and recent ideas for MT)
• Educational applications

8

KRAQ'05 Workshop - KNOWLEDGE and REASONING for ANSWERING QUESTIONS

(IJCAI-05)

CFP:
– Reasoning aspects:
    * information fusion
    * search criteria expansion models
    * summarization and intensional answers
    * reasoning under uncertainty or with incomplete knowledge
– Knowledge representation and integration:
    * levels of knowledge involved (e.g. ontologies, domain knowledge)
    * knowledge extraction models and techniques to optimize response accuracy

… but similar needs for other applications – can entailment provide a common empirical task?

9

Classical Entailment Definition

• Chierchia & McConnell-Ginet (2001): A text t entails a hypothesis h if h is true in every circumstance (possible world) in which t is true

• Strict entailment - doesn't account for some uncertainty allowed in applications

10

“Almost certain” Entailments

t: The technological triumph known as GPS … was incubated in the mind of Ivan Getting.

h: Ivan Getting invented the GPS.

11

Applied Textual Entailment

• Directional relation between two text fragments: Text (t) and Hypothesis (h):

t entails h (t ⇒ h) if, typically, a human reading t would infer that h is most likely true

• Operational (applied) definition:
– Human gold standard – as in NLP applications
– Assuming common background knowledge – which is indeed expected from applications!

12

Probabilistic Interpretation

Definition:
• t probabilistically entails h if: P(h is true | t) > P(h is true)
– t increases the likelihood of h being true
– Equivalent to positive PMI – t provides information on h’s truth

• P(h is true | t): entailment confidence
– The relevant entailment score for applications
– In practice: “most likely” entailment expected
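Stated compactly (our notation, following the definition above), the probabilistic entailment condition and its PMI reading are:

```latex
% t probabilistically entails h iff conditioning on t raises the
% probability that h is true; equivalently, their PMI is positive.
\[
  P(h \text{ is true} \mid t) \;>\; P(h \text{ is true})
  \quad\Longleftrightarrow\quad
  \log \frac{P(h \text{ is true} \mid t)}{P(h \text{ is true})} \;>\; 0
\]
% P(h is true | t) itself is the entailment confidence used by applications.
```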

13

The Role of Knowledge

• For textual entailment to hold we require:
– text AND knowledge ⇒ h
– but knowledge alone should not entail h

• Systems are not supposed to validate h’s truth without utilizing t

14

PASCAL Recognizing Textual Entailment (RTE) Challenges

EU FP-6 Funded PASCAL NOE 2004-7

Organizers: Bar-Ilan University; ITC-irst and CELCT, Trento; MITRE; Microsoft Research

15

Generic Dataset by Application Use

• 7 application settings in RTE-1, 4 in RTE-2/3:
– QA
– IE
– “Semantic” IR
– Comparable documents / multi-doc summarization
– MT evaluation
– Reading comprehension
– Paraphrase acquisition

• Most data created from actual applications’ output
• RTE-2: 800 examples in development and test sets
• 50%-50% YES/NO split

16

Some Examples

TEXT | HYPOTHESIS | TASK | ENTAILMENT

1. T: Regan attended a ceremony in Washington to commemorate the landings in Normandy.
   H: Washington is located in Normandy.
   Task: IE. Entailment: False.

2. T: Google files for its long awaited IPO.
   H: Google goes public.
   Task: IR. Entailment: True.

3. T: …a shootout at the Guadalajara airport in May, 1993, that killed Cardinal Juan Jesus Posadas Ocampo and six others.
   H: Cardinal Juan Jesus Posadas Ocampo died in 1993.
   Task: QA. Entailment: True.

4. T: The SPD got just 21.5% of the vote in the European Parliament elections, while the conservative opposition parties polled 44.5%.
   H: The SPD is defeated by the opposition parties.
   Task: IE. Entailment: True.

17

Participation and Impact

• Very successful challenges, worldwide:
– RTE-1 – 17 groups
– RTE-2 – 23 groups
• 30 groups in total
• ~150 downloads!
– RTE-3 underway – 25 groups
• Joint workshop at ACL-07

• High interest in the research community
– Papers, conference sessions and areas, PhDs, influence on funded projects
– Textual Entailment special issue at JNLE
– ACL-07 tutorial

18

Methods and Approaches (RTE-2)

• Measure similarity match between t and h (coverage of h by t):
– Lexical overlap (unigram, N-gram, subsequence) – see the sketch below
– Lexical substitution (WordNet, statistical)
– Syntactic matching/transformations
– Lexical-syntactic variations (“paraphrases”)
– Semantic role labeling and matching
– Global similarity parameters (e.g. negation, modality)

• Cross-pair similarity
• Detect mismatch (for non-entailment)
• Logical interpretation and inference (vs. matching)
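For concreteness, a minimal sketch of the simplest method above – a unigram-overlap baseline scoring how well t covers h. The tokenizer and the 0.75 threshold are illustrative assumptions, not values from the challenge systems:

```python
def entails_lexical(t: str, h: str, threshold: float = 0.75) -> bool:
    """Unigram-overlap baseline: fraction of h's tokens that appear in t."""
    t_tokens = set(t.lower().split())
    h_tokens = h.lower().split()
    coverage = sum(tok in t_tokens for tok in h_tokens) / max(len(h_tokens), 1)
    return coverage >= threshold

# True: every token of h appears in t
print(entails_lexical("Dow gains 255 points in heavy trading", "Dow gains 255"))
# False: 'climbs' is uncovered - lexical substitution knowledge would be needed
print(entails_lexical("The Dow Jones Industrial Average closed up 255", "Dow climbs 255"))
```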

19

Dominant approach: Supervised Learning

• Features model similarity and mismatch
• Classifier determines relative weights of information sources
• Train on development set and auxiliary t-h corpora

Pipeline: (t, h) → similarity features (lexical, n-gram, syntactic, semantic, global) → feature vector → classifier → YES/NO (a minimal sketch follows)
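A minimal sketch of this pipeline using scikit-learn; the two features and the four training pairs are toy stand-ins for the feature sets and development data described above:

```python
from sklearn.linear_model import LogisticRegression

def features(t, h):
    """Toy similarity features; real systems add syntactic, semantic
    and global (negation/modality) features."""
    t_set, h_toks = set(t.lower().split()), h.lower().split()
    coverage = sum(tok in t_set for tok in h_toks) / max(len(h_toks), 1)
    length_ratio = len(h_toks) / max(len(t.split()), 1)
    return [coverage, length_ratio]

# Tiny illustrative development set: (t, h, label) with YES=1 / NO=0.
dev = [
    ("Google files for its long awaited IPO", "Google goes public", 1),
    ("Dow gains 255 points in heavy trading", "Dow gains 255", 1),
    ("Regan attended a ceremony in Washington", "Washington is in Normandy", 0),
    ("Aspirin prevents heart attacks", "Aspirin invented heart attacks", 0),
]
X = [features(t, h) for t, h, _ in dev]
y = [label for *_, label in dev]
clf = LogisticRegression().fit(X, y)
print(clf.predict([features("Yahoo bought Overture", "Yahoo owns Overture")]))
```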

20

Results

First Author (Group) | Accuracy | Average Precision
Hickl (LCC) | 75.4% | 80.8%
Tatu (LCC) | 73.8% | 71.3%
Zanzotto (Milan & Rome) | 63.9% | 64.4%
Adams (Dallas) | 62.6% | 62.8%
Bos (Rome & Leeds) | 61.6% | 66.9%
11 groups | 58.1%-60.5% | –
7 groups | 52.9%-55.6% | –

Average: 60%; Median: 59%

21

Analysis

• For the first time, deeper methods (semantic/syntactic/logical) clearly outperform shallow methods (lexical/n-gram)

Cf. Kevin Knight’s invited talk at EACL-06, titled: Isn’t Linguistic Structure Important, Asked the Engineer

• Still, most systems based on deep analysis did not score significantly better than the lexical baseline

22

Why?

• System reports point at:
– Lack of knowledge (syntactic transformation rules, paraphrases, lexical relations, etc.)
– Lack of training data

• It seems that systems that coped better with these issues performed best:
– Hickl et al. – acquisition of large entailment corpora for training
– Tatu et al. – large knowledge bases (linguistic and world knowledge)

23

Some suggested research directions

• Knowledge acquisition
– Unsupervised acquisition of linguistic and world knowledge from general corpora and the web
– Acquiring larger entailment corpora
– Manual resources and knowledge engineering

• Inference
– Principled framework for inference and fusing information levels
– Are we happy with bags of features?

24

Complementary Evaluation Modes

• Entailment subtask evaluations
– Lexical, lexical-syntactic, logical, alignment…

• “Seek” mode:
– Input: h and corpus
– Output: all entailing t’s in the corpus
– Captures information-seeking needs, but requires post-run annotation (TREC style)

• Contribution to specific applications!
– QA – Harabagiu & Hickl, ACL-06
– RE – Romano et al., EACL-06

25

Our Own Research Directions

[Diagram: three interlinked research directions – Acquisition, Inference, Applications]

26

Learning Entailment Rules

Example (QA):
Q: What reduces the risk of heart attacks?
Hypothesis: Aspirin reduces the risk of heart attacks
Text: Aspirin prevents heart attacks
Entailment rule (template ⇨ template): X prevent Y ⇨ X reduce risk of Y

Need a large knowledge base of entailment rules
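A minimal sketch of how such a rule base could be queried at question time; the two rules and the flat string templates are illustrative assumptions (the actual rules are lexical-syntactic templates over parse trees):

```python
# Toy entailment-rule KB: each (lhs, rhs) pair reads "lhs entails rhs".
RULES = [
    ("X prevent Y", "X reduce risk of Y"),
    ("X protect against Y", "X reduce risk of Y"),
]

def entailing_templates(hypothesis_template):
    """Templates whose text matches would entail the hypothesis template."""
    return [lhs for lhs, rhs in RULES if rhs == hypothesis_template]

# Q: "What reduces the risk of heart attacks?" -> hypothesis: X reduce risk of Y
print(entailing_templates("X reduce risk of Y"))
# ['X prevent Y', 'X protect against Y'] -> also search for these in text
```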

27

TEASE – Algorithm Flow

Input template: X (subj) accuse (obj) Y

1. Sample corpus for the input template (retrieved from the WEB with a lexicon): “Paula Jones accused Clinton…”, “Sanhedrin accused St.Paul…”, …
2. Anchor Set Extraction (ASE) produces anchor sets: {Paula Jones (subj); Clinton (obj)}, {Sanhedrin (subj); St.Paul (obj)}, …
3. Sample corpus for the anchor sets: “Paula Jones called Clinton indictable…”, “St.Paul defended before the Sanhedrin…”, …
4. Template Extraction (TE) produces templates: X call Y indictable, Y defend before X, …
5. Iterate: feed extracted templates back in.
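The flow can be illustrated with a toy, self-contained sketch; web search and the ASE/TE steps are reduced to naive string stubs over a four-sentence corpus, so everything here is an assumption about the shape of the loop, not the actual algorithm:

```python
def search(query_terms):
    """Stub for web retrieval; a real system samples sentences from the web."""
    CORPUS = ["Paula Jones accused Clinton",
              "Paula Jones called Clinton indictable",
              "the Sanhedrin accused St.Paul",
              "St.Paul defended before the Sanhedrin"]
    return [s for s in CORPUS if all(t in s for t in query_terms)]

def extract_anchor_sets(verb, sentences):
    """ASE stub: take the arguments around the verb as an anchor set."""
    anchor_sets = []
    for s in sentences:
        words = s.split()
        i = words.index(verb)
        anchor_sets.append((" ".join(words[:i]), " ".join(words[i + 1:])))
    return anchor_sets

def extract_templates(anchor_sets, sentences):
    """TE stub: generalize sentences containing both anchors into templates."""
    templates = set()
    for x, y in anchor_sets:
        for s in sentences:
            if x in s and y in s:
                templates.add(s.replace(x, "X").replace(y, "Y"))
    return templates

def tease(input_template):
    verb = input_template.split()[1] + "d"          # crude inflection: 'accused'
    sents = search([verb])                          # sample corpus for template
    anchors = extract_anchor_sets(verb, sents)      # ASE
    anchor_sents = [s for x, y in anchors for s in search([x, y])]
    return extract_templates(anchors, anchor_sents) # TE; the real system iterates

print(tease("X accuse Y"))
# e.g. {'X accused Y', 'X called Y indictable', 'Y defended before X'}
```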

28

Sample of Extracted Anchor-Sets for X prevent Y

X=‘sunscreens’, Y=‘sunburn’

X=‘sunscreens’, Y=‘skin cancer’

X=‘vitamin e’, Y=‘heart disease’

X=‘aspirin’, Y=‘heart attack’

X=‘vaccine candidate’, Y=‘infection’

X=‘universal precautions’, Y=‘HIV’

X=‘safety device’, Y=‘fatal injuries’

X=‘hepa filtration’, Y=‘contaminants’

X=‘low cloud cover’, Y= ‘measurements’

X=‘gene therapy’, Y=‘blindness’

X=‘cooperation’, Y=‘terrorism’

X=‘safety valve’, Y=‘leakage’

X=‘safe sex’, Y=‘cervical cancer’

X=‘safety belts’, Y=‘fatalities’

X=‘security fencing’, Y=‘intruders’

X=‘soy protein’, Y=‘bone loss’

X=‘MWI’, Y=‘pollution’

X=‘vitamin C’, Y=‘colds’

29

Sample of Extracted Templates for X prevent Y

X reduce Y

X protect against Y

X eliminate Y

X stop Y

X avoid Y

X for prevention of Y

X provide protection against Y

X combat Y

X ward Y

X lower risk of Y

X be barrier against Y

X fight Y

X reduce Y risk

X decrease the risk of Y

relationship between X and Y

X guard against Y

X be cure for Y

X treat Y

X in war on Y

X in the struggle against Y

X a day keeps Y away

X eliminate the possibility of Y

X cut risk Y

X inhibit Y

30

Experiment and Evaluation

• 48 randomly chosen input verbs
• 1392 templates extracted; human judgments

Encouraging results:
– Average yield per verb: 29 correct templates
– Average precision per verb: 45.3%

• Future work: precision, estimating probabilities

31

Acquiring Lexical Entailment Relations

• COLING-04, ACL-05: Lexical entailment via distributional similarity
– Individual features characterize semantic properties
– Obtain characteristic features via bootstrapping
– Test characteristic feature inclusion (vs. overlap) – see the sketch below

• COLING-ACL-06: Integrate pattern-based extraction
– “NP such as NP1, NP2, …”
– Complementary information to distributional evidence
– Integration using ML with minimal supervision (10 words)
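A minimal sketch of the inclusion test mentioned above: under the distributional-inclusion idea, a word entails a semantically broader word if (almost) all of its characteristic features also occur with the broader word. The feature sets and the 0.9 threshold are toy assumptions:

```python
def entails_distributional(feats_narrow: set, feats_broad: set,
                           threshold: float = 0.9) -> bool:
    """Inclusion test: the narrower term's characteristic features should
    (almost) all appear among the broader term's features."""
    if not feats_narrow:
        return False
    included = len(feats_narrow & feats_broad) / len(feats_narrow)
    return included >= threshold

airline = {"operate flights", "carry passengers", "merge with", "report profits"}
company = {"operate flights", "carry passengers", "merge with", "report profits",
           "hire staff", "issue shares"}
print(entails_distributional(airline, company))  # True: 'airline' entails 'company'
print(entails_distributional(company, airline))  # False: not the other way round
```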

32

Acquisition Example

• Does not overlap with traditional ontological relations

• Top-ranked entailments for “company”:
firm, bank, group, subsidiary, unit, business, supplier, carrier, agency, airline, division, giant, entity, financial institution, manufacturer, corporation, commercial bank, joint venture, maker, producer, factory …

33

Initial Probabilistic Lexical Co-occurrence Models

• Alignment-based (RTE-1 & ACL-05 Workshop)
– The probability that a term in h is entailed by a particular term in t

• Bayesian classification (AAAI-05)
– The probability that a term in h is entailed by (fits in) the entire text of t
– An unsupervised text categorization setting – each term is a category

• Demonstrate directions for probabilistic modeling and unsupervised estimation

34

Manual Syntactic Transformations – Example: ‘X prevent Y’

• Sunscreen, which prevents moles and sunburns, …

[Diagram: the dependency parse of the sentence – relative clause (“which prevents”) and conjunction (“moles and sunburns”) – is transformed to match the canonical template X (subj) prevent (obj) Y]

35

Syntactic Variability Phenomena

Template: X activate Y

Phenomenon | Example
Passive form | Y is activated by X
Apposition | X activates its companion, Y
Conjunction | X activates Z and Y
Set | X activates two proteins: Y and Z
Relative clause | X, which activates Y
Coordination | X binds and activates Y
Transparent head | X activates a fragment of Y
Co-reference | X is a kinase, though it activates Y

36

Takeout

• Promising potential for creating huge entailment knowledge bases
– Mostly by unsupervised approaches
– Manually encoded
– Derived from lexical resources

• Potential for uniform representations, such as entailment rules, for different types of semantic and world knowledge

37

Inference

• Goal: infer hypothesis from text
– Match and apply available entailment knowledge
– Heuristically bridge inference gaps

• Our approach: mapping language constructs
– Vs. semantic interpretation
– Lexical-syntactic structures as meaning representation
• Amenable to unsupervised learning
– Entailment rule transformations over syntactic trees

38

Application: Unsupervised Relation Extraction (EACL 2006)

39

Relation Extraction

• Subfield of Information Extraction
• Identify different ways of expressing a target relation
– Examples: Management Succession, Birth-Death, Mergers and Acquisitions, Protein Interaction

• Traditionally performed in a supervised manner
– Requires dozens to hundreds of examples per relation
– Examples should cover broad semantic variability
• Costly – feasible???

• Little work on unsupervised approaches

40

Our Goals

• An entailment approach for relation extraction
• An unsupervised relation extraction system
• An evaluation framework for entailment rule acquisition and matching

41

Proposed Approach

Input template (X prevent Y)
→ Entailment rule acquisition (TEASE)
→ Templates: X prevention for Y, X treat Y, X reduce Y
→ Syntactic matcher (using transformation rules)
→ Relation instances: <sunscreen, sunburns>
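A schematic sketch of this pipeline, with the TEASE output replaced by a fixed template list and the syntactic matcher approximated by surface-template matching; both are simplifying assumptions, since the described system matches dependency parses with transformation rules:

```python
import re

def match(template, sentence):
    """Surface-template matcher; the real system matches dependency parses."""
    arg = r"(\w+(?: \w+)*)"
    pattern = template.replace("X", arg).replace("Y", arg)
    m = re.fullmatch(pattern, sentence)
    return (m.group(1), m.group(2)) if m else None

def extract_instances(input_template, learned_templates, corpus):
    """Match the input template and all entailing templates against the corpus."""
    instances = set()
    for template in [input_template] + learned_templates:
        for sentence in corpus:
            pair = match(template, sentence)
            if pair:
                instances.add(pair)
    return instances

learned = ["X treat Y", "X reduce Y"]             # stand-in for TEASE output
corpus = ["sunscreen prevent sunburns", "aspirin reduce inflammation"]
print(extract_instances("X prevent Y", learned, corpus))
# {('sunscreen', 'sunburns'), ('aspirin', 'inflammation')}
```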

42

Dataset

• Bunescu 2005
• Recognizing interactions between annotated protein pairs
– 200 Medline abstracts
– Gold-standard dataset of protein pairs

• Input template: X interact with Y

43

Manual Analysis – Results

• 93% of interacting protein pairs can be identified with lexical-syntactic templates

Frequency of syntactic phenomena:
Phenomenon | % | Phenomenon | %
transparent head | 34 | relative clause | 8
apposition | 24 | co-reference | 7
conjunction | 24 | coordination | 7
set | 13 | passive form | 2

Number of templates vs. recall (within the 93%):
R (%) | # templates
10 | 2
20 | 4
30 | 6
40 | 11
50 | 21
60 | 39
70 | 73
80 | 107
90 | 141
100 | 175

44

TEASE Output for X interact with Y

A sample of correct templates learned:

X bind to Y
X binding to Y
X activate Y
X Y interaction
X stimulate Y
X attach to Y
X couple to Y
X interaction with Y
interaction between X and Y
X trap Y
X become trapped in Y
X recruit Y
X Y complex
X associate with Y
X recognize Y
X be linked to Y
X block Y
X target Y

45

TEASE algorithm – Potential Recall on Training Set

• Iterative – taking the top 5 ranked templates as input
• Morph – recognizing morphological derivations (cf. semantic role labeling vs. matching)

Experiment | Recall
input | 39%
input + iterative | 49%
input + iterative + morph | 63%

46

Results for Full System

Experiment | Recall | Precision | F1
input | 18% | 62% | 0.28
input + iterative | 29% | 42% | 0.34

Error sources:
• Dependency parser and syntactic matching errors
• No morphological derivation recognition
• TEASE limited precision (incorrect templates)

47

Vs. Supervised Approaches
• 180 training abstracts

48-51

[Slides 48-51: figures – graphical content not preserved in this transcript]

Textual Entailment as a Framework for Investigating Semantics

52

Classical Approach = Interpretation

[Diagram: Language (by nature), with its variability, is mapped onto a stipulated meaning representation (by scholar)]

Logical forms, word senses, semantic roles, named entity types, … – scattered tasks

Is this a feasible/suitable framework for applied semantics?

53

Textual Entailment = Text Mapping

[Diagram: Language (by nature), with its variability, is mapped directly onto assumed meaning (by humans)]

54

General Case – Inference

[Diagram: interpretation maps language to a meaning representation; textual entailment maps texts to texts; inference operates over both]

Entailment mapping is the actual applied goal – but also a touchstone for understanding!
Interpretation becomes a possible means

55

Some perspectives

• Issues with the interpretation approach:
– Hard to agree on a representation language
– Costly to annotate semantic representations for training

• Textual entailment refers to texts
– Texts are theory neutral
– Amenable to unsupervised learning
– “Proof is in the pudding” test

56

Opens up a framework for investigating semantic issues

• Classical problems can be cast (linguistics)
– All boys are nice ⇒ All tall boys are nice

But also…

• A new slant on old problems
• Exposing many new ones

57

Making sense of (implicit) senses

• What is the RIGHT set of senses?
– Any concrete set is problematic/subjective
– … but WSD forces you to choose one

• A lexical entailment perspective:
– Instead of identifying an explicitly stipulated sense of a word occurrence …
– identify whether a word occurrence (i.e. its implicit sense) entails another word occurrence, in context
– ACL-2006

58

Lexical Matching for Applications

• Sense equivalence:
Q: announcement of new models of chairs
T1: IKEA announced a new comfort chair
T2: MIT announced a new CS chair position

• Sense entailment in substitution:
Q: announcement of new models of furniture
T1: IKEA announced a new comfort chair
T2: MIT announced a new CS chair position

59

Synonym Substitution

Source = record; Target = disc

positive: This is anyway a stunning disc, thanks to the playing of the Moscow Virtuosi with Spivakov.
negative: He said computer networks would not be affected and copies of information should be made on floppy discs.
negative: Before the dead soldier was placed in the ditch his personal possessions were removed, leaving one disc on the body for identification purposes.

60

Investigated Methods

• Matching: indirect vs. direct
• Learning: supervised vs. unsupervised
• Task: classification vs. ranking

61

Unsupervised Direct: kNN-ranking

• Test example score: average cosine similarity of the target example with its k most similar instances of the source word

• Rationale:
– positive examples of the target will be similar to some source occurrence (of the corresponding sense)
– negative examples won’t be similar to the source

• Rank test examples by score
– A classification slant on language modeling

62

Results (for synonyms): Ranking

kNN improves precision by 8-18% at recall levels up to 25%

63

Other Projected and New Problems

• Named entity classification – by any textual type
– Which pickup trucks are produced by Mitsubishi? ⇒ Magnum pickup truck
• Lexical semantic relationships (e.g. WordNet)
– Which relations contribute to entailment inference? How?
• Semantic role mapping (vs. labeling)
• Recognize transparent heads
• Topical entailment – entailing textually defined topics
• …

64

Textual Entailment as Goal

• The essence of our proposal:
– Formulate various semantic problems as entailment tasks
– Base applied inference on entailment “engines” and KBs

• Interpretations and mapping methods may compete
• Open question: which inference
– can be represented at the language level?
– requires logical or specialized representation and inference? (temporal, spatial, mathematical, …)

65

Meeting the knowledge challenge – by a coordinated effort?

• A vast amount of “entailment rules” needed
• Speculation: is it possible to have a public effort for knowledge acquisition?
– Simple, uniform representations
– Assuming mostly automatic acquisition (millions of rules?)
– Human Genome Project analogy

• Preliminary: RTE-3 Resources Pool at ACLWiki

66

Textual Entailment ≈ Human Reading Comprehension

• From a children’s English learning book (Sela and Greenberg):

Reference Text: “…The Bermuda Triangle lies in the Atlantic Ocean, off the coast of Florida. …”

Hypothesis (True/False?): The Bermuda Triangle is near the United States

67

Optimistic Conclusions: Textual Entailment…

…is a promising framework for applied semantics:
– Defines new semantic problems to work on
– May be modeled probabilistically
– Appealing potential for knowledge acquisition

Thank you!