Post on 19-Dec-2015
transcript
1
Textual Entailment as a Framework
for Applied Semantics
Ido Dagan Bar-Ilan University, Israel
Joint work with: Oren Glickman, Idan Szpektor, Roy Bar Haim, Maayan Geffet, Moshe Koppel, Efrat Marmorshtein (Bar-Ilan University); Shachar Mirkin (Hebrew University, Israel); Hristo Tanev, Bernardo Magnini, Alberto Lavelli, Lorenza Romano (ITC-irst, Italy); Bonaventura Coppola, Milen Kouylekov (University of Trento and ITC-irst, Italy); Danilo Giampiccolo (CELCT, Italy); Dan Roth (UIUC)
2
Applied Semantics for Text Understanding/Reading
• Understanding text meaning refers to the semantic level of language
• An applied computational framework for semantics is needed
• Such a common framework is still missing
3
Desiderata for Modeling Framework
• A framework for a target level of language processing should provide:
  1) A generic module for applications
  2) A unified paradigm for investigating language phenomena
  3) A unified knowledge representation
• Most semantics research is scattered – WSD, NER, SRL, lexical semantic relations… (unlike, e.g., syntax)
• Dominating approach – interpretation
4
Outline
• The textual entailment task – what and why?
• Evaluation – PASCAL RTE Challenges
• Modeling approach:
– Knowledge acquisition
– Inference (briefly)
– Application example
• An alternative framework for investigating semantics
5
Natural Language and Meaning
Meaning
Language
Ambiguity
Variability
6
Variability of Semantic Expression
Model variability as relations between text expressions:
• Equivalence: expr1 ⇔ expr2 (paraphrasing)
• Entailment: expr1 ⇒ expr2 – the general case
– Incorporates inference as well
Dow ends up
Dow climbs 255
The Dow Jones Industrial Average closed up 255
Stock market hits a record high
Dow gains 255 points
7
Typical Application Inference
Question: Who bought Overture? >> Expected answer form: X bought Overture
Text: Overture’s acquisition by Yahoo
Hypothesized answer: Yahoo bought Overture
(The text entails the hypothesized answer.)
• Similar for IE: X buy Y
• Similar for “semantic” IR – t: Overture was bought …
• Summarization (multi-document) – identify redundant info
• MT evaluation (and recent ideas for MT)
• Educational applications
8
KRAQ'05 Workshop - KNOWLEDGE and REASONING for ANSWERING QUESTIONS
(IJCAI-05)
CFP:
– Reasoning aspects:
  * information fusion
  * search criteria expansion models
  * summarization and intensional answers
  * reasoning under uncertainty or with incomplete knowledge
– Knowledge representation and integration:
  * levels of knowledge involved (e.g. ontologies, domain knowledge)
  * knowledge extraction models and techniques to optimize response accuracy
… but similar needs for other applications – can entailment provide a common empirical task?
9
Classical Entailment Definition
• Chierchia & McConnell-Ginet (2001): A text t entails a hypothesis h if h is true in every circumstance (possible world) in which t is true
• Strict entailment - doesn't account for some uncertainty allowed in applications
10
“Almost certain” Entailments
t: The technological triumph known as GPS … was incubated in the mind of Ivan Getting.
h: Ivan Getting invented the GPS.
11
Applied Textual Entailment
• Directional relation between two text fragments: Text (t) and Hypothesis (h):
  t entails h (t ⇒ h) if, typically, a human reading t would infer that h is most likely true
• Operational (applied) definition:
  – Human gold standard – as in NLP applications
  – Assuming common background knowledge – which is indeed expected from applications!
12
Probabilistic Interpretation
Definition:
• t probabilistically entails h if:
  – P(h is true | t) > P(h is true)
• t increases the likelihood of h being true
• ≡ Positive PMI – t provides information on h’s truth
• P(h is true | t): entailment confidence
  – The relevant entailment score for applications
  – In practice: “most likely” entailment expected
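The criterion P(h is true | t) > P(h is true) can be sketched in a few lines of Python, with probabilities estimated from corpus counts. The counts and the counting scheme below are purely illustrative, not part of any RTE system.

```python
# Sketch of the probabilistic entailment criterion P(h true | t) > P(h true),
# with probabilities estimated from hypothetical corpus counts.

def probabilistically_entails(n_h, n_t_and_h, n_t, n_total):
    """Return (entails, confidence), where entails means P(h|t) > P(h)."""
    p_h = n_h / n_total              # prior probability that h is true
    p_h_given_t = n_t_and_h / n_t    # probability of h among texts where t holds
    return p_h_given_t > p_h, p_h_given_t

# Toy counts: h is true in 100 of 1000 documents overall,
# but in 45 of the 50 documents in which t also holds.
entails, confidence = probabilistically_entails(
    n_h=100, n_t_and_h=45, n_t=50, n_total=1000)
print(entails, confidence)  # True 0.9
```

The second return value is exactly the "entailment confidence" P(h is true | t) that applications would use as a score.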
13
The Role of Knowledge
• For textual entailment to hold we require:
  – text AND knowledge ⇒ h
  – but knowledge alone should not entail h
• Systems are not supposed to validate h’s truth without utilizing t
14
PASCAL Recognizing Textual Entailment (RTE) Challenges
EU FP-6 Funded PASCAL NOE 2004-7
Bar-Ilan University; ITC-irst and CELCT, Trento; MITRE; Microsoft Research
15
Generic Dataset by Application Use
• 7 application settings in RTE-1, 4 in RTE-2/3:
  – QA
  – IE
  – “Semantic” IR
  – Comparable documents / multi-doc summarization
  – MT evaluation
  – Reading comprehension
  – Paraphrase acquisition
• Most data created from actual applications’ output
• RTE-2: 800 examples in development and test sets
• 50–50% YES/NO split
16
Some Examples
1. Text: Reagan attended a ceremony in Washington to commemorate the landings in Normandy.
   Hypothesis: Washington is located in Normandy.
   Task: IE; Entailment: False

2. Text: Google files for its long awaited IPO.
   Hypothesis: Google goes public.
   Task: IR; Entailment: True

3. Text: …a shootout at the Guadalajara airport in May, 1993, that killed Cardinal Juan Jesus Posadas Ocampo and six others.
   Hypothesis: Cardinal Juan Jesus Posadas Ocampo died in 1993.
   Task: QA; Entailment: True

4. Text: The SPD got just 21.5% of the vote in the European Parliament elections, while the conservative opposition parties polled 44.5%.
   Hypothesis: The SPD is defeated by the opposition parties.
   Task: IE; Entailment: True
17
Participation and Impact
• Very successful challenges, worldwide:
  – RTE-1 – 17 groups
  – RTE-2 – 23 groups
    • 30 groups in total
    • ~150 downloads!
  – RTE-3 underway – 25 groups
    • Joint workshop at ACL-07
• High interest in the research community
  – Papers, conference sessions and areas, PhDs, influence on funded projects
  – Textual Entailment special issue at JNLE
  – ACL-07 tutorial
18
Methods and Approaches (RTE-2)
• Measure similarity match between t and h (coverage of h by t):
  – Lexical overlap (unigram, N-gram, subsequence)
  – Lexical substitution (WordNet, statistical)
  – Syntactic matching/transformations
  – Lexical-syntactic variations (“paraphrases”)
  – Semantic role labeling and matching
  – Global similarity parameters (e.g. negation, modality)
• Cross-pair similarity
• Detect mismatch (for non-entailment)
• Logical interpretation and inference (vs. matching)
19
Dominant approach: Supervised Learning
• Features model similarity and mismatch
• Classifier determines relative weights of information sources
• Train on development set and auxiliary t–h corpora

Pipeline: (t, h) → similarity features (lexical, n-gram, syntactic, semantic, global) → feature vector → classifier → YES/NO
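A minimal sketch of this feature-based architecture, assuming just two toy features (lexical coverage of h by t, and a negation-mismatch flag) and hand-set weights in place of a trained classifier; real systems learn the weights from the development set.

```python
# Minimal sketch of the feature-based approach: represent a (t, h) pair by
# similarity features and apply a linear classifier. Feature choice and
# weights are illustrative, not taken from any actual RTE system.

def features(t, h):
    t_words, h_words = set(t.lower().split()), set(h.lower().split())
    overlap = len(t_words & h_words) / len(h_words)  # coverage of h by t
    negation_mismatch = ('not' in t_words) != ('not' in h_words)
    return [overlap, 1.0 if negation_mismatch else 0.0]

def classify(t, h, weights=(2.0, -1.5), bias=-1.0):
    score = sum(w * f for w, f in zip(weights, features(t, h))) + bias
    return 'YES' if score > 0 else 'NO'

print(classify("Dow gains 255 points", "Dow gains points"))  # YES
print(classify("Dow gains 255 points", "Dow does not gain"))  # NO
```

In a real system the weight vector would come from training a classifier (SVM, decision tree, etc.) on the labeled development pairs.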
20
Results
First Author (Group): Accuracy / Average Precision
– Hickl (LCC): 75.4% / 80.8%
– Tatu (LCC): 73.8% / 71.3%
– Zanzotto (Milan & Rome): 63.9% / 64.4%
– Adams (Dallas): 62.6% / 62.8%
– Bos (Rome & Leeds): 61.6% / 66.9%
– 11 groups: 58.1%–60.5%
– 7 groups: 52.9%–55.6%
Average: 60%; Median: 59%
21
Analysis
• For the first time, deeper methods (semantic/syntactic/logical) clearly outperform shallow methods (lexical/n-gram)
  Cf. Kevin Knight’s invited talk at EACL-06, titled: “Isn’t Linguistic Structure Important, Asked the Engineer”
• Still, most systems based on deep analysis did not score significantly better than the lexical baseline
22
Why?
• System reports point at:
  – Lack of knowledge (syntactic transformation rules, paraphrases, lexical relations, etc.)
  – Lack of training data
• It seems that the systems that coped better with these issues performed best:
  – Hickl et al. – acquisition of large entailment corpora for training
  – Tatu et al. – large knowledge bases (linguistic and world knowledge)
23
Some suggested research directions
• Knowledge acquisition
  – Unsupervised acquisition of linguistic and world knowledge from general corpora and the web
  – Acquiring larger entailment corpora
  – Manual resources and knowledge engineering
• Inference
  – A principled framework for inference and for fusing information levels
  – Are we happy with bags of features?
24
Complementary Evaluation Modes
• Entailment subtask evaluations
  – Lexical, lexical-syntactic, logical, alignment…
• “Seek” mode:
  – Input: h and a corpus
  – Output: all entailing t’s in the corpus
  – Captures information-seeking needs, but requires post-run annotation (TREC style)
• Contribution to specific applications!
  – QA – Harabagiu & Hickl, ACL-06
  – RE – Romano et al., EACL-06
25
Our Own Research Directions
Acquisition – Inference – Applications
26
Learning Entailment Rules
Text: Aspirin prevents Heart Attacks
Q: What reduces the risk of Heart Attacks?
Entailment Rule: X prevent Y ⇨ X reduce risk of Y (template ⇨ template)
Hypothesis: Aspirin reduces the risk of Heart Attacks
We need a large knowledge base of entailment rules
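As an illustration, a single entailment rule can be applied as a surface-pattern rewrite. The regex matching below is a simplified stand-in for the template matching over syntactic structures that the approach actually assumes.

```python
import re

# Sketch of applying the entailment rule "X prevent Y => X reduce risk of Y"
# as a surface rewrite. A real system matches syntactic templates, not strings.

RULE = (re.compile(r'(?P<x>.+?) prevents? (?P<y>.+)'),
        '{x} reduces the risk of {y}')

def apply_rule(text):
    pattern, output = RULE
    m = pattern.fullmatch(text)
    return output.format(x=m.group('x'), y=m.group('y')) if m else None

print(apply_rule("Aspirin prevents Heart Attacks"))
# Aspirin reduces the risk of Heart Attacks
```

Matching the rewritten text against the expected answer form ("X reduces the risk of Y") then yields the answer to the question.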
27
TEASE – Algorithm Flow
Input template: X subj-accuse-obj Y
Sample corpus for input template (retrieved from the Web):
  Paula Jones accused Clinton…
  Sanhedrin accused St.Paul…
Anchor sets:
  {Paula Jones (subj); Clinton (obj)}
  {Sanhedrin (subj); St.Paul (obj)}
Sample corpus for anchor sets:
  Paula Jones called Clinton indictable…
  St.Paul defended before the Sanhedrin…
Extracted templates:
  X call Y indictable
  Y defend before X
TEASE iterates between Anchor Set Extraction (ASE) and Template Extraction (TE).
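A toy sketch of one ASE/TE round, with a three-sentence stand-in corpus and naive string matching in place of TEASE's parsing and statistical filtering:

```python
import re

# Toy sketch of one TEASE-style iteration: (1) ASE finds anchor sets by
# matching the input template against a corpus; (2) TE finds other sentences
# containing those anchors and extracts the connecting text as new templates.
# The "corpus" and the extraction logic are drastically simplified.

corpus = [
    "Paula Jones accused Clinton",
    "Paula Jones called Clinton indictable",
    "The Sanhedrin accused St.Paul",
]

def anchor_sets(template_verb):
    """Step 1 (ASE): anchors are subject/object pairs around the input verb."""
    pat = re.compile(r'(?:The )?(.+) %s (.+)' % template_verb)
    return [m.groups() for s in corpus if (m := pat.fullmatch(s))]

def candidate_templates(anchors):
    """Step 2 (TE): text connecting anchor occurrences becomes a template."""
    templates = set()
    for x, y in anchors:
        for s in corpus:
            if x in s and y in s and 'accused' not in s:
                middle = s.split(x, 1)[1].split(y, 1)[0].strip()
                templates.add(('X %s Y' % middle) + s.split(y, 1)[1])
    return templates

anchors = anchor_sets('accused')
print(anchors)
print(candidate_templates(anchors))  # {'X called Y indictable'}
```

The real algorithm then iterates: newly extracted templates are fed back as input to gather further anchor sets.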
28
Sample of Extracted Anchor-Sets for X prevent Y
X=‘sunscreens’, Y=‘sunburn’
X=‘sunscreens’, Y=‘skin cancer’
X=‘vitamin e’, Y=‘heart disease’
X=‘aspirin’, Y=‘heart attack’
X=‘vaccine candidate’, Y=‘infection’
X=‘universal precautions’, Y=‘HIV’
X=‘safety device’, Y=‘fatal injuries’
X=‘hepa filtration’, Y=‘contaminants’
X=‘low cloud cover’, Y= ‘measurements’
X=‘gene therapy’, Y=‘blindness’
X=‘cooperation’, Y=‘terrorism’
X=‘safety valve’, Y=‘leakage’
X=‘safe sex’, Y=‘cervical cancer’
X=‘safety belts’, Y=‘fatalities’
X=‘security fencing’, Y=‘intruders’
X=‘soy protein’, Y=‘bone loss’
X=‘MWI’, Y=‘pollution’
X=‘vitamin C’, Y=‘colds’
29
Sample of Extracted Templates for X prevent Y
X reduce Y
X protect against Y
X eliminate Y
X stop Y
X avoid Y
X for prevention of Y
X provide protection against Y
X combat Y
X ward Y
X lower risk of Y
X be barrier against Y
X fight Y
X reduce Y risk
X decrease the risk of Y
relationship between X and Y
X guard against Y
X be cure for Y
X treat Y
X in war on Y
X in the struggle against Y
X a day keeps Y away
X eliminate the possibility of Y
X cut risk Y
X inhibit Y
30
Experiment and Evaluation
• 48 randomly chosen input verbs
• 1392 templates extracted; human judgments
Encouraging results:
  – Average yield per verb: 29 correct templates
  – Average precision per verb: 45.3%
• Future work: precision, estimate probabilities
31
Acquiring Lexical Entailment Relations
• COLING-04, ACL-05: Lexical entailment via distributional similarity
  – Individual features characterize semantic properties
  – Obtain characteristic features via bootstrapping
  – Test characteristic feature inclusion (vs. overlap)
• COLING-ACL-06: Integrate pattern-based extraction
  – NP such as NP1, NP2, …
  – Complementary information to distributional evidence
  – Integration using ML with minimal supervision (10 words)
32
Acquisition Example
• Does not overlap traditional ontological relations
• Top-ranked entailments for “company”:
  firm, bank, group, subsidiary, unit, business, supplier, carrier, agency, airline, division, giant, entity, financial institution, manufacturer, corporation, commercial bank, joint venture, maker, producer, factory …
33
Initial Probabilistic Lexical Co-occurrence Models
• Alignment-based (RTE-1 & ACL-05 Workshop)
  – The probability that a term in h is entailed by a particular term in t
• Bayesian classification (AAAI-05)
  – The probability that a term in h is entailed by (fits in) the entire text of t
  – An unsupervised text categorization setting – each term is a category
• Demonstrate directions for probabilistic modeling and unsupervised estimation
34
Manual Syntactic Transformations – Example: ‘X prevent Y’
• Sentence: “Sunscreen, which prevents moles and sunburns, …”
• Relative-clause and conjunction transformations map the parsed sentence onto the canonical structure X –subj→ prevent ←obj– Y, yielding the instances (sunscreen, moles) and (sunscreen, sunburns).
(Dependency-tree diagram not reproduced in the transcript.)
35
Syntactic Variability Phenomena
Template: X activate Y
  – Passive form: Y is activated by X
  – Apposition: X activates its companion, Y
  – Conjunction: X activates Z and Y
  – Set: X activates two proteins: Y and Z
  – Relative clause: X, which activates Y
  – Coordination: X binds and activates Y
  – Transparent head: X activates a fragment of Y
  – Co-reference: X is a kinase, though it activates Y
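For illustration, a few of these variants can be recognized with simple patterns. Real systems apply transformations over dependency parses rather than regexes, and the protein names below are arbitrary placeholders.

```python
import re

# Illustrative sketch: recognize some of the variability phenomena above
# (relative clause, passive form, base form) as instances of "X activate Y".
# Regexes stand in for the tree transformations a real system would use.

VARIANTS = [
    re.compile(r'(?P<x>\w+), which activates (?P<y>\w+)'),  # relative clause
    re.compile(r'(?P<y>\w+) is activated by (?P<x>\w+)'),   # passive form
    re.compile(r'(?P<x>\w+) activates (?P<y>\w+)'),         # base form
]

def match_activate(sentence):
    """Return the (X, Y) instance of 'X activate Y', or None."""
    for pat in VARIANTS:
        if m := pat.search(sentence):
            return m.group('x'), m.group('y')
    return None

print(match_activate("Raf is activated by Ras"))   # ('Ras', 'Raf')
print(match_activate("Ras, which activates Raf"))  # ('Ras', 'Raf')
```

Note that the more specific patterns must be tried first, since the base-form pattern would otherwise misread the relative-clause variant.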
36
Takeaway
• Promising potential for creating huge entailment knowledge bases
  – Mostly by unsupervised approaches
  – Manually encoded
  – Derived from lexical resources
• Potential for uniform representations, such as entailment rules, for different types of semantic and world knowledge
37
Inference
• Goal: infer hypothesis from text
  – Match and apply available entailment knowledge
  – Heuristically bridge inference gaps
• Our approach: mapping language constructs
  – Vs. semantic interpretation
  – Lexical-syntactic structures as meaning representation
    • Amenable to unsupervised learning
  – Entailment-rule transformations over syntactic trees
38
Application: Unsupervised Relation Extraction
EACL 2006
39
Relation Extraction
• Subfield of Information Extraction
• Identify different ways of expressing a target relation
  – Examples: Management Succession, Birth–Death, Mergers and Acquisitions, Protein Interaction
• Traditionally performed in a supervised manner
  – Requires dozens to hundreds of examples per relation
  – Examples should cover broad semantic variability
• Costly – is it feasible?
• Little work on unsupervised approaches
40
Our Goals
• An entailment approach for Relation Extraction
• An unsupervised Relation Extraction system
• An evaluation framework for entailment rule acquisition and matching
41
Proposed Approach
Input template: X prevent Y
  ↓ Entailment Rule Acquisition (TEASE)
Templates: X prevention for Y, X treat Y, X reduce Y
  ↓ Syntactic Matcher (with transformation rules)
Relation instances: <sunscreen, sunburns>
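The matcher stage of this pipeline can be sketched as follows, with a hand-written template list standing in for TEASE output and regex matching standing in for the syntactic matcher:

```python
import re

# Sketch of the proposed pipeline: acquired entailment templates for
# "X prevent Y" are matched against sentences to extract relation instances.
# Templates and sentences are illustrative stand-ins for real TEASE output.

templates = ['X prevents Y', 'X reduces Y', 'X protects against Y']

def template_to_regex(template):
    """Turn a slot template into a regex with named X/Y capture groups."""
    pattern = re.escape(template).replace('X', r'(?P<x>[\w ]+)') \
                                 .replace('Y', r'(?P<y>[\w ]+)')
    return re.compile(pattern)

def extract_instances(sentences):
    instances = set()
    for s in sentences:
        for t in templates:
            if m := template_to_regex(t).fullmatch(s):
                instances.add((m.group('x'), m.group('y')))
    return instances

print(extract_instances([
    "sunscreen prevents sunburn",
    "aspirin reduces heart attacks",
]))
```

Each matched (x, y) pair is an extracted instance of the target relation, without any relation-specific supervised training.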
42
Dataset
• Bunescu 2005
• Recognizing interactions between annotated protein pairs
  – 200 Medline abstracts
  – Gold-standard dataset of protein pairs
• Input template: X interact with Y
43
Manual Analysis – Results
• 93% of interacting protein pairs can be identified with lexical-syntactic templates

Frequency of syntactic phenomena (%):
  transparent head: 34; apposition: 24; conjunction: 24; set: 13; relative clause: 8; co-reference: 7; coordination: 7; passive form: 2

Number of templates vs. recall (within the 93%):
  R = 10%: 2 templates; 20%: 4; 30%: 6; 40%: 11; 50%: 21; 60%: 39; 70%: 73; 80%: 107; 90%: 141; 100%: 175
44
TEASE Output for X interact with Y
A sample of correct templates learned:
  X bind to Y; X binding to Y
  X activate Y; X Y interaction
  X stimulate Y; X attach to Y
  X couple to Y; X interaction with Y
  interaction between X and Y; X trap Y
  X become trapped in Y; X recruit Y
  X Y complex; X associate with Y
  X recognize Y; X be linked to Y
  X block Y; X target Y
45
TEASE Algorithm – Potential Recall on Training Set
• Iterative – taking the top 5 ranked templates as input
• Morph – recognizing morphological derivations (cf. semantic role labeling vs. matching)

Experiment / Recall:
  – input: 39%
  – input + iterative: 49%
  – input + iterative + morph: 63%
46
Results for Full System

Recall / Precision / F1:
  – input: 18% / 62% / 0.28
  – input + iterative: 29% / 42% / 0.34

Error sources:
• Dependency parser and syntactic matching errors
• No morphological derivation recognition
• TEASE’s limited precision (incorrect templates)
47
Vs. Supervised Approaches
• 180 training abstracts
51
Textual Entailment as a Framework for
Investigating Semantics
52
Classical Approach = Interpretation
• Map language (given by nature) onto a meaning representation stipulated by the scholar, coping with variability
• Logical forms, word senses, semantic roles, named entity types, … – scattered tasks
• Is this a feasible/suitable framework for applied semantics?
53
Textual Entailment = Text Mapping
• Map between language expressions (given by nature) under a meaning assumed by humans, coping with variability
54
General Case – Inference
• Interpretation maps language to a meaning representation; inference then operates over that representation. Textual entailment maps directly between texts.
• Entailment mapping is the actual applied goal – but also a touchstone for understanding!
• Interpretation becomes a possible means
55
Some perspectives
• Issues with the interpretation approach:
  – Hard to agree on a representation language
  – Costly to annotate semantic representations for training
• Textual entailment refers to texts
  – Texts are theory-neutral
  – Amenable to unsupervised learning
  – A “proof is in the pudding” test
56
Opens up a framework for investigating semantic issues
• Classical problems can be cast (linguistics)
  – All boys are nice ⇒ All tall boys are nice
But also…
• A new slant at old problems
• Exposing many new ones
57
Making sense of (implicit) senses
• What is the RIGHT set of senses?
  – Any concrete set is problematic/subjective
  – … but WSD forces you to choose one
• A lexical entailment perspective:
  – Instead of identifying an explicitly stipulated sense of a word occurrence …
  – … identify whether a word occurrence (i.e. its implicit sense) entails another word occurrence, in context
  – ACL-2006
58
Lexical Matching for Applications
• Sense equivalence:
  T1: IKEA announced a new comfort chair
  Q: announcement of new models of chairs
  T2: MIT announced a new CS chair position
• Sense entailment in substitution:
  T1: IKEA announced a new comfort chair
  Q: announcement of new models of furniture
  T2: MIT announced a new CS chair position
59
Synonym Substitution
Source = record; Target = disc
• positive: This is anyway a stunning disc, thanks to the playing of the Moscow Virtuosi with Spivakov.
• negative: He said computer networks would not be affected and copies of information should be made on floppy discs.
• negative: Before the dead soldier was placed in the ditch his personal possessions were removed, leaving one disc on the body for identification purposes.
60
Investigated Methods
• Matching: indirect vs. direct
• Learning: supervised vs. unsupervised
• Task: classification vs. ranking
61
Unsupervised Direct: kNN-ranking
• Test example score: average cosine similarity of the target example with the k most similar instances of the source word
• Rationale:
  – Positive examples of the target will be similar to some source occurrences (of the corresponding sense)
  – Negative examples won’t be similar to source occurrences
• Rank test examples by score
  – A classification slant on language modeling
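A compact sketch of kNN-ranking over bag-of-words context vectors; the example contexts echo the record/disc examples above but are otherwise made up.

```python
import math
from collections import Counter

# Sketch of kNN-ranking: score a target-word example by its average cosine
# similarity to its k most similar source-word examples. Contexts are
# bag-of-words count vectors; the data below is purely illustrative.

def cosine(a, b):
    num = sum(a[w] * b[w] for w in a)
    den = (math.sqrt(sum(v * v for v in a.values()))
           * math.sqrt(sum(v * v for v in b.values())))
    return num / den if den else 0.0

def knn_score(target_ctx, source_ctxs, k=2):
    sims = sorted((cosine(target_ctx, s) for s in source_ctxs), reverse=True)
    return sum(sims[:k]) / k

# Source-word ("record") contexts vs. target-word ("disc") test examples.
source = [Counter("play music stunning".split()),
          Counter("album music release".split())]
positive = Counter("stunning disc music playing".split())   # music sense
negative = Counter("floppy disc computer copies".split())   # computer sense

print(knn_score(positive, source) > knn_score(negative, source))  # True
```

Ranking test examples by this score orders the target occurrences that share the source word's sense above those that do not.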
62
Results (for synonyms): Ranking
kNN improves precision by 8–18% at recall levels up to 25%
63
Other Projected and New Problems
• Named entity classification – by any textual type
  – Which pickup trucks are produced by Mitsubishi? ⇒ Magnum pickup truck
• Lexical semantic relationships (e.g. WordNet)
  – Which relations contribute to entailment inference? How?
• Semantic role mapping (vs. labeling)
• Recognizing transparent heads
• Topical entailment – entailing textually defined topics
• …
64
Textual Entailment as Goal
• The essence of our proposal:
  – Formulate various semantic problems as entailment tasks
  – Base applied inference on entailment “engines” and knowledge bases
• Interpretation and mapping methods may compete
• Open question: which inference
  – can be represented at the language level?
  – requires logical or specialized representation and inference? (temporal, spatial, mathematical, …)
65
Meeting the knowledge challenge – by a coordinated effort?
• A vast amount of “entailment rules” is needed
• Speculation: is it possible to have a public effort for knowledge acquisition?
  – Simple, uniform representations
  – Assuming mostly automatic acquisition (millions of rules?)
  – Human Genome Project analogy
• Preliminary: RTE-3 Resources Pool at ACLWiki
66
Textual Entailment ≈ Human Reading Comprehension
• From a children’s English learning book (Sela and Greenberg):
  Reference text: “…The Bermuda Triangle lies in the Atlantic Ocean, off the coast of Florida. …”
  Hypothesis (True/False?): The Bermuda Triangle is near the United States
67
Optimistic Conclusions: Textual Entailment…
…is a promising framework for applied semantics:
  – Defines new semantic problems to work on
  – May be modeled probabilistically
  – Appealing potential for knowledge acquisition
Thank you!