TextInfer 2011 – Bar Ilan University 1
Towards a Probabilistic Model for Lexical Entailment
Eyal Shnarch, Jacob Goldberger, Ido Dagan
Entailment at the lexical level
Hypothesis: Obama gave a speech last night in the Israeli lobby conference
Text: In his speech at the American Israel Public Affairs Committee yesterday, the president challenged … Barack Obama’s AIPAC address ...
Lexical matches (text term → hypothesis term):
• AIPAC, American Israel Public Affairs Committee → Israeli lobby
• address → speech
• Barack Obama, the president → Obama
Lexical-level systems are very handy
• Important component within a full inference system
• Provide hard-to-beat baselines
– (Mirkin et al., 2009; Majumdar and Bhattacharyya, 2010)
• Can be used when no deep analysis tools exist for the target language
– e.g., no parser
Modeling entailment at the lexical level
Text: Obama’s Cadillac got stuck in Dublin in a large Irish crowd
Hypothesis: The president’s car got stuck in Ireland, surrounded by many people
[Figure: lexical links from text terms to hypothesis terms, one of them passing through the intermediate term "social group"]
Lexical entailment scores
Mostly heuristic:
• Percentage of covered/uncovered terms (Majumdar and Bhattacharyya, 2010; Clark and Harrison, 2010)
• Similarity estimation (Corley and Mihalcea, 2005; Zanzotto and Moschitti, 2006)
• Vector space (MacKinlay and Baldwin, 2009)
Terminology
[Figure: the Cadillac/Dublin example annotated with the terms below]
• rule: an entailment relation from one term to another
• lexical resource: the resource that provided a rule
• chain: a sequence of rules (rule1, rule2) linking a text term to a hypothesis term
Goal: a probabilistic model addressing:
1. Distinguishing the reliability levels of different resources
2. Considering the length of transitive chains
3. Considering multiple pieces of evidence
Entailment validation process
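[Figure: text terms t1 … ti … tm linked to hypothesis terms h1 … hj … hn by entailment chains, possibly through intermediate terms t′]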
A hypothesis is entailed if all its terms are entailed
A single term is entailed if at least one of its pieces of evidence is a valid entailment chain
A chain is valid if all its rule steps are valid
The validity of a rule depends on the reliability of the resource which provided it
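The four conditions above compose into an AND over hypothesis terms, an OR over each term's chains, and an AND over each chain's rule steps. The following is a minimal Python sketch of that deterministic check under assumed, hypothetical data structures (not taken from the talk): a rule step is a `(lhs, rhs, resource)` triple, each hypothesis term maps to its candidate chains, and `is_valid_rule` is a caller-supplied predicate. `EXAMPLE` is a made-up encoding of the Cadillac/Dublin example with invented resource names; it does not reproduce the slide's actual rules.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical representation: a rule step is (lhs term, rhs term, resource name).
Rule = Tuple[str, str, str]
Chain = List[Rule]
Evidence = Dict[str, List[Chain]]  # hypothesis term -> candidate chains from the text


def chain_is_valid(chain: Chain, is_valid_rule: Callable[[Rule], bool]) -> bool:
    # A chain is valid if all its rule steps are valid
    # (an empty chain, e.g. an exact match, is trivially valid).
    return all(is_valid_rule(rule) for rule in chain)


def term_is_entailed(chains: List[Chain], is_valid_rule: Callable[[Rule], bool]) -> bool:
    # A term is entailed if at least one of its chains is valid.
    return any(chain_is_valid(chain, is_valid_rule) for chain in chains)


def hypothesis_is_entailed(evidence: Evidence, is_valid_rule: Callable[[Rule], bool]) -> bool:
    # A hypothesis is entailed if all its terms are entailed.
    return all(term_is_entailed(chains, is_valid_rule) for chains in evidence.values())


# Toy evidence for some hypothesis terms of the Cadillac/Dublin example (made-up resources).
EXAMPLE: Evidence = {
    "president": [[("Obama", "president", "resource_A")]],
    "car": [[("Cadillac", "car", "resource_A")]],
    "Ireland": [[("Dublin", "Ireland", "resource_B")]],
    "people": [[("crowd", "social group", "resource_A"),
                ("social group", "people", "resource_B")]],
}

if __name__ == "__main__":
    # All rules from both resources valid: every term is entailed.
    print(hypothesis_is_entailed(EXAMPLE, lambda rule: rule[2] in {"resource_A", "resource_B"}))  # True
    # resource_B's rules invalid: "Ireland" and "people" are not entailed.
    print(hypothesis_is_entailed(EXAMPLE, lambda rule: rule[2] == "resource_A"))  # False
```

Note how a single invalid step in the two-step chain is enough to leave "people" uncovered, which in turn blocks the whole hypothesis.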
Probabilistic model for Lexical Entailment
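[Figure: the validation process as a probabilistic network: the rule steps of each chain (possibly through intermediate terms t′) are combined by AND, the chains of each hypothesis term hj by OR, and the terms h1 … hn by a final AND whose output indicates whether entailment holds]
The validity probability of a rule step r is the reliability of the resource R(r) that suggested it
EM is used to estimate the parameter set, i.e. the reliability of each resource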
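The network above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: each rule application is an independent Bernoulli variable whose probability is the reliability θ of its resource (θ is our notation), each application appears in exactly one chain, and only the base model is covered (strict final AND, no extensions). `p_entailed`, `em`, and the `Pair` encoding are hypothetical names. Under these assumptions the E-step posterior of a rule application is θ · P(label | step valid) / P(label), and the M-step sets each resource's reliability to the expected fraction of its applications that are valid.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple

# Hypothetical encoding (not from the talk): for each hypothesis term, a list of
# chains; each chain lists the resource behind each of its rule steps.
Pair = Dict[str, List[List[str]]]
Site = Tuple[str, int, int]  # (term, chain index, step index) of one rule application


def p_entailed(pair: Pair, theta: Dict[str, float], pinned: Optional[Site] = None) -> float:
    """P(T entails H): AND over terms of (OR over chains of (AND over rule steps)).

    A step is valid with probability theta[resource], independently of all other
    steps; `pinned` forces one rule application to be valid (used in the E-step)."""
    p_all_terms = 1.0
    for term, chains in pair.items():
        p_no_valid_chain = 1.0
        for ci, chain in enumerate(chains):
            p_chain = 1.0
            for si, resource in enumerate(chain):
                p_chain *= 1.0 if pinned == (term, ci, si) else theta[resource]
            p_no_valid_chain *= 1.0 - p_chain
        p_all_terms *= 1.0 - p_no_valid_chain
    return p_all_terms


def em(pairs: Sequence[Pair], labels: Sequence[bool], resources: Sequence[str],
       iterations: int = 30, eps: float = 1e-6) -> Tuple[Dict[str, float], List[float]]:
    """Estimate one reliability per resource from gold entailment labels."""
    theta = {r: 0.5 for r in resources}
    log_likelihoods: List[float] = []
    for _ in range(iterations):
        expected_valid = {r: 0.0 for r in resources}
        applications = {r: 0 for r in resources}
        log_likelihood = 0.0
        for pair, label in zip(pairs, labels):
            p_true = p_entailed(pair, theta)
            p_label = p_true if label else 1.0 - p_true
            if p_label <= 0.0:
                continue  # label impossible under the model, e.g. an uncovered term
            log_likelihood += math.log(p_label)
            for term, chains in pair.items():
                for ci, chain in enumerate(chains):
                    for si, resource in enumerate(chain):
                        # E-step: P(step valid | label) = theta * P(label | step valid) / P(label)
                        q_true = p_entailed(pair, theta, pinned=(term, ci, si))
                        q_label = q_true if label else 1.0 - q_true
                        expected_valid[resource] += theta[resource] * q_label / p_label
                        applications[resource] += 1
        log_likelihoods.append(log_likelihood)
        # M-step: expected fraction of valid applications, clamped inside (0, 1).
        theta = {
            r: (min(1.0 - eps, max(eps, expected_valid[r] / applications[r]))
                if applications[r] else theta[r])
            for r in resources
        }
    return theta, log_likelihoods


if __name__ == "__main__":
    # Toy data with made-up resources: "good" mostly appears in entailing pairs.
    toy_pairs = [
        {"a": [["good"]], "b": [["good"]]},
        {"a": [["good"]]},
        {"a": [["bad"]]},
        {"a": [["bad"]], "b": [["good"]]},
        {"a": [["bad"]]},
    ]
    toy_labels = [True, True, False, False, True]
    reliabilities, history = em(toy_pairs, toy_labels, ["good", "bad"])
    print(reliabilities)
```

In this sketch, pairs whose label has zero probability under the strict final AND (for example, a positive pair with an uncovered hypothesis term) are simply skipped. Because a chain's probability is a product of reliabilities, longer chains are automatically less trusted.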
Let’s try a concrete example
[Figure: the Cadillac/Dublin example worked through the model]
* numbers in blue are parameter values found by our model
Results on RTE are nice, but…
Model                  F1 % (RTE 5)   F1 % (RTE 6)
Avg. of all systems    30.5           33.8
Base Prob.             36.2           38.5
Best lexical system    44.4           47.6
Best full system       45.6           48.0
[Bar chart of the F1 values above for RTE 5 and RTE 6]
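For reference, F1 is the harmonic mean of precision and recall. A minimal sketch computing it from counts of true positives, false positives and false negatives; the function name and the counts in the usage below are hypothetical, and the exact evaluation protocol of RTE 5 and RTE 6 is not described in this talk.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall, written as 2*TP / (2*TP + FP + FN).

    Returns 0.0 when there are no true positives, false positives or false negatives."""
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0


if __name__ == "__main__":
    # Hypothetical counts: precision 8/10 and recall 8/10 give F1 = 0.8.
    print(f1_score(tp=8, fp=2, fn=2))  # 0.8
```

Multiplying by 100 gives the percentages used in the tables.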
Extension 1: relaxing with noisy-AND
[Figure: the final AND gate replaced by a noisy-AND gate]
• The final AND gate demands that all hypothesis terms be entailed
• However, sentence-level entailment is possible even if not all terms are entailed
• This strict demand is especially unfair to longer hypotheses
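One common way to define a noisy-AND gate lets each non-entailed input be tolerated with some probability instead of forcing the output to zero. The sketch below assumes a single shared parameter `q` (a hypothetical name; the talk does not give the exact parameterization used) and independent term probabilities p_j, so the output probability is ∏_j (p_j + (1 − p_j)·q). With q = 0 it reduces to the strict AND gate of the base model.

```python
from typing import Sequence


def noisy_and(term_probs: Sequence[float], q: float) -> float:
    """P(output = 1) for a noisy-AND gate over independent binary inputs.

    term_probs: probability that each hypothesis term is entailed
    q: probability that a non-entailed term is tolerated (hypothetical single
       parameter); q = 0 recovers the strict AND gate."""
    result = 1.0
    for p in term_probs:
        # The input passes if it is true, or if it is false but tolerated.
        result *= p + (1.0 - p) * q
    return result


if __name__ == "__main__":
    ten_terms = [0.9] * 10
    print(noisy_and(ten_terms, q=0.0))  # strict AND, about 0.35
    print(noisy_and(ten_terms, q=0.5))  # noisy-AND, about 0.60
```

Because every term contributes a factor below 1, the strict AND shrinks quickly as the hypothesis grows; tolerating missing terms slows that decay.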
Better results with extension 1
Model                     F1 % (RTE 5)   F1 % (RTE 6)
Avg. of all systems       30.5           33.8
Base Prob.                36.2           38.5
Base Prob. + noisy-AND    44.6*          43.1*
Best lexical system       44.4           47.6
Best full system          45.6           48.0
* significant improvement over base prob. according to McNemar’s test with p<0.01
[Bar chart: F1 on RTE 5 and RTE 6 for the average system, Base Prob. + noisy-AND, the best lexical system and the best full system]
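McNemar’s test compares two systems on the same test pairs using only the discordant pairs: b, where only the first system is correct, and c, where only the second is. A minimal sketch of the common chi-square form with continuity correction, (|b − c| − 1)² / (b + c) with one degree of freedom; the talk does not say whether this variant or the exact binomial version was used, and the counts below are hypothetical.

```python
import math
from typing import Tuple


def mcnemar(b: int, c: int) -> Tuple[float, float]:
    """McNemar's test (chi-square form with continuity correction, 1 degree of freedom).

    b: test pairs that only system A classified correctly
    c: test pairs that only system B classified correctly
    Returns (statistic, p-value)."""
    if b + c == 0:
        return 0.0, 1.0  # no discordant pairs: no evidence of a difference
    statistic = max(abs(b - c) - 1, 0) ** 2 / (b + c)
    # Chi-square survival function with 1 degree of freedom: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


if __name__ == "__main__":
    # Hypothetical discordant counts, not taken from the talk.
    stat, p = mcnemar(b=10, c=30)
    print(f"statistic={stat:.3f}, p={p:.4f}")
```

Pairs on which both systems agree (both right or both wrong) do not affect the statistic.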
Extension 2: the term independence assumption
[Figure: covered and uncovered hypothesis terms]
As T covers more terms of H, our belief in each rule application increases
Comparable results with extension 2 (better than extension 1 on RTE 6)
Model                                  F1 % (RTE 5)   F1 % (RTE 6)
Avg. of all systems                    30.5           33.8
Base Prob.                             36.2           38.5
Base Prob. + noisy-AND                 44.6           43.1
Base Prob. + coverage normalization    42.8           44.7
Best lexical system                    44.4           47.6
Best full system                       45.6           48.0
[Bar chart: F1 on RTE 5 and RTE 6 for the average system, Base Prob. + coverage normalization (both bars marked *), the best lexical system and the best full system]
Putting it all together gives our best results
Model                                      F1 % (RTE 5)   F1 % (RTE 6)
Avg. of all systems                        30.5           33.8
Full Prob. (noisy-AND + coverage norm.)    48.3           45.6
Best lexical system                        44.4           47.6
Best full system                           45.6           48.0
[Bar chart of the F1 values above; both Full Prob. bars are marked *]
Negative result: F1 usually decreases when chains are allowed
Summary
A probabilistic model that:
• Learns an individual reliability value for each lexical resource
• Considers multiple pieces of evidence and chain length
• With two extensions (noisy-AND and coverage normalization), performs in line with the best entailment systems
Future work
• Better model for transitivity
• noisy-AND for chains too
• Verify rule application in a specific context
– next talk by Shachar Mirkin
• Test with other application data sets
– passage retrieval for QA
• Integrate into a full entailment system
Thank you!