
On the Proper Treatment of Quantifiers in Probabilistic Logic Semantics

Islam Beltagy and Katrin Erk

The University of Texas at Austin

IWCS 2015

Logic-based Semantics

• First-order logic and theorem proving

• Deep semantic representation:

– Negation, Quantifiers, Conjunction, Disjunction ….


Probabilistic Logic Semantics

• Logic

+

• Reasoning with Uncertainty

– Confidence rating of Word Sense Disambiguation

– Weight of Paraphrase rules

– Distributional similarity values [Beltagy et al., 2013]

• baby → toddler | w1

• eating doll → playing with a toy | w2

– ...


Probabilistic Logic Semantics

• Quantifiers and Negations do not work as expected

• Domain Closure Assumption: finite domain

– Problems with quantifiers

– “Tweety is a bird and it flies” ⇒ “All birds fly”

• Closed-World Assumption: low prior probabilities

– Problems with negations

– “All birds fly” ⇒ “The sky is not blue”


Outline

• Probabilistic Logic Semantics (overview of previous work)

– Markov Logic Networks (MLNs)

– Recognizing Textual Entailment (RTE)

• Domain Closure Assumption

– Definition

– Inference problems with Quantifiers

• Closed-World Assumption

• Evaluation

• Future work and Conclusion



Probabilistic Logic

• Frameworks that combine logical and statistical knowledge [Nilsson, 1986], [Getoor and Taskar, 2007]

• Use weighted first-order logic rules

– Weighted rules are soft rules (compared to hard logical constraints)

• Provide a mechanism for probabilistic inference: P(Q | E, KB)

• Bayesian Logic Programs (BLP) [Kersting & De Raedt, 2001]

• Markov Logic Networks (MLN) [Richardson and Domingos, 2006]

• Probabilistic Soft Logic (PSL) [Kimmig et al., NIPS 2012]

Markov Logic Networks [Richardson and Domingos, 2006]

∀x. smoke(x) → cancer(x) | 1.5
∀x,y. friend(x,y) → (smoke(x) ↔ smoke(y)) | 1.1

• Two constants: Anna (A) and Bob (B)

• P(Cancer(Anna) | Friends(Anna,Bob), Smokes(Bob))

[Figure: the ground Markov network over the atoms Cancer(A), Cancer(B), Smokes(A), Smokes(B), Friends(A,A), Friends(A,B), Friends(B,A), Friends(B,B)]

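As an illustration of how the query above is answered under a finite domain, here is a minimal brute-force sketch: it grounds the two weighted formulas over the constants Anna and Bob and sums world weights. This is for intuition only, not the inference engine used in the paper (real MLN systems do not enumerate all possible worlds).

```python
# Minimal brute-force MLN inference for the friends/smokers example above.
from itertools import product
from math import exp

CONSTANTS = ["A", "B"]  # Anna, Bob (Domain Closure: no other objects exist)

# All ground atoms under the finite domain.
ATOMS = ([f"smokes({x})" for x in CONSTANTS] +
         [f"cancer({x})" for x in CONSTANTS] +
         [f"friends({x},{y})" for x in CONSTANTS for y in CONSTANTS])

def world_weight(w):
    """exp(sum of weights of ground formulas satisfied in world w)."""
    total = 0.0
    for x in CONSTANTS:                      # 1.5: smokes(x) -> cancer(x)
        if (not w[f"smokes({x})"]) or w[f"cancer({x})"]:
            total += 1.5
    for x in CONSTANTS:                      # 1.1: friends(x,y) -> (smokes(x) <-> smokes(y))
        for y in CONSTANTS:
            if (not w[f"friends({x},{y})"]) or (w[f"smokes({x})"] == w[f"smokes({y})"]):
                total += 1.1
    return exp(total)

def prob(query, evidence):
    """P(query | evidence), summing world weights over all 2^8 possible worlds."""
    num = den = 0.0
    for values in product([False, True], repeat=len(ATOMS)):
        w = dict(zip(ATOMS, values))
        if any(w[a] != v for a, v in evidence.items()):
            continue                         # inconsistent with the evidence
        weight = world_weight(w)
        den += weight
        if w[query]:
            num += weight
    return num / den

print(prob("cancer(A)", {"friends(A,B)": True, "smokes(B)": True}))
```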

Outline

• Probabilistic Logic Semantics (overview of previous work)

– Markov Logic Networks

– Recognizing Textual Entailment

• Domain Closure Assumption

– Definition

– Inference problems with Quantifiers

• Closed-World Assumption

• Evaluation

• Future work and Conclusion


Recognizing Textual Entailment (RTE)

• RTE requires deep semantic understanding [Dagan et al., 2013]

• Given two sentences, Text (T) and Hypothesis (H), determine whether T Entails, Contradicts, or is unrelated (Neutral) to H


Recognizing Textual Entailment (RTE)

• Examples (from the SICK dataset) [Marelli et al., 2014]

– Entailment: T: “A man is walking through the woods.”

H: “A man is walking through a wooded area.”

– Contradiction: T: “A man is jumping into an empty pool.”

H: “A man is jumping into a full pool.”

– Neutral: T: “A young girl is dancing.”

H: “A young girl is standing on one leg.”


Recognizing Textual Entailment (RTE)

• Translate sentences to logic using Boxer [Bos, 2008]

• T: John is driving a car

∃x,y,z. john(x) ∧ agent(y, x) ∧ drive(y) ∧ patient(y, z) ∧ car(z)

• H: John is driving a vehicle

∃x,y,z. john(x) ∧ agent(y, x) ∧ drive(y) ∧ patient(y, z) ∧ vehicle(z)

• KB: (collected from different sources)

∀x. car(x) → vehicle(x) | w

• P(H | T, KB)

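To make the query concrete, here is a hand-checkable sketch (same brute-force spirit as before, not the system's actual pipeline). It assumes T's atoms are given as evidence over Skolem constants J, D and C, that every other ground atom except vehicle(C) is fixed to false, and a hypothetical rule weight w = 1.5; under these assumptions only two worlds remain and P(H | T, KB) reduces to a one-line computation.

```python
# Worked version of P(H | T, KB) for the example above, assuming Skolem
# constants J (John), D (driving event), C (the thing driven) and a
# hypothetical rule weight w = 1.5. Evidence from T fixes every ground atom
# except vehicle(C), so only two worlds remain. Groundings of the rule over
# J and D are satisfied either way (car(J), car(D) are false) and cancel out.
from math import exp

w = 1.5                       # weight of: forall x. car(x) -> vehicle(x)

# World 1: vehicle(C) = True  -> the ground rule car(C) -> vehicle(C) is satisfied.
weight_true = exp(w)
# World 2: vehicle(C) = False -> the ground rule is violated, contributes exp(0).
weight_false = exp(0)

# H (exists ... vehicle(z)) holds exactly in the worlds where vehicle(C) is true.
p_h = weight_true / (weight_true + weight_false)
print(p_h)                    # ~0.82: the entailment holds with high confidence
```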

Outline

• Probabilistic Logic Semantics (overview of previous work)

– Markov Logic Networks

– Recognizing Textual Entailment

• Domain Closure Assumption

– Definition

– Inference problems with Quantifiers

• Closed-World Assumption

• Evaluation

• Future work and Conclusion


Domain Closure Assumption (DCA)

• There are no objects in the world other than the named constants (Finite Domain)

• e.g.

∀x. smoke(x) → cancer(x) | 1.5
∀x,y. friend(x,y) → (smoke(x) ↔ smoke(y)) | 1.1

Two constants: Anna (A) and Bob (B)


[Figure: the ground atoms under DCA — Cancer(A), Cancer(B), Smokes(A), Smokes(B), Friends(A,A), Friends(A,B), Friends(B,A), Friends(B,B)]

Domain Closure Assumption (DCA)

• There are no objects in the universe other than the named constants (Finite Domain)

– Constants need to be explicitly added

– Universal quantifiers do not behave as expected because of finite domain

– e.g. “Tweety is a bird and it flies” ⇒ “All birds fly”

P(H|T, KB)        T                  H
∃                 Skolemization      No problems
∀                 Existence          Universals in H


Outline

• Probabilistic Logic Semantics (overview of previous work)

– Markov Logic Networks

– Recognizing Textual Entailment

• Domain Closure Assumption

– Definition

– Inference problems with Quantifiers

• Skolemization: ∃ in T

• Existence: ∀ in T

• Universals in Hypothesis: ∀ in H

• Closed-World Assumption

• Evaluation

• Future work and Conclusion


Skolemization (∃ in T)

• Explicitly introducing constants

• T: ∃x,y. john(x) ∧ agent(y, x) ∧ eat(y)

• Skolemized T: john(J) ∧ agent(T, J) ∧ eat(T)

• Embedded existentials

– T: ∀x. bird(x) → ∃y. agent(y, x) ∧ fly(y)

– Skolemized T: ∀x. bird(x) → agent(f(x), x) ∧ fly(f(x))

– Simulate Skolem functions:

∀x. bird(x) → ∃y. skolemf(x,y) ∧ agent(y, x) ∧ fly(y)

– skolemf(B1, C1), skolemf(B2, C2), …
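A minimal sketch of the two steps above, assuming a tiny ad-hoc representation of formulas as (predicate, arguments) conjuncts; this is illustrative code, not the authors' implementation.

```python
# Sketch of Skolemization as described above (ad-hoc formula representation).
import itertools

_fresh = itertools.count(1)

def fresh_constant():
    """Invent a new constant name: C1, C2, ..."""
    return f"C{next(_fresh)}"

def skolemize_outer(ex_vars, conjuncts):
    """Replace top-level existential variables with fresh constants.
    E.g.  exists x,y. john(x) & agent(y,x) & eat(y)
          ->         john(C1) & agent(C2,C1) & eat(C2)"""
    mapping = {v: fresh_constant() for v in ex_vars}
    return [(p, tuple(mapping.get(a, a) for a in args)) for p, args in conjuncts]

def simulate_skolem_function(restrictor_constants):
    """Simulate the Skolem function f of an embedded existential
    (forall x. bird(x) -> exists y. ...) with evidence for a new predicate
    skolemf, pairing every restrictor constant with a fresh constant:
    skolemf(B1, C1), skolemf(B2, C2), ..."""
    return [("skolemf", (b, fresh_constant())) for b in restrictor_constants]

print(skolemize_outer(["x", "y"],
                      [("john", ("x",)), ("agent", ("y", "x")), ("eat", ("y",))]))
print(simulate_skolem_function(["B1", "B2"]))
```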

Outline

• Probabilistic Logic Semantics (overview of previous work)

– Markov Logic Networks

– Recognizing Textual Entailment

• Domain Closure Assumption

– Definition

– Inference problems with Quantifiers

• Skolemization: ∃ in T

• Existence: ∀ in T

• Universals in Hypothesis: ∀ in H

• Closed-World Assumption

• Evaluation

• Future work and Conclusion


Existence (∀ in T)

• T: All birds fly

• H: Some birds fly

• Logically, T ⇏ H, but pragmatically it does

– “All birds fly” presupposes that “there exist birds”

• Solution: simulate this existential presupposition

– From parse tree, Q(restrictor, body)

– “All birds fly” becomes: all(bird, fly)

– Introduce additional evidence for the restrictor: bird(B)

Existence (∀ in T)

• Negated Existential

– T: No bird flies = no(bird, fly)

¬∃x,y. bird(x) ∧ agent(y, x) ∧ fly(y) ≡ ∀x. bird(x) → ¬∃y. agent(y, x) ∧ fly(y)

– Additional evidence: bird(B)

• Exception

– T: There are no birds: ¬∃x. bird(x)

– No additional evidence, because the existence presupposition is explicitly negated

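A minimal sketch of the evidence-generation rule on the last two slides, assuming quantified statements are already available as Q(restrictor, body) and that "There are no birds" is represented with an empty body; the representation and the fresh constant names are illustrative assumptions, not the paper's code.

```python
# Sketch of the existential-presupposition rule above.
import itertools

_fresh = itertools.count(1)

def existence_evidence(quantifier, restrictor, body):
    """Return an extra evidence atom for the restrictor, or None.

    all(bird, fly)   ("All birds fly")      -> ('bird', 'D1')
    no(bird, fly)    ("No bird flies")      -> ('bird', 'D2')
    no(bird, None)   ("There are no birds") -> None, because the existence
                                               presupposition is explicitly negated."""
    if quantifier == "no" and body is None:
        return None
    return (restrictor, f"D{next(_fresh)}")   # fresh constant, like bird(B) above

print(existence_evidence("all", "bird", "fly"))   # ('bird', 'D1')
print(existence_evidence("no", "bird", "fly"))    # ('bird', 'D2')
print(existence_evidence("no", "bird", None))     # None
```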

Outline

• Probabilistic Logic Semantics (overview of previous work)

– Markov Logic Networks

– Recognizing Textual Entailment

• Domain Closure Assumption

– Definition

– Inference problems with Quantifiers

• Skolemization: ∃ in T

• Existence: ∀ in T

• Universals in Hypothesis: ∀ in H

• Closed-World Assumption

• Evaluation

• Future work and Conclusion


Universals in Hypothesis (∀ in H)

• T: Tweety is a bird, and Tweety flies

bird(Tweety) ∧ agent(F, Tweety) ∧ fly(F)

• H: All birds fly

∀x. bird(x) → ∃y. agent(y, x) ∧ fly(y)

• T ⇒ H, because universal quantifiers range only over the constants of the given finite domain

• Solution:

– As in Existence, add evidence for the restrictor: bird(Woody)

– If the new bird can be shown to fly, then there is an explicit universal quantification in T

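A hand-checkable sketch of why the extra restrictor constant fixes this: it evaluates H's universal over the finite domain, treating unlisted ground atoms as false (an assumption in the spirit of the Closed-World Assumption discussed next); not the paper's implementation.

```python
# Evaluating H over the finite domain, with and without the extra constant.
def holds(atom, evidence):
    return atom in evidence

def h_all_birds_fly(domain, evidence):
    """H: forall x. bird(x) -> exists y. agent(y, x) & fly(y),
    evaluated over the finite domain only (this is what DCA forces)."""
    return all(
        (not holds(("bird", x), evidence)) or
        any(holds(("agent", y, x), evidence) and holds(("fly", y), evidence)
            for y in domain)
        for x in domain)

T = {("bird", "Tweety"), ("agent", "F", "Tweety"), ("fly", "F")}

# Without the extra constant, DCA makes T wrongly "entail" H:
print(h_all_birds_fly({"Tweety", "F"}, T))                                  # True

# With the extra evidence bird(Woody), H holds only if Woody can be shown
# to fly, which T gives no reason to believe:
print(h_all_birds_fly({"Tweety", "F", "Woody"}, T | {("bird", "Woody")}))   # False
```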

Outline

• Probabilistic Logic Semantics (overview of previous work)

– Markov Logic Networks

– Recognizing Textual Entailment

• Domain Closure Assumption

– Definition

– Inference problems with Quantifiers

• Closed-World Assumption

• Evaluation

• Future work and Conclusion


Closed-World Assumption (CWA)

• The assumption that all ground atoms have a very low prior probability

• CWA fits the RTE task because:

– In the world, most things are false

– Inference results are less sensitive to the domain size

– Enables inference optimization [Beltagy and Mooney, 2014]


Closed-World Assumption (CWA)

• Because of the CWA, a negated H comes out true regardless of T

• H: ¬∃x,y. bird(x) ∧ agent(y, x) ∧ fly(y)

• Solution

– Add positive evidence that contradicts the negated parts of H

– A set of ground atoms with high prior probability (in contrast with the low prior probability on all other ground atoms)

– R: bird(B) ∧ agent(F, B) ∧ fly(F) | w=1.5

– P(H | CWA) ≈ 1

– P(H | R, CWA) ≈ 0

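A minimal sketch of building R: take the conjunction under the negated existential in H, replace each variable with a fresh constant, and attach a high prior weight. The variable-to-constant names and the weight 1.5 follow the slide; the formula representation itself is an illustrative assumption.

```python
# Sketch of constructing the rule R above from the negated part of H.
def build_R(negated_conjuncts, var_to_const, weight=1.5):
    """negated_conjuncts: the atoms under the negation, e.g. from
    H = not exists x,y. bird(x) & agent(y, x) & fly(y).
    Returns the ground atoms plus the high prior weight attached to them."""
    ground = [(pred, tuple(var_to_const.get(a, a) for a in args))
              for pred, args in negated_conjuncts]
    return ground, weight

H_body = [("bird", ("x",)), ("agent", ("y", "x")), ("fly", ("y",))]
print(build_R(H_body, {"x": "B", "y": "F"}))
# ([('bird', ('B',)), ('agent', ('F', 'B')), ('fly', ('F',))], 1.5)
```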

Closed-World Assumption (CWA)

Entailing example:

• T: No bird flies: ¬∃x,y. bird(x) ∧ agent(y, x) ∧ fly(y)

• H: No penguin flies: ¬∃x,y. penguin(x) ∧ agent(y, x) ∧ fly(y)

• R: penguin(P) ∧ agent(F, P) ∧ fly(F) | w=1.5

• KB: ∀x. penguin(x) → bird(x)

• P(H | T, R, KB) = 1

• T ∧ KB contradicts R, which lets H be true


Outline

• Probabilistic Logic Semantics (overview of previous work)

– Markov Logic Networks

– Recognizing Textual Entailment

• Domain Closure Assumption

– Definition

– Inference problems with Quantifiers

• Closed-World Assumption

• Evaluation

• Future work and Conclusion



Evaluation

• Probabilistic Logic Framework: Markov Logic Network

– Proposed handling of DCA and CWA applies to other Probabilistic Logic frameworks that make similar assumptions, e.g., PSL (Probabilistic Soft Logic)

• Evaluation Task: RTE

– Proposed handling of DCA and CWA applies to other tasks where the logical formulas have existential and universal quantifiers, e.g., STS (Semantic Textual Similarity) and Question Answering


Evaluation

1) Synthetic Dataset

• Template: Q1 NP1 V Q2 NP2 = Q1(NP1, Q2(NP2, V)) (see the sketch after the example below)

• Example

– T: No man eats all food

– H: Some hungry men eat not all delicious food
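Below is a small sketch of how a template instance unfolds from Q1(NP1, Q2(NP2, V)) into first-order logic. It only covers the four quantifiers appearing in this example, ignores the adjectives (hungry, delicious), and builds formulas as plain strings; this is an illustration, not the dataset-generation code.

```python
# Unfolding the template Q1 NP1 V Q2 NP2 = Q1(NP1, Q2(NP2, V)) into logic.
def quant(q, restrictor, var, body):
    """Build a first-order formula string for quantifier q over variable var."""
    if q == "some":
        return f"∃{var}. {restrictor}({var}) ∧ ({body})"
    if q == "all":
        return f"∀{var}. {restrictor}({var}) → ({body})"
    if q == "no":
        return f"¬∃{var}. {restrictor}({var}) ∧ ({body})"
    if q == "not all":
        return f"¬∀{var}. {restrictor}({var}) → ({body})"
    raise ValueError(f"unsupported quantifier: {q}")

# T: "No man eats all food" = no(man, all(food, eat))
t = quant("no", "man", "x", quant("all", "food", "y", "eat(x, y)"))
print(t)   # ¬∃x. man(x) ∧ (∀y. food(y) → (eat(x, y)))

# H (simplified, dropping the adjectives): "Some men eat not all food"
h = quant("some", "man", "x", quant("not all", "food", "y", "eat(x, y)"))
print(h)   # ∃x. man(x) ∧ (¬∀y. food(y) → (eat(x, y)))
```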

Evaluation

1) Synthetic Dataset

• Dataset size: 952 Neutral + 72 Entail = 1024

[Bar chart: accuracy on the synthetic dataset for the configurations Baseline, Skolem, Skolem + Existence, Skolem + Univ in H, Skolem + Univ in H + CWA, and All (y-axis: Accuracy, 0–1)]

Detection of Contradiction

• Entailment: P(H | T, KB, Wt,h)

• Contradiction: P(¬H | T, KB, Wt,h)

World configuration:

• Domain size

• Prior probabilities

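A minimal sketch of turning the two queries above into a three-way RTE decision. The fixed thresholds are a hypothetical simplification; the paper bases the label on both probabilities (computed under the same world configuration), not necessarily on hard cut-offs.

```python
# Sketch of a three-way RTE decision from the two probabilities above.
def rte_label(p_h_given_t, p_not_h_given_t, threshold=0.9):
    if p_h_given_t >= threshold:
        return "Entailment"        # T makes H (almost) certainly true
    if p_not_h_given_t >= threshold:
        return "Contradiction"     # T makes ¬H (almost) certainly true
    return "Neutral"

print(rte_label(0.97, 0.01))   # Entailment
print(rte_label(0.02, 0.95))   # Contradiction
print(rte_label(0.40, 0.35))   # Neutral
```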

Evaluation

2) Sentences Involving Compositional Knowledge (SICK) [Marelli et al., SemEval 2014]

– 10,000 pairs of sentences annotated as Entail, Contradict or Neutral

[Bar chart: accuracy on SICK for the configurations Baseline, Skolem, Skolem + Existence, Skolem + Univ in H, Skolem + Univ in H + CWA, and All (y-axis: Accuracy, 0–1)]

Evaluation

3) FraCaS [Cooper et al., 1996]: hand-built entailment pairs

– We evaluate on the first section (out of 9 sections)

– Unsupported quantifiers (few, most, many, at least): 28/74 pairs

[Bar chart: accuracy on FraCaS section 1 for the configurations Baseline, Skolem, Skolem + Existence, Skolem + Univ in H, Skolem + Univ in H + CWA, and All, with gold parses vs. standard parses (y-axis: Accuracy, 0–1)]

Outline

• Probabilistic Logic Semantics (overview of previous work)

– Markov Logic Networks

– Recognizing Textual Entailment

• Domain Closure Assumption

– Definition

– Inference problems with Quantifiers

• Closed-World Assumption

• Evaluation

• Future work and Conclusion


Future Work

Generalized Quantifiers:

• How to extend this work to generalized quantifiers like Few and Most



Conclusion

• The Domain Closure Assumption, its implications for probabilistic logic inference, and how to formulate the RTE problem so that we get the expected inferences

• The Closed-World Assumption, why we make it, its effect on negation, and how to formulate the RTE problem to get correct inferences

Thank You
