Implicit learning of common sense for reasoning
Brendan Juba, Harvard University
A convenient example
“Thomson visited Cooper’s grave in 1765. At that date, he had been traveling [resp.: dead] for five years.”
“Who had been traveling [resp.: dead]?” (The Winograd Schema Challenge, [Levesque, Davis, and Morgenstern, 2012])
Our approach: learn sufficient knowledge to answer such queries from examples.
The task

In_grave(x)   Alive(x)   Traveling(x)
1             0          0
*             0          0
0             1          *
0             1          1
1             1          0
0             1          1
0             1          *
1             0          0
0             0          0
1             0          0
0             1          0
0             1          1
*             1          *
*             0          0
• The examples may be incomplete (a * in the table)
• Given In_grave(Cooper), we wish to infer ¬Traveling(Cooper)
• Follows from In_grave(x) ⇒ ¬Alive(x) and Traveling(x) ⇒ Alive(x)
• These two rules can be learned from this data
• Challenge: how can we tell which rules to learn?
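The rule-screening idea above can be sketched in a few lines of Python. This is a hypothetical illustration, not the paper's implementation: the attribute names, the `None`-for-* encoding, and the helpers `implies` and `support` are all my own.

```python
# Hypothetical sketch of screening a candidate rule against masked examples.
# Each example maps an attribute to 1, 0, or None (None encodes the "*" above).
examples = [
    {"in_grave": 1, "alive": 0, "traveling": 0},
    {"in_grave": None, "alive": 0, "traveling": 0},
    {"in_grave": 0, "alive": 1, "traveling": None},
    {"in_grave": 1, "alive": 1, "traveling": 0},   # the "buried alive" row
]

def implies(antecedent, consequent):
    """Three-valued evaluation of (antecedent => consequent)."""
    if antecedent == 0 or consequent == 1:
        return 1          # satisfied regardless of any masked value
    if antecedent is None or consequent is None:
        return None       # cannot tell from this partial example
    return 0              # a witnessed counterexample

def support(examples, rule):
    """Fraction of examples on which the rule is witnessed to hold."""
    return sum(rule(e) == 1 for e in examples) / len(examples)

def grave_rule(e):
    # In_grave(x) => not Alive(x)
    neg_alive = None if e["alive"] is None else 1 - e["alive"]
    return implies(e["in_grave"], neg_alive)

print(support(examples, grave_rule))   # 0.75: the last row is a counterexample
```

Running the same screen over all candidate implications is what becomes infeasible at scale, which is exactly the challenge the talk raises.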
This work
Given: examples, KB, and a query…
• Proposes a criterion for learnability of rules in reasoning: “witnessed evaluation”
• Presents a simple algorithm for efficiently considering all such rules for reasoning in any “natural” (tractable) fragment
– “Natural” defined previously by Beame, Kautz, Sabharwal (JAIR 2004)
– Tolerant to counterexamples as appropriate for application to “common sense” reasoning
This work
• Only concerns learned “common sense”
– Cf. Spelke’s “core knowledge”: naïve theories, etc.
– But: the use of logical representations provides a potential “hook” into traditional KR
• Focuses on confirming or refuting query formulas on a domain (distribution)
– As opposed to: predicting missing attributes in a given example (cf. past work on PAC-Semantics)
Why not use…
Bayes nets/Markov Logic/etc.?
– Learning is the Achilles heel of these approaches: even if the distribution is described by a simple network, how do we find the dependencies?
Outline
1. PAC-Semantics: model for learned knowledge
– Suitable for capturing learned common sense
2. Witnessed evaluation: a learnability criterion under partial information
3. “Natural” fragments of proof systems
4. The algorithm and its guarantee
PAC Semantics (for propositional logic) [Valiant, AIJ 2000]
• Recall: propositional logic consists of formulas built from variables x1,…,xn and connectives, e.g., ∧ (AND), ∨ (OR), ¬ (NOT)
• Defined with respect to a background probability distribution D over {0,1}^n (Boolean assignments to x1,…,xn)
☞Definition. A formula φ(x1,…,xn) is (1-ε)-valid under D if PrD[φ(x1,…,xn)=1] ≥ 1-ε.
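A quick empirical illustration of the definition. The toy distribution, variable names, and numbers below are my own, not from the talk: under a distribution where x2 usually copies x1, the formula x1 ⇒ x2 is roughly 0.95-valid.

```python
import random

random.seed(0)

# Toy background distribution D over (x1, x2): x2 copies x1 with prob. 0.9.
def draw():
    x1 = random.random() < 0.5
    x2 = x1 if random.random() < 0.9 else not x1
    return x1, x2

phi = lambda x1, x2: (not x1) or x2     # the formula x1 -> x2

samples = [draw() for _ in range(10000)]
validity = sum(phi(*s) for s in samples) / len(samples)
print(f"empirical validity ~ {validity:.2f}")   # close to 0.95
```

Since φ fails only when x1 = 1 and x2 = 0 (probability 0.5 · 0.1 = 0.05), φ is (1-ε)-valid for ε ≈ 0.05.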
A RULE OF THUMB…
Examples

In_grave(x)   Alive(x)   Traveling(x)   In_grave(x) ⇒ ¬Alive(x)
1             0          0              1
*             0          0              1
0             1          *              1
0             1          1              1
1             1          0              0   ← Buried alive!! (the grave-digger)
0             1          1              1
0             1          *              1
1             0          0              1
0             0          0              1
1             0          0              1
0             1          0              1
0             1          1              1
*             1          *              *
*             0          0              1

APPEARS TO BE ≈86%-VALID…
Examples

In_grave(x)   Alive(x)   Traveling(x)   Traveling(x) ⇒ Alive(x)
1             0          0              1
*             0          0              1
0             1          *              1
0             1          1              1
1             1          0              1
0             1          1              1
0             1          *              1
1             0          0              1
0             0          0              1
1             0          0              1
0             1          0              1
0             1          1              1
*             1          *              1
*             0          0              1

Note: agreeing with all observed examples does not imply 1-validity; rare counterexamples may exist. We only get (1-ε)-validity with probability 1-δ.
The theorem, informally
Theorem. For every natural tractable proof system, there is an algorithm that efficiently simulates access during proof search to all rules that can be verified (1-ε)-valid on examples.
• Can’t afford to explicitly consider all rules!
• Won’t even be able to identify the rules simulated
• Thus: rules are “learned implicitly”
Outline
1. PAC-Semantics: model for learned knowledge
2. Witnessed evaluation: a learnability criterion under partial information
3. “Natural” fragments of proof systems
4. The algorithm and its guarantee
Masking processes [Michael, AIJ 2010]
• A masking function m : {0,1}^n → {0,1,*}^n takes an example (x1,…,xn) to a partial example by replacing some values with *
• A masking process M is a masking-function-valued random variable
– NOTE: the choice of attributes to hide may depend on the example!
Restricting formulas
Given a formula φ and masked example ρ, the restriction of φ under ρ, φ|ρ, is obtained by “plugging in” the values of ρi for xi whenever ρi ≠ * and recursively simplifying (using game-tree evaluation). I.e., φ|ρ is a formula in the unknown values.
Example: for φ = ((¬x ∨ y) ∧ ¬z) ∨ z and ρ: x=0, y=0, the subformula ¬x evaluates to 1 and y to 0, so φ|ρ simplifies to ¬z ∨ z, which game-tree evaluation reduces to 1.
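A minimal sketch of restriction by recursive syntactic simplification. The nested-tuple formula encoding is my own, and this simple version stops at a formula in the unknown values; full game-tree evaluation, as in the slide, would additionally recognize that ¬z ∨ z is a tautology.

```python
# Sketch: restriction of a formula under a partial example rho.
# Formulas are nested tuples: ("var", name), ("not", f), ("or", f, g), ("and", f, g).

def restrict(f, rho):
    """Plug in known values from rho (name -> 0/1) and simplify recursively."""
    op = f[0]
    if op == "var":
        return rho.get(f[1], f)            # unknown variables stay symbolic
    if op == "not":
        g = restrict(f[1], rho)
        return 1 - g if g in (0, 1) else ("not", g)
    a, b = restrict(f[1], rho), restrict(f[2], rho)
    if op == "or":
        if 1 in (a, b): return 1
        if a == 0: return b
        if b == 0: return a
        return ("or", a, b)
    # op == "and"
    if 0 in (a, b): return 0
    if a == 1: return b
    if b == 1: return a
    return ("and", a, b)

# ((not x or y) and not z) or z, restricted under rho: x=0, y=0
f = ("or",
     ("and", ("or", ("not", ("var", "x")), ("var", "y")), ("not", ("var", "z"))),
     ("var", "z"))
print(restrict(f, {"x": 0, "y": 0}))   # a formula in the unknown value z
```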
Witnessed formulas
We will learn rules that can be observed to hold under the given partial information:
• Definition. ψ is (1-ε)-witnessed under a distribution over partial examples M(D) if Pr_{ρ ∈ M(D)}[ψ|ρ = 1] ≥ 1-ε
• We will aim to succeed whenever there exists a (1-ε)-witnessed formula that completes a simple proof of the query formula…
Remark: equal to “ψ is a tautology given ρ” in standard cases where this is tractable, e.g., CNFs, intersections of halfspaces; remains tractable in cases where this is not, e.g., 3-DNFs
Outline
1. PAC-Semantics: model for learned knowledge
2. Witnessed evaluation: a learnability criterion under partial information
3. “Natural” fragments of proof systems
4. The algorithm and its guarantee
Example: Resolution (“RES”)
• A proof system for refuting CNFs (ANDs of ORs)
– Equiv., for proving DNFs (ORs of ANDs)
• Operates on clauses: given a set of clauses {C1,…,Ck}, may derive
– (“weakening”) Ci ∨ l from any Ci (where l is any literal: a variable or its negation)
– (“cut”) C’i ∨ C’j from Ci = C’i ∨ x and Cj = C’j ∨ ¬x
• Refute a CNF by deriving empty clause from it
Tractable fragments of RES
• Bounded-width
• Treelike, bounded clause space
[Figure: a treelike refutation deriving ∅ from xi and ¬xi, where ¬xi is derived from ¬xi∨xj and ¬xi∨¬xj, …]
SPACE-2 ≡ “UNIT PROPAGATION,” SIMULATES CHAINING
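Unit propagation, the space-2 treelike fragment just mentioned, can be sketched as follows. The clause encoding (sets of `(variable, polarity)` literals) is my own choice, not from the talk.

```python
# Sketch of unit propagation: repeatedly pick a unit clause, drop clauses it
# satisfies, and shorten clauses containing the opposite literal.

def unit_propagate(clauses):
    """Return True iff unit propagation derives the empty clause (a refutation)."""
    clauses = [set(c) for c in clauses]
    while True:
        if any(len(c) == 0 for c in clauses):
            return True                        # empty clause derived: refuted
        unit = next((c for c in clauses if len(c) == 1), None)
        if unit is None:
            return False                       # propagation stalls: no refutation
        (name, pol) = next(iter(unit))
        nxt = []
        for c in clauses:
            if (name, pol) in c:
                continue                       # clause satisfied, drop it
            nxt.append(c - {(name, not pol)})  # cut away the opposite literal
        clauses = nxt

# The running example: refute {Traveling, In_grave} using the two premises.
IG, TR, AL = "In_grave", "Traveling", "Alive"
cnf = [{(IG, True)}, {(TR, True)},
       {(IG, False), (AL, False)},   # In_grave => not Alive
       {(TR, False), (AL, True)}]    # Traveling => Alive
print(unit_propagate(cnf))           # True: the chaining refutation succeeds
```

Each round processes one unit clause, mirroring the cut steps in the treelike refutation above.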
Tractable fragments of RES
• Bounded-width
• Treelike, bounded clause space
☞ Applying a restriction to every step of a proof of one of these forms yields a proof of the same form (from a refutation of φ, we obtain a refutation of φ|ρ of the same syntactic form)
• Def’n (BKS’04): such fragments are “natural”
Other “natural” fragments…
• Bounded-width k-DNF resolution
• L1-bounded, sparse cutting planes
• Degree-bounded polynomial calculus
• (more?)
REQUIRES THAT RESTRICTIONS PRESERVE THE SPECIAL SYNTACTIC FORM
Outline
1. PAC-Semantics: model for learned knowledge
2. Witnessed evaluation: a learnability criterion under partial information
3. “Natural” fragments of proof systems
4. The algorithm and its guarantee
The basic algorithm
• Given query DNF φ and masked ex’s {ρ1,…,ρk}
– For each ρi, search for a refutation of ¬φ|ρi
• If the fraction of successful refutations is greater than (1-ε), accept φ, and otherwise reject.
CAN INCORPORATE A KB CNF Φ: REFUTE [Φ ∧ ¬φ]|ρi
Example space-2 treelike RES refutation

Refute: Traveling, In_grave
Given (supporting “common sense” premises): ¬Traveling ∨ Alive, ¬In_grave ∨ ¬Alive

Cut In_grave with ¬In_grave ∨ ¬Alive to derive ¬Alive; cut ¬Alive with ¬Traveling ∨ Alive to derive ¬Traveling; cut ¬Traveling with Traveling to derive ∅.
Example [Traveling∧In_grave]|ρ1, with ρ1: In_grave = 0, Alive = 1

Under ρ1, both premises restrict to true (¬In_grave = 1 and Alive = 1), and the given clause In_grave restricts to the empty clause ∅: a trivial refutation.
Example [Traveling∧In_grave]|ρ2, with ρ2: Traveling = 0, Alive = 0

Under ρ2, both premises again restrict to true (¬Traveling = 1 and ¬Alive = 1), and the given clause Traveling restricts to the empty clause ∅: again a trivial refutation.
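The restrict-then-refute loop on these two partial examples can be reproduced with a small sketch. The encoding and function names are hypothetical, and `accepts` here only detects trivial refutations (an empty clause after restriction); the actual algorithm would run a full proof search in the chosen natural fragment for each restricted instance.

```python
# Sketch of the decision loop over partial examples (trivial refutations only).

def restrict_clause(clause, rho):
    """None if rho satisfies the clause; otherwise the surviving literals."""
    out = set()
    for (var, pol) in clause:
        if var in rho:
            if rho[var] == pol:
                return None          # clause already true under rho
            # literal falsified by rho: drop it
        else:
            out.add((var, pol))
    return out

def accepts(kb_and_negquery, partial_examples, eps):
    """Accept iff at least a (1 - eps) fraction of examples yield a refutation."""
    hits = 0
    for rho in partial_examples:
        restricted = [restrict_clause(c, rho) for c in kb_and_negquery]
        if any(c is not None and len(c) == 0 for c in restricted):
            hits += 1                # trivial refutation under this example
    return hits / len(partial_examples) >= 1 - eps

IG, TR, AL = "In_grave", "Traveling", "Alive"
cnf = [{(IG, True)}, {(TR, True)},
       {(IG, False), (AL, False)}, {(TR, False), (AL, True)}]
rhos = [{IG: False, AL: True},       # rho1 from the slides
        {TR: False, AL: False}]      # rho2 from the slides
print(accepts(cnf, rhos, eps=0.1))   # True: every example yields a refutation
```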
The algorithm uses 1/γ² · log(1/δ) partial examples to distinguish the following cases w.p. 1-δ:
• The query φ is not (1-ε-γ)-valid
• There exists a (1-ε+γ)-witnessed formula ψ for which there exists a proof of the query φ from ψ

LEARN ANY ψ THAT HELPS VALIDATE THE QUERY φ. N.B.: ψ MAY NOT BE 1-VALID
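The sample count is a standard Hoeffding-style bound: with separation γ between the accept and reject thresholds, on the order of 1/γ² · log(1/δ) examples suffice. A sketch (the constant factor here is illustrative, not the paper's exact statement):

```python
import math

def num_examples(gamma, delta):
    """Hoeffding-style count of examples to separate the two cases w.p. 1 - delta."""
    return math.ceil(math.log(1 / delta) / (2 * gamma ** 2))

print(num_examples(0.05, 0.01))   # 922 partial examples suffice here
```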
The theorem, formally
• Note that resolution is sound…
– So, whenever a proof of φ|ρi exists, φ was satisfied by the example from D
⇒ If φ is not (1-ε-γ)-valid, tail bounds imply that it is unlikely that a (1-ε) fraction satisfied φ
• On the other hand, consider the proof of φ from the (1-ε+γ)-witnessed CNF ψ…
– With probability (1-ε+γ), all of the clauses of ψ simplify to 1
⇒ The restricted proof does not require the clauses of ψ
Analysis
“Implicitly learned”
Recap: this work…
• Proposed a criterion for learnability of common sense rules in reasoning: “witnessed evaluation”
• Presented a simple algorithm for efficiently considering all such rules as premises for reasoning in any “natural” (tractable) fragment
– “Natural” (defined by Beame, Kautz, Sabharwal, JAIR 2004) means: “closed under plugging in partial info.”
– Tolerant to counterexamples as appropriate for application to “common sense” reasoning
Prior work: Learning to Reason
• Khardon & Roth (JACM 1997) showed that O(log n)-CNF queries can be efficiently answered using complete examples
– No mention of theorem-proving whatsoever!
– Could only handle low-width queries under incomplete information (Mach. Learn. 1999)
• Noise-tolerant learning captures (some kinds of) common sense (Roth, IJCAI’95)
Work in progress
• Further integration of learning and reasoning
– Deciding general RES for limited learning problems in quasipoly-time: arXiv:1304.4633
– Limits of this approach: ECCC TR13-094
• Integration with “fancier” semantics (e.g., naf)
– The point: we want to consider proofs using such “implicitly learned” facts & rules
Future work
• Empirical validation– Good domain?
• Explicit learning of premises
– Not hard for our fragments under “bounded concealment” (Michael, AIJ 2010)
– But: this won’t tolerate counterexamples!