Implicit learning of common sense for reasoning
Brendan Juba, Harvard University
A convenient example
“Thomson visited Cooper’s grave in 1765. At that date, he had been traveling [resp.: dead] for five years.”
“Who had been traveling [resp.: dead]?” (The Winograd Schema Challenge, [Levesque, Davis, and Morgenstern, 2012])
Our approach: learn sufficient knowledge to answer such queries from examples.
The task

In_grave(x)   Alive(x)   Traveling(x)
1             0          0
*             0          0
0             1          *
0             1          1
1             1          0
0             1          1
0             1          *
1             0          0
0             0          0
1             0          0
0             1          0
0             1          1
*             1          *
*             0          0
• The examples may be incomplete (a * in the table)
• Given In_grave(Cooper), we wish to infer ¬Traveling(Cooper)
• Follows from In_grave(x) ⇒ ¬Alive(x) and Traveling(x) ⇒ Alive(x)
• These two rules can be learned from this data
• Challenge: how can we tell which rules to learn?
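The rule-screening idea above can be sketched in a few lines of Python. This is a hypothetical illustration, not the paper's implementation: the attribute names, the `None`-for-* encoding, and the helpers `implies` and `support` are all my own.

```python
# Hypothetical sketch of screening a candidate rule against masked examples.
# Each example maps an attribute to 1, 0, or None (None encodes the "*" above).
examples = [
    {"in_grave": 1, "alive": 0, "traveling": 0},
    {"in_grave": None, "alive": 0, "traveling": 0},
    {"in_grave": 0, "alive": 1, "traveling": None},
    {"in_grave": 1, "alive": 1, "traveling": 0},   # the "buried alive" row
]

def implies(antecedent, consequent):
    """Three-valued evaluation of (antecedent => consequent)."""
    if antecedent == 0 or consequent == 1:
        return 1          # satisfied regardless of any masked value
    if antecedent is None or consequent is None:
        return None       # cannot tell from this partial example
    return 0              # a witnessed counterexample

def support(examples, rule):
    """Fraction of examples on which the rule is witnessed to hold."""
    return sum(rule(e) == 1 for e in examples) / len(examples)

def grave_rule(e):
    # In_grave(x) => not Alive(x)
    neg_alive = None if e["alive"] is None else 1 - e["alive"]
    return implies(e["in_grave"], neg_alive)

print(support(examples, grave_rule))   # 0.75: the last row is a counterexample
```

Running the same screen over all candidate implications is what becomes infeasible at scale, which is exactly the challenge the talk raises.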
This work
Given: examples, KB, and a query…
• Proposes a criterion for learnability of rules in reasoning: “witnessed evaluation”
• Presents a simple algorithm for efficiently considering all such rules for reasoning in any “natural” (tractable) fragment
– “Natural” defined previously by Beame, Kautz, Sabharwal (JAIR 2004)
– Tolerant to counterexamples as appropriate for application to “common sense” reasoning
This work
• Only concerns learned “common sense”
– Cf. Spelke’s “core knowledge”: naïve theories, etc.
– But: the use of logical representations provides a potential “hook” into traditional KR
• Focuses on confirming or refuting query formulas on a domain (distribution)
– As opposed to: predicting missing attributes in a given example (cf. past work on PAC-Semantics)
Why not use…
Bayes nets/Markov Logic/etc.?
– Learning is the Achilles heel of these approaches: even if the distribution is described by a simple network, how do we find the dependencies?
Outline
1. PAC-Semantics: model for learned knowledge
– Suitable for capturing learned common sense
2. Witnessed evaluation: a learnability criterion under partial information
3. “Natural” fragments of proof systems
4. The algorithm and its guarantee
PAC Semantics (for propositional logic) [Valiant, AIJ 2000]
• Recall: propositional logic consists of formulas built from variables x1,…,xn and connectives, e.g., ∧ (AND), ∨ (OR), ¬ (NOT)
• Defined with respect to a background probability distribution D over {0,1}^n (Boolean assignments to x1,…,xn)
☞Definition. A formula φ(x1,…,xn) is (1-ε)-valid under D if PrD[φ(x1,…,xn)=1] ≥ 1-ε.
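A quick empirical illustration of the definition. The toy distribution, variable names, and numbers below are my own, not from the talk: under a distribution where x2 usually copies x1, the formula x1 ⇒ x2 is roughly 0.95-valid.

```python
import random

random.seed(0)

# Toy background distribution D over (x1, x2): x2 copies x1 with prob. 0.9.
def draw():
    x1 = random.random() < 0.5
    x2 = x1 if random.random() < 0.9 else not x1
    return x1, x2

phi = lambda x1, x2: (not x1) or x2     # the formula x1 -> x2

samples = [draw() for _ in range(10000)]
validity = sum(phi(*s) for s in samples) / len(samples)
print(f"empirical validity ~ {validity:.2f}")   # close to 0.95
```

Since φ fails only when x1 = 1 and x2 = 0 (probability 0.5 · 0.1 = 0.05), φ is (1-ε)-valid for ε ≈ 0.05.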
A RULE OF THUMB…
Examples

In_grave(x)   Alive(x)   Traveling(x)   In_grave(x) ⇒ ¬Alive(x)
1             0          0              1
*             0          0              1
0             1          *              1
0             1          1              1
1             1          0              0   ← Buried alive!! (the grave-digger)
0             1          1              1
0             1          *              1
1             0          0              1
0             0          0              1
1             0          0              1
0             1          0              1
0             1          1              1
*             1          *              *
*             0          0              1

APPEARS TO BE ≈86%-VALID…
Examples

In_grave(x)   Alive(x)   Traveling(x)   Traveling(x) ⇒ Alive(x)
1             0          0              1
*             0          0              1
0             1          *              1
0             1          1              1
1             1          0              1
0             1          1              1
0             1          *              1
1             0          0              1
0             0          0              1
1             0          0              1
0             1          0              1
0             1          1              1
*             1          *              1
*             0          0              1

Note: agreeing with all observed examples does not imply 1-validity; rare counterexamples may exist. We only get (1-ε)-validity with probability 1-δ.
The theorem, informally
Theorem. For every natural tractable proof system, there is an algorithm that efficiently simulates access during proof search to all rules that can be verified (1-ε)-valid on examples.
• Can’t afford to explicitly consider all rules!
• Won’t even be able to identify the rules simulated
• Thus: rules are “learned implicitly”
Outline
1. PAC-Semantics: model for learned knowledge
2. Witnessed evaluation: a learnability criterion under partial information
3. “Natural” fragments of proof systems
4. The algorithm and its guarantee
Masking processes [Michael, AIJ 2010]
• A masking function m : {0,1}^n → {0,1,*}^n takes an example (x1,…,xn) to a partial example by replacing some values with *
• A masking process M is a masking-function-valued random variable
– NOTE: the choice of attributes to hide may depend on the example!
Restricting formulas
Given a formula φ and masked example ρ, the restriction of φ under ρ, φ|ρ, is obtained by “plugging in” the values of ρi for xi whenever ρi ≠ * and recursively simplifying (using game-tree evaluation). I.e., φ|ρ is a formula in the unknown values.
Example: for φ = ((¬x ∨ y) ∧ ¬z) ∨ z and ρ: x=0, y=0, the subformula ¬x evaluates to 1 and y to 0, so φ|ρ simplifies to ¬z ∨ z, which game-tree evaluation reduces to 1.
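A minimal sketch of restriction by recursive syntactic simplification. The nested-tuple formula encoding is my own, and this simple version stops at a formula in the unknown values; full game-tree evaluation, as in the slide, would additionally recognize that ¬z ∨ z is a tautology.

```python
# Sketch: restriction of a formula under a partial example rho.
# Formulas are nested tuples: ("var", name), ("not", f), ("or", f, g), ("and", f, g).

def restrict(f, rho):
    """Plug in known values from rho (name -> 0/1) and simplify recursively."""
    op = f[0]
    if op == "var":
        return rho.get(f[1], f)            # unknown variables stay symbolic
    if op == "not":
        g = restrict(f[1], rho)
        return 1 - g if g in (0, 1) else ("not", g)
    a, b = restrict(f[1], rho), restrict(f[2], rho)
    if op == "or":
        if 1 in (a, b): return 1
        if a == 0: return b
        if b == 0: return a
        return ("or", a, b)
    # op == "and"
    if 0 in (a, b): return 0
    if a == 1: return b
    if b == 1: return a
    return ("and", a, b)

# ((not x or y) and not z) or z, restricted under rho: x=0, y=0
f = ("or",
     ("and", ("or", ("not", ("var", "x")), ("var", "y")), ("not", ("var", "z"))),
     ("var", "z"))
print(restrict(f, {"x": 0, "y": 0}))   # a formula in the unknown value z
```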
Witnessed formulas
We will learn rules that can be observed to hold under the given partial information:
• Definition. ψ is (1-ε)-witnessed under a distribution over partial examples M(D) if Pr_{ρ ∈ M(D)}[ψ|ρ = 1] ≥ 1-ε
• We will aim to succeed whenever there exists a (1-ε)-witnessed formula that completes a simple proof of the query formula…
Remark: equal to “ψ is a tautology given ρ” in standard cases where this is tractable, e.g., CNFs, intersections of halfspaces; remains tractable in cases where this is not, e.g., 3-DNFs
Outline
1. PAC-Semantics: model for learned knowledge
2. Witnessed evaluation: a learnability criterion under partial information
3. “Natural” fragments of proof systems
4. The algorithm and its guarantee
Example: Resolution (“RES”)
• A proof system for refuting CNFs (ANDs of ORs)
– Equiv., for proving DNFs (ORs of ANDs)
• Operates on clauses: given a set of clauses {C1,…,Ck}, may derive
– (“weakening”) Ci ∨ l from any Ci (where l is any literal: a variable or its negation)
– (“cut”) C’i ∨ C’j from Ci = C’i ∨ x and Cj = C’j ∨ ¬x
• Refute a CNF by deriving empty clause from it
Tractable fragments of RES
• Bounded-width
• Treelike, bounded clause space
[Figure: a treelike refutation deriving ∅ from xi and ¬xi, where ¬xi is derived from ¬xi∨xj and ¬xi∨¬xj, …]
SPACE-2 ≡ “UNIT PROPAGATION,” SIMULATES CHAINING
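Unit propagation, the space-2 treelike fragment just mentioned, can be sketched as follows. The clause encoding (sets of `(variable, polarity)` literals) is my own choice, not from the talk.

```python
# Sketch of unit propagation: repeatedly pick a unit clause, drop clauses it
# satisfies, and shorten clauses containing the opposite literal.

def unit_propagate(clauses):
    """Return True iff unit propagation derives the empty clause (a refutation)."""
    clauses = [set(c) for c in clauses]
    while True:
        if any(len(c) == 0 for c in clauses):
            return True                        # empty clause derived: refuted
        unit = next((c for c in clauses if len(c) == 1), None)
        if unit is None:
            return False                       # propagation stalls: no refutation
        (name, pol) = next(iter(unit))
        nxt = []
        for c in clauses:
            if (name, pol) in c:
                continue                       # clause satisfied, drop it
            nxt.append(c - {(name, not pol)})  # cut away the opposite literal
        clauses = nxt

# The running example: refute {Traveling, In_grave} using the two premises.
IG, TR, AL = "In_grave", "Traveling", "Alive"
cnf = [{(IG, True)}, {(TR, True)},
       {(IG, False), (AL, False)},   # In_grave => not Alive
       {(TR, False), (AL, True)}]    # Traveling => Alive
print(unit_propagate(cnf))           # True: the chaining refutation succeeds
```

Each round processes one unit clause, mirroring the cut steps in the treelike refutation above.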
Tractable fragments of RES
• Bounded-width
• Treelike, bounded clause space
☞ Applying a restriction to every step of a proof of one of these forms yields a proof of the same form (from a refutation of φ, we obtain a refutation of φ|ρ of the same syntactic form)
• Def’n (BKS’04): such fragments are “natural”
Other “natural” fragments…
• Bounded-width k-DNF resolution
• L1-bounded, sparse cutting planes
• Degree-bounded polynomial calculus
• (more?)
REQUIRES THAT RESTRICTIONS PRESERVE THE SPECIAL SYNTACTIC FORM
Outline
1. PAC-Semantics: model for learned knowledge
2. Witnessed evaluation: a learnability criterion under partial information
3. “Natural” fragments of proof systems
4. The algorithm and its guarantee
The basic algorithm
• Given query DNF φ and masked ex’s {ρ1,…,ρk}
– For each ρi, search for a refutation of ¬φ|ρi
• If the fraction of successful refutations is greater than (1-ε), accept φ, and otherwise reject.
CAN INCORPORATE A KB CNF Φ: REFUTE [Φ ∧ ¬φ]|ρi
Example space-2 treelike RES refutation

Refute: Traveling, In_grave
Given (supporting “common sense” premises): ¬Traveling ∨ Alive, ¬In_grave ∨ ¬Alive

Cut In_grave with ¬In_grave ∨ ¬Alive to derive ¬Alive; cut ¬Alive with ¬Traveling ∨ Alive to derive ¬Traveling; cut ¬Traveling with Traveling to derive ∅.
Example [Traveling∧In_grave]|ρ1, with ρ1: In_grave = 0, Alive = 1

Under ρ1, both premises restrict to true (¬In_grave = 1 and Alive = 1), and the given clause In_grave restricts to the empty clause ∅: a trivial refutation.
Example [Traveling∧In_grave]|ρ2, with ρ2: Traveling = 0, Alive = 0

Under ρ2, both premises again restrict to true (¬Traveling = 1 and ¬Alive = 1), and the given clause Traveling restricts to the empty clause ∅: again a trivial refutation.
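The restrict-then-refute loop on these two partial examples can be reproduced with a small sketch. The encoding and function names are hypothetical, and `accepts` here only detects trivial refutations (an empty clause after restriction); the actual algorithm would run a full proof search in the chosen natural fragment for each restricted instance.

```python
# Sketch of the decision loop over partial examples (trivial refutations only).

def restrict_clause(clause, rho):
    """None if rho satisfies the clause; otherwise the surviving literals."""
    out = set()
    for (var, pol) in clause:
        if var in rho:
            if rho[var] == pol:
                return None          # clause already true under rho
            # literal falsified by rho: drop it
        else:
            out.add((var, pol))
    return out

def accepts(kb_and_negquery, partial_examples, eps):
    """Accept iff at least a (1 - eps) fraction of examples yield a refutation."""
    hits = 0
    for rho in partial_examples:
        restricted = [restrict_clause(c, rho) for c in kb_and_negquery]
        if any(c is not None and len(c) == 0 for c in restricted):
            hits += 1                # trivial refutation under this example
    return hits / len(partial_examples) >= 1 - eps

IG, TR, AL = "In_grave", "Traveling", "Alive"
cnf = [{(IG, True)}, {(TR, True)},
       {(IG, False), (AL, False)}, {(TR, False), (AL, True)}]
rhos = [{IG: False, AL: True},       # rho1 from the slides
        {TR: False, AL: False}]      # rho2 from the slides
print(accepts(cnf, rhos, eps=0.1))   # True: every example yields a refutation
```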
The algorithm uses 1/γ² · log(1/δ) partial examples to distinguish the following cases w.p. 1-δ:
• The query φ is not (1-ε-γ)-valid
• There exists a (1-ε+γ)-witnessed formula ψ for which there exists a proof of the query φ from ψ

LEARN ANY ψ THAT HELPS VALIDATE THE QUERY φ. N.B.: ψ MAY NOT BE 1-VALID
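The sample count is a standard Hoeffding-style bound: with separation γ between the accept and reject thresholds, on the order of 1/γ² · log(1/δ) examples suffice. A sketch (the constant factor here is illustrative, not the paper's exact statement):

```python
import math

def num_examples(gamma, delta):
    """Hoeffding-style count of examples to separate the two cases w.p. 1 - delta."""
    return math.ceil(math.log(1 / delta) / (2 * gamma ** 2))

print(num_examples(0.05, 0.01))   # 922 partial examples suffice here
```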
The theorem, formally
• Note that resolution is sound…
– So, whenever a proof of φ|ρi exists, φ was satisfied by the example from D
⇒ If φ is not (1-ε-γ)-valid, tail bounds imply that it is unlikely that a (1-ε) fraction satisfied φ
• On the other hand, consider the proof of φ from the (1-ε+γ)-witnessed CNF ψ…
– With probability (1-ε+γ), all of the clauses of ψ simplify to 1
⇒ The restricted proof does not require the clauses of ψ
Analysis
“Implicitly learned”
Recap: this work…
• Proposed a criterion for learnability of common sense rules in reasoning: “witnessed evaluation”
• Presented a simple algorithm for efficiently considering all such rules as premises for reasoning in any “natural” (tractable) fragment
– “Natural” (defined by Beame, Kautz, Sabharwal, JAIR 2004) means: “closed under plugging in partial info.”
– Tolerant to counterexamples as appropriate for application to “common sense” reasoning
Prior work: Learning to Reason
• Khardon & Roth (JACM 1997) showed that O(log n)-CNF queries can be efficiently answered using complete examples
– No mention of theorem-proving whatsoever!
– Could only handle low-width queries under incomplete information (Mach. Learn. 1999)
• Noise-tolerant learning captures (some kinds of) common sense (Roth, IJCAI’95)
Work in progress
• Further integration of learning and reasoning
– Deciding general RES for limited learning problems in quasipoly-time: arXiv:1304.4633
– Limits of this approach: ECCC TR13-094
• Integration with “fancier” semantics (e.g., naf)
– The point: we want to consider proofs using such “implicitly learned” facts & rules
Future work
• Empirical validation– Good domain?
• Explicit learning of premises
– Not hard for our fragments under “bounded concealment” (Michael, AIJ 2010)
– But: this won’t tolerate counterexamples!