Toward a Universal Inference Engine
Henry Kautz, University of Washington
With Fahiem Bacchus, Paul Beame, Toni Pitassi, Ashish Sabharwal, & Tian Sang
Universal Inference Engine

Old dream of AI:
- General Problem Solver – Newell & Simon
- Logic + Inference – McCarthy & Hayes

Reality:
- 1962 – 50 variable toy SAT problems
- 1992 – 300 variable non-trivial problems
- 1996 – 1,000 variable difficult problems
- 2002 – 1,000,000 variable real-world problems
Pieces of the Puzzle

- Good old Davis-Putnam-Logemann-Loveland (DPLL)
- Clause learning (nogood-caching)
- Randomized restarts
- Component analysis
- Formula caching
- Learning domain-specific heuristics
GeneralityGenerality
SATSAT #SAT#SAT Bayesian NetworksBayesian Networks Bounded-alternation Quantified Boolean Bounded-alternation Quantified Boolean
formulasformulas Quantified Boolean formulasQuantified Boolean formulas Stochastic SATStochastic SAT
#P complete
NP complete
PSPACE complete
1. Clause Learning
with Paul Beame & Ashish Sabharwal
DPLL Algorithm

DPLL(F)
  // Perform unit propagation
  while exists unit clause (y) ∈ F
    F ← F|y   // remove all clauses containing y; shrink all clauses containing ¬y
  if F is empty, report satisfiable and halt
  if F contains the empty clause, return
  else choose a literal x
    DPLL(F|x)
    DPLL(F|¬x)
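The loop above can be rendered as a minimal, runnable sketch. This is an illustration, not a production solver: clauses are frozensets of nonzero ints (a negative int is a negated variable), and the branching choice is deliberately naive.

```python
def unit_propagate(clauses):
    """Apply F|y for each unit clause (y) until none remain."""
    clauses = set(clauses)
    while True:
        units = [next(iter(c)) for c in clauses if len(c) == 1]
        if not units:
            return clauses
        y = units[0]
        # F|y: remove all clauses containing y; shrink clauses containing -y
        clauses = {c - {-y} for c in clauses if y not in c}

def dpll(clauses):
    clauses = unit_propagate(clauses)
    if not clauses:
        return True                      # empty formula: satisfiable
    if frozenset() in clauses:
        return False                     # empty clause: conflict
    x = next(iter(next(iter(clauses))))  # naive choice of branching literal
    # branch by adding the unit clause (x), then (-x)
    return dpll(clauses | {frozenset([x])}) or dpll(clauses | {frozenset([-x])})
```

For example, `dpll({frozenset([1]), frozenset([-1])})` reports unsatisfiability, since propagating either unit produces the empty clause.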
Extending DPLL: Clause Learning

- Added conflict clauses capture reasons of conflicts, obtained via unit propagation from known clauses
- Reduce future search by producing conflicts sooner
- When backtracking in DPLL, add new clauses corresponding to causes of failure of the search
  - EBL [Stallman & Sussman 77, de Kleer & Williams 87]
  - CSP [Dechter 90]
  - CL [Bayardo-Schrag 97, Marques-Silva & Sakallah 96, Zhang 97, Moskewicz et al. 01, Zhang et al. 01]
Conflict Graphs

[Figure: a conflict graph. Current decisions: p = false, q = false, b = true. Known clauses: (p q a), (a b t), (t x1), (t x2), (t x3), (x1 x2 x3 y), (x2 y). Unit propagation from the decisions derives a, t, x1, x2, x3, y, and its complement, reaching false. Different cuts across the graph yield different learned clauses: the Decision scheme learns (p q b), the FirstNewCut scheme learns (x1 x2 x3), and the 1-UIP scheme learns (t).]
CL Critical to Performance

Best current SAT algorithms rely heavily on CL for good behavior on real-world problems:
- GRASP [Marques-Silva & Sakallah 96], SATO [H. Zhang 97]
- zChaff [Moskewicz et al. 01], Berkmin [Goldberg & Novikov 02]

However:
- No good understanding of strengths and weaknesses of CL
- Not much insight on why it works well when it does
Harnessing the Power of Clause Learning
(Beame, Kautz, & Sabharwal 2003)

- A mathematical framework for analyzing clause learning
- A characterization of its power in relation to well-studied topics in proof complexity theory
- Ways to improve solver performance based on formal analysis
Proofs of Unsatisfiability

When F is unsatisfiable, the trace of DPLL on F is a proof of its unsatisfiability. A bound on the shortest proof of F gives a bound on the best possible implementation.
- Upper bound – “There is a proof no larger than K”: potential for finding proofs quickly, with the best possible branching heuristic, backtracking, etc.
- Lower bound – “Shortest proof is at least size K”: inherent limitations of the algorithm or proof system
Proof System: Resolution

[Figure: a resolution refutation of an unsatisfiable CNF formula F over variables a, b, c. Resolving the input clauses on a, b, and c produces intermediate clauses such as (c) and (b c) and finally the empty clause. Proof size = 9.]
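The resolution rule used throughout these proofs can be written as a one-line sketch (clauses as frozensets of signed ints, as before):

```python
def resolve(c1, c2, v):
    """Resolve c1 (containing literal v) with c2 (containing -v) on variable v."""
    assert v in c1 and -v in c2
    return (c1 - {v}) | (c2 - {-v})

# resolving (1 v 2) with (-1 v 3) on variable 1 yields (2 v 3)
print(sorted(resolve(frozenset({1, 2}), frozenset({-1, 3}), 1)))  # [2, 3]
```

A refutation is a sequence of such steps ending in the empty frozenset.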
Special Cases of Resolution

- Tree-like resolution
  - Graph of inferences forms a tree
  - Corresponds to DPLL
- Regular resolution
  - A variable can be resolved on only once on any path from input to the empty clause
  - Directed acyclic graph analog of the DPLL tree: natural to not branch on a variable once it has been eliminated
  - Used in the original DP procedure [Davis-Putnam 60]
Proof System Hierarchy

[Figure: nested spaces of formulas with poly-size proofs. Tree-like RES ⊂ Regular RES ⊂ General RES ⊂ Frege systems ⊂ …, with separations witnessed by [Bonet et al. 00], [Alekhnovich et al. 02], and the pigeonhole principle [Haken 85].]
Thm 1. CL can beat Regular RES

[Figure: DPLL ⊆ CL, and CL reaches beyond Regular RES into General RES. Example formulas with a poly-size RES proof but exp-size Regular proofs: the ordering principle GTn and the pebbling formulas Peb [Alekhnovich et al. 02]. For such a formula f, the proof trace extension PT(f,π) has a poly-size CL proof but only exp-size Regular proofs.]
PT(f,π): Proof Trace Extension

Start with an unsatisfiable formula f with a poly-size RES proof π. PT(f,π) contains:
- All clauses of f
- For each derived clause Q = (a ∨ b ∨ c) in π:
  - a trace variable tQ
  - new clauses (tQ ∨ ¬a), (tQ ∨ ¬b), (tQ ∨ ¬c)

The CL proof of PT(f,π) works by branching negatively on the tQ’s in bottom-up order of the clauses of π.
[Figure: a fragment of the RES proof π of f, in which Q = (a ∨ b ∨ c) is the resolvent of (a ∨ b ∨ x) and (c ∨ ¬x). PT(f,π) adds the trace variable tQ and the new clauses (tQ ∨ ¬a), (tQ ∨ ¬b), (tQ ∨ ¬c). Branching on ¬tQ unit-propagates ¬a, ¬b, ¬c, then x and ¬x, reaching false; the FirstNewCut scheme learns (a ∨ b ∨ c), i.e. Q itself.]
How hard is PT(f,π)?

Hard for Regular RES, by a reduction argument:
- Fact 1: PT(f,π)|TraceVars = true is exactly f
- Fact 2: If ρ is a Regular RES proof of g, then ρ|x is a Regular RES proof of g|x
- Fact 3: f does not have small Regular RES proofs!

Easy for CL, by construction: CL branches exactly once on each trace variable, so # branches = size(π) = poly.
Implications?

- DPLL algorithms without clause learning are hopeless for certain formula classes
- CL algorithms have the potential for small proofs
- Can we use such analysis to harness this potential?
Pebbling Formulas

fG = Pebbling(G), for a DAG G with sources A, B, C, internal nodes D, E, F, and target T. A node X is “pebbled” if (x1 ∨ x2) holds.
- Source axioms: A, B, C are pebbled: (a1 ∨ a2), (b1 ∨ b2), (c1 ∨ c2)
- Pebbling axioms: A and B are pebbled ⇒ D is pebbled, and so on up the graph through (d1 ∨ d2), (e1 ∨ e2), (t1 ∨ t2)
- Target axioms: T is not pebbled
Grid vs. Randomized Pebbling

[Figure: left, a randomized pebbling graph with nodes of varying arity, e.g. (a1 a2), b1, (c1 c2 c3), (d1 d2 d3), e1, f1, (g1 g2), (h1 h2), (i1 i2 i3 i4), l1, m1, (n1 n2); right, a grid pebbling graph whose nodes are all two-literal, from (a1 a2), (b1 b2), (c1 c2), (d1 d2) at the bottom up to (t1 t2) at the target.]
Branching Sequence

B = (x1, x4, ¬x3, x1, ¬x8, ¬x2, ¬x4, x7, ¬x1, x2)

- OLD: “Pick unassigned var x”
- NEW: “Pick next literal y from B; delete it from B; if y already assigned, repeat”
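The NEW rule above is a small change to the branching step; a sketch (the function names and the fallback to the default heuristic once B is exhausted are assumptions, not zChaff’s actual code):

```python
def pick_branch_literal(B, assigned, default_heuristic):
    """Consume literals from branching sequence B; skip assigned variables.

    B: list of signed ints (negative = branch false), mutated in place.
    assigned: set of variables that already have values.
    """
    while B:
        y = B.pop(0)                # next literal from B; delete it from B
        if abs(y) not in assigned:
            return y                # branch on y with its given sign
        # y's variable is already assigned: repeat with the next literal
    return default_heuristic()      # sequence exhausted: fall back
```

For example, with B = [1, 4, -3] and variable 1 already assigned, the rule skips 1 and branches on 4, leaving [-3] for later.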
Statement of Results

Given a pebbling graph G, we can efficiently generate a branching sequence BG such that DPLL-Learn*(fG, BG) is empirically exponentially faster than DPLL-Learn*(fG).
- DPLL-Learn*: any clause learner with the 1-UIP learning scheme and fast backtracking, e.g. zChaff [Moskewicz et al. ’01]
- Efficient: O(|fG|) time to generate BG
- Effective: O(|fG|) branching steps to solve fG using BG
Genseq on Grid Pebbling Graphs

[Figure: a grid pebbling graph with nodes (a1 a2), (b1 b2), (c1 c2), (d1 d2), (e1 e2), (f1 f2), (g1 g2), (h1 h2), (i1 i2), (t1 t2).]
Results: Grid Pebbling

Max formula size solved (24 hours; 512 MB memory), by zChaff settings:

  Learning OFF, Branching Seq OFF (Naive DPLL):        45 vars unsat / 55 vars sat
  Learning OFF, Branching Seq ON:                      45 vars unsat / 55 vars sat
  Learning ON,  Branching Seq OFF (Original zChaff):   2,000 vars unsat / 4,500 vars sat
  Learning ON,  Branching Seq ON (Modified zChaff):    2,500,000 vars unsat / 1,000,000 vars sat
Results: Randomized Pebbling

Max formula size solved (24 hours; 512 MB memory), by zChaff settings:

  Learning OFF, Branching Seq OFF (Naive DPLL):        35 vars unsat / 35 vars sat
  Learning OFF, Branching Seq ON:                      45 vars unsat / 45 vars sat
  Learning ON,  Branching Seq OFF (Original zChaff):   350 vars unsat / 350 vars sat
  Learning ON,  Branching Seq ON (Modified zChaff):    45,000 vars unsat / 20,000 vars sat
Restarts

- Run-time distribution typically has high variance across instances or random seeds (due to tie-breaking in the branching heuristic)
- Often heavy-tailed: infinite mean & variance!
- Leverage this with restart strategies: restarts turn a heavy-tailed distribution into an exponential distribution
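One common way to exploit a heavy-tailed run-time distribution is to rerun the randomized solver under a growing cutoff; a sketch under stated assumptions (the `run_solver` interface and the geometric cutoff schedule are illustrative, not any particular solver’s policy):

```python
def solve_with_restarts(run_solver, cutoff=100, growth=1.5, max_restarts=20):
    """Repeatedly run a randomized solver, restarting when a cutoff is hit.

    run_solver(cutoff, seed) is assumed to return a result if it finishes
    within `cutoff` steps, or None if it hits the cutoff.
    """
    for i in range(max_restarts):
        result = run_solver(cutoff=int(cutoff), seed=i)
        if result is not None:
            return result          # a short run succeeded
        cutoff *= growth           # give up on this run; restart, larger cutoff
    return None
```

With a heavy-tailed distribution, the chance that some fresh run is short is high, so the expected total work under restarts can be dramatically lower than a single long run.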
Generalized Restarts

At a conflict, backtrack to an arbitrary point in the search tree:
- Lowest conflict decision variable = backjumping
- Root = restart
- Other = partial restart

Adding clause learning makes almost any restart scheme complete (J. Marques-Silva 2002).
Aggressive Backtracking

zChaff: at a conflict, backtrack to above the highest conflict variable. This is not traditional backjumping! Wasteful?
- The learned clause saves “most” of the work
- The learned clause provides new evidence about the best branching variable and value!
Why #SAT?

- Prototypical #P-complete problem
- Can encode probabilistic inference
- Natural encoding for counting problems
Bayesian Nets to Weighted Counting

Introduce new vars so all internal vars are deterministic.

[Example network: A → B, with Pr(A) = .1, Pr(B|A) = .2, Pr(B|¬A) = .6]
Add chance variables P and Q with Pr(P) = .2 and Pr(Q) = .6, and make B deterministic:

B ⇔ (A ∧ P) ∨ (¬A ∧ Q)
- The weight of a model is the product of its variable weights
- The weight of a formula is the sum of the weights of its models
- Let F be the formula defining all internal variables
- Pr(query) = weight(F ∧ query)
Bayesian Nets to Counting

- Unweighted counting is the case where all non-defined variables have weight 0.5
- Introduce sets of variables to define other probabilities to desired accuracy
- In practice: just modify the #SAT algorithm to weighted #SAT
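The tiny A → B network from the preceding slides can be checked by brute-force weighted counting (a sketch; the enumeration is exponential and only meant to make the definitions concrete):

```python
from itertools import product

# Pr(var = True) for the chance variables; B is deterministic (no weight).
weight = {'A': 0.1, 'P': 0.2, 'Q': 0.6}

def weighted_count(query):
    """Sum the weights of all models of F & query, where F defines B."""
    total = 0.0
    for A, P, Q in product([True, False], repeat=3):
        B = (A and P) or (not A and Q)   # the defining formula F
        if query(A, B):
            w = 1.0
            for var, val in (('A', A), ('P', P), ('Q', Q)):
                w *= weight[var] if val else 1 - weight[var]
            total += w                   # weight of a model: product of weights
    return total

# Pr(B) = weight(F & B) = .1*.2 + .9*.6
print(round(weighted_count(lambda A, B: B), 4))  # 0.56
```

This agrees with reading Pr(B) straight off the CPT: .2 × .1 + .6 × .9 = .56.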
Component Analysis

- Can use DPLL to count models: just don’t stop when the first assignment is found
- If the formula breaks into separate components (no shared variables), count each separately and multiply the results: #SAT(C1 ∧ C2) = #SAT(C1) × #SAT(C2)
- RelSat (Bayardo): CL + component analysis at each node in the search tree
- 50 variable #SAT: state of the art circa 2000
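Splitting a formula into variable-disjoint components can be sketched with a union-find pass over the clauses (an illustration; clauses are tuples of signed ints and assumed nonempty):

```python
def components(clauses):
    """Partition nonempty clauses into groups sharing no variables."""
    parent = {}

    def find(v):
        while parent.setdefault(v, v) != v:
            parent[v] = parent[parent[v]]   # path halving
            v = parent[v]
        return v

    # union all variables appearing in the same clause
    for c in clauses:
        vs = [abs(l) for l in c]
        for v in vs[1:]:
            parent[find(vs[0])] = find(v)

    groups = {}
    for c in clauses:
        groups.setdefault(find(abs(c[0])), []).append(c)
    return list(groups.values())

# (1 v 2) shares no variable with (3 v 4) and (-3): two components
print(len(components([(1, 2), (3, 4), (-3,)])))  # 2
```

Each returned group can then be counted independently and the counts multiplied, exactly as the #SAT(C1 ∧ C2) identity above states.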
5. Formula Caching
with Fahiem Bacchus, Paul Beame, Toni Pitassi, & Tian Sang
Formula Caching

New idea: cache counts of residual formulas at each node.
- Bacchus, Dalmao & Pitassi 2003
- Beame, Impagliazzo, Pitassi, & Segerlind 2003

Matches the time/space tradeoffs of the best known exact probabilistic inference algorithms: time n^O(1) · 2^O(w), where w is the tree-width of the formula, or 2^O(w log n) if only linear space is used for the cache.
#SAT with Component Caching

#SAT(F)
  a = 1;
  for each G ∈ to_components(F) {
    if (G == ∅) m = 1;
    else if (∅ ∈ G) m = 0;
    else if (in_cache(G)) m = cache_value(G);
    else { select v ∈ G;
      m = ½ * #SAT(G|v) + ½ * #SAT(G|¬v);
      insert_cache(G,m); }
    a = a * m; }
  return a;

Computes the probability that a random truth assignment satisfies the formula: # models = 2^n · #SAT(F)
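A direct, unoptimized Python rendering of this recursion (representation, the flood-fill component split, and the naive variable choice are assumptions for illustration). `Fraction` keeps the halving exact, and `# models = 2^n · count_sat(F)`:

```python
from fractions import Fraction

cache = {}

def restrict(F, lit):
    """F|lit: drop clauses containing lit, shrink clauses containing -lit."""
    return frozenset(c - {-lit} for c in F if lit not in c)

def split_components(F):
    """Variable-disjoint components of F, by simple flood fill."""
    clauses = list(F)
    comps, used = [], [False] * len(clauses)
    for i in range(len(clauses)):
        if used[i]:
            continue
        used[i] = True
        comp, vars_, stack = set(), set(), [i]
        while stack:
            j = stack.pop()
            comp.add(clauses[j])
            vars_ |= {abs(l) for l in clauses[j]}
            for k in range(len(clauses)):
                if not used[k] and vars_ & {abs(l) for l in clauses[k]}:
                    used[k] = True
                    stack.append(k)
        comps.append(frozenset(comp))
    return comps

def count_sat(F):
    """Probability that a random assignment satisfies F."""
    a = Fraction(1)
    for G in split_components(F):       # empty F: no components, a stays 1
        if frozenset() in G:
            m = Fraction(0)             # empty clause: unsatisfiable
        elif G in cache:
            m = cache[G]
        else:
            v = abs(next(iter(next(iter(G)))))   # naive variable choice
            m = (count_sat(restrict(G, v)) + count_sat(restrict(G, -v))) / 2
            cache[G] = m
        a *= m
    return a
```

For example, the single clause (1 ∨ 2) gives probability 3/4, i.e. 3 of the 4 assignments, and two disjoint such clauses give (3/4)² by the component product rule.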
Putting it All Together

Goal: combine clause learning, component analysis, and formula caching to create a practical #SAT algorithm. Not quite as straightforward as it looks!
Issue 1: How Much to Cache?Issue 1: How Much to Cache?
EverythingEverything Infeasible – 10Infeasible – 105050 + nodes + nodes
Only sub-formulas on current branchOnly sub-formulas on current branch Linear spaceLinear space Fixed variable ordering + no clause learning Fixed variable ordering + no clause learning
== Recursive Conditioning (Darwiche 2002)== Recursive Conditioning (Darwiche 2002) Surely we can do better...Surely we can do better...
Efficient Cache Management

Ideal: make maximum use of RAM, but not one bit more.
- Space- & age-bounded caching
- Separate-chaining hash table
- Lazy deletion of entries older than K when searching chains: constant amortized time
- If the sum of all chains becomes too large, do a global cleanup (rare in practice)
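The lazy-deletion scheme above can be sketched as follows (the age stamp, the `MAX_AGE` parameter, and all names are assumptions for illustration, not the solver’s actual code):

```python
MAX_AGE = 1000  # assumed age bound "K" in cache operations

class AgeBoundedCache:
    """Separate-chaining table; lookups lazily evict stale entries."""

    def __init__(self, nbuckets=1 << 10):
        self.buckets = [[] for _ in range(nbuckets)]
        self.clock = 0                      # counts cache operations

    def _chain(self, key):
        return self.buckets[hash(key) % len(self.buckets)]

    def insert(self, key, value):
        self.clock += 1
        self._chain(key).append((key, value, self.clock))

    def lookup(self, key):
        self.clock += 1
        chain = self._chain(key)
        # lazy deletion while scanning the chain: constant amortized time
        chain[:] = [e for e in chain if self.clock - e[2] <= MAX_AGE]
        for k, v, _ in chain:
            if k == key:
                return v
        return None
```

Entries are only evicted when their chain is touched, so no background sweep is needed; a global cleanup would only fire if the total chain length grew too large.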
Issue 2: Interaction of Component Analysis & Clause Learning

- Without CL, sub-formulas decrease in size
- With CL, sub-formulas may become huge: 1,000 clauses → 1,000,000 learned clauses

[Figure: F splits into F|p and F|¬p]
Why this is a Problem

- Finding connected components at each node requires linear time: way too costly over the learned clauses
- Components involving learned clauses are unlikely to reoccur, defeating the purpose of formula caching
Suggestion

- Use only clauses derived from the original formula for component analysis and as the “keys” for cached entries
- Use all the learned clauses for unit propagation

Can this possibly be sound? Almost!
Main Theorem

[Figure: sub-formulas F|ρ and G|ρ with components A1, A2, A3.]

Therefore: for SAT sub-formulas it is safe to use learned clauses for unit propagation!
UNSAT Sub-formulasUNSAT Sub-formulas
But if But if F|F| is is unsatisfiableunsatisfiable, all bets are off..., all bets are off... WithoutWithout component caching, there is still no component caching, there is still no
problem – because the final value is 0 in any problem – because the final value is 0 in any casecase
With With component cachingcomponent caching, could cause , could cause incorrect values to be cachedincorrect values to be cached
SolutionSolution Flush siblings (& their descendents) of unsat Flush siblings (& their descendents) of unsat
components from cachecomponents from cache
#SAT CC+CL

#SAT(F)
  a = 1; s = ∅;
  for each G ∈ to_components(F) {
    if (in_cache(G)) { m = cache_value(G); }
    else { m = split(G);
      insert_cache(G,m); }
    a = a * m;
    if (m == 0) { flush_cache(s); break; }
    else s = s ∪ {G};
  }
  return a;

split(G)
  if (G == ∅) return 1;
  if (∅ ∈ G) {
    learn_new_clause();
    return 0; }
  select v ∈ G;
  return ½ * #SAT(G|v) + ½ * #SAT(G|¬v);
[Figure: runtime in seconds (log scale, 0.1 to 100,000) vs. clause/variable ratio (0.8 to 2.2) on 75-variable random 3-SAT, comparing relsat and CC+CL.]
Summary

- Dramatic progress in automating propositional inference over the last decade
- Progress due to the careful refinement of a handful of ideas: DPLL, clause learning, restarts, component analysis, formula caching
- The successful unification of these elements for #SAT gives renewed hope for a universal reasoning engine!
What’s Next?

- Evaluation of the weighted-#SAT version on Bayesian networks
- Better component ordering and component-aware variable branching heuristics
- Optimal restart policies for #SAT CC+CL
- Adapt techniques for sampling methods: approximate inference???