Toward a Universal Inference Engine
Henry Kautz, University of Washington
With Fahiem Bacchus, Paul Beame, Toni Pitassi, Ashish Sabharwal, & Tian Sang
Universal Inference Engine

Old dream of AI:
- General Problem Solver – Newell & Simon
- Logic + Inference – McCarthy & Hayes

Reality:
- 1962 – 50 variable toy SAT problems
- 1992 – 300 variable non-trivial problems
- 1996 – 1,000 variable difficult problems
- 2002 – 1,000,000 variable real-world problems
Pieces of the Puzzle

- Good old Davis-Putnam-Logemann-Loveland (DPLL)
- Clause learning (nogood-caching)
- Randomized restarts
- Component analysis
- Formula caching
- Learning domain-specific heuristics
GeneralityGenerality
SATSAT #SAT#SAT Bayesian NetworksBayesian Networks Bounded-alternation Quantified Boolean Bounded-alternation Quantified Boolean
formulasformulas Quantified Boolean formulasQuantified Boolean formulas Stochastic SATStochastic SAT
#P complete
NP complete
PSPACE complete
1. Clause Learning
with Paul Beame & Ashish Sabharwal
DPLL Algorithm

DPLL(F)
  // Perform unit propagation
  while exists unit clause (y) ∈ F
    F ← F|y   // remove all clauses containing y; shrink all clauses containing ¬y
  if F is empty, report satisfiable and halt
  if F contains the empty clause, return
  else choose a literal x
    DPLL(F|x)
    DPLL(F|¬x)
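The loop above can be rendered as a minimal, runnable sketch. This is an illustration, not a production solver: clauses are frozensets of nonzero ints (a negative int is a negated variable), and the branching choice is deliberately naive.

```python
def unit_propagate(clauses):
    """Apply F|y for each unit clause (y) until none remain."""
    clauses = set(clauses)
    while True:
        units = [next(iter(c)) for c in clauses if len(c) == 1]
        if not units:
            return clauses
        y = units[0]
        # F|y: remove all clauses containing y; shrink clauses containing -y
        clauses = {c - {-y} for c in clauses if y not in c}

def dpll(clauses):
    clauses = unit_propagate(clauses)
    if not clauses:
        return True                      # empty formula: satisfiable
    if frozenset() in clauses:
        return False                     # empty clause: conflict
    x = next(iter(next(iter(clauses))))  # naive choice of branching literal
    # branch by adding the unit clause (x), then (-x)
    return dpll(clauses | {frozenset([x])}) or dpll(clauses | {frozenset([-x])})
```

For example, `dpll({frozenset([1]), frozenset([-1])})` reports unsatisfiability, since propagating either unit produces the empty clause.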
Extending DPLL: Clause Learning

- Added conflict clauses capture reasons of conflicts, obtained via unit propagation from known clauses
- Reduce future search by producing conflicts sooner
- When backtracking in DPLL, add new clauses corresponding to causes of failure of the search
  - EBL [Stallman & Sussman 77, de Kleer & Williams 87]
  - CSP [Dechter 90]
  - CL [Bayardo-Schrag 97, Marques-Silva & Sakallah 96, Zhang 97, Moskewicz et al. 01, Zhang et al. 01]
Conflict Graphs

[Figure: a conflict graph. Current decisions: p = false, q = false, b = true. Known clauses: (p q a), (a b t), (t x1), (t x2), (t x3), (x1 x2 x3 y), (x2 y). Unit propagation from the decisions derives a, t, x1, x2, x3, y, and its complement, reaching false. Different cuts across the graph yield different learned clauses: the Decision scheme learns (p q b), the FirstNewCut scheme learns (x1 x2 x3), and the 1-UIP scheme learns (t).]
CL Critical to Performance

Best current SAT algorithms rely heavily on CL for good behavior on real-world problems:
- GRASP [Marques-Silva & Sakallah 96], SATO [H. Zhang 97]
- zChaff [Moskewicz et al. 01], Berkmin [Goldberg & Novikov 02]

However:
- No good understanding of strengths and weaknesses of CL
- Not much insight on why it works well when it does
Harnessing the Power of Clause Learning
(Beame, Kautz, & Sabharwal 2003)

- A mathematical framework for analyzing clause learning
- A characterization of its power in relation to well-studied topics in proof complexity theory
- Ways to improve solver performance based on formal analysis
Proofs of Unsatisfiability

When F is unsatisfiable, the trace of DPLL on F is a proof of its unsatisfiability. A bound on the shortest proof of F gives a bound on the best possible implementation.
- Upper bound – “There is a proof no larger than K”: potential for finding proofs quickly, with the best possible branching heuristic, backtracking, etc.
- Lower bound – “Shortest proof is at least size K”: inherent limitations of the algorithm or proof system
Proof System: Resolution

[Figure: a resolution refutation of an unsatisfiable CNF formula F over variables a, b, c. Resolving the input clauses on a, b, and c produces intermediate clauses such as (c) and (b c) and finally the empty clause. Proof size = 9.]
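The resolution rule used throughout these proofs can be written as a one-line sketch (clauses as frozensets of signed ints, as before):

```python
def resolve(c1, c2, v):
    """Resolve c1 (containing literal v) with c2 (containing -v) on variable v."""
    assert v in c1 and -v in c2
    return (c1 - {v}) | (c2 - {-v})

# resolving (1 v 2) with (-1 v 3) on variable 1 yields (2 v 3)
print(sorted(resolve(frozenset({1, 2}), frozenset({-1, 3}), 1)))  # [2, 3]
```

A refutation is a sequence of such steps ending in the empty frozenset.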
Special Cases of Resolution

- Tree-like resolution
  - Graph of inferences forms a tree
  - Corresponds to DPLL
- Regular resolution
  - A variable can be resolved on only once on any path from input to the empty clause
  - Directed acyclic graph analog of the DPLL tree: natural to not branch on a variable once it has been eliminated
  - Used in the original DP procedure [Davis-Putnam 60]
Proof System Hierarchy

[Figure: nested spaces of formulas with poly-size proofs. Tree-like RES ⊂ Regular RES ⊂ General RES ⊂ Frege systems ⊂ …, with separations witnessed by [Bonet et al. 00], [Alekhnovich et al. 02], and the pigeonhole principle [Haken 85].]
Thm 1. CL can beat Regular RES

[Figure: DPLL ⊆ CL, and CL reaches beyond Regular RES into General RES. Example formulas with a poly-size RES proof but exp-size Regular proofs: the ordering principle GTn and the pebbling formulas Peb [Alekhnovich et al. 02]. For such a formula f, the proof trace extension PT(f,π) has a poly-size CL proof but only exp-size Regular proofs.]
PT(f,π): Proof Trace Extension

Start with an unsatisfiable formula f with a poly-size RES proof π. PT(f,π) contains:
- All clauses of f
- For each derived clause Q = (a ∨ b ∨ c) in π:
  - a trace variable tQ
  - new clauses (tQ ∨ ¬a), (tQ ∨ ¬b), (tQ ∨ ¬c)

The CL proof of PT(f,π) works by branching negatively on the tQ’s in bottom-up order of the clauses of π.
[Figure: a fragment of the RES proof π of f, in which Q = (a ∨ b ∨ c) is the resolvent of (a ∨ b ∨ x) and (c ∨ ¬x). PT(f,π) adds the trace variable tQ and the new clauses (tQ ∨ ¬a), (tQ ∨ ¬b), (tQ ∨ ¬c). Branching on ¬tQ unit-propagates ¬a, ¬b, ¬c, then x and ¬x, reaching false; the FirstNewCut scheme learns (a ∨ b ∨ c), i.e. Q itself.]
How hard is PT(f,π)?

Hard for Regular RES, by a reduction argument:
- Fact 1: PT(f,π)|TraceVars = true is exactly f
- Fact 2: If ρ is a Regular RES proof of g, then ρ|x is a Regular RES proof of g|x
- Fact 3: f does not have small Regular RES proofs!

Easy for CL, by construction: CL branches exactly once on each trace variable, so # branches = size(π) = poly.
Implications?

- DPLL algorithms without clause learning are hopeless for certain formula classes
- CL algorithms have the potential for small proofs
- Can we use such analysis to harness this potential?
Pebbling Formulas

fG = Pebbling(G), for a DAG G with sources A, B, C, internal nodes D, E, F, and target T. A node X is “pebbled” if (x1 ∨ x2) holds.
- Source axioms: A, B, C are pebbled: (a1 ∨ a2), (b1 ∨ b2), (c1 ∨ c2)
- Pebbling axioms: A and B are pebbled ⇒ D is pebbled, and so on up the graph through (d1 ∨ d2), (e1 ∨ e2), (t1 ∨ t2)
- Target axioms: T is not pebbled
Grid vs. Randomized Pebbling

[Figure: left, a randomized pebbling graph with nodes of varying arity, e.g. (a1 a2), b1, (c1 c2 c3), (d1 d2 d3), e1, f1, (g1 g2), (h1 h2), (i1 i2 i3 i4), l1, m1, (n1 n2); right, a grid pebbling graph whose nodes are all two-literal, from (a1 a2), (b1 b2), (c1 c2), (d1 d2) at the bottom up to (t1 t2) at the target.]
Branching Sequence

B = (x1, x4, ¬x3, x1, ¬x8, ¬x2, ¬x4, x7, ¬x1, x2)

- OLD: “Pick unassigned var x”
- NEW: “Pick next literal y from B; delete it from B; if y already assigned, repeat”
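The NEW rule above is a small change to the branching step; a sketch (the function names and the fallback to the default heuristic once B is exhausted are assumptions, not zChaff’s actual code):

```python
def pick_branch_literal(B, assigned, default_heuristic):
    """Consume literals from branching sequence B; skip assigned variables.

    B: list of signed ints (negative = branch false), mutated in place.
    assigned: set of variables that already have values.
    """
    while B:
        y = B.pop(0)                # next literal from B; delete it from B
        if abs(y) not in assigned:
            return y                # branch on y with its given sign
        # y's variable is already assigned: repeat with the next literal
    return default_heuristic()      # sequence exhausted: fall back
```

For example, with B = [1, 4, -3] and variable 1 already assigned, the rule skips 1 and branches on 4, leaving [-3] for later.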
Statement of Results

Given a pebbling graph G, we can efficiently generate a branching sequence BG such that DPLL-Learn*(fG, BG) is empirically exponentially faster than DPLL-Learn*(fG).
- DPLL-Learn*: any clause learner with the 1-UIP learning scheme and fast backtracking, e.g. zChaff [Moskewicz et al. ’01]
- Efficient: O(|fG|) time to generate BG
- Effective: O(|fG|) branching steps to solve fG using BG
Genseq on Grid Pebbling Graphs

[Figure: a grid pebbling graph with nodes (a1 a2), (b1 b2), (c1 c2), (d1 d2), (e1 e2), (f1 f2), (g1 g2), (h1 h2), (i1 i2), (t1 t2).]
Results: Grid Pebbling

Max formula size solved (24 hours; 512 MB memory), by zChaff settings:

  Learning OFF, Branching Seq OFF (Naive DPLL):        45 vars unsat / 55 vars sat
  Learning OFF, Branching Seq ON:                      45 vars unsat / 55 vars sat
  Learning ON,  Branching Seq OFF (Original zChaff):   2,000 vars unsat / 4,500 vars sat
  Learning ON,  Branching Seq ON (Modified zChaff):    2,500,000 vars unsat / 1,000,000 vars sat
Results: Randomized Pebbling

Max formula size solved (24 hours; 512 MB memory), by zChaff settings:

  Learning OFF, Branching Seq OFF (Naive DPLL):        35 vars unsat / 35 vars sat
  Learning OFF, Branching Seq ON:                      45 vars unsat / 45 vars sat
  Learning ON,  Branching Seq OFF (Original zChaff):   350 vars unsat / 350 vars sat
  Learning ON,  Branching Seq ON (Modified zChaff):    45,000 vars unsat / 20,000 vars sat
Restarts

- Run-time distribution typically has high variance across instances or random seeds (due to tie-breaking in the branching heuristic)
- Often heavy-tailed: infinite mean & variance!
- Leverage this with restart strategies: restarts turn a heavy-tailed distribution into an exponential distribution
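One common way to exploit a heavy-tailed run-time distribution is to rerun the randomized solver under a growing cutoff; a sketch under stated assumptions (the `run_solver` interface and the geometric cutoff schedule are illustrative, not any particular solver’s policy):

```python
def solve_with_restarts(run_solver, cutoff=100, growth=1.5, max_restarts=20):
    """Repeatedly run a randomized solver, restarting when a cutoff is hit.

    run_solver(cutoff, seed) is assumed to return a result if it finishes
    within `cutoff` steps, or None if it hits the cutoff.
    """
    for i in range(max_restarts):
        result = run_solver(cutoff=int(cutoff), seed=i)
        if result is not None:
            return result          # a short run succeeded
        cutoff *= growth           # give up on this run; restart, larger cutoff
    return None
```

With a heavy-tailed distribution, the chance that some fresh run is short is high, so the expected total work under restarts can be dramatically lower than a single long run.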
Generalized Restarts

At a conflict, backtrack to an arbitrary point in the search tree:
- Lowest conflict decision variable = backjumping
- Root = restart
- Other = partial restart

Adding clause learning makes almost any restart scheme complete (J. Marques-Silva 2002).
Aggressive Backtracking

zChaff: at a conflict, backtrack to above the highest conflict variable. This is not traditional backjumping! Wasteful?
- The learned clause saves “most” of the work
- The learned clause provides new evidence about the best branching variable and value!
Why #SAT?

- Prototypical #P-complete problem
- Can encode probabilistic inference
- Natural encoding for counting problems
Bayesian Nets to Weighted Counting

Introduce new vars so all internal vars are deterministic.

[Example network: A → B, with Pr(A) = .1, Pr(B|A) = .2, Pr(B|¬A) = .6]
Add chance variables P and Q with Pr(P) = .2 and Pr(Q) = .6, and make B deterministic:

B ⇔ (A ∧ P) ∨ (¬A ∧ Q)
- The weight of a model is the product of its variable weights
- The weight of a formula is the sum of the weights of its models
- Let F be the formula defining all internal variables
- Pr(query) = weight(F ∧ query)
Bayesian Nets to Counting

- Unweighted counting is the case where all non-defined variables have weight 0.5
- Introduce sets of variables to define other probabilities to desired accuracy
- In practice: just modify the #SAT algorithm to weighted #SAT
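The tiny A → B network from the preceding slides can be checked by brute-force weighted counting (a sketch; the enumeration is exponential and only meant to make the definitions concrete):

```python
from itertools import product

# Pr(var = True) for the chance variables; B is deterministic (no weight).
weight = {'A': 0.1, 'P': 0.2, 'Q': 0.6}

def weighted_count(query):
    """Sum the weights of all models of F & query, where F defines B."""
    total = 0.0
    for A, P, Q in product([True, False], repeat=3):
        B = (A and P) or (not A and Q)   # the defining formula F
        if query(A, B):
            w = 1.0
            for var, val in (('A', A), ('P', P), ('Q', Q)):
                w *= weight[var] if val else 1 - weight[var]
            total += w                   # weight of a model: product of weights
    return total

# Pr(B) = weight(F & B) = .1*.2 + .9*.6
print(round(weighted_count(lambda A, B: B), 4))  # 0.56
```

This agrees with reading Pr(B) straight off the CPT: .2 × .1 + .6 × .9 = .56.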
Component Analysis

- Can use DPLL to count models: just don’t stop when the first assignment is found
- If the formula breaks into separate components (no shared variables), count each separately and multiply the results: #SAT(C1 ∧ C2) = #SAT(C1) × #SAT(C2)
- RelSat (Bayardo): CL + component analysis at each node in the search tree
- 50 variable #SAT: state of the art circa 2000
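Splitting a formula into variable-disjoint components can be sketched with a union-find pass over the clauses (an illustration; clauses are tuples of signed ints and assumed nonempty):

```python
def components(clauses):
    """Partition nonempty clauses into groups sharing no variables."""
    parent = {}

    def find(v):
        while parent.setdefault(v, v) != v:
            parent[v] = parent[parent[v]]   # path halving
            v = parent[v]
        return v

    # union all variables appearing in the same clause
    for c in clauses:
        vs = [abs(l) for l in c]
        for v in vs[1:]:
            parent[find(vs[0])] = find(v)

    groups = {}
    for c in clauses:
        groups.setdefault(find(abs(c[0])), []).append(c)
    return list(groups.values())

# (1 v 2) shares no variable with (3 v 4) and (-3): two components
print(len(components([(1, 2), (3, 4), (-3,)])))  # 2
```

Each returned group can then be counted independently and the counts multiplied, exactly as the #SAT(C1 ∧ C2) identity above states.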
5. Formula Caching
with Fahiem Bacchus, Paul Beame, Toni Pitassi, & Tian Sang
Formula Caching

New idea: cache counts of residual formulas at each node.
- Bacchus, Dalmao & Pitassi 2003
- Beame, Impagliazzo, Pitassi, & Segerlind 2003

Matches the time/space tradeoffs of the best known exact probabilistic inference algorithms: time n^O(1) · 2^O(w), where w is the tree-width of the formula, or 2^O(w log n) if only linear space is used for the cache.
#SAT with Component Caching

#SAT(F)
  a = 1;
  for each G ∈ to_components(F) {
    if (G == ∅) m = 1;
    else if (∅ ∈ G) m = 0;
    else if (in_cache(G)) m = cache_value(G);
    else { select v ∈ G;
      m = ½ * #SAT(G|v) + ½ * #SAT(G|¬v);
      insert_cache(G,m); }
    a = a * m; }
  return a;

Computes the probability that a random truth assignment satisfies the formula: # models = 2^n · #SAT(F)
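A direct, unoptimized Python rendering of this recursion (representation, the flood-fill component split, and the naive variable choice are assumptions for illustration). `Fraction` keeps the halving exact, and `# models = 2^n · count_sat(F)`:

```python
from fractions import Fraction

cache = {}

def restrict(F, lit):
    """F|lit: drop clauses containing lit, shrink clauses containing -lit."""
    return frozenset(c - {-lit} for c in F if lit not in c)

def split_components(F):
    """Variable-disjoint components of F, by simple flood fill."""
    clauses = list(F)
    comps, used = [], [False] * len(clauses)
    for i in range(len(clauses)):
        if used[i]:
            continue
        used[i] = True
        comp, vars_, stack = set(), set(), [i]
        while stack:
            j = stack.pop()
            comp.add(clauses[j])
            vars_ |= {abs(l) for l in clauses[j]}
            for k in range(len(clauses)):
                if not used[k] and vars_ & {abs(l) for l in clauses[k]}:
                    used[k] = True
                    stack.append(k)
        comps.append(frozenset(comp))
    return comps

def count_sat(F):
    """Probability that a random assignment satisfies F."""
    a = Fraction(1)
    for G in split_components(F):       # empty F: no components, a stays 1
        if frozenset() in G:
            m = Fraction(0)             # empty clause: unsatisfiable
        elif G in cache:
            m = cache[G]
        else:
            v = abs(next(iter(next(iter(G)))))   # naive variable choice
            m = (count_sat(restrict(G, v)) + count_sat(restrict(G, -v))) / 2
            cache[G] = m
        a *= m
    return a
```

For example, the single clause (1 ∨ 2) gives probability 3/4, i.e. 3 of the 4 assignments, and two disjoint such clauses give (3/4)² by the component product rule.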
Putting it All Together

Goal: combine clause learning, component analysis, and formula caching to create a practical #SAT algorithm. Not quite as straightforward as it looks!
Issue 1: How Much to Cache?Issue 1: How Much to Cache?
EverythingEverything Infeasible – 10Infeasible – 105050 + nodes + nodes
Only sub-formulas on current branchOnly sub-formulas on current branch Linear spaceLinear space Fixed variable ordering + no clause learning Fixed variable ordering + no clause learning
== Recursive Conditioning (Darwiche 2002)== Recursive Conditioning (Darwiche 2002) Surely we can do better...Surely we can do better...
Efficient Cache Management

Ideal: make maximum use of RAM, but not one bit more.
- Space- & age-bounded caching
- Separate-chaining hash table
- Lazy deletion of entries older than K when searching chains: constant amortized time
- If the sum of all chains becomes too large, do a global cleanup (rare in practice)
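The lazy-deletion scheme above can be sketched as follows (the age stamp, the `MAX_AGE` parameter, and all names are assumptions for illustration, not the solver’s actual code):

```python
MAX_AGE = 1000  # assumed age bound "K" in cache operations

class AgeBoundedCache:
    """Separate-chaining table; lookups lazily evict stale entries."""

    def __init__(self, nbuckets=1 << 10):
        self.buckets = [[] for _ in range(nbuckets)]
        self.clock = 0                      # counts cache operations

    def _chain(self, key):
        return self.buckets[hash(key) % len(self.buckets)]

    def insert(self, key, value):
        self.clock += 1
        self._chain(key).append((key, value, self.clock))

    def lookup(self, key):
        self.clock += 1
        chain = self._chain(key)
        # lazy deletion while scanning the chain: constant amortized time
        chain[:] = [e for e in chain if self.clock - e[2] <= MAX_AGE]
        for k, v, _ in chain:
            if k == key:
                return v
        return None
```

Entries are only evicted when their chain is touched, so no background sweep is needed; a global cleanup would only fire if the total chain length grew too large.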
Issue 2: Interaction of Component Analysis & Clause Learning

- Without CL, sub-formulas decrease in size
- With CL, sub-formulas may become huge: 1,000 clauses → 1,000,000 learned clauses

[Figure: F splits into F|p and F|¬p]
Why this is a Problem

- Finding connected components at each node requires linear time: way too costly over the learned clauses
- Components involving learned clauses are unlikely to reoccur, defeating the purpose of formula caching
Suggestion

- Use only clauses derived from the original formula for component analysis and as the “keys” for cached entries
- Use all the learned clauses for unit propagation

Can this possibly be sound? Almost!
Main Theorem

[Figure: sub-formulas F|ρ and G|ρ with components A1, A2, A3.]

Therefore: for SAT sub-formulas it is safe to use learned clauses for unit propagation!
UNSAT Sub-formulasUNSAT Sub-formulas
But if But if F|F| is is unsatisfiableunsatisfiable, all bets are off..., all bets are off... WithoutWithout component caching, there is still no component caching, there is still no
problem – because the final value is 0 in any problem – because the final value is 0 in any casecase
With With component cachingcomponent caching, could cause , could cause incorrect values to be cachedincorrect values to be cached
SolutionSolution Flush siblings (& their descendents) of unsat Flush siblings (& their descendents) of unsat
components from cachecomponents from cache
#SAT CC+CL

#SAT(F)
  a = 1; s = ∅;
  for each G ∈ to_components(F) {
    if (in_cache(G)) { m = cache_value(G); }
    else { m = split(G);
      insert_cache(G,m); }
    a = a * m;
    if (m == 0) { flush_cache(s); break; }
    else s = s ∪ {G};
  }
  return a;

split(G)
  if (G == ∅) return 1;
  if (∅ ∈ G) {
    learn_new_clause();
    return 0; }
  select v ∈ G;
  return ½ * #SAT(G|v) + ½ * #SAT(G|¬v);
[Figure: runtime in seconds (log scale, 0.1 to 100,000) vs. clause/variable ratio (0.8 to 2.2) on 75-variable random 3-SAT, comparing relsat and CC+CL.]
Summary

- Dramatic progress in automating propositional inference over the last decade
- Progress due to the careful refinement of a handful of ideas: DPLL, clause learning, restarts, component analysis, formula caching
- The successful unification of these elements for #SAT gives renewed hope for a universal reasoning engine!
What’s Next?

- Evaluation of the weighted-#SAT version on Bayesian networks
- Better component ordering and component-aware variable branching heuristics
- Optimal restart policies for #SAT CC+CL
- Adapt techniques for sampling methods: approximate inference???