AAAI 2013 Conference, Bellevue, WA
AAAI 2013 © 2013 IBM Corporation
Resolution and Parallelizability: Barriers to the Efficient Parallelization of SAT Solvers
George Katsirelos (MIAT, INRA, Toulouse, France)
Ashish Sabharwal (IBM Watson Research Center, USA)
Horst Samulowitz
Laurent Simon (Univ. Paris-Sud, LRI/CNRS, Orsay, France)
Resolution and Parallelizability
© 2013 IBM Corporation
Trend Towards Parallelization
Focus Shifting From Single-Thread Performance to Multi-Processor Performance
– 100s and even 1000s of compute cores easily accessible
– Classical Algorithm Parallelization, e.g., parallel sort, PRAM model
– Significant Advances in Data Parallelism, e.g., MapReduce, Hadoop, SystemML, R statistics
Challenge: Search and Optimization on 1000s of Processors
– Tremendous advances in the Sequential case of Combinatorial Search, e.g., SAT solvers can tackle instances with ~2M variables, 10M constraints!
– Exponential search appears to be an “obvious” candidate to parallelize!
– In fact, many SAT/CSP/MIP solvers already do support multi-core runs
2 AAAI 2013 | Katsirelos, Samulowitz, Sabharwal, Simon
Parallelization of Combinatorial Search
Fact: State-of-the-Art Search Engines Do NOT Parallelize Well
– Brute Force exponential search is, of course, trivial to parallelize
– But sophisticated search engines that adapt (through e.g. clause learning, impact aggregation, etc.) have inherent sequential aspects
AAAI 2012 Challenge Paper on the topic [Hamadi & Wintersteiger 2012]
Rather Disappointing Performance at SAT Competitions. E.g., in 2011:
– 8-core track: average speedup of best parallel solvers only ~1.8x
– 32-core track: only ~3x
– Top performing solvers based on little to no communication (CryptoMinisat-MT [Soos 2012], Plingeling [Biere 2012])
– Parallel track winners were “simple” Portfolio solvers (ppfolio [Roussel 2012], pfolioUZK [Wotzlaw et al, 2012])
What makes parallelization of SAT solvers hard?
Can we obtain insights into their behavior beyond eventual wall-clock performance?
Contributions of the Paper
A New Systematic Study of Parallelism in the Context of Search through the Lens of Proof Complexity
– Focus on understanding rather than on engineering
– Are there inherent bottlenecks that may hinder parallelization, irrespective of which heuristics are used to share information?
1. A Practical Study: Interesting properties of Actual Proofs
– Proofs generated by state-of-the-art SAT solvers contain narrow bottlenecks
2. Proof-Based Measures that capture Best-Case Parallelizability
– Coarse measure: “Depth” of the proof graph
– Refined measure: Makespan of a resource constrained scheduling problem
3. Empirical Findings: Correlations and Parallelization Limits
– Typical sequential proofs are not very parallelizable even in the best case!
– “Schedule speedup” / makespan correlates with observed speedup
Approach: Proof Complexity (applied here to Typically Generated Proofs)
Proof Complexity [Cook & Reckhow, 1979]: Study the nature (e.g., size, depth, width, “shape”, etc.) of Proofs of Unsatisfiability
– Resolution Graph of Conflict-Directed-Clause-Learning (CDCL) SAT Solvers
Runtime(any SAT solver, F) ≥ min_proofs Size(Resolution proof of F)
– Note: Insights applicable also to Satisfiable instances!
• Solvers prove a lot of sub-formulas to be unsatisfiable before hitting the first solution
• Formal characterization [Achlioptas et al, 2001 & 2004]
Study of Proofs has provided strong insights into CDCL SAT solvers
– What does “clause learning” bring?
– What do “restarts” add?
[Beame et al, 2004; Buss et al, 2008, 2012; Hertel et al, 2008; Pipatsrisawat et al, 2011]
Worst case / Best case results
Underlying Inference Principle: Resolution
CDCL SAT solvers produce Resolution Derivations
Proof Graph and Depth:
– Each initial and derived constraint is a node, annotated with its proof depth
– proofdepth(initial clause C) = 0
– proofdepth(derived clause C) = 1 + max over parents of C of proofdepth(parent(C))
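[Figure: proof graph of F; each node is labeled with its constraint ID and proof depth]
Initial clauses (depth 0): C1, C2, C3, C4, C5, C6
Derived clauses: C7 (depth 1), C8 (depth 2), C9 (depth 1), C10 (depth 3), C11 (depth 2), C12 (depth 3), C13 (depth 4)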
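The depth annotation above can be sketched in a few lines of Python. The slide text gives only the depth of each clause, not the edges of the graph, so the parent lists in `PARENTS` are a hypothetical choice that happens to be consistent with those depths; `proof_depths` is likewise an illustrative name, not something from the paper.

```python
# Hypothetical parent lists: the slide shows depths, not the actual edges.
# Any clause that never appears as a key is treated as an initial clause.
PARENTS = {
    "C7": ["C1", "C2"],
    "C9": ["C3", "C4"],
    "C8": ["C7", "C3"],
    "C11": ["C9", "C5"],
    "C10": ["C8", "C4"],
    "C12": ["C11", "C6"],
    "C13": ["C10", "C12"],
}


def proof_depths(parents):
    """Return {clause: depth} with depth 0 for initial clauses and
    1 + max(parent depths) for derived clauses."""
    depths = {}

    def depth(clause):
        if clause not in depths:
            ps = parents.get(clause, [])
            depths[clause] = 0 if not ps else 1 + max(depth(p) for p in ps)
        return depths[clause]

    for clause in parents:
        depth(clause)
    return depths


depths = proof_depths(PARENTS)
```

The recursion is fine for a toy graph; a real solver proof with millions of clauses would need an iterative pass in derivation order instead.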
How Parallelizable are Resolution Refutations?
Refutation(F) = Resolution Proof that derives the empty (“false”) clause
Depth of the proof clearly limits the amount of potential parallelization
– Chain of dependencies
– Theorem: Certain “pebbling” style instances have large depth
However, proofdepth bound on parallelization is very crude
– Does not explain poor performance with small k (e.g., 8, 32, … processors)
What does a typical sequential SAT solver proof look like?
– Setup for Experiments:
• Sequential Glucose 2.1 extended with proof output
• GluSatX10: using SatX10 to run a k-processor version of Sequential Glucose
– Working Assumption: Proofs produced by GluSatX10 on k cores look “similar” to proofs produced by Sequential Glucose
http://x10-lang.org/satx10 [IBM Teams: X10 and SAT/CSP]
** simplified statements; see paper for more formal notions
Proof Graph Example: Very Complex Structure
[Easy sequential case, solved in ~30 seconds]
Bottlenecks in Typical SAT Proofs
Proofs Generated by SAT Solvers Exhibit Surprisingly Narrow “Bottlenecks”, i.e., Depths with Very Few (~1) Clauses!
– Nothing deeper can be derived before bottleneck clauses ⇒ Sequentiality
[Figure: number of clauses derived at each depth (log scale), plotted against depth in the proof]
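A minimal sketch of how such bottlenecks could be located, assuming a depth annotation per clause is already available. The `DEPTHS` table is made-up illustration data, not taken from any solver run, and the function names and the `max_width` threshold are assumptions.

```python
from collections import Counter

# Made-up depth annotations for illustration only (clause ID -> proof depth).
DEPTHS = {
    "C1": 0, "C2": 0, "C3": 0,
    "C7": 1, "C9": 1, "C14": 1,
    "C8": 2,
    "C10": 3, "C11": 3,
    "C13": 4,
}


def clauses_per_depth(depths):
    """Count derived clauses at each depth; depth 0 (initial clauses) is skipped."""
    return Counter(d for d in depths.values() if d > 0)


def bottleneck_depths(depths, max_width=1):
    """Depths at which at most `max_width` clauses are derived."""
    counts = clauses_per_depth(depths)
    return sorted(d for d, n in counts.items() if n <= max_width)


narrow = bottleneck_depths(DEPTHS)
```

On this toy data the narrow depths are 2 and 4. The last depth of a refutation always holds a single clause (the empty clause), so the interesting bottlenecks are the ones in the interior of the proof.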
Best-Case Parallelization with k Processors
Given Proof P and k Processors, Best-Case Parallelization of P = Resource Constrained Scheduling Problem with Precedences
Let Mk(P) = makespan of the optimal schedule of P on k processors
– Even approximating Mk(P) within 4/3 is NP-hard, but (2 – 1/k) approx. is easy
Best-Case k processor speedup on P: Sk(P) = M1(P) / Mk(P)
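[Figure: the proof graph with constraint IDs and depths (C1 to C6 at depth 0; C7, C9 at depth 1; C8, C11 at depth 2; C10, C12 at depth 3; C13 at depth 4), extended with an additional clause C’9 at depth 1; nodes carry the annotations 1, 1, 2, 2, 3, 3, 4, 5]
Example: M1(P) = 8, M2(P) = 5, M3(P) = 4, M4(P) = 4, …; depth = 4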
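The easy (2 – 1/k) approximation mentioned above is achieved by greedy list scheduling. The following sketch assumes unit cost per derived clause and uses a longest-remaining-chain priority; the edges in `EXAMPLE_PARENTS` (with `C9p` standing in for C’9) are hypothetical, chosen only to agree with the depths shown on the slide, and the function names are illustrative.

```python
# Hypothetical edges; only the depths come from the slide. "C9p" stands for C'9.
# Clauses that are not keys (C1..C6) are initial and available at time 0.
EXAMPLE_PARENTS = {
    "C7": ["C1", "C2"],
    "C9": ["C3", "C4"],
    "C9p": ["C5", "C6"],
    "C8": ["C7", "C3"],
    "C11": ["C9", "C9p"],
    "C10": ["C8", "C4"],
    "C12": ["C11", "C6"],
    "C13": ["C10", "C12"],
}


def makespan(parents, k):
    """Greedy list schedule of unit-time derivations on k processors."""
    children = {c: [] for c in parents}
    for c, ps in parents.items():
        for p in ps:
            if p in parents:
                children[p].append(c)

    height = {}

    def h(c):
        # Number of clauses on the longest dependency chain starting at c.
        if c not in height:
            height[c] = 1 + max((h(ch) for ch in children[c]), default=0)
        return height[c]

    done, t = set(), 0
    while len(done) < len(parents):
        # A clause is ready once all of its derived parents are done.
        ready = [c for c in parents if c not in done
                 and all(p in done or p not in parents for p in parents[c])]
        ready.sort(key=lambda c: (-h(c), c))  # longest chain first
        done.update(ready[:k])
        t += 1
    return t


def speedup(parents, k):
    """Schedule speedup S_k(P) = M_1(P) / M_k(P) under the greedy schedule."""
    return makespan(parents, 1) / makespan(parents, k)
```

On this hypothetical graph the greedy schedule gives makespans 8, 5, 4, 4 for k = 1, 2, 3, 4, which agrees with the example values above. With k = 2 the makespan cannot drop below 5, because three depth-1 clauses each start a chain of length 4 and only two of them can run in the first step.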
Makespan vs. Proof Depth
Schedule Makespan yields a finer-grained lower bound, Sk(P), on best-case parallelization than proof depth
– proofdepth(P) : limit of parallelization of P with “infinite” processors
– Mk(P) ≥ proofdepth(P)
– Mk(P) → proofdepth(P) as k → ∞
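To see why the makespan is the finer measure, it helps to compare it with the bound obtainable from depth and proof size alone. The sketch below assumes unit cost per derived clause, so that M1(P) equals the number of derived clauses; the function names are illustrative.

```python
from math import ceil


def makespan_lower_bound(n_derived, depth, k):
    """Structure-free lower bound on M_k(P): at least `depth` steps are needed,
    and at least ceil(n_derived / k) steps to fit all derivations on k processors."""
    return max(depth, ceil(n_derived / k))


def speedup_upper_bound(n_derived, depth, k):
    """Upper bound on S_k(P) = M_1(P) / M_k(P), assuming M_1(P) = n_derived."""
    return n_derived / makespan_lower_bound(n_derived, depth, k)


# Example from the earlier slide: 8 derived clauses, depth 4.
bound_k2 = makespan_lower_bound(8, 4, 2)
```

For the earlier example (8 derived clauses, depth 4) this bound gives 4 steps for k = 2, whereas the slide reports M2(P) = 5. The difference comes from the shape of the dependency graph, which depth alone does not capture.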
Empirical Findings
Even Best-Case Parallelization Efficiency is Low Beyond 100 Processors
Best-Case Efficiency of parallelizing P with k processors = 100 * (Sk(P) / k)
E.g., 100% = full utilization of k processors, i.e., speedup = k
Proofs of Some Instances Exhibit Very Low Best-Case Schedule Speedup
A) Even with 1024 processors, best-case speedup ~ 50-100
B) 128 processors insufficient to achieve a speedup of ~ 90
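Plugging finding A into the efficiency definition from the earlier slide, 100 * (Sk(P) / k), gives a sense of scale. This is a small worked calculation; the function name is an assumption.

```python
def best_case_efficiency(speedup, k):
    """Best-case efficiency in percent: 100 * (S_k(P) / k)."""
    return 100.0 * speedup / k


# Finding A: best-case speedup of roughly 50 to 100 on 1024 processors.
low = best_case_efficiency(50, 1024)
high = best_case_efficiency(100, 1024)
```

Here `low` is about 4.9 and `high` about 9.8, so for those instances even an ideal schedule would keep the 1024 processors busy only around 5% to 10% of the time.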
Best-Case Schedule Speedup Correlates With Actual Observed Runtime Speedup
Average over a sliding window
(Makes the study of the best-case schedule speedup relevant)
Summary
A New Systematic Study of Parallelism in the Context of Search through the Lens of Proof Complexity
– Focus on understanding rather than on engineering
Main Findings:
A. Typical Sequential Refutations Contain Surprisingly Narrow Bottlenecks
B. Typical Sequential Refutations are Not Parallelizable Beyond a Few Processors, even in the best case of offline ‘schedule speedup’ produced in hindsight
C. Observed Runtime Speedup with k processors weakly correlates with Best-Case Schedule Speedup of a Sequential Proof produced in hindsight
Open Question: Can we design SAT solvers that generate Proofs that are inherently More Parallelizable?
Caveat: assumption that proofs generated by GluSatX10 on k cores look “similar” to proofs generated by Sequential Glucose