Lifted Probabilistic Inference
in Relational Models
Guy Van den Broeck (KU Leuven)        Dan Suciu (U. of Washington)
Tutorial, UAI 2014
About the Tutorial
Slides available online.
Bibliography is at the end.
Your speakers:
– Guy Van den Broeck works in AI: http://www.guyvdb.eu/
– Dan Suciu works in DB: https://homes.cs.washington.edu/~suciu/
About the Tutorial
• The tutorial is about
– deep connections between AI and DBs
– a unified view on probabilistic reasoning
– a logical approach to Lifted Inference
• The tutorial is NOT an exhaustive
overview of lifted algorithms for graphical
models (see references at the end)
Outline
• Part 1: Motivation
• Part 2: Probabilistic Databases
• Part 3: Weighted Model Counting
• Part 4: Lifted Inference for WFOMC
• Part 5: The Power of Lifted Inference
• Part 6: Conclusion/Open Problems
Part 1: Motivation
• Why do we need relational representations
of uncertainty?
• Why do we need lifted inference
algorithms?
Why Relational Data?
• Our data is already relational!
– Companies run relational databases
– Scientific data is relational:
• Large Hadron Collider generated 25PB in 2012
• LSST Telescope will produce 30TB per night
• Big data is big business:
– Oracle: $7.1BN in sales
– IBM: $3.2BN in sales
– Microsoft: $2.6BN in sales
(Combined ≈ the GDP of Iceland!)
[Gartner'06]
Why Probabilistic Relational Data?
• Relational data is increasingly probabilistic
– NELL machine reading (>50M tuples)
– Google Knowledge Vault (>2BN tuples)
– DeepDive (>7M tuples)
• Data is inferred from unstructured information using statistical models
– Learned from the web, large text corpora, ontologies, etc.
– The learned/extracted data is relational
[Carlson'10, Dong'14, Niu'12]
Representation:
Probabilistic Databases
• Tuple-independent probabilistic databases
• Query: SQL or First Order Logic
Actor:                 WorkedFor:
  Name     Prob          Actor    Director  Prob
  Brando   0.9           Brando   Coppola   0.9
  Cruise   0.8           Coppola  Brando    0.2
  Coppola  0.1           Cruise   Coppola   0.1

Q(x) = ∃y Actor(x) ∧ WorkedFor(x,y)

SELECT Actor.name
FROM   Actor, WorkedFor
WHERE  Actor.name = WorkedFor.actor
Summary
[2×2 map, axes Statistical/Logical × Propositional/Relational: Uncertainty in AI (statistical, propositional); Databases (logical, relational); Probabilistic DBs (statistical, relational)]

Representations in AI and ML
[Same axes; an example in each quadrant:]
Cloudy Rain P(Rain|Cloudy)
T T 0.80
T F 0.20
F T 0.01
F F 0.99
Rain ⇒ Cloudy
Relational Representations
Example: First-Order Logic
Logical variables have domain of constants
x,y range over domain People = {Alice,Bob}
Ground formula has no logical variables
Smokes(Alice) ∧ Friends(Alice,Bob) ⇒ Smokes(Bob)
∀x,y, Smokes(x) ∧ Friends(x,y) ⇒ Smokes(y)
(a formula; x and y are logical variables; Smokes(x) is an atom)
[The 2×2 map of representations now places the first-order formula ∀x∀y (Smokes(x) ∧ Friends(x,y) ⇒ Smokes(y)) in the logical/relational quadrant, next to Rain ⇒ Cloudy and the CPT.]
Why Statistical Relational Models?
Probabilistic graphical models:
– Quantify uncertainty and noise
– Not very expressive (the rules of chess take ~100,000 pages)

First-order logic:
– Very expressive (the rules of chess fit in 1 page)
– Good match for abundant relational data
– Hard to express uncertainty and noise
Example: Markov Logic

Weighted first-order logic:
• Ground atom/tuple = random variable in {true,false}, e.g., Smokes(Alice), Friends(Alice,Bob), etc.
• Ground formula = factor in a propositional factor graph

[Factor graph over Smokes(Alice), Smokes(Bob), Friends(Alice,Bob), Friends(Bob,Alice), Friends(Alice,Alice), Friends(Bob,Bob), with factors f1–f4]

3.14  Smokes(x) ∧ Friends(x,y) ⇒ Smokes(y)
(weight or probability; FOL formula)

[Richardson'06]
[The 2×2 map again: the weighted formula 3.14 Smokes(x) ∧ Friends(x,y) ⇒ Smokes(y) fills the statistical/relational quadrant.]
16
Logical
Propositional Relational
Summary
Uncertainty in AI
Databases
Statistical StarAI
17
Logical
Propositional Relational
Summary
Uncertainty in AI
Databases
Statistical
17
Logical
Propositional Relational
Summary
Uncertainty in AI
Databases
Statistical
Lifted Inference
• Main idea: exploit high level relational
representation to speed up reasoning
• Let's see an example…
A Simple Reasoning Problem

52 playing cards. Let us ask some simple questions:
• Probability that Card1 is Q? 1/13
• Probability that Card1 is Hearts? 1/4
• Probability that Card1 is Hearts given that Card1 is red? 1/2
• Probability that Card52 is Spades given that Card1 is QH? 13/51
Automated Reasoning
Let us automate this:
1. Probabilistic graphical model (e.g., factor graph)
2. Probabilistic inference algorithm
(e.g., variable elimination or junction tree)
Reasoning in Propositional Models

A key result: treewidth. Why? Conditional independence!

[Three graphs over nodes A–F: a tree, a sparse graph, a dense graph]
Tree:         P(A|C,E) = P(A|C)
Sparse graph: P(A|B,E,F) = P(A|B,E)
Dense graph:  P(A|B,E,F) ≠ P(A|B,E)
Is There Conditional Independence?

P(Card52 | Card1) ≟ P(Card52 | Card1, Card2)
13/51 ≠ 12/50, so P(Card52 | Card1) ≠ P(Card52 | Card1, Card2)

P(Card52 | Card1, Card2) ≟ P(Card52 | Card1, Card2, Card3)
12/50 ≠ 12/49, so P(Card52 | Card1, Card2) ≠ P(Card52 | Card1, Card2, Card3)
Automated Reasoning (artist's impression)

Let us automate this:
1. Probabilistic graphical model (e.g., factor graph): it is fully connected!
2. Probabilistic inference algorithm (e.g., variable elimination or junction tree): it builds a table with 13^52 rows
What's Going On Here?

• Probability that Card52 is Spades given that Card1 is QH? 13/51
• Probability that Card52 is Spades given that Card2 is QH? 13/51
• Probability that Card52 is Spades given that Card3 is QH? 13/51
Tractable Probabilistic Inference

Which property makes inference tractable?
Traditional belief: independence (conditional/contextual).
What's going on here? Symmetry and exchangeability ⇒ Lifted Inference: high-level reasoning. See the AAAI talk on Tuesday!
[Niepert'14]
Automated Reasoning
Let us automate this:
Relational model
Lifted probabilistic inference algorithm
∀p, ∃c, Card(p,c)
∀c, ∃p, Card(p,c)
∀p, ∀c, ∀c', Card(p,c) ∧ Card(p,c') ⇒ c = c'
Other Examples of Lifted Inference

First-order resolution:
∀x, Human(x) ⇒ Mortal(x)
∀x, Greek(x) ⇒ Human(x)
implies
∀x, Greek(x) ⇒ Mortal(x)
Other Examples of Lifted Inference

Reasoning about populations: "We are investigating a rare disease. The disease is rarer in women, presenting in only one in every two billion women and one in every billion men. Assuming there are 3.4 billion men and 3.6 billion women in the world, the probability that more than five people have the disease is …"
Lifted Inference in SRL

Statistical relational model (e.g., MLN):
3.14  FacultyPage(x) ∧ Linked(x,y) ⇒ CoursePage(y)

As a probabilistic graphical model:
– 26 pages: 728 variables, 676 factors
– 1000 pages: 1,002,000 variables, 1,000,000 factors

Highly intractable? Lifted inference answers in milliseconds!
Summary of Motivation
• Relational data is everywhere:
– Databases in industry
– Databases in sciences
– Knowledge bases
• Lifted inference:
– Use relational structure during reasoning
– Very efficient where traditional methods break
This tutorial: Lifted Inference in Relational Models
Outline
• Part 1: Motivation
• Part 2: Probabilistic Databases
• Part 3: Weighted Model Counting
• Part 4: Lifted Inference for WFOMC
• Part 5: The Power of Lifted Inference
• Part 6: Conclusion/Open Problems
What Everyone Should Know about
Databases
• Database = several relations (a.k.a. tables)
• SQL Query = FO Formula
• Boolean Query = FO Sentence
What Everyone Should Know about Databases

Database: relations (= tables)

D =
Smoker:              Friend:
  X      Y             X      Z
  Alice  2009          Alice  Bob
  Alice  2010          Alice  Carol
  Bob    2009          Bob    Carol
  Carol  2010          Carol  Bob

Query: first-order formula
Q(z) = ∃x (Smoker(x,'2009') ∧ Friend(x,z))    "Find friends of smokers in 2009"

Query answer: Q(D) =
  Z
  Bob
  Carol

Boolean query: FO sentence
Q = ∃x (Smoker(x,'2009') ∧ Friend(x,'Bob'))
Query answer: Q(D) = TRUE

Conjunctive Queries: CQ = FO(∃, ∧)
Unions of Conjunctive Queries: UCQ = FO(∃, ∧, ∨)
What Everyone Should Know about Databases

Declarative query ("what") → Query plan ("how")

Q(z) = ∃x (Smoker(x,'2009') ∧ Friend(x,z))

Logical query plan:
  Πz
   ⋈x
    σy='2009' Smoker(x,y)
    Friend(x,z)

Optimize the logical plan, then pick a physical query plan: index scan, merge join, hash-based deduplication, …
What Every Researcher Should Know about Databases

Problem: compute Q(D)
• Data complexity: fix Q, complexity = f(D)
• Query complexity (a.k.a. expression complexity): fix D, complexity = f(Q)
• Combined complexity: complexity = f(D,Q)

[Vardi'82]   (Moshe Vardi, 2008 ACM SIGMOD Contribution Award)
Probabilistic Databases
• A probabilistic database = relational
database where each tuple has an
associated probability
• Semantics = probability distribution over
possible worlds (deterministic databases)
• In this talk: tuples are independent events
[Suciu'11]
Example

Probabilistic database D:
Friend:
  x  y  P
  A  B  p1
  A  C  p2
  B  C  p3

Possible worlds semantics: every subset of the tuples is a world, e.g.
  {(A,B), (A,C), (B,C)}   probability p1p2p3
  {(A,C), (B,C)}          probability (1-p1)p2p3
  …
  {}                      probability (1-p1)(1-p2)(1-p3)
(2^3 = 8 worlds in total)
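To make the semantics concrete, here is a minimal Python sketch (our own, not from the deck; the numeric values standing in for p1, p2, p3 are made up):

```python
from itertools import product

# Tuple-independent table: each tuple is an independent event.
# The numbers stand in for p1, p2, p3 from the slide (hypothetical values).
friend = {('A', 'B'): 0.9, ('A', 'C'): 0.8, ('B', 'C'): 0.5}

def world_prob(world):
    """Probability of one possible world = product over tuples (in or out)."""
    p = 1.0
    for t, pt in friend.items():
        p *= pt if t in world else (1 - pt)
    return p

# Enumerate all 2^3 possible worlds; their probabilities must sum to 1.
worlds = [frozenset(t for t, keep in zip(friend, bits) if keep)
          for bits in product([0, 1], repeat=len(friend))]
assert abs(sum(world_prob(w) for w in worlds) - 1.0) < 1e-9
```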
Query Semantics
Fix a Boolean query Q
Fix a probabilistic database D:
P (Q | D) = marginal probability of Q
on the possible worlds of D
An Example

Q = ∃x∃y Smoker(x) ∧ Friend(x,y)

Smoker:            Friend:
  x  P               x  y  P
  A  p1  X1          A  D  q1  Y1
  B  p2  X2          A  E  q2  Y2
  C  p3  X3          B  F  q3  Y3
                     B  G  q4  Y4
                     B  H  q5  Y5

P(Q | D) = 1 – {1 – p1·[1–(1–q1)(1–q2)]} × {1 – p2·[1–(1–q3)(1–q4)(1–q5)]}

One can compute P(Q | D) in PTIME in the size of the database D.
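Here is a hedged sketch of that PTIME computation (our own illustration; the values for p1–p3 and q1–q5 are made up), checking the lifted expression against brute-force possible-world enumeration:

```python
from itertools import product

# Q = ∃x∃y Smoker(x) ∧ Friend(x,y), with hypothetical probabilities.
smoker = {'A': 0.3, 'B': 0.4, 'C': 0.5}                      # p1, p2, p3
friend = {('A', 'D'): 0.1, ('A', 'E'): 0.2,
          ('B', 'F'): 0.3, ('B', 'G'): 0.4, ('B', 'H'): 0.5}  # q1..q5

def noisy_or(ps):
    """1 - Π(1 - p): probability that at least one independent event fires."""
    prod = 1.0
    for p in ps:
        prod *= 1 - p
    return 1 - prod

def prob_x(x):
    """P(Smoker(x) ∧ ∃y Friend(x,y)) for one x, using independence."""
    return smoker[x] * noisy_or(q for (a, _), q in friend.items() if a == x)

lifted = noisy_or(prob_x(x) for x in smoker)   # the slide's closed form

# Brute-force check over all 2^8 possible worlds.
tuples = list(smoker) + list(friend)
probs  = list(smoker.values()) + list(friend.values())
brute = 0.0
for bits in product([0, 1], repeat=len(tuples)):
    w = dict(zip(tuples, bits))
    pw = 1.0
    for t, p, b in zip(tuples, probs, bits):
        pw *= p if b else 1 - p
    if any(w[x] and any(w[(a, b)] for (a, b) in friend if a == x)
           for x in smoker):
        brute += pw
assert abs(lifted - brute) < 1e-9
```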
An Example   Q = ∃x∃y Smoker(x) ∧ Friend(x,y)

Use the SQL engine to compute the query, aggregating on probabilities:

  ΠΦ
   ⋈
    Smoker(x)
    Πx Friend(x,y)

The inner projection Πx computes, for each x, the probability that some Friend tuple is present:
  x  P
  A  1–(1–q1)(1–q2)
  B  1–(1–q3)(1–q4)(1–q5)

The join and the final projection then give:
1 – {1 – p1[1–(1–q1)(1–q2)]} × {1 – p2[1–(1–q3)(1–q4)(1–q5)]}
Problem Statement
Given: probabilistic database D, query Q
Compute: P(Q | D)
Data complexity: fix Q, complexity = f(|D|)
Approaches to Compute P(Q | D)
• Propositional inference:
– Ground the query Q to a propositional formula FQ,D; compute P(FQ,D)
– This is Weighted Model Counting (later…)
– Works for every query Q
– But: may be exponential in |D| (data complexity)
• Lifted inference:
– Compute a query plan for Q; execute the plan on D
– Always polynomial time in |D| (data complexity)
– But: does not work for all queries Q
[Olteanu'08, Jha'13, Dalvi'04, Dalvi'12]
The Lifted Inference Rules

• If Q1, Q2 are independent:
  AND-rule: P(Q1 ∧ Q2) = P(Q1) P(Q2)
  OR-rule: P(Q1 ∨ Q2) = 1 – (1 – P(Q1))(1 – P(Q2))

• If Q[C1/x], Q[C2/x], … are independent:
  ∀-rule: P(∀z Q) = ΠC∈Domain P(Q[C/z])
  ∃-rule: P(∃z Q) = 1 – ΠC∈Domain (1 – P(Q[C/z]))

• Inclusion/exclusion formula:
  P(Q1 ∨ Q2) = P(Q1) + P(Q2) – P(Q1 ∧ Q2)
  P(Q1 ∧ Q2) = P(Q1) + P(Q2) – P(Q1 ∨ Q2)

• Negation: P(¬Q) = 1 – P(Q)
Example

Q = ∀x∀y (Smoker(x) ∨ Friend(x,y)) = ∀x (Smoker(x) ∨ ∀y Friend(x,y))

Check independence across constants:
  Smoker(Alice) ∨ ∀y Friend(Alice,y)
  Smoker(Bob) ∨ ∀y Friend(Bob,y)        Independent

P(Q) = ΠA∈Domain P(Smoker(A) ∨ ∀y Friend(A,y))                               (∀-rule)
P(Q) = ΠA∈Domain [1 – (1 – P(Smoker(A))) × (1 – P(∀y Friend(A,y)))]          (∨-rule)
P(Q) = ΠA∈Domain [1 – (1 – P(Smoker(A))) × (1 – ΠB∈Domain P(Friend(A,B)))]   (∀-rule)

Look up the probabilities in the database. Runtime = O(n²).
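The derivation above can be replayed in a few lines. A sketch on a hypothetical 2-element domain with made-up probabilities (not from the deck), checking the lifted product against brute-force enumeration:

```python
from itertools import product

# Q = ∀x∀y (Smoker(x) ∨ Friend(x,y)) on a hypothetical domain.
dom = ['Alice', 'Bob']
P_smoker = {'Alice': 0.3, 'Bob': 0.6}
P_friend = {(a, b): 0.5 for a in dom for b in dom}

# ∀-rule, ∨-rule, ∀-rule, exactly as on the slide:
lifted = 1.0
for a in dom:
    p_all_friend = 1.0
    for b in dom:
        p_all_friend *= P_friend[(a, b)]
    lifted *= 1 - (1 - P_smoker[a]) * (1 - p_all_friend)

# Brute force over the 2 + 4 independent tuples.
atoms = [('S', a) for a in dom] + [('F', a, b) for a in dom for b in dom]
prob  = {('S', a): P_smoker[a] for a in dom}
prob.update({('F', a, b): P_friend[(a, b)] for a in dom for b in dom})
brute = 0.0
for bits in product([0, 1], repeat=len(atoms)):
    w = dict(zip(atoms, bits))
    pw = 1.0
    for t in atoms:
        pw *= prob[t] if w[t] else 1 - prob[t]
    if all(w[('S', a)] or w[('F', a, b)] for a in dom for b in dom):
        brute += pw
assert abs(lifted - brute) < 1e-9
```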
Discussion: CNF vs. DNF

Databases                                       KR/AI
Conjunctive Queries CQ = FO(∃, ∧)               Positive Clause FO(∀, ∨)
Union of Conjunctive Queries UCQ                Positive FO(∀, ∧, ∨)
  = FO(∃, ∧, ∨) = ∃ Positive-DNF                  = ∀ Positive-CNF
UCQ with "safe negation" UCQ¬ = ∃ DNF           First-order CNF = ∀ CNF

Q = ∃x∃y Smoker(x) ∧ Friend(x,y)                Q = ∀x∀y (Smoker(x) ∨ Friend(x,y))

By duality we can reduce one problem to the other:
∃x∃y Smoker(x) ∧ Friend(x,y) = ¬∀x∀y (¬Smoker(x) ∨ ¬Friend(x,y))
Discussion: Lifted Inference Sometimes Fails

H0 = ∀x∀y (Smoker(x) ∨ Friend(x,y) ∨ Jogger(y))

No rule applies here! The ∀-rule does not apply, because H0[Alice/x] and H0[Bob/x] are dependent:
H0[Alice/x] = ∀y (Smoker(Alice) ∨ Friend(Alice,y) ∨ Jogger(y))
H0[Bob/x] = ∀y (Smoker(Bob) ∨ Friend(Bob,y) ∨ Jogger(y))
(both contain the Jogger atoms)

Theorem [Dalvi'04]. Computing P(H0 | D) is #P-hard in |D|. (Proof: later…)

Consequence: assuming PTIME ≠ #P, H0 is not liftable!
Summary
• Database D = relations
• Query Q = FO formula
• Query plans, query optimization
• Data complexity: fix Q, complexity = f(D)
• Probabilistic DBs = independent tuples
• Lifted inference: simple, but sometimes fails

Next: Weighted Model Counting, a unified framework for inference
Later: Are the rules complete? Yes (sort of): the Power of Lifted Inference
Outline
• Part 1: Motivation
• Part 2: Probabilistic Databases
• Part 3: Weighted Model Counting
• Part 4: Lifted Inference for WFOMC
• Part 5: The Power of Lifted Inference
• Part 6: Conclusion/Open Problems
Weighted Model Counting
• Model = solution to a propositional logic formula Δ
• Model counting = #SAT
• Weighted model counting (WMC):
– Weights for assignments to variables
– Model weight is the product of the variable weights w(.)

Δ = (Rain ⇒ Cloudy)

Weights:  Rain: w(R) = 1, w(¬R) = 2     Cloudy: w(C) = 3, w(¬C) = 5

Rain  Cloudy  Model?  Weight
T     T       Yes     1 * 3 = 3
T     F       No      0
F     T       Yes     2 * 3 = 6
F     F       Yes     2 * 5 = 10

#SAT = 3        WMC = 19
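The table above can be reproduced in a few lines; a brute-force WMC sketch (our own, not a real solver):

```python
from itertools import product

# Δ = Rain ⇒ Cloudy, with the slide's weights:
# w(R)=1, w(¬R)=2, w(C)=3, w(¬C)=5.
w = {('Rain', True): 1, ('Rain', False): 2,
     ('Cloudy', True): 3, ('Cloudy', False): 5}

wmc = 0
for rain, cloudy in product([True, False], repeat=2):
    if (not rain) or cloudy:                       # the formula Δ
        wmc += w[('Rain', rain)] * w[('Cloudy', cloudy)]
print(wmc)  # 19, matching the slide
```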
Weighted Model Counting @ UAI
• Assembly language for non-lifted inference
• Reductions to WMC for inference in
– Bayesian networks [Chavira'05, Sang'05, Chavira'08]
– Factor graphs [Choi'13]
– Relational Bayesian networks [Chavira'06]
– Probabilistic logic programs [Fierens'11, Fierens'13]
– Probabilistic databases [Olteanu'08, Jha'13]
• State-of-the-art solvers
– Knowledge compilation (WMC → d-DNNF → AC); winner of the UAI'08 exact inference competition!
– DPLL search
Weighted First-Order Model Counting
Model = solution to a first-order logic formula Δ

Δ = ∀d (Rain(d) ⇒ Cloudy(d))

Days = {Monday}: the same table as before, #SAT = 3.

Days = {Monday, Tuesday}, with per-atom weights:
Rain:    d=M: w(R(d)) = 1, w(¬R(d)) = 2;   d=T: w(R(d)) = 4, w(¬R(d)) = 1
Cloudy:  d=M: w(C(d)) = 3, w(¬C(d)) = 5;   d=T: w(C(d)) = 6, w(¬C(d)) = 2

Rain(M) Cloudy(M) Rain(T) Cloudy(T)   Model?   Weight
T       T         T       T           Yes      1*3*4*6 = 72
T       F         T       T           No       0
F       T         T       T           Yes      2*3*4*6 = 144
F       F         T       T           Yes      2*5*4*6 = 240
T       T         T       F           No       0
T       F         T       F           No       0
F       T         T       F           No       0
F       F         T       F           No       0
T       T         F       T           Yes      1*3*1*6 = 18
T       F         F       T           No       0
F       T         F       T           Yes      2*3*1*6 = 36
F       F         F       T           Yes      2*5*1*6 = 60
T       T         F       F           Yes      1*3*1*2 = 6
T       F         F       F           No       0
F       T         F       F           Yes      2*3*1*2 = 12
F       F         F       F           Yes      2*5*1*2 = 20

#SAT = 9        WFOMC = 608

[V.d.Broeck'11a, Gogate'11]
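The same brute-force check works for the first-order case (our own sketch; real WFOMC solvers avoid enumerating this table):

```python
from itertools import product

# Δ = ∀d (Rain(d) ⇒ Cloudy(d)), Days = {M, T}, weights as on the slide.
days = ['M', 'T']
wR = {'M': (1, 2), 'T': (4, 1)}   # (w(R(d)), w(¬R(d)))
wC = {'M': (3, 5), 'T': (6, 2)}   # (w(C(d)), w(¬C(d)))

wfomc = 0
for bits in product([True, False], repeat=2 * len(days)):
    rain   = dict(zip(days, bits[:len(days)]))
    cloudy = dict(zip(days, bits[len(days):]))
    if all((not rain[d]) or cloudy[d] for d in days):   # Δ holds in this world
        weight = 1
        for d in days:
            weight *= wR[d][0] if rain[d] else wR[d][1]
            weight *= wC[d][0] if cloudy[d] else wC[d][1]
        wfomc += weight
print(wfomc)  # 608, matching the slide
```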
Weighted First-Order Model Counting @ UAI
• Assembly language for lifted inference
• Reductions to WFOMC for lifted inference in
– Markov logic networks [V.d.Broeck'11a, Gogate'11]
– Parfactor graphs [V.d.Broeck'13a]
– Probabilistic logic programs [V.d.Broeck'14]
– Probabilistic databases [Gribkoff'14]
From Probabilities to Weights

Friend:
  x  y  P
  A  B  p1
  A  C  p2
  B  C  p3

Possible worlds (as before) have probabilities p1p2p3, (1-p1)p2p3, …, (1-p1)(1-p2)(1-p3).

Replace each probability by a pair of weights, also for the missing tuples:

  x  y  w(Friend(x,y))  w(¬Friend(x,y))
  A  B  w1 = p1         w̄1 = 1-p1
  A  C  w2 = p2         w̄2 = 1-p2
  B  C  w3 = p3         w̄3 = 1-p3
  A  A  w4 = 0          w̄4 = 1
  …     …               …

The world weights become w1w2w3, w̄1w2w3, …, w̄1w̄2w̄3.
Discussion
• Simple idea: replace p, 1-p by w, w̄
• Query computation becomes WFOMC
• To obtain a probability space, divide the weight of each world by Z = the sum of the weights of all worlds: Z = (w1+w̄1)(w2+w̄2)(w3+w̄3)…
• Why weights instead of probabilities? They can describe complex correlations (next)
Markov Logic

Capture knowledge through constraints (a.k.a. "features"):

∞     Smoker(x) ⇒ Person(x)                        (hard constraint)
3.75  Smoker(x) ∧ Friend(x,y) ⇒ Smoker(y)          (soft constraint, weight = exp(3.75))

An MLN is a set of constraints (w, Γ(x)), where w = weight and Γ(x) = FO formula.

Weight of a world = product of exp(w), over all MLN rules (w, Γ(x)) and groundings Γ(a) that hold in that world.

Probability of a world = weight / Z
Z = sum of the weights of all worlds (no longer a simple expression!)

[Richardson'06]
Problem Statement

Given:
MLN:
0.7  Actor(a) ⇒ ¬Director(a)
1.2  Director(a) ⇒ ¬WorkedFor(a,b)
1.4  InMovie(m,a) ∧ WorkedFor(a,b) ⇒ InMovie(m,b)

Database tables (if missing, then w = 1):
Actor:                WorkedFor:
  Name     w            Actor    Director  w
  Brando   2.9          Brando   Coppola   2.5
  Cruise   3.8          Coppola  Brando    0.2
  Coppola  1.1          Cruise   Coppola   1.7

Compute: P(InMovie(GodFather, Brando)) = ??
Discussion
• Probabilistic databases = independence; MLNs = complex correlations
• To translate weights to probabilities we need to divide by Z, which is often difficult to compute
• However, we can reduce the Z-computation problem to WFOMC (next)
Z → WFOMC(Δ)

1. Formula Δ
If all MLN constraints are hard: Δ = ∧(∞,Γ(x))∈MLN (∀x Γ(x))
If (wi, Γi(x)) is a soft MLN constraint, then:
a) Remove (wi, Γi(x)) from the MLN
b) Add a new probabilistic relation Fi(x)
c) Add the hard constraint (∞, ∀x (Fi(x) ⬄ Γi(x)))
(Better rewritings in [Jha'12], [V.d.Broeck'14].)

2. Weight function w(.)
For all constants A and relations Fi, set w(Fi(A)) = exp(wi), w(¬Fi(A)) = 1

Theorem: Z = WFOMC(Δ)
Example

∞     Smoker(x) ⇒ Person(x)
3.75  Smoker(x) ∧ Friend(x,y) ⇒ Smoker(y)

1. Formula Δ:
Δ = ∀x (Smoker(x) ⇒ Person(x))
  ∧ ∀x∀y (F(x,y) ⬄ [Smoker(x) ∧ Friend(x,y) ⇒ Smoker(y)])

2. Weight function w(.):
F:
  x  y  w(F(x,y))   w(¬F(x,y))
  A  A  exp(3.75)   1
  A  B  exp(3.75)   1
  …     …           …

Note: if no tables are given for Smoker, Person, etc. (i.e., no evidence), then set their w = w̄ = 1.

Z = WFOMC(Δ)
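A brute-force check of the theorem Z = WFOMC(Δ) on a hypothetical 2-person domain with only the soft constraint above (our own sketch; the loop encodings are our assumptions):

```python
from itertools import product
from math import exp, isclose

# One soft constraint: 3.75: Smokes(x) ∧ Friend(x,y) ⇒ Smokes(y), domain size 2.
n, w = 2, 3.75
people = range(n)

def gamma(sm, fr, x, y):
    """The MLN formula, grounded at (x, y)."""
    return (not (sm[x] and fr[x][y])) or sm[y]

z_direct = z_wfomc = 0.0
for smbits in product([0, 1], repeat=n):
    for frbits in product([0, 1], repeat=n * n):
        fr = [frbits[i * n:(i + 1) * n] for i in range(n)]
        k = sum(gamma(smbits, fr, x, y) for x in people for y in people)
        z_direct += exp(w) ** k   # MLN semantics: exp(w) per true grounding
        # WFOMC side: auxiliary relation F with the hard biconditional.
        for fbits in product([0, 1], repeat=n * n):
            F = [fbits[i * n:(i + 1) * n] for i in range(n)]
            if all(F[x][y] == gamma(smbits, fr, x, y)
                   for x in people for y in people):
                weight = 1.0
                for x in people:
                    for y in people:
                        weight *= exp(w) if F[x][y] else 1.0
                z_wfomc += weight

assert isclose(z_direct, z_wfomc)   # Z = WFOMC(Δ)
```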
Lessons
• Weighted Model Counting:
– Unified framework for probabilistic inference tasks
– Independent variables
• Weighted FO Model Counting:
– Formula described by a concise FO sentence
– Still independent variables
• MLN:
– Formulas plus weights
– Correlations!
– Can be converted to WFOMC
Symmetric vs. Asymmetric

Symmetric WFOMC:
• In every relation R, all tuples have the same weight
• Example: converting an MLN "without evidence" into WFOMC yields a symmetric weight function

Asymmetric WFOMC:
• Each relation R is given explicitly
• Example: probabilistic databases
• Example: MLNs plus evidence
[Gribkoff'14]
Terminology

                                           MLNs               Prob. DBs
Random variable is a                       Ground atom        DB tuple
Weights w associated with                  Formulas           DB tuples
Typical query Q is a                       Single atom        FO formula/SQL
Data is encoded into                       Evidence (query)   Distribution
Correlations induced by                    Model formulas     Query
Model generalizes across domains?          Yes                No
Query generalizes across domains?          No                 Yes
Sum of weights of worlds is 1 (normalized)?  No               Yes
Outline
• Part 1: Motivation
• Part 2: Probabilistic Databases
• Part 3: Weighted Model Counting
• Part 4: Lifted Inference for WFOMC
• Part 5: The Power of Lifted Inference
• Part 6: Conclusion/Open Problems
Defining Lifted Inference

Informal: exploit symmetries, reason at the first-order level, reason about groups of objects, scalable inference, high-level probabilistic reasoning, etc.

A formal definition: domain-lifted inference
Inference runs in time polynomial in the number of objects in the domain:
• polynomial in #people, #webpages, #cards
• not polynomial in #predicates, #formulas, #logical variables
• related to data complexity in databases

Alternative used in this tutorial: lifted inference = ∃ query plan = ∃ FO compilation

[V.d.Broeck'11b, Jaeger'12]
Rules for Asymmetric WFOMC

• If Δ1, Δ2 are independent:
  AND-rule: WMC(Δ1 ∧ Δ2) = WMC(Δ1) * WMC(Δ2)
  OR-rule: WMC(Δ1 ∨ Δ2) = Z – (Z1 – WMC(Δ1)) * (Z2 – WMC(Δ2))

• If Δ[c1/x], Δ[c2/x], … are independent:
  ∀-rule: WMC(∀z Δ) = Πc∈Domain WMC(Δ[c/z])
  ∃-rule: WMC(∃z Δ) = Z – Πc∈Domain (Zc – WMC(Δ[c/z]))

• Inclusion/exclusion formula:
  WMC(Δ1 ∨ Δ2) = WMC(Δ1) + WMC(Δ2) – WMC(Δ1 ∧ Δ2)
  WMC(Δ1 ∧ Δ2) = WMC(Δ1) + WMC(Δ2) – WMC(Δ1 ∨ Δ2)

• Negation: WMC(¬Δ) = Z – WMC(Δ)

(Z, Z1, Z2, Zc are normalization constants, easy to compute.)
Symmetric WFOMC Rules

• Simplifications, if Δ[c1/x], Δ[c2/x], … are independent:
  ∀-rule: WMC(∀z Δ) = WMC(Δ[c1/z])^|Domain|
  ∃-rule: WMC(∃z Δ) = Z – (Zc1 – WMC(Δ[c1/z]))^|Domain|

• A powerful new inference rule: atom counting (the workhorse of symmetric WFOMC)
– Only possible with symmetric weights
– Intuition: remove unary relations

[V.d.Broeck'11a]
Symmetric WFOMC Rules: Example

• FO model counting: w(R) = w(¬R) = 1
• Apply the inference rules backwards (steps 4-3-2-1)

4. Δ = (Stress(Alice) ⇒ Smokes(Alice)), Domain = {Alice}
WMC(¬Stress(Alice) ∨ Smokes(Alice)) =
 = Z – WMC(Stress(Alice)) × WMC(¬Smokes(Alice))        (OR-rule)
 = 4 – 1 × 1 = 3 models

3. Δ = ∀x (Stress(x) ⇒ Smokes(x)), Domain = {n people}
→ 3^n models        (∀-rule)

2. Δ = ∀y (ParentOf(y) ∧ Female ⇒ MotherOf(y)), D = {n people}
WMC(Δ) = WMC(¬Female ∨ ∀y (ParentOf(y) ⇒ MotherOf(y)))        (OR-rule)
 = 2 * 2^n * 2^n – (2 – 1) * (2^n * 2^n – WMC(∀y (ParentOf(y) ⇒ MotherOf(y))))
 = 2 * 4^n – (4^n – 3^n)
→ 3^n + 4^n models

1. Δ = ∀x,y (ParentOf(x,y) ∧ Female(x) ⇒ MotherOf(x,y)), D = {n people}
→ (3^n + 4^n)^n models        (∀-rule)
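The count 3^n + 4^n in step 2 can be verified by brute force for small n; a minimal sketch (our own check, all weights 1):

```python
from itertools import product

# Δ2 = ∀y (ParentOf(y) ∧ Female ⇒ MotherOf(y)), domain size n.
def count_models(n):
    models = 0
    for female in [True, False]:
        for parent in product([True, False], repeat=n):
            for mother in product([True, False], repeat=n):
                if all((not (parent[y] and female)) or mother[y]
                       for y in range(n)):
                    models += 1
    return models

for n in range(1, 5):
    assert count_models(n) == 3**n + 4**n   # matches the lifted derivation
```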
Atom Counting: Example

Δ = ∀x,y (Smokes(x) ∧ Friends(x,y) ⇒ Smokes(y))    Domain = {n people}

[Figure: the domain split into k smokers and n−k non-smokers; Friends edges from smokers to non-smokers are forbidden, all other Friends edges are free]

If we know precisely who smokes (a database Smokes(Alice)=1, Smokes(Bob)=0, Smokes(Charlie)=0, Smokes(Dave)=1, Smokes(Eve)=0, …), and there are k smokers:
→ 2^(n² − k(n−k)) models
(the k(n−k) Friends atoms from a smoker to a non-smoker must be false; the remaining Friends atoms are free)

If we only know that there are k smokers:
→ C(n,k) · 2^(n² − k(n−k)) models

In total:
→ Σ_{k=0..n} C(n,k) · 2^(n² − k(n−k)) models
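These counts can be cross-checked by brute-force enumeration for small n. A sketch (our own verification code; `comb` is Python's binomial coefficient):

```python
from itertools import product
from math import comb

# Atom counting for Δ = ∀x,y (Smokes(x) ∧ Friends(x,y) ⇒ Smokes(y)):
# sum over the number k of smokers, as derived above.
def lifted_count(n):
    return sum(comb(n, k) * 2 ** (n * n - k * (n - k)) for k in range(n + 1))

def brute_count(n):
    models = 0
    for smokes in product([True, False], repeat=n):
        for friends in product([True, False], repeat=n * n):
            f = lambda x, y: friends[x * n + y]
            if all((not (smokes[x] and f(x, y))) or smokes[y]
                   for x in range(n) for y in range(n)):
                models += 1
    return models

for n in range(1, 4):
    assert lifted_count(n) == brute_count(n)
```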
Augment Rules with Logical Rewritings

1. Remove constants (shattering)

Δ = ∀x (Friend(Alice, x) ∨ Friend(x, Bob))
becomes
Δ = ∀x (F1(x) ∨ F2(x)) ∧ (F3 ∨ F4) ∧ (F4 ∨ F5)
where F1(x) = Friend(Alice,x), F2(x) = Friend(x,Bob), F3 = Friend(Alice,Alice), F4 = Friend(Alice,Bob), F5 = Friend(Bob,Bob)

2. "Rank" variables (= occur in the same order in each atom)

Δ = (Friend(x,y) ∨ Enemy(x,y)) ∧ (Friend(x,y) ∨ Enemy(y,x))      (wrong order)
becomes
Δ = (F1(x,y) ∨ E1(x,y)) ∧ (F1(x,y) ∨ E3(x,y))
  ∧ (F2(x) ∨ E2(x))
  ∧ (F3(x,y) ∨ E3(x,y)) ∧ (F3(x,y) ∨ E1(x,y))
where F1(u,v) = Friend(u,v), u<v;  F2(u) = Friend(u,u);  F3(u,v) = Friend(v,u), v<u
      E1(u,v) = Enemy(u,v), u<v;   E2(u) = Enemy(u,u);   E3(u,v) = Enemy(v,u), v<u
Augment Rules with Logical Rewritings

3. Perform resolution [Gribkoff'14]; see the UAI poster on Saturday!

Δ = ∀x∀y (R(x) ∨ ¬S(x,y)) ∧ ∀x∀y (S(x,y) ∨ T(y))      (rules stuck…)
Resolution: Δ ∧ ∀x∀y (R(x) ∨ T(y))                      (now apply I/E!)

4. Skolemization [V.d.Broeck'14]: handles the mix of ∀/∃ in encodings of MLNs with quantifiers and probabilistic programs.
Input: mix of ∀/∃ → Output: only ∀
Δ = ∀p, ∃c, Card(p,c)
Skolemization: Example

Δ = ∀p, ∃c, Card(p,c)

Skolemization introduces a Skolem predicate S:
Δ' = ∀p, ∀c, Card(p,c) ⇒ S(p)
with w(S) = 1 and w(¬S) = −1

Consider one position p:
• ∃c, Card(p,c) = true: S(p) = true. Also a model of Δ, weight * 1
• ∃c, Card(p,c) = false:
  S(p) = true:  not a model of Δ, weight * 1
  S(p) = false: not a model of Δ, weight * −1
  → the extra models cancel out

[V.d.Broeck'14]
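The cancellation can be checked numerically; a sketch on a hypothetical 2-position, 2-card domain with all Card weights 1 (our own check, not from the deck):

```python
from itertools import product

# Δ = ∀p ∃c Card(p,c): count its models directly, then via
# Δ' = ∀p∀c (Card(p,c) ⇒ S(p)) with w(S)=1, w(¬S)=-1.
P, C = 2, 2   # hypothetical tiny domain sizes

direct = 0
for card in product([True, False], repeat=P * C):
    if all(any(card[p * C + c] for c in range(C)) for p in range(P)):
        direct += 1

skolem = 0
for card in product([True, False], repeat=P * C):
    for s in product([True, False], repeat=P):
        if all((not card[p * C + c]) or s[p]
               for p in range(P) for c in range(C)):
            weight = 1
            for p in range(P):
                weight *= 1 if s[p] else -1   # the negative Skolem weight
            skolem += weight

assert direct == skolem   # the extra models cancel out exactly
```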
Playing Cards Revisited

Let us automate this: a relational model plus a lifted probabilistic inference algorithm.

∀p, ∃c, Card(p,c)
∀c, ∃p, Card(p,c)
∀p, ∀c, ∀c', Card(p,c) ∧ Card(p,c') ⇒ c = c'

Skolemization (w(S1) = 1, w(¬S1) = −1; w(S2) = 1, w(¬S2) = −1):
∀p, ∀c, Card(p,c) ⇒ S1(p)
∀c, ∀p, Card(p,c) ⇒ S2(c)
∀p, ∀c, ∀c', Card(p,c) ∧ Card(p,c') ⇒ c = c'

Atom counting removes the unary relations S1, S2, leaving:
∀p, ∀c, ∀c', Card(p,c) ∧ Card(p,c') ⇒ c = c'

∀-rule on p:
∀c, ∀c', Card(c) ∧ Card(c') ⇒ c = c'
…

Computed in time polynomial in n.
Summary: Lifted Inference
• By definition: PTIME data complexity
  Also: ∃ FO compilation = ∃ query plan
• However: only works for "liftable" queries
• The rules:
– AND/OR-rules, ∀/∃-rules, I/E (inclusion/exclusion), atom counting
– Deceptively simple: the only surprising rules are I/E and atom counting

Next: we will show that lifted inference is provably more powerful than grounded inference.
Outline
• Part 1: Motivation
• Part 2: Probabilistic Databases
• Part 3: Weighted Model Counting
• Part 4: Lifted Inference for WFOMC
• Part 5: The Power of Lifted Inference
• Part 6: Conclusion/Open Problems
Two Questions

• Q1: Are the lifted rules complete?
– We know that they get stuck on some queries
– Do we need to add more rules?
→ Answer: complete for positive CNF-FO, for UCQ

• Q2: Are lifted rules stronger than grounded?
– Some lifted rules easily correspond to operations on grounded formulas (e.g., independent-AND)
– Can we simulate every lifted inference directly on the grounded formula?
→ Symmetric: yes (grounded inference ignores symmetries)
→ Asymmetric: strictly stronger than decision-DNNF & DPLL-based algorithms
1. Are the Lifted Rules Complete?

We use complexity classes:
• Inference rules: PTIME data complexity
• Some queries: #P-hard data complexity

Dichotomy Theorem for positive CNF-FO:
• If the lifted rules succeed, then the query is in PTIME
• If the lifted rules fail, then the query is #P-hard
This implies the lifted rules are complete for positive CNF-FO.

We will show this in two steps: the Small and the Big Dichotomy Theorem.
NP vs. #P
• SAT = satisfiability problem
• SAT is NP-complete [Cook'71]
• NP = decision problems solvable by a polynomial-time nondeterministic TM
• #SAT = model counting
• #SAT is #P-complete [Valiant'79]
• #P = numerical functions computable by a polynomial-time nondeterministic TM; the answer = #accepting computations
Note: it would be wrong to say "#SAT is NP-complete"
A Simple Propositional Formula that is Hard

A positive, partitioned 2CNF (PP2CNF) formula has the form
F = ∧(i,j) ∈ E (xi ∨ yj)
where E is the edge set of a bipartite graph.

Example: E = {(1,1), (2,1), (2,3), (1,3), (2,2)} gives
F = (x1 ∨ y1) ∧ (x2 ∨ y1) ∧ (x2 ∨ y3) ∧ (x1 ∨ y3) ∧ (x2 ∨ y2)

Theorem [Provan'83]. #SAT for PP2CNF is #P-hard.
A Query That is #P-Hard

H0 = ∀x∀y (Smoker(x) ∨ Friend(x,y) ∨ Jogger(y))

Theorem [Dalvi'04]. Computing P(H0 | D) is #P-hard in |D|.

Proof: reduction from PP2CNF. Given a PP2CNF F defined by an edge relation E, set:
P(Friend(a,b)) = 0 if (a,b) ∈ E
P(Friend(a,b)) = 1 if (a,b) ∉ E
Then H0 grounds to ∧(i,j) ∈ E (Smoker(i) ∨ Jogger(j)) = F, hence P(H0 | D) = P(F).

Lesson: no lifted inference rules will ever compute H0.
Hierarchical Clause

at(x) = set of atoms containing the variable x

Definition. A clause Q is hierarchical if for all variables x, y:
at(x) ⊆ at(y) or at(x) ⊇ at(y) or at(x) ∩ at(y) = ∅

Hierarchical:
Q = (Smoker(x,y) ∨ Friend(x,z)) = ∀x ([∀y Smoker(x,y)] ∨ [∀z Friend(x,z)])
(x occurs in both atoms; at(y) and at(z) are contained in at(x))

Non-hierarchical:
H0 = Smoker(x) ∨ Friend(x,y) ∨ Jogger(y)
(at(x) = {Smoker, Friend} and at(y) = {Friend, Jogger} overlap, but neither contains the other)
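The hierarchy test is straightforward to implement; a sketch using our own clause encoding (lists of (relation, variables) pairs):

```python
def is_hierarchical(clause):
    """clause: list of (relation_name, tuple_of_variables)."""
    vars_ = {v for _, args in clause for v in args}
    # at[v] = indices of the atoms that contain variable v
    at = {v: {i for i, (_, args) in enumerate(clause) if v in args}
          for v in vars_}
    return all(at[x] <= at[y] or at[x] >= at[y] or not (at[x] & at[y])
               for x in vars_ for y in vars_)

h0 = [('Smoker', ('x',)), ('Friend', ('x', 'y')), ('Jogger', ('y',))]
q  = [('Smoker', ('x', 'y')), ('Friend', ('x', 'z'))]
print(is_hierarchical(h0), is_hierarchical(q))  # False True
```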
Small Dichotomy Theorem

Let Q be a single clause, without repeating relation symbols.

Theorem [Dalvi'04] (dichotomy):
• If Q is hierarchical, then Q is liftable (PTIME data complexity); moreover, the OR-rule and ∀-rule are complete.
• If Q is not hierarchical, then Q is #P-hard.

Note: checking "Q is hierarchical" is in AC0 (expression complexity).
Proof

Hierarchical ⇒ PTIME. Two cases:
• Case 1: Q = ∀x Q', with x occurring in every atom.
  ∀-rule: P(∀x Q') = Πa P(Q'[a/x])
• Case 2: Q = Q1 ∨ Q2, with Q1, Q2 independent.
  ∨-rule: P(Q) = 1 – (1 – P(Q1))(1 – P(Q2))

Non-hierarchical ⇒ #P-hard. Reduction from H0, since Q contains a pattern
Q = … R(x, …) ∨ S(x,y, …) ∨ T(y, …), …
The Big Dichotomy Theorem
• For Positive CNF-FO the rules are not complete as stated!
• Instead we will revise inclusion/exclusion
• After the revision, the rules are complete
• We start with some non-liftable queries…
The Non-liftable Queries Hk

H0 = R(x) ∨ S(x,y) ∨ T(y)
H1 = [R(x0) ∨ S(x0,y0)] ∧ [S(x1,y1) ∨ T(y1)]
H2 = [R(x0) ∨ S1(x0,y0)] ∧ [S1(x1,y1) ∨ S2(x1,y1)] ∧ [S2(x2,y2) ∨ T(y2)]
H3 = [R(x0) ∨ S1(x0,y0)] ∧ [S1(x1,y1) ∨ S2(x1,y1)] ∧ [S2(x2,y2) ∨ S3(x2,y2)] ∧ [S3(x3,y3) ∨ T(y3)]
. . .

Theorem [Dalvi'12]. For every k, the query Hk is #P-hard.

So far, not very interesting…
The Query QW

H3 = [R(x0)∨S1(x0,y0)] ∧ [S1(x1,y1)∨S2(x1,y1)] ∧ [S2(x2,y2)∨S3(x2,y2)] ∧ [S3(x3,y3)∨T(y3)]

QW = [∀x0∀y0 (R(x0)∨S1(x0,y0)) ∧ ∀x2∀y2 (S2(x2,y2)∨S3(x2,y2))]    /* Q1 */
   ∨ [∀x0∀y0 (R(x0)∨S1(x0,y0)) ∧ ∀x3∀y3 (S3(x3,y3)∨T(y3))]        /* Q2 */
   ∨ [∀x1∀y1 (S1(x1,y1)∨S2(x1,y1)) ∧ ∀x3∀y3 (S3(x3,y3)∨T(y3))]    /* Q3 */

QW is a Boolean combination of clauses in H3.
QW is liftable, BUT we need to use cancellations!

P(QW) = P(Q1) + P(Q2) + P(Q3)
      – P(Q1 ∧ Q2) – P(Q2 ∧ Q3) – P(Q1 ∧ Q3)
      + P(Q1 ∧ Q2 ∧ Q3)

P(Q1), P(Q2), P(Q3), P(Q1∧Q2), P(Q2∧Q3) are liftable, while Q1∧Q3 = H3 (hard!) and Q1∧Q2∧Q3 = H3 as well: the two hard terms cancel out, and what remains is liftable.
Cancellations?
• Cancellations in the inclusion/exclusion formula are critical! If we fail to perform them, the rules get stuck.
• The mathematical concept that explains which terms cancel out is the Möbius function (next).
August Ferdinand Möbius (1790–1868)
• Möbius strip
• Möbius function μ in number theory
• Generalized to lattices [Stanley'97]
• And to lifted inference!
The Lattice of a Query
Definition. The lattice of Q = Q1 ∧ Q2 ∧ … is:
• Elements are the terms of inclusion/exclusion, together with a top element 1̂;
• Order is logical implication.
QW = [(R(x0)∨S1(x0,y0)) ∧ (S2(x2,y2)∨S3(x2,y2))]  /* Q1 */
   ∨ [(R(x0)∨S1(x0,y0)) ∧ (S3(x3,y3)∨T(y3))]      /* Q2 */
   ∨ [(S1(x1,y1)∨S2(x1,y1)) ∧ (S3(x3,y3)∨T(y3))]  /* Q3 */
[Figure: the lattice of QW. Top: 1̂. Below it: Q1, Q2, Q3. Below those: Q1∧Q2 and Q2∧Q3. Bottom: Q1∧Q2∧Q3 (= Q1∧Q3). Some nodes are liftable; the bottom node, which equals H3, is #P-hard.]
The Möbius Function
Def. The Möbius function, relative to the top element 1̂ of the lattice:
μ(1̂, 1̂) = 1
μ(u, 1̂) = − Σ_{u < v ≤ 1̂} μ(v, 1̂)
Möbius Inversion Formula:
P(Q) = − Σ_{Qi < 1̂} μ(Qi, 1̂) P(Qi)
Computed top-down on the lattice of QW:
μ(1̂, 1̂) = 1
μ(Q1, 1̂) = μ(Q2, 1̂) = μ(Q3, 1̂) = −1
μ(Q1∧Q2, 1̂) = μ(Q2∧Q3, 1̂) = 1
μ(Q1∧Q2∧Q3, 1̂) = 0   ← the #P-hard term H3 gets coefficient 0 and drops out
New Rule: replace Inclusion/Exclusion by the Möbius Inversion Formula.
[The slides also show a second example lattice with Möbius values 1; −1, −1, −1; 2, illustrating that Möbius coefficients need not be ±1.]
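The definition translates directly into code. Below is a minimal Python sketch (my own illustration, not the tutorial's) that computes μ on the lattice of QW top-down and then applies Möbius inversion; the bottom element gets coefficient 0, so the hard query H3 is never evaluated:

# Lattice elements; "above" lists, for each element, all strictly greater elements.
# Note Q1^Q3 equals the bottom element Q1^Q2^Q3, so it is not a separate node.
above = {
    "top":   [],
    "Q1":    ["top"], "Q2": ["top"], "Q3": ["top"],
    "Q1^Q2": ["Q1", "Q2", "top"],
    "Q2^Q3": ["Q2", "Q3", "top"],
    "bot":   ["Q1^Q2", "Q2^Q3", "Q1", "Q2", "Q3", "top"],  # bot = Q1^Q2^Q3 = H3
}

mu = {}
for u in sorted(above, key=lambda e: len(above[e])):  # top first, bottom last
    mu[u] = 1 if u == "top" else -sum(mu[v] for v in above[u])

print(mu)  # {'top': 1, 'Q1': -1, 'Q2': -1, 'Q3': -1, 'Q1^Q2': 1, 'Q2^Q3': 1, 'bot': 0}

# Möbius inversion: P(Q) = - sum_{u < top} mu(u) * P(u).
# Since mu(bot) = 0, the #P-hard query H3 never needs to be evaluated.
def prob_QW(P):  # P: dict of probabilities for the liftable lattice elements
    return -sum(mu[u] * P[u] for u in above if u != "top" and mu[u] != 0)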
The Dichotomy Theorem
Dichotomy Theorem [Dalvi'12] Fix a Positive CNF-FO query Q.
1. If Q is liftable, then P(Q) is in PTIME (obviously)
2. If Q is not liftable, then P(Q) is #P-complete
Note 1: for the theorem to hold, one must replace the inclusion/exclusion rule with the Möbius rule
Note 2: the original formulation is for UCQ; it holds for Positive CNF-FO by duality
Discussion
• This answers Question 1: the lifted inference rules are complete for Positive CNF-FO
• Beyond Positive CNF-FO?
– See poster on Saturday
– Take-away: rules + resolution are conjectured to be complete for CNF-FO; there is strong evidence that no complete set of rules exists for full FO
2. Are lifted rules stronger than grounded?
Alternative to lifting:
1. Ground the FO sentence
2. Do WMC on the propositional formula
Symmetric WFOMC: grounded WMC does not use symmetries. Query H0 is:
• liftable on symmetric spaces,
• #P-hard on asymmetric spaces.
Asymmetric WFOMC: query QW is in PTIME, yet:
• DPLL-based search takes exponential time,
• Decision-DNNFs have exponential size.
Symmetric WFOMC
H0 = ∀x∀y (Smoker(x)∨Friend(x,y)∨Jogger(y))
We have seen that H0 is #P-hard (over asymmetric spaces!).
But over symmetric spaces it can be lifted:
lifted inference is strictly more powerful than grounded inference.
Theorem [V.d.Broeck'14]: every query in FO2 is liftable over symmetric spaces.
FO2 includes H0, and some quite complex sentences like:
Q = ∀x∀y∀z∀u∀v (Friend(x,y) ∨ Enemy(y,z) ∨ Friend(z,u) ∨ Enemy(u,v))
  = ∀x∀y (Friend(x,y) ∨ ∀x (Enemy(y,x) ∨ ∀y (Friend(x,y) ∨ ∀x Enemy(y,x))))
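To see why symmetry helps, here is a hedged sketch (my own illustration, assuming unit weights, i.e., plain first-order model counting) of the lifted computation for H0 over a domain of size n: group worlds by the number k of non-Smokers and l of non-Joggers; only the k·l pairs (x,y) with Smoker(x) and Jogger(y) both false force Friend(x,y) to be true, and all other Friend atoms are free.

# Lifted model count of H0 = forall x forall y (Smoker(x) v Friend(x,y) v Jogger(y))
from math import comb

def count_H0(n):
    return sum(comb(n, k) * comb(n, l) * 2 ** (n * n - k * l)
               for k in range(n + 1) for l in range(n + 1))

print(count_H0(2))    # polynomial time in n, despite 2^(n^2 + 2n) worlds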
Asymmetric WFOMC
• Lifted inference no longer has a fundamental reason to be stronger than grounded WMC.
• However, we can prove that lifted inference is stronger than the WMC algorithms used in practice today:
– DPLL search (with caching; with components)
– Decision-DNNF
Basic DPLL
Davis, Putnam, Logemann, Loveland [Davis'60, '62]
Assume uniform distribution for simplicity.
// basic DPLL:
Function P(F):
if F = false then return 0
if F = true then return 1
select a variable x, return
½ P(FX=0) + ½ P(FX=1)
Example: F = (¬x∨y) ∧ (¬x∨u∨w) ∧ (x∨u∨w∨z)
Branching on x:
• FX=0 = u∨w∨z, with P = 7/8
• FX=1 = y ∧ (u∨w), with P = ½ · ¾ = 3/8
Hence P(F) = ½ · 7/8 + ½ · 3/8 = 5/8.
The trace is a Decision-Tree for F.
[Figure: the full decision tree for F, branching on x, then u, w, z on one side and y, u, w on the other, with leaf values 0/1 and intermediate probabilities ½, ¾, 3/8, 7/8, 5/8]
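A minimal Python sketch (my own illustration, not the tutorial's code) of basic DPLL model counting under the uniform distribution; a CNF formula is a list of clauses, a clause a list of signed integers, e.g. -1 means "not x1":

def condition(cnf, lit):
    # Set literal lit to true: drop satisfied clauses, shrink the others
    return [[l for l in c if l != -lit] for c in cnf if lit not in c]

def prob(cnf):
    if any(c == [] for c in cnf):     # an empty clause: F is false
        return 0.0
    if not cnf:                       # no clauses left: F is true
        return 1.0
    x = abs(cnf[0][0])                # select a variable
    return 0.5 * prob(condition(cnf, -x)) + 0.5 * prob(condition(cnf, x))

# F = (¬x∨y) ∧ (¬x∨u∨w) ∧ (x∨u∨w∨z) with x=1, y=2, u=3, w=4, z=5
F = [[-1, 2], [-1, 3, 4], [1, 3, 4, 5]]
print(prob(F))                        # 0.625 = 5/8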
Caching
// DPLL with caching:
Cache F and P(F); look it up before computing.
In the example, the subformula (u∨w) arises twice in the tree, once below each branch of x, so its probability is computed once and reused.
[Figure: the decision tree with the repeated (u∨w) subtree merged into a single node]
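The same search with caching, again as a sketch of my own (reusing condition() from the DPLL sketch above): memoize P(F) on a canonical key, so that a subformula such as (u∨w) is solved once and looked up the second time.

def prob_cached(cnf, cache=None):
    if cache is None:
        cache = {}
    if any(c == [] for c in cnf):
        return 0.0
    if not cnf:
        return 1.0
    key = tuple(sorted(tuple(sorted(c)) for c in cnf))   # canonical form of F
    if key not in cache:
        x = abs(cnf[0][0])
        cache[key] = 0.5 * prob_cached(condition(cnf, -x), cache) \
                   + 0.5 * prob_cached(condition(cnf, x), cache)
    return cache[key]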
Caching & FBDDs
The trace is a decision-DAG for F:
an FBDD (Free Binary Decision Diagram), a.k.a. ROBP (Read-Once Branching Program)
• Every variable is tested at most once on any path
• All internal nodes are decision nodes
[Figure: the decision-DAG, with the merged (u∨w) node reached from both branches]
Component Analysis
// DPLL with component analysis (and caching):
if F = G ∧ H, where G and H have disjoint sets of variables, then
P(F) = P(G) × P(H)
In the example, the subformula y ∧ (u∨w) splits into the independent components y and (u∨w).
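A sketch of component analysis (my own illustration, reusing condition() from the DPLL sketch above): split F into clause groups that share no variables and multiply their probabilities.

def components(cnf):
    groups = []                                  # list of (vars, clauses) pairs
    for clause in cnf:
        vars_c, merged = {abs(l) for l in clause}, [clause]
        rest = []
        for g_vars, g_clauses in groups:
            if g_vars & vars_c:                  # shares a variable: merge
                vars_c |= g_vars
                merged += g_clauses
            else:
                rest.append((g_vars, g_clauses))
        groups = rest + [(vars_c, merged)]
    return [g for _, g in groups]

def prob_comp(cnf):
    if any(c == [] for c in cnf):
        return 0.0
    if not cnf:
        return 1.0
    comps = components(cnf)
    if len(comps) > 1:                           # F = G ∧ H, disjoint variables
        p = 1.0
        for comp in comps:
            p *= prob_comp(comp)                 # P(F) = P(G) × P(H)
        return p
    x = abs(cnf[0][0])
    return 0.5 * prob_comp(condition(cnf, -x)) + 0.5 * prob_comp(condition(cnf, x))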
Components & Decision-DNNF
The trace is a Decision-DNNF [Huang'05, '07]:
an FBDD plus “decomposable” AND-nodes
(the two sub-DAGs below an AND-node do not share variables)
[Figure: the trace DAG with decision nodes and a decomposable AND-node splitting y ∧ (u∨w) into y and (u∨w)]
New Queries From Hk
Consider the k+1 clauses that form Hk:
Hk0 = ∀x0∀y0 (R(x0)∨S1(x0,y0))
Hk1 = ∀x1∀y1 (S1(x1,y1)∨S2(x1,y1))
Hk2 = ∀x2∀y2 (S2(x2,y2)∨S3(x2,y2))
…
Hkk = ∀xk∀yk (Sk(xk,yk)∨T(yk))
Asymmetric WFOMC
Theorem. [Beame'14] If the query Q is any Boolean combination of the formulas Hk0, …, Hkk, then:
• Any DPLL-based algorithm takes time Ω(2^√n)
• Any Decision-DNNF has Ω(2^√n) nodes
For example, QW is a Boolean combination of H30, H31, H32, H33:
liftable (hence PTIME), yet grounded WMC takes exponential time.
Discussion
• This answers Question 2: there exist queries that (a) are liftable, yet (b) on which grounded algorithms like DPLL search or Decision-DNNF compilation take exponential time
• Perhaps there are more powerful grounded algorithms? We don't know. Open problem: do d-DNNFs compute these queries in PTIME?
Möbius Über Alles
[Figure from [Suciu'11]: a map of queries by complexity. Inside the PTIME region, nested classes: Read-Once ⊆ poly-size OBDD ⊆ poly-size FBDD/Decision-DNNF ⊆ poly-size d-DNNF, with some boundaries conjectured or open; non-hierarchical queries are #P-hard, hierarchical ones span the tractable classes. Example queries QJ, QV, QW, Q9, QU and H0, H1, H2, H3 are placed on the map.]
Outline
• Part 1: Motivation
• Part 2: Probabilistic Databases
• Part 3: Weighted Model Counting
• Part 4: Lifted Inference for WFOMC
• Part 5: The Power of Lifted Inference
• Part 6: Conclusion/Open Problems
Summary
• Relational models = the vast majority of data today, plus probabilistic databases
• Weighted Model Counting = uniform approach to probabilistic inference
• Lifted Inference = really simple rules
• The Power of Lifted Inference = we can prove that lifted inference is better
Lifted Algorithms (in the AI community) [Kimmig'14]
Exact Probabilistic Inference:
• First-Order Variable Elimination [Poole'03, Braz'05, Milch'08, Taghipour'13]
• First-Order Knowledge Compilation [V.d.Broeck'11a, '11b, '12a, '13a]
• Probabilistic Theorem Proving [Gogate'11]
Approximate Probabilistic Inference:
• Lifted Belief Propagation [Jaimovich'07, Singla'08, Kersting'09]
• Lifted Bisimulation/Mini-buckets [Sen'08, '09]
• Lifted Importance Sampling [Gogate'11, '12]
• Lifted Relax, Compensate & Recover [V.d.Broeck'12b]
• Lifted MCMC [Niepert'12, Niepert'13, Venugopal'12]
• Lifted Variational Inference [Choi'12, Bui'12]
• Lifted MAP-LP [Mladenov'14, Apsel'14]
Special-Purpose Inference:
• Lifted Kalman Filter [Ahmadi'11, Choi'11]
• Lifted Linear Programming [Mladenov'12]
“But my application has no symmetries?”
1. Statistical relational models have abundant symmetries
2. Some tasks do not require symmetries in data: weight learning, partition functions, single marginals, etc.
3. Symmetries of computation are not symmetries of data: belief propagation and MAP-LP require only weaker automorphisms
4. Over-symmetric approximations: approximate P(Q|DB) by P(Q|DB'), where DB' has more symmetries than DB (is more liftable); very high speed improvements at low approximation error
[Kersting'09, Mladenov'14, V.d.Broeck'13b]
Open Problems
Symmetric spaces:
• Prove hardness for ANY lifted inference task. Likely needed: #P1-hardness.
• Are lifted inference rules complete beyond FO2?
Asymmetric spaces:
• Prove completeness of the rules for CNF FO formulas
• Extend lifted inference algorithms beyond the liftable formulas (approximations needed)
• Find a measure of complexity as a function of the FO formula AND the database D, e.g., tractability when D has bounded treewidth
Final Thoughts
Long-term outlook: probabilistic inference exploits
• 1988: conditional independence
• 2000: contextual independence (local structure)
• 201?: exchangeability/symmetries → need lifted inference!
Thank You! Questions?
StarAI Workshop @ AAAI on Sunday
[Image: the Probabilistic Databases book [Suciu'11]]
References
[Gartner'06] Gartner (2006). Market Share: Relational Database Management Systems by Operating System, Worldwide.
[Carlson'10] Carlson, A., Betteridge, J., Kisiel, B., Settles, B., Hruschka Jr, E. R., & Mitchell, T. M. (2010). Toward an Architecture for Never-Ending Language Learning. In AAAI.
[Dong'14] Dong, X. L., Murphy, K., Gabrilovich, E., Heitz, G., Horn, W., Lao, N., Strohmann, T., Sun, S., & Zhang, W. (2014). Knowledge Vault: A Web-scale approach to probabilistic knowledge fusion.
[Niu'12] Niu, F., Zhang, C., Ré, C., & Shavlik, J. W. (2012). DeepDive: Web-scale Knowledge-base Construction using Statistical Learning and Inference. In VLDS (pp. 25-28).
[Richardson'06] Richardson, M., & Domingos, P. (2006). Markov logic networks. Machine Learning, 62(1-2), 107-136.
[Niepert'14] Niepert, M., & Van den Broeck, G. (2014). Tractability through exchangeability: A new perspective on efficient probabilistic inference. In AAAI.
[Vardi'82] Vardi, M. Y. (1982). The complexity of relational query languages. In STOC (pp. 137-146). ACM.
[Suciu'11] Suciu, D., Olteanu, D., Ré, C., & Koch, C. (2011). Probabilistic Databases. Synthesis Lectures on Data Management, 3(2), 1-180.
[Olteanu'08] Olteanu, D., & Huang, J. (2008). Using OBDDs for efficient query evaluation on probabilistic databases. In Scalable Uncertainty Management (pp. 326-340). Springer.
[Jha'13] Jha, A., & Suciu, D. (2013). Knowledge compilation meets database theory: compiling queries to decision diagrams. Theory of Computing Systems, 52(3), 403-440.
[Dalvi'04] Dalvi, N., & Suciu, D. (2004). Efficient query evaluation on probabilistic databases. In VLDB.
[V.d.Broeck'11a] Van den Broeck, G., Taghipour, N., Meert, W., Davis, J., & De Raedt, L. (2011). Lifted probabilistic inference by first-order knowledge compilation. In IJCAI (pp. 2178-2185).
[Dalvi'12] Dalvi, N., & Suciu, D. (2012). The dichotomy of probabilistic inference for unions of conjunctive queries. Journal of the ACM, 59(6), 30.
[Chavira'05] Chavira, M., & Darwiche, A. (2005). Compiling Bayesian networks with local structure. In IJCAI (pp. 1306-1312).
[Sang'05] Sang, T., Beame, P., & Kautz, H. A. (2005). Performing Bayesian inference by weighted model counting. In AAAI (pp. 475-481).
[Chavira'08] Chavira, M., & Darwiche, A. (2008). On probabilistic inference by weighted model counting. Artificial Intelligence, 172(6), 772-799.
[Choi'13] Choi, A., Kisa, D., & Darwiche, A. (2013). Compiling probabilistic graphical models using sentential decision diagrams. In Symbolic and Quantitative Approaches to Reasoning with Uncertainty (pp. 121-132). Springer.
[Chavira'06] Chavira, M., Darwiche, A., & Jaeger, M. (2006). Compiling relational Bayesian networks for exact inference. International Journal of Approximate Reasoning, 42(1), 4-20.
[Fierens'11] Fierens, D., Van den Broeck, G., Thon, I., Gutmann, B., & De Raedt, L. (2011). Inference in probabilistic logic programs using weighted CNFs. In UAI.
[Fierens'13] Fierens, D., Van den Broeck, G., Renkens, J., Shterionov, D., Gutmann, B., Thon, I., … & De Raedt, L. (2013). Inference and learning in probabilistic logic programs using weighted Boolean formulas. Theory and Practice of Logic Programming, 1-44.
[Gogate'11] Gogate, V., & Domingos, P. (2011). Probabilistic theorem proving. In UAI.
[V.d.Broeck'13a] Van den Broeck, G. (2013). Lifted Inference and Learning in Statistical Relational Models. Doctoral dissertation, KU Leuven.
[V.d.Broeck'14] Van den Broeck, G., Meert, W., & Darwiche, A. (2014). Skolemization for weighted first-order model counting. In KR.
[Gribkoff'14] Gribkoff, E., Van den Broeck, G., & Suciu, D. (2014). Understanding the Complexity of Lifted Inference and Asymmetric Weighted Model Counting. In UAI.
[Jha'12] Jha, A., & Suciu, D. (2012). Probabilistic databases with MarkoViews. Proceedings of the VLDB Endowment, 5(11), 1160-1171.
[V.d.Broeck'11b] Van den Broeck, G. (2011). On the completeness of first-order knowledge compilation for lifted probabilistic inference. In NIPS (pp. 1386-1394).
[Jaeger'12] Jaeger, M., & Van den Broeck, G. (2012). Liftability of probabilistic inference: Upper and lower bounds. In StarAI workshop.
[Cook'71] Cook, S. A. (1971). The complexity of theorem-proving procedures. In STOC (pp. 151-158). ACM.
[Valiant'79] Valiant, L. G. (1979). The complexity of computing the permanent. Theoretical Computer Science, 8(2), 189-201.
[Provan'83] Provan, J. S., & Ball, M. O. (1983). The complexity of counting cuts and of computing the probability that a graph is connected. SIAM Journal on Computing, 12(4), 777-788.
[Stanley'97] Stanley, R. P. (1997). Enumerative Combinatorics. Cambridge University Press.
[Davis'60] Davis, M., & Putnam, H. (1960). A computing procedure for quantification theory. Journal of the ACM, 7(3), 201-215.
[Davis'62] Davis, M., Logemann, G., & Loveland, D. (1962). A machine program for theorem-proving. Communications of the ACM, 5(7), 394-397.
[Huang'05] Huang, J., & Darwiche, A. (2005). DPLL with a trace: From SAT to knowledge compilation. In IJCAI (pp. 156-162).
[Huang'07] Huang, J., & Darwiche, A. (2007). The Language of Search. Journal of Artificial Intelligence Research, 29, 191-219.
[Beame'14] Beame, P., Li, J., Roy, S., & Suciu, D. (2014). Counting of Query Expressions: Limitations of Propositional Methods. In ICDT (pp. 177-188).
[Poole'03] Poole, D. (2003). First-order probabilistic inference. In IJCAI (pp. 985-991).
[Braz'05] Braz, R., Amir, E., & Roth, D. (2005). Lifted first-order probabilistic inference. In IJCAI (pp. 1319-1325).
[Milch'08] Milch, B., Zettlemoyer, L. S., Kersting, K., Haimes, M., & Kaelbling, L. P. (2008). Lifted Probabilistic Inference with Counting Formulas. In AAAI (pp. 1062-1068).
[Taghipour'13] Taghipour, N., Fierens, D., Davis, J., & Blockeel, H. (2013). Lifted variable elimination: Decoupling the operators from the constraint language. JAIR.
[V.d.Broeck'12a] Van den Broeck, G., & Davis, J. (2012). Conditioning in First-Order Knowledge Compilation and Lifted Probabilistic Inference. In AAAI.
[Jaimovich'07] Jaimovich, A., Meshi, O., & Friedman, N. (2007). Template based inference in symmetric relational Markov random fields. In UAI.
[Singla'08] Singla, P., & Domingos, P. (2008). Lifted First-Order Belief Propagation. In AAAI (pp. 1094-1099).
[Kersting'09] Kersting, K., Ahmadi, B., & Natarajan, S. (2009). Counting belief propagation. In UAI (pp. 277-284).
[Sen'08] Sen, P., Deshpande, A., & Getoor, L. (2008). Exploiting shared correlations in probabilistic databases. Proceedings of the VLDB Endowment, 1(1), 809-820.
[Sen'09] Sen, P., Deshpande, A., & Getoor, L. (2009). Bisimulation-based approximate lifted inference. In UAI (pp. 496-505).
[Gogate'12] Gogate, V., Jha, A. K., & Venugopal, D. (2012). Advances in Lifted Importance Sampling. In AAAI.
[V.d.Broeck'12b] Van den Broeck, G., Choi, A., & Darwiche, A. (2012). Lifted relax, compensate and then recover: From approximate to exact lifted probabilistic inference. In UAI.
[Niepert'12] Niepert, M. (2012). Markov chains on orbits of permutation groups. In UAI.
[Niepert'13] Niepert, M. (2013). Symmetry-Aware Marginal Density Estimation. In AAAI.
[Venugopal'12] Venugopal, D., & Gogate, V. (2012). On lifting the Gibbs sampling algorithm. In NIPS (pp. 1655-1663).
[Choi'12] Choi, J., & Amir, E. (2012). Lifted relational variational inference. In UAI.
[Bui'12] Bui, H. H., Huynh, T. N., & Riedel, S. (2012). Automorphism groups of graphical models and lifted variational inference. In StarAI workshop.
[Mladenov'14] Mladenov, M., Kersting, K., & Globerson, A. (2014). Efficient Lifting of MAP LP Relaxations Using k-Locality. In AISTATS (pp. 623-632).
[Apsel'14] Apsel, U., Kersting, K., & Mladenov, M. (2014). Lifting Relational MAP-LPs using Cluster Signatures. In AAAI.
[Ahmadi'11] Ahmadi, B., Kersting, K., & Sanner, S. (2011). Multi-evidence lifted message passing, with application to PageRank and the Kalman filter. In IJCAI (p. 1152).
[Choi'11] Choi, J., Guzman-Rivera, A., & Amir, E. (2011). Lifted Relational Kalman Filtering. In IJCAI (pp. 2092-2099).
[Mladenov'12] Mladenov, M., Ahmadi, B., & Kersting, K. (2012). Lifted linear programming. In AISTATS (pp. 788-797).
[Kimmig'14] Kimmig, A., Mihalkova, L., & Getoor, L. (2014). Lifted graphical models: a survey. Machine Learning, 1-45.
[V.d.Broeck'13b] Van den Broeck, G., & Darwiche, A. (2013). On the complexity and approximation of binary evidence in lifted inference. In NIPS (pp. 2868-2876).