A Brief Introduction to Markov Logic Network



Zihao Ye

SJTU

August 3, 2016

Zihao Ye (SJTU) A Brief Introduction to Markov Logic Network August 3, 2016 1 / 28

Overview

Background

A Markov Logic Network (MLN) is a probabilistic logic that applies the ideas of a Markov network to first-order logic, enabling uncertain inference. Markov logic networks generalize first-order logic: in a certain limit, all unsatisfiable statements have probability zero and all tautologies have probability one.

History

Work in this area began in 2003 with Pedro Domingos and Matt Richardson, who began to use the term MLN to describe it.


First-order knowledge base(KB)

Definition

A set of sentences or formulas in first-order logic.

like(A,B)

dislike(A,C)

∀x, smoking(x) =⇒ cancer(x)

∀x∀y, boy(x) ∧ boy(y) ∧ friends(x, y) =⇒ friends(y, x)

· · · · · ·


Formulas

Formula symbols

constants. represent objects in the domain of interest.

Alice,Bob

variables. range over the objects in the domain.

∀people,Smoke?(people)

functions. represent mappings from tuples of objects to objects.

MotherOf(A)

predicates. represent relations among objects in the domain.

Smoke?(X)


Formulas

Combination

If F1 and F2 are formulas:

negation: ¬F1

conjunction: F1 ∧ F2

disjunction: F1 ∨ F2

implication: F1 =⇒ F2

equivalence: F1 ⇐⇒ F2

universal quantification: ∀x, F1

existential quantification: ∃x, F1


First-order knowledge base(KB)

Some more definitions:

term: A constant, a variable, or a function applied to a tuple of terms.

Anna, x, gcd(x, y)

atom: A predicate symbol applied to a tuple of terms.

Friends(x, MotherOf(Anna))


First-order knowledge base(KB)

Some more definitions:

ground term: A term containing no variables; likewise, we can define a ground atom.

StudentOf(John)

possible world: An assignment of a truth value to each possible ground atom.

Boy(Bicheng) = 0,Girl(Papi) = 1


First-order knowledge base(KB)

Satisfiable

A formula is satisfiable iff there exists at least one world in which it is true.

Inference

The formulas in a KB are implicitly conjoined, so a KB can be viewed as a single large formula.

The basic inference problem in first-order logic is to determine whether a knowledge base KB entails a formula F, which is typically done by refutation:

check whether KB ∧ ¬F is unsatisfiable.


Markov Network

Definition

It is composed of an undirected graph G (one node per variable) and a potential function φ_k for each clique in the graph.


Markov Network

Probability Distribution Function

Joint distribution of X = (X_1, X_2, · · · , X_n) ∈ 𝒳:

P(X = x) = (1/Z) ∏_k φ_k(x_{k})

where x_{k} is the state of the kth clique. Z (the partition function) is used for normalization: Z = Σ_{x ∈ 𝒳} ∏_k φ_k(x_{k}).

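As a concrete illustration, here is a minimal sketch (not from the slides; the potential values are hypothetical) of this joint distribution for a two-node binary Markov network with a single clique:

```python
import itertools

# Minimal sketch: a Markov network over two binary variables X1, X2 forming
# a single clique, with a hypothetical potential that favors agreement.
def phi(x1, x2):
    return 3.0 if x1 == x2 else 1.0

# Partition function Z: sum of the clique potential over all joint states.
Z = sum(phi(x1, x2) for x1, x2 in itertools.product([0, 1], repeat=2))

def P(x1, x2):
    return phi(x1, x2) / Z

# The probabilities sum to one, as required.
total = sum(P(x1, x2) for x1, x2 in itertools.product([0, 1], repeat=2))
print(Z, P(0, 0))  # Z = 8.0, P(0,0) = 3/8 = 0.375
```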

Markov Network

Definition

Markov blanket: In a Markov network, the Markov blanket of a node A is its set of neighboring nodes. A Markov blanket may also be denoted by MB(A) or ∂A.

Markov Property

A probability distribution has the Markov property when, for any other variables B:

Pr(A | ∂A,B) = Pr(A | ∂A)


Markov Network

log-linear models

Each clique potential is replaced by an exponentiated weighted sum of features of the state:

P(X = x) = (1/Z) exp( Σ_j w_j f_j(x) )

We will focus on binary features, f_j(x) ∈ {0, 1}.

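The equivalence with clique potentials (φ = e^{w f}) can be sketched as follows; the weight and feature here are illustrative, chosen to match the potential-based example:

```python
import itertools, math

# Illustrative log-linear model: one binary feature that fires when the two
# variables agree, with weight w1 = log 3, equivalent to a potential of 3.
w = [math.log(3.0)]
features = [lambda x: 1 if x[0] == x[1] else 0]

def unnorm(x):
    return math.exp(sum(wj * fj(x) for wj, fj in zip(w, features)))

states = list(itertools.product([0, 1], repeat=2))
Z = sum(unnorm(x) for x in states)
P = {x: unnorm(x) / Z for x in states}
print(P[(0, 0)])  # 0.375, matching the potential-based form
```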

Markov Logic Network

Definition

A Markov logic network L is a set of pairs (F_i, w_i), where F_i is a formula in first-order logic and w_i is a real number, together with a set of constants C = {c_1, c_2, · · · , c_|C|}.

From a Markov logic network we can construct a Markov network M_{L,C}.


Clausal Form

Conjunctive Normal Form (CNF)

CNF : CNF ∧ (clause) | (clause)

clause : literal ∨ clause | literal

literal : atom | ¬atom

It has been proved that every first-order formula can be converted to CNF (clausal form).

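For the propositional fragment, the conversion can be sketched directly (a minimal illustration; full first-order clausal form additionally requires Skolemization, which is omitted here):

```python
# Minimal propositional CNF converter. Formulas are nested tuples:
# ('var', name), ('not', f), ('and', f, g), ('or', f, g), ('imp', f, g).
def to_cnf(f):
    return distribute(push_neg(elim_imp(f)))

def elim_imp(f):
    if f[0] == 'imp':                      # F1 => F2  ==  ~F1 v F2
        return ('or', ('not', elim_imp(f[1])), elim_imp(f[2]))
    if f[0] in ('and', 'or'):
        return (f[0], elim_imp(f[1]), elim_imp(f[2]))
    if f[0] == 'not':
        return ('not', elim_imp(f[1]))
    return f

def push_neg(f):                           # negation normal form
    if f[0] == 'not':
        g = f[1]
        if g[0] == 'not':
            return push_neg(g[1])
        if g[0] == 'and':
            return ('or', push_neg(('not', g[1])), push_neg(('not', g[2])))
        if g[0] == 'or':
            return ('and', push_neg(('not', g[1])), push_neg(('not', g[2])))
        return f
    if f[0] in ('and', 'or'):
        return (f[0], push_neg(f[1]), push_neg(f[2]))
    return f

def distribute(f):                         # (p^q) v r == (p v r) ^ (q v r)
    if f[0] == 'and':
        return ('and', distribute(f[1]), distribute(f[2]))
    if f[0] == 'or':
        a, b = distribute(f[1]), distribute(f[2])
        if a[0] == 'and':
            return ('and', distribute(('or', a[1], b)), distribute(('or', a[2], b)))
        if b[0] == 'and':
            return ('and', distribute(('or', a, b[1])), distribute(('or', a, b[2])))
        return ('or', a, b)
    return f

# Smokes => Cancer, treated propositionally for one grounding:
print(to_cnf(('imp', ('var', 'Smokes'), ('var', 'Cancer'))))
# ('or', ('not', ('var', 'Smokes')), ('var', 'Cancer'))
```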

Markov Logic Network

Example

Assumptions

Assumption 1: Unique names.

Assumption 2: Domain closure: f(C) ⊆ C.

Assumption 3: Known functions.

These assumptions ensure that the ground Markov network is finite.


Markov Logic Network

Construction

∀x∀y,Friends(x, y) ⇒ (Smokes(x) ⇐⇒ Smokes(y))

∀x,Smokes(x) ⇒ Cancer(x)

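Grounding these two formulas over a finite set of constants can be sketched as follows (an illustrative enumeration, assuming constants {A, B}; the predicate names mirror the slides):

```python
import itertools

# Sketch of grounding the two formulas over constants C = {"A", "B"}.
# Each formula yields one ground feature per assignment of constants to its
# variables; under the three assumptions this enumeration is finite.
C = ["A", "B"]

def ground_formulas():
    features = []
    # forall x, y: Friends(x, y) => (Smokes(x) <=> Smokes(y))
    for x, y in itertools.product(C, repeat=2):
        features.append(("F1", (("Friends", x, y), ("Smokes", x), ("Smokes", y))))
    # forall x: Smokes(x) => Cancer(x)
    for x in C:
        features.append(("F2", (("Smokes", x), ("Cancer", x))))
    return features

grounds = ground_formulas()
print(len(grounds))  # 4 groundings of F1 + 2 of F2 = 6
```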

Markov Logic Network

Probability Distribution Function

By combining the groundings corresponding to the same formula, we get:

P(X = x) = (1/Z) exp( Σ_i w_i n_i(x) ) = (1/Z) ∏_i φ_i(x_{i})^{n_i(x)}

where n_i(x) is the number of true groundings of F_i in x, x_{i} is the state (truth values) of the atoms appearing in F_i, and φ_i(x_{i}) = e^{w_i}.


Markov Logic Network

Examples

For a single constant A and the formula Smokes(A) ⇒ Cancer(A) with weight w, a state is (Smokes(A), Cancer(A)):

P(0, 0) = (1/Z) e^{1·w}

P(0, 1) = (1/Z) e^{1·w}

P(1, 0) = (1/Z) e^{0·w}

P(1, 1) = (1/Z) e^{1·w}

Z = 3e^w + 1,  P(Cancer(A) = 1 | Smokes(A) = 1) = e^w / (e^w + 1)

As w → +∞, P → 1.

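These numbers can be checked directly; the sketch below recomputes the conditional for an arbitrary weight (function names are illustrative):

```python
import itertools, math

# Check of the single-constant example: one formula Smokes(A) => Cancer(A)
# with weight w; a state is (Smokes(A), Cancer(A)).
def n_true(s, c):
    return 0 if (s == 1 and c == 0) else 1  # implication fails only at (1, 0)

def conditional(w):
    weight = {x: math.exp(w * n_true(*x))
              for x in itertools.product([0, 1], repeat=2)}
    Z = sum(weight.values())                 # Z = 3 e^w + 1
    return weight[(1, 1)] / (weight[(1, 0)] + weight[(1, 1)])

w = 2.0
assert abs(conditional(w) - math.exp(w) / (math.exp(w) + 1)) < 1e-12
print(conditional(10.0))  # approaches 1 as w grows
```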

Markov Logic Network

Inference

What is the probability that formula F1 holds given that formula F2 does?

P(F1 | F2, L, C)    (1)

Logical Inference

Does KB entail F?

P(F | L_KB, C_{KB,F}) ?= 1

L_KB is the MLN obtained by assigning weight +∞ to all the formulas in KB. This is a special case of (1): just let F2 = True.


Markov Logic Network

Inference

P(F1 | F2, L, C) = P(F1 | F2, M_{L,C})

= P(F1 ∧ F2 | M_{L,C}) / P(F2 | M_{L,C})

= Σ_{x ∈ 𝒳_{F1} ∩ 𝒳_{F2}} P(X = x | M_{L,C}) / Σ_{x ∈ 𝒳_{F2}} P(X = x | M_{L,C})

Algorithm: MCMC. Reject all moves to states where F2 does not hold, and count the number of samples in which F1 holds.

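A rejection-based chain of this kind can be sketched on the single-constant Smokes/Cancer model (an illustrative Metropolis variant, not the slides' implementation):

```python
import math, random

# Estimate P(Cancer(A)=1 | Smokes(A)=1) on the one-constant model by MCMC,
# rejecting every proposed move to a state where the evidence F2
# (Smokes(A)=1) does not hold.
random.seed(0)
w = 2.0

def unnorm(s, c):                    # e^{w n(x)}; implication false only at (1,0)
    return math.exp(w * (0 if (s == 1 and c == 0) else 1))

state = (1, 1)                       # start in a state satisfying the evidence
count_f1 = 0
N = 200000
for _ in range(N):
    # propose flipping one ground atom uniformly at random
    i = random.randrange(2)
    prop = (1 - state[0], state[1]) if i == 0 else (state[0], 1 - state[1])
    if prop[0] != 1:
        pass                         # reject: evidence Smokes(A)=1 violated
    elif random.random() < min(1.0, unnorm(*prop) / unnorm(*state)):
        state = prop                 # Metropolis accept
    count_f1 += state[1]             # count samples where Cancer(A)=1 holds

print(count_f1 / N)                  # ≈ e^w / (e^w + 1) ≈ 0.88
```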

Markov Logic Network

Optimization

MCMC is likely to be too slow for arbitrary formulas. A more efficient algorithm can be used in the most frequent case, where F1 and F2 are conjunctions of ground literals:

l_1 ∧ l_2 ∧ · · · ∧ l_n

Algorithm: Gibbs sampling over Markov blankets (MB).

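A Gibbs step only needs the ground formulas containing the resampled atom; a sketch on the same tiny model (names are illustrative):

```python
import math, random

# Gibbs sampling sketch for the single-constant model: the non-evidence atom
# Cancer(A) is resampled from P(Cancer | Markov blanket), which involves only
# the one ground formula Smokes(A) => Cancer(A).
random.seed(1)
w = 2.0

def n_true(s, c):
    return 0 if (s == 1 and c == 0) else 1

def resample_cancer(smokes):
    # local conditional from the unnormalized weights of the two settings
    p1 = math.exp(w * n_true(smokes, 1))
    p0 = math.exp(w * n_true(smokes, 0))
    return 1 if random.random() < p1 / (p0 + p1) else 0

smokes = 1                           # evidence: Smokes(A) = 1
hits, N = 0, 100000
for _ in range(N):
    hits += resample_cancer(smokes)
print(hits / N)                      # ≈ e^w / (e^w + 1) ≈ 0.88
```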

Markov Logic Network

Training

We learn MLN weights from one or more relational databases.

Database

A database is effectively a vector x = (x_1, · · · , x_l, · · · , x_n) where x_l is the truth value of the lth ground atom.

Closed world assumption

If a ground atom is not in the database, it’s assumed to be false.


Markov Logic Network

Training

Objective: find the weights w that maximize P_w(X = x).

Method: nonlinear optimization.

Gradient:

∂/∂w_i log P_w(X = x) = n_i(x) − Σ_{x′} P_w(X = x′) n_i(x′)

Counting the n_i(x) is intractable (#P-complete in the length of the clause). Computing the expected number of true groundings is also intractable.

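On a model small enough to enumerate, the gradient formula can be verified against a finite difference of the log-likelihood (a brute-force sketch on the one-formula model; function names are illustrative):

```python
import math, itertools

# Brute-force check of the gradient on the one-formula model: for an observed
# world x = (Smokes=1, Cancer=1), d/dw log P_w(x) = n(x) - E_w[n].
def n_true(s, c):
    return 0 if (s == 1 and c == 0) else 1

STATES = list(itertools.product([0, 1], repeat=2))

def grad(w, x):
    Z = sum(math.exp(w * n_true(*s)) for s in STATES)
    expected_n = sum(n_true(*s) * math.exp(w * n_true(*s)) / Z for s in STATES)
    return n_true(*x) - expected_n

def logp(w, x):
    Z = sum(math.exp(w * n_true(*s)) for s in STATES)
    return w * n_true(*x) - math.log(Z)

w = 1.0
g = grad(w, (1, 1))
# finite-difference check of the same derivative
eps = 1e-6
fd = (logp(w + eps, (1, 1)) - logp(w - eps, (1, 1))) / (2 * eps)
assert abs(g - fd) < 1e-6
print(g)
```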

Training

Alternative: the pseudo-likelihood method:

P*_w(X = x) = ∏_{l=1}^{n} P_w(X_l = x_l | ∂_x X_l)

Gradient:

∂/∂w_i log P*_w(X = x) = Σ_{l=1}^{n} [ n_i(x) − P_w(X_l = 0 | ∂_x X_l) n_i(x_[X_l=0]) − P_w(X_l = 1 | ∂_x X_l) n_i(x_[X_l=1]) ]

This does not require inference over the model.

Method: limited-memory BFGS algorithm.

Tricks

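Each pseudo-likelihood factor conditions one atom on the rest of the world, so no global partition function appears; a sketch on the one-formula model (function names are illustrative):

```python
import math

# Pseudo-likelihood of the world x = (Smokes=1, Cancer=1) in the model with
# the single formula Smokes(A) => Cancer(A) and weight w.
def n_true(s, c):
    return 0 if (s == 1 and c == 0) else 1

def cond(w, idx, x):
    # P_w(X_idx = x[idx] | the other atom), via the two local unnormalized
    # weights; only the formula containing X_idx matters.
    def weight(v):
        y = (v, x[1]) if idx == 0 else (x[0], v)
        return math.exp(w * n_true(*y))
    return weight(x[idx]) / (weight(0) + weight(1))

def pseudo_likelihood(w, x):
    return cond(w, 0, x) * cond(w, 1, x)

w = 2.0
print(pseudo_likelihood(w, (1, 1)))  # 0.5 * e^w / (e^w + 1)
```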

Markov Logic Network

Applications

Collective Classification

Attri(x1, v), R(x1, x2)⇒ Attrj(x2, v)

Link Prediction

Attri(x1, v), Attrj(x2, v)⇒ R(x1, x2)

Link-Based Clustering

Social Network Modeling

Object Identification: Equal(x1, x2)


Application in DSTC-4

First-order logic:

1. word in transcript(seg, $t, $s, $w) ∧ value in transcript(seg, $t, $s, v) ⇒ state(seg, $t, $s, v)

2. ......


Experiments

ProbCog

A toolbox for statistical relational learning and reasoning. The following representation formalisms for probabilistic knowledge are supported:

Bayesian Logic Networks (BLNs)

Markov Logic Networks (MLNs) & Adaptive Markov Logic Networks (AMLNs)

Bayesian Networks (BNs)

https://github.com/opcode81/ProbCog


End

Thanks for listening.
