A Brief Introduction to Markov Logic Network



Zihao Ye

SJTU

August 3, 2016

Zihao Ye (SJTU) A Brief Introduction to Markov Logic Network August 3, 2016 1 / 28

Overview

Background

A Markov Logic Network (MLN) is a probabilistic logic that applies the ideas of a Markov network to first-order logic, enabling uncertain inference. Markov logic networks generalize first-order logic: in a certain limit, all unsatisfiable statements have probability zero and all tautologies have probability one.

History

Work in this area began in 2003 with Pedro Domingos and Matt Richardson, who began to use the term MLN to describe it.


First-order knowledge base(KB)

Definition

A set of sentences or formulas in first-order logic.

like(A,B)

dislike(A,C)

∀x, smoking(x) =⇒ cancer(x)

∀x∀y, boy(x) ∧ boy(y) ∧ friends(x, y) =⇒ friends(y, x)

· · · · · ·


Formulas

Formula symbols

constants. represent objects in the domain of interest.

Alice,Bob

variables. range over the objects in the domain.

∀people,Smoke?(people)

functions. represent mappings from tuples of objects to objects.

MotherOf(A)

predicates. represent relations among objects in the domain.

Smoke?(X)


Formulas

Combination

If F1 and F2 are formulas:

negation: ¬F1

conjunction: F1 ∧ F2

disjunction: F1 ∨ F2

implication: F1 =⇒ F2

equivalence: F1 ⇐⇒ F2

universal quantification: ∀x, F1

existential quantification: ∃x, F1


First-order knowledge base(KB)

Some more definitions:

term: A constant, a variable, or a function applied to a tuple of terms.

Anna, x, gcd(x, y)

atom: A predicate symbol applied to a tuple of terms.

Friends(x, MotherOf(Anna))


First-order knowledge base(KB)

Some more definitions:

ground term: A term containing no variables; likewise, we can define a ground atom.

StudentOf(John)

possible world: An assignment of a truth value to each possible ground atom.

Boy(Bicheng) = 0,Girl(Papi) = 1


First-order knowledge base(KB)

Satisfiable

A formula is satisfiable iff there exists at least one world in which it is true.

Inference

The formulas in a KB are implicitly conjoined, so a KB can be viewed as a single large formula.

The basic inference problem in first-order logic is to determine whether a knowledge base KB entails a formula F, which is typically done by refutation:

check whether KB ∧ ¬F is unsatisfiable.


Markov Network

Definition

It is composed of an undirected graph G (one node per variable) and a potential function φ_k for each clique in the graph.


Markov Network

Probability Distribution Function

Joint distribution of X = (X_1, X_2, · · · , X_n) ∈ 𝒳:

P(X = x) = (1/Z) ∏_k φ_k(x_{k})

where x_{k} is the state of the kth clique. Z (the partition function) is used for normalization: Z = Σ_{x ∈ 𝒳} ∏_k φ_k(x_{k}).

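As a concrete illustration, here is a minimal sketch (not from the slides; the potential values are hypothetical) of this joint distribution for a two-node binary Markov network with a single clique:

```python
import itertools

# Minimal sketch: a Markov network over two binary variables X1, X2 forming
# a single clique, with a hypothetical potential that favors agreement.
def phi(x1, x2):
    return 3.0 if x1 == x2 else 1.0

# Partition function Z: sum of the clique potential over all joint states.
Z = sum(phi(x1, x2) for x1, x2 in itertools.product([0, 1], repeat=2))

def P(x1, x2):
    return phi(x1, x2) / Z

# The probabilities sum to one, as required.
total = sum(P(x1, x2) for x1, x2 in itertools.product([0, 1], repeat=2))
print(Z, P(0, 0))  # Z = 8.0, P(0,0) = 3/8 = 0.375
```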

Markov Network

Definition

Markov blanket: In a Markov network, the Markov blanket of a node A is its set of neighboring nodes. A Markov blanket may also be denoted by MB(A) or ∂A.

Markov Property

A probability distribution has the Markov property when, for any other variables B:

Pr(A | ∂A,B) = Pr(A | ∂A)


Markov Network

log-linear models

Each clique potential is replaced by an exponentiated weighted sum of features of the state:

P(X = x) = (1/Z) exp( Σ_j w_j f_j(x) )

We will focus on binary features, f_j(x) ∈ {0, 1}.

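The equivalence with clique potentials (φ = e^{w f}) can be sketched as follows; the weight and feature here are illustrative, chosen to match the potential-based example:

```python
import itertools, math

# Illustrative log-linear model: one binary feature that fires when the two
# variables agree, with weight w1 = log 3, equivalent to a potential of 3.
w = [math.log(3.0)]
features = [lambda x: 1 if x[0] == x[1] else 0]

def unnorm(x):
    return math.exp(sum(wj * fj(x) for wj, fj in zip(w, features)))

states = list(itertools.product([0, 1], repeat=2))
Z = sum(unnorm(x) for x in states)
P = {x: unnorm(x) / Z for x in states}
print(P[(0, 0)])  # 0.375, matching the potential-based form
```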

Markov Logic Network

Definition

A Markov logic network L is a set of pairs (F_i, w_i), where F_i is a formula in first-order logic and w_i is a real number, together with a set of constants C = {c_1, c_2, · · · , c_|C|}.

From a Markov logic network we can construct a Markov network M_{L,C}.


Clausal Form

Conjunctive Normal Form (CNF)

CNF : CNF ∧ (clause) | (clause)

clause : literal ∨ clause | literal

literal : atom | ¬atom

It has been proved that every first-order formula can be converted to CNF (clausal form).

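For the propositional fragment, the conversion can be sketched directly (a minimal illustration; full first-order clausal form additionally requires Skolemization, which is omitted here):

```python
# Minimal propositional CNF converter. Formulas are nested tuples:
# ('var', name), ('not', f), ('and', f, g), ('or', f, g), ('imp', f, g).
def to_cnf(f):
    return distribute(push_neg(elim_imp(f)))

def elim_imp(f):
    if f[0] == 'imp':                      # F1 => F2  ==  ~F1 v F2
        return ('or', ('not', elim_imp(f[1])), elim_imp(f[2]))
    if f[0] in ('and', 'or'):
        return (f[0], elim_imp(f[1]), elim_imp(f[2]))
    if f[0] == 'not':
        return ('not', elim_imp(f[1]))
    return f

def push_neg(f):                           # negation normal form
    if f[0] == 'not':
        g = f[1]
        if g[0] == 'not':
            return push_neg(g[1])
        if g[0] == 'and':
            return ('or', push_neg(('not', g[1])), push_neg(('not', g[2])))
        if g[0] == 'or':
            return ('and', push_neg(('not', g[1])), push_neg(('not', g[2])))
        return f
    if f[0] in ('and', 'or'):
        return (f[0], push_neg(f[1]), push_neg(f[2]))
    return f

def distribute(f):                         # (p^q) v r == (p v r) ^ (q v r)
    if f[0] == 'and':
        return ('and', distribute(f[1]), distribute(f[2]))
    if f[0] == 'or':
        a, b = distribute(f[1]), distribute(f[2])
        if a[0] == 'and':
            return ('and', distribute(('or', a[1], b)), distribute(('or', a[2], b)))
        if b[0] == 'and':
            return ('and', distribute(('or', a, b[1])), distribute(('or', a, b[2])))
        return ('or', a, b)
    return f

# Smokes => Cancer, treated propositionally for one grounding:
print(to_cnf(('imp', ('var', 'Smokes'), ('var', 'Cancer'))))
# ('or', ('not', ('var', 'Smokes')), ('var', 'Cancer'))
```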

Markov Logic Network

Example

Assumptions

Assumption 1: Unique names.

Assumption 2: Domain closure: f(C) ⊆ C.

Assumption 3: Known functions.

These assumptions ensure that the ground Markov network is finite.


Markov Logic Network

Construction

∀x∀y,Friends(x, y) ⇒ (Smokes(x) ⇐⇒ Smokes(y))

∀x,Smokes(x) ⇒ Cancer(x)

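Grounding these two formulas over a finite set of constants can be sketched as follows (an illustrative enumeration, assuming constants {A, B}; the predicate names mirror the slides):

```python
import itertools

# Sketch of grounding the two formulas over constants C = {"A", "B"}.
# Each formula yields one ground feature per assignment of constants to its
# variables; under the three assumptions this enumeration is finite.
C = ["A", "B"]

def ground_formulas():
    features = []
    # forall x, y: Friends(x, y) => (Smokes(x) <=> Smokes(y))
    for x, y in itertools.product(C, repeat=2):
        features.append(("F1", (("Friends", x, y), ("Smokes", x), ("Smokes", y))))
    # forall x: Smokes(x) => Cancer(x)
    for x in C:
        features.append(("F2", (("Smokes", x), ("Cancer", x))))
    return features

grounds = ground_formulas()
print(len(grounds))  # 4 groundings of F1 + 2 of F2 = 6
```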

Markov Logic Network

Probability Distribution Function

By combining the groundings corresponding to the same formula, we get:

P(X = x) = (1/Z) exp( Σ_i w_i n_i(x) ) = (1/Z) ∏_i φ_i(x_{i})^{n_i(x)}

where n_i(x) is the number of true groundings of F_i in x, x_{i} is the state (truth values) of the atoms appearing in F_i, and φ_i(x_{i}) = e^{w_i}.


Markov Logic Network

Examples

For a single constant A and the formula Smokes(A) ⇒ Cancer(A) with weight w, a state is (Smokes(A), Cancer(A)):

P(0, 0) = (1/Z) e^{1·w}

P(0, 1) = (1/Z) e^{1·w}

P(1, 0) = (1/Z) e^{0·w}

P(1, 1) = (1/Z) e^{1·w}

Z = 3e^w + 1,  P(Cancer(A) = 1 | Smokes(A) = 1) = e^w / (e^w + 1)

As w → +∞, P → 1.

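These numbers can be checked directly; the sketch below recomputes the conditional for an arbitrary weight (function names are illustrative):

```python
import itertools, math

# Check of the single-constant example: one formula Smokes(A) => Cancer(A)
# with weight w; a state is (Smokes(A), Cancer(A)).
def n_true(s, c):
    return 0 if (s == 1 and c == 0) else 1  # implication fails only at (1, 0)

def conditional(w):
    weight = {x: math.exp(w * n_true(*x))
              for x in itertools.product([0, 1], repeat=2)}
    Z = sum(weight.values())                 # Z = 3 e^w + 1
    return weight[(1, 1)] / (weight[(1, 0)] + weight[(1, 1)])

w = 2.0
assert abs(conditional(w) - math.exp(w) / (math.exp(w) + 1)) < 1e-12
print(conditional(10.0))  # approaches 1 as w grows
```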

Markov Logic Network

Inference

What is the probability that formula F1 holds given that formula F2 does?

P(F1 | F2, L, C)    (1)

Logical Inference

Does KB entail F?

P(F | L_KB, C_{KB,F}) ?= 1

L_KB is the MLN obtained by assigning weight +∞ to all the formulas in KB. This is a special case of (1): just let F2 = True.


Markov Logic Network

Inference

P(F1 | F2, L, C) = P(F1 | F2, M_{L,C})

= P(F1 ∧ F2 | M_{L,C}) / P(F2 | M_{L,C})

= Σ_{x ∈ 𝒳_{F1} ∩ 𝒳_{F2}} P(X = x | M_{L,C}) / Σ_{x ∈ 𝒳_{F2}} P(X = x | M_{L,C})

Algorithm: MCMC. Reject all moves to states where F2 does not hold, and count the number of samples in which F1 holds.

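A rejection-based chain of this kind can be sketched on the single-constant Smokes/Cancer model (an illustrative Metropolis variant, not the slides' implementation):

```python
import math, random

# Estimate P(Cancer(A)=1 | Smokes(A)=1) on the one-constant model by MCMC,
# rejecting every proposed move to a state where the evidence F2
# (Smokes(A)=1) does not hold.
random.seed(0)
w = 2.0

def unnorm(s, c):                    # e^{w n(x)}; implication false only at (1,0)
    return math.exp(w * (0 if (s == 1 and c == 0) else 1))

state = (1, 1)                       # start in a state satisfying the evidence
count_f1 = 0
N = 200000
for _ in range(N):
    # propose flipping one ground atom uniformly at random
    i = random.randrange(2)
    prop = (1 - state[0], state[1]) if i == 0 else (state[0], 1 - state[1])
    if prop[0] != 1:
        pass                         # reject: evidence Smokes(A)=1 violated
    elif random.random() < min(1.0, unnorm(*prop) / unnorm(*state)):
        state = prop                 # Metropolis accept
    count_f1 += state[1]             # count samples where Cancer(A)=1 holds

print(count_f1 / N)                  # ≈ e^w / (e^w + 1) ≈ 0.88
```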

Markov Logic Network

Optimization

MCMC is likely to be too slow for arbitrary formulas. A more efficient algorithm can be used in the most frequent case, where F1 and F2 are conjunctions of ground literals:

l_1 ∧ l_2 ∧ · · · ∧ l_n

Algorithm: Gibbs sampling over Markov blankets (MB).

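A Gibbs step only needs the ground formulas containing the resampled atom; a sketch on the same tiny model (names are illustrative):

```python
import math, random

# Gibbs sampling sketch for the single-constant model: the non-evidence atom
# Cancer(A) is resampled from P(Cancer | Markov blanket), which involves only
# the one ground formula Smokes(A) => Cancer(A).
random.seed(1)
w = 2.0

def n_true(s, c):
    return 0 if (s == 1 and c == 0) else 1

def resample_cancer(smokes):
    # local conditional from the unnormalized weights of the two settings
    p1 = math.exp(w * n_true(smokes, 1))
    p0 = math.exp(w * n_true(smokes, 0))
    return 1 if random.random() < p1 / (p0 + p1) else 0

smokes = 1                           # evidence: Smokes(A) = 1
hits, N = 0, 100000
for _ in range(N):
    hits += resample_cancer(smokes)
print(hits / N)                      # ≈ e^w / (e^w + 1) ≈ 0.88
```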

Markov Logic Network

Training

We learn MLN weights from one or more relational databases.

Database

A database is effectively a vector x = (x_1, · · · , x_l, · · · , x_n) where x_l is the truth value of the lth ground atom.

Closed world assumption

If a ground atom is not in the database, it’s assumed to be false.


Markov Logic Network

Training

Objective: find the weights w that maximize P_w(X = x).

Method: nonlinear optimization.

Gradient:

∂/∂w_i log P_w(X = x) = n_i(x) − Σ_{x′} P_w(X = x′) n_i(x′)

Counting the n_i(x) is intractable (#P-complete in the length of the clause). Computing the expected number of true groundings is also intractable.

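On a model small enough to enumerate, the gradient formula can be verified against a finite difference of the log-likelihood (a brute-force sketch on the one-formula model; function names are illustrative):

```python
import math, itertools

# Brute-force check of the gradient on the one-formula model: for an observed
# world x = (Smokes=1, Cancer=1), d/dw log P_w(x) = n(x) - E_w[n].
def n_true(s, c):
    return 0 if (s == 1 and c == 0) else 1

STATES = list(itertools.product([0, 1], repeat=2))

def grad(w, x):
    Z = sum(math.exp(w * n_true(*s)) for s in STATES)
    expected_n = sum(n_true(*s) * math.exp(w * n_true(*s)) / Z for s in STATES)
    return n_true(*x) - expected_n

def logp(w, x):
    Z = sum(math.exp(w * n_true(*s)) for s in STATES)
    return w * n_true(*x) - math.log(Z)

w = 1.0
g = grad(w, (1, 1))
# finite-difference check of the same derivative
eps = 1e-6
fd = (logp(w + eps, (1, 1)) - logp(w - eps, (1, 1))) / (2 * eps)
assert abs(g - fd) < 1e-6
print(g)
```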

Training

Alternative: the pseudo-likelihood method:

P*_w(X = x) = ∏_{l=1}^{n} P_w(X_l = x_l | ∂_x X_l)

Gradient:

∂/∂w_i log P*_w(X = x) = Σ_{l=1}^{n} [ n_i(x) − P_w(X_l = 0 | ∂_x X_l) n_i(x_[X_l=0]) − P_w(X_l = 1 | ∂_x X_l) n_i(x_[X_l=1]) ]

This does not require inference over the model.

Method: limited-memory BFGS algorithm.

Tricks

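Each pseudo-likelihood factor conditions one atom on the rest of the world, so no global partition function appears; a sketch on the one-formula model (function names are illustrative):

```python
import math

# Pseudo-likelihood of the world x = (Smokes=1, Cancer=1) in the model with
# the single formula Smokes(A) => Cancer(A) and weight w.
def n_true(s, c):
    return 0 if (s == 1 and c == 0) else 1

def cond(w, idx, x):
    # P_w(X_idx = x[idx] | the other atom), via the two local unnormalized
    # weights; only the formula containing X_idx matters.
    def weight(v):
        y = (v, x[1]) if idx == 0 else (x[0], v)
        return math.exp(w * n_true(*y))
    return weight(x[idx]) / (weight(0) + weight(1))

def pseudo_likelihood(w, x):
    return cond(w, 0, x) * cond(w, 1, x)

w = 2.0
print(pseudo_likelihood(w, (1, 1)))  # 0.5 * e^w / (e^w + 1)
```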

Markov Logic Network

Applications

Collective Classification

Attri(x1, v), R(x1, x2)⇒ Attrj(x2, v)

Link Prediction

Attri(x1, v), Attrj(x2, v)⇒ R(x1, x2)

Link-Based Clustering

Social Network Modeling

Object Identification: Equal(x1, x2)


Application in DSTC-4

First-order logic:

1. word in transcript(seg, $t, $s, $w) ∧ value in transcript(seg, $t, $s, v) ⇒ state(seg, $t, $s, v)

2. ......


Experiments

ProbCog

A toolbox for statistical relational learning and reasoning. The following representation formalisms for probabilistic knowledge are supported:

Bayesian Logic Networks (BLNs)

Markov Logic Networks (MLNs) & Adaptive Markov Logic Networks (AMLNs)

Bayesian Networks (BNs)

https://github.com/opcode81/ProbCog


End

Thanks for listening.
