
Learning the Structure of Markov Logic Networks

Page 1: Learning the Structure of  Markov Logic Networks

1

Learning the Structure of Markov Logic Networks

Stanley Kok & Pedro Domingos

Dept. of Computer Science and Eng.

University of Washington

Page 2: Learning the Structure of  Markov Logic Networks

2

Overview
  Motivation
  Background
  Structure Learning Algorithm
  Experiments
  Future Work & Conclusion

Page 3: Learning the Structure of  Markov Logic Networks

3

Motivation
Statistical Relational Learning (SRL) combines the benefits of:
  Statistical Learning: uses probability to handle uncertainty in a robust and principled way
  Relational Learning: models domains with multiple relations

Page 4: Learning the Structure of  Markov Logic Networks

4

Motivation
Many SRL approaches combine a logical language and Bayesian networks
  e.g. Probabilistic Relational Models [Friedman et al., 1999]
The need to avoid cycles in Bayesian networks causes many difficulties [Taskar et al., 2002]
Researchers therefore started using Markov networks instead


Page 6: Learning the Structure of  Markov Logic Networks

6

Motivation
Relational Markov Networks [Taskar et al., 2002]
  conjunctive database queries + Markov networks
  require space exponential in the size of the cliques
Markov Logic Networks [Richardson & Domingos, 2004]
  first-order logic + Markov networks
  compactly represent large cliques
  did not learn structure (used external ILP system)
This paper develops a fast algorithm that learns MLN structure
  the most powerful SRL learner to date

Page 7: Learning the Structure of  Markov Logic Networks

7

Overview
  Motivation
  Background
  Structure Learning Algorithm
  Experiments
  Future Work & Conclusion

Page 8: Learning the Structure of  Markov Logic Networks

8

Markov Logic Networks
First-order KB: a set of hard constraints
  If a world violates even one formula, it has zero probability
MLNs soften constraints
  It is OK to violate formulas: the fewer formulas a world violates, the more probable it is
  Each formula is given a weight that reflects how strong a constraint it is

Page 9: Learning the Structure of  Markov Logic Networks

9

MLN Definition
A Markov Logic Network (MLN) is a set of pairs (F, w) where
  F is a formula in first-order logic
  w is a real number
Together with a finite set of constants, it defines a Markov network with
  one node for each grounding of each predicate in the MLN
  one feature for each grounding of each formula F in the MLN, with the corresponding weight w

Page 10: Learning the Structure of  Markov Logic Networks

10

Ground Markov Network

Student(STAN)

Professor(PEDRO)

AdvisedBy(STAN,PEDRO)

Professor(STAN)

Student(PEDRO)

AdvisedBy(PEDRO,STAN)

AdvisedBy(STAN,STAN)

AdvisedBy(PEDRO,PEDRO)

2.7   AdvisedBy(S,P) ⇒ Student(S) ∧ Professor(P)

constants: STAN, PEDRO
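As a concrete illustration of the grounding step, here is a minimal Python sketch (not the authors' code) that enumerates the ground atoms for the two constants above; the dictionary-based predicate representation is a hypothetical choice:

```python
from itertools import product

# Each predicate, applied to every tuple of constants of the right arity,
# yields one node of the ground Markov network.
predicates = {"Student": 1, "Professor": 1, "AdvisedBy": 2}
constants = ["STAN", "PEDRO"]

ground_atoms = [
    f"{pred}({','.join(args)})"
    for pred, arity in predicates.items()
    for args in product(constants, repeat=arity)
]
print(ground_atoms)
# 2 + 2 + 4 = 8 ground atoms, matching the 8 nodes on the slide
```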


Page 15: Learning the Structure of  Markov Logic Networks

15

MLN Model
The model equation appeared as an image; its annotated components:
  vector of value assignments to ground predicates
  partition function, summing over all possible value assignments to ground predicates
  weight of the ith formula
  number of true groundings of the ith formula
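The equation itself was not preserved in the transcript; reconstructing it from the annotations, consistent with the MLN definition in Richardson & Domingos (2004):

```latex
P(X = x) \;=\; \frac{1}{Z} \exp\Big( \sum_i w_i \, n_i(x) \Big),
\qquad
Z \;=\; \sum_{x'} \exp\Big( \sum_i w_i \, n_i(x') \Big)
```

where x is the vector of value assignments to the ground predicates, w_i is the weight of the ith formula, n_i(x) is the number of true groundings of the ith formula in x, and Z is the partition function summing over all possible assignments.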


Page 18: Learning the Structure of  Markov Logic Networks

18

MLN Weight Learning
Likelihood is a concave function of the weights
Quasi-Newton methods find the optimal weights, e.g. L-BFGS [Liu & Nocedal, 1989]
SLOW: computing the likelihood and its gradient involves a #P-complete counting problem

Page 19: Learning the Structure of  Markov Logic Networks

19

MLN Weight Learning
R&D used pseudo-likelihood [Besag, 1975]
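The pseudo-likelihood formula on this slide was shown as an image; Besag's pseudo-likelihood, as used by R&D, conditions each ground predicate on the state of its Markov blanket in the data:

```latex
\log P^{*}_{w}(X = x) \;=\; \sum_{l=1}^{n} \log P_{w}\big(X_{l} = x_{l} \,\big|\, MB_{x}(X_{l})\big)
```

where n is the number of ground predicates and MB_x(X_l) is the state of X_l's Markov blanket. This sidesteps the #P-complete inference required by the true likelihood.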


Page 21: Learning the Structure of  Markov Logic Networks

21

MLN Structure Learning
R&D "learned" MLN structure in two disjoint steps:
  Learn first-order clauses with an off-the-shelf ILP system (CLAUDIEN [De Raedt & Dehaspe, 1997])
  Learn clause weights by optimizing pseudo-likelihood
Unlikely to give the best results, because CLAUDIEN
  finds clauses that hold with some accuracy/frequency in the data
  does not find clauses that maximize the data's (pseudo-)likelihood

Page 22: Learning the Structure of  Markov Logic Networks

22

Overview
  Motivation
  Background
  Structure Learning Algorithm
  Experiments
  Future Work & Conclusion

Page 23: Learning the Structure of  Markov Logic Networks

23

MLN Structure Learning
This paper develops an algorithm that:
  learns first-order clauses by directly optimizing pseudo-likelihood
  is fast enough to be practical
  performs better than R&D, pure ILP, purely KB, and purely probabilistic approaches

Page 24: Learning the Structure of  Markov Logic Networks

24

Structure Learning Algorithm

High-level algorithm:
REPEAT
  MLN ← MLN ∪ FindBestClauses(MLN)
UNTIL FindBestClauses(MLN) returns NULL

FindBestClauses(MLN)
  Create candidate clauses
  FOR EACH candidate clause c
    Compute increase in evaluation measure of adding c to MLN
  RETURN k clauses with greatest increase
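The high-level loop above can be sketched in Python as follows; `generate_candidates` and `evaluate_gain` are hypothetical stand-ins for the clause-construction operators and the WPLL gain computation described later:

```python
# Sketch of the high-level structure-learning loop (not the authors' code).
def find_best_clauses(mln, generate_candidates, evaluate_gain, k=1):
    """Return up to k candidate clauses that most improve the score."""
    scored = [(evaluate_gain(mln, c), c) for c in generate_candidates(mln)]
    best = [(g, c) for g, c in scored if g > 0]   # keep only improvements
    best.sort(key=lambda gc: gc[0], reverse=True)
    return [c for _, c in best[:k]]

def learn_structure(mln, generate_candidates, evaluate_gain, k=1):
    while True:
        new_clauses = find_best_clauses(mln, generate_candidates, evaluate_gain, k)
        if not new_clauses:            # UNTIL FindBestClauses(MLN) returns NULL
            return mln
        mln = mln + new_clauses        # MLN <- MLN U FindBestClauses(MLN)
```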

Page 25: Learning the Structure of  Markov Logic Networks

25

Structure Learning
  Evaluation measure
  Clause construction operators
  Search strategies
  Speedup techniques

Page 26: Learning the Structure of  Markov Logic Networks

26

Evaluation Measure
R&D used pseudo-log-likelihood
This gives undue weight to predicates with a large number of groundings


Page 30: Learning the Structure of  Markov Logic Networks

30

Evaluation Measure
Weighted pseudo-log-likelihood (WPLL)
  weight given to predicate r
  sums over groundings of predicate r
  CLL: conditional log-likelihood
Gaussian weight prior
Structure prior
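The WPLL formula on this slide was shown as an image; from the annotations, it takes the form:

```latex
\mathrm{WPLL}(w, x) \;=\; \sum_{r \in R} c_{r} \sum_{g \in G_{r}} \log P_{w}\big(X_{r,g} = x_{r,g} \,\big|\, MB_{x}(X_{r,g})\big)
```

where R is the set of predicates, G_r the set of groundings of predicate r, c_r the weight given to predicate r, and each inner term is the conditional log-likelihood (CLL) of a ground predicate given its Markov blanket. Choosing c_r inversely proportional to the number of groundings of r counters the undue weight that plain pseudo-log-likelihood gives to predicates with many groundings.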

Page 31: Learning the Structure of  Markov Logic Networks

31

Clause Construction Operators
  Add a literal (negative/positive)
  Remove a literal
  Flip signs of literals
  Limit # of distinct variables to restrict the search space
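The operators above can be illustrated with a small Python sketch; the tuple-of-signed-literals clause representation is a hypothetical choice, not the authors' data structure:

```python
# Illustrative clause operators. A clause is a tuple of (sign, literal) pairs,
# e.g. ((True, "Student(x)"), (False, "AdvisedBy(x,y)")).

def add_literal(clause, literal, sign):
    return clause + ((sign, literal),)

def remove_literal(clause, i):
    return clause[:i] + clause[i + 1:]

def flip_sign(clause, i):
    sign, lit = clause[i]
    return clause[:i] + ((not sign, lit),) + clause[i + 1:]

def distinct_vars(clause):
    # collect the variable names appearing inside each literal's parentheses;
    # limiting their number restricts the search space
    vars_ = set()
    for _, lit in clause:
        args = lit[lit.index("(") + 1 : lit.rindex(")")]
        vars_.update(a.strip() for a in args.split(","))
    return vars_
```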

Page 32: Learning the Structure of  Markov Logic Networks

32

Beam Search

Same as that used in ILP & rule induction Repeatedly find the single best clause

Page 33: Learning the Structure of  Markov Logic Networks

33

Shortest-First Search (SFS)

1. Start from an empty or hand-coded MLN
2. FOR L ← 1 TO MAX_LENGTH
3.   Apply each literal addition & deletion to each clause to create clauses of length L
4.   Repeatedly add the K best clauses of length L to the MLN until no clause of length L improves WPLL

Similar to Della Pietra et al. (1997), McCallum (2003)
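The four steps above can be sketched as follows; `expand_to_length` and `wpll_gain` are hypothetical stand-ins for the literal addition/deletion operators and the WPLL scoring:

```python
# Sketch of shortest-first search (SFS); not the authors' implementation.
def shortest_first_search(mln, expand_to_length, wpll_gain,
                          max_length=4, k=2):
    for length in range(1, max_length + 1):
        # create all candidate clauses of this length from the current MLN
        candidates = expand_to_length(mln, length)
        while True:
            improving = sorted(
                ((wpll_gain(mln, c), c) for c in candidates if c not in mln),
                key=lambda gc: gc[0], reverse=True)
            improving = [(g, c) for g, c in improving if g > 0]
            if not improving:
                break                 # no clause of this length improves WPLL
            mln = mln + [c for _, c in improving[:k]]
    return mln
```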


Page 37: Learning the Structure of  Markov Logic Networks

37

Speedup Techniques
FindBestClauses(MLN)
  Create candidate clauses   (SLOW: many candidates)
  FOR EACH candidate clause c
    Compute increase in WPLL (using L-BFGS) of adding c to MLN
      (SLOW: many CLLs; each CLL involves a #P-complete problem; L-BFGS itself is not that fast)
  RETURN k clauses with greatest increase

Page 38: Learning the Structure of  Markov Logic Networks

38

Speedup Techniques
  Clause Sampling
  Predicate Sampling
  Avoid Redundancy
  Loose Convergence Thresholds
  Ignore Unrelated Clauses
  Weight Thresholding


Page 44: Learning the Structure of  Markov Logic Networks

44

Overview
  Motivation
  Background
  Structure Learning Algorithm
  Experiments
  Future Work & Conclusion

Page 45: Learning the Structure of  Markov Logic Networks

45

Experiments
UW-CSE domain
  22 predicates, e.g. AdvisedBy(X,Y), Student(X), etc.
  10 types, e.g. Person, Course, Quarter, etc.
  # ground predicates ≈ 4 million
  # true ground predicates ≈ 3000
  Hand-crafted KB with 94 formulas, e.g.
    Each student has at most one advisor
    If a student is an author of a paper, so is her advisor
Cora domain
  Computer science research papers
  Collective deduplication of author, venue, title

Page 46: Learning the Structure of Markov Logic Networks

46

Systems
  MLN(SLB): structure learning with beam search
  MLN(SLS): structure learning with SFS
  KB: hand-coded KB
  CL: CLAUDIEN
  FO: FOIL
  AL: Aleph
  MLN(KB), MLN(CL), MLN(FO), MLN(AL): MLNs with the corresponding clause sets
  NB: Naïve Bayes
  BN: Bayesian networks

Page 50: Learning the Structure of  Markov Logic Networks

50

Methodology
UW-CSE domain
  DB divided into 5 areas: AI, Graphics, Languages, Systems, Theory
  Leave-one-out testing by area
  Measured
    average CLL of the ground predicates
    average area under the precision-recall curve of the ground predicates (AUC)

Page 51: Learning the Structure of  Markov Logic Networks

51

Results on UW-CSE (chart reconstructed from the slide's data labels):

System     AUC     CLL
MLN(SLS)   0.533   -0.061
MLN(SLB)   0.472   -0.088
MLN(CL)    0.306   -0.151
MLN(FO)    0.140   -0.208
MLN(AL)    0.148   -0.223
MLN(KB)    0.429   -0.142
CL         0.170   -0.574
FO         0.131   -0.661
AL         0.117   -0.579
KB         0.266   -0.812


Page 55: Learning the Structure of  Markov Logic Networks

55

Results on UW-CSE, compared with purely probabilistic learners (chart reconstructed from the slide's data labels):

System     AUC     CLL
MLN(SLS)   0.533   -0.061
MLN(SLB)   0.472   -0.088
NB         0.390   -0.370
BN         0.397   -0.166

Page 56: Learning the Structure of  Markov Logic Networks

56

Timing
MLN(SLS) on UW-CSE
  Cluster of 15 dual-CPU 2.8 GHz Pentium 4 machines
  Without speedups: did not finish in 24 hrs
  With speedups: 5.3 hrs

Page 57: Learning the Structure of  Markov Logic Networks

57

Lesion Study
Disable one speedup technique at a time; SFS; UW-CSE (one fold). Chart reconstructed from the slide's data labels:

Configuration                   Hours
all speedups                     4.0
no clause sampling              21.6
no predicate sampling            8.4
don't avoid redundancy           6.5
no loose converg. threshold      4.1
no weight thresholding          24.8

Page 58: Learning the Structure of  Markov Logic Networks

58

Overview
  Motivation
  Background
  Structure Learning Algorithm
  Experiments
  Future Work & Conclusion

Page 59: Learning the Structure of  Markov Logic Networks

59

Future Work
  Speed up counting of # true groundings of a clause
  Probabilistically bound the loss in accuracy due to subsampling
  Probabilistic predicate discovery

Page 60: Learning the Structure of  Markov Logic Networks

60

Conclusion
  Markov logic networks: a powerful combination of first-order logic and probability
  Richardson & Domingos (2004) did not learn MLN structure
  We develop an algorithm that automatically learns both first-order clauses and their weights
  We develop speedup techniques to make our algorithm fast enough to be practical
  We show experimentally that our algorithm outperforms
    Richardson & Domingos
    pure ILP
    purely KB approaches
    purely probabilistic approaches

(For software, email: [email protected])

