7/28/2019 Margaritis May 15 Reading Group
Exploiting Pearl's Theorems for Graphical Model Structure Discovery
Dimitris Margaritis
(joint work with Facundo Bromberg and
Vasant Honavar)
Department of Computer Science
Iowa State University
The problem
General problem:
Learn probabilistic graphical models from data
Specific problem: Learn the structure of probabilistic graphical models
Why graphical probabilistic models?
Tools for reasoning under uncertainty:
can use them to calculate the probability of any propositional formula (probabilistic inference) given the facts (known values of some variables)
Efficient representation of the joint probability using conditional independences
Most popular graphical models:
Markov networks (undirected), Bayesian networks (directed acyclic)
Markov Networks
Define a neighborhood structure N among variables: pairs (i, j)
MN assumption: each variable is conditionally independent of all but its neighbors
Intuitively: variable X is conditionally independent (CI) of variable Y given a set of variables Z if Z shields any influence between X and Y
Notation: (X ⊥ Y | Z)
Implies a decomposition of the joint probability into a product of local potential functions over the neighborhood structure
Markov Network Example
Target random variable: crop yield X
Observable random variables:
Soil acidity Y1
Soil humidity Y2
Concentration of potassium Y3
Concentration of sodium Y4
Example: Markov network for crop field
The crop field is organized spatially as a regular grid
Defines a dependency structure that matches the spatial structure
Markov Networks (MN)
We can represent structure graphically using a Markov network G = (V, E):
V: nodes represent random variables
E: undirected edges represent structure, i.e., (i, j) ∈ E ⟺ (i, j) ∈ N
Example MN for:
V = {0, 1, 2, 3, 4, 5, 6, 7}
N = {(1, 4), (4, 7), (7, 0), (7, 5), (6, 5), (0, 3), (5, 3), (3, 2)}
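The example structure above fits in a small adjacency-set representation; a minimal sketch (the helper name `build_mn` is ours, not the paper's):

```python
def build_mn(edges):
    """Build an undirected graph as a dict mapping node -> set of neighbors."""
    adj = {}
    for i, j in edges:
        adj.setdefault(i, set()).add(j)
        adj.setdefault(j, set()).add(i)
    return adj

# Neighborhood relation N from the slide
N = [(1, 4), (4, 7), (7, 0), (7, 5), (6, 5), (0, 3), (5, 3), (3, 2)]
G = build_mn(N)
print(G[7])   # the neighbors of variable 7: 0, 4 and 5
```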
Markov network semantics
The CIs of a probability distribution P are encoded in a MN G by vertex separation:
(3 ⊥̸ 7 | {0})
(3 ⊥ 7 | {0, 5})
(Pearl 88) If the CIs in the graph match exactly those of distribution P, P is said to be graph-isomorph.
Conditional dependence is denoted by ⊥̸.
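Vertex separation itself is just a reachability check: X ⊥ Y | Z holds in the graph iff no path from X to Y avoids Z. A minimal sketch (the function name and BFS formulation are ours):

```python
from collections import deque

def separated(adj, x, y, z):
    """True iff every path from x to y passes through a node in z
    (vertex separation, the MN reading of conditional independence)."""
    z = set(z)
    seen, queue = {x}, deque([x])
    while queue:
        u = queue.popleft()
        for v in adj.get(u, ()):
            if v == y:
                return False          # found a path avoiding z
            if v not in seen and v not in z:
                seen.add(v)
                queue.append(v)
    return True

# Example MN from the slides
edges = [(1, 4), (4, 7), (7, 0), (7, 5), (6, 5), (0, 3), (5, 3), (3, 2)]
adj = {}
for i, j in edges:
    adj.setdefault(i, set()).add(j)
    adj.setdefault(j, set()).add(i)

print(separated(adj, 3, 7, {0}))      # False: path 3-5-7 avoids {0}
print(separated(adj, 3, 7, {0, 5}))   # True: every 3..7 path is blocked
```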
The problem revisited
Learn the structure of Markov networks from data
True probability distribution Pr(1, 2, …, 7): unknown
Data sampled from the distribution: known!
The learning algorithm takes the data as input and outputs a learned network, to be compared against the true network
Structure Learning of Graphical Models
Approaches to structure learning:
Score-based: search for the graph with the optimal score (likelihood, MDL); score computation is intractable in Markov networks
Independence-based: infer the graph using information about the independences that hold in the underlying model
Other isolated approaches
Independence-based approach
Assumes the existence of an independence-query oracle that answers the CIs that hold in the true probability distribution
Example query: Is variable 7 independent of variable 3 given variables {0, 5}? Oracle says NO: (3 ⊥̸ 7 | {0, 5})
Proceeds iteratively:
1. Query the independence oracle for a CI value h in the true model
2. Discard structures that violate CI h (so some candidate structures become inconsistent while others remain consistent)
3. Repeat until a single structure is left (uniqueness under assumptions)
But an oracle does not exist!
It can be approximated by a statistical independence test (SIT), e.g. Pearson's χ² or Wilks's G²
Given as input:
a data set D (sampled from the true distribution), and
a triplet (X, Y | Z)
the SIT computes the p-value: the probability of error in assuming dependence when in fact the variables are independent
and decides: independence if the p-value exceeds a significance threshold, dependence otherwise
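As an illustration, Wilks's G² on a 2×2 contingency table can be computed with nothing but the standard library; comparing the statistic against a fixed critical value stands in for the p-value decision (the helper name and the α = 0.05 cutoff are our choices, not the paper's):

```python
import math

def g_squared_2x2(table, crit=3.841):
    """Wilks's G^2 test of independence on a 2x2 contingency table.
    Returns (G2, independent?). 3.841 is the chi-square critical value
    for df=1 at significance level 0.05; a full SIT would compute the
    p-value from the chi-square distribution instead."""
    total = sum(sum(row) for row in table)
    row_sums = [sum(row) for row in table]
    col_sums = [sum(col) for col in zip(*table)]
    g2 = 0.0
    for i in range(2):
        for j in range(2):
            observed = table[i][j]
            expected = row_sums[i] * col_sums[j] / total
            if observed > 0:
                g2 += 2.0 * observed * math.log(observed / expected)
    return g2, g2 < crit   # small statistic -> declare independence

# Strongly associated counts -> the test declares dependence
print(g_squared_2x2([[40, 10], [10, 40]]))
```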
Outline
Introductory Remarks
The GSMN and GSIMN algorithms
The Argumentative Independence Test
Conclusions
GSMN and GSIMN Algorithms
GSMN algorithm
We introduce (the first) two independence-based
algorithms for MN structure learning: GSMN and
GSIMN
GSMN (Grow-Shrink Markov Network structure inference algorithm) is a direct adaptation of the grow-shrink (GS) algorithm (Margaritis, 2000) for learning a variable's Markov blanket using independence tests
Definition: A Markov blanket BL(X) of X ∈ V is any subset S of variables that shields X from all other variables, that is, (X ⊥ V − S − {X} | S).
GSMN (cont'd)
The Markov blanket is the set of neighbors in the structure (Pearl and Paz 85).
Therefore, we can learn the structure by learning the Markov blankets:
1: for every X ∈ V
2:   BL(X) ← get Markov blanket of X using the GS algorithm
3:   for every Y ∈ BL(X)
4:     add edge (X, Y) to E(G)
GSMN extends the above algorithm with a heuristic ordering for the grow and shrink phases of GS
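A sketch of the per-variable grow-shrink step that the loop above relies on, with the independence query answered by vertex separation in a known graph so the example is self-contained (all names are ours, and this omits GSMN's heuristic ordering):

```python
def grow_shrink(x, variables, indep):
    """Recover the Markov blanket of x using independence queries."""
    bl = []
    changed = True
    while changed:                      # grow: add variables dependent on x
        changed = False
        for y in variables:
            if y != x and y not in bl and not indep(x, y, set(bl)):
                bl.append(y)
                changed = True
    for y in list(bl):                  # shrink: drop false positives
        if indep(x, y, set(bl) - {y}):
            bl.remove(y)
    return set(bl)

# Stand-in oracle: vertex separation in the example MN from earlier slides
edges = [(1, 4), (4, 7), (7, 0), (7, 5), (6, 5), (0, 3), (5, 3), (3, 2)]
adj = {}
for i, j in edges:
    adj.setdefault(i, set()).add(j)
    adj.setdefault(j, set()).add(i)

def indep(x, y, z):
    seen, stack = {x}, [x]
    while stack:
        for v in adj[stack.pop()]:
            if v == y:
                return False            # found a path avoiding z
            if v not in seen and v not in z:
                seen.add(v)
                stack.append(v)
    return True

print(grow_shrink(7, range(8), indep))   # recovers 7's neighbors 0, 4, 5
```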
Initially no arcs
[Figure: nodes A, B, C, D, E, F, G, K, L with no edges]
Growing phase
Markov blanket of A starts empty; variables are queried in turn:
1. B dependent of A given {}? Yes: blanket = {B}
2. F dependent of A given {B}? No
3. G dependent of A given {B}? Yes: blanket = {B, G}
4. C dependent of A given {B, G}? Yes: blanket = {B, G, C}
5. K dependent of A given {B, G, C}? Yes: blanket = {B, G, C, K}
6. D dependent of A given {B, G, C, K}? Yes: blanket = {B, G, C, K, D}
7. E dependent of A given {B, G, C, K, D}? Yes: blanket = {B, G, C, K, D, E}
8. L dependent of A given {B, G, C, K, D, E}? No
Markov blanket of A after growing = {B, G, C, K, D, E}
Shrinking phase
Starting from the grown Markov blanket of A = {B, G, C, K, D, E}:
9. G dependent of A given {B, C, K, D, E} (i.e. the set minus {G})? No: blanket = {B, C, K, D, E}
10. K dependent of A given {B, C, D, E}? No: blanket = {B, C, D, E}
Minimum Markov blanket of A = {B, C, D, E}
GSIMN
GSIMN (Grow-Shrink Inference Markov Network) uses properties of CIs as inference rules to infer novel tests, avoiding costly SITs.
Pearl (88) introduced the undirected axioms: properties satisfied by the CIs of distributions isomorphic to Markov networks.
GSIMN modifies GSMN by exploiting these axioms to infer novel tests.
Axioms as inference rules
[Transitivity] (X ⊥ W | Z) ∧ (W ⊥̸ Y | Z) ⟹ (X ⊥ Y | Z)
Example: (1 ⊥ 7 | {4}) ∧ (7 ⊥̸ 3 | {4}) ⟹ (1 ⊥ 3 | {4})
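The rule above can be applied mechanically over a store of known test results; a minimal sketch (the `known` dict representation and function name are ours, and symmetry of the triplets is ignored for brevity):

```python
# Truth values: True = independent, False = dependent.
# Keys are (x, y, conditioning-set) triplets.

def infer_transitivity(known):
    """From (X ind W | Z) and (W dep Y | Z), conclude (X ind Y | Z)
    without running a new statistical test."""
    inferred = {}
    items = list(known.items())
    for (x, w, z), v1 in items:
        if not v1:                      # need (X ind W | Z)
            continue
        for (w2, y, z2), v2 in items:
            if w2 == w and z2 == z and v2 is False and y not in (x, w):
                key = (x, y, z)
                if key not in known:
                    inferred[key] = True   # (X ind Y | Z)
    return inferred

# Slide example: (1 ind 7 | {4}) and (7 dep 3 | {4})  =>  (1 ind 3 | {4})
known = {(1, 7, frozenset({4})): True, (7, 3, frozenset({4})): False}
print(infer_transitivity(known))   # infers (1 ind 3 | {4})
```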
Triangle theorems
GSIMN actually uses the Triangle Theorem rules, derived from (only) Strong Union and Transitivity:
(X ⊥ W | Z1) ∧ (W ⊥̸ Y | Z1 ∪ Z2) ⟹ (X ⊥ Y | Z1)
(X ⊥̸ W | Z1) ∧ (W ⊥̸ Y | Z2) ⟹ (X ⊥̸ Y | Z1 ∩ Z2)
GSIMN rearranges the GSMN visit order to maximize the benefit of these rules
and applies them only once (as opposed to computing the closure)
Despite these simplifications, GSIMN infers >95% of inferable tests (shown experimentally)
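A sketch of the two Triangle Theorem rules as one-shot inference steps over known results (True = independent, False = dependent; the representation and names are ours, and symmetry of the triplets is ignored for brevity):

```python
def triangle_rules(known):
    """Apply both Triangle Theorem rules once over a dict of known
    (x, y, conditioning-set) -> bool test results."""
    inferred = {}
    items = list(known.items())
    for (x, w, z1), v1 in items:
        for (w2, y, z), v2 in items:
            if w2 != w or y in (x, w):
                continue
            if v1 and v2 is False and z1 <= z:
                # (X ind W | Z1) & (W dep Y | Z1 u Z2)  =>  (X ind Y | Z1)
                inferred.setdefault((x, y, z1), True)
            if v1 is False and v2 is False:
                # (X dep W | Z1) & (W dep Y | Z2)  =>  (X dep Y | Z1 n Z2)
                inferred.setdefault((x, y, z1 & z), False)
    return inferred

known = {
    (1, 7, frozenset({4})): True,       # 1 ind 7 | {4}
    (7, 3, frozenset({4, 0})): False,   # 7 dep 3 | {4, 0}
}
print(triangle_rules(known))   # first rule fires: (1 ind 3 | {4})
```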
Experiments
Our goal: demonstrate that GSIMN requires fewer tests than GSMN, without significantly affecting accuracy
Results for exact learning
We assume an independence query oracle, so tests are 100% accurate
⟹ output network = true network (proof omitted)
Sampled data: weighted number of tests
Sampled data: Accuracy
Real-world data
More challenging because:
Non-random topologies (e.g. regular lattices, small worlds, chains, etc.)
The underlying distribution may not be graph-isomorph
Outline
Introductory Remarks
The GSMN and GSIMN algorithms
The Argumentative Independence Test
Conclusions
The Argumentative Independence Test (AIT)
The Problem
Statistical independence tests (SITs) are unreliable for small data sets
They produce erroneous networks when used by independence-based algorithms
This problem is one of the most important criticisms of the independence-based approach
Our contribution: a new general-purpose independence test, the argumentative independence test or AIT, that improves reliability for small data sets
Main Idea
The new independence test (AIT) improves accuracy by correcting the outcomes of a statistical independence test (SIT):
Incorrect SITs may produce CIs inconsistent with Pearl's properties of conditional independence
Thus, resolving inconsistencies among SITs may correct the errors
Propositional knowledge base (KB):
propositions are CIs (i.e., for a triplet (X, Y | Z), either (X ⊥ Y | Z) or (X ⊥̸ Y | Z))
inference rules are Pearl's conditional independence axioms
Pearl's axioms
We presented above the undirected axioms. Pearl (1988) also introduced:
general axioms, which hold for any distribution
directed axioms, for distributions isomorphic to directed graphs
Example
Consider the following KB of CIs, constructed using a SIT:
A. (0 ⊥ 1 | {2, 3})
B. (0 ⊥ 4 | {2, 3})
C. (0 ⊥̸ {1, 4} | {2, 3})
Assume C is wrong (the SIT's mistake).
Assuming the Composition axiom holds:
(0 ⊥ 1 | {2, 3}) ∧ (0 ⊥ 4 | {2, 3}) ⟹ (0 ⊥ {1, 4} | {2, 3})
we derive D. (0 ⊥ {1, 4} | {2, 3}).
Inconsistency: D and C contradict each other
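The inconsistency can be surfaced mechanically: derive D by Composition, then check it against the stored test results. A minimal sketch (the representation and helper names are ours):

```python
def apply_composition(kb):
    """(X ind Y | Z) & (X ind W | Z)  =>  (X ind Y u W | Z).
    kb maps (x, var-set, conditioning-set) triplets to True/False."""
    derived = {}
    items = list(kb.items())
    for (x1, y, z1), v1 in items:
        for (x2, w, z2), v2 in items:
            if v1 and v2 and x1 == x2 and z1 == z2 and y != w:
                derived[(x1, y | w, z1)] = True
    return derived

Z = frozenset({2, 3})
kb = {
    (0, frozenset({1}), Z): True,        # A: 0 ind 1 | {2,3}
    (0, frozenset({4}), Z): True,        # B: 0 ind 4 | {2,3}
    (0, frozenset({1, 4}), Z): False,    # C: 0 dep {1,4} | {2,3} (SIT error)
}
derived = apply_composition(kb)          # D: 0 ind {1,4} | {2,3}
conflicts = {k for k in derived if k in kb and kb[k] != derived[k]}
print(conflicts)   # the D-versus-C contradiction
```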
Example (cont'd)
A. (0 ⊥ 1 | {2, 3})
B. (0 ⊥ 4 | {2, 3})
C. (0 ⊥̸ {1, 4} | {2, 3})
D. (0 ⊥ {1, 4} | {2, 3}), derived from A and B by Composition
The original KB is inconsistent and incorrect. There are at least two ways to resolve the inconsistency: rejecting D (a consistent but incorrect KB) or rejecting C (a consistent and correct KB).
If we can resolve the inconsistency in favor of D, the error could be corrected.
The argumentation framework presented next provides a principled approach for resolving inconsistencies.
Preference-based Argumentation Framework
An instance of defeasible (non-monotonic) logics
Main contributors: Dung 95 (basic framework), Amgoud and Cayrol 02 (added preferences)
The framework consists of three elements, PAF = ⟨A, R, π⟩:
A: set of arguments
R: attack relation among arguments
π: preference order over arguments
Arguments
An argument (H, h) is an if-then rule (if H then h):
Support H: a set of consistent propositions
Head: h
In independence KBs, if-then rules are instances (propositionalizations) of Pearl's universally quantified rules, for example instances of Weak Union
Propositional arguments: arguments ({h}, h) for an individual CI proposition h
Example
The set of arguments corresponding to the KB of the previous example:
A. ({(0 ⊥ 1 | {2, 3})}, (0 ⊥ 1 | {2, 3})), correct
B. ({(0 ⊥ 4 | {2, 3})}, (0 ⊥ 4 | {2, 3})), correct
C. ({(0 ⊥̸ {1, 4} | {2, 3})}, (0 ⊥̸ {1, 4} | {2, 3})), incorrect
D. ({(0 ⊥ 1 | {2, 3}), (0 ⊥ 4 | {2, 3})}, (0 ⊥ {1, 4} | {2, 3})), correct
Preferences
Preference over arguments is obtained from preferences over CI propositions
We say argument (H, h) is preferred over argument (H′, h′) iff it is more likely for all propositions in H to be correct
The probability π(h) that h is correct is obtained from the p-value of h, computed using a statistical test (SIT) on the data
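A sketch of this preference computation, scoring a support as the product of its propositions' correctness probabilities (the product form assumes independent test errors, and all names are ours):

```python
from math import prod

def support_score(support, pi):
    """pi maps each CI proposition to its probability of being correct."""
    return prod(pi[h] for h in support)

def preferred(arg_a, arg_b, pi):
    """True iff argument a = (H_a, h_a) is strictly preferred over b."""
    return support_score(arg_a[0], pi) > support_score(arg_b[0], pi)

# Slide example: pi(A) = 0.8, pi(B) = 0.7, pi(C) = 0.5
pi = {"A": 0.8, "B": 0.7, "C": 0.5}
D = (("A", "B"), "D")      # support {A, B}, head D
C = (("C",), "C")
print(support_score(D[0], pi))   # 0.8 * 0.7 = 0.56
print(preferred(D, C, pi))       # True: 0.56 > 0.5
```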
Example
Let's extend the arguments with preferences:
A. ({(0 ⊥ 1 | {2, 3})}, (0 ⊥ 1 | {2, 3})), π(H) = 0.8
B. ({(0 ⊥ 4 | {2, 3})}, (0 ⊥ 4 | {2, 3})), π(H) = 0.7
C. ({(0 ⊥̸ {1, 4} | {2, 3})}, (0 ⊥̸ {1, 4} | {2, 3})), π(H) = 0.5
D. ({(0 ⊥ 1 | {2, 3}), (0 ⊥ 4 | {2, 3})}, (0 ⊥ {1, 4} | {2, 3})), π(H) = 0.8 × 0.7 = 0.56
Attack relation
The attack relation formalizes and extends the notion of logical contradiction.
Since argument (H1, h1) models an if-H1-then-h1 rule, it can be logically contradicted by (H2, h2) if:
(H1, h1) rebuts (H2, h2) iff h1 ≡ ¬h2
(H1, h1) undercuts (H2, h2) iff ∃h ∈ H2 such that h ≡ ¬h1
Definition: Argument b attacks argument a iff b logically contradicts a and a is not preferred over b
Example (arguments A-D and their preferences as in the previous slide)
C and D rebut each other, and
C is not preferred over D (0.5 < 0.56), so
D attacks C
Inference = Acceptability
Inference is modeled in argumentation frameworks by acceptability
An argument r is:
inferred iff it is accepted
not inferred iff rejected, or
in abeyance if neither
Dung-Amgoud's idea: accept argument r if
r is not attacked, or
r is attacked, but its attackers are also attacked
Example (arguments A-D as before)
We had that D attacks C (and no other attack).
Since nothing attacks D, D is accepted.
C is attacked by an accepted argument, so C is rejected.
Argumentation resolved the inconsistency in favor of the correct proposition D!
In practice, we have thousands of arguments. How do we compute the acceptability status of all of them?
Computing Acceptability Bottom-up
accept if not attacked, or if all attackers attacked.
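The fixed-point rule above can be sketched as an iterative labeling; arguments left undecided at the fixed point are in abeyance (all names and the example attack graph are ours):

```python
def label_arguments(args, attacks):
    """attacks: set of (attacker, target) pairs. Iteratively accept
    arguments whose attackers are all rejected, and reject arguments
    with an accepted attacker, until nothing changes."""
    attackers = {a: {b for (b, t) in attacks if t == a} for a in args}
    status = {a: None for a in args}            # None = undecided
    changed = True
    while changed:
        changed = False
        for a in args:
            if status[a] is not None:
                continue
            if all(status[b] == "rejected" for b in attackers[a]):
                status[a], changed = "accepted", True   # incl. unattacked
            elif any(status[b] == "accepted" for b in attackers[a]):
                status[a], changed = "rejected", True
    return status   # still-None arguments are in abeyance

# Chain: D attacks C, C attacks B; mutual attack between E and F
print(label_arguments(
    ["B", "C", "D", "E", "F"],
    {("D", "C"), ("C", "B"), ("E", "F"), ("F", "E")},
))
```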
Top-down algorithm
The bottom-up algorithm is highly inefficient:
it computes the acceptability of all possible arguments
Top-down is an alternative:
given an argument r, it responds whether r is accepted or rejected:
accept if all attackers are rejected, and
reject if at least one attacker is accepted
We illustrate this with an example
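The top-down query can be sketched as a recursion that touches only the attackers of the target argument; the naive cycle guard and the missing depth limit are our simplifications, and the example attack tree loosely mirrors the figure on the slides (target 7 attacked by 3, 6 and 11, and so on):

```python
def accepted(arg, attackers, seen=()):
    """Accept iff every attacker is rejected (i.e. not accepted)."""
    for b in attackers.get(arg, ()):
        if b in seen:                  # naive cycle guard
            continue
        if accepted(b, attackers, seen + (arg,)):
            return False               # one accepted attacker -> reject
    return True

# Attack graph loosely mirroring the slides' figure (who attacks whom);
# arguments 8, 9 and 10 exist but are never reached from the target 7.
atk = {7: {3, 6, 11}, 3: {4}, 6: {5}, 11: {12}, 4: {2}, 5: {1}, 12: {13}}
print(accepted(2, atk))   # True: leaf, unattacked
print(accepted(4, atk))   # False: its attacker 2 is accepted
print(accepted(7, atk))   # False: attacker 3 is accepted (4 is rejected)
```

Querying 7 never evaluates arguments outside its attack subtree, which is the saving the slides point out.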
Computing Acceptability Top-down
accept if all attackers rejected, reject if at least one accepted.
[Figure, shown over several animation steps: an attack graph over arguments 1 to 13 with target node 7. Node 7's attackers are 3, 6 and 11; the next level of attackers is 4, 5 and 12; their attackers 2, 1 and 13 are leaves.]
We didn't evaluate arguments 8, 9 and 10!
Approximate top-down algorithm
It is a tree traversal; we chose iterative deepening
Time complexity: O(b^d)
Difficulties:
1. Exponential in depth d
2. By the nature of Pearl's rules, the number of attackers of some nodes (the branching factor b) may be exponential
Approximation:
To solve (1), we limit d to 3
To solve (2), we consider an alternative propositionalization of Pearl's rules that bounds b to polynomial size (details omitted here)
[Figure: example search tree with b = 3, d = 3]
Experiments
We considered 3 variations of AIT, one per set of Pearl axioms: general, directed, and undirected
Experiments on data sampled from Markov networks and from Bayesian networks (directed graphical models)
All use the approximate top-down algorithm
Approximate top-down algorithm: accuracy on data
[Figure: four accuracy panels: (general axioms, true model BN), (directed axioms, true model BN), (general axioms, true model MN), (undirected axioms, true model MN)]
Top-down runtime: approximate vs. exact
[Figure: runtime comparison within the PC algorithm and within the GSMN algorithm]
We show results only for the specific axioms
Top-down accuracy: approximate vs. exact
Experiments show the accuracies of both match in all but a few cases (only specific axioms shown)
Conclusions
Summary
I presented two uses of Pearl's independence axioms/theorems:
1. The GSIMN algorithm
Uses the axioms to infer independence test results from known ones when learning the domain Markov network: faster execution
2. The AIT general-purpose independence test
Uses multiple tests on the data and the axioms as integrity constraints to return the most reliable value: more reliable tests on small data sets
Further Research
Explore other methods of resolving inconsistencies in the KB of known independences
Use such constraints to improve Bayesian network and Markov network structure learning from small data sets (instead of just improving individual tests)
Develop faster methods of inferring independences using Pearl's axioms (Prolog tricks?)
Thank you!
Questions?