7/28/2019 Margaritis May 15 Reading Group
Exploiting Pearl's Theorems for Graphical Model Structure Discovery
Dimitris Margaritis
(joint work with Facundo Bromberg and
Vasant Honavar)
Department of Computer Science
Iowa State University
The problem
General problem:
Learn probabilistic graphical models from data
Specific problem: Learn the structure of probabilistic graphical models
Why graphical probabilistic models?
Tools for reasoning under uncertainty:
can use them to calculate the probability of any propositional formula (probabilistic inference) given the facts (known values of some variables)
Efficient representation of the joint probability using conditional independences
Most popular graphical models:
Markov networks (undirected), Bayesian networks (directed acyclic)
Markov Networks
Define a neighborhood structure N among variables: pairs (i, j)
MN assumption: each variable is conditionally independent of all but its neighbors
Intuitively: variable X is conditionally independent (CI) of variable Y given a set of variables Z if Z shields any influence between X and Y
Notation: (X ⊥ Y | Z)
Implies a decomposition of the joint probability into a product of local potential functions over the neighborhood structure
Markov Network Example
Target random variable: crop yield X
Observable random variables:
Soil acidity Y1
Soil humidity Y2
Concentration of potassium Y3
Concentration of sodium Y4
Example: Markov network for crop field
The crop field is organized spatially as a regular grid
Defines a dependency structure that matches the spatial structure
Markov Networks (MN)
We can represent structure graphically using a Markov network G = (V, E):
V: nodes represent random variables
E: undirected edges represent structure, i.e., (i, j) ∈ E ⟺ (i, j) ∈ N
Example MN for:
V = {0, 1, 2, 3, 4, 5, 6, 7}
N = {(1, 4), (4, 7), (7, 0), (7, 5), (6, 5), (0, 3), (5, 3), (3, 2)}
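The example structure above fits in a small adjacency-set representation; a minimal sketch (the helper name `build_mn` is ours, not the paper's):

```python
def build_mn(edges):
    """Build an undirected graph as a dict mapping node -> set of neighbors."""
    adj = {}
    for i, j in edges:
        adj.setdefault(i, set()).add(j)
        adj.setdefault(j, set()).add(i)
    return adj

# Neighborhood relation N from the slide
N = [(1, 4), (4, 7), (7, 0), (7, 5), (6, 5), (0, 3), (5, 3), (3, 2)]
G = build_mn(N)
print(G[7])   # the neighbors of variable 7: 0, 4 and 5
```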
Markov network semantics
The CIs of a probability distribution P are encoded in a MN G by vertex separation:
(3 ⊥̸ 7 | {0})
(3 ⊥ 7 | {0, 5})
(Pearl 88) If the CIs in the graph match exactly those of distribution P, P is said to be graph-isomorph.
Conditional dependence is denoted by ⊥̸.
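Vertex separation itself is just a reachability check: X ⊥ Y | Z holds in the graph iff no path from X to Y avoids Z. A minimal sketch (the function name and BFS formulation are ours):

```python
from collections import deque

def separated(adj, x, y, z):
    """True iff every path from x to y passes through a node in z
    (vertex separation, the MN reading of conditional independence)."""
    z = set(z)
    seen, queue = {x}, deque([x])
    while queue:
        u = queue.popleft()
        for v in adj.get(u, ()):
            if v == y:
                return False          # found a path avoiding z
            if v not in seen and v not in z:
                seen.add(v)
                queue.append(v)
    return True

# Example MN from the slides
edges = [(1, 4), (4, 7), (7, 0), (7, 5), (6, 5), (0, 3), (5, 3), (3, 2)]
adj = {}
for i, j in edges:
    adj.setdefault(i, set()).add(j)
    adj.setdefault(j, set()).add(i)

print(separated(adj, 3, 7, {0}))      # False: path 3-5-7 avoids {0}
print(separated(adj, 3, 7, {0, 5}))   # True: every 3..7 path is blocked
```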
The problem revisited
Learn the structure of Markov networks from data
True probability distribution Pr(1, 2, …, 7): unknown
Data sampled from the distribution: known!
The learning algorithm takes the data as input and outputs a learned network, to be compared against the true network
Structure Learning of Graphical Models
Approaches to structure learning:
Score-based: search for the graph with the optimal score (likelihood, MDL); score computation is intractable in Markov networks
Independence-based: infer the graph using information about the independences that hold in the underlying model
Other isolated approaches
Independence-based approach
Assumes the existence of an independence-query oracle that answers the CIs that hold in the true probability distribution
Example query: Is variable 7 independent of variable 3 given variables {0, 5}? Oracle says NO: (3 ⊥̸ 7 | {0, 5})
Proceeds iteratively:
1. Query the independence oracle for a CI value h in the true model
2. Discard structures that violate CI h (so some candidate structures become inconsistent while others remain consistent)
3. Repeat until a single structure is left (uniqueness under assumptions)
But an oracle does not exist!
It can be approximated by a statistical independence test (SIT), e.g. Pearson's χ² or Wilks's G²
Given as input:
a data set D (sampled from the true distribution), and
a triplet (X, Y | Z)
the SIT computes the p-value: the probability of error in assuming dependence when in fact the variables are independent
and decides: independence if the p-value exceeds a significance threshold, dependence otherwise
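As an illustration, Wilks's G² on a 2×2 contingency table can be computed with nothing but the standard library; comparing the statistic against a fixed critical value stands in for the p-value decision (the helper name and the α = 0.05 cutoff are our choices, not the paper's):

```python
import math

def g_squared_2x2(table, crit=3.841):
    """Wilks's G^2 test of independence on a 2x2 contingency table.
    Returns (G2, independent?). 3.841 is the chi-square critical value
    for df=1 at significance level 0.05; a full SIT would compute the
    p-value from the chi-square distribution instead."""
    total = sum(sum(row) for row in table)
    row_sums = [sum(row) for row in table]
    col_sums = [sum(col) for col in zip(*table)]
    g2 = 0.0
    for i in range(2):
        for j in range(2):
            observed = table[i][j]
            expected = row_sums[i] * col_sums[j] / total
            if observed > 0:
                g2 += 2.0 * observed * math.log(observed / expected)
    return g2, g2 < crit   # small statistic -> declare independence

# Strongly associated counts -> the test declares dependence
print(g_squared_2x2([[40, 10], [10, 40]]))
```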
Outline
Introductory Remarks
The GSMN and GSIMN algorithms
The Argumentative Independence Test
Conclusions
GSMN and GSIMN Algorithms
GSMN algorithm
We introduce (the first) two independence-based
algorithms for MN structure learning: GSMN and
GSIMN
GSMN (Grow-Shrink Markov Network structure inference algorithm) is a direct adaptation of the grow-shrink (GS) algorithm (Margaritis, 2000) for learning a variable's Markov blanket using independence tests
Definition: A Markov blanket BL(X) of X ∈ V is any subset S of variables that shields X from all other variables, that is, (X ⊥ V − S − {X} | S).
GSMN (cont'd)
The Markov blanket is the set of neighbors in the structure (Pearl and Paz 85).
Therefore, we can learn the structure by learning the Markov blankets:
1: for every X ∈ V
2:   BL(X) ← get Markov blanket of X using the GS algorithm
3:   for every Y ∈ BL(X)
4:     add edge (X, Y) to E(G)
GSMN extends the above algorithm with a heuristic ordering for the grow and shrink phases of GS
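A sketch of the per-variable grow-shrink step that the loop above relies on, with the independence query answered by vertex separation in a known graph so the example is self-contained (all names are ours, and this omits GSMN's heuristic ordering):

```python
def grow_shrink(x, variables, indep):
    """Recover the Markov blanket of x using independence queries."""
    bl = []
    changed = True
    while changed:                      # grow: add variables dependent on x
        changed = False
        for y in variables:
            if y != x and y not in bl and not indep(x, y, set(bl)):
                bl.append(y)
                changed = True
    for y in list(bl):                  # shrink: drop false positives
        if indep(x, y, set(bl) - {y}):
            bl.remove(y)
    return set(bl)

# Stand-in oracle: vertex separation in the example MN from earlier slides
edges = [(1, 4), (4, 7), (7, 0), (7, 5), (6, 5), (0, 3), (5, 3), (3, 2)]
adj = {}
for i, j in edges:
    adj.setdefault(i, set()).add(j)
    adj.setdefault(j, set()).add(i)

def indep(x, y, z):
    seen, stack = {x}, [x]
    while stack:
        for v in adj[stack.pop()]:
            if v == y:
                return False            # found a path avoiding z
            if v not in seen and v not in z:
                seen.add(v)
                stack.append(v)
    return True

print(grow_shrink(7, range(8), indep))   # recovers 7's neighbors 0, 4, 5
```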
Initially no arcs
[Figure: nodes A, B, C, D, E, F, G, K, L with no edges]
Growing phase
Markov blanket of A starts empty; variables are queried in turn:
1. B dependent of A given {}? Yes: blanket = {B}
2. F dependent of A given {B}? No
3. G dependent of A given {B}? Yes: blanket = {B, G}
4. C dependent of A given {B, G}? Yes: blanket = {B, G, C}
5. K dependent of A given {B, G, C}? Yes: blanket = {B, G, C, K}
6. D dependent of A given {B, G, C, K}? Yes: blanket = {B, G, C, K, D}
7. E dependent of A given {B, G, C, K, D}? Yes: blanket = {B, G, C, K, D, E}
8. L dependent of A given {B, G, C, K, D, E}? No
Markov blanket of A after growing = {B, G, C, K, D, E}
Shrinking phase
Starting from the grown Markov blanket of A = {B, G, C, K, D, E}:
9. G dependent of A given {B, C, K, D, E} (i.e. the set minus {G})? No: blanket = {B, C, K, D, E}
10. K dependent of A given {B, C, D, E}? No: blanket = {B, C, D, E}
Minimum Markov blanket of A = {B, C, D, E}
GSIMN
GSIMN (Grow-Shrink Inference Markov Network) uses properties of CIs as inference rules to infer novel tests, avoiding costly SITs.
Pearl (88) introduced the undirected axioms: properties satisfied by the CIs of distributions isomorphic to Markov networks.
GSIMN modifies GSMN by exploiting these axioms to infer novel tests.
Axioms as inference rules
[Transitivity] (X ⊥ W | Z) ∧ (W ⊥̸ Y | Z) ⟹ (X ⊥ Y | Z)
Example: (1 ⊥ 7 | {4}) ∧ (7 ⊥̸ 3 | {4}) ⟹ (1 ⊥ 3 | {4})
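The rule above can be applied mechanically over a store of known test results; a minimal sketch (the `known` dict representation and function name are ours, and symmetry of the triplets is ignored for brevity):

```python
# Truth values: True = independent, False = dependent.
# Keys are (x, y, conditioning-set) triplets.

def infer_transitivity(known):
    """From (X ind W | Z) and (W dep Y | Z), conclude (X ind Y | Z)
    without running a new statistical test."""
    inferred = {}
    items = list(known.items())
    for (x, w, z), v1 in items:
        if not v1:                      # need (X ind W | Z)
            continue
        for (w2, y, z2), v2 in items:
            if w2 == w and z2 == z and v2 is False and y not in (x, w):
                key = (x, y, z)
                if key not in known:
                    inferred[key] = True   # (X ind Y | Z)
    return inferred

# Slide example: (1 ind 7 | {4}) and (7 dep 3 | {4})  =>  (1 ind 3 | {4})
known = {(1, 7, frozenset({4})): True, (7, 3, frozenset({4})): False}
print(infer_transitivity(known))   # infers (1 ind 3 | {4})
```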
Triangle theorems
GSIMN actually uses the Triangle Theorem rules, derived from (only) Strong Union and Transitivity:
(X ⊥ W | Z1) ∧ (W ⊥̸ Y | Z1 ∪ Z2) ⟹ (X ⊥ Y | Z1)
(X ⊥̸ W | Z1) ∧ (W ⊥̸ Y | Z2) ⟹ (X ⊥̸ Y | Z1 ∩ Z2)
GSIMN rearranges the GSMN visit order to maximize the benefit of these rules
and applies them only once (as opposed to computing the closure)
Despite these simplifications, GSIMN infers >95% of inferable tests (shown experimentally)
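A sketch of the two Triangle Theorem rules as one-shot inference steps over known results (True = independent, False = dependent; the representation and names are ours, and symmetry of the triplets is ignored for brevity):

```python
def triangle_rules(known):
    """Apply both Triangle Theorem rules once over a dict of known
    (x, y, conditioning-set) -> bool test results."""
    inferred = {}
    items = list(known.items())
    for (x, w, z1), v1 in items:
        for (w2, y, z), v2 in items:
            if w2 != w or y in (x, w):
                continue
            if v1 and v2 is False and z1 <= z:
                # (X ind W | Z1) & (W dep Y | Z1 u Z2)  =>  (X ind Y | Z1)
                inferred.setdefault((x, y, z1), True)
            if v1 is False and v2 is False:
                # (X dep W | Z1) & (W dep Y | Z2)  =>  (X dep Y | Z1 n Z2)
                inferred.setdefault((x, y, z1 & z), False)
    return inferred

known = {
    (1, 7, frozenset({4})): True,       # 1 ind 7 | {4}
    (7, 3, frozenset({4, 0})): False,   # 7 dep 3 | {4, 0}
}
print(triangle_rules(known))   # first rule fires: (1 ind 3 | {4})
```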
Experiments
Our goal: demonstrate that GSIMN requires fewer tests than GSMN, without significantly affecting accuracy
Results for exact learning
We assume an independence query oracle, so tests are 100% accurate
⟹ output network = true network (proof omitted)
Sampled data: weighted number of tests
Sampled data: Accuracy
Real-world data
More challenging because:
Non-random topologies (e.g. regular lattices, small worlds, chains, etc.)
The underlying distribution may not be graph-isomorph
Outline
Introductory Remarks
The GSMN and GSIMN algorithms
The Argumentative Independence Test
Conclusions
The Argumentative Independence Test (AIT)
The Problem
Statistical independence tests (SITs) are unreliable for small data sets
They produce erroneous networks when used by independence-based algorithms
This problem is one of the most important criticisms of the independence-based approach
Our contribution: a new general-purpose independence test, the argumentative independence test or AIT, that improves reliability for small data sets
Main Idea
The new independence test (AIT) improves accuracy by correcting the outcomes of a statistical independence test (SIT):
Incorrect SITs may produce CIs inconsistent with Pearl's properties of conditional independence
Thus, resolving inconsistencies among SITs may correct the errors
Propositional knowledge base (KB):
propositions are CIs (i.e., for a triplet (X, Y | Z), either (X ⊥ Y | Z) or (X ⊥̸ Y | Z))
inference rules are Pearl's conditional independence axioms
Pearl's axioms
We presented above the undirected axioms. Pearl (1988) also introduced:
general axioms, which hold for any distribution
directed axioms, for distributions isomorphic to directed graphs
Example
Consider the following KB of CIs, constructed using a SIT:
A. (0 ⊥ 1 | {2, 3})
B. (0 ⊥ 4 | {2, 3})
C. (0 ⊥̸ {1, 4} | {2, 3})
Assume C is wrong (the SIT's mistake).
Assuming the Composition axiom holds:
(0 ⊥ 1 | {2, 3}) ∧ (0 ⊥ 4 | {2, 3}) ⟹ (0 ⊥ {1, 4} | {2, 3})
we derive D. (0 ⊥ {1, 4} | {2, 3}).
Inconsistency: D and C contradict each other
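The inconsistency can be surfaced mechanically: derive D by Composition, then check it against the stored test results. A minimal sketch (the representation and helper names are ours):

```python
def apply_composition(kb):
    """(X ind Y | Z) & (X ind W | Z)  =>  (X ind Y u W | Z).
    kb maps (x, var-set, conditioning-set) triplets to True/False."""
    derived = {}
    items = list(kb.items())
    for (x1, y, z1), v1 in items:
        for (x2, w, z2), v2 in items:
            if v1 and v2 and x1 == x2 and z1 == z2 and y != w:
                derived[(x1, y | w, z1)] = True
    return derived

Z = frozenset({2, 3})
kb = {
    (0, frozenset({1}), Z): True,        # A: 0 ind 1 | {2,3}
    (0, frozenset({4}), Z): True,        # B: 0 ind 4 | {2,3}
    (0, frozenset({1, 4}), Z): False,    # C: 0 dep {1,4} | {2,3} (SIT error)
}
derived = apply_composition(kb)          # D: 0 ind {1,4} | {2,3}
conflicts = {k for k in derived if k in kb and kb[k] != derived[k]}
print(conflicts)   # the D-versus-C contradiction
```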
Example (cont'd)
A. (0 ⊥ 1 | {2, 3})
B. (0 ⊥ 4 | {2, 3})
C. (0 ⊥̸ {1, 4} | {2, 3})
D. (0 ⊥ {1, 4} | {2, 3}), derived from A and B by Composition
The original KB is inconsistent and incorrect. There are at least two ways to resolve the inconsistency: rejecting D (a consistent but incorrect KB) or rejecting C (a consistent and correct KB).
If we can resolve the inconsistency in favor of D, the error could be corrected.
The argumentation framework presented next provides a principled approach for resolving inconsistencies.
Preference-based Argumentation Framework
An instance of defeasible (non-monotonic) logics
Main contributors: Dung 95 (basic framework), Amgoud and Cayrol 02 (added preferences)
The framework consists of three elements, PAF = ⟨A, R, π⟩:
A: set of arguments
R: attack relation among arguments
π: preference order over arguments
Arguments
An argument (H, h) is an if-then rule (if H then h):
Support H: a set of consistent propositions
Head: h
In independence KBs, if-then rules are instances (propositionalizations) of Pearl's universally quantified rules, for example instances of Weak Union
Propositional arguments: arguments ({h}, h) for an individual CI proposition h
Example
The set of arguments corresponding to the KB of the previous example:
A. ({(0 ⊥ 1 | {2, 3})}, (0 ⊥ 1 | {2, 3})), correct
B. ({(0 ⊥ 4 | {2, 3})}, (0 ⊥ 4 | {2, 3})), correct
C. ({(0 ⊥̸ {1, 4} | {2, 3})}, (0 ⊥̸ {1, 4} | {2, 3})), incorrect
D. ({(0 ⊥ 1 | {2, 3}), (0 ⊥ 4 | {2, 3})}, (0 ⊥ {1, 4} | {2, 3})), correct
Preferences
Preference over arguments is obtained from preferences over CI propositions
We say argument (H, h) is preferred over argument (H′, h′) iff it is more likely for all propositions in H to be correct
The probability π(h) that h is correct is obtained from the p-value of h, computed using a statistical test (SIT) on the data
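A sketch of this preference computation, scoring a support as the product of its propositions' correctness probabilities (the product form assumes independent test errors, and all names are ours):

```python
from math import prod

def support_score(support, pi):
    """pi maps each CI proposition to its probability of being correct."""
    return prod(pi[h] for h in support)

def preferred(arg_a, arg_b, pi):
    """True iff argument a = (H_a, h_a) is strictly preferred over b."""
    return support_score(arg_a[0], pi) > support_score(arg_b[0], pi)

# Slide example: pi(A) = 0.8, pi(B) = 0.7, pi(C) = 0.5
pi = {"A": 0.8, "B": 0.7, "C": 0.5}
D = (("A", "B"), "D")      # support {A, B}, head D
C = (("C",), "C")
print(support_score(D[0], pi))   # 0.8 * 0.7 = 0.56
print(preferred(D, C, pi))       # True: 0.56 > 0.5
```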
Example
Let's extend the arguments with preferences:
A. ({(0 ⊥ 1 | {2, 3})}, (0 ⊥ 1 | {2, 3})), π(H) = 0.8
B. ({(0 ⊥ 4 | {2, 3})}, (0 ⊥ 4 | {2, 3})), π(H) = 0.7
C. ({(0 ⊥̸ {1, 4} | {2, 3})}, (0 ⊥̸ {1, 4} | {2, 3})), π(H) = 0.5
D. ({(0 ⊥ 1 | {2, 3}), (0 ⊥ 4 | {2, 3})}, (0 ⊥ {1, 4} | {2, 3})), π(H) = 0.8 × 0.7 = 0.56
Attack relation
The attack relation formalizes and extends the notion of logical contradiction.
Since argument (H1, h1) models an if-H1-then-h1 rule, it can be logically contradicted by (H2, h2) if:
(H1, h1) rebuts (H2, h2) iff h1 ≡ ¬h2
(H1, h1) undercuts (H2, h2) iff ∃h ∈ H2 such that h ≡ ¬h1
Definition: Argument b attacks argument a iff b logically contradicts a and a is not preferred over b
Example (arguments A-D and their preferences as in the previous slide)
C and D rebut each other, and
C is not preferred over D (0.5 < 0.56), so
D attacks C
Inference = Acceptability
Inference is modeled in argumentation frameworks by acceptability
An argument r is:
inferred iff it is accepted
not inferred iff rejected, or
in abeyance if neither
Dung-Amgoud's idea: accept argument r if
r is not attacked, or
r is attacked, but its attackers are also attacked
Example (arguments A-D as before)
We had that D attacks C (and no other attack).
Since nothing attacks D, D is accepted.
C is attacked by an accepted argument, so C is rejected.
Argumentation resolved the inconsistency in favor of the correct proposition D!
In practice, we have thousands of arguments. How do we compute the acceptability status of all of them?
Computing Acceptability Bottom-up
accept if not attacked, or if all attackers attacked.
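The fixed-point rule above can be sketched as an iterative labeling; arguments left undecided at the fixed point are in abeyance (all names and the example attack graph are ours):

```python
def label_arguments(args, attacks):
    """attacks: set of (attacker, target) pairs. Iteratively accept
    arguments whose attackers are all rejected, and reject arguments
    with an accepted attacker, until nothing changes."""
    attackers = {a: {b for (b, t) in attacks if t == a} for a in args}
    status = {a: None for a in args}            # None = undecided
    changed = True
    while changed:
        changed = False
        for a in args:
            if status[a] is not None:
                continue
            if all(status[b] == "rejected" for b in attackers[a]):
                status[a], changed = "accepted", True   # incl. unattacked
            elif any(status[b] == "accepted" for b in attackers[a]):
                status[a], changed = "rejected", True
    return status   # still-None arguments are in abeyance

# Chain: D attacks C, C attacks B; mutual attack between E and F
print(label_arguments(
    ["B", "C", "D", "E", "F"],
    {("D", "C"), ("C", "B"), ("E", "F"), ("F", "E")},
))
```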
Top-down algorithm
The bottom-up algorithm is highly inefficient:
it computes the acceptability of all possible arguments
Top-down is an alternative:
given an argument r, it responds whether r is accepted or rejected:
accept if all attackers are rejected, and
reject if at least one attacker is accepted
We illustrate this with an example
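The top-down query can be sketched as a recursion that touches only the attackers of the target argument; the naive cycle guard and the missing depth limit are our simplifications, and the example attack tree loosely mirrors the figure on the slides (target 7 attacked by 3, 6 and 11, and so on):

```python
def accepted(arg, attackers, seen=()):
    """Accept iff every attacker is rejected (i.e. not accepted)."""
    for b in attackers.get(arg, ()):
        if b in seen:                  # naive cycle guard
            continue
        if accepted(b, attackers, seen + (arg,)):
            return False               # one accepted attacker -> reject
    return True

# Attack graph loosely mirroring the slides' figure (who attacks whom);
# arguments 8, 9 and 10 exist but are never reached from the target 7.
atk = {7: {3, 6, 11}, 3: {4}, 6: {5}, 11: {12}, 4: {2}, 5: {1}, 12: {13}}
print(accepted(2, atk))   # True: leaf, unattacked
print(accepted(4, atk))   # False: its attacker 2 is accepted
print(accepted(7, atk))   # False: attacker 3 is accepted (4 is rejected)
```

Querying 7 never evaluates arguments outside its attack subtree, which is the saving the slides point out.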
Computing Acceptability Top-down
accept if all attackers rejected, reject if at least one accepted.
[Figure, shown over several animation steps: an attack graph over arguments 1 to 13 with target node 7. Node 7's attackers are 3, 6 and 11; the next level of attackers is 4, 5 and 12; their attackers 2, 1 and 13 are leaves.]
We didn't evaluate arguments 8, 9 and 10!
Approximate top-down algorithm
It is a tree traversal; we chose iterative deepening
Time complexity: O(b^d)
Difficulties:
1. Exponential in depth d
2. By the nature of Pearl's rules, the number of attackers of some nodes (the branching factor b) may be exponential
Approximation:
To solve (1), we limit d to 3
To solve (2), we consider an alternative propositionalization of Pearl's rules that bounds b to polynomial size (details omitted here)
[Figure: example search tree with b = 3, d = 3]
Experiments
We considered 3 variations of AIT, one per set of Pearl axioms: general, directed, and undirected
Experiments on data sampled from Markov networks and from Bayesian networks (directed graphical models)
All use the approximate top-down algorithm
Approximate top-down algorithm: accuracy on data
[Figure: four accuracy panels: (general axioms, true model BN), (directed axioms, true model BN), (general axioms, true model MN), (undirected axioms, true model MN)]
Top-down runtime: approximate vs. exact
[Figure: runtime comparison within the PC algorithm and within the GSMN algorithm]
We show results only for the specific axioms
Top-down accuracy: approximate vs. exact
Experiments show the accuracies of both match in all but a few cases (only specific axioms shown)
Conclusions
Summary
I presented two uses of Pearl's independence axioms/theorems:
1. The GSIMN algorithm
Uses the axioms to infer independence test results from known ones when learning the domain Markov network: faster execution
2. The AIT general-purpose independence test
Uses multiple tests on the data and the axioms as integrity constraints to return the most reliable value: more reliable tests on small data sets
Further Research
Explore other methods of resolving inconsistencies in the KB of known independences
Use such constraints to improve Bayesian network and Markov network structure learning from small data sets (instead of just improving individual tests)
Develop faster methods of inferring independences using Pearl's axioms (Prolog tricks?)
Thank you!
Questions?