
Probabilistic Graphical Models

Date post: 22-Jan-2016
Probabilistic Graphical Models David Madigan Rutgers University [email protected]
Page 1: Probabilistic Graphical Models

Probabilistic Graphical Models

David Madigan, Rutgers University

[email protected]

Page 2: Probabilistic Graphical Models

Expert Systems

•Explosion of interest in “Expert Systems” in the early 1980s

IF the infection is primary-bacteremia
AND the site of the culture is one of the sterile sites
AND the suspected portal of entry is the gastrointestinal tract
THEN there is suggestive evidence (0.7) that infection is bacteroid.

•Many companies (Teknowledge, IntelliCorp, Inference, etc.), many IPOs, much media hype

•Ad-hoc uncertainty handling

Page 3: Probabilistic Graphical Models

Uncertainty in Expert Systems

If A then C (p1)
If B then C (p2)

What if both A and B are true?

Then C is true with certainty factor (CF):

$p_1 + p_2 (1 - p_1)$

“Currently fashionable ad-hoc mumbo jumbo” (A.F.M. Smith)
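The combination rule on this slide can be sketched in a few lines. The function name `combine_cf` and the example values are illustrative assumptions, not part of the original slides, and the sketch covers only the case shown here (two rules that both support the same conclusion).

```python
def combine_cf(p1: float, p2: float) -> float:
    """Combine two certainty factors supporting the same conclusion C.

    Implements the slide's rule p1 + p2 * (1 - p1). The name and the
    range check are assumptions made for this illustration.
    """
    if not (0.0 <= p1 <= 1.0 and 0.0 <= p2 <= 1.0):
        raise ValueError("certainty factors are expected to lie in [0, 1]")
    return p1 + p2 * (1.0 - p1)


# Hypothetical example: rule A fires with CF 0.7, rule B with CF 0.5.
cf_both = combine_cf(0.7, 0.5)
```

The rule is symmetric in p1 and p2 and never exceeds 1, but nothing in it refers to a joint distribution over A, B, and C, which is the sense in which the slide calls the approach ad hoc.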

Page 4: Probabilistic Graphical Models

Eschewed Probabilistic Approach

•Computationally intractable

•Inscrutable

•Requires vast amounts of data/elicitation, e.g., for n dichotomous variables need $2^n - 1$ probabilities to fully specify the joint distribution

Page 5: Probabilistic Graphical Models

Conditional Independence

$X \perp\!\!\!\perp Y \mid Z \iff f_{X,Y|Z}(x, y \mid z) = f_{X|Z}(x \mid z)\, f_{Y|Z}(y \mid z)$

Page 6: Probabilistic Graphical Models

Conditional Independence

•Suppose A and B are marginally independent: Pr(A), Pr(B), Pr(C|A,B) × 4 = 6 probabilities

•Suppose A and C are conditionally independent given B: Pr(A), Pr(B|A) × 2, Pr(C|B) × 2 = 5 probabilities

•A chain with 50 binary variables requires 99 probabilities versus $2^{50} - 1$ for the unrestricted joint distribution

A → B → C

$C \perp\!\!\!\perp A \mid B$
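The counts on this slide can be checked with a small sketch. It assumes every variable is binary, so a vertex with k parents needs 2^k conditional probabilities; the helper names and the dictionary encoding of the graphs are illustrative assumptions.

```python
def full_joint_params(n: int) -> int:
    """Free probabilities in an unrestricted joint over n binary variables."""
    return 2 ** n - 1


def dag_params(parents: dict) -> int:
    """Free probabilities when each binary vertex is specified given its parents.

    `parents` maps each vertex to a list of its parents (assumed encoding).
    """
    return sum(2 ** len(pa) for pa in parents.values())


# A and B marginally independent, both pointing into C.
collider = {"A": [], "B": [], "C": ["A", "B"]}
# A and C conditionally independent given B.
chain3 = {"A": [], "B": ["A"], "C": ["B"]}
# Chain of 50 binary variables: vertex i has parent i - 1.
chain50 = {i: ([] if i == 0 else [i - 1]) for i in range(50)}

counts = (dag_params(collider), dag_params(chain3), dag_params(chain50))
```

Under these assumptions `counts` comes out as (6, 5, 99), against `full_joint_params(3) == 7` and `full_joint_params(50) == 2**50 - 1` for the unrestricted joints.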

Page 7: Probabilistic Graphical Models

Properties of Conditional Independence (Dawid, 1980)

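For any probability measure P and random variables A, B, and C:

CI 1: $A \perp\!\!\!\perp B\ [P] \Rightarrow B \perp\!\!\!\perp A\ [P]$

CI 2: $A \perp\!\!\!\perp B \cup C\ [P] \Rightarrow A \perp\!\!\!\perp B\ [P]$

CI 3: $A \perp\!\!\!\perp B \cup C\ [P] \Rightarrow A \perp\!\!\!\perp B \mid C\ [P]$

CI 4: $A \perp\!\!\!\perp B$ and $A \perp\!\!\!\perp C \mid B\ [P] \Rightarrow A \perp\!\!\!\perp B \cup C\ [P]$

Some probability measures also satisfy:

CI 5: $A \perp\!\!\!\perp B \mid C$ and $A \perp\!\!\!\perp C \mid B\ [P] \Rightarrow A \perp\!\!\!\perp B \cup C\ [P]$

CI 5 is satisfied whenever P has a positive joint probability density with respect to some product measure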

Page 8: Probabilistic Graphical Models

Markov Properties for Undirected Graphs

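[Figure: undirected graph on vertices A, B, C, D, E]

(Global) S separates A from B $\Rightarrow A \perp\!\!\!\perp B \mid S$

(Local) $\alpha \perp\!\!\!\perp V \setminus \mathrm{cl}(\alpha) \mid \mathrm{bd}(\alpha)$

(Pairwise) $\alpha \perp\!\!\!\perp \beta \mid V \setminus \{\alpha, \beta\}$ for non-adjacent $\alpha$, $\beta$

(G) $\Rightarrow$ (L) $\Rightarrow$ (P)

$B \perp\!\!\!\perp E, D \mid A, C$ (1)

$B \perp\!\!\!\perp D \mid A, C, E$ (2)

To go from (2) to (1) need $E \perp\!\!\!\perp B \mid A, C$? or CI5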

Lauritzen, Dawid, Larsen & Leimer (1990)
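Graph separation, the premise of the global Markov property, can be tested with a breadth-first search that is not allowed to enter S. The sketch below is an illustration only: the slide's figure did not survive extraction, so the five-cycle A-B-C-D-E-A used here is an assumed graph, chosen because it gives bd(B) = {A, C} as statement (1) requires.

```python
from collections import deque


def separates(adj: dict, A: set, B: set, S: set) -> bool:
    """Return True if every path from A to B in the undirected graph meets S."""
    queue = deque(v for v in A if v not in S)
    seen = set(queue)
    while queue:
        v = queue.popleft()
        if v in B:
            return False  # reached B without passing through S
        for w in adj[v]:
            if w not in seen and w not in S:
                seen.add(w)
                queue.append(w)
    return True


# Assumed graph (hypothetical): the 5-cycle A - B - C - D - E - A.
cycle = {
    "A": {"B", "E"},
    "B": {"A", "C"},
    "C": {"B", "D"},
    "D": {"C", "E"},
    "E": {"D", "A"},
}

local_b = separates(cycle, {"B"}, {"D", "E"}, {"A", "C"})      # statement (1)
pairwise_bd = separates(cycle, {"B"}, {"D"}, {"A", "C", "E"})  # statement (2)
```

In this assumed graph both separations hold, while conditioning on A alone leaves the path B - C - D open.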

Page 9: Probabilistic Graphical Models

Factorizations

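A density f is said to “factorize according to G” if:

$f(x) = \prod_{C \in \mathcal{C}} \psi_C(x_C)$

•The $\psi_C$ are “clique potentials”
•Cliques are maximally complete subgraphs

Proposition: If f factorizes according to a UG G, then it also obeys the global Markov property.

“Proof”: Let S separate A from B in G and assume $V = A \cup B \cup S$. Let $\mathcal{C}_A$ be the set of cliques with non-empty intersection with A. Since S separates A from B, we must have $C \cap B = \emptyset$ for all C in $\mathcal{C}_A$. Then:

$f(x) = \prod_{C \in \mathcal{C}_A} \psi_C(x_C) \prod_{C \in \mathcal{C} \setminus \mathcal{C}_A} \psi_C(x_C) = f_1(x_{A \cup S})\, f_2(x_{B \cup S})$

so that $A \perp\!\!\!\perp B \mid S$.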

Page 10: Probabilistic Graphical Models

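Markov Properties for Acyclic Directed Graphs (Bayesian Networks)

[Figure: two graphs on vertex sets A, B, and S]

(Global) S separates A from B in $(G_{\mathrm{An}(A \cup B \cup S)})^m \Rightarrow A \perp\!\!\!\perp B \mid S$

(Local) $\alpha \perp\!\!\!\perp \mathrm{nd}(\alpha) \setminus \mathrm{pa}(\alpha) \mid \mathrm{pa}(\alpha)$

(G) $\iff$ (L)

Lauritzen, Dawid, Larsen & Leimer (1990)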

Page 11: Probabilistic Graphical Models

Factorizations

ADG Global Markov Property $\iff f(x) = \prod_{v \in V} f(x_v \mid x_{\mathrm{pa}(v)})$

A density f admits a “recursive factorization” according to an ADG G if $f(x) = \prod_{v \in V} f(x_v \mid x_{\mathrm{pa}(v)})$

Lemma: If P admits a recursive factorization according to an ADG G, then P factorizes according to the moral graph $G^m$ (and chordal supergraphs of $G^m$)

Lemma: If P admits a recursive factorization according to an ADG G, and A is an ancestral set in G, then $P_A$ admits a recursive factorization according to the subgraph $G_A$

Page 12: Probabilistic Graphical Models

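Markov Properties for Acyclic Directed Graphs (Bayesian Networks)

(Global) S separates A from B in $(G_{\mathrm{An}(A \cup B \cup S)})^m \Rightarrow A \perp\!\!\!\perp B \mid S$

(Local) $\alpha \perp\!\!\!\perp \mathrm{nd}(\alpha) \setminus \mathrm{pa}(\alpha) \mid \mathrm{pa}(\alpha)$

(G) $\Rightarrow$ (L): $\alpha \cup \mathrm{nd}(\alpha)$ is an ancestral set; $\mathrm{pa}(\alpha)$ obviously separates $\alpha$ from $\mathrm{nd}(\alpha) \setminus \mathrm{pa}(\alpha)$ in $(G_{\mathrm{An}(\alpha \cup \mathrm{nd}(\alpha))})^m$

(L) $\Rightarrow$ (factorization): induction on the number of vertices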

Page 13: Probabilistic Graphical Models

d-separation

A chain $\pi$ from a to b in an acyclic directed graph G is said to be blocked by S if it contains a vertex $\gamma \in \pi$ such that either:

- $\gamma \in S$ and arrows of $\pi$ do not meet head to head at $\gamma$, or

- $\gamma \notin S$, nor has $\gamma$ any descendants in S, and arrows of $\pi$ do meet head to head at $\gamma$

Two subsets A and B are d-separated by S if all chains from A to B are blocked by S

Page 14: Probabilistic Graphical Models
Page 15: Probabilistic Graphical Models

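d-separation and global Markov property

Let A, B, and S be disjoint subsets of the vertices of a directed, acyclic graph G. Then S d-separates A from B if and only if S separates A from B in $(G_{\mathrm{An}(A \cup B \cup S)})^m$, the moral graph of the smallest ancestral set containing A, B, and S.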
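The equivalence on this slide suggests a simple way to test d-separation: restrict the graph to the smallest ancestral set containing A, B, and S, moralize it, and check ordinary separation. The following is a minimal sketch under that reading; the function names and the `parents` dictionary encoding are assumptions, and the two three-vertex graphs are illustrative rather than taken from the slides.

```python
from collections import deque


def ancestral_set(parents: dict, nodes: set) -> set:
    """Return `nodes` together with all of their ancestors."""
    seen, stack = set(nodes), list(nodes)
    while stack:
        v = stack.pop()
        for p in parents.get(v, ()):
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen


def moral_graph(parents: dict, keep: set) -> dict:
    """Undirected moral graph of the subgraph induced by the ancestral set `keep`."""
    adj = {v: set() for v in keep}
    for v in keep:
        pa = [p for p in parents.get(v, ())]
        for p in pa:                      # drop directions
            adj[v].add(p)
            adj[p].add(v)
        for i, p in enumerate(pa):        # marry parents of a common child
            for q in pa[i + 1:]:
                adj[p].add(q)
                adj[q].add(p)
    return adj


def d_separated(parents: dict, A: set, B: set, S: set) -> bool:
    """True if S separates A from B in the moralized ancestral graph."""
    adj = moral_graph(parents, ancestral_set(parents, A | B | S))
    queue = deque(v for v in A if v not in S)
    seen = set(queue)
    while queue:
        v = queue.popleft()
        if v in B:
            return False
        for w in adj[v]:
            if w not in seen and w not in S:
                seen.add(w)
                queue.append(w)
    return True


chain = {"B": ["A"], "C": ["B"]}    # A -> B -> C
collider = {"B": ["A", "C"]}        # A -> B <- C
```

For the chain, `d_separated(chain, {"A"}, {"C"}, {"B"})` holds; for the collider, A and C are d-separated by the empty set but not by {B}, because moralization marries A and C once B is in the ancestral set.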

Page 16: Probabilistic Graphical Models

UG – ADG Intersection

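[Figure: example graphs on vertices A, B, C, and D, annotated with the independences they encode: $C \perp\!\!\!\perp A \mid B$; $A \perp\!\!\!\perp C \mid B$; $A \perp\!\!\!\perp C$; $A \perp\!\!\!\perp B \mid C, D$ and $C \perp\!\!\!\perp D \mid A, B$]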

Page 17: Probabilistic Graphical Models

UG – ADG Intersection

[Figure: overlapping UG and ADG model classes, with the intersection labelled Decomposable]

•UG is decomposable if chordal

•ADG is decomposable if moral

•Decomposable ~ closed-form log-linear models

No CI5

Page 18: Probabilistic Graphical Models

Chordal Graphs and RIP

•Chordal graphs, and only chordal graphs, admit clique orderings that have the Running Intersection Property

[Figure: graph on vertices V, T, L, A, X, D, B, S]

1. {V,T}
2. {A,L,T}
3. {L,A,B}
4. {S,L,B}
5. {A,B,D}
6. {A,X}

•The intersection of each set with those earlier in the list is fully contained in a single earlier set

•Can compute conditional probabilities (e.g. Pr(X|V)) by message passing (Lauritzen & Spiegelhalter, Dawid, Jensen)
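The running intersection property can be checked directly for the clique ordering listed on this slide. The sketch reads the property as: each clique's overlap with the union of all earlier cliques must lie inside one earlier clique. The function name is an assumption, and the shuffled ordering is a hypothetical counterexample added for contrast.

```python
def has_running_intersection(cliques: list) -> bool:
    """Check the running intersection property for an ordered list of sets."""
    seen = set()
    for i, clique in enumerate(cliques):
        if i > 0:
            overlap = clique & seen
            # The overlap must be contained in at least one earlier clique.
            if not any(overlap <= earlier for earlier in cliques[:i]):
                return False
        seen |= clique
    return True


# Ordering from the slide.
slide_order = [
    {"V", "T"},
    {"A", "L", "T"},
    {"L", "A", "B"},
    {"S", "L", "B"},
    {"A", "B", "D"},
    {"A", "X"},
]

# Hypothetical reordering: {A,L,T} now overlaps {V,T} and {S,L,B} in {L,T},
# which is not contained in either of them.
shuffled = [slide_order[0], slide_order[3], slide_order[1],
            slide_order[2], slide_order[4], slide_order[5]]

ok = has_running_intersection(slide_order)
bad = has_running_intersection(shuffled)
```

The slide's ordering passes and the shuffled one fails, which illustrates that the property belongs to the ordering and not just to the set of cliques.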

Page 19: Probabilistic Graphical Models

Probabilistic Expert System

•Computationally intractable

•Inscrutable

•Requires vast amounts of data/elicitation

•Chordal UG models facilitate fast inference

•ADG models are better for expert system applications: it is more natural to specify Pr( v | pa(v) )

Page 20: Probabilistic Graphical Models

Factorizations

UG Global Markov Property $\Leftarrow f(x) = \prod_{C \in \mathcal{C}} \psi_C(x_C)$

ADG Global Markov Property $\iff f(x) = \prod_{v \in V} f(x_v \mid x_{\mathrm{pa}(v)})$

Page 21: Probabilistic Graphical Models

Lauritzen-Spiegelhalter Algorithm

[Figure: ADG on vertices A, B, C, D, E, F, G, H, S, shown before and after the two steps below]

•Moralize
•Triangulate

Clique potentials:

(C,S,D): Pr(S|C,D)
(A,E): Pr(E|A) Pr(A)
(C,E): Pr(C|E)
(F,D,B): Pr(D|F) Pr(B|F) Pr(F)
(D,B,S): 1
(B,S,G): Pr(G|S,B)
(H,S): Pr(H|S)

Algorithm is widely deployed in commercial software
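The moralization step can be sketched as follows. The parent sets are read off the conditional probabilities in the clique potentials on this slide (for example, Pr(S|C,D) gives S the parents C and D); since the figure itself did not survive, treat that reading as an assumption. The function name `moralize` is also illustrative.

```python
from itertools import combinations

# Parent sets inferred from the slide's conditional probabilities (assumption).
parents = {
    "A": [], "F": [],
    "E": ["A"], "C": ["E"],
    "D": ["F"], "B": ["F"],
    "S": ["C", "D"],
    "G": ["S", "B"],
    "H": ["S"],
}


def moralize(parents: dict) -> set:
    """Return the undirected edge set of the moral graph."""
    edges = set()
    for child, pa in parents.items():
        for p in pa:
            edges.add(frozenset((p, child)))   # drop edge directions
        for p, q in combinations(pa, 2):
            edges.add(frozenset((p, q)))       # marry co-parents
    return edges


moral = moralize(parents)
```

Under this reading, moralization adds the edges C-D and S-B. The resulting graph still contains the chordless four-cycle D-S-B-F, so triangulation would need a fill-in edge such as D-B, which is consistent with the slide listing (D,B,S) as a clique with potential 1.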

Page 22: Probabilistic Graphical Models

L&S Toy Example

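A → B → C

Pr(A) = 0.7
Pr(B|A) = 0.5, Pr(B|¬A) = 0.1
Pr(C|B) = 0.2, Pr(C|¬B) = 0.6

Clique potentials: (A,B): Pr(B|A) Pr(A); (B,C): Pr(C|B)

Junction tree: AB - [B] - BC

Initial potentials:

AB      B      ¬B
A       0.35   0.35
¬A      0.03   0.27

BC      C      ¬C
B       0.2    0.8
¬B      0.6    0.4

Separator B:  B = 1, ¬B = 1

Message Schedule: AB → BC, BC → AB

After AB → BC:

Separator B:  B = 0.38, ¬B = 0.62

BC      C      ¬C
B       0.076  0.304
¬B      0.372  0.248

With evidence C entered (¬C entries set to 0):

BC      C      ¬C
B       0.076  0
¬B      0.372  0

Query: Pr(A|C)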
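The toy example can be replayed numerically. The sketch below follows the slide's message schedule (AB → BC, then BC → AB after entering evidence C) using plain dictionaries; the variable names and the table encoding are assumptions, and the update rule used is the usual one of multiplying a clique potential by the ratio of new to old separator marginals.

```python
vals = (True, False)

p_a = {True: 0.7, False: 0.3}
p_b_given_a = {True: 0.5, False: 0.1}   # Pr(B = true | A = a)
p_c_given_b = {True: 0.2, False: 0.6}   # Pr(C = true | B = b)

# Initial clique potentials and separator.
phi_ab = {(a, b): p_a[a] * (p_b_given_a[a] if b else 1 - p_b_given_a[a])
          for a in vals for b in vals}
phi_bc = {(b, c): (p_c_given_b[b] if c else 1 - p_c_given_b[b])
          for b in vals for c in vals}
sep_b = {b: 1.0 for b in vals}

# Message AB -> BC: marginalize A out of AB, rescale BC.
new_sep = {b: sum(phi_ab[(a, b)] for a in vals) for b in vals}
phi_bc = {(b, c): phi_bc[(b, c)] * new_sep[b] / sep_b[b]
          for b in vals for c in vals}
sep_b = new_sep
bc_after_first_message = dict(phi_bc)

# Enter evidence C = true by zeroing the not-C entries.
phi_bc = {(b, c): (phi_bc[(b, c)] if c else 0.0) for b in vals for c in vals}

# Message BC -> AB: marginalize C out of BC, rescale AB.
new_sep = {b: sum(phi_bc[(b, c)] for c in vals) for b in vals}
phi_ab = {(a, b): phi_ab[(a, b)] * new_sep[b] / sep_b[b]
          for a in vals for b in vals}
sep_b = new_sep

# AB now holds Pr(A, B, C = true); normalize to answer the query.
p_c = sum(phi_ab.values())
p_a_given_c = sum(phi_ab[(True, b)] for b in vals) / p_c
```

The intermediate BC table matches the slide (0.076, 0.304, 0.372, 0.248), and with these inputs the query evaluates to Pr(C) of about 0.448 and Pr(A|C) of about 0.625.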

Page 23: Probabilistic Graphical Models

Other Theoretical Developments

Do the UG and ADG global Markov properties identify all the conditional independences implied by the corresponding factorizations?

Yes. Completeness for ADGs by Geiger and Pearl (1988); for UGs by Frydenberg (1988)

Graphical characterization of collapsibility in hierarchical log-linear models (Asmussen and Edwards, 1983)

Page 24: Probabilistic Graphical Models

Bayesian Learning for ADGs

• Example: three binary variables

• Five parameters:

Page 25: Probabilistic Graphical Models

Local and Global Independence

Page 26: Probabilistic Graphical Models

Bayesian learning

Consider a particular state $\mathrm{pa}(v)^+$ of pa(v)

