Bayesian networks Chapter 14
Transcript
Page 1: Bayesian networks Chapter 14. Outline Syntax Semantics.

Bayesian networks

Chapter 14

Page 2: Bayesian networks Chapter 14. Outline Syntax Semantics.

Outline

• Syntax

• Semantics

Page 3: Bayesian networks Chapter 14. Outline Syntax Semantics.

Bayesian networks

• A simple, graphical notation for conditional independence assertions and hence for compact specification of full joint distributions

• Syntax:
– a set of nodes, one per variable
– a directed, acyclic graph (link ≈ "directly influences")
– a conditional distribution for each node given its parents:

P (Xi | Parents (Xi))

• In the simplest case, conditional distribution represented as a conditional probability table (CPT) giving the distribution over Xi for each combination of parent values
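As a concrete sketch (the node and numbers below are invented purely for illustration, not from the slides), a CPT for a Boolean node with two Boolean parents can be stored as a table of P(X = true), one entry per combination of parent values:

    # Hypothetical CPT: P(X = true | U1, U2).  P(X = false) is 1 - p, so only
    # one number per row is stored.
    cpt_x = {
        (True,  True):  0.9,
        (True,  False): 0.6,
        (False, True):  0.3,
        (False, False): 0.05,
    }

    def p_x(value, u1, u2):
        """Look up P(X = value | U1 = u1, U2 = u2) in the table."""
        p_true = cpt_x[(u1, u2)]
        return p_true if value else 1.0 - p_true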

Page 4: Bayesian networks Chapter 14. Outline Syntax Semantics.

Example

• Topology of network encodes conditional independence assertions:

• Weather is independent of the other variables
• Toothache and Catch are conditionally independent given Cavity

Page 5: Bayesian networks Chapter 14. Outline Syntax Semantics.

Example

• I'm at work, neighbor John calls to say my alarm is ringing, but neighbor Mary doesn't call. Sometimes it's set off by minor earthquakes. Is there a burglar?

• Variables: Burglary, Earthquake, Alarm, JohnCalls, MaryCalls

• Network topology reflects "causal" knowledge:
– A burglar can set the alarm off
– An earthquake can set the alarm off
– The alarm can cause Mary to call
– The alarm can cause John to call

Page 6: Bayesian networks Chapter 14. Outline Syntax Semantics.

Example contd.
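The figure on this slide (the burglary network with its CPTs) is an image that does not survive the transcript. Below is a minimal Python sketch of the same network, assuming the CPT values of the standard textbook figure; later sketches in this transcript reuse these definitions.

    # Burglary network: parent lists (in topological order) and one CPT per node.
    # Each CPT maps a tuple of parent values to P(node = true); the numbers are
    # the standard textbook values, assumed to match the slide's missing figure.
    parents = {
        "Burglary":   [],
        "Earthquake": [],
        "Alarm":      ["Burglary", "Earthquake"],
        "JohnCalls":  ["Alarm"],
        "MaryCalls":  ["Alarm"],
    }

    cpt = {
        "Burglary":   {(): 0.001},
        "Earthquake": {(): 0.002},
        "Alarm":      {(True, True): 0.95, (True, False): 0.94,
                       (False, True): 0.29, (False, False): 0.001},
        "JohnCalls":  {(True,): 0.90, (False,): 0.05},
        "MaryCalls":  {(True,): 0.70, (False,): 0.01},
    }

    def p(node, value, assignment):
        """P(node = value | its parents), read from the node's CPT."""
        key = tuple(assignment[parent] for parent in parents[node])
        p_true = cpt[node][key]
        return p_true if value else 1.0 - p_true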

Page 7: Bayesian networks Chapter 14. Outline Syntax Semantics.

Compactness

• A CPT for Boolean Xi with k Boolean parents has 2^k rows for the combinations of parent values

• Each row requires one number p for Xi = true (the number for Xi = false is just 1 - p)

• If each variable has no more than k parents, the complete network requires O(n · 2^k) numbers

• I.e., grows linearly with n, vs. O(2^n) for the full joint distribution

• For the burglary net, 1 + 1 + 4 + 2 + 2 = 10 numbers (vs. 2^5 - 1 = 31)
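A quick check of this count, reusing the parents dict from the network sketch two slides back (each Boolean node with k parents needs 2^k numbers):

    # Each Boolean node with k parents contributes 2^k independent numbers.
    # `parents` is the dict from the earlier burglary-net sketch.
    cpt_numbers = sum(2 ** len(ps) for ps in parents.values())  # 1 + 1 + 4 + 2 + 2 = 10
    full_joint_numbers = 2 ** len(parents) - 1                  # 2^5 - 1 = 31
    print(cpt_numbers, full_joint_numbers)                      # 10 31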

Page 8: Bayesian networks Chapter 14. Outline Syntax Semantics.

Semantics

The full joint distribution is defined as the product of the local conditional distributions:

P(X1, …, Xn) = ∏i=1..n P(Xi | Parents(Xi))

e.g., P(j ∧ m ∧ a ∧ b ∧ e)

= P(j | a) P(m | a) P(a | b, e) P(b) P(e)
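A minimal sketch of this product, reusing p() and the CPT values assumed in the earlier burglary-net sketch; the printed number is only as good as those assumed values:

    def joint(assignment):
        """P(x1, ..., xn) as the product over all nodes of P(xi | parents(xi))."""
        result = 1.0
        for node, value in assignment.items():
            result *= p(node, value, assignment)
        return result

    # The slide's example with every variable true:
    # P(j | a) P(m | a) P(a | b, e) P(b) P(e) = 0.90 * 0.70 * 0.95 * 0.001 * 0.002
    example = {"Burglary": True, "Earthquake": True, "Alarm": True,
               "JohnCalls": True, "MaryCalls": True}
    print(joint(example))   # ~1.2e-06 with the assumed CPT values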

Page 9: Bayesian networks Chapter 14. Outline Syntax Semantics.

Constructing Bayesian networks

• 1. Choose an ordering of variables X1, … ,Xn

• 2. For i = 1 to n
– add Xi to the network
– select parents from X1, …, Xi-1 such that

P(Xi | Parents(Xi)) = P(Xi | X1, …, Xi-1)

This choice of parents guarantees:

P(X1, …, Xn) = ∏i=1..n P(Xi | X1, …, Xi-1)   (chain rule)

             = ∏i=1..n P(Xi | Parents(Xi))   (by construction)
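A rough sketch of this loop, assuming a hypothetical conditional-independence oracle is_independent(x, y, given); greedily dropping predecessors that Xi is independent of is one simple way (not the only one) to end up with P(Xi | Parents(Xi)) = P(Xi | X1, ..., Xi-1):

    def construct_network(ordering, is_independent):
        """Add variables in the given order and keep only the predecessors that
        Xi still depends on.  `is_independent` is a hypothetical oracle that
        answers questions like the ones on the next few slides."""
        parent_sets = {}
        for i, x in enumerate(ordering):
            chosen = list(ordering[:i])
            for candidate in ordering[:i]:
                rest = [v for v in chosen if v != candidate]
                if is_independent(x, candidate, rest):
                    chosen = rest
            parent_sets[x] = chosen
        return parent_sets

    # e.g. construct_network(["M", "J", "A", "B", "E"], is_independent)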

Page 10: Bayesian networks Chapter 14. Outline Syntax Semantics.

• Suppose we choose the ordering M, J, A, B, E

P(J | M) = P(J)?

Example

Page 11: Bayesian networks Chapter 14. Outline Syntax Semantics.

• Suppose we choose the ordering M, J, A, B, E

P(J | M) = P(J)? No

P(A | J, M) = P(A | J)? P(A | J, M) = P(A)?

Example

Page 12: Bayesian networks Chapter 14. Outline Syntax Semantics.

• Suppose we choose the ordering M, J, A, B, E

P(J | M) = P(J)? No

P(A | J, M) = P(A | J)? P(A | J, M) = P(A)? No

P(B | A, J, M) = P(B | A)?

P(B | A, J, M) = P(B)?

Example

Page 13: Bayesian networks Chapter 14. Outline Syntax Semantics.

• Suppose we choose the ordering M, J, A, B, E

P(J | M) = P(J)? No

P(A | J, M) = P(A | J)? P(A | J, M) = P(A)? No

P(B | A, J, M) = P(B | A)? Yes

P(B | A, J, M) = P(B)? No

P(E | B, A ,J, M) = P(E | A)?

P(E | B, A, J, M) = P(E | A, B)?

Example

Page 14: Bayesian networks Chapter 14. Outline Syntax Semantics.

• Suppose we choose the ordering M, J, A, B, E

P(J | M) = P(J)? No

P(A | J, M) = P(A | J)? P(A | J, M) = P(A)? No

P(B | A, J, M) = P(B | A)? Yes

P(B | A, J, M) = P(B)? No

P(E | B, A ,J, M) = P(E | A)? No

P(E | B, A, J, M) = P(E | A, B)? Yes

Example

Page 15: Bayesian networks Chapter 14. Outline Syntax Semantics.

Example contd.

• Deciding conditional independence is hard in noncausal directions

• (Causal models and conditional independence seem hardwired for humans!)

• Network is less compact: 1 + 2 + 4 + 2 + 4 = 13 numbers needed

Page 16: Bayesian networks Chapter 14. Outline Syntax Semantics.

Inference

• Complexity of inference in a Bayes net:
– a Bayes net reduces space complexity
– a Bayes net does not reduce time complexity in the general case

Page 17: Bayesian networks Chapter 14. Outline Syntax Semantics.

Conditional independence and D-separation

• Two sets of nodes, X and Y, are conditionally independent given an evidence set of nodes, E, if E d-separates X and Y.

• A set of nodes, E, d-separates two sets of nodes, X and Y, if every undirected path from a node in X to a node in Y is blocked by E.

• A path is blocked given E if there is a node Z on the path for which one of the following holds:
– Z is in E and Z has one incoming and one outgoing arrow on the path (a chain through Z)
– Z is in E and both arrows on the path point out of Z (a common cause)
– both arrows on the path point into Z (a collider), and neither Z nor any descendant of Z is in E
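A sketch of this path-blocking test; the path, edge set, and descendant sets are passed in explicitly, and all of the names below are hypothetical helpers rather than slide notation:

    def path_blocked(path, edges, evidence, descendants):
        """Return True if the undirected `path` (a list of node names) is blocked
        by the evidence set.  `edges` is the set of directed (parent, child) pairs
        of the DAG and `descendants[z]` is the set of descendants of z."""
        for i in range(1, len(path) - 1):
            prev, z, nxt = path[i - 1], path[i], path[i + 1]
            collider = (prev, z) in edges and (nxt, z) in edges   # both arrows into z
            if collider:
                # A collider blocks unless z or one of its descendants is observed.
                if z not in evidence and not (descendants[z] & evidence):
                    return True
            elif z in evidence:
                # A chain or a common cause blocks when z itself is observed.
                return True
        return False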

Page 18: Bayesian networks Chapter 14. Outline Syntax Semantics.

Conditional independence and D-separation - example

Page 19: Bayesian networks Chapter 14. Outline Syntax Semantics.

Inference

• Inference by enumeration

P(B | j, m) = P(B, j, m) / P(j, m) = α P(B, j, m)

= α Σe Σa P(B, e, a, j, m)

= α P(B) Σe P(e) Σa P(a | B, e) P(j | a) P(m | a)

= α <0.0005922, 0.001492> = <0.284, 0.716>
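A brute-force sketch of this calculation, reusing joint() and parents from the earlier sketches; with the assumed CPT values it reproduces the <0.284, 0.716> result:

    import itertools

    def enumeration_ask(query_var, evidence):
        """P(query | evidence): for each query value, sum joint() over all
        assignments to the hidden variables, then normalize (the alpha above)."""
        hidden = [v for v in parents if v != query_var and v not in evidence]
        dist = {}
        for q in (True, False):
            total = 0.0
            for values in itertools.product([True, False], repeat=len(hidden)):
                assignment = dict(evidence, **{query_var: q}, **dict(zip(hidden, values)))
                total += joint(assignment)
            dist[q] = total
        alpha = 1.0 / sum(dist.values())
        return {q: alpha * v for q, v in dist.items()}

    print(enumeration_ask("Burglary", {"JohnCalls": True, "MaryCalls": True}))
    # ~{True: 0.284, False: 0.716} with the assumed CPT values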

Page 20: Bayesian networks Chapter 14. Outline Syntax Semantics.

Inference for polytrees

Page 21: Bayesian networks Chapter 14. Outline Syntax Semantics.

Time complexity

• Dynamic programming
– Linear space and time complexity?
– Works well for polytrees (where there is at most one path between any two nodes)
– Doesn't work for multiply connected networks (exponential space and time complexity in the worst case)

Page 22: Bayesian networks Chapter 14. Outline Syntax Semantics.

Inference in multiply connected networks

• Clustering

• Cutset conditioning

• Stochastic methods
– Monte Carlo
– likelihood weighting
– Markov Chain Monte Carlo

• Will be skipped in this course

Page 23: Bayesian networks Chapter 14. Outline Syntax Semantics.

Clustering

• Merge variables, replace their CPTs by their combined CPT

Page 24: Bayesian networks Chapter 14. Outline Syntax Semantics.

Cutset conditioning

• Cut cycles to obtain a polytree.

• Sum over all instantiations of cutset variables.

Page 25: Bayesian networks Chapter 14. Outline Syntax Semantics.

Cutset conditioning - example

P(W | S) = ΣR,C P(W | S, R, C) P(R, C | S)

= ΣR,C P(W | S, R) P(R, C | S)

P(R, C | S) = P(R, S | C) P(C) / P(S) = P(R | C) P(S | C) P(C) / P(S)

(Notice here that node C d-separates R and S)

(The polytree computation is carried out once for C = + and once for C = −, and the results are combined using P(C = +) and P(C = −).)

Page 26: Bayesian networks Chapter 14. Outline Syntax Semantics.

Stochastic Inference

• It’s expensive to work with the full joint distribution… whether as a table or as a Bayes Network

• Is approximation good enough?

• Monte Carlo

Page 27: Bayesian networks Chapter 14. Outline Syntax Semantics.

Monte Carlo

• Use samples to approximate solution
– Simulated annealing used Monte Carlo theories to justify why random guesses and sometimes going uphill can lead to optimality

• More samples = better approximation
– How many are needed?
– Where should you take the samples?

Page 28: Bayesian networks Chapter 14. Outline Syntax Semantics.

Prior sampling

• An ability to model the prior probabilities of a set of random variables

Page 29: Bayesian networks Chapter 14. Outline Syntax Semantics.

Approximating true distribution

• With enough samples, the sampled frequencies approach the true distribution

Page 30: Bayesian networks Chapter 14. Outline Syntax Semantics.

Rejection sampling

• Compute P(X | e)
– Use PriorSample (SPS) and create N samples

– Denote by Ne the number of samples consistent with e (namely E = e)

– Denote by Nex the number of samples consistent with E = e AND X = x

– P(X = x | e) can be estimated as Nex / Ne
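A sketch of PriorSample plus rejection, reusing p() and parents from the earlier burglary-net sketch (the transcript does not give the sprinkler network's CPTs, so the burglary net stands in):

    import random

    def prior_sample():
        """One full assignment; each node is sampled given its already-sampled
        parents (the parents dict is in topological order)."""
        sample = {}
        for node in parents:
            sample[node] = random.random() < p(node, True, sample)
        return sample

    def rejection_sample(query_var, evidence, n):
        """Estimate P(query | evidence) as Nex / Ne, discarding every sample
        that disagrees with the evidence."""
        counts = {True: 0, False: 0}
        for _ in range(n):
            s = prior_sample()
            if all(s[var] == val for var, val in evidence.items()):
                counts[s[query_var]] += 1
        n_e = sum(counts.values())
        return {x: c / n_e for x, c in counts.items()} if n_e else None

    # e.g. rejection_sample("MaryCalls", {"JohnCalls": True}, 100000)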

Page 31: Bayesian networks Chapter 14. Outline Syntax Semantics.

Example

– P(Rain | Sprinkler = true)
– Use the Bayes net to generate 100 samples

• Suppose 73 have Sprinkler = false
• Suppose 27 have Sprinkler = true

– 8 have Rain = true
– 19 have Rain = false

– P(Rain | Sprinkler = true) = <8/27, 19/27> ≈ <0.296, 0.704>

Page 32: Bayesian networks Chapter 14. Outline Syntax Semantics.

Problems with rejection sampling

– Standard deviation of the error in probability is proportional to 1/sqrt(n), where n is the number of samples consistent with evidence

– As problems become complex, number of samples consistent with evidence becomes small and it becomes harder to construct accurate estimates

Page 33: Bayesian networks Chapter 14. Outline Syntax Semantics.

Likelihood weighting

• We only want to generate samples that are consistent with the evidence, e
– We'll sample the Bayes net, but we won't let every random variable be sampled freely; some will be forced to produce a specific output

Page 34: Bayesian networks Chapter 14. Outline Syntax Semantics.

Example

• P (Rain | Sprinkler=true, WetGrass=true)

Page 35: Bayesian networks Chapter 14. Outline Syntax Semantics.

Example

• P(Rain | Sprinkler = true, WetGrass = true)
– First, the weight, w, is set to 1.0

Page 36: Bayesian networks Chapter 14. Outline Syntax Semantics.

Example

• Notice that the weight is reduced according to how likely an evidence variable's output is given its parents
– So the final probability is a function of what comes from sampling the free variables while constraining the evidence variables
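A sketch of likelihood weighting over the same assumed network: evidence variables are forced and multiply the weight, everything else is sampled given its parents, and the weighted counts are normalized at the end:

    import random

    def weighted_sample(evidence):
        """Sample nonevidence variables given their parents; force each evidence
        variable and multiply the weight by P(evidence value | its parents)."""
        sample, weight = {}, 1.0
        for node in parents:                          # topological order, as before
            if node in evidence:
                sample[node] = evidence[node]
                weight *= p(node, evidence[node], sample)
            else:
                sample[node] = random.random() < p(node, True, sample)
        return sample, weight

    def likelihood_weighting(query_var, evidence, n):
        """Every sample is used; its contribution is its weight."""
        totals = {True: 0.0, False: 0.0}
        for _ in range(n):
            s, w = weighted_sample(evidence)
            totals[s[query_var]] += w
        alpha = 1.0 / sum(totals.values())
        return {x: alpha * t for x, t in totals.items()}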

Page 37: Bayesian networks Chapter 14. Outline Syntax Semantics.

Comparing techniques

– In likelihood weighting, attention is paid to evidence variables before samples are collected

– In rejection sampling, evidence variables are considered after the sampling

– Likelihood weighting isn’t as accurate as the true posterior distribution, P(z | e) because the sampled variables ignore evidence among z’s non-ancestors

Page 38: Bayesian networks Chapter 14. Outline Syntax Semantics.

Likelihood weighting

– Uses all the samples
– As the number of evidence variables increases, it becomes harder to keep the sample weights high and estimate quality drops

Page 39: Bayesian networks Chapter 14. Outline Syntax Semantics.

Gibbs sampling

• Start with an arbitrary state of the network, with Evidence variables set to their observed values

• Randomly sample a value for a nonevidence variable, conditioned on its parents, children, and children's parents (its Markov blanket)

• Each state sampled contributes equally to the estimate of the query
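A sketch of Gibbs sampling over the same assumed network; the Markov-blanket distribution of a node is proportional to P(node | parents) times the product of P(child | its parents) over the node's children:

    import random

    def markov_blanket_prob(node, value, state, children):
        """Unnormalized P(node = value | Markov blanket)."""
        state = dict(state, **{node: value})
        result = p(node, value, state)
        for child in children[node]:
            result *= p(child, state[child], state)
        return result

    def gibbs_ask(query_var, evidence, n):
        """Resample one nonevidence variable at a time from its Markov-blanket
        distribution; every visited state counts toward the estimate."""
        children = {v: [c for c in parents if v in parents[c]] for v in parents}
        state = dict(evidence)
        nonevidence = [v for v in parents if v not in evidence]
        for v in nonevidence:
            state[v] = random.choice([True, False])    # arbitrary initial state
        counts = {True: 0, False: 0}
        for _ in range(n):
            for v in nonevidence:
                pt = markov_blanket_prob(v, True, state, children)
                pf = markov_blanket_prob(v, False, state, children)
                state[v] = random.random() < pt / (pt + pf)
                counts[state[query_var]] += 1
        total = sum(counts.values())
        return {x: c / total for x, c in counts.items()}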

Page 40: Bayesian networks Chapter 14. Outline Syntax Semantics.

Some Applications of BN

• Medical diagnosis, e.g., lymph-node diseases
• Troubleshooting of hardware/software systems
• Fraud/uncollectible debt detection
• Data mining
• Analysis of genetic sequences
• Data interpretation, computer vision, image understanding

Page 41: Bayesian networks Chapter 14. Outline Syntax Semantics.

Case study- Pathfinder system

• Diagnostic system for lymph-node diseases.

• 60 diseases and 100 symptoms and test results.

• 14,000 probabilities.

• Expert consulted to create the net:
– 8 hours to determine variables
– 35 hours for net topology
– 40 hours for probability table values

• Apparently, the experts found it quite easy to invent the causal links and probabilities.

• Pathfinder is now outperforming the world experts in diagnosis, and it is being extended to several dozen other medical domains.

