Page 1: Approximate Inference


Approximate Inference

Slides by Nir Friedman

Page 2: Approximate Inference

When can we hope to approximate?

Two situations:
- Highly stochastic distributions: "far" evidence is discarded
- "Peaked" distributions: improbable values are ignored

Page 3: Approximate Inference

Stochasticity & Approximations

Consider a chain:

P(Xi+1 = t | Xi = t) = 1 − ε
P(Xi+1 = f | Xi = f) = 1 − ε

(ε is the probability of flipping state at each step.)

Computing the probability of Xn+1 given X1, we get:

X1 → X2 → X3 → … → Xn+1

Even # of flips:
P(Xn+1 = t | X1 = t) = Σ_{k even} C(n, k) ε^k (1 − ε)^(n−k) = (1 + (1 − 2ε)^n) / 2

Odd # of flips:
P(Xn+1 = f | X1 = t) = Σ_{k odd} C(n, k) ε^k (1 − ε)^(n−k) = (1 − (1 − 2ε)^n) / 2

Page 4: Approximate Inference

Plot of P(Xn = t | X1 = t)

[Figure: P(Xn = t | X1 = t) plotted against the flip probability ε (x-axis 0 to 0.5; y-axis 0.5 to 1), for n = 5, n = 10, and n = 20.]
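To make this concrete, here is a minimal Python sketch (not from the original slides) that evaluates the closed-form expression derived above; it reproduces the qualitative behavior of the plot, with the probability approaching 0.5 as n grows for any fixed ε > 0.

```python
# Sketch: evaluate P(Xn+1 = t | X1 = t) = (1 + (1 - 2*eps)**n) / 2
# for the binary chain above and watch it decay toward 0.5.

def prob_same(eps: float, n: int) -> float:
    """Probability that the chain ends in the same state it started in
    after n transitions, each flipping the state with probability eps."""
    return (1.0 + (1.0 - 2.0 * eps) ** n) / 2.0

if __name__ == "__main__":
    for n in (5, 10, 20):
        values = [round(prob_same(eps, n), 3) for eps in (0.05, 0.1, 0.25, 0.5)]
        print(f"n = {n:2d}:", values)
    # Larger n => values closer to 0.5: the evidence at X1 "mixes away".
```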

Page 5: Approximate Inference

Stochastic Processes

This behavior of a chain (a Markov Process) is called Mixing.

In general Bayes nets there is similar behavior: if probabilities are far from 0 and 1, then the effect of "far" evidence vanishes (and so it can be discarded in approximations).

Page 6: Approximate Inference

"Peaked" distributions

If the distribution is "peaked", then most of the mass is on a few instances.
If we can focus on these instances, we can ignore the rest.

[Figure: bar chart of probability (y-axis, 0 to 0.16) over instances (x-axis), with most of the mass on a few instances.]

Page 7: Approximate Inference

Global conditioning

[Network diagram over nodes A, B, C, D, E, I, J, K, L, M.]

Fixing the values of A & B:

P(m) = Σ_a Σ_b Σ_c Σ_d … Σ_l P(a, b, c, d, …, l, m)

Fixing the values at the beginning of the summation can decrease the size of the tables formed by variable elimination; in this way, space is traded for time. Special case: choose to fix a set of nodes that "breaks all loops". This method is called cutset conditioning.

[Diagram: the remaining network over C, D, E, I, J, K, L, M, replicated for the assignments of A and B.]
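As a rough illustration (not from the slides), the following Python sketch applies the conditioning idea to a hypothetical toy network A → M ← B: the cutset variables A and B are fixed to each joint assignment, a cheap computation is done for each assignment, and the results are combined. All CPT numbers here are made up.

```python
# Global conditioning on a hypothetical toy network A -> M <- B:
# enumerate assignments to the cutset {A, B}; for each one, the remaining
# computation is simple (in a real network: variable elimination on a
# smaller / loop-free network), and the results are summed.
import itertools

P_A = {True: 0.3, False: 0.7}            # hypothetical P(A)
P_B = {True: 0.6, False: 0.4}            # hypothetical P(B)
P_M_GIVEN_AB = {                         # hypothetical P(M = True | A, B)
    (True, True): 0.9, (True, False): 0.5,
    (False, True): 0.4, (False, False): 0.1,
}

def p_m_true() -> float:
    """P(M = True) = sum over cutset assignments (a, b) of
    P(a) * P(b) * P(M = True | a, b)."""
    total = 0.0
    for a, b in itertools.product([True, False], repeat=2):
        total += P_A[a] * P_B[b] * P_M_GIVEN_AB[(a, b)]
    return total

print("P(M = True) =", p_m_true())
```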

Page 8: Approximate Inference

Bounded conditioning

[Diagram: nodes A and B highlighted in the network.]

Fixing the values of A & B.

By examining only the probable assignments of A & B, we perform several simple computations instead of one complex one.

Page 9: Approximate Inference

Bounded conditioning

Choose A and B so that P(Y, e | a, b) can be computed easily, e.g., a cycle cutset.

Search for highly probable assignments to A, B:
- Option 1: select a, b with high P(a, b).
- Option 2: select a, b with high P(a, b | e).

We need to search for such high-mass values, and that can be hard.

P(Y = y, e) ≈ Σ_{(a,b) probable} P(Y = y, e | a, b) · P(a, b)

Page 10: Approximate Inference

Bounded Conditioning

Advantages:
- Combines exact inference within approximation
- Continuous: more time can be used to examine more cases
- Bounds: the unexamined mass is used to compute error bars

Possible problems:
- P(a, b) is the prior mass, not the posterior. If the posterior P(a, b | e) is significantly different, computation can be wasted on irrelevant assignments.

Unexamined mass: 1 − Σ_{(a,b) probable} P(a, b)
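A minimal sketch of the bound, reusing the hypothetical toy network from the previous sketch: only the k most probable cutset assignments are examined, the examined terms give a lower bound, and the skipped prior mass 1 − Σ P(a, b) gives the width of the error bar.

```python
# Bounded conditioning on the same hypothetical toy network: examine only
# the k cutset assignments with the highest prior P(a, b); the skipped
# prior mass bounds the error of the partial sum.
import itertools

P_A = {True: 0.3, False: 0.7}
P_B = {True: 0.6, False: 0.4}
P_M_GIVEN_AB = {
    (True, True): 0.9, (True, False): 0.5,
    (False, True): 0.4, (False, False): 0.1,
}

def bounded_conditioning(k: int):
    """Return (lower_bound, unexamined_mass) for P(M = True) after
    examining the k most probable assignments of (A, B)."""
    assignments = sorted(
        itertools.product([True, False], repeat=2),
        key=lambda ab: P_A[ab[0]] * P_B[ab[1]],
        reverse=True,
    )
    lower, examined_mass = 0.0, 0.0
    for a, b in assignments[:k]:
        prior = P_A[a] * P_B[b]
        lower += prior * P_M_GIVEN_AB[(a, b)]
        examined_mass += prior
    # The true value lies in [lower, lower + (1 - examined_mass)]: each
    # skipped term contributes at most its prior mass P(a, b).
    return lower, 1.0 - examined_mass

for k in range(1, 5):
    print(k, bounded_conditioning(k))
```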

Page 11: Approximate Inference

Network Simplifications

In these approaches, we try to replace the original network with a simpler one; the resulting network allows fast exact methods.

Page 12: Approximate Inference

Network Simplifications

Typical simplifications:
- Remove parts of the network
- Remove edges
- Reduce the number of values (value abstraction)
- Replace a sub-network with a simpler one (model abstraction)

These simplifications are often made w.r.t. the particular evidence and query.

Page 13: Approximate Inference

Stochastic Simulation

Suppose our goal is to compute the likelihood of evidence P(e), where e is an assignment to some variables in {X1,…,Xn}.

Assume that we can sample instances <x1,…,xn> according to the distribution P(x1,…,xn).

What, then, is the probability that a random sample <x1,…,xn> satisfies e?

Answer: simply P(e), which is what we wish to compute.

Each sample simulates the tossing of a biased coin with probability P(e) of “Heads”.

Page 14: Approximate Inference

Stochastic Sampling

Intuition: given a sufficient number of samples x[1],…,x[N], we can estimate

P̂(e) = #Heads / N = (1/N) Σ_i P(e | x[i])

where each term P(e | x[i]) is zero or one (the sample either satisfies e or it does not).

The law of large numbers implies that as N grows, this estimate converges to P(e) with high probability.

How many samples do we need to get a reliable estimation?

We will not discuss this issue here.
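As a small illustration (not from the slides), the biased-coin analogy can be simulated directly: each sample contributes a 0/1 outcome, and the running average converges to the true probability, here a hypothetical P(e) = 0.08.

```python
# Each sample is a biased coin toss that lands "heads" with probability
# P(e); the fraction of heads converges to P(e) as N grows.
import random

def estimate(p_e: float, n_samples: int, seed: int = 0) -> float:
    rng = random.Random(seed)
    heads = sum(1 for _ in range(n_samples) if rng.random() < p_e)
    return heads / n_samples

for n in (100, 1_000, 10_000, 100_000):
    print(n, estimate(0.08, n))   # hypothetical P(e) = 0.08
```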

Page 15: Approximate Inference

Sampling a Bayesian Network

If P(X1,…,Xn) is represented by a Bayesian network, can we efficiently sample from it?

Idea: sample according to the structure of the network. Write the distribution using the chain rule, and then sample each variable given its parents.

Page 16: Approximate Inference

Logic sampling

[Alarm network: Burglary → Alarm ← Earthquake, Earthquake → Radio, Alarm → Call.]

CPTs (as read off the slide): P(b) = 0.03, P(e) = 0.001;
P(a | B, E) = 0.98, 0.4, 0.7, 0.01 across the four configurations of (B, E);
P(c | a) = 0.8, P(c | ¬a) = 0.05;
P(r | e) = 0.3, P(r | ¬e) = 0.001.

Samples (B E A C R): first, B is sampled from P(b) = 0.03.

Page 17: Approximate Inference

Logic sampling (continued)

[Same alarm network and CPTs as on the previous slide.]

Samples (B E A C R): next, E is sampled from P(e) = 0.001.

Page 18: Approximate Inference

Logic sampling (continued)

[Same alarm network and CPTs as before.]

Samples (B E A C R): next, A is sampled from P(A | B, E); the relevant CPT entry (0.4) is highlighted.

Page 19: Approximate Inference

Logic sampling (continued)

[Same alarm network and CPTs as before.]

Samples (B E A C R): next, C is sampled from P(C | A); the relevant CPT entry (0.8) is highlighted.

Page 20: Approximate Inference

Logic sampling (continued)

[Same alarm network and CPTs as before.]

Samples (B E A C R): next, R is sampled from P(R | E); the relevant CPT entry (0.3) is highlighted.

Page 21: Approximate Inference

Logic sampling (continued)

[Same alarm network and CPTs as before.]

Samples (B E A C R): the sample over all five variables is now complete.

Page 22: Approximate Inference

Logic Sampling

Let X1, …, Xn be an ordering of the variables consistent with the arc directions (a topological order).

for i = 1, …, n do
    sample xi from P(Xi | pai)
    (Note: since Pai ⊆ {X1, …, Xi−1}, we have already assigned values to them)
return x1, …, xn
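A minimal Python sketch of this procedure (not from the slides), using hypothetical CPT values loosely based on the alarm example; in particular, the assignment of the P(A | B, E) entries to parent configurations is assumed. The last lines use the samples to estimate P(e) for the evidence C = true by simple counting.

```python
# Logic (forward) sampling for a small Bayesian network with hypothetical
# CPTs loosely based on the alarm example.
import random

rng = random.Random(0)

def bernoulli(p: float) -> bool:
    return rng.random() < p

# Hypothetical P(A = true | B, E); the ordering of entries is assumed.
P_A = {(True, True): 0.98, (True, False): 0.7,
       (False, True): 0.4, (False, False): 0.01}

def sample_once() -> dict:
    """Sample every variable given its already-sampled parents, in an
    order consistent with the arc directions."""
    x = {}
    x["B"] = bernoulli(0.03)
    x["E"] = bernoulli(0.001)
    x["A"] = bernoulli(P_A[(x["B"], x["E"])])
    x["C"] = bernoulli(0.8 if x["A"] else 0.05)
    x["R"] = bernoulli(0.3 if x["E"] else 0.001)
    return x

# Estimate P(e) for evidence C = true by counting satisfying samples.
N = 100_000
count = sum(1 for _ in range(N) if sample_once()["C"])
print("estimated P(C = true):", count / N)
```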

Page 23: Approximate Inference

Logic Sampling

Sampling a complete instance is linear in the number of variables, regardless of the structure of the network.

However, if P(e) is small, we need many samples to get a decent estimate.

Page 24: Approximate Inference

Can we sample from P(Xi|e) ?

If evidence e is in the roots of the Bayes network, we can sample easily.
If evidence is in the leaves of the network, we have a problem: our sampling method proceeds according to the order of the nodes in the network.

[Diagram: a network over nodes X, Z, R, B with an observed node A = a.]

Page 25: Approximate Inference

Likelihood Weighting

Can we ensure that all of our samples satisfy e?

One simple (but wrong) solution: when we need to sample a variable Y that is assigned a value by e, use its specified value.

For example: we know Y = 1. Sample X from P(X); then take Y = 1.

Is this a sample from P(X, Y | Y = 1)? NO.

[Network: X → Y]

Page 26: Approximate Inference

Likelihood Weighting

Problem: these samples of X are from P(X).

Solution: penalize samples in which P(Y = 1 | X) is small.

We now sample as follows:
- Let xi be a sample from P(X)
- Let wi = P(Y = 1 | X = xi)

[Network: X → Y]

P̂(X = x | Y = 1) = ( Σ_i wi · 1{xi = x} ) / ( Σ_i wi )
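A tiny sketch of this estimator for the two-node case X → Y with hypothetical numbers: X is sampled from its prior, each sample is weighted by P(Y = 1 | X = xi), and the weighted average recovers the posterior.

```python
# Weighted estimator for the two-node case X -> Y (hypothetical numbers).
import random

rng = random.Random(0)
P_X1 = 0.2                          # hypothetical P(X = 1)
P_Y1_GIVEN_X = {1: 0.9, 0: 0.1}     # hypothetical P(Y = 1 | X)

num = den = 0.0
for _ in range(100_000):
    x = 1 if rng.random() < P_X1 else 0   # sample x_i from P(X)
    w = P_Y1_GIVEN_X[x]                   # w_i = P(Y = 1 | X = x_i)
    num += w * (x == 1)
    den += w
print("estimated P(X = 1 | Y = 1):", num / den)
# Exact value: 0.2*0.9 / (0.2*0.9 + 0.8*0.1) ~= 0.692
```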

Page 27: Approximate Inference

Likelihood Weighting

Let X1, …, Xn be an ordering of the variables consistent with the arc directions (a topological order).

w = 1
for i = 1, …, n do
    if Xi = xi has been observed
        w ← w · P(Xi = xi | pai)
    else
        sample xi from P(Xi | pai)
return x1, …, xn, and w
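A minimal Python sketch of this algorithm (not from the slides), using the same hypothetical alarm-style CPTs as in the logic-sampling sketch: observed variables are fixed to their evidence values and their CPT entries are multiplied into the weight, and the weighted samples are then averaged to answer a query.

```python
# Likelihood weighting on the hypothetical alarm-style network: observed
# variables are clamped to the evidence and contribute to the weight.
import random

rng = random.Random(0)

P_A = {(True, True): 0.98, (True, False): 0.7,
       (False, True): 0.4, (False, False): 0.01}   # hypothetical ordering

def weighted_sample(evidence: dict):
    """Return one (sample, weight) pair."""
    x, w = {}, 1.0

    def handle(name: str, p_true: float):
        nonlocal w
        if name in evidence:              # observed: clamp value, update weight
            x[name] = evidence[name]
            w *= p_true if evidence[name] else 1.0 - p_true
        else:                             # unobserved: sample as usual
            x[name] = rng.random() < p_true

    handle("B", 0.03)
    handle("E", 0.001)
    handle("A", P_A[(x["B"], x["E"])])
    handle("C", 0.8 if x["A"] else 0.05)
    handle("R", 0.3 if x["E"] else 0.001)
    return x, w

# Estimate P(B = true | C = true) as a weighted average over samples.
num = den = 0.0
for _ in range(100_000):
    x, w = weighted_sample({"C": True})
    num += w * x["B"]
    den += w
print("estimated P(B = true | C = true):", num / den)
```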

Page 28: Approximate Inference

Likelihood Weighting (example)

[Same alarm network and CPTs as in the logic-sampling example; the values of A and R are observed as evidence, and a Weight column tracks the sample weight.]

Samples (B E A C R): first, B is sampled from P(b) = 0.03.

Page 29: Approximate Inference

Likelihood Weighting (continued)

[Same network, CPTs, and evidence on A and R.]

Samples (B E A C R): next, E is sampled from P(e) = 0.001.

Page 30: Approximate Inference

Likelihood Weighting (continued)

[Same network, CPTs, and evidence on A and R.]

Samples (B E A C R): A is observed, so instead of sampling it the weight is multiplied by the corresponding CPT entry; the slide highlights the table entry 0.4, and the weight becomes 0.6.

Page 31: Approximate Inference

Likelihood Weighting (continued)

[Same network, CPTs, and evidence on A and R.]

Samples (B E A C R): next, C is sampled from P(C | A); the highlighted CPT entry is 0.05. Weight so far: 0.6.

Page 32: Approximate Inference

Likelihood Weighting (continued)

[Same network, CPTs, and evidence on A and R.]

Samples (B E A C R): R is observed, so the weight is multiplied by the corresponding CPT entry (0.3). Weight: 0.6 × 0.3.

Page 33: Approximate Inference

Likelihood Weighting

Why does this make sense? When N is large, we expect to see about N · P(X = x) samples with x[i] = x. Thus,

P̂(X = x | Y = 1) = ( Σ_i 1{x[i] = x} · P(Y = 1 | X = x[i]) ) / ( Σ_i P(Y = 1 | X = x[i]) )
                 ≈ ( N · P(X = x) · P(Y = 1 | X = x) ) / ( N · P(Y = 1) )
                 = P(X = x, Y = 1) / P(Y = 1)
                 = P(X = x | Y = 1)

Page 34: Approximate Inference

Summary

Approximate inference is needed for large pedigrees. We have seen a few methods today; some fit genetic linkage analysis and some do not. There are many other approximation algorithms: variational methods, MCMC, and others.

In next semester's Bioinformatics project course (236524), we will offer projects that seek to implement some approximation methods and embed them in the Superlink software.
