Bayesian Methods II:
Model Comparison
“Get your facts first, then you can distort them as much as you please”
Mark Twain
Combining Systematic and Measurement Errors

Hubble Distance Determination Problem:
• H0 = 70 km/s/Mpc, the Hubble constant (its ±10 error is treated later)
• vm = (100 ± 5) × 10^3 km/s, the recession velocity of the galaxy
• What is the PDF for the galaxy distance x?

For a parameter estimation problem: p(x|D,I) ∝ p(x|I) p(D|x,I)

We assume the likelihood p(D|x,I) is a Gaussian PDF representing the data-model error (vm − H0 x). For simplicity, call this error Gaussian Gv(x, H0), centered on vm with σ = 5 × 10^3 km/s.

Assume a uniform prior p(x|I). (An improper prior with infinite range is fine for parameter estimation in this case.)
Combining Systematic and Measurement Errors (continued)

Marginalizing over H0: p(x|D,I) = ∫ dH0 p(x, H0|D,I)

We now consider four separate cases for the prior on H0.

CASE 1: Assume H0 = 70 km/s/Mpc exactly (no error).
p(x|D,I) ∝ p(x|I) p(D|x,I) = constant × Gv(x, H0)

CASE 2: Assume H0 = 70 ± 10 km/s/Mpc with Gaussian prior GH(H0).
p(x|D,I) ∝ p(x|I) ∫ dH0 p(H0|I) p(D|x, H0, I)
         = p(x|I) ∫ dH0 GH(H0) Gv(x, H0)
[Figure: Hubble distance PDFs for Case 1 and Case 2]
CASE 3: Assume H0 = 70 ± 20 km/s/Mpc with a uniform prior on [50, 90].
p(x|D,I) = ∫ dH0 p(x, H0|D,I)
         ∝ p(x|I) ∫_50^90 dH0 p(H0|I) p(D|x, H0, I)
         = p(x|I) ∫_50^90 dH0 [1/(90 − 50)] Gv(x, H0)
CASE 4: Assume H0 = 70 ± 20 km/s/Mpc with a Jeffreys prior on [50, 90].
p(x|D,I) = ∫ dH0 p(x, H0|D,I)
         ∝ p(x|I) ∫_50^90 dH0 p(H0|I) p(D|x, H0, I)
         = p(x|I) ∫_50^90 dH0 [1/(H0 ln(90/50))] Gv(x, H0)
[Figure: Hubble distance PDFs comparing Cases 1, 2, 3, and 4]
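The four cases above can be sketched numerically. This is an illustrative reconstruction, not code from the lecture: the constants come from the text, but the grid ranges and point counts are arbitrary choices.

```python
import numpy as np

# Numerical sketch of the four H0-prior cases (constants from the text;
# grid ranges and resolutions are arbitrary illustrative choices).
V_M, SIGMA_V = 100e3, 5e3            # v_m = (100 +/- 5) x 10^3 km/s
H0_FIXED = 70.0                       # Case 1: H0 known exactly
H0_MEAN, H0_SIGMA = 70.0, 10.0        # Case 2: Gaussian prior
H0_LO, H0_HI = 50.0, 90.0             # Cases 3 and 4: bounded priors

def gauss(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

def posterior(x, case):
    """Unnormalized p(x|D,I) on a grid of distances x (Mpc)."""
    if case == 1:
        return gauss(V_M, x * H0_FIXED, SIGMA_V)          # no marginalization
    h = np.linspace(30.0, 110.0, 2001)                    # H0 grid for the integral
    if case == 2:
        prior = gauss(h, H0_MEAN, H0_SIGMA)               # G_H(H0)
    elif case == 3:
        prior = np.where((h >= H0_LO) & (h <= H0_HI),
                         1.0 / (H0_HI - H0_LO), 0.0)      # uniform on [50, 90]
    else:
        prior = np.where((h >= H0_LO) & (h <= H0_HI),
                         1.0 / (h * np.log(H0_HI / H0_LO)), 0.0)  # Jeffreys
    like = gauss(V_M, x[:, None] * h[None, :], SIGMA_V)   # Gv(x, H0)
    return (prior * like).sum(axis=1) * (h[1] - h[0])     # marginalize over H0

x = np.linspace(500.0, 2500.0, 2001)
peaks = {case: x[np.argmax(posterior(x, case))] for case in (1, 2, 3, 4)}
print(peaks)   # Case 1 peaks at v_m / H0 = 1428.6 Mpc
```

Broadening the H0 prior (Cases 2-4) broadens the distance posterior, which is the point of the comparison plot.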
Two Basic Classes of Inference

1. Model Comparison
Which of two or more competing models is the most probable given our present state of knowledge?
• Competing models may have free parameters.
• Models may vary in complexity (some with more free parameters).
• Generally, model comparison is not concerned with finding parameter values.
• Free parameters are usually marginalized out in the analysis.

2. Parameter Estimation
Given a certain model, what is the probability density function for each of its free parameters?
• Suppose model M has free parameters f and A.
• We wish to find p(f|D, M, I) and p(A|D, M, I).
• p(f|D, M, I) is known as the marginal posterior distribution for f.
Model Comparison: the Odds Ratio

Hypothesis space: I = M1 + M2 + M3 + ... + Mn

Bayes' theorem:
p(Mi|D, I) = p(Mi|I) p(D|Mi, I) / p(D|I)

We introduce the odds ratio:
Oij = p(Mi|D, I) / p(Mj|D, I)
    = [p(Mi|I) p(D|Mi, I)] / [p(Mj|I) p(D|Mj, I)]
    = [p(Mi|I) / p(Mj|I)] × Bij

The first factor is the prior odds ratio; the factor Bij = p(D|Mi, I) / p(D|Mj, I) is known as the Bayes factor.
From Odds Ratio to Probabilities

If the odds ratio of model Mi relative to M1 is Oi1 = p(Mi|D, I) / p(M1|D, I), how do we get probabilities?

By definition, the posterior probabilities of the Nmod models sum to one:
Σi p(Mi|D, I) = 1

Divide through by p(M1|D, I) and rearrange:
Σi p(Mi|D, I) / p(M1|D, I) = Σi Oi1 = 1 / p(M1|D, I)

Since p(Mi|D, I) = Oi1 p(M1|D, I), rearranging gives:
p(Mi|D, I) = Oi1 / Σj Oj1

For only 2 models:
p(M2|D, I) = O21 / (1 + O21) = 1 / (1 + 1/O21)
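The normalization step above is a one-liner in code; this small sketch (the function name is my own) converts a list of odds ratios relative to M1 into probabilities:

```python
def odds_to_probs(odds):
    """Convert odds ratios O_i1 (relative to model M1) into probabilities.

    odds[i] = p(M_i|D,I) / p(M_1|D,I); by convention odds[0] = 1.0 for M1.
    Since the posterior probabilities sum to 1:
        p(M_i|D,I) = O_i1 / sum_j O_j1
    """
    total = sum(odds)
    return [o / total for o in odds]

# Two-model case: O21 = 3 means M2 is 3 times more probable than M1,
# reproducing p(M2|D,I) = O21 / (1 + O21) = 0.75.
print(odds_to_probs([1.0, 3.0]))  # [0.25, 0.75]
```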
Occam's Razor Quantified

"Entia non sunt multiplicanda praeter necessitatem" (entities must not be multiplied beyond necessity)
William of Ockham (1285-1349)

Consider a comparison between two models, M0 and M1:
• M1 = f(θ), with one free parameter θ
• M0 has θ fixed at θ = θ0, so zero free parameters

We have no prior reason to prefer either model (prior odds ratio = 1.0). To compare the models we compute the Bayes factor B10 in favor of M1 over M0:

B10 = p(D|M1, I) / p(D|M0, I) = L(M1) / L(M0)
Occam's Razor Quantified (continued)

Prior: p(θ|M1, I) = 1/Δθ, uniform over a range Δθ.
Likelihood: p(D|θ, M1, I) = L(θ), which peaks at θ = Θ with L(Θ) = p(D|Θ, M1, I).

The characteristic width δθ of the likelihood is defined by:
∫ dθ p(D|θ, M1, I) = p(D|Θ, M1, I) δθ

Then the likelihood for M1 is:
L(M1) = p(D|M1, I)
      = ∫ dθ p(θ|M1, I) p(D|θ, M1, I)
      = (1/Δθ) ∫ dθ p(D|θ, M1, I)
      = p(D|Θ, M1, I) δθ/Δθ = L(Θ) δθ/Δθ
The Occam Factor

The likelihood for the more complicated M1 is:
L(M1) = p(D|M1, I) = L(Θ) δθ/Δθ

For the simple model M0 there is no integration to marginalize out any parameters, and so:
L(M0) = p(D|M0, I) = p(D|θ0, M1, I) = L(θ0)

Therefore our Bayes factor in favor of M1 over M0 is:
B10 = L(M1) / L(M0) = [L(Θ) / L(θ0)] × (δθ/Δθ)

The factor Ωθ = δθ/Δθ is the Occam factor for parameter θ.

Suppose our model M1 has two free parameters θ and φ:
L(M1) = p(D|M1, I) ≈ Lmax Ωθ Ωφ (Occam factors multiply)
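The factorization L(M1) ≈ L(Θ) × δθ/Δθ can be checked numerically for a Gaussian-shaped likelihood, where the characteristic width works out to δθ = σ√(2π). All constants in this sketch are made up for illustration:

```python
import numpy as np

# Sketch: check the Occam-factor decomposition for a Gaussian-shaped
# likelihood. All constants here are arbitrary illustrative values.
THETA_BEST, SIGMA, L_MAX = 5.0, 0.3, 2.0e-4
PRIOR_LO, PRIOR_HI = 0.0, 10.0            # uniform prior range Delta_theta

def likelihood(theta):
    return L_MAX * np.exp(-0.5 * ((theta - THETA_BEST) / SIGMA) ** 2)

theta = np.linspace(PRIOR_LO, PRIOR_HI, 100001)
dt = theta[1] - theta[0]

# Characteristic width: integral of L(theta) divided by its peak L(Theta)
delta_theta = likelihood(theta).sum() * dt / L_MAX    # ~ sigma * sqrt(2*pi)
occam = delta_theta / (PRIOR_HI - PRIOR_LO)           # Omega_theta = delta/Delta

# Marginal likelihood L(M1) = integral of prior * likelihood
marginal = (likelihood(theta) / (PRIOR_HI - PRIOR_LO)).sum() * dt

print(delta_theta, occam, marginal)
```

Because the prior range Δθ = 10 is much wider than δθ ≈ 0.75, the Occam factor strongly penalizes the extra parameter even though L(Θ) itself is unchanged.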
Spectral Line Revisited

Hypothesis space:
• M1: a Gaussian emission line centered on channel 37.
• M2: no real signal, only noise.

The line model is:
Tfi = T exp[−(i − i0)²/(2σL²)], where i0 = 37 and σL = 2 (channels)

As before, the noise is Gaussian with σn = 1, and prior estimates of the line strength T range from 0.1 to 100.

M1 predicts signal Tfi; M2 predicts signal = 0.0.
Spectral Line Model Comparison

O12 = p(M1|D, I) / p(M2|D, I)
    = [p(M1|I) / p(M2|I)] × [p(D|M1, I) / p(D|M2, I)]
    = [p(M1|I) / p(M2|I)] × B12

Set the prior odds ratio to 1 (then O12 = B12).

For model M1 we need to marginalize over T:
p(D|M1, I) = ∫ dT p(T|M1, I) p(D|T, M1, I)

[Figure: the uniform and Jeffreys priors on T, plotted as PDFs and as probability per log interval, alongside the likelihood]

We calculate the odds ratio for both to contrast the Jeffreys and uniform priors.
Spectral Line Model Comparison (continued)

We already calculated the likelihood p(D|T, M1, I) in the previous lecture. With di = Tfi + ei and Gaussian noise:

p(D|T, M1, I) = p(E1, E2, E3, ..., EN|M1, T, I) = Πi p(Ei|M1, T, I)
              = Πi [1/(σn √(2π))] exp[−(di − Tfi)²/(2σn²)]
              = σn^−N (2π)^(−N/2) exp[−Σi (di − Tfi)²/(2σn²)]
Spectral Line Model Comparison (continued)

So for the uniform prior case we have:

p(D|M1, I) = [1/ΔT] σn^−N (2π)^(−N/2) ∫_Tmin^Tmax dT exp[−Σi (di − Tfi)²/(2σn²)]
           = 1.131 × 10^−38

Now p(D|M1, I) ≈ Lmax(M1) ΩT, and Lmax(M1) = 8.520 × 10^−37.

Then we have an Occam factor for the uniform prior of ΩT = 0.0133.
For the Jeffreys prior case we have:

p(D|M1, I) = [1/ln(Tmax/Tmin)] σn^−N (2π)^(−N/2) ∫_Tmin^Tmax dT (1/T) exp[−Σi (di − Tfi)²/(2σn²)]
           = 1.239 × 10^−37

The Occam factor for the Jeffreys prior is ΩT = 0.145.
Spectral Line Model Comparison (continued)

The likelihood of the no-signal model M2 is simply the product of the Gaussian noise terms:

p(D|M2, I) = σn^−N (2π)^(−N/2) exp[−Σi di²/(2σn²)]
           = 1.133 × 10^−38

M2 has no free parameters, and hence no Occam factor.
Spectral Line Models: Final Odds

Although Lmax(M1) / Lmax(M2) ≈ 75, the final odds are far more modest.

Uniform prior odds ratio:
O12 = 1.131 × 10^−38 / 1.133 × 10^−38 = 0.9985
Probability: p(M1|D, I) = 1 / (1 + 1/O12) = 0.4996
(odds pulled down by the low Occam factor ΩT = 0.0133)

Jeffreys prior odds ratio:
O12 = 1.239 × 10^−37 / 1.133 × 10^−38 = 10.94
Probability: p(M1|D, I) = 1 / (1 + 1/O12) = 0.916, so p(M2|D, I) = 0.084
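The whole calculation can be reproduced in a few lines. This sketch uses *simulated* data (the slide's actual spectrum is not given here), so the resulting numbers will not match the quoted values of 1.131 × 10^−38 etc.; the model constants follow the text, while N, T_TRUE, and the random seed are arbitrary choices of mine.

```python
import numpy as np

# Spectral-line model comparison with simulated data. Model constants from
# the text: line center channel 37, sigma_L = 2, sigma_n = 1, prior
# 0.1 <= T <= 100. N, T_TRUE, and the seed are arbitrary choices.
rng = np.random.default_rng(1)
N, I0, SIGMA_L, SIGMA_N = 64, 37.0, 2.0, 1.0
T_TRUE, T_MIN, T_MAX = 1.5, 0.1, 100.0

i = np.arange(1, N + 1)
f = np.exp(-((i - I0) ** 2) / (2 * SIGMA_L ** 2))   # line profile f_i
d = T_TRUE * f + rng.normal(0.0, SIGMA_N, N)        # simulated spectrum d_i

# Expand chi^2(T) so the log-likelihood vectorizes over a grid of T values
Sdd, Sdf, Sff = d @ d, d @ f, f @ f

def log_like(T):
    chi2 = Sdd - 2 * T * Sdf + T ** 2 * Sff
    return -N * np.log(SIGMA_N) - 0.5 * N * np.log(2 * np.pi) - chi2 / (2 * SIGMA_N ** 2)

T = np.linspace(T_MIN, T_MAX, 400001)
dT = T[1] - T[0]
scale = log_like(Sdf / Sff)                  # peak log-likelihood, for stability
w = np.exp(log_like(T) - scale)

# Evidences relative to exp(scale); the common factor cancels in the odds
ev_uniform = (w / (T_MAX - T_MIN)).sum() * dT
ev_jeffreys = (w / (T * np.log(T_MAX / T_MIN))).sum() * dT
ev_m2 = np.exp(log_like(0.0) - scale)

print("O12 (uniform) :", ev_uniform / ev_m2)
print("O12 (Jeffreys):", ev_jeffreys / ev_m2)
```

As on the slides, the Jeffreys prior gives higher odds for M1 than the uniform prior, because it concentrates more prior mass near the small T values the likelihood actually supports.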
The Laplace Approximation

We have an unnormalized pdf P*(x) whose normalization constant is:
Zp = ∫ dx P*(x)

Taylor expand the logarithm of P*(x) about its peak x0:
ln P*(x) ≈ ln P*(x0) − (c/2)(x − x0)²
where c = −(d²/dx²) ln P*(x) evaluated at x = x0.

Now we can approximate P*(x) by an unnormalized Gaussian:
P*(x) ≈ P*(x0) exp[−(c/2)(x − x0)²]

And we can approximate the normalizing constant Zp by the Gaussian normalization:
Zp ≈ P*(x0) √(2π/c)
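A minimal check of these steps, using P*(x) = x^a e^−x (a gamma shape, chosen by me as a test case because its true normalization is Γ(a + 1)):

```python
import numpy as np

# Laplace approximation for the unnormalized pdf P*(x) = x^a * exp(-x).
A = 10.0

def log_pstar(x):
    return A * np.log(x) - x

# Peak: d/dx log P* = a/x - 1 = 0  =>  x0 = a
x0 = A
# Curvature c = -d^2/dx^2 log P* at x0 = a/x0^2
c = A / x0 ** 2

# Laplace estimate of Zp = P*(x0) * sqrt(2*pi/c)
z_laplace = np.exp(log_pstar(x0)) * np.sqrt(2 * np.pi / c)

# Brute-force numerical integral for comparison (true value is 10! = 3628800)
x = np.linspace(1e-6, 100.0, 400001)
z_numeric = np.exp(log_pstar(x)).sum() * (x[1] - x[0])
print(z_laplace, z_numeric, z_laplace / z_numeric)
```

For a = 10 the Laplace estimate recovers the normalization to within about 1%; it is, in fact, Stirling's approximation to Γ(a + 1).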