Page 1: Bayesian Methods II: Model Comparison

Bayesian Methods II:

Model Comparison

“Get your facts first, then you can distort them as much as you please”

Mark Twain

Page 2: Bayesian Methods II: Model Comparison

Combining Systematic and Measurement Errors

Hubble Distance Determination Problem:

• H0 = 70 km/sec/Mpc – Hubble constant (±10 errors considered later)

• vm = (100 ± 5) × 10³ km/sec – recession velocity of the galaxy

• What is the PDF for the galaxy distance x?

For a parameter estimation problem: p(x|D,I) ∝ p(x|I) p(D|x,I)

We assume the likelihood p(D|x,I) is a Gaussian PDF in the data−model residual (vm − H0x). For simplicity, call this error Gaussian Gv(x, H0), distributed about vm with σ = 5 × 10³ km/sec.

Assume a uniform prior p(x|I). (An improper prior with infinite range is fine for parameter estimation in this case.)

Page 3: Bayesian Methods II: Model Comparison

Combining Systematic and Measurement Errors

p(x|D,I) = ∫ dH0 p(x, H0|D,I)

Now we consider four separate cases for the prior on H0.

CASE 1: Assume H0 = 70 km/sec/Mpc (with no error)

p(x|D,I) ∝ p(x|I) p(D|x,I) = constant × Gv(x, H0)

CASE 2: Assume H0 = 70 ± 10 km/sec/Mpc with Gaussian prior GH(H0)

p(x|D,I) ∝ p(x|I) ∫ dH0 p(H0|I) p(D|x, H0, I)
         = p(x|I) ∫ dH0 GH(H0) Gv(x, H0)

Page 4: Bayesian Methods II: Model Comparison

Hubble Distance PDFs

[Figure: posterior PDFs for the galaxy distance x under Case 1 and Case 2]

Page 5: Bayesian Methods II: Model Comparison

Combining Systematic and Measurement Errors

CASE 3: Assume H0 = 70 ± 20 km/sec/Mpc with uniform prior over 50–90

p(x|D,I) = ∫ dH0 p(x, H0|D,I)
         ∝ p(x|I) ∫₅₀⁹⁰ dH0 p(H0|I) p(D|x, H0, I)
         = p(x|I) ∫₅₀⁹⁰ dH0 constant × Gv(x, H0)

CASE 4: Assume H0 = 70 ± 20 km/sec/Mpc with Jeffreys prior over 50–90

p(x|D,I) = ∫ dH0 p(x, H0|D,I)
         ∝ p(x|I) ∫₅₀⁹⁰ dH0 [1 / (H0 ln(90/50))] Gv(x, H0)
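Numerically, all four cases reduce to a marginalization over an H0 grid. Below is a minimal Python sketch; the grid ranges and variable names are illustrative assumptions, not from the slides:

```python
import numpy as np

# Data from the slides: v_m = (100 +/- 5) x 10^3 km/sec
v_m, sigma_v = 100e3, 5e3            # measured velocity and its error (km/sec)
x = np.linspace(800, 2200, 2000)     # trial distances in Mpc (illustrative range)
H0 = np.linspace(50.0, 90.0, 400)    # integration grid for H0

def G_v(x, H0):
    """Gaussian likelihood in the residual (v_m - H0*x), sigma = 5e3 km/sec."""
    return np.exp(-0.5 * ((v_m - H0 * x) / sigma_v) ** 2)

L = G_v(x[:, None], H0[None, :])     # likelihood on the (x, H0) grid

# Case 1: H0 = 70 exactly
p1 = G_v(x, 70.0)

# Case 2: Gaussian prior H0 = 70 +/- 10, marginalized over H0
p2 = np.trapz(np.exp(-0.5 * ((H0 - 70.0) / 10.0) ** 2) * L, H0, axis=1)

# Case 3: uniform prior on [50, 90]
p3 = np.trapz(L, H0, axis=1)

# Case 4: Jeffreys prior 1/(H0 ln(90/50)) on [50, 90]
p4 = np.trapz(L / (H0 * np.log(90.0 / 50.0)), H0, axis=1)

# Normalize each posterior over the x grid (in place)
for p in (p1, p2, p3, p4):
    p /= np.trapz(p, x)
```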

Page 6: Bayesian Methods II: Model Comparison

Hubble Distance PDFs

[Figure: posterior PDFs for the galaxy distance x under Cases 1–4]

Page 7: Bayesian Methods II: Model Comparison

Two Basic Classes of Inference

1. Model Comparison

Which of two or more competing models is the most probable given our present state of knowledge?

• Competing models may have free parameters
• Models may vary in complexity (some with more free parameters)
• Generally, model comparison is not concerned with finding parameter values
• Free parameters are usually marginalized out in the analysis

2. Parameter Estimation

Given a certain model, what is the probability density function for each of its free parameters?

• Suppose model M has free parameters f and A
• We wish to find p(f|D, M, I) and p(A|D, M, I)
• p(f|D, M, I) is known as the marginal posterior distribution for f

Page 8: Bayesian Methods II: Model Comparison

Model Comparison: the Odds Ratio

With Bayes' theorem, where I = M1 + M2 + M3 + ... + Mn:

p(Mi|D, I) = p(Mi|I) p(D|Mi, I) / p(D|I)

We introduce the odds ratio:

Oij = p(Mi|D, I) / p(Mj|D, I)
    = [p(Mi|I) p(D|Mi, I)] / [p(Mj|I) p(D|Mj, I)]
    = [p(Mi|I) / p(Mj|I)] × Bij

The first factor, p(Mi|I)/p(Mj|I), is the prior odds ratio; the factor Bij = p(D|Mi, I)/p(D|Mj, I) is known as the Bayes factor.

Page 9: Bayesian Methods II: Model Comparison

From Odds Ratio to Probabilities

If the odds ratio for model Mi relative to M1 is Oi1, how do we get probabilities?

Oi1 = p(Mi|D, I) / p(M1|D, I)

Now by definition, summing over the Nmod models:

Σi p(Mi|D, I) = 1

Divide through by p(M1|D, I) and rearrange:

1 / p(M1|D, I) = Σi p(Mi|D, I) / p(M1|D, I) = Σi Oi1

Since p(Mi|D, I) = Oi1 p(M1|D, I), rearranging gives:

p(Mi|D, I) = Oi1 / Σj Oj1

For only two models:

p(M2|D, I) = O21 / (1 + O21) = 1 / (1 + 1/O21)
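As a quick sketch, the conversion from odds ratios to probabilities is a one-liner; the numbers below reuse the two-model Jeffreys-prior result that appears later in the talk:

```python
def model_probabilities(odds):
    """Convert odds ratios O_i1 (each model relative to M1) to probabilities.

    odds[0] is O_11 = 1.0 for M1 itself; p(Mi|D,I) = O_i1 / sum_j O_j1.
    """
    total = sum(odds)
    return [o / total for o in odds]

# Two-model check with O21 = 10.94:
print(model_probabilities([1.0, 10.94]))   # -> [0.0838, 0.9162]
```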

Page 10: Bayesian Methods II: Model Comparison

Occam’s Razor Quantified

William of Ockham (1285–1349): "Entia non sunt multiplicanda praeter necessitatem" (entities must not be multiplied beyond necessity)

Consider a comparison between two models, M0 and M1:

• M1 = f(θ) – one free parameter
• M0 has fixed θ = θ0 – zero free parameters

We have no prior reason to prefer either model (prior odds ratio = 1.0).

To compare the models we compute the Bayes factor B10 in favor of M1 over M0:

B10 = p(D|M1, I) / p(D|M0, I) = L(M1) / L(M0)

Page 11: Bayesian Methods II: Model Comparison

Occam’s Razor Quantified

Prior: p(θ|M1, I) = 1/Δθ, uniform over a range Δθ

Likelihood: p(D|θ, M1, I) = L(θ), peaking at θ = Θ, with L(Θ) = p(D|Θ, M1, I)

The characteristic width δθ of the likelihood is defined by:

∫ dθ p(D|θ, M1, I) = p(D|Θ, M1, I) δθ

Then the global likelihood for M1 is:

L(M1) = p(D|M1, I)
      = ∫ dθ p(θ|M1, I) p(D|θ, M1, I)
      = (1/Δθ) ∫ dθ p(D|θ, M1, I)
      ≈ p(D|Θ, M1, I) δθ/Δθ = L(Θ) δθ/Δθ

Page 12: Bayesian Methods II: Model Comparison

The Occam Factor

The likelihood for the more complicated model M1 is:

L(M1) = p(D|M1, I) ≈ p(D|Θ, M1, I) δθ/Δθ = L(Θ) δθ/Δθ

For the simple model M0 there is no integration to marginalize out any parameters, and so:

L(M0) = p(D|M0, I) = p(D|θ0, M1, I) = L(θ0)

Therefore our Bayes factor in favor of M1 over M0 is:

B10 = L(M1)/L(M0) = [L(Θ)/L(θ0)] × δθ/Δθ

The ratio δθ/Δθ is the Occam factor Ωθ for parameter θ.

Suppose our model M1 has two free parameters θ and φ:

L(M1) = p(D|M1, I) ≈ Lmax Ωθ Ωφ   (Occam factors multiply)
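A small numerical sketch of the Occam factor; the likelihood peak position, width, and prior range below are made-up illustrative values:

```python
import numpy as np

theta = np.linspace(0.0, 10.0, 2001)        # prior range Delta_theta = 10
Delta_theta = theta[-1] - theta[0]

# A likelihood peaking at Theta = 4.0 with width ~0.5
L = np.exp(-0.5 * ((theta - 4.0) / 0.5) ** 2)

L_max = L.max()
delta_theta = np.trapz(L, theta) / L_max    # characteristic width of likelihood
occam = delta_theta / Delta_theta           # Occam factor Omega_theta

# Global likelihood under the uniform prior 1/Delta_theta:
L_M1 = np.trapz(L / Delta_theta, theta)     # equals L_max * occam
```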

Page 13: Bayesian Methods II: Model Comparison

Spectral Line Revisited

Hypothesis space:

M1: Gaussian emission line centered on channel 37
M2: No real signal, only noise

The predicted signal in channel i is:

T fi = T exp[ −(i − i0)² / (2 σL²) ]

where i0 = 37 and σL = 2 (channels).

As before: the noise is Gaussian with σn = 1, and prior estimates of T range from 0.1 to 100.

M1 predicts a signal; M2 predicts signal = 0.0

Page 14: Bayesian Methods II: Model Comparison

Spectral Line Model Comparison

O12 = p(M1|D, I) / p(M2|D, I)
    = [p(M1|I) p(D|M1, I)] / [p(M2|I) p(D|M2, I)]
    = [p(M1|I) / p(M2|I)] × B12

Set the prior odds ratio = 1 (then O12 = B12).

For model M1 we need to marginalize over T:

p(D|M1, I) = ∫ dT p(T|M1, I) p(D|T, M1, I)
                  (prior)     (likelihood)

[Figure: uniform and Jeffreys priors for T; the Jeffreys prior is a PDF with equal probability per log interval]

We calculate the odds ratio for both to contrast the Jeffreys and uniform priors.

Page 15: Bayesian Methods II: Model Comparison

Spectral Line Model Comparison

We already calculated the likelihood p(D|T, M1, I) in the previous lecture. The data are di = T fi + ei, and for Gaussian noise:

p(D|T, M1, I) = p(E1, E2, E3 ... EN | M1, T, I) = Πi p(Ei|M1, T, I)

              = Πi [1 / (σn √(2π))] exp[ −(di − T fi)² / (2σn²) ]

              = σn^(−N) (2π)^(−N/2) exp[ −Σi (di − T fi)² / (2σn²) ]

Page 16: Bayesian Methods II: Model Comparison

Spectral Line Model Comparison

For the uniform prior case we have:

p(D|M1, I) = (1/ΔT) σn^(−N) (2π)^(−N/2) ∫[Tmin,Tmax] dT exp[ −Σi (di − T fi)² / (2σn²) ]
           = 1.131 × 10⁻³⁸

Now p(D|M1, I) ≈ Lmax(M1) ΩT, and Lmax(M1) = 8.520 × 10⁻³⁷.

So the Occam factor for the uniform prior is ΩT = 0.0133.

For the Jeffreys prior case we have:

p(D|M1, I) = [1 / ln(Tmax/Tmin)] σn^(−N) (2π)^(−N/2) ∫[Tmin,Tmax] dT (1/T) exp[ −Σi (di − T fi)² / (2σn²) ]
           = 1.239 × 10⁻³⁷

The Occam factor for the Jeffreys prior is ΩT = 0.145.

Page 17: Bayesian Methods II: Model Comparison

Spectral Line Model Comparison

The likelihood for the no-signal model M2 is simply the product of the Gaussian noise terms (the model predicts zero signal, so there is no T to marginalize):

p(D|M2, I) = σn^(−N) (2π)^(−N/2) exp[ −Σi di² / (2σn²) ]
           = 1.133 × 10⁻³⁸

M2 has no free parameters, and hence no Occam factor.
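The whole comparison can be reproduced numerically. The sketch below uses simulated data in place of the actual spectrum from the lecture (which is not reproduced here), and assumes N = 64 channels, so its marginal likelihoods will differ from the quoted values:

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated stand-in for the spectrum
N, i0, sigma_L, sigma_n = 64, 37, 2.0, 1.0
i = np.arange(1, N + 1)
f = np.exp(-((i - i0) ** 2) / (2 * sigma_L ** 2))   # unit-amplitude line shape
d = 1.0 * f + rng.normal(0.0, sigma_n, N)           # data: line (T = 1) plus noise

# Likelihood p(D|T, M1, I) on a grid of line strengths T
T = np.linspace(0.1, 100.0, 20000)
chi2 = ((d[None, :] - T[:, None] * f[None, :]) ** 2).sum(axis=1)
like = sigma_n ** -N * (2 * np.pi) ** (-N / 2) * np.exp(-chi2 / (2 * sigma_n ** 2))

# Marginal likelihoods for M1 under the two priors
p_D_M1_uniform = np.trapz(like, T) / (T[-1] - T[0])
p_D_M1_jeffreys = np.trapz(like / T, T) / np.log(T[-1] / T[0])

# No-signal model M2: no free parameters, no Occam factor
p_D_M2 = sigma_n ** -N * (2 * np.pi) ** (-N / 2) * np.exp(-(d ** 2).sum() / (2 * sigma_n ** 2))

# Odds ratios (prior odds = 1) and the uniform-prior Occam factor
O12_uniform = p_D_M1_uniform / p_D_M2
O12_jeffreys = p_D_M1_jeffreys / p_D_M2
Omega_T_uniform = p_D_M1_uniform / like.max()
```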

Page 18: Bayesian Methods II: Model Comparison

Spectral Line Models: Final Odds

Although Lmax(M1) / Lmax(M2) ≈ 75:

Uniform prior odds ratio:

O12 = 1.131 × 10⁻³⁸ / 1.133 × 10⁻³⁸ = 0.9985

Probability: p(M1|D, I) = 1 / (1 + 1/O12) = 0.4996

(odds strongly influenced by the low Occam factor ΩT = 0.0133)

Jeffreys prior odds ratio:

O12 = 1.239 × 10⁻³⁷ / 1.133 × 10⁻³⁸ = 10.94

Probability: p(M1|D, I) = 1 / (1 + 1/O12) = 0.916, hence p(M2|D, I) = 0.084

Page 19: Bayesian Methods II: Model Comparison

The Laplace Approximation

We have an un-normalized PDF P*(x) whose normalization constant is:

ZP = ∫ dx P*(x)

Taylor expand the logarithm of P* about its peak x0:

ln P*(x) ≈ ln P*(x0) − (c/2)(x − x0)²

where

c = − d²/dx² ln P*(x) |x=x0

Now we can approximate P*(x) by an unnormalized Gaussian:

Q*(x) = P*(x0) exp[ −(c/2)(x − x0)² ]

And we can approximate the normalizing constant ZP by the Gaussian normalization:

ZP ≈ P*(x0) √(2π/c)
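A one-dimensional sketch of the Laplace approximation, using P*(x) = x⁴ e^(−x) as an illustrative example (its exact normalization is Γ(5) = 24):

```python
import numpy as np

def log_Pstar(x):
    return 4.0 * np.log(x) - x          # ln P*(x) for P*(x) = x^4 e^(-x), x > 0

# Peak: d/dx ln P* = 4/x - 1 = 0  =>  x0 = 4
x0 = 4.0

# Curvature c = -d^2/dx^2 ln P*(x) at x0, via a finite difference
h = 1e-4
c = -(log_Pstar(x0 + h) - 2 * log_Pstar(x0) + log_Pstar(x0 - h)) / h ** 2

# Laplace estimate of the normalization constant Z_P
Z_laplace = np.exp(log_Pstar(x0)) * np.sqrt(2 * np.pi / c)

# Brute-force comparison: Z_laplace ~ 23.5 vs Z_exact ~ 24.0
x = np.linspace(1e-6, 60.0, 200_000)
Z_exact = np.trapz(np.exp(log_Pstar(x)), x)
```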
