Bayesian Methods II:
Model Comparison
“Get your facts first, then you can distort them as much as you please”
Mark Twain
Combining Systematic and Measurement Errors

Hubble Distance Determination Problem:
• H0 = 70 km/s/Mpc, the Hubble constant (its ±10 error is treated later)
• vm = (100 ± 5) × 10^3 km/s, the recession velocity of the galaxy
• What is the PDF for the galaxy distance x?

For a parameter estimation problem: p(x|D,I) ∝ p(x|I) p(D|x,I)

We assume the likelihood p(D|x,I) is a Gaussian PDF representing the data-model error (vm − H0 x). For simplicity, call this error Gaussian Gv(x, H0), centered on vm with σ = 5 × 10^3 km/s.

Assume a uniform prior p(x|I). (An improper prior with infinite range is fine for parameter estimation in this case.)
Combining Systematic and Measurement Errors (continued)

Marginalizing over H0: p(x|D,I) = ∫ dH0 p(x, H0|D,I)

We now consider four separate cases for the prior on H0.

CASE 1: Assume H0 = 70 km/s/Mpc exactly (no error).
p(x|D,I) ∝ p(x|I) p(D|x,I) = constant × Gv(x, H0)

CASE 2: Assume H0 = 70 ± 10 km/s/Mpc with Gaussian prior GH(H0).
p(x|D,I) ∝ p(x|I) ∫ dH0 p(H0|I) p(D|x, H0, I)
         = p(x|I) ∫ dH0 GH(H0) Gv(x, H0)
[Figure: Hubble distance PDFs for Case 1 and Case 2]
CASE 3: Assume H0 = 70 ± 20 km/s/Mpc with a uniform prior on [50, 90].
p(x|D,I) = ∫ dH0 p(x, H0|D,I)
         ∝ p(x|I) ∫_50^90 dH0 p(H0|I) p(D|x, H0, I)
         = p(x|I) ∫_50^90 dH0 [1/(90 − 50)] Gv(x, H0)
CASE 4: Assume H0 = 70 ± 20 km/s/Mpc with a Jeffreys prior on [50, 90].
p(x|D,I) = ∫ dH0 p(x, H0|D,I)
         ∝ p(x|I) ∫_50^90 dH0 p(H0|I) p(D|x, H0, I)
         = p(x|I) ∫_50^90 dH0 [1/(H0 ln(90/50))] Gv(x, H0)
[Figure: Hubble distance PDFs comparing Cases 1, 2, 3, and 4]
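The four cases above can be sketched numerically. This is an illustrative reconstruction, not code from the lecture: the constants come from the text, but the grid ranges and point counts are arbitrary choices.

```python
import numpy as np

# Numerical sketch of the four H0-prior cases (constants from the text;
# grid ranges and resolutions are arbitrary illustrative choices).
V_M, SIGMA_V = 100e3, 5e3            # v_m = (100 +/- 5) x 10^3 km/s
H0_FIXED = 70.0                       # Case 1: H0 known exactly
H0_MEAN, H0_SIGMA = 70.0, 10.0        # Case 2: Gaussian prior
H0_LO, H0_HI = 50.0, 90.0             # Cases 3 and 4: bounded priors

def gauss(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

def posterior(x, case):
    """Unnormalized p(x|D,I) on a grid of distances x (Mpc)."""
    if case == 1:
        return gauss(V_M, x * H0_FIXED, SIGMA_V)          # no marginalization
    h = np.linspace(30.0, 110.0, 2001)                    # H0 grid for the integral
    if case == 2:
        prior = gauss(h, H0_MEAN, H0_SIGMA)               # G_H(H0)
    elif case == 3:
        prior = np.where((h >= H0_LO) & (h <= H0_HI),
                         1.0 / (H0_HI - H0_LO), 0.0)      # uniform on [50, 90]
    else:
        prior = np.where((h >= H0_LO) & (h <= H0_HI),
                         1.0 / (h * np.log(H0_HI / H0_LO)), 0.0)  # Jeffreys
    like = gauss(V_M, x[:, None] * h[None, :], SIGMA_V)   # Gv(x, H0)
    return (prior * like).sum(axis=1) * (h[1] - h[0])     # marginalize over H0

x = np.linspace(500.0, 2500.0, 2001)
peaks = {case: x[np.argmax(posterior(x, case))] for case in (1, 2, 3, 4)}
print(peaks)   # Case 1 peaks at v_m / H0 = 1428.6 Mpc
```

Broadening the H0 prior (Cases 2-4) broadens the distance posterior, which is the point of the comparison plot.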
Two Basic Classes of Inference

1. Model Comparison
Which of two or more competing models is the most probable given our present state of knowledge?
• Competing models may have free parameters.
• Models may vary in complexity (some with more free parameters).
• Generally, model comparison is not concerned with finding parameter values.
• Free parameters are usually marginalized out in the analysis.

2. Parameter Estimation
Given a certain model, what is the probability density function for each of its free parameters?
• Suppose model M has free parameters f and A.
• We wish to find p(f|D, M, I) and p(A|D, M, I).
• p(f|D, M, I) is known as the marginal posterior distribution for f.
Model Comparison: the Odds Ratio

Hypothesis space: I = M1 + M2 + M3 + ... + Mn

Bayes' theorem:
p(Mi|D, I) = p(Mi|I) p(D|Mi, I) / p(D|I)

We introduce the odds ratio:
Oij = p(Mi|D, I) / p(Mj|D, I)
    = [p(Mi|I) p(D|Mi, I)] / [p(Mj|I) p(D|Mj, I)]
    = [p(Mi|I) / p(Mj|I)] × Bij

The first factor is the prior odds ratio; the factor Bij = p(D|Mi, I) / p(D|Mj, I) is known as the Bayes factor.
From Odds Ratio to Probabilities

If the odds ratio of model Mi relative to M1 is Oi1 = p(Mi|D, I) / p(M1|D, I), how do we get probabilities?

By definition, the posterior probabilities of the Nmod models sum to one:
Σi p(Mi|D, I) = 1

Divide through by p(M1|D, I) and rearrange:
Σi p(Mi|D, I) / p(M1|D, I) = Σi Oi1 = 1 / p(M1|D, I)

Since p(Mi|D, I) = Oi1 p(M1|D, I), rearranging gives:
p(Mi|D, I) = Oi1 / Σj Oj1

For only 2 models:
p(M2|D, I) = O21 / (1 + O21) = 1 / (1 + 1/O21)
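The normalization step above is a one-liner in code; this small sketch (the function name is my own) converts a list of odds ratios relative to M1 into probabilities:

```python
def odds_to_probs(odds):
    """Convert odds ratios O_i1 (relative to model M1) into probabilities.

    odds[i] = p(M_i|D,I) / p(M_1|D,I); by convention odds[0] = 1.0 for M1.
    Since the posterior probabilities sum to 1:
        p(M_i|D,I) = O_i1 / sum_j O_j1
    """
    total = sum(odds)
    return [o / total for o in odds]

# Two-model case: O21 = 3 means M2 is 3 times more probable than M1,
# reproducing p(M2|D,I) = O21 / (1 + O21) = 0.75.
print(odds_to_probs([1.0, 3.0]))  # [0.25, 0.75]
```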
Occam's Razor Quantified

"Entia non sunt multiplicanda praeter necessitatem" (entities must not be multiplied beyond necessity)
William of Ockham (1285-1349)

Consider a comparison between two models, M0 and M1:
• M1 = f(θ), with one free parameter θ
• M0 has θ fixed at θ = θ0, so zero free parameters

We have no prior reason to prefer either model (prior odds ratio = 1.0). To compare the models we compute the Bayes factor B10 in favor of M1 over M0:

B10 = p(D|M1, I) / p(D|M0, I) = L(M1) / L(M0)
Occam's Razor Quantified (continued)

Prior: p(θ|M1, I) = 1/Δθ, uniform over a range Δθ.
Likelihood: p(D|θ, M1, I) = L(θ), which peaks at θ = Θ with L(Θ) = p(D|Θ, M1, I).

The characteristic width δθ of the likelihood is defined by:
∫ dθ p(D|θ, M1, I) = p(D|Θ, M1, I) δθ

Then the likelihood for M1 is:
L(M1) = p(D|M1, I)
      = ∫ dθ p(θ|M1, I) p(D|θ, M1, I)
      = (1/Δθ) ∫ dθ p(D|θ, M1, I)
      = p(D|Θ, M1, I) δθ/Δθ = L(Θ) δθ/Δθ
The Occam Factor

The likelihood for the more complicated M1 is:
L(M1) = p(D|M1, I) = L(Θ) δθ/Δθ

For the simple model M0 there is no integration to marginalize out any parameters, and so:
L(M0) = p(D|M0, I) = p(D|θ0, M1, I) = L(θ0)

Therefore our Bayes factor in favor of M1 over M0 is:
B10 = L(M1) / L(M0) = [L(Θ) / L(θ0)] × (δθ/Δθ)

The factor Ωθ = δθ/Δθ is the Occam factor for parameter θ.

Suppose our model M1 has two free parameters θ and φ:
L(M1) = p(D|M1, I) ≈ Lmax Ωθ Ωφ (Occam factors multiply)
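The factorization L(M1) ≈ L(Θ) × δθ/Δθ can be checked numerically for a Gaussian-shaped likelihood, where the characteristic width works out to δθ = σ√(2π). All constants in this sketch are made up for illustration:

```python
import numpy as np

# Sketch: check the Occam-factor decomposition for a Gaussian-shaped
# likelihood. All constants here are arbitrary illustrative values.
THETA_BEST, SIGMA, L_MAX = 5.0, 0.3, 2.0e-4
PRIOR_LO, PRIOR_HI = 0.0, 10.0            # uniform prior range Delta_theta

def likelihood(theta):
    return L_MAX * np.exp(-0.5 * ((theta - THETA_BEST) / SIGMA) ** 2)

theta = np.linspace(PRIOR_LO, PRIOR_HI, 100001)
dt = theta[1] - theta[0]

# Characteristic width: integral of L(theta) divided by its peak L(Theta)
delta_theta = likelihood(theta).sum() * dt / L_MAX    # ~ sigma * sqrt(2*pi)
occam = delta_theta / (PRIOR_HI - PRIOR_LO)           # Omega_theta = delta/Delta

# Marginal likelihood L(M1) = integral of prior * likelihood
marginal = (likelihood(theta) / (PRIOR_HI - PRIOR_LO)).sum() * dt

print(delta_theta, occam, marginal)
```

Because the prior range Δθ = 10 is much wider than δθ ≈ 0.75, the Occam factor strongly penalizes the extra parameter even though L(Θ) itself is unchanged.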
Spectral Line Revisited

Hypothesis space:
• M1: a Gaussian emission line centered on channel 37.
• M2: no real signal, only noise.

The line model is:
Tfi = T exp[−(i − i0)²/(2σL²)], where i0 = 37 and σL = 2 (channels)

As before, the noise is Gaussian with σn = 1, and prior estimates of the line strength T range from 0.1 to 100.

M1 predicts signal Tfi; M2 predicts signal = 0.0.
Spectral Line Model Comparison

O12 = p(M1|D, I) / p(M2|D, I)
    = [p(M1|I) / p(M2|I)] × [p(D|M1, I) / p(D|M2, I)]
    = [p(M1|I) / p(M2|I)] × B12

Set the prior odds ratio to 1 (then O12 = B12).

For model M1 we need to marginalize over T:
p(D|M1, I) = ∫ dT p(T|M1, I) p(D|T, M1, I)

[Figure: the uniform and Jeffreys priors on T, plotted as PDFs and as probability per log interval, alongside the likelihood]

We calculate the odds ratio for both to contrast the Jeffreys and uniform priors.
Spectral Line Model Comparison (continued)

We already calculated the likelihood p(D|T, M1, I) in the previous lecture. With di = Tfi + ei and Gaussian noise:

p(D|T, M1, I) = p(E1, E2, E3, ..., EN|M1, T, I) = Πi p(Ei|M1, T, I)
              = Πi [1/(σn √(2π))] exp[−(di − Tfi)²/(2σn²)]
              = σn^−N (2π)^(−N/2) exp[−Σi (di − Tfi)²/(2σn²)]
Spectral Line Model Comparison (continued)

So for the uniform prior case we have:

p(D|M1, I) = [1/ΔT] σn^−N (2π)^(−N/2) ∫_Tmin^Tmax dT exp[−Σi (di − Tfi)²/(2σn²)]
           = 1.131 × 10^−38

Now p(D|M1, I) ≈ Lmax(M1) ΩT, and Lmax(M1) = 8.520 × 10^−37.

Then we have an Occam factor for the uniform prior of ΩT = 0.0133.
For the Jeffreys prior case we have:

p(D|M1, I) = [1/ln(Tmax/Tmin)] σn^−N (2π)^(−N/2) ∫_Tmin^Tmax dT (1/T) exp[−Σi (di − Tfi)²/(2σn²)]
           = 1.239 × 10^−37

The Occam factor for the Jeffreys prior is ΩT = 0.145.
Spectral Line Model Comparison (continued)

The likelihood of the no-signal model M2 is simply the product of the Gaussian noise terms:

p(D|M2, I) = σn^−N (2π)^(−N/2) exp[−Σi di²/(2σn²)]
           = 1.133 × 10^−38

M2 has no free parameters, and hence no Occam factor.
Spectral Line Models: Final Odds

Although Lmax(M1) / Lmax(M2) ≈ 75, the final odds are far more modest.

Uniform prior odds ratio:
O12 = 1.131 × 10^−38 / 1.133 × 10^−38 = 0.9985
Probability: p(M1|D, I) = 1 / (1 + 1/O12) = 0.4996
(odds pulled down by the low Occam factor ΩT = 0.0133)

Jeffreys prior odds ratio:
O12 = 1.239 × 10^−37 / 1.133 × 10^−38 = 10.94
Probability: p(M1|D, I) = 1 / (1 + 1/O12) = 0.916, so p(M2|D, I) = 0.084
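The whole calculation can be reproduced in a few lines. This sketch uses *simulated* data (the slide's actual spectrum is not given here), so the resulting numbers will not match the quoted values of 1.131 × 10^−38 etc.; the model constants follow the text, while N, T_TRUE, and the random seed are arbitrary choices of mine.

```python
import numpy as np

# Spectral-line model comparison with simulated data. Model constants from
# the text: line center channel 37, sigma_L = 2, sigma_n = 1, prior
# 0.1 <= T <= 100. N, T_TRUE, and the seed are arbitrary choices.
rng = np.random.default_rng(1)
N, I0, SIGMA_L, SIGMA_N = 64, 37.0, 2.0, 1.0
T_TRUE, T_MIN, T_MAX = 1.5, 0.1, 100.0

i = np.arange(1, N + 1)
f = np.exp(-((i - I0) ** 2) / (2 * SIGMA_L ** 2))   # line profile f_i
d = T_TRUE * f + rng.normal(0.0, SIGMA_N, N)        # simulated spectrum d_i

# Expand chi^2(T) so the log-likelihood vectorizes over a grid of T values
Sdd, Sdf, Sff = d @ d, d @ f, f @ f

def log_like(T):
    chi2 = Sdd - 2 * T * Sdf + T ** 2 * Sff
    return -N * np.log(SIGMA_N) - 0.5 * N * np.log(2 * np.pi) - chi2 / (2 * SIGMA_N ** 2)

T = np.linspace(T_MIN, T_MAX, 400001)
dT = T[1] - T[0]
scale = log_like(Sdf / Sff)                  # peak log-likelihood, for stability
w = np.exp(log_like(T) - scale)

# Evidences relative to exp(scale); the common factor cancels in the odds
ev_uniform = (w / (T_MAX - T_MIN)).sum() * dT
ev_jeffreys = (w / (T * np.log(T_MAX / T_MIN))).sum() * dT
ev_m2 = np.exp(log_like(0.0) - scale)

print("O12 (uniform) :", ev_uniform / ev_m2)
print("O12 (Jeffreys):", ev_jeffreys / ev_m2)
```

As on the slides, the Jeffreys prior gives higher odds for M1 than the uniform prior, because it concentrates more prior mass near the small T values the likelihood actually supports.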
The Laplace Approximation

We have an unnormalized pdf P*(x) whose normalization constant is:
Zp = ∫ dx P*(x)

Taylor expand the logarithm of P*(x) about its peak x0:
ln P*(x) ≈ ln P*(x0) − (c/2)(x − x0)²
where c = −(d²/dx²) ln P*(x) evaluated at x = x0.

Now we can approximate P*(x) by an unnormalized Gaussian:
P*(x) ≈ P*(x0) exp[−(c/2)(x − x0)²]

And we can approximate the normalizing constant Zp by the Gaussian normalization:
Zp ≈ P*(x0) √(2π/c)
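A minimal check of these steps, using P*(x) = x^a e^−x (a gamma shape, chosen by me as a test case because its true normalization is Γ(a + 1)):

```python
import numpy as np

# Laplace approximation for the unnormalized pdf P*(x) = x^a * exp(-x).
A = 10.0

def log_pstar(x):
    return A * np.log(x) - x

# Peak: d/dx log P* = a/x - 1 = 0  =>  x0 = a
x0 = A
# Curvature c = -d^2/dx^2 log P* at x0 = a/x0^2
c = A / x0 ** 2

# Laplace estimate of Zp = P*(x0) * sqrt(2*pi/c)
z_laplace = np.exp(log_pstar(x0)) * np.sqrt(2 * np.pi / c)

# Brute-force numerical integral for comparison (true value is 10! = 3628800)
x = np.linspace(1e-6, 100.0, 400001)
z_numeric = np.exp(log_pstar(x)).sum() * (x[1] - x[0])
print(z_laplace, z_numeric, z_laplace / z_numeric)
```

For a = 10 the Laplace estimate recovers the normalization to within about 1%; it is, in fact, Stirling's approximation to Γ(a + 1).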