Introduction to Bayesian Statistics for non-mathematicians


By: Dr. J. Andrés Christen (Centro de Investigación en Matemáticas, CIMAT; a member of the CONACYT network of centers).

Prerequisites: Elements of calculus and probability (basic).

Length: 5 hours (two sessions of 2.5 hours each).

E-mail: [email protected]

January 2008


Texts:

1 Lee, P. (1994), Bayesian Statistics: An Introduction, London: Edward Arnold.

2 Berger, J. O. (1985), Statistical Decision Theory: Foundations, Concepts and Methods, Second Edition, Springer-Verlag.

3 Bernardo, J. M. and Smith, A. F. M. (1994), Bayesian Theory, Wiley: Chichester, UK.

4 DeGroot, M. H. (1970), Optimal Statistical Decisions, McGraw-Hill: NY.


Texts:

Colin Howson and Peter Urbach (2006), Scientific Reasoning: The Bayesian Approach (3rd Ed.), Open Court.

“Two English philosophers provocatively argue the case for Bayesian logic, with a minimum of complex math. They claim that Bayesian thinking is identical to the scientific method and give fascinating examples of how to analyze beliefs, such as Macbeth’s doubting of the witches’ prophecy, the discovery of Neptune on the strength of faith in Newton’s laws but zero evidence, and why people get hooked on Dianetics.” – Discover.

“For the first time, we have a book that combines philosophical wisdom, mathematical skill, and statistical appreciation, to produce a coherent system.” – Dennis V. Lindley, University College, London (ret.).


Conditional Probability

A cornerstone of Bayesian statistics is its (alternative) definition of probability, a definition sufficiently wide to cover many interesting cases. Let's start with some examples:

1 What is the probability that if I toss a coin it lands on “heads”?

2 What is the probability that your lecturer has more than the equivalent of 50 pesos in his pocket?

3 What is the probability that it rains tomorrow?

4 What is the probability that it rained yesterday in Washington?

5 What is the probability that our Galaxy has more than 10^9 stars?


Moreover...

A piece of maize with several kernels, found in a clay pot believed to belong to the last days of the Mexica empire, is radiocarbon dated.

What is the age of the pot?


All probabilities are conditional (on the person or agent speaking, assumptions made, data used, etc.).

Probability statements go beyond classical favorable/possible-cases calculations.

In Bayesian statistics, all uncertainties about unknowns are measured with a probability distribution.


Informal Bayesian definition of Probability

Probability is an opinion held by an agent, which may be turned into a bet under suitable circumstances.

If you say the probability of an event E is p, then you would take a bet of at most a = (1 − p)/p to 1 on E being true. (For example, p = 1/4 gives odds of at most 3 to 1.)


Preferences among events

Bayesian statistics, unlike other paradigms for inference, is based on a theory, that is, a set of axioms that creates a general procedure to make inferences. We briefly present the theory given in DeGroot (1970, chap. 6). We begin with a quote from DeGroot (1970, p. 70):

...suitable probabilities can often be assigned objectively and quickly because of wide agreement on the appropriateness of a specific distribution for a certain type of problem... On the other hand, there are some situations for which it would be very difficult to find even two people who would agree on the appropriateness of any specific distribution.


We have a total event Ω and a set of events 𝒜 ((Ω, 𝒜) is a measurable space). We write

A ≺ B, A ≻ B, A ∼ B

to mean that A is less (more, equally) likely than B. Also,

A ≼ B

means that A is no more likely than B.


Axioms

A set of axioms is given for the preference relation ≼, for a rational agent:

A complete ordering axiom:

Axiom

For any two events A, B ∈ 𝒜, exactly one of the three following preference relations holds: A ≺ B, A ≻ B, A ∼ B.

A transitivity axiom similar to this (a more general version is needed though):

Axiom

If A, B, C ∈ 𝒜 are three events with A ≼ B and B ≼ C, then A ≼ C.


A non-triviality axiom

Axiom

For any event A ∈ 𝒜, ∅ ≼ A. Moreover, ∅ ≺ Ω.

And a continuity axiom, a technicality needed to be able to work with continuous distributions, like the Gaussian:

Axiom

If A1 ⊃ A2 ⊃ · · · is a decreasing sequence of events in 𝒜 and B ∈ 𝒜 is another event such that Ai ≽ B for all i, then ∩_{i=1}^∞ Ai ≽ B.


The auxiliary experiment

One further axiom is needed. This axiom more or less says that some "standard" events are added to our set of events, and our events are in turn compared with these standard events. Suppose, for example, that we spin a roulette wheel and all events regarding its final position are compared with our "relevant" events.


Bayesian Inference

Uncertainty is quantified with a probability measure

Bayes’ Theorem: Modify our probability measure with evidence

All probability is conditional (on assumptions made, the agent speaking, etc.):

P(· | H), with H = particular context, agent speaking, etc.

Now, let B ∈ 𝒜 be an observable event. What is the probability of A ∈ 𝒜 given that we have observed B?


We are talking about the event A | H, B, and we may calculate its probability by means of

P(A | H, B) = P(A ∩ B | H) / P(B | H),

or

P(A | H, B) = P(B | H, A) P(A | H) / P(B | H).


Let’s look closer at

P(A | H, B) = P(B | H, A) P(A | H) / P(B | H).

P(A | H) is called the a priori, or "prior", probability of A.

P(A | H, B) is called the a posteriori, or posterior, probability of A, given that we have observed B.

P(B | H, A) is our model: how would the observables behave if we knew A? How would the data B look if we knew the unknown A (unknown parameters, for example)?

P(B | H) is a normalization constant. P(· | H, B) is a modified measure, so we may write

P(· | H, B) ∝ P(B | H, ·) P(· | H).

Commonly, conditioning on H is only done implicitly.
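As an illustration (not part of the original slides), here is a minimal R sketch of this updating rule for a discrete unknown; the candidate values, prior and data below are hypothetical.

  # The unknown A is the heads probability of a coin, restricted to three
  # candidate values; the observed event B is "7 heads in 10 tosses".
  theta <- c(0.3, 0.5, 0.7)                    # candidate values of the unknown
  prior <- c(1/3, 1/3, 1/3)                    # P(A | H): prior over the candidates
  lik   <- dbinom(7, size = 10, prob = theta)  # P(B | H, A): the model
  post  <- lik * prior / sum(lik * prior)      # P(A | H, B): normalized posterior
  rbind(theta, prior, posterior = round(post, 3))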


Posterior Distribution

Suppose random variables Xi taking values 0 or 1, that is, Xi | p ∼ Be(p), independent, and suppose uncertainty about p ∈ [0, 1] is quantified with a density f(p), with p ∼ Beta(α, β) a priori. Writing X = (X1, X2, . . . , Xn), we have

P(p ≤ p0 | X) = P(X | p ≤ p0) P(p ≤ p0) / P(X).

But

P(X | p ≤ p0) P(p ≤ p0) = P(X, p ≤ p0) = ∫_0^{p0} f(X, p) dp.

Now f(X, p) = f(X | p) f(p), and then


P(p ≤ p0 | X) ∝ ∫_0^{p0} f(X | p) f(p) dp.

The left-hand side of the above expression is the posterior cdf of p, and thus by definition its posterior density is

f(p | X) ∝ f(X | p) f(p).

Moreover,

f(X | p) = ∏_{i=1}^n f(Xi | p) = p^{Σ_{i=1}^n Xi} (1 − p)^{n − Σ_{i=1}^n Xi}

and

f(p) = B(α, β)^{−1} p^{α−1} (1 − p)^{β−1},

and then

f(p | X) ∝ p^{(α + Σ_{i=1}^n Xi) − 1} (1 − p)^{(β + n − Σ_{i=1}^n Xi) − 1}.


Therefore

p | X ∼ Beta(α + Σ_{i=1}^n Xi, β + n − Σ_{i=1}^n Xi).

We present some priors and posteriors (Beta) for p.
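The slides' figures of these priors and posteriors did not survive the extraction; the following minimal R sketch (with hypothetical data and prior values) produces comparable plots.

  alpha <- 2; beta <- 2                        # a hypothetical Beta(2, 2) prior
  x <- c(1, 1, 0, 1, 1, 0, 1, 1, 0, 1)         # hypothetical 0/1 (Bernoulli) data
  n <- length(x); s <- sum(x)
  a_post <- alpha + s; b_post <- beta + n - s  # posterior Beta(alpha + s, beta + n - s)

  p <- seq(0, 1, length.out = 501)
  plot(p, dbeta(p, a_post, b_post), type = "l", ylab = "density",
       main = "Prior (dashed) and posterior (solid) for p")
  lines(p, dbeta(p, alpha, beta), lty = 2)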


Example 2

We have a couple that has had 5 pregnancies, and all 5 have been male. What is the probability that their next pregnancy results in a female?


1 Are pregnancies independent with respect to the resulting gender?

2 Are there only two possible outcomes?

If so, the Bernoulli inference model explained above is valid and should be used. Check the possibilities in R; a sketch follows below.
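A minimal R sketch (the prior choices here are illustrative assumptions, not from the slides): with a Beta(α, β) prior on p = probability of a female and 0 females in 5 pregnancies, the posterior is Beta(α, β + 5), and its mean gives the predictive probability that the next pregnancy is a female.

  # p = probability of a female birth; data: 0 females in 5 pregnancies.
  next_female <- function(alpha, beta, n_male = 5) {
    a_post <- alpha             # alpha + 0 observed females
    b_post <- beta + n_male     # beta  + 5 observed males
    a_post / (a_post + b_post)  # posterior mean = P(next is female | data)
  }
  next_female(1, 1)     # flat Beta(1, 1) prior        -> 1/7    (about 0.14)
  next_female(50, 50)   # strong prior centred at 1/2  -> 50/105 (about 0.48)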


But the question is... what prior would you use?


Example: Normal sampling

In this case Xi ∼ N(θ, σ²), i = 1, 2, . . . , n (independent), with σ known and θ ∼ N(θ0, σ0²) a priori:

f(θ | X) ∝ exp{ −[ (θ − θ0)² / (2σ0²) + Σ_{i=1}^n (xi − θ)² / (2σ²) ] }.


We see that the posterior is of the form exp h(θ), where h(·) is a quadratic function of θ. Then θ | X has a Normal distribution. Completing the square we obtain

f(θ | X) ∝ exp{ −(θ − θp)² / (2σp²) + C },

where σp² = 1 / (σ0⁻² + nσ⁻²), θp = σp² (θ0/σ0² + n m/σ²), m = (1/n) Σ_{i=1}^n xi, and C does not depend on θ. Then

θ | X ∼ N(θp, σp²).
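A minimal R sketch of these formulas; the prior values and the simulated data below are assumptions made only for illustration.

  set.seed(1)
  sigma  <- 2                            # known data standard deviation
  theta0 <- 0; sigma0 <- 10              # prior: theta ~ N(theta0, sigma0^2)
  x <- rnorm(25, mean = 3, sd = sigma)   # hypothetical data
  n <- length(x); m <- mean(x)

  sigma_p2 <- 1 / (1 / sigma0^2 + n / sigma^2)                  # posterior variance
  theta_p  <- sigma_p2 * (theta0 / sigma0^2 + n * m / sigma^2)  # posterior mean
  c(theta_p = theta_p, sd_p = sqrt(sigma_p2))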


Point and interval estimation

The main objective of any Bayesian analysis is finding the posterior distribution of interest. A secondary (although very important) issue is producing proper summaries of this posterior distribution. For example, if we have

f(θ1, θ2 | X)

(a bivariate distribution), what would you do if only θ1 is of interest?


We need the posterior of θ1, and this may be obtained by marginalization, that is,

f(θ1 | X) = ∫ f(θ1, θ2 | X) dθ2.

This is the so-called marginal posterior density of θ1.

Assuming we have the posterior f(θ | X), we only need to report it somehow: how would you report the posterior distributions shown in the figure below?
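As an aside (not from the slides), marginalization is easy to approximate when the joint posterior is available on a grid; the bivariate density below is a hypothetical stand-in.

  th1 <- seq(-3, 3, length.out = 121)
  th2 <- seq(-3, 3, length.out = 121)
  joint <- outer(th1, th2, function(a, b) exp(-(a^2 - a * b + b^2)))  # unnormalized
  joint <- joint / sum(joint)            # normalize over the grid
  marg_th1 <- rowSums(joint)             # sum out theta2: f(theta1 | X) on the grid
  plot(th1, marg_th1, type = "l", xlab = "theta1", ylab = "marginal posterior")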


[Figure: four posterior densities, panels (a)-(d); axes "Parámetro" (parameter) vs. "Densidad" (density), with panel (d) showing prior and posterior probabilities for the number of species.]

Figure: How would you report the following posterior distributions?


The concept of (point or interval or other) "estimation" in Bayesian statistics is only understood as a summary of the relevant posterior distribution (of course, there are good and bad summaries). Therefore, for example, point estimation may be understood as summarizing a complete probability distribution with a single point, as absurd as this may be.

We could use the expected value of the posterior distribution. Or we could use the maximum of the posterior distribution, the so-called MAP (maximum a posteriori) estimate.
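A minimal R sketch of such summaries for a Beta(a, b) posterior; the values a = 16 and b = 6 are only an example (they happen to match the posterior of the hypothesis-testing example further on).

  a <- 16; b <- 6                            # a hypothetical Beta posterior
  post_mean <- a / (a + b)                   # posterior expected value
  post_map  <- (a - 1) / (a + b - 2)         # posterior mode (MAP), valid for a, b > 1
  cred_95   <- qbeta(c(0.025, 0.975), a, b)  # central 95% posterior interval
  c(mean = post_mean, MAP = post_map)
  cred_95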


Calculus

At the end of the day, we will need:

1 f(X | θ1, θ2, . . . , θn), a model.

2 f(θ1, θ2, . . . , θn), a prior distribution for the parameters.

3 The normalization constant (a numerical sketch follows this list)

f(X) = ∫∫ · · · ∫ f(X | θ1, θ2, . . . , θn) f(θ1, θ2, . . . , θn) dθ1 dθ2 · · · dθn.

4 The posterior itself,

f(θ1, θ2, . . . , θn | X) = f(X | θ1, θ2, . . . , θn) f(θ1, θ2, . . . , θn) / f(X).

5 And summaries of this posterior, like marginal distributions, etc.:

f(θ1 | X) = ∫∫ · · · ∫ f(θ1, θ2, . . . , θn | X) dθ2 dθ3 · · · dθn.
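A minimal numerical sketch of item 3 for the one-parameter Bernoulli/Beta example above; the data counts and prior values are hypothetical.

  # f(X) = integral of f(X | p) f(p) dp, for 7 successes in 10 Bernoulli trials
  s <- 7; n <- 10; alpha <- 2; beta <- 2
  integrand <- function(p) p^s * (1 - p)^(n - s) * dbeta(p, alpha, beta)
  fX <- integrate(integrand, lower = 0, upper = 1)$value   # normalization constant
  post <- function(p) integrand(p) / fX                    # normalized posterior density
  integrate(post, 0, 1)$value                              # integrates to about 1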


Hypotheses testing as an inference problem

If θ ∈ Θ, our interest is in the hypotheses

H1 : θ ∈ Θ1, H2 : θ ∈ Θ2,

where Θ1 and Θ2 form a partition of Θ, that is, Θ1 ∩ Θ2 = ∅ and Θ1 ∪ Θ2 = Θ. In Bayesian statistics terms, given a model f(X | θ), a prior f(θ) and observations X = (X1, X2, . . . , Xn), what could it mean to "test" the above hypotheses? Remember: uncertainty is quantified with a probability measure.


Let f(θ) be a prior for θ. We calculate

P(Hi | X) = ∫_{Θi} f(θ | X) dθ

and "prefer" H1, or say "the data support" H1, if P(H1 | X) > P(H2 | X) (and equivalently for H2). Moreover, we could have more than two hypotheses,

Hi : θ ∈ Θi,

and we would only require the corresponding posterior probability of each of them.


Example: Hypothesis testing

We have an experimental treatment for a condition, used on 20 patients with similar cohort characteristics, of which 15 have recovered from the condition (success). The standard treatment has a probability of success of 50%. The following hypothesis is stated: the experimental treatment is superior to the standard treatment.

The hypotheses can be translated as

H1 : θ > 0.5, H2 : θ ≤ 0.5,

where θ is the probability of success of the experimental treatment. Not much is known about the experimental treatment and a uniform (flat; Beta(1, 1)) prior is used. The corresponding posterior is Beta(16, 6), see figure.
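A short R check of the posterior probabilities of the two hypotheses (a sketch; it reproduces the value quoted below).

  a <- 1 + 15; b <- 1 + 5          # Beta(1, 1) prior + 15 successes, 5 failures
  p_H2 <- pbeta(0.5, a, b)         # P(theta <= 0.5 | X), about 0.0133
  p_H1 <- 1 - p_H2                 # P(theta >  0.5 | X)
  c(H1 = p_H1, H2 = p_H2)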


We have that, a priori, P(H2) = 0.5 and, a posteriori, P(H2 | X) = 0.01330185.

H1 : θ > 0.5, H2 : θ ≤ 0.5

What may we (or you) conclude?


Radiocarbon Calibration, one determination

We have that y ∼ N(µ(θ), σ²). Considering the errors in the calibration curve, the model should be y ∼ N(µ(θ), σ(θ)² + σ²).

Therefore the likelihood is

f(Data | θ) = f(y | θ),

and the posterior is f(θ | y) ∝ f(θ) f(y | θ), that is,

f(θ | y) = K f(θ) (1 / √(σ(θ)² + σ²)) exp{ −(y − µ(θ))² / (2(σ(θ)² + σ²)) },

where K is a normalizing constant.


Solution to the Mexica pot problem

1 All radiocarbon-dated corn kernels are associated with the same calendar date θ.

2 It is assumed that the pot was made "around" the same time as the corn was harvested.

3 Prior information on θ is provided by f(θ).

We have a series of radiocarbon determinations y1, y2, . . . , ym with their standard errors σ1, σ2, . . . , σm, corresponding to m corn kernels.

From point 1 above we have that yj ∼ N(µ(θ), σj²), and we also assume that these are independent, conditional on θ (and the standard errors). Considering the errors in the calibration curve, the model should be

yj ∼ N(µ(θ), σ(θ)² + σj²).


Therefore the likelihood is

f(Data | θ) = f(y1, y2, . . . , ym | θ) = ∏_{j=1}^m f(yj | θ),

and the posterior is f(θ | y1, . . . , ym) ∝ f(θ) ∏_{j=1}^m f(yj | θ), or

f(θ | y1, . . . , ym) = K f(θ) ∏_{j=1}^m (1 / √(σ(θ)² + σj²)) exp{ −(yj − µ(θ))² / (2(σ(θ)² + σj²)) },

where K is a normalizing constant.


Four radiocarbon dates are taken from 4 of the maize kernels. The obtained dates (and their standard errors) are:

sim1: 340 ± 20
sim2: 370 ± 20
sim3: 355 ± 20
sim4: 360 ± 20

The posterior distribution is calculated as above; see the next slide, Figure (a).

However, knowledge of basic Mexican history tells us that the Mexica empire fell to the Conquistador Hernán Cortés in 1521 AD. Including such prior information we obtain the next slide, Figure (b).
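A rough R sketch of this computation (not from the slides): the calibration-curve functions mu_cal and sig_cal below are hypothetical placeholders for a real calibration curve, and the truncated prior encodes θ ≥ 429 BP as in the figure caption.

  mu_cal  <- function(theta) theta           # placeholder calibration curve mean
  sig_cal <- function(theta) 10 + 0 * theta  # placeholder calibration curve error
  y   <- c(340, 370, 355, 360)               # the four determinations from the slide
  sig <- c(20, 20, 20, 20)                   # their standard errors

  theta <- seq(250, 550, by = 1)             # grid of calendar ages (cal BP)
  loglik <- sapply(theta, function(t) {
    v <- sig_cal(t)^2 + sig^2
    sum(-0.5 * log(v) - (y - mu_cal(t))^2 / (2 * v))
  })
  prior <- as.numeric(theta >= 429)          # flat prior truncated at 1521 AD (429 BP)
  post  <- exp(loglik - max(loglik)) * prior
  post  <- post / sum(post)                  # normalized over the grid (constant K)
  plot(theta, post, type = "l", xlab = "Cal. BP", ylab = "Probability")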


[Figure: two panels, (a) and (b), each showing probability vs. Cal. BP from 550 to 250.]

Figure: Posterior distribution for the age of the maize kernels: (a) no prior (constant), (b) a priori distribution indicating θ ≥ 429 BP (= 1521 AD).

