
Chapter 7: Bayesian Econometrics

Christophe Hurlin

University of Orléans

June 26, 2014


Section 1

Introduction


1. Introduction

The outline of this chapter is the following:

Section 2. Prior and posterior distributions

Section 3. Posterior distributions and inference

Section 4. Applications (VAR and DSGE)

Section 5. Numerical simulations

1. Introduction

References

Geweke J. (2005), Contemporary Bayesian Econometrics and Statistics. New York: John Wiley and Sons. (advanced)

Geweke J., Koop G. and Van Dijk H. (2011), The Oxford Handbook of Bayesian Econometrics. Oxford University Press.

Greene W. (2007), Econometric Analysis, sixth edition, Pearson - Prentice Hall.

Greenberg E. (2008), Introduction to Bayesian Econometrics, Cambridge University Press. (recommended)

Koop G. (2003), Bayesian Econometrics. New York: John Wiley and Sons.

Lancaster T. (2004), An Introduction to Modern Bayesian Inference. Oxford University Press.

1. Introduction

Notations: In this course, I will (try to...) follow some conventions of notation:

Y : random variable
y : realisation
fY(y) : probability density function or probability mass function
FY(y) : cumulative distribution function
Pr(.) : probability
y : vector
Y : matrix

Problem: this system of notation does not allow one to discriminate between a vector (matrix) of random elements and a vector (matrix) of non-stochastic elements (realisations).

See Abadir and Magnus (2002), Notation in econometrics: a proposal for a standard, Econometrics Journal.

Section 2

Prior and posterior distributions


2. Prior and posterior distributions

Objectives

The objectives of this section are the following:

1 Introduce the concept of prior distribution

2 Define the hyperparameters

3 Define the concept of posterior distribution

4 Define the concept of unnormalised posterior distribution

2. Prior and posterior distributions

In statistics, there is a distinction between two concepts of probability:

1 Frequentist probability

2 Subjective probability

2. Prior and posterior distributions

Frequentist probability

Frequentists restrict the assignment of probabilities to statements that describe the outcome of an experiment that can be repeated.

Example. Statement A1: a coin tossed three times will come up heads either two or three times. We can imagine repeating the experiment of tossing a coin three times and recording the number of times that two or three heads were reported. Then:

Pr(A1) = lim_{n→∞} (number of times two or three heads occur) / n

2. Prior and posterior distributions

Subjective probability

1 Those who take the subjective view of probability believe that probability theory is applicable to any situation in which there is uncertainty.

2 Outcomes of repeated experiments fall into that category, but so do statements about tomorrow's weather, which are not the outcomes of repeated experiments.

3 Calling probabilities "subjective" does not imply that they can be set arbitrarily; probabilities set in accordance with the axioms are consistent.

2. Prior and posterior distributions

Reminder

The probability of event A is denoted by Pr(A). Probabilities are assumed to satisfy the following axioms:

1 0 ≤ Pr(A) ≤ 1

2 Pr(A) = 1 if A represents a logical truth

3 If A and B describe disjoint events (events that cannot both occur), then Pr(A ∪ B) = Pr(A) + Pr(B)

4 Let Pr(A|B) denote the probability of A, given (or conditioned on the assumption) that B is true. Then:

Pr(A|B) = Pr(A ∩ B) / Pr(B)

2. Prior and posterior distributions

Reminder

Definition. The union of A and B is the event that A or B (or both) occur; it is denoted by A ∪ B.

Definition. The intersection of A and B is the event that both A and B occur; it is denoted by A ∩ B.

2. Prior and posterior distributions

Questions

1 What is the fundamental idea of the posterior distribution?

2 How can it be computed from the likelihood function and the prior distribution?

2. Prior and posterior distributions

Example (Subjective view of probability). Let Y be a binary variable, with Y = 1 if a coin toss results in a head and 0 otherwise, and let:

Pr(Y = 1) = θ

Pr(Y = 0) = 1 − θ

which is assumed to be constant for each trial. In this model, θ is a parameter and the value of Y is the data (realisation y).

2. Prior and posterior distributions

Example (cont'd). Under these assumptions, Y is said to have the Bernoulli distribution, written as:

Y ∼ Be(θ)

with probability mass function (pmf):

fY(y; θ) = Pr(Y = y) = θ^y (1 − θ)^(1−y)

We consider a sample of i.i.d. variables (Y1, .., Yn) that corresponds to n repeated experiments. The realisation is denoted by (y1, .., yn).

2. Prior and posterior distributions

Frequentist approach

1 From the frequentist point of view, probability theory can tell us something about the distribution of the data for a given θ:

Y ∼ Be(θ)

2 The parameter θ is an unknown number between zero and one.

3 It is not given a probability distribution of its own, because it is not regarded as being the outcome of a repeated experiment.

2. Prior and posterior distributions

Fact (Frequentist approach)

In a frequentist approach, the parameters θ are considered as constant terms, and the aim is to study the distribution of the data given θ, through the likelihood of the sample.

2. Prior and posterior distributions

Reminder

The likelihood function is defined to be:

Ln : Θ × R^n → R^+

(θ; y1, .., yn) ↦ Ln(θ; y1, .., yn) = f_{Y1,..,Yn}(y1, y2, .., yn; θ)

Under the i.i.d. assumption:

(θ; y1, .., yn) ↦ Ln(θ; y1, .., yn) = ∏_{i=1}^n fY(yi; θ)

2. Prior and posterior distributions

Remark: the (log-)likelihood function depends on two arguments: the sample (realisation) and θ.

2. Prior and posterior distributions

Reminder

In our example, Y is a discrete variable and the likelihood can be interpreted as the joint probability of observing the sample (realisation) (y1, .., yn) given a value of θ:

Ln(θ; y1, .., yn) = Pr((Y1 = y1) ∩ ... ∩ (Yn = yn))

If the sample (Y1, .., Yn) is i.i.d., then:

Ln(θ; y1, .., yn) = ∏_{i=1}^n Pr(Yi = yi) = ∏_{i=1}^n fY(yi; θ)

2. Prior and posterior distributions

Reminder

In our example, the likelihood of the sample (y1, .., yn) is:

Ln(θ; y1, .., yn) = ∏_{i=1}^n fY(yi; θ) = ∏_{i=1}^n θ^(yi) (1 − θ)^(1−yi)

Or equivalently:

Ln(θ; y1, .., yn) = θ^(Σyi) (1 − θ)^(Σ(1−yi))

2. Prior and posterior distributions

Notations:

In the rest of the chapter, I will use the following alternative notations:

Ln(θ; y) ≡ L(θ; y1, .., yn) ≡ Ln(θ)

ℓn(θ; y) ≡ ln Ln(θ; y) ≡ ln L(θ; y1, .., yn) ≡ ln Ln(θ)

2. Prior and posterior distributions

Subjective (Bayesian) approach

From the subjective point of view, however, θ is an unknown quantity.

Since there is uncertainty over its value, it can be regarded as a random variable and assigned a probability distribution.

Before seeing the data, it is assigned a prior distribution:

π(θ), with 0 ≤ θ ≤ 1

2. Prior and posterior distributions

Definition (Prior distribution)

In a Bayesian framework, the parameters θ associated with the distribution of the data are considered as random variables. Their distribution is called the prior distribution and is denoted by π(θ).

2. Prior and posterior distributions

Remark

In our example, the endogenous variable Yi is discrete (0 or 1):

Yi ∼ Be(θ)

but the parameter θ = Pr(Yi = 1) can be considered as a continuous random variable over [0, 1]: in this case, π(θ) is a pdf.

2. Prior and posterior distributions

Example (Prior distribution). For instance, we may postulate that:

θ ∼ U[0,1]

2. Prior and posterior distributions

Remark

Whatever the type of the distribution of the endogenous variable (discrete or continuous), the prior distribution is generally continuous:

π(θ) = probability density function (pdf)

2. Prior and posterior distributions

Definition (Uninformative prior)

An uninformative prior is a flat prior. Example: θ ∼ U[0,1]

2. Prior and posterior distributions

Remark

In most cases, the prior distribution is parametrised, i.e. the pdf π(θ; γ) depends on a set of parameters γ.

Definition (Hyperparameters). The parameters of the prior distribution, called hyperparameters, are supplied by the researcher.

2. Prior and posterior distributions

Example (Hyperparameters)

If θ ∈ R and the prior distribution is normal:

π(θ; γ) = (1 / (σ√(2π))) exp(−(θ − µ)² / (2σ²))

with γ = (µ, σ²)^⊤ the vector of hyperparameters.

2. Prior and posterior distributions

Example (Beta prior distribution)

If θ ∈ [0, 1], a common (parametrised) prior distribution is the Beta distribution, denoted B(α, β):

π(θ; γ) = (Γ(α + β) / (Γ(α) Γ(β))) θ^(α−1) (1 − θ)^(β−1),  α, β > 0,  θ ∈ [0, 1]

with γ = (α, β)^⊤ the vector of hyperparameters.

2. Prior and posterior distributions

Beta distribution: reminder

The gamma function, denoted Γ(p), is defined to be:

Γ(p) = ∫_0^{+∞} e^(−x) x^(p−1) dx,  p > 0

with:

Γ(p) = (p − 1) Γ(p − 1)

Γ(p) = (p − 1)! = (p − 1)(p − 2) ... 2 × 1 if p ∈ N

Γ(1/2) = √π

2. Prior and posterior distributions

The Beta distribution B (α, β) has very interesting features:

1 Depending on the choice of α and β, this prior can capture beliefs that indicate θ is centered at 1/2, or it can shade θ toward zero or one;

2 It can be highly concentrated, or it can be spread out;

3 When both parameters are less than one, it can have two modes.

2. Prior and posterior distributions

The shape of a Beta distribution can be understood by examining its mean and variance. If θ ∼ B(α, β):

E(θ) = α / (α + β),  V(θ) = αβ / ((α + β)² (α + β + 1))

1 The mean is equal to 1/2 if α = β

2 A larger α (respectively β) shades the mean toward 1 (respectively 0)

3 The variance decreases as α or β increases.

2. Prior and posterior distributions

[Figure: Beta densities for (a, b) = (0.5, 0.5), (1.0, 1.0), (5.0, 5.0), (30.0, 30.0), (10.0, 5.0) and (1.0, 30.0).]
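These panels are easy to reproduce. Below is a minimal sketch in Python (assuming numpy, scipy and matplotlib; the course's own exercises use Matlab) that plots the same six Beta densities.

import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import beta

params = [(0.5, 0.5), (1.0, 1.0), (5.0, 5.0), (30.0, 30.0), (10.0, 5.0), (1.0, 30.0)]
theta = np.linspace(0.001, 0.999, 400)   # avoid the endpoints, where the pdf can diverge
fig, axes = plt.subplots(2, 3, figsize=(10, 5))
for ax, (a, b) in zip(axes.ravel(), params):
    ax.plot(theta, beta.pdf(theta, a, b))   # B(a, b) density on (0, 1)
    ax.set_title(f"a = {a}  b = {b}")
plt.tight_layout()
plt.show()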

2. Prior and posterior distributions

Remark

In some models, the hyperparameters are stochastic: like the parameters of interest θ, they have a distribution of their own.

These models are called hierarchical models.

2. Prior and posterior distributions

Definition (Hierarchical models). In a hierarchical model, we add one or more additional levels, where the hyperparameters themselves are given a prior distribution depending on another set of hyperparameters.

2. Prior and posterior distributions

Example (hierarchical model). An example of a hierarchical model is given by:

y : pdf f(y|θ)
θ : pdf π(θ|α)
α : pdf π(α|β)

where the hyperparameters β are constant terms.

2. Prior and posterior distributions

Definition (Posterior distribution)

Bayesian inference centers on the posterior distribution π(θ|y), which is the conditional distribution of the random variable θ given the data (realisation of the sample) y = (y1, .., yn):

θ | (Y1 = y1, .., Yn = yn) ∼ posterior distribution

2. Prior and posterior distributions

Remark

When there is more than one parameter, the posterior distribution is a joint conditional distribution of all the parameters given the observed data.

2. Prior and posterior distributions

Notations

π(θ) : prior distribution = pdf of θ

π(θ|y) : posterior distribution = conditional pdf

Discrete endogenous variable:

p(y; θ) : probability mass function (pmf)

Pr(A) : probability of event A

Continuous endogenous variable:

fY(y; θ) : probability density function (pdf)

FY(y; θ) : cumulative distribution function

2. Prior and posterior distributions

Warning

For all the general definitions and results, we will employ the notation fY(y; θ) for both probability mass and density functions.

Be careful with the interpretation of fY(y; θ): a density (continuous case) or directly a probability (discrete case).

2. Prior and posterior distributions

Example (Discrete random variable)

If Y ∼ Be(θ), the pmf, given by:

fY(y; θ) = Pr(Y = y) = θ^y (1 − θ)^(1−y)

is a probability.

2. Prior and posterior distributions

Example (Continuous random variable)

If Y ∼ N(µ, σ²) with θ = (µ, σ²)^⊤, the pdf

fY(y; θ) = (1 / (σ√(2π))) exp(−(y − µ)² / (2σ²))

is not a probability. The probability is given by:

Pr(Y ≤ y) = FY(y; θ) = ∫_{−∞}^{y} fY(x; θ) dx

Pr(Y = y) = 0

2. Prior and posterior distributions

The posterior density function π(θ|y) is computed by Bayes' theorem:

Theorem (Bayes' theorem). For events A and B, the conditional probability of event A given that B has occurred is:

Pr(A|B) = Pr(B|A) Pr(A) / Pr(B)

2. Prior and posterior distributions

Bayes' theorem:

Pr(A|B) = Pr(B|A) Pr(A) / Pr(B)

By setting A = θ and B = y, we have:

π(θ|y) = f_{Y|θ}(y|θ) π(θ) / fY(y)

where fY(y) = ∫ f_{Y|θ}(y|θ) π(θ) dθ

2. Prior and posterior distributions

Definition (Posterior distribution). For one observation yi, the posterior distribution is the conditional distribution of θ given yi, defined to be:

π(θ|yi) = f_{Yi|θ}(yi|θ) π(θ) / f_{Yi}(yi)

where

f_{Yi}(yi) = ∫_Θ f_{Yi|θ}(yi|θ) π(θ) dθ

and Θ is the support of the distribution of θ.

2. Prior and posterior distributions

Remark

π(θ|yi) = f_{Yi|θ}(yi|θ) π(θ) / f_{Yi}(yi)

The term f_{Yi|θ}(yi|θ) corresponds to the likelihood of the observation yi:

f_{Yi|θ}(yi|θ) = Li(θ; yi)

2. Prior and posterior distributions

Remark

The effect of dividing by f_{Yi}(yi) is to make π(θ|yi) a normalised probability distribution: integrating π(θ|yi) with respect to θ yields 1:

∫ π(θ|yi) dθ = ∫ f_{Yi|θ}(yi|θ) π(θ) / f_{Yi}(yi) dθ

= (1 / f_{Yi}(yi)) ∫ f_{Yi|θ}(yi|θ) π(θ) dθ

= 1

since f_{Yi}(yi) = ∫_Θ f_{Yi|θ}(yi|θ) π(θ) dθ.

2. Prior and posterior distributions

Discrete variable

For a discrete random variable Yi, by setting A = θ and B = yi, we have:

π(θ|yi) = p(yi|θ) π(θ) / p(yi)

posterior (pdf) = conditional probability × prior (pdf) / probability

where p(yi) = ∫ p(yi|θ) π(θ) dθ

2. Prior and posterior distributions

Definition (Prior predictive distribution). The term p(yi) or f_{Yi}(yi) is sometimes called the prior predictive distribution:

p(yi) = ∫ p(yi|θ) π(θ) dθ

f_{Yi}(yi) = ∫ f_{Yi|θ}(yi|θ) π(θ) dθ

2. Prior and posterior distributions

Remark

1 In general, we consider an i.i.d. sample Y = (Y1, .., Yn) with a realisation (data) y = (y1, .., yn), and not only one observation.

2 In this case, the posterior distribution can be written as a function of the likelihood of the i.i.d. sample y = (y1, .., yn):

Ln(θ; y1, .., yn) = ∏_{i=1}^n Li(θ; yi) = ∏_{i=1}^n fY(yi; θ)

2. Prior and posterior distributions

Definition (Posterior distribution, sample). For the sample (y1, .., yn), the posterior distribution is the conditional distribution of θ given (y1, .., yn), defined to be:

π(θ|y1, .., yn) = Ln(θ; y1, .., yn) π(θ) / f_{Y1,..,Yn}(y1, .., yn)

where Ln(θ; y1, .., yn) is the likelihood of the sample,

f_{Y1,..,Yn}(y1, .., yn) = ∫_Θ Ln(θ; y1, .., yn) π(θ) dθ

and Θ is the support of the distribution of θ.

2. Prior and posterior distributions

Notations

π(θ|y1, .., yn) = Ln(θ; y1, .., yn) π(θ) / f_{Y1,..,Yn}(y1, .., yn)

For simplicity, we drop the notation (y1, .., yn) for the sample and use only the generic term y:

π(θ|y) = Ln(θ; y) π(θ) / fY(y) = f_{Y|θ}(y|θ) π(θ) / fY(y)

2. Prior and posterior distributions

Remark

π(θ|y1, .., yn) = Ln(θ; y1, .., yn) π(θ) / f_{Y1,..,Yn}(y1, .., yn)

In this setting, the data (y1, .., yn) are viewed as constants whose (marginal) distribution does not involve the parameters of interest θ:

f_{Y1,..,Yn}(y1, .., yn) = constant

2. Prior and posterior distributions

Remark (cont'd)

As a consequence, Bayes' theorem

Pr(parameters|data) = Pr(data|parameters) Pr(parameters) / Pr(data)

implies that:

Pr(parameters|data) ∝ Pr(data|parameters) × Pr(parameters)

posterior density ∝ likelihood × prior density

where the symbol "∝" means "is proportional to".

2. Prior and posterior distributions

Definition (Unnormalised posterior distribution). The unnormalised posterior distribution is the product of the likelihood of the sample and the prior distribution:

π(θ|y1, .., yn) ∝ Ln(θ; y1, .., yn) π(θ)

or, with simpler notations:

π(θ|y) ∝ Ln(θ; y) π(θ)

where the symbol "∝" means "is proportional to".

2. Prior and posterior distributions

Example (Posterior distribution)

Consider an i.i.d. sample (Y1, .., Yn) of binary variables, such that Yi ∼ Be(θ) and:

f_{Yi}(yi; θ) = Pr(Yi = yi) = θ^(yi) (1 − θ)^(1−yi)

We assume that the (uninformative) prior distribution for θ is a (continuous) uniform distribution over [0, 1].

Question: Write the pdf associated with the unnormalised posterior distribution and the posterior distribution.

2. Prior and posterior distributions

Solution

π(θ|y1, .., yn) = Ln(θ; y1, .., yn) π(θ) / f_{Y1,..,Yn}(y1, .., yn)

The sample (y1, .., yn) is i.i.d., so its likelihood is defined to be:

Ln(θ; y1, .., yn) = ∏_{i=1}^n f_{Yi}(yi; θ) = ∏_{i=1}^n θ^(yi) (1 − θ)^(1−yi)

So we have:

Ln(θ; y1, .., yn) = θ^(Σyi) (1 − θ)^(Σ(1−yi))

2. Prior and posterior distributions

Solution (cont'd)

π(θ|y1, .., yn) = Ln(θ; y1, .., yn) π(θ) / f_{Y1,..,Yn}(y1, .., yn)

The (uninformative) prior distribution is:

θ ∼ U[0,1]

with pdf:

π(θ) = 1 if θ ∈ [0, 1], 0 otherwise

2. Prior and posterior distributions

Solution (cont'd)

π(θ|y1, .., yn) ∝ Ln(θ; y1, .., yn) π(θ)

Ln(θ; y1, .., yn) = θ^(Σyi) (1 − θ)^(Σ(1−yi))

π(θ) = 1 if θ ∈ [0, 1], 0 otherwise

The unnormalised posterior distribution is:

Ln(θ; y1, .., yn) π(θ) = θ^(Σyi) (1 − θ)^(Σ(1−yi)) if θ ∈ [0, 1], 0 otherwise

2. Prior and posterior distributions

Solution (cont'd)

π(θ|y1, .., yn) = Ln(θ; y1, .., yn) π(θ) / f_{Y1,..,Yn}(y1, .., yn)

The joint density of (Y1, .., Yn) evaluated at (y1, .., yn) is:

f_{Y1,..,Yn}(y1, .., yn) = ∫_0^1 f_{Y1,..,Yn|θ}(y1, .., yn|θ) π(θ) dθ

= ∫_0^1 Ln(θ; y1, .., yn) π(θ) dθ

= ∫_0^1 θ^(Σyi) (1 − θ)^(Σ(1−yi)) dθ

2. Prior and posterior distributions

Solution (cont'd)

π(θ|y1, .., yn) = Ln(θ; y1, .., yn) π(θ) / f_{Y1,..,Yn}(y1, .., yn)

Finally, we have:

π(θ|y1, .., yn) = θ^(Σyi) (1 − θ)^(Σ(1−yi)) / ∫_0^1 θ^(Σyi) (1 − θ)^(Σ(1−yi)) dθ  if θ ∈ [0, 1]

π(θ|y1, .., yn) = 0 if θ ∉ [0, 1]

Note that this is the pdf of a Beta distribution B(1 + Σyi, 1 + n − Σyi): the uniform prior is the special case B(1, 1).

2. Prior and posterior distributions

Example (Posterior distribution)

Consider an i.i.d. sample (Y1, .., Yn) of binary variables, such that Yi ∼ Be(θ) with θ = 0.3. We assume that the (uninformative) prior distribution for θ is a uniform distribution over [0, 1].

Question: Write a Matlab code (i) to generate a sample of size n = 100 and (ii) to display the pdf associated with the prior and the posterior distribution.
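A sketch of this exercise in Python rather than the requested Matlab (assuming numpy, scipy and matplotlib), using the Beta form of the posterior derived above:

import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import beta

rng = np.random.default_rng(0)
n, theta0 = 100, 0.3
y = rng.binomial(1, theta0, size=n)      # (i) simulated Bernoulli(0.3) sample
s = int(y.sum())

grid = np.linspace(0, 1, 500)
prior = np.ones_like(grid)               # U[0,1] prior density
post = beta.pdf(grid, 1 + s, 1 + n - s)  # posterior = B(1 + Σyi, 1 + n − Σyi)

plt.plot(grid, prior, label="Prior")     # (ii) display both densities
plt.plot(grid, post, label="Posterior")
plt.xlabel("theta")
plt.legend()
plt.show()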

2. Prior and posterior distributions

[Figure: unnormalised posterior distribution as a function of θ; before normalisation the vertical scale is of order 10^−28.]

2. Prior and posterior distributions

[Figure: posterior distribution of θ.]


2. Prior and posterior distributions

Example (Posterior distribution)

Consider an i.i.d. sample (Y1, .., Yn) of binary variables, such that Yi ∼ Be(θ) and:

f_{Yi}(yi; θ) = Pr(Yi = yi) = θ^(yi) (1 − θ)^(1−yi)

We assume that the prior distribution for θ is a Beta distribution B(α, β) with pdf:

π(θ; γ) = (Γ(α + β) / (Γ(α) Γ(β))) θ^(α−1) (1 − θ)^(β−1),  α, β > 0,  θ ∈ [0, 1]

with γ = (α, β)^⊤ the vector of hyperparameters.

Question: Write the pdf associated with the unnormalised posterior distribution.

2. Prior and posterior distributions

Solution

The likelihood of the sample (y1, .., yn) is:

Ln(θ; y1, .., yn) = θ^(Σyi) (1 − θ)^(Σ(1−yi))

The prior distribution is:

π(θ; γ) = (Γ(α + β) / (Γ(α) Γ(β))) θ^(α−1) (1 − θ)^(β−1)

So the unnormalised posterior distribution is:

π(θ|y1, .., yn) ∝ Ln(θ; y1, .., yn) π(θ)

∝ (Γ(α + β) / (Γ(α) Γ(β))) θ^(α−1) (1 − θ)^(β−1) θ^(Σyi) (1 − θ)^(Σ(1−yi))

2. Prior and posterior distributions

Solution (cont'd)

π(θ|y1, .., yn) ∝ (Γ(α + β) / (Γ(α) Γ(β))) θ^(α−1) (1 − θ)^(β−1) θ^(Σyi) (1 − θ)^(Σ(1−yi))

or equivalently:

π(θ|y1, .., yn) ∝ (Γ(α + β) / (Γ(α) Γ(β))) θ^(α−1+Σyi) (1 − θ)^(β−1+Σ(1−yi))

Note that the term Γ(α + β) / (Γ(α) Γ(β)) does not depend on θ. So the unnormalised posterior can also be written as:

π(θ|y1, .., yn) ∝ θ^((α+Σyi)−1) (1 − θ)^((β+Σ(1−yi))−1)

2. Prior and posterior distributions

Solution (cont'd)

π(θ|y1, .., yn) ∝ θ^((α+Σyi)−1) (1 − θ)^((β+Σ(1−yi))−1)

Recall that the pdf of a Beta B(α, β) distribution is:

(Γ(α + β) / (Γ(α) Γ(β))) θ^(α−1) (1 − θ)^(β−1)

The posterior distribution is in the form of a Beta distribution with parameters:

α1 = α + ∑_{i=1}^n yi,  β1 = β + n − ∑_{i=1}^n yi

This is an example of a conjugate prior, where the posterior distribution is in the same family as the prior distribution.

2. Prior and posterior distributions

Definition (Conjugate prior). A conjugate prior is such that the posterior distribution is in the same family as the prior distribution.
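This conjugacy is easy to check numerically. In the sketch below (Python, assuming numpy and scipy; the hyperparameters and data are illustrative), normalising likelihood × prior on a grid recovers the B(α + Σyi, β + n − Σyi) density:

import numpy as np
from scipy.stats import beta
from scipy.integrate import trapezoid

rng = np.random.default_rng(1)
a, b = 2.0, 3.0                                  # illustrative prior B(a, b)
y = rng.binomial(1, 0.3, size=50)
s, n = int(y.sum()), y.size

grid = np.linspace(0.001, 0.999, 2000)
unnorm = grid**s * (1 - grid)**(n - s) * beta.pdf(grid, a, b)
post_numeric = unnorm / trapezoid(unnorm, grid)  # brute-force normalisation
post_exact = beta.pdf(grid, a + s, b + n - s)    # conjugate closed form

print(np.abs(post_numeric - post_exact).max())   # ≈ 0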

2. Prior and posterior distributions

Example (Posterior distribution)

Consider an i.i.d. sample (Y1, .., Yn) of binary variables, such that Yi ∼ Be(θ) and:

f_{Yi}(yi; θ) = Pr(Yi = yi) = θ^(yi) (1 − θ)^(1−yi)

We assume that the prior distribution for θ is a Beta distribution B(α, β).

Question: Determine the mean of the posterior distribution.

2. Prior and posterior distributions

Solution

We know that:

π(θ|y1, .., yn) ∝ θ^(α1−1) (1 − θ)^(β1−1)

α1 = α + ∑_{i=1}^n yi,  β1 = β + n − ∑_{i=1}^n yi

A simple way to normalise this posterior distribution consists in writing:

π(θ|y1, .., yn) = (Γ(α1 + β1) / (Γ(α1) Γ(β1))) θ^(α1−1) (1 − θ)^(β1−1)

This expression corresponds to the pdf of a B(α1, β1) distribution, and it is normalised to 1 by definition:

∫_0^1 π(θ|y1, .., yn) dθ = 1

2. Prior and posterior distributions

Solution (cont'd)

π(θ|y1, .., yn) = (Γ(α1 + β1) / (Γ(α1) Γ(β1))) θ^(α1−1) (1 − θ)^(β1−1)

α1 = α + ∑_{i=1}^n yi,  β1 = β + n − ∑_{i=1}^n yi

Using the properties of the B(α1, β1) distribution, we have:

E(θ|y1, .., yn) = α1 / (α1 + β1)

= (α + ∑_{i=1}^n yi) / (α + ∑_{i=1}^n yi + β + n − ∑_{i=1}^n yi)

= (α + ∑_{i=1}^n yi) / (α + β + n)

2. Prior and posterior distributions

Solution (cont'd)

E(θ|y1, .., yn) = (α + ∑_{i=1}^n yi) / (α + β + n)

We can express the mean as a function of the MLE, ȳn = n^(−1) ∑_{i=1}^n yi, as follows:

E(θ|y1, .., yn) = ((α + β) / (α + β + n)) × (α / (α + β)) + (n / (α + β + n)) × ȳn

posterior mean = weight × prior mean + weight × MLE
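A two-line numerical check of this shrinkage identity, with illustrative numbers (prior B(2, 2), 3 successes in n = 10 trials):

a, b, n, s = 2.0, 2.0, 10, 3
ybar = s / n                                   # MLE
post_mean = (a + s) / (a + b + n)              # direct formula
shrunk = (a + b) / (a + b + n) * (a / (a + b)) + n / (a + b + n) * ybar
print(post_mean, shrunk)                       # both 5/14 ≈ 0.357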

2. Prior and posterior distributions

Remark

E(θ|y1, .., yn) = ((α + β) / (α + β + n)) × (prior mean) + (n / (α + β + n)) × ȳn (MLE)

1 If n → ∞, then the weight on the prior mean approaches zero and the weight on the MLE approaches one, implying:

lim_{n→∞} E(θ|y1, .., yn) = ȳn

2 If the sample size is very small, n → 0, then we have:

lim_{n→0} E(θ|y1, .., yn) = α / (α + β)

2. Prior and posterior distributions

Example (Posterior distribution)

Consider an i.i.d. sample (Y1, .., Yn) of binary variables, such that Yi ∼ Be(θ) and:

∑_{i=1}^n yi = 3 if n = 10

∑_{i=1}^n yi = 15 if n = 50

We assume that the prior distribution for θ is a Beta distribution B(2, 2).

Question: Write a Matlab code to plot (i) the prior, (ii) the posterior distribution and (iii) the likelihood in the two cases.

2. Prior and posterior distributions

[Figure: prior, posterior and likelihood for n = 10; the likelihood is of order 10^−3.]

[Figure: prior, posterior and likelihood for n = 50; the likelihood is of order 10^−11.]
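A sketch of this exercise (Python in place of the requested Matlab, assuming scipy and matplotlib); the likelihood is drawn on a secondary axis because its scale is many orders of magnitude smaller than the densities:

import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import beta

a, b = 2, 2                                   # B(2, 2) prior
grid = np.linspace(0.001, 0.999, 500)
fig, axes = plt.subplots(1, 2, figsize=(10, 4))
for ax, (n, s) in zip(axes, [(10, 3), (50, 15)]):
    ax.plot(grid, beta.pdf(grid, a, b), label="Prior")
    ax.plot(grid, beta.pdf(grid, a + s, b + n - s), label="Posterior")
    ax.twinx().plot(grid, grid**s * (1 - grid)**(n - s), "g--")  # likelihood
    ax.set_title(f"n = {n}")
    ax.legend()
plt.tight_layout()
plt.show()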

2. Prior and posterior distributions

Key concepts of Section 2

1 Frequentist versus subjective probability

2 Prior distribution

3 Hyperparameters

4 Prior predictive distribution

5 Posterior distribution

6 Unnormalised posterior distribution

7 Conjugate prior

Section 3

Posterior distributions and inference


3. Posterior distributions and inference

Objectives

The objectives of this section are the following:

1 Generalise the Bayesian approach to a vector of parameters

2 Generalise the Bayesian approach to a regression model

3 Introduce the Bayesian updating mechanism

4 Study the posterior distribution in the case of a large sample

5 Discuss inference in a Bayesian framework

3. Posterior distributions and inference

The concept of posterior distribution can be generalized to:

1 a case with a vector of parameters θ = (θ1, .., θd)^⊤;

2 a model with exogenous variables and/or lagged endogenous variables (linear regression model, time series models, DSGE, etc.).

Sub-Section 3.1

Generalisation to a vector of parameters


3. Posterior distributions and inference
3.1. Generalisation to a vector of parameters

Vector of parameters

Consider a model/variable with a pdf/pmf that depends on a vector of parameters θ = (θ1, .., θd)^⊤.

The previous definitions of likelihood, prior and posterior distributions still apply.

But they are now, respectively, the joint prior distribution and joint posterior distribution of the multivariate random variable θ.

From the joint distributions, we may derive marginal and conditional distributions.

3. Posterior distributions and inference
3.1. Generalisation to a vector of parameters

Definition (Marginal posterior distribution). The marginal posterior distribution of θ1 can be found by integrating out the remaining parameters from the joint posterior distribution:

π(θ1|y) = ∫ π(θ1, .., θd|y) dθ2 .. dθd

3. Posterior distributions and inference
3.1. Generalisation to a vector of parameters

Definition (Conditional posterior distribution). The conditional posterior distribution of θ1 is defined to be:

π(θ1|θ2, .., θd, y) = π(θ1, .., θd|y) / π(θ2, .., θd|y)

where the denominator on the right-hand side is the marginal posterior distribution of (θ2, .., θd), obtained by integrating θ1 out of the joint distribution.

3. Posterior distributions and inference
3.1. Generalisation to a vector of parameters

Remark

In most applications, the marginal distribution of a parameter is more useful than its conditional distribution, because the marginal takes into account the uncertainty over the values of the remaining parameters, while the conditional sets them at particular values.

3. Posterior distributions and inference
3.1. Generalisation to a vector of parameters

Example (Marginal posterior distribution). Consider the multinomial distribution Mn(.), which generalises the Bernoulli example discussed above. In this model each trial, assumed independent of the other trials, results in one of d outcomes, labelled 1, 2, .., d, with probabilities θ1, .., θd, where ∑_{i=1}^d θi = 1. When the experiment is repeated n times and outcome i arises yi times, the likelihood function is:

Ln(θ; y1, .., yd) = θ1^(y1) θ2^(y2) ... θd^(yd),  with ∑_{i=1}^d yi = n

3. Posterior distributions and inference
3.1. Generalisation to a vector of parameters

Example (cont'd). Consider a Dirichlet prior distribution (a generalisation of the Beta):

π(θ) = (Γ(∑_{i=1}^d αi) / ∏_{i=1}^d Γ(αi)) θ1^(α1−1) θ2^(α2−1) ... θd^(αd−1),  αi > 0,  ∑_{i=1}^d θi = 1

Question: Determine the marginal posterior distribution of θ1.

3. Posterior distributions and inference
3.1. Generalisation to a vector of parameters

Solution

Following the previous procedure, we find the (unnormalised) joint posterior distribution given the data y = (y1, .., yd):

π(θ|y) ∝ Ln(θ; y) π(θ)

∝ θ1^(y1) θ2^(y2) ... θd^(yd) × (Γ(∑_{i=1}^d αi) / ∏_{i=1}^d Γ(αi)) θ1^(α1−1) θ2^(α2−1) ... θd^(αd−1)

Or equivalently:

π(θ|y) ∝ θ1^(y1+α1−1) θ2^(y2+α2−1) ... θd^(yd+αd−1)

Remark: the Dirichlet prior is a conjugate prior for the multinomial model.

3. Posterior distributions and inference
3.1. Generalisation to a vector of parameters

Solution (cont'd)

The marginal distribution of θ1 is defined to be:

π(θ1|y) = ∫ π(θ|y) dθ2 .. dθd

In this context, we have:

π(θ1|y) = (Γ(∑_{i=1}^d (αi + yi)) / ∏_{i=1}^d Γ(αi + yi)) θ1^(y1+α1−1) ∫ θ2^(y2+α2−1) ... θd^(yd+αd−1) dθ2 .. dθd

But we can also use some general results about the Dirichlet distribution.

3. Posterior distributions and inference
3.1. Generalisation to a vector of parameters

Solution (cont'd)

Definition (Dirichlet distribution). The Dirichlet distribution generalises the Beta distribution. Let x = (x1, .., xd) with 0 ≤ xi ≤ 1 and ∑_{i=1}^d xi = 1. Then x ∼ D(α1, .., αd) if:

f(x; α1, .., αd) = (Γ(∑_{i=1}^d αi) / ∏_{i=1}^d Γ(αi)) x1^(α1−1) x2^(α2−1) ... xd^(αd−1),  αi > 0

Marginally, we have:

xi ∼ B(αi, ∑_{k≠i} αk)

3. Posterior distributions and inference
3.1. Generalisation to a vector of parameters

Solution (cont'd)

π(θ|y) ∝ θ1^(y1+α1−1) θ2^(y2+α2−1) ... θd^(yd+αd−1)

θ|y ∼ D(y1 + α1, .., yd + αd)

The marginal posterior distribution of θ1 is a Beta distribution:

θ1|y ∼ B(y1 + α1, ∑_{i=2}^d (yi + αi))
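A small numerical sketch of this Dirichlet-multinomial update (Python, assuming numpy and scipy; the prior and the counts are illustrative): posterior draws of θ1 match the Beta marginal stated above.

import numpy as np
from scipy.stats import dirichlet, beta

alpha = np.array([1.0, 2.0, 3.0])           # Dirichlet prior hyperparameters
counts = np.array([10, 5, 35])              # outcome counts over n = 50 trials
post = alpha + counts                       # conjugate update: D(y1+a1, .., yd+ad)

draws = dirichlet.rvs(post, size=200000, random_state=0)
print(draws[:, 0].mean())                   # simulated posterior mean of theta_1
print(beta.mean(post[0], post[1:].sum()))   # mean of B(y1+a1, sum of the rest)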

Sub-Section 3.2

Generalisation to a model


3. Posterior distributions and inference
3.2. Generalisation to a model

Remark: We can also aim at estimating the parameters of a model (with dependent and explanatory variables) such that:

y = g(x; θ) + ε

where θ denotes the vector of parameters, X a set of explanatory variables, ε an error term, and g(.) the link function.

In this case, we generally consider the conditional distribution of Y given X, which is equivalent to the unconditional distribution of the error term ε:

Y|X ∼ D ⟺ ε ∼ D

3. Posterior distributions and inference
3.2. Generalisation to a model

Notations (model)

Let us consider two continuous random variables Y and X.

We assume that Y has a conditional distribution given X = x with a pdf denoted f_{Y|x}(y; θ), for y ∈ R.

θ = (θ1, .., θK)^⊤ is a K × 1 vector of unknown parameters. We assume that θ ∈ Θ ⊆ R^K.

Let us consider a sample {Xi, Yi}_{i=1}^n of i.i.d. random variables and a realisation {xi, yi}_{i=1}^n.

3. Posterior distributions and inference
3.2. Generalisation to a model

Definition (Conditional likelihood function). The (conditional) likelihood function of the i.i.d. sample {Xi, Yi}_{i=1}^n is defined to be:

Ln(θ; y|x) = ∏_{i=1}^n f_{Y|X}(yi|xi; θ)

where f_{Y|X}(yi|xi; θ) denotes the conditional pdf of Yi given Xi.

Remark: The conditional likelihood function is the joint conditional density of the data given θ.

3. Posterior distributions and inference
3.2. Generalisation to a model

Example (Linear Regression Model). Consider the following linear regression model:

yi = xi^⊤ β + εi

where Xi is a K × 1 vector of random variables and β = (β1, .., βK)^⊤ a K × 1 vector of parameters. We assume that the εi are i.i.d. with εi ∼ N(0, σ²). Then the conditional distribution of Yi given Xi = xi is:

Yi|xi ∼ N(xi^⊤ β, σ²)

Li(θ; yi|xi) = f_{Y|x}(yi|xi; θ) = (1 / (σ√(2π))) exp(−(yi − xi^⊤ β)² / (2σ²))

where θ = (β^⊤, σ²)^⊤ is a (K + 1) × 1 vector.

3. Posterior distributions and inference
3.2. Generalisation to a model

Example (Linear Regression Model, cont'd). Then, if we consider an i.i.d. sample {yi, xi}_{i=1}^n, the corresponding conditional (log-)likelihood is defined to be:

Ln(θ; y|x) = ∏_{i=1}^n f_{Y|X}(yi|xi; θ) = ∏_{i=1}^n (1 / (σ√(2π))) exp(−(yi − xi^⊤ β)² / (2σ²))

= (2πσ²)^(−n/2) exp(−(1 / (2σ²)) ∑_{i=1}^n (yi − xi^⊤ β)²)

ℓn(θ; y|x) = −(n/2) ln σ² − (n/2) ln(2π) − (1 / (2σ²)) ∑_{i=1}^n (yi − xi^⊤ β)²
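This log-likelihood translates directly into code; a minimal sketch in Python/numpy, evaluated on simulated data for illustration:

import numpy as np

def gaussian_loglik(beta_vec, sigma2, y, X):
    # log-likelihood of y = X beta + eps, with eps ~ N(0, sigma2) i.i.d.
    resid = y - X @ beta_vec
    n = y.size
    return (-0.5 * n * np.log(sigma2) - 0.5 * n * np.log(2 * np.pi)
            - 0.5 * resid @ resid / sigma2)

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = X @ np.array([1.0, -0.5]) + rng.normal(scale=0.8, size=200)
print(gaussian_loglik(np.array([1.0, -0.5]), 0.8**2, y, X))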

3. Posterior distributions and inference
3.2. Generalisation to a model

Remark: Given this principle, we can derive the (conditional) likelihood and log-likelihood functions associated with a specific sample for any type of econometric model in which the conditional distribution of the dependent variable is known:

Dichotomous models: probit, logit, etc.

Censored regression models: Tobit, etc.

Time series models: AR, ARMA, VAR, etc.

GARCH models

etc.

3. Posterior distributions and inference
3.2. Generalisation to a model

Definition (Posterior distribution, model). For the sample {yi, xi}_{i=1}^n, the posterior distribution is the conditional distribution of θ given the data, defined to be:

π(θ|y, x) = Ln(θ; y|x) π(θ) / f_{Y|X}(y|x)

where Ln(θ; y|x) is the likelihood of the sample,

f_{Y|X}(y|x) = ∫_Θ Ln(θ; y|x) π(θ) dθ

and Θ is the support of the distribution of θ.

Sub-Section 3.3

Bayesian updating


3. Posterior distributions and inference
3.3. Bayesian updating

Bayesian updating

A very attractive feature of Bayesian inference is the way in which posterior distributions are updated as new information becomes available.

3. Posterior distributions and inference
3.3. Bayesian updating

Bayesian updating

As usual, for the first observation y1 we have:

π(θ|y1) ∝ f(y1|θ) π(θ)

Next, suppose that a new data point y2 is obtained, and we wish to compute the posterior distribution given the complete data set, π(θ|y1, y2):

π(θ|y1, y2) ∝ f(y1, y2|θ) π(θ)

The posterior can be rewritten as:

π(θ|y1, y2) ∝ f(y2|y1, θ) f(y1|θ) π(θ)

∝ f(y2|y1, θ) π(θ|y1)

3. Posterior distributions and inference
3.3. Bayesian updating

Definition (Bayesian updating). The posterior distribution is updated as new information becomes available as follows:

π(θ|y1, .., yn) ∝ f(yn|y1, .., y_{n−1}, θ) π(θ|y1, .., y_{n−1})

If the observations are independent (i.i.d. sample), then f(yn|y1, .., y_{n−1}, θ) = f(yn|θ) and:

π(θ|y1, .., yn) ∝ f(yn|θ) π(θ|y1, .., y_{n−1})
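A numerical sketch of this updating rule for the Bernoulli/Beta case (plain Python, with illustrative prior and data): processing the observations one at a time, with yesterday's posterior as today's prior, lands on exactly the same Beta posterior as using the whole sample at once.

a, b = 2.0, 2.0                    # illustrative prior hyperparameters
y = [1, 0, 0, 1, 1, 0, 1, 0, 0, 0]

a_seq, b_seq = a, b
for yi in y:                       # one conjugate update per observation
    a_seq += yi
    b_seq += 1 - yi

a_batch = a + sum(y)               # update with the full sample at once
b_batch = b + len(y) - sum(y)
print((a_seq, b_seq) == (a_batch, b_batch))   # True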

3. Posterior distributions and inference
3.3. Bayesian updating

Example (Bayesian updating). Consider an i.i.d. sample (Y1, .., Yn) of binary variables, such that Yi ∼ Be(θ) and:

f_{Yi}(yi; θ) = Pr(Yi = yi) = θ^(yi) (1 − θ)^(1−yi)

We assume that the prior distribution for θ is a Beta distribution B(α, β) with pdf:

π(θ; γ) = (Γ(α + β) / (Γ(α) Γ(β))) θ^(α−1) (1 − θ)^(β−1),  α, β > 0,  θ ∈ [0, 1]

with γ = (α, β)^⊤ the vector of hyperparameters.

Question: Write the posterior of θ given (y1, y2) as a function of the posterior given y1.

3. Posterior distributions and inference
3.3. Bayesian updating

Solution

The likelihood of (y1, y2) is:

L(θ; y1, y2) = θ^(y1+y2) (1 − θ)^(2−y1−y2)

The prior distribution is:

π(θ; γ) = (Γ(α + β) / (Γ(α) Γ(β))) θ^(α−1) (1 − θ)^(β−1)

So the unnormalised posterior distribution is:

π(θ|y1, y2) ∝ L(θ; y1, y2) π(θ)

∝ (Γ(α + β) / (Γ(α) Γ(β))) θ^(α−1) (1 − θ)^(β−1) θ^(y1+y2) (1 − θ)^(2−y1−y2)

3. Posterior distributions and inference
3.3. Bayesian updating

Solution (cont'd)

π(θ|y1, y2) ∝ (Γ(α + β) / (Γ(α) Γ(β))) θ^(α−1) (1 − θ)^(β−1) θ^(y1+y2) (1 − θ)^(2−y1−y2)

or equivalently:

π(θ|y1, y2) ∝ (Γ(α + β) / (Γ(α) Γ(β))) θ^(α+y1+y2−1) (1 − θ)^(β+2−y1−y2−1)

So:

θ|y1, y2 ∼ B(α + y1 + y2, β + 2 − y1 − y2)

3. Posterior distributions and inference
3.3. Bayesian updating

Solution (cont'd)

π(θ|y1, y2) ∝ (Γ(α + β) / (Γ(α) Γ(β))) θ^(α+y1+y2−1) (1 − θ)^(β+2−y1−y2−1)

θ|y1, y2 ∼ B(α + y1 + y2, β + 2 − y1 − y2)

Given the observation y1, we have:

π(θ|y1) ∝ (Γ(α + β) / (Γ(α) Γ(β))) θ^(α+y1−1) (1 − θ)^(β+1−y1−1)

θ|y1 ∼ B(α + y1, β + 1 − y1)

3. Posterior distributions and inference
3.3. Bayesian updating

Solution (cont'd)

π(θ|y1, y2) ∝ (Γ(α + β) / (Γ(α) Γ(β))) θ^(α+y1+y2−1) (1 − θ)^(β+2−y1−y2−1)

π(θ|y1) ∝ (Γ(α + β) / (Γ(α) Γ(β))) θ^(α+y1−1) (1 − θ)^(β+1−y1−1)

The updating mechanism is given by:

π(θ|y1, y2) ∝ π(θ|y1) θ^(y2) (1 − θ)^(1−y2)

or equivalently:

π(θ|y1, y2) ∝ π(θ|y1) f_{Y2}(y2; θ)

Sub-Section 3.4

Large sample


3. Posterior distributions and inference
3.4. Large sample

Large samples

Although all statistical results for Bayesian estimators are necessarily "finite sample" (they are conditioned on the sample data), it remains of interest to consider how the estimators behave in large samples.

What is the behavior of the posterior distribution when n is large?

π(θ|y1, .., yn)

3. Posterior distributions and inference
3.4. Large sample

Fact (Large sample). Greenberg (2008) summarises the behavior of the posterior distribution when n is large as follows:

(1) the prior distribution plays a relatively small role in determining the posterior distribution;

(2) the posterior distribution converges to a degenerate distribution at the true value of the parameter;

(3) the posterior distribution is approximately normally distributed with mean θ̂, the MLE of θ.

3. Posterior distributions and inference
3.4. Large sample

Large samples

What is the intuition behind the first two results?

1 the prior distribution plays a relatively small role in determining the posterior distribution;

2 the posterior distribution converges to a degenerate distribution at the true value of the parameter.

3. Posterior distributions and inference
3.4. Large sample

Definition (Log-likelihood function). The log-likelihood function is defined to be:

ℓn : Θ × R^n → R

(θ; y1, .., yn) ↦ ℓn(θ; y1, .., yn) = ln(Ln(θ; y1, .., yn)) = ∑_{i=1}^n ln fY(yi; θ)

3. Posterior distributions and inference
3.4. Large sample

Large samples

Introduce the mean log-likelihood contribution ℓ̄(θ; y) (cf. chapter 3):

ℓn(θ; y) = ∑_{i=1}^n ln fY(yi; θ) = n ℓ̄(θ; y)

The posterior distribution can be written as:

π(θ|y) ∝ exp(n ℓ̄(θ; y)) × π(θ)

where the exponential term depends on n and the prior π(θ) does not.

3. Posterior distributions and inference
3.4. Large sample

Large samples

π(θ|y) ∝ exp(n ℓ̄(θ; y)) × π(θ)

For large n, the exponential term dominates π(θ):

The prior distribution plays a relatively smaller role than the data (likelihood function) when the sample size is large.

Conversely, the prior distribution has relatively greater weight when n is small.

3. Posterior distributions and inference
3.4. Large sample

If we denote the true value of θ by θ0, it can be shown that:

lim_{n→∞} n ℓ̄(θ; y) = n ℓ̄(θ0; y)

Then we have, for n large:

π(θ|y) ∝ exp(n ℓ̄(θ0; y)) × π(θ),  ∀θ ∈ Θ

where the exponential term does not depend on θ: whatever the value of θ, the value of π(θ|y) tends to a constant times the prior.

3. Posterior distributions and inference
3.4. Large sample

Example (Large sample). Consider an i.i.d. sample (Y1, .., Yn) of binary variables, such that Yi ∼ Be(θ) with θ = 0.3 and:

f_{Yi}(yi; θ) = Pr(Yi = yi) = θ^(yi) (1 − θ)^(1−yi)

We assume that the prior distribution for θ is a Beta distribution B(2, 2).

Question: Write a Matlab code to illustrate how the posterior distribution changes with n.

3. Posterior distributions and inference
3.4. Large sample

An animation is worth 1,000,000 words... (animation omitted)
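In lieu of the animation, a sketch of the exercise (Python in place of the requested Matlab, assuming scipy and matplotlib): overlaying the Beta posteriors for growing n shows the distribution piling up around the true value θ = 0.3.

import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import beta

rng = np.random.default_rng(0)
a, b, theta0 = 2, 2, 0.3
y = rng.binomial(1, theta0, size=5000)      # one long simulated sample
grid = np.linspace(0.001, 0.999, 500)
for n in (10, 50, 200, 1000, 5000):
    s = int(y[:n].sum())
    plt.plot(grid, beta.pdf(grid, a + s, b + n - s), label=f"n = {n}")
plt.axvline(theta0, color="k", linestyle=":")
plt.xlabel("theta")
plt.legend()
plt.show()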

Sub-Section 3.5

Inference


3. Posterior distributions and inference
3.5. Inference

The output of the Bayesian estimation is the posterior distribution:

π(θ|y)

1 In some particular cases, the posterior distribution is a standard distribution (Beta, normal, etc.) and its pdf has an analytic form.

2 In other cases, the posterior distribution is obtained from numerical simulations (cf. Section 5).

3. Posterior distributions and inference
3.5. Inference

Whatever the case (analytical or numerical), the outcome of the Bayesian estimation procedure may be:

1 The graph of the posterior density π(θ|y) for all values of θ ∈ Θ

2 Some particular moments (expectation, variance, etc.) or quantiles of this distribution

3. Posterior distributions and inference
3.5. Inference

[Figures omitted. Source: Gerali et al. (2014), Credit and Banking in a DSGE Model of the Euro Area, JMCB, 42(6), 107-141.]

3. Posterior distributions and inference
3.5. Inference

But we may also be interested in estimating one parameter of the model.

The Bayesian approach to this problem uses the idea of a loss function:

L(θ̂, θ)

where θ̂ is the Bayesian estimator (cf. chapter 2).

3. Posterior distributions and inference
3.5. Inference

Example (Absolute loss function). The absolute loss function is defined to be:

L(θ̂, θ) = |θ̂ − θ|

Example (Quadratic loss function). The quadratic loss function is defined to be:

L(θ̂, θ) = (θ̂ − θ)²

3. Posterior distributions and inference
3.5. Inference

Definition (Bayes estimator). The Bayes estimator of θ is the value of θ̂ that minimises the expected value of the loss, where the expectation is taken over the posterior distribution of θ; that is, θ̂ is chosen to minimise:

E[L(θ̂, θ)] = ∫ L(θ̂, θ) π(θ|y) dθ

3. Posterior distributions and inference
3.5. Inference

Idea

The idea is to minimise the average loss whatever the possible value of θ:

E[L(θ̂, θ)] = ∫ L(θ̂, θ) π(θ|y) dθ

3. Posterior distributions and inference
3.5. Inference

Under quadratic loss, we minimise:

E[L(θ̂, θ)] = ∫ (θ̂ − θ)² π(θ|y) dθ

By differentiating the function with respect to θ̂ and setting the derivative equal to zero:

2 ∫ (θ̂ − θ) π(θ|y) dθ = 0

or equivalently:

θ̂ = ∫ θ π(θ|y) dθ = E(θ|y)

3. Posterior distributions and inference
3.5. Inference

Definition (Bayes estimator, quadratic loss). For a quadratic loss function, the optimal Bayes estimator is the expectation of the posterior distribution:

θ̂ = E(θ|y) = ∫_Θ θ π(θ|y) dθ
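For the Bernoulli/Beta example, this posterior mean can be checked by quadrature (a sketch assuming scipy; the posterior parameters are illustrative):

import numpy as np
from scipy.stats import beta
from scipy.integrate import trapezoid

a1, b1 = 5.0, 9.0                           # illustrative posterior B(a1, b1)
grid = np.linspace(0, 1, 10001)
post = beta.pdf(grid, a1, b1)
bayes_est = trapezoid(grid * post, grid)    # E(theta | y) by numerical integration
print(bayes_est, a1 / (a1 + b1))            # matches the closed-form mean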

3. Posterior distributions and inference
3.5. Inference

[Table omitted. Source: Greene W. (2007), Econometric Analysis, sixth edition, Pearson - Prentice Hall.]

3. Posterior distributions and inference
3.5. Inference

In addition to reporting a point estimate of a parameter θ, it is often useful to report an interval estimate of the form:

Pr(θL ≤ θ ≤ θU) = 1 − α

Bayesians call such intervals credibility intervals (or Bayesian confidence intervals) to distinguish them from a quite different concept that appears in frequentist statistics, the confidence interval (cf. chapter 2).

3. Posterior distributions and inference
3.5. Inference

Definition (Bayesian confidence interval). If the posterior distribution is unimodal, then a Bayesian confidence interval, or credibility interval, on the value of θ is given by the two values θL and θU such that:

Pr(θL ≤ θ ≤ θU) = 1 − α

where α is the level of risk.

3. Posterior distributions and inference
3.5. Inference

For a Bayesian, values of θL and θU can be determined to obtain the desired probability from the posterior distribution.

If more than one pair is possible, the pair that results in the shortest interval may be chosen; such a pair yields the highest posterior density interval (h.p.d.).

3. Posterior distributions and inference
3.5. Inference

Definition (Highest posterior density interval). The highest posterior density (hpd) interval is the smallest region H such that:

Pr(θ ∈ H) = 1 − α

where α is the level of risk. If the posterior distribution is multimodal, the hpd region may be disjoint.
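For a unimodal Beta posterior, the equal-tailed credibility interval comes straight from the posterior quantiles, and the hpd interval can be found by scanning for the shortest interval with the required coverage (a sketch assuming scipy; the posterior parameters are illustrative):

import numpy as np
from scipy.stats import beta

a1, b1, alpha = 5.0, 9.0, 0.05              # illustrative posterior, 95% level
eq_tail = beta.ppf([alpha / 2, 1 - alpha / 2], a1, b1)

# hpd: among all intervals with coverage 1 - alpha, keep the shortest
lo = np.linspace(0, alpha, 1000)
lower = beta.ppf(lo, a1, b1)
upper = beta.ppf(lo + 1 - alpha, a1, b1)
i = int(np.argmin(upper - lower))
print("equal-tailed:", eq_tail)
print("hpd:", (lower[i], upper[i]))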

3. Posterior distributions and inference
3.5. Inference

[Figure omitted. Source: Colletaz and Hurlin (2005), Modèles non linéaires et prévision, report for l'Institut pour la Recherche CDC.]

3. Posterior distributions and inference
3.5. Inference

Another basic issue in statistical inference is the prediction of new data values.

Definition (Forecasting). The general form of the pdf/pmf of Y_{n+1} given y1, .., yn is:

f(y_{n+1}|y) = ∫ f(y_{n+1}|θ, y) π(θ|y) dθ

where π(θ|y) is the posterior distribution of θ.

3. Posterior distributions and inference
3.5. Inference

Forecasting

f(y_{n+1}|y) = ∫ f(y_{n+1}|θ, y) π(θ|y) dθ

1 If Y is a discrete variable, this formula gives the conditional probability of Y_{n+1} = y_{n+1} given y1, .., yn:

Pr(Y_{n+1} = y_{n+1}|y) = ∫ p(y_{n+1}|θ, y) π(θ|y) dθ

2 If Y is a continuous variable, this formula gives the conditional density of Y_{n+1} given y1, .., yn. In this case, we can compute the expected value of Y_{n+1} as:

E(y_{n+1}|y) = ∫ f(y_{n+1}|y) y_{n+1} dy_{n+1}

3. Posterior distributions and inference
3.5. Inference

Forecasting

f(y_{n+1}|y) = ∫ f(y_{n+1}|θ, y) π(θ|y) dθ

If Y_{n+1} and (Y1, .., Yn) are independent, then f(y_{n+1}|θ, y) = f(y_{n+1}|θ) and we have:

f(y_{n+1}|y) = ∫ f(y_{n+1}|θ) π(θ|y) dθ

3. Posterior distributions and inference
3.5. Inference

Example (Forecasting). Consider an i.i.d. sample (Y1, .., Yn) of binary variables, such that Yi ∼ Be(θ). We assume that the prior distribution for θ is a Beta distribution B(α, β). We want to forecast the value of Y_{n+1} given the realisations (y1, .., yn).

Question: Determine the probability Pr(Y_{n+1} = 1|y1, .., yn).

3. Posterior distributions and inference
3.5. Inference

Solution

In this example the trials are independent, so Y_{n+1} is independent of (Y1, .., Yn):

Pr(Y_{n+1} = 1|y) = ∫ Pr(Y_{n+1} = 1|θ, y) π(θ|y) dθ = ∫ Pr(Y_{n+1} = 1|θ) π(θ|y) dθ

The posterior distribution of θ is in the form of a Beta distribution:

π(θ|y) = (Γ(α1 + β1) / (Γ(α1) Γ(β1))) θ^(α1−1) (1 − θ)^(β1−1)

with

α1 = α + ∑_{i=1}^n yi,  β1 = β + n − ∑_{i=1}^n yi

3. Posterior distributions and inference3.5. Inference

Solution (contd)

Pr (Yn+1 = 1j y) =ZPr (Yn+1 = 1j θ)π ( θj y) dθ

π ( θj y) = Γ (α1 + β1)

Γ (α1) + Γ (β1)θα11 (1 θ)β11

Since Pr (Yn+1 = 1j θ) = θ, we have

Pr (Yn+1 = 1j y) =Γ (α1 + β1)

Γ (α1) + Γ (β1)

1Z0

θ θα11 (1 θ)β11 dθ

=Γ (α1 + β1)

Γ (α1) + Γ (β1)

1Z0

θα1 (1 θ)β11 dθ

Solution (cont'd)

Pr(Y_{n+1} = 1 | y) = [Γ(α1 + β1) / (Γ(α1) Γ(β1))] ∫₀¹ θ^{α1} (1 − θ)^{β1−1} dθ

We admit that:

∫₀¹ θ^{α1} (1 − θ)^{β1−1} dθ = Γ(α1 + 1) Γ(β1) / Γ(α1 + β1 + 1)

So, we have

Pr(Y_{n+1} = 1 | y) = [Γ(α1 + β1) / (Γ(α1) Γ(β1))] × [Γ(α1 + 1) Γ(β1) / Γ(α1 + β1 + 1)]

Solution (cont'd)

Since Γ(p) = (p − 1) Γ(p − 1), we have Γ(α1 + 1) = α1 Γ(α1) and Γ(α1 + β1 + 1) = (α1 + β1) Γ(α1 + β1), so:

Pr(Y_{n+1} = 1 | y) = [Γ(α1 + β1) / (Γ(α1) Γ(β1))] × [α1 Γ(α1) Γ(β1) / (Γ(α1 + β1) (α1 + β1))] = α1 / (α1 + β1)

Solution (cont'd)

Pr(Y_{n+1} = 1 | y) = α1 / (α1 + β1) = (α + ∑_{i=1}^{n} y_i) / (α + β + n)

So, we found that

Pr(Y_{n+1} = 1 | y) = E(θ | y) = (α + ∑_{i=1}^{n} y_i) / (α + β + n)

The estimate of Pr(Y_{n+1} = 1 | y) is the mean of the posterior distribution of θ.
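A minimal Matlab sketch (not from the original slides) that checks this closed form against a Monte Carlo average over posterior draws; the prior hyperparameters and the data are illustrative.

% Predictive probability Pr(Y_{n+1}=1|y): closed form vs posterior simulation
a0 = 1; b0 = 1;                      % hypothetical Beta prior hyperparameters
y = [1 0 1 1 0 1 0 1 1 1];           % hypothetical Bernoulli realisations
n = length(y); s = sum(y);
a1 = a0 + s; b1 = b0 + n - s;        % posterior is Beta(a1,b1)
p_exact = a1/(a1 + b1)               % (alpha + sum(y))/(alpha + beta + n)
p_mc = mean(betarnd(a1,b1,1e5,1))    % Monte Carlo mean of posterior draws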


Key concepts of Section 3

1 Marginal and conditional posterior distribution

2 Bayesian updating

3 Bayes estimator

5 Bayesian confidence interval

5 Highest posterior density interval

6 Bayesian prediction


Section 4

Applications


Objectives

The objectives of this section are the following:

1 To discuss the Bayesian estimation of VAR models

2 To propose various priors for this type of model

3 To introduce the issue of numerical simulations of the posterior

Consider a typical VAR(p):

Y_t = B_1 Y_{t−1} + B_2 Y_{t−2} + .. + B_p Y_{t−p} + D z_t + ε_t    t = 1, .., T

where:

Y_t is an n × 1 vector of endogenous variables

ε_t is an n × 1 vector of error terms, i.i.d. with ε_t ∼ IIN(0_{n×1}, Σ_{n×n})

z_t is a d × 1 vector of exogenous variables

B_i, for i = 1, .., p, is an n × n matrix of parameters

D is an n × d matrix of parameters

Classical estimation of the parameters B_1, .., B_p, D, Σ may yield imprecisely estimated relations that fit the data well only because of the large number of variables included: a problem known as overfitting.

The number of parameters to be estimated, n(np + d + (n + 1)/2), grows geometrically with the number of variables (n) and proportionally with the number of lags (p).

When the number of parameters is relatively high and the sample information is relatively loose (macro-data), it is likely that the estimates are influenced by noise as opposed to signal.

A Bayesian approach to VAR estimation was originally advocated by Litterman (1980) as a solution to the overfitting problem.

Litterman R. (1980), Techniques for Forecasting with Vector AutoRegressions, University of Minnesota, Ph.D. dissertation.

Bayesian estimation is a solution to overfitting that avoids imposing exact zero restrictions on the parameters.

The researcher cannot be sure that some coefficients are zero and should not ignore their possible range of variation.

A Bayesian perspective fits precisely this view through the prior distribution.

Y_t = B_1 Y_{t−1} + B_2 Y_{t−2} + .. + B_p Y_{t−p} + D z_t + ε_t

Rewrite the VAR in a compact form:

Y_t = X_t β + ε_t

X_t = I_n ⊗ W_{t−1}' is n × nk, with k = np + d

W_{t−1} = (Y_{t−1}', .., Y_{t−p}', z_t')' is k × 1

β = vec(B_1, .., B_p, D) is an nk × 1 vector

Definition (Likelihood)

Under the normality assumption, the conditional distribution of Y_t = X_t β + ε_t given X_t and the set of parameters β is normal:

Y_t | X_t, β ∼ N(X_t β, Σ)

and the likelihood of the sample of realisations (Y_1, .., Y_T), denoted Y, is:

L_T(Y | X, β, Σ) ∝ |Σ|^{−T/2} exp( −(1/2) ∑_{t=1}^{T} (Y_t − X_t β)' Σ⁻¹ (Y_t − X_t β) )

By convention, we will denote L_T(Y | X, β, Σ) = L_T(Y, β, Σ).

Definition (Joint posterior distribution)

Given a prior π(β, Σ), the joint posterior distribution of the parameters (β, Σ) is given by

π(β, Σ | Y) = L_T(Y, β, Σ) π(β, Σ) / f(Y)

or equivalently

π(β, Σ | Y) ∝ L_T(Y, β, Σ) π(β, Σ)

Marginal posterior distribution

Given π(β, Σ | Y), the marginal posterior distributions of β and Σ can be obtained by integration:

π(β | Y) = ∫ π(β, Σ | Y) dΣ

π(Σ | Y) = ∫ π(β, Σ | Y) dβ

where dΣ and dβ denote integration over all the elements of Σ and β respectively.

Bayesian estimates

The location and dispersion of π(β | Y) and π(Σ | Y) can easily be analysed to yield point estimates (under a quadratic loss) of the parameters of interest and measures of precision, comparable to those obtained using the classical approach to estimation. Especially:

E(β | Y)    V(β | Y)    E(Σ | Y)

Two problems arise:

1 The numerical integration of the marginal posterior distribution

2 The choice of the prior

Fact (Numerical integration)

In most cases, the numerical integration of π(β, Σ | Y) may be difficult or even impossible to implement. For instance, if n = 1 and p = 4, we have

π(Σ | Y) = ∫ π(β, Σ | Y) dβ = ∫∫∫∫ π(β, Σ | Y) dβ1 dβ2 dβ3 dβ4

This problem, however, can often be solved by using numerical integration based on Monte Carlo simulations.

One particular MCMC (Markov Chain Monte Carlo) estimation method is the Gibbs sampler, which is a particular version of the Metropolis-Hastings algorithm (see Section 5).

Definition (Gibbs sampler)

The Gibbs sampler is a recursive Monte Carlo method that allows one to generate simulated values of (β, Σ) from the joint posterior distribution. This method requires only knowledge of the full conditional posterior distributions of the parameters of interest, π(β | Y, Σ) and π(Σ | Y, β).

Definition (Gibbs algorithm)

Suppose that β and Σ are scalar. The Gibbs sampler algorithm starts from arbitrary values (β⁽⁰⁾, Σ⁽⁰⁾), and samples alternately from the density of each element of the parameter vector, conditional on the value of the other element sampled in the previous iteration. Thus, the Gibbs sampler samples recursively as follows:

β⁽¹⁾ from π(β | Y, Σ⁽⁰⁾)

Σ⁽¹⁾ from π(Σ | Y, β⁽¹⁾)

β⁽²⁾ from π(β | Y, Σ⁽¹⁾)

...

β⁽ᵐ⁾ from π(β | Y, Σ⁽ᵐ⁻¹⁾)

Fact (Gibbs sampler)

The vectors (β⁽ᵐ⁾, Σ⁽ᵐ⁾) form a Markov chain and, for a sufficiently large number of iterations m ≥ M, can be regarded as draws from the true joint posterior distribution π(β, Σ | Y). Given a large sample of draws from this limiting distribution, {(β⁽ᵐ⁾, Σ⁽ᵐ⁾)}_{m=M+1}^{M+G}, any posterior moment or marginal density of interest can then be estimated consistently with its corresponding sample average. For instance:

(1/G) ∑_{m=M+1}^{M+G} β⁽ᵐ⁾ → E(β | Y)

Remark

The process must be started somewhere, though it does not matter much where.

Nonetheless, a burn-in period is required to eliminate the influence of the starting values, so the first M values (β⁽ᵐ⁾, Σ⁽ᵐ⁾) are discarded.

Example (Gibbs sampler)

We consider the bivariate normal distribution first. Suppose we wish to draw a random sample from the population

(X1, X2)' ∼ N( (0, 0)', [1 ρ; ρ 1] )

Question: write a Matlab code to generate a sample of n = 1,000 observations of (x1, x2)' with a Gibbs sampler.

Solution

The Gibbs sampler takes advantage of the result

X1 | x2 ∼ N(ρ x2, 1 − ρ²)

X2 | x1 ∼ N(ρ x1, 1 − ρ²)
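A minimal Matlab sketch of this Gibbs sampler; the value of ρ, the burn-in length and the starting point (0, 0) are illustrative choices.

% Gibbs sampler for the bivariate normal using the two conditionals above
rho = 0.6; n = 1000; burn = 500;
sd = sqrt(1 - rho^2);                 % conditional standard deviation
x = zeros(n + burn, 2);               % start from (0,0); the start matters little
for t = 2:(n + burn)
    x(t,1) = rho*x(t-1,2) + sd*randn; % draw x1 | x2
    x(t,2) = rho*x(t,1)   + sd*randn; % draw x2 | x1
end
x = x(burn+1:end,:);                  % discard the burn-in draws
scatter(x(:,1), x(:,2), '.')          % should mimic the joint distribution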

(Figure: scatter plot of the n = 1,000 simulated pairs (x1, x2) produced by the Gibbs sampler.)

Priors

A second problem in implementing the Bayesian estimation of VAR models is the choice of the prior distribution for the model's parameters:

π(β, Σ)

We distinguish two types of priors:

1 Some priors lead to an analytical formula for the posterior distribution:

- Diffuse prior
- Natural conjugate prior
- Minnesota prior

2 For other priors there is no analytical formula for the posterior and the Gibbs sampler is required:

- Independent Normal-Wishart prior

Definition (Full Bayesian estimation)

A full Bayesian estimation requires specifying a prior distribution for the hyperparameters of the prior distribution of the parameters of interest.

Definition (Empirical Bayesian estimation)

The empirical Bayesian estimation is based on a preliminary estimation of the hyperparameters γ. These estimates could be obtained by OLS, maximum likelihood, GMM or another estimation method. Then, π(θ; γ) is substituted by π(θ; γ̂). Note that the uncertainty in the estimation of the hyperparameters is not taken into account in the posterior distribution.

Empirical Bayesian estimation

In this section, we consider only empirical Bayesian estimation methods:

Y_t = X_t β + ε_t    ε_t ∼ IIN(0, Σ)

Notations:

β̂ = ( ∑_{t=1}^{T} X_t' X_t )⁻¹ ( ∑_{t=1}^{T} X_t' Y_t )    OLS estimator of β

Σ̂ = (1/(T − k)) ∑_{t=1}^{T} (Y_t − X_t β̂)(Y_t − X_t β̂)'    OLS estimator of Σ

Matlab codes for Bayesian estimation of VAR models are proposed by Koop and Korobilis (2010):

1 Analytical results:

- Code BVAR_ANALYT.m gives posterior means and variances of parameters & predictives, using the analytical formulas.
- Code BVAR_FULL.m estimates the BVAR model combining all the priors discussed below, and provides predictions and impulse responses.

2 Gibbs sampler: code BVAR_GIBBS.m estimates this model, but also allows the prior mean and covariance to be specified by the user.

Koop G. and Korobilis D. (2010), Bayesian Multivariate Time Series Methods for Empirical Macroeconomics.

Sub-Section 4.1

Minnesota Prior


Fact (Stylised facts)

Litterman (1986) specifies his prior by appealing to three statistical regularities of macroeconomic time series data:

(1) the trending behavior of macroeconomic time series;

(2) more recent values of a series usually contain more information on the current value of the series than past values;

(3) past values of a given variable contain more information on its current state than past values of other variables.

Litterman R. (1986), Forecasting with Bayesian Vector AutoRegressions, JBES, 4, 25-38.

A Bayesian researcher specifies these regularities by assigning a probability distribution to the parameters in such a way that:

1 the mean of the coefficients assigned to all lags other than the first one is equal to zero;

2 the variance of the coefficients depends inversely on the number of lags;

3 the coefficients of variable j in equation g are assigned a lower prior variance than those of variable g.

These requirements will be controlled by the hyperparameters:

π = (π1, .., πd)' ⟹ prior on B_1, .., B_p, D, Σ

Definition (Minnesota prior, Litterman 1986)

In the Minnesota prior, the variance-covariance matrix Σ is assumed to be fixed and equal to Σ̂. Denote by βk, for k = 1, .., n, the vector of parameters of the k-th equation; the corresponding prior distribution is:

βk ∼ N(β̄k, Ω̄k)

Remark

1 Given these assumptions, there is prior and posterior independence between equations.

2 The equations can be estimated separately.

3 By assuming Ω̄k⁻¹ = 0 (a diffuse prior), the posterior mean of βk corresponds to the OLS estimator of βk.

Litterman (1986) then assigns numerical values to the hyperparameters given the previous stylised facts.

For the prior mean:

1 The traditional choice is to set β̄k = 0 for all the parameters if the k-th variable is a growth rate (random walk behavior).

2 If the variable is in level, all the hyperparameters are set to 0, except the parameter associated with the first own lag.

The prior covariance matrix Ω̄k is assumed to be diagonal with elements ω_{i,i} for i = 1, .., k:

ω_{i,i} = a1/r²  for coefficients on own lags, r = 1, .., p
ω_{i,i} = (a2/r²)(σ_{ii}/σ_{jj})  for coefficients on lag r of variable j ≠ i, r = 1, .., p
ω_{i,i} = a3 σ_{ii}  for coefficients on exogenous variables

This prior simplifies the complicated choice of fully specifying all the elements of Ω̄k to choosing three scalars: a1, a2 and a3.

Theorem (Posterior distribution)

If we assume the Minnesota prior, the posterior distribution of βk is:

βk | Y ∼ N(β̃k, Ω̃k)

β̃k = Ω̃k ( Ω̄k⁻¹ β̄k + σ̂_{kk}⁻² X' Y_k )

Ω̃k = ( Ω̄k⁻¹ + σ̂_{kk}⁻² X' X )⁻¹

where X denotes the matrix of stacked regressors and Y_k the vector of observations on the k-th variable.
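A minimal single-equation Matlab sketch of these formulas (not from the original slides); the data, the prior mean and the prior variances are illustrative, and the fixed residual variance is the usual OLS plug-in.

% Minnesota posterior for one equation (illustrative data and hyperparameters)
T = 200; K = 5;
X = randn(T,K);
y = X*(0.5*ones(K,1)) + randn(T,1);
beta0  = zeros(K,1);                          % prior mean (growth-rate convention)
Omega0 = 0.2*eye(K);                          % diagonal prior variances omega_ii
sig2   = var(y - X*(X\y));                    % fixed residual variance (OLS plug-in)
Omega1 = inv(inv(Omega0) + (X'*X)/sig2);      % posterior variance
beta1  = Omega1*(Omega0\beta0 + (X'*y)/sig2)  % posterior mean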


Sub-Section 4.2

Diffuse Prior

Definition (Diffuse prior)

The diffuse prior is defined as:

π(β, Σ) ∝ |Σ|^{−(n+1)/2}

Remark

π(β, Σ) ∝ |Σ|^{−(n+1)/2}

Contrary to the Minnesota prior:

1 Σ is not assumed to be fixed

2 The equations are not independent

Theorem (Posterior distribution)

For a diffuse prior distribution, the joint posterior distribution is given by:

π(β, Σ | Y) = π(β | Y, Σ) π(Σ | Y)

β | Y, Σ ∼ N(β̂, Σ ⊗ (X'X)⁻¹)

Σ | Y ∼ IW(Σ̂, T − k)

where IW denotes the Inverted Wishart distribution.

Sub-Section 4.3

Natural conjugate prior


Definition (Natural conjugate prior)

The natural conjugate prior has the form:

β | Σ ∼ N(β̄, Σ ⊗ Ω̄)

Σ ∼ IW(Σ̄, α)

with α > n, where IW denotes the Inverted Wishart distribution with α degrees of freedom. The hyperparameters are β̄, Ω̄, Σ̄ and α.

Theorem (Posterior distribution)

The posterior distribution associated with the natural conjugate prior is:

β | Σ, Y ∼ N(β̃, Σ ⊗ Ω̃)

Σ | Y ∼ IW(Σ̃, T + α)

where

β̃ = Ω̃ ( Ω̄⁻¹ β̄ + X'X β̂_ols )

Ω̃ = ( Ω̄⁻¹ + X'X )⁻¹

Sub-Section 4.4

Independent Normal-Wishart Prior


Definition (Independent Normal-Wishart prior)

The independent Normal-Wishart prior is defined as:

π(β, Σ⁻¹) = π(β) π(Σ⁻¹)

where

β ∼ N(β̄, Ω̄)

Σ⁻¹ ∼ W(Σ̄⁻¹, α)

where W(., α) denotes the Wishart distribution with α degrees of freedom.

Remark

1 Note that this prior allows the prior covariance matrix Ω̄ to be anything the researcher chooses, rather than the restrictive Σ ⊗ Ω̄ form of the natural conjugate prior. For instance, the researcher could choose a prior similar in spirit to the Minnesota prior.

2 A noninformative prior can be obtained by setting β̄ = Ω̄⁻¹ = Σ̄⁻¹ = 0.

Theorem (Posterior distribution)

In the case of the independent Normal-Wishart prior, the joint posterior distribution cannot be derived analytically. We can only derive the conditional posterior distributions (used in the Gibbs sampler):

β | Σ, Y ∼ N(β̃, Ω̃)

Σ | Y, β ∼ IW(Σ̃, T + α)

where

β̃ = Ω̃ ( Ω̄⁻¹ β̄ + ∑_{t=1}^{T} X_t' Σ⁻¹ Y_t )

Ω̃ = ( Ω̄⁻¹ + ∑_{t=1}^{T} X_t' Σ⁻¹ X_t )⁻¹
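A Matlab sketch of the corresponding Gibbs cycle, under the assumption that the data arrays Xt{t} (n × nk) and Yt{t} (n × 1), the prior hyperparameters b0, O0, S0, a0 and the constants n, T, M and G are already defined (these names are ours, not Koop and Korobilis'); iwishrnd is the Statistics Toolbox inverse-Wishart generator.

% One Gibbs chain for the independent Normal-Wishart BVAR (sketch)
Sigma = eye(n); draws = zeros(length(b0), M + G);
for m = 1:(M + G)
    P = inv(O0); s = O0\b0;                  % accumulate posterior precision / mean
    for t = 1:T
        P = P + Xt{t}'*(Sigma\Xt{t});
        s = s + Xt{t}'*(Sigma\Yt{t});
    end
    Otil = inv(P);
    beta = Otil*s + chol(Otil,'lower')*randn(length(b0),1);  % beta | Sigma, Y
    S = S0;
    for t = 1:T
        e = Yt{t} - Xt{t}*beta;              % residuals at the current beta
        S = S + e*e';
    end
    Sigma = iwishrnd(S, T + a0);             % Sigma | beta, Y
    draws(:,m) = beta;
end
draws = draws(:, M+1:end);                   % discard the burn-in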


Section 5

Simulation methods

Objectives

The objectives of this section are the following:

1 Introduce the Probability Integral Transform (PIT) method

2 Introduce the Accept-Reject (AR) method

3 Introduce the Importance sampling method

4 Introduce the Gibbs algorithm

5 Introduce the Metropolis-Hastings algorithm

In Bayesian econometrics, we may distinguish two cases, given the prior:

1 When we use a conjugate prior, the posterior distribution is sometimes "standard" (normal, gamma, beta, etc.) and standard packages of statistical software may be used to compute the density π(θ | y), its moments (for instance E(θ | y)), the hpd, etc.

2 In most cases, the posterior density π(θ | y) is obtained numerically through simulation methods.

Sub-Section 5.1

Probability Integral Transform (PIT) method


Probability Integral Transform (PIT) method

Suppose we wish to draw a sample of values from a continuous random variable X that has cdf F_X(.), assumed to be nondecreasing.

The PIT method allows us to generate a sample of X from a sample of random values drawn from U, where U has a uniform distribution over [0, 1]:

U ∼ U[0,1]

Probability Integral Transform (PIT) method

Indeed, assume that the random variable U is a function of the random variable X such that:

U = F_X(X)

What is the distribution of U?

F_X(x) = Pr(X ≤ x) = Pr(F_X(X) ≤ F_X(x)) = Pr(U ≤ F_X(x))

Probability Integral Transform (PIT) method

F_X(x) = Pr(X ≤ x) = Pr(F_X(X) ≤ F_X(x)) = Pr(U ≤ F_X(x))

So, we have:

Pr(U ≤ F_X(x)) = F_X(x)

We know that if U has a uniform distribution then

Pr(U ≤ u) = u

So, U has a uniform distribution.

Definition (Probability Integral Transform (PIT))

If X is a continuous random variable with a cdf F_X(.), then the transformed (probability integral transformation) variable U = F_X(X) has a uniform distribution over [0, 1]:

U = F_X(X) ∼ U[0,1]

How do we get a draw of X from a draw of U?

Definition (PIT algorithm)

In order to get a realisation x of the random variable X with a cdf F_X(.), from a realisation u of the variable U with U ∼ U[0,1], the following procedure has to be adopted:

(1) Draw u from U[0, 1].

(2) Compute x = F_X⁻¹(u).

Example (PIT method)

Suppose we wish to draw a sample from a random variable with density function

f_X(x) = (3/8) x²  if 0 ≤ x ≤ 2,  0 otherwise

Question: Write a Matlab code to generate a sample (x1, .., xn) with n = 100.

Solution

First, determine the cdf of X:

F_X(x) = (3/8) ∫₀ˣ t² dt = (1/8) x³

So, we have:

U = F_X(X) = (1/8) X³ ∼ U[0,1]

Then, determine the probability inverse transformation:

X = F_X⁻¹(U) = 2 U^{1/3}
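A minimal Matlab sketch of the requested code:

% PIT sampling from f(x) = (3/8) x^2 on [0,2]
n = 100;
u = rand(n,1);          % draws from U[0,1]
x = 2*u.^(1/3);         % x = F^{-1}(u)
hist(x)                 % the draws accumulate near 2, as the density implies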

(Figure: the n = 100 simulated values x_i = 2 u_i^{1/3}, which lie in [0, 2].)

Remark

Note that a multivariate random variable cannot be simulated by this method, because its cdf is not one-to-one and therefore not invertible.

Remark

An important application of this method is the problem of sampling from a truncated distribution.

Suppose that X has a cdf F_X(x) and that we want to generate restricted values of X such that

c1 ≤ X ≤ c2

Remark (cont'd)

The cdf of the truncated variable is

[F_X(x) − F_X(c1)] / [F_X(c2) − F_X(c1)]   for c1 ≤ x ≤ c2

Then, we have:

U = [F_X(X) − F_X(c1)] / [F_X(c2) − F_X(c1)] ∼ U[0,1]

and the truncated variable can be defined as:

X_trunc = F_X⁻¹( F_X(c1) + U (F_X(c2) − F_X(c1)) )
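For instance, a minimal Matlab sketch for a standard normal truncated to [c1, c2]; the bounds are illustrative.

% PIT sampling from a N(0,1) truncated to [c1, c2]
c1 = -1; c2 = 2; n = 1000;
u = rand(n,1);
x = norminv(normcdf(c1) + u.*(normcdf(c2) - normcdf(c1)));
% every draw satisfies c1 <= x <= c2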


Recommendation

1 Why use the PIT method?

- In order to generate a sample of simulated values from a given distribution.
- This distribution is not available in my statistical software.

2 What are the prerequisites of the PIT?

- The functional form of the cdf is known and we need to know F⁻¹(x).

Sub-Section 5.2

Accept-reject method


Definition (Accept-reject method)

The accept-reject (AR) algorithm can be used to simulate values from a density function f_X(.) if it is possible to simulate values from a density g(.) and if a number c can be found such that

f_X(x) ≤ c g(x)

for all x in the support of f_X(.).

Definition (Target and source densities)

The density f_X(.) is called the target density and g(.) is called the source density.


Posterior distribution

In the context of Bayesian econometrics, we have:

π(θ | y) ≤ c g(θ | y)   ∀ θ ∈ Θ

c = sup_{θ∈Θ} π(θ | y) / g(θ | y)

where:

1 Target density = posterior distribution π(θ | y)

2 Source density = g(θ | y)

Definition (AR algorithm, posterior distribution)

The Accept-Reject algorithm for the posterior distribution is the following:

(1) Generate a value θ_s from g(θ | y).

(2) Draw a value u from U[0,1].

(3) Return θ_s as a draw from π(θ | y) if

u ≤ π(θ_s | y) / (c g(θ_s | y))

If not, reject it and return to step 1. (The effect of this step is to accept θ_s with probability π(θ_s | y) / (c g(θ_s | y)).)

Example (AR algorithm)

We aim at simulating some realisations from a N(0, 1) distribution (target distribution) with a source density given by a Laplace distribution with pdf:

g(x) = (1/2) exp(−|x|)

Question: write a Matlab code to simulate a sample of n = 1,000 realisations of the normal distribution.

Solution

In order to simplify the problem, we generate only positive values for x.

The source density can then be transformed as follows (exponential density with λ = 1):

g(x) = exp(−x)

Since the normal and the Laplace distributions are symmetric about zero, if the proposal (x > 0) is accepted, it is assigned a positive value with probability one half and a negative value with probability one half.

Solution (cont'd)

The pdfs of the target and source distributions are, for x > 0:

f(x) = (1/√(2π)) exp(−x²/2)    g(x) = exp(−x)

Determination of c: determine the maximum value of f(x)/g(x):

f(x)/g(x) = (1/√(2π)) exp(x − x²/2)

The maximum is reached for x = 1, so we have:

c = (1/√(2π)) exp(1/2)

(Figures: the ratio f(x)/g(x), which peaks at x = 1, and the target density f(x) plotted against the envelope c·g(x), for x ∈ [0, 10].)

Solution (cont'd)

The AR algorithm is the following:

1 Generate x from an exponential distribution with parameter λ = 1.

2 Generate u from a uniform distribution U[0,1].

3 If

u ≤ φ(x) / (c exp(−x)) = [(1/√(2π)) exp(−x²/2)] / [(1/√(2π)) exp(1/2) exp(−x)] = exp(−(x − 1)²/2)

then return x if u > 1/2 and −x if u ≤ 1/2. Otherwise, reject x.
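A Matlab sketch of this algorithm; note that, to keep the sign independent of the acceptance test, the sketch draws a separate uniform for the sign rather than reusing u.

% Accept-reject sampling of N(0,1) from an exponential(1) source
n = 1000; out = zeros(n,1); i = 0;
while i < n
    x = -log(rand);                   % exponential(1) proposal (inverse cdf)
    if rand <= exp(-(x-1)^2/2)        % accept with probability f(x)/(c*g(x))
        i = i + 1;
        out(i) = x*sign(rand - 0.5);  % random sign from a separate uniform
    end
end
% out is (approximately) a sample from N(0,1)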



(Figures: "Generated sample" — the trace of the 1,000 simulated draws — and "Kernel density estimates" of the simulated sample, close to the N(0, 1) density.)

Fact (Unnormalised posterior distribution)

An interesting feature of the AR algorithm is that it allows us to simulate some values of θ from the posterior distribution π(θ | y) by only using the unnormalised posterior:

π(θ | y) = f(y | θ) π(θ) / f(y)

π̃(θ | y) = f(y | θ) π(θ), so that π(θ | y) ∝ π̃(θ | y)

Intuition

π(θ | y) = (1/f(y)) × f(y | θ) π(θ)

Assume that f(y | θ) π(θ) is known, but 1/f(y) is unknown.

If a value of θ generated from g(θ | y) is accepted with probability f(y | θ) π(θ) / (c g(θ | y)), the accepted values of θ are a sample from π(θ | y).

This method can therefore be used even if the normalizing constant of the target distribution is unknown.

Example (AR and posterior distribution)

Consider an i.i.d. sample (Y_1, .., Y_n) of binary variables, such that Y_i ∼ Be(θ) with θ = 0.3. We assume that the (uninformative) prior distribution for θ is a uniform distribution over [0, 1]. Then, we have

π(θ | y) = L_n(θ; y_1, .., y_n) π(θ) / f_{Y_1,..,Y_n}(y_1, .., y_n) = θ^{∑ y_i} (1 − θ)^{∑(1 − y_i)} / ∫₀¹ θ^{∑ y_i} (1 − θ)^{∑(1 − y_i)} dθ

We can use the pdf of the unnormalised posterior π̃(θ | y) to simulate some values (θ_1, .., θ_S) from the posterior distribution:

π̃(θ | y) = θ^{∑ y_i} (1 − θ)^{∑(1 − y_i)}
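A minimal Matlab sketch of this scheme with a uniform source density on [0, 1], so that c is the maximum of π̃(θ | y), attained at the MLE ∑ y_i / n; the sample size and the number of draws are illustrative.

% AR draws from the unnormalised Bernoulli-uniform posterior
y = rand(50,1) < 0.3; n = length(y); s = sum(y);
ptil = @(th) th.^s .* (1-th).^(n-s);    % unnormalised posterior
c = ptil(s/n);                          % sup over [0,1], attained at the MLE
S = 5000; draws = zeros(S,1); i = 0;
while i < S
    th = rand;                          % proposal from the U[0,1] source
    if rand <= ptil(th)/c               % accept with prob ptil/(c*g), g = 1
        i = i + 1; draws(i) = th;
    end
end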


Recommendation

1 Why use the Accept-Reject method?

- In order to generate a sample of simulated values from a given distribution.
- This distribution is not available in my statistical software.
- To generate samples of θ from the unnormalised posterior distribution.

2 What are the prerequisites of the AR method?

- The functional form of the pdf of the target distribution is known.

Sub-Section 5.3

Importance sampling


Suppose that one is interested in calculating the value of the integral

I = E(h(θ) | y) = ∫_Θ h(θ) π(θ | y) dθ

where h(θ) is a continuous function.

Example (Importance sampling)

Suppose that we want to compute the expectation and the variance of the posterior distribution:

E(θ | y) = ∫_Θ θ π(θ | y) dθ

E(θ² | y) = ∫_Θ θ² π(θ | y) dθ

V(θ | y) = E(θ² | y) − E²(θ | y)

I = ∫_Θ h(θ) π(θ | y) dθ

Consider a source density g(θ | y) that is easy to sample from and which is a close match to π(θ | y). Write:

I = ∫_Θ [h(θ) π(θ | y) / g(θ | y)] g(θ | y) dθ

This integral can be approximated by drawing a sample of G values from g(θ | y), denoted θ_1^g, .., θ_G^g, and computing

I ≈ (1/G) ∑_{i=1}^{G} h(θ_i^g) π(θ_i^g | y) / g(θ_i^g | y)

Definition (Importance sampling)

The moment of the posterior distribution

E(h(θ) | y) = ∫_Θ h(θ) π(θ | y) dθ

can be approximated by drawing G realisations θ_1^g, .., θ_G^g from a source density g(θ | y) and computing

E(h(θ) | y) ≈ (1/G) ∑_{i=1}^{G} h(θ_i^g) π(θ_i^g | y) / g(θ_i^g | y)

This expression can be regarded as a weighted average of the h(θ_i^g), where the importance weights are π(θ_i^g | y) / g(θ_i^g | y).

Example (Truncated exponential)

Consider a continuous random variable X with an exponential distribution X ∼ exp(1) truncated to [0, 1]. We want to approximate

E( 1 / (1 + X²) )

by the importance sampling method, with a B(2, 3) source density, because it is defined over [0, 1] and because, for this choice of parameters, the match between the beta density and the target density is good.

Question: write a Matlab code to approximate this integral. Compare it to the value obtained by numerical integration.

Solution

For an exponential distribution with a rate parameter of 1, we have

f(x) = exp(−x)    F(x) = 1 − exp(−x)    for x > 0

If this density is truncated to [0, 1] (truncated at the right), we have

π(x) = f(x) / F(1) = exp(−x) / (1 − exp(−1))

So, we aim at computing:

E( 1 / (1 + X²) ) = ∫₀¹ [1 / (1 + x²)] [exp(−x) / (1 − exp(−1))] dx

Solution (cont'd)

The importance sampling algorithm is the following:

1 Generate a sample of G values x_1, .., x_G from a Beta distribution B(2, 3).

2 Compute

E( 1 / (1 + X²) ) ≈ (1/G) ∑_{i=1}^{G} h(x_i) π(x_i) / g_{2,3}(x_i)

with h(x_i) = 1/(1 + x_i²) and π(x_i) = exp(−x_i)/(1 − exp(−1)), and where g_{α,β}(x) is the pdf of the B(α, β) distribution evaluated at x:

g_{α,β}(x) = x^{α−1} (1 − x)^{β−1} / B(α, β)

and B(α, β) is the beta function.
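A Matlab sketch of this algorithm, with a check against Matlab's numerical integration routine:

% Importance sampling estimate of E[1/(1+X^2)] for the truncated exponential
G = 1e5;
x = betarnd(2,3,G,1);                          % draws from the B(2,3) source
w = (exp(-x)/(1 - exp(-1))) ./ betapdf(x,2,3); % importance weights pi(x)/g(x)
I_is  = mean(w./(1 + x.^2))                    % importance sampling estimate
I_num = integral(@(t) exp(-t)./((1 - exp(-1))*(1 + t.^2)), 0, 1)  % quadrature check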


(Results: figure comparing the importance sampling estimate with the value obtained by numerical integration.)

Recommendation

1 Why use the Importance sampling method?

- In order to compute the moments of a given distribution (typically the posterior distribution).

2 What are the prerequisites of importance sampling?

- The functional form of the pdf of the target distribution is known.

End of Chapter 7
