Evaluating competing predictive distributions. An out-of-sample forecast simulation study.

Bachelor’s Thesis in Statistics

Andreas C. Collett∗

January 7, 2015

Abstract†

This thesis aims to formulate a simple measurement that evaluates predictive distributions of out-of-sample forecasts between two competing models. Predictive distributions form a large part of today's forecast models used for policy making. The possibility to compare predictive distributions between models is important for policy makers who make informed decisions based on probabilities. We conduct simulation studies to estimate autoregressive models and vector autoregressive models with Bayesian inference. The formulated measurement uses out-of-sample forecasts and predictive distributions to evaluate the full forecast error probability distribution by forecast horizon. We find the measurement to be accurate and that it can be used to evaluate single forecasts or to calibrate forecast models.

Keywords: Autoregressive, out-of-sample forecast, Bayesian inference, Gibbs sampling, prior distribution, posterior distribution and predictive distribution.

department of statistics

autumn semester 2014

Course code SU-39434

∗Correspondence to author: [email protected].
†I am deeply grateful to my supervisor Professor Emeritus Daniel Thorburn for his commitment, time, notes and discussions.

Contents

1. Introduction

2. Related Literature

3.1 Bayesian Inference

3.2 Autoregression

3.3 Vector Autoregression

4. Evaluating the Predictive Distribution

5. Simulation Study

6. Hyperparameters

7.1 Results

7.2 Univariate Simulation Results

7.3 Multivariate Simulation Results

8. Conclusion

Appendix

A1. Gibbs Sampler


1. Introduction

This thesis aims to formulate a simple measurement that evaluates predictive distributions of out-of-sample forecasts between two competing models. Predictive distributions form a large part of today's forecast models used for policy making. The possibility to compare predictive distributions between models is important for policy makers who make informed decisions based on probabilities. Out-of-sample forecasts are used to mimic the situation forecasters experience in real time, and are used by academics in forecast methodology research and by practitioners to calibrate forecast models. By combining predictive distributions and out-of-sample forecasts one can evaluate the forecast error probability distribution. Earlier forecast evaluation literature has tended to focus on point forecasts, either taken directly from a model or from a certain value in the predictive distribution, rather than evaluating the full predictive distribution. This results in loss of information about the uncertainty of the forecasts and the forecast model. The contribution of this thesis is to formulate a simple measurement that uses this information to evaluate forecasts at multiple horizons. There is recent research which addresses this subject in various forms: Geweke and Amisano (2008, 2012), Warne et al. (2013) and Bauwens et al. (2014). However, this literature is small relative to the literature on evaluating point forecasts.

We generate data samples from univariate autoregressions (AR) and multivariate vector autoregressions (VAR) with known true parameters. We use Bayesian methods to estimate AR and VAR models on these data to obtain posterior inference and predictive distributions. A restrictive Gibbs sampler is implemented to conduct the Bayesian inference. These methods allow us to produce predictive distributions and to explore the theory and application of Bayesian analysis through a simple example. The restrictive Gibbs sampler is a popular method for obtaining posterior inference in time series analysis. The thesis therefore also gives an introduction to Bayesian analysis and its application in time series.

The structure of the simulated data allows us to use simple statistical theory to evaluate the posterior inference, summarized in Table 1, and the use of out-of-sample forecasts allows us to evaluate which model produces the most accurate predictive distribution. The diagonal in Table 1 represents the situation when the simulated data is estimated with the correct model. The lower left outcome occurs when the data is simulated from an AR model but is estimated with a VAR model. In this case the VAR model will not suffer from misspecification but will include irrelevant independent variables. This will cause a (small) increase in the estimated variance, which in turn will lead to a wider predictive distribution. The upper right outcome occurs when the data is simulated from a VAR model but is estimated with an AR model. In this case the AR model will be misspecified, i.e. the model suffers from omitted variable bias. This will cause a (large) increase in the estimated variance, which in turn will lead to a wider predictive distribution. The formulated measurement accounts for both the size of the forecast error and the probability that this forecast error would occur.

Table 1: Underlying Statistical Theory.

                     Simulation of Data
                     Univariate                        Multivariate
Model  AR(p)         Optimal                           Misspecification
       VAR(p)        Irrelevant Independent Variables  Optimal

We formulate a measurement that uses out-of-sample forecasts and predictive distributions to evaluate the full forecast error probability distribution by forecast horizon. We are able to validate the accuracy of the measurement against statistical theory, but we find that the autoregressive model and the vector autoregressive model with the same lag length have difficulty producing dissimilar predictive distributions. However, we are able to separate the predictive distributions of the models by letting both be correctly specified but allowing for a high degree of correlation between the error terms across equations in the vector autoregressive model. From this we find that the formulated measurement is able to measure the accuracy of the full forecast error distribution. The measurement can be used as a forecast evaluation technique for single forecasts or to calibrate forecast models.

The rest of the thesis is structured as follows. Section 2 presents the empirical research closest to the research question. Section 3 describes the empirical methodology. Section 4 describes the evaluation method for the predictive distribution. Section 5 describes the simulation method. Section 6 accounts for the selection of the hyperparameters in the priors. Section 7 discusses the results and model comparisons. Section 8 concludes.


2. Related Literature

We present two articles that evaluate predictive distributions. The methods used are technical and we will not go into depth describing them. Instead we mention the methods and give a brief summary of the results. Interested readers are referred to the articles under consideration.

Geweke and Amisano (2012) compare the forecast performance of, and construct model combinations for, three models: the dynamic factor model, the dynamic stochastic general equilibrium model and the vector autoregressive model. They use several analytical techniques to evaluate forecast performance and to construct model combinations: pooling of predictive densities, analysis of predictive variances, probability integral transform tests and Bayesian model averaging. They find two improvements that increase forecast accuracy substantially. The first is to use the full Bayesian predictive distribution instead of the posterior mode for the parameters. The second is to construct the model combination as an equally weighted pool of the predictive densities from the three models, instead of relying on the individual predictive distribution from each model. This result is considerably better than when Bayesian model averaging is used for the same purpose.

Bauwens et al. (2014) compare the forecast performance of two models that allow for structural breaks against a wide range of alternative models which do not. They evaluate forecast performance by two metrics. First, they use root mean squared forecast errors (RMSE) to evaluate point forecasts, with the median of the predictive distribution used as the point forecast. Second, they use the average of log predictive likelihoods (APL), which is the predictive density evaluated at the observed outcome. The APL is estimated by a nonparametric kernel smoother, using draws from the predictive simulator. They find that no single model is consistently better than the alternatives in the presence of structural breaks. One source of this uncertainty about the forecast performance is that the two metrics yield substantially different conclusions: the structural break models seem to dominate the non-structural break models in terms of RMSE, but the opposite is often true in terms of APL.


3.1 Bayesian Inference

To describe Bayesian inference we examine the simple linear regression model

y_t = X_t β + ε_t    (1)

where y_t is a T×1 vector representing the dependent variable, X_t is a T×K matrix and ε_t ∼ iid N(0, σ²); T is the number of observations and K is the number of independent variables. Our purpose is to obtain estimates of the K×1 vector β and the scalar σ². These can be obtained by maximizing the likelihood function

l(y_t | β, σ²) = (2πσ²)^{−T/2} exp[ −(y_t − X_t β)'(y_t − X_t β) / (2σ²) ]    (2)

which yields the maximum likelihood estimates³ β_MLE = (X_t'X_t)⁻¹X_t'y_t and σ²_MLE = (y_t − X_t β_MLE)'(y_t − X_t β_MLE)/T. According to the likelihood principle the likelihood function contains all the information in the data about the parameters β and σ². This is where the difference between classical (or frequentist) inference and Bayesian inference becomes apparent. Bayesian inference incorporates prior beliefs about the parameters into the estimation process in the form of probability distributions. This results in the joint posterior distribution

p(β, σ² | y_t) = l(y_t | β, σ²) p(β, σ²) / p(y_t) ∝ l(y_t | β, σ²) p(β, σ²)    (3)

where p(β, σ²) is the prior distribution and p(y_t) is the density of the data, or marginal likelihood. The marginal likelihood p(y_t) does not depend on β or σ² and can thus be treated as a constant. This yields the unnormalized joint posterior distribution, which is proportional to the likelihood function times the prior distribution. However, Karlsson (2013) stresses the importance of the marginal likelihood for model comparison.

³Note that β_MLE is equal to the ordinary least squares estimator, β_OLS, while σ²_MLE is a biased estimate of the variance because it does not deduct the number of estimated parameters from the number of observations in the denominator, as σ²_OLS does.

Several conclusions can be drawn from the joint posterior distribution. First, the joint posterior distribution represents the probability distribution of the parameters β and σ² when the prior distribution has been updated with the information in the observed data y_t. Second, if the prior distribution is vague (or flat) then it can be considered almost constant; this causes the estimates to be similar to those of classical inference, i.e. the likelihood function will determine the estimates. This also occurs when the information in the data is rich, i.e. a large number of observations. There is a large literature on Bayesian inference in macroeconomics, where the data tend to have a small number of observations and the model requires a large number of parameters to be estimated, for example the vector autoregressive model. Given the joint posterior distribution, the marginal posterior distributions conditional on the data, p(β | y_t) and p(σ² | y_t), can be obtained by integrating σ² and β out of the joint posterior distribution, one at a time:

p(β | y_t) = ∫ p(β, σ² | y_t) dσ²    (4)

p(σ² | y_t) = ∫ p(β, σ² | y_t) dβ.    (5)
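The proportionality in (3) and the marginalizations in (4) and (5) can be illustrated numerically. The sketch below is not part of the thesis: it uses made-up data, a flat prior, and a crude grid in place of the integrals, with summation standing in for integration:

```python
import numpy as np

rng = np.random.default_rng(5)
y = rng.normal(2.0, 1.0, size=50)   # toy data for the model y_t = beta + eps_t

# Grids for beta and sigma^2
betas = np.linspace(0.0, 4.0, 201)
sig2s = np.linspace(0.2, 3.0, 141)
Bg, Sg = np.meshgrid(betas, sig2s, indexing="ij")

# Unnormalized log joint posterior, eq. (3), with a flat prior p(beta, sigma^2) = const
n = len(y)
ss = (y ** 2).sum() - 2.0 * Bg * y.sum() + n * Bg ** 2   # sum of squared residuals
log_post = -0.5 * n * np.log(2 * np.pi * Sg) - ss / (2 * Sg)

post = np.exp(log_post - log_post.max())
post /= post.sum()                 # normalize the joint posterior on the grid

marg_beta = post.sum(axis=1)       # eq. (4): sum out sigma^2
marg_sig2 = post.sum(axis=0)       # eq. (5): sum out beta
```

With a flat prior the mode of the marginal of β sits at the sample mean, as classical theory predicts.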

For the simple regression model specified there exist analytical (or closed-form) results for the integrals. But for more complex models or particular prior distributions analytical results may not exist; then numerical or simulation techniques, such as Markov chain Monte Carlo (MCMC) simulation, are required to obtain estimates of β and σ². We will impose the restriction of stability on our AR and VAR processes, i.e. we are restricted to evaluating a certain range of the distributions of β. This will be implemented by the Gibbs sampler (see Appendix A1), which will allow us to sample from the restricted range of this distribution. Even when analytical results exist and no restrictions are imposed, there are still situations where simulation is suitable; this is the case in forecasting. Forecasts beyond the one-step-ahead horizon are nonlinear and can only be obtained by simulation.

The essential key in Bayesian inference is the prior belief of the researcher, i.e. the prior distribution p(β, σ²) in (3). The prior distribution allows the researcher to address the uncertainty about the parameters before the data has been taken into account; this is done by specifying a probability distribution for each parameter. Prior distributions are classified into two categories, noninformative and informative. Noninformative prior distributions are used when the researcher does not have prior beliefs about the parameters, when the prior beliefs exist at a third party, i.e. are not known, and for scientific reports where differences in prior beliefs could affect the result. The noninformative prior distribution puts a uniform distribution on the parameters, which forces the estimates to be determined by the data while still reaping the benefits of Bayesian analysis. Informative prior distributions are used when the researcher has prior beliefs about the parameters and incorporates these beliefs into the prior distribution. This is accomplished by assigning hyperparameters⁴ or by restricting the parameter range. It is typically difficult to assess the prior belief in practice; therefore it is essential that the joint posterior distribution is proper and that the posterior inference is assessed with sensitivity analysis. According to Sun and Ni (2004), there exist situations in which the posterior is improper even though the full conditional distributions used for MCMC are all proper.

One reason for using Bayesian inference is that it produces predictive distributions. This enables assessment of the probability of an outcome, which is more coherent with policy decisions than evaluating a certain point forecast of a model⁵. The essence of Bayesian inference is that the predictive distribution accounts for the uncertainty about the future and that the joint posterior distribution accounts for the uncertainty about the parameters:

p(y_{t+1:t+h}) = ∫∫ f(y_{t+1:t+h} | y_t, β, σ²) p(β, σ² | y_t) dβ dσ².    (6)

This results in a predictive distribution of forecasts p(y_{t+h}) at each forecast horizon h. The predictive distribution enables the researcher to address the probability that a certain outcome will occur. This is useful in many ways: the predictive distribution can be described by measures of central tendency, and distressed scenarios can be assessed by evaluating quantiles of the predictive distribution.
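For instance, given an array of stored draws from a predictive distribution, central tendency and tail scenarios can be read off directly; the draws below are hypothetical stand-ins, not output from the thesis's models:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for the M - B stored draws from the predictive
# distribution at one forecast horizon.
draws = rng.normal(loc=1.5, scale=2.0, size=3000)

point = np.median(draws)                   # a measure of central tendency
lo, hi = np.quantile(draws, [0.05, 0.95])  # 90% predictive interval
stress = np.quantile(draws, 0.01)          # a distressed (1st percentile) scenario
```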

3.2 Autoregression

We will use Bayesian methods to estimate the parameters in an autoregressive process, AR(p), where p is the number of lags to use for the univariate time series y_t:

y_t = Σ_{i=1}^{p} β_i y_{t−i} + ε_t,  t = 1, ..., T    (7)

where ε_t ∼ iid N(0, σ²). This model is the same as in (1), where X_t is a T×p matrix (without a constant) consisting of p lags of the time series y_t. An important difference between the models in (1) and (7) is that in the AR model the dependent variable is not independently and identically distributed: y_t depends on past values of itself.

⁴The hyperparameters represent the parameters in the prior distribution and are called hyperparameters to distinguish them from the parameters in the model.

⁵There are methods in the frequentist framework to generate an approximate predictive distribution, for example bootstrapping. These distributions are, however, mostly tighter than the Bayesian predictive distribution because they do not take into account the uncertainty about the parameters. These methods will not be evaluated in this thesis.

The Normal-Gamma prior distribution is the conjugate prior for the normal distribution: the posterior distribution and the prior distribution are in the same family of distributions. In the Normal-Gamma prior, the parameter vector β is normally distributed conditional on the variance σ², and the variance follows the (inverse) Gamma distribution. The prior for β is

p(β | σ²) ∼ N(β_0, σ²H)    (8)

where β_0 is a p×1 vector representing the researcher's prior beliefs about the parameter values. The prior variance-covariance matrix σ²H equals the researcher's prior belief about the variance, σ², times the p×p diagonal matrix H representing the researcher's uncertainty about the parameters. Larger values on the diagonal of σ²H result in larger variances around the prior means. The prior for the variance is

p(σ²) ∼ iΓ(α/2, θ_0/2)    (9)

where α represents the prior degrees of freedom and θ_0 the prior scale parameter. Holding the prior degrees of freedom fixed and letting the prior scale increase results in an (inverse) Gamma distribution with an increasing mean, i.e. the prior belief about the value of σ² increases. Holding the prior scale fixed and letting the prior degrees of freedom increase results in an (inverse) Gamma distribution that is more tightly centred around the mean, i.e. the prior belief about σ² becomes tighter. This is illustrated in Figure 1. Specifying the prior belief depends on several factors. Practitioners set the prior beliefs to their own views, while researchers tend to set them to the OLS estimates to let the data influence the estimates more than the prior beliefs; this is viewed as the more coherent and accepted academic approach. But it also depends on the number of observations and parameters: if there is a large number of observations relative to the number of parameters, the influence of the data will be stronger. If there is a small number of observations relative to the number of parameters, known in the literature as overparametrization, then one must specify a strong prior.


Figure 1: The left panel illustrates the effect on the (inverse) Gamma distribution as the scale parameter θ_0 takes the values {1, 2, 3, 4}, holding the degrees of freedom constant at α = 1. The right panel illustrates the effect on the (inverse) Gamma distribution as the degrees of freedom α takes the values {1, 2, 3, 4}, holding the scale parameter constant at θ_0 = 1.
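The two effects described above can be checked by simulation. The sketch below draws from iΓ(α/2, θ_0/2) via the chi-square construction that Table 2 later uses inside the Gibbs sampler; the parameter values are illustrative only:

```python
import numpy as np

rng = np.random.default_rng(1)

def inv_gamma_draws(alpha, theta0, n=20000):
    # theta0 divided by a chi-square variate with alpha degrees of freedom
    # is a draw from iGamma(alpha/2, theta0/2)
    x = rng.standard_normal((n, alpha))
    return theta0 / (x ** 2).sum(axis=1)

base = inv_gamma_draws(alpha=6, theta0=2.0)
larger_scale = inv_gamma_draws(alpha=6, theta0=8.0)   # theta0 up: mean increases
larger_dof = inv_gamma_draws(alpha=30, theta0=2.0)    # alpha up: relatively tighter
```

Increasing θ_0 shifts the mean up, while increasing α shrinks the spread relative to the mean, matching the description of Figure 1.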

The conditional posterior distributions of β and σ² are

p(β | σ², y_t) = N(M, V)    (10)

p(σ² | β, y_t) = iΓ(τ_1/2, θ_1/2)    (11)

where

M = (H⁻¹ + (1/σ²) X_t'X_t)⁻¹ (H⁻¹β_0 + (1/σ²) X_t'y_t)    (12)

V = (H⁻¹ + (1/σ²) X_t'X_t)⁻¹    (13)

τ_1 = α + T    (14)

θ_1 = θ_0 + (y_t − X_t β)'(y_t − X_t β)    (15)

and β_0, H, α and θ_0 are hyperparameters specified by the researcher. Note that analytical results exist for the Normal-Gamma prior distribution, but we will use the Gibbs sampler to obtain parameter estimates and predictive distributions. Table 2 describes the implemented restrictive Gibbs sampler.


Table 2: Restrictive Gibbs Sampler for an AR(p) model.

To illustrate the Gibbs sampler we examine the first sample, m = 1. We start by sampling β^(1) from p(β | (σ²)^(0), y_t) in (10). It is important to note that (12) and (13) depend on σ², so the Gibbs sampler needs an initial value for this parameter, denoted (σ²)^(0), which is specified by the researcher. We set (σ²)^(0) to the OLS estimate σ̂². Having obtained M in (12) and V in (13), we can sample β^(1) from p(β | (σ²)^(0), y_t) by

β^(1) = M + [r V^{1/2}]'

where r is a 1×p vector of draws from the standard normal distribution and V^{1/2} is the Cholesky decomposition of V. We impose the restriction that β^(1) must come from a stable AR process, i.e. all roots z of the polynomial β_p(z) = 1 − β_1 z − β_2 z² − ... − β_p z^p must have modulus greater than one, |z| > 1. Once β^(1) is obtained we can sample (σ²)^(1) from p(σ² | β^(1), y_t), the inverse Gamma distribution in (11). Note that the posterior degrees of freedom τ_1 in (14) and the posterior scale parameter θ_1 in (15) require the researcher to specify α and θ_0. A sample from the inverse Gamma distribution is constructed as

(σ²)^(1) = θ_1 / (x_0'x_0)

where x_0 is a 1×τ_1 vector of draws from the standard normal distribution. A sample from the predictive distribution at forecast horizon h is constructed as

y^(1)_{t+h} = Σ_{i=1}^{h−1} β^(1)_i y^(1)_{t+h−i} + Σ_{i=h}^{p} β^(1)_i y_{t+h−i} + ε^(1)_{t+h}

where the first sum runs over previously drawn forecasts and the second over observed values, and ε^(1)_{t+h} = r √((σ²)^(1)) with r a single draw from the standard normal distribution. This process is repeated for M iterations until we have obtained β^(1), ..., β^(M), (σ²)^(1), ..., (σ²)^(M) and y^(1)_{t+h}, ..., y^(M)_{t+h}. The first 1, ..., B iterations are discarded, so β^(B+1), ..., β^(M), (σ²)^(B+1), ..., (σ²)^(M) and y^(B+1)_{t+h}, ..., y^(M)_{t+h} are used for the empirical distributions. The iterations 1, ..., B are known as burn-in iterations and are required for the Gibbs sampler to converge. There is, however, no guarantee that the Gibbs sampler will perform well or that it will converge. In this thesis we set M = 4,000 and B = 1,000 to ensure convergence.
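The steps of Table 2 can be sketched in code. This is a minimal illustration under stated assumptions, not the author's implementation: the function name and toy hyperparameters are invented, and the stability restriction is checked through the eigenvalues of the companion matrix, which is equivalent to requiring |z| > 1 for the roots of the AR polynomial:

```python
import numpy as np

rng = np.random.default_rng(42)

def gibbs_ar(y, p, beta0, H, alpha, theta0, M=4000, B=1000):
    """Restrictive Gibbs sampler for an AR(p) model with a Normal-Gamma prior."""
    T = len(y) - p
    X = np.column_stack([y[p - i:len(y) - i] for i in range(1, p + 1)])
    yt = y[p:]
    # OLS estimates used to initialise sigma^2
    b_ols = np.linalg.solve(X.T @ X, X.T @ yt)
    sig2 = float((yt - X @ b_ols) @ (yt - X @ b_ols) / T)
    Hinv = np.linalg.inv(H)
    beta_draws, sig2_draws = [], []
    for m in range(M):
        # Conditional posterior of beta, eqs (10), (12), (13)
        V = np.linalg.inv(Hinv + (X.T @ X) / sig2)
        Mn = V @ (Hinv @ beta0 + (X.T @ yt) / sig2)
        C = np.linalg.cholesky(V)
        while True:
            beta = Mn + C @ rng.standard_normal(p)
            # Stability restriction: companion-matrix eigenvalues inside the
            # unit circle (equivalent to |z| > 1 for the AR polynomial roots)
            comp = np.zeros((p, p))
            comp[0, :] = beta
            comp[1:, :-1] = np.eye(p - 1)
            if np.max(np.abs(np.linalg.eigvals(comp))) < 1.0:
                break
        # Conditional posterior of sigma^2, eqs (11), (14), (15)
        tau1 = alpha + T
        theta1 = theta0 + float((yt - X @ beta) @ (yt - X @ beta))
        x0 = rng.standard_normal(tau1)
        sig2 = theta1 / float(x0 @ x0)
        beta_draws.append(beta)
        sig2_draws.append(sig2)
    return np.array(beta_draws[B:]), np.array(sig2_draws[B:])

# Toy usage on data from a stable AR(2) process (shortened chain for illustration)
y = np.zeros(250)
for t in range(2, 250):
    y[t] = 0.7 * y[t - 1] + 0.1 * y[t - 2] + rng.normal(0.0, np.sqrt(2.0))
betas, sig2s = gibbs_ar(y[50:], p=2, beta0=np.zeros(2), H=np.eye(2),
                        alpha=4, theta0=2.0, M=400, B=100)
```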


3.3 Vector Autoregression

We will use Bayesian methods to estimate a vector autoregressive process, VAR(p), where p is the number of lags to use for each time series. The VAR model is a system of equations that allows the endogenous variables to affect each other simultaneously. Furthermore, the error terms can be correlated across equations: a structural shock in one of the error terms may cause a shock in all error terms, causing contemporaneous movement in all endogenous variables. The VAR(p) with n endogenous variables, without constants, deterministic or exogenous variables, is defined as

y_t = Σ_{i=1}^{p} B_i y_{t−i} + ε_t,  t = 1, ..., T    (16)

where y_t is an n×1 vector, B_i is an n×n matrix, ε_t is an n×1 vector and ε_t ∼ iid N(0, Σ). The endogenous variables in the VAR model are not iid; they depend on past values of y_t. We can express (16) in compact matrix form: with x_t = (y'_{t−1}, ..., y'_{t−p})',

Y_t = X_t B + E_t    (17)

where Y_t and E_t are T×n matrices, X_t = (x_1, ..., x_T)' is a T×np matrix and B = (B_1', ..., B_p')' is an np×n matrix. Note that the parameter matrix can be stacked into an n²p×1 vector by b = vec(B).

The Normal-Wishart prior distribution is the conjugate prior for the multivariate normal distribution: the posterior distribution and the prior distribution are in the same family of distributions. In the Normal-Wishart prior, the parameter vector b is normally distributed conditional on the variance-covariance matrix Σ, and the variance-covariance matrix follows the (inverse) Wishart distribution

p(b | Σ) ∼ N(b_0, Σ ⊗ H)    (18)

p(Σ) ∼ iW(S, α)    (19)

where ⊗ is the Kronecker product and b_0 represents the researcher's prior beliefs about the parameter values. We follow Kadiyala and Karlsson (1993, 1997) in specifying the matrix H, the prior scale matrix S and the prior degrees of freedom α. The np×np diagonal matrix H has diagonal elements equal to

(λ_0 λ_1 / (l^{λ_2} s_i))²


where l refers to the lag length, s_i² refers to the OLS estimate of the variance from an AR(p) model, and i refers to the endogenous variable in the ith equation. The prior scale S is an n×n diagonal matrix with diagonal equal to (α − n − 1)λ_0⁻¹s_i², and the prior degrees of freedom satisfy

α = max{n + 2, n + 2h − T}    (20)

to ensure existence of the prior variances of the regression parameters and the posterior variances of the predictive distribution at forecast horizon h. Following the guidelines of Kadiyala and Karlsson (1993, 1997) we only need to specify the hyperparameters b_0, λ_0, λ_1 and λ_2. The interpretation of the λ hyperparameters is as follows:

λ_0 controls the overall tightness of the prior on the covariance matrix.

λ_1 controls the tightness of the prior on the coefficients on the first lag.

λ_2 controls the degree to which coefficients on longer lags are shrunk towards zero.

The prior variance-covariance matrix of b is obtained by V(b) = (α − n − 1)⁻¹ S ⊗ H. Due to the imposed Kronecker structure we are not able to specify individual prior variances and standard deviations; instead we are forced to treat all equations symmetrically.
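A sketch of how these prior matrices might be assembled. The function names, and the assumption that the diagonal of H is ordered lag block by lag block with s_i taken from the variable a coefficient multiplies, are ours, not the thesis's:

```python
import numpy as np

def prior_H_diag(s2, p, lam0, lam1, lam2):
    """Diagonal np x np matrix H with elements (lam0*lam1 / (l**lam2 * s_i))**2
    for lag l = 1..p and variable i; s2 holds the AR(p) OLS residual variances.
    The Kronecker structure Sigma (x) H forces the same H in every equation."""
    s = np.sqrt(np.asarray(s2, dtype=float))
    diag = []
    for l in range(1, p + 1):   # lag blocks
        for si in s:            # variables within each lag block
            diag.append((lam0 * lam1 / (l ** lam2 * si)) ** 2)
    return np.diag(diag)

def prior_S(s2, alpha, lam0):
    """Prior scale matrix S with diagonal (alpha - n - 1) * s_i^2 / lam0."""
    s2 = np.asarray(s2, dtype=float)
    n = len(s2)
    return np.diag((alpha - n - 1) * s2 / lam0)

# Toy values: bivariate system, two lags
H_bar = prior_H_diag([2.0, 3.0], p=2, lam0=0.2, lam1=1.0, lam2=1.0)
S = prior_S([2.0, 3.0], alpha=4, lam0=0.2)
```

With λ_2 > 0 the prior variance on the second-lag coefficients is smaller than on the first-lag coefficients, i.e. longer lags are shrunk harder towards the prior mean.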

The conditional posterior distributions of b and Σ are

p(b | Σ, Y_t) ∼ N(M, V)    (21)

p(Σ | b, Y_t) ∼ iW(Σ̄, T + α)    (22)

where

M = (H⁻¹ + Σ⁻¹ ⊗ X_t'X_t)⁻¹ (H⁻¹b_0 + (Σ⁻¹ ⊗ X_t'X_t) b̂)    (23)

V = (H⁻¹ + Σ⁻¹ ⊗ X_t'X_t)⁻¹    (24)

Σ̄ = S + (Y_t − X_t B)'(Y_t − X_t B)    (25)

and b̂ is the OLS estimate of b. Kadiyala and Karlsson (1997) and Karlsson (2013) provide the analytical results for this prior, but we will use the Gibbs sampler to obtain parameter estimates and predictive distributions. The restrictive Gibbs sampler implemented in Table 2 is essentially the same for the Normal-Wishart prior: b is sampled from (21) and Σ is sampled from (22).


There is, however, one step in the Gibbs sampler for the VAR model that will affect the predictive distributions. This is due to the fact that the predictive distribution accounts for the uncertainty about the future, combined with the VAR model allowing the error terms to be correlated across equations. The Gibbs sampler draws a sample m from the predictive distribution at forecast horizon h by

y^(m)_{t+h} = Σ_{i=1}^{h−1} B^(m)_i y^(m)_{t+h−i} + Σ_{i=h}^{p} B^(m)_i y_{t+h−i} + ε^(m)_{t+h}

where ε^(m)_{t+h} = r [Σ^(m)]^{1/2} and r is a 1×n vector of draws from the standard normal distribution. The term [Σ^(m)]^{1/2} is the square root of the estimated variance-covariance matrix obtained by the Cholesky decomposition, which results in the lower triangle of [Σ^(m)]^{1/2} consisting of zeros. Therefore, the order of the variables in the VAR model is important. In the bivariate VAR model, for example, the forecast of one equation receives two elements of uncertainty at each draw while the forecast of the other receives only one, so different amounts of uncertainty are added to the predictive distributions of y1 and y2.
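The mechanics of adding correlated forecast shocks through the Cholesky factor can be sketched as follows; the covariance matrix below is a hypothetical posterior draw, not one estimated in the thesis:

```python
import numpy as np

rng = np.random.default_rng(7)

# Hypothetical draw of the error covariance for a bivariate VAR, with
# correlation across the two equations.
Sigma = np.array([[2.0, 1.2],
                  [1.2, 1.5]])

L = np.linalg.cholesky(Sigma)   # triangular factor with L @ L.T == Sigma

# One forecast shock per equation: because the factor is triangular, the
# shocks in the two equations are built from different numbers of standard
# normal components, which is why the variable ordering matters.
eps = L @ rng.standard_normal(2)

# Over many draws the shocks reproduce the target covariance.
E = rng.standard_normal((100000, 2)) @ L.T
```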

4. Evaluating the Predictive Distribution

To explain the measurement, consider a predictive distribution at a specific forecast horizon h. The problem is to structure a measurement that accounts for two factors: (1) the accuracy of the forecasts⁶ and (2) the probabilities that the forecasts occur.

The most common method to examine forecast accuracy is to use out-of-sample forecasts. This allows the researcher to mimic a real-time situation. The observed data is divided into two sets: data for parameter estimation and actual values for forecast evaluation. The forecast error for each observation in the predictive distribution at horizon h is

y^(m)_{t+h} − y^a_{t+h}

where y^(m)_{t+h} is the mth Gibbs sample in the predictive distribution at horizon h, and y^a_{t+h} is the actual value corresponding to the forecast.

⁶Note: when we use the terminology forecast, we refer to one element within the predictive distribution if nothing else is specified.


The most common method to visualize a distribution is the histogram, which will serve as a tool to describe the concept of the measurement. The data is classified into bins represented by rectangles, where the height of a rectangle represents the number of data points within the interval of the bin. The histogram can be normalized to represent the probability of each bin, with the condition that the probabilities of the bins sum to one. In Figure 2 we see the forecast error probability distributions of two competing models; it is clear that the left graph is more accurate than the right graph. We can conclude that these graphs serve our purpose, i.e. we can determine which graph is most accurate by examining the probabilities of the forecast errors.

Figure 2: The left graph is the forecast error probability distribution of the AR(2) model at h = 2 for the univariate simulation of y2. The right graph is the forecast error probability distribution of the VAR(2) model at h = 2 for the univariate simulation of y2. The red line indicates zero forecast error.
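The normalization described above is straightforward: divide the bin counts by the total count so the bin probabilities sum to one. The forecast errors below are synthetic stand-ins:

```python
import numpy as np

rng = np.random.default_rng(9)
errors = rng.normal(0.2, 1.0, size=3000)   # synthetic forecast errors

counts, edges = np.histogram(errors, bins=30)
probs = counts / counts.sum()              # normalized bin probabilities
```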

Examining each forecast error distribution is time-consuming if, for example, a practitioner is calibrating a forecast model. Therefore we would like to summarize the forecast error distribution by an expected value. The expected value of the forecast errors themselves will not, however, yield information about the accuracy of the forecast error distribution that we intend to measure; instead it contains information about the bias of the forecast error distribution. To allow the same conclusion to be drawn as from the graphs in Figure 2, we will examine the squared forecast errors⁷. The expected value of the squared forecast error is

e_{t+h} = Σ_{m=1}^{M−B} (M − B)⁻¹ (y^(m)_{t+h} − y^a_{t+h})² = Σ_{m=1}^{M−B} p_m (y^(m)_{t+h} − y^a_{t+h})²    (26)

where e_{t+h} is the expected value of the squared forecast error and M − B is the number of stored samples in the Gibbs sampler. The probabilities sum to one, Σ_{m=1}^{M−B} p_m = 1. A tight forecast error distribution centred around zero will produce a small expected squared forecast error, while a wide forecast error distribution not centred at zero, or skewed away from zero, will produce a larger one. The expected value of the squared forecast error is not informative by itself; it is only informative relative to a competing model. We will use the notation e^{AR(p)}_{t+h} to denote the expected value of the squared forecast error at forecast horizon h for the AR(p) model and e^{VAR(p)}_{t+h} for the corresponding quantity for the VAR(p) model.
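Since every stored draw receives equal weight p_m = 1/(M − B), equation (26) is simply the mean squared deviation of the draws from the realized value. A sketch with two hypothetical predictive distributions, one tight and centred, one wide and biased:

```python
import numpy as np

rng = np.random.default_rng(3)

def expected_sq_error(draws, actual):
    # Eq. (26) with equal weights p_m = 1/(M - B): mean squared deviation
    # of the stored predictive draws from the realized value.
    draws = np.asarray(draws, dtype=float)
    return float(np.mean((draws - actual) ** 2))

actual = 1.0
# Hypothetical predictive draws from two competing models at one horizon:
tight = rng.normal(loc=1.0, scale=0.5, size=3000)   # centred and tight
wide = rng.normal(loc=2.0, scale=1.5, size=3000)    # biased and wide

e_tight = expected_sq_error(tight, actual)
e_wide = expected_sq_error(wide, actual)
```

Only the comparison of the two values is informative: the model with the smaller expected squared forecast error has the more accurate forecast error distribution.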

5. Simulation Studies

We generate the data for the variables y1 and y2 by (pseudo-)simulation from univariate AR models and a multivariate VAR model. This allows us to obtain results corresponding to Table 1 (the simulation of data represents the columns of Table 1). Both the univariate and the multivariate simulation create data for the variables y1 and y2 with T_S = 200 observations⁸.

Univariate Simulation: The two time series y1 and y2 are simulated from two AR(2) models. Both AR models are conditional on being stable, which is fulfilled when the moduli of the eigenvalues of the companion matrix

( β_1  β_2 )
( 1    0  )

are less than one. We impose a stricter condition: the moduli of the eigenvalues must be less than 0.850. This is motivated by the implemented restrictive

7There are also other factors to use the squared forecast errors, it penalizes outliers toa high degree and it has nice properties to the normal distribution. Note, however, thatour forecast error distributions are t-distributed because they are estimated form a model.

8Note that y1 and y2 depend on past observations. We have specified start valuesfor each simulation processes. To mitigate the effect of start values we generated 250observations of y1 and y2 and discarded the first 50 observations, resulting in TS = 200.

15

Gibbs sampler which will require large increases in computational time as themodules of the eigenvalues approaches one. The series will be simulated by:

y1,t = 0.70y1,t−1 + 0.10y1,t−2 + ε1,t, ε1,t ∼ N(0, 2) (27)

y2,t = 0.35y2,t−1 + 0.30y2,t−2 + ε2,t, ε2,t ∼ N(0, 3). (28)

The parameters in (27) are chosen so that one of the eigenvalues have amodules close to the chosen criteria and to have β1 large and β2 small. Theparameters in (28) are chosen to have β1 and β2 closer to each other thanin (27), they should be positive and not close to zero. The pair of β1 and β2in (27) and (28) have been chosen to not be identical and the variances of y1and y2 are different. The correlations between y1 and y2 should on averagebe zero. But due to the randomness of simulation the correlations betweeny1 and y2 will not be constant, which in turn will affect the estimation. Theleft graph in Figure 3 shows 100 correlations between y1 and y2, the averageis -0.008 and the median is -0.008.
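The univariate simulation can be sketched as follows; a sketch under the stated assumptions, with our own function names. The stability check mirrors the companion-matrix condition, and the 50-observation burn-in follows footnote 8.

```python
import numpy as np

def stable_ar2(beta1, beta2, max_modulus=0.850):
    """Stability check used in the text: the moduli of the eigenvalues of
    the companion matrix [[beta1, beta2], [1, 0]] must be below 0.850."""
    companion = np.array([[beta1, beta2], [1.0, 0.0]])
    return bool(np.max(np.abs(np.linalg.eigvals(companion))) < max_modulus)

def simulate_ar2(beta1, beta2, sigma2, n=200, burn=50, seed=0):
    """Simulate an AR(2) with N(0, sigma2) errors, discarding the first
    `burn` observations to mitigate the effect of start values."""
    rng = np.random.default_rng(seed)
    eps = rng.normal(0.0, np.sqrt(sigma2), size=n + burn)
    y = np.zeros(n + burn)
    for t in range(2, n + burn):
        y[t] = beta1 * y[t - 1] + beta2 * y[t - 2] + eps[t]
    return y[burn:]

# The two processes in (27) and (28) satisfy the stricter condition.
assert stable_ar2(0.70, 0.10) and stable_ar2(0.35, 0.30)
y1 = simulate_ar2(0.70, 0.10, sigma2=2.0, seed=0)
y2 = simulate_ar2(0.35, 0.30, sigma2=3.0, seed=1)
```

For (27), the larger eigenvalue modulus is roughly 0.82, close to the 0.850 criterion, as the text intends.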

Multivariate Simulation: The two time series y1 and y2 are simulated from a bivariate VAR(2) model. The VAR model is conditional on being stable; this is fulfilled when the moduli of the eigenvalues of the companion matrix

\begin{pmatrix} B_1 & B_2 \\ I_2 & 0 \end{pmatrix}

are less than one. We again impose the stricter condition that the moduli of the eigenvalues must be less than 0.850. The series are simulated by

y_{1,t} = 0.70 y_{1,t-1} + 0.10 y_{1,t-2} + \varepsilon_{1,t}
y_{2,t} = -0.25 y_{1,t-1} + 0.35 y_{2,t-1} + 0.50 y_{1,t-2} + 0.30 y_{2,t-2} + \varepsilon_{2,t}    (29)

where the variance-covariance matrix is

\Sigma = \begin{pmatrix} 2 & 0.75\sqrt{6} \\ 0.75\sqrt{6} & 3 \end{pmatrix}    (30)

so the error terms across the two equations have a correlation of 0.750 and the variances of y1 and y2 differ.

Three factors have determined the choice of the parameter matrix B. First, the elements in B are chosen so that the correlation arising from the parameters is balanced, i.e. the correlation between y1 and y2 is approximately 0.750. Second, the first equation, y1, in (29) should not depend on the parameters of y2. The only relationship between the variables in the first equation of (29) is the correlation between the error terms across the two equations. Both models are then correctly specified, but the AR model discards the correlated error terms across equations, while the VAR model includes irrelevant independent variables and adds uncertainty to the predictive distribution due to the correlated error terms across equations. Both these effects make the predictive distribution of the VAR model wider. Third, the second equation, y2, in (29) should depend on the parameters of y1, so estimating y2 with an AR model results in misspecification. Due to the randomness of the simulation the correlation between y1 and y2 will not be constant, which in turn affects the estimation. The right graph in Figure 3 shows 100 correlations between y1 and y2; the average is 0.745 and the median is 0.741.

Figure 3: The left graph is the histogram of correlations between y1 and y2 generated by 100 univariate simulations. The right graph is the histogram of correlations between y1 and y2 generated by 100 multivariate simulations.
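The multivariate simulation can be sketched in the same way, again with our own names; correlation between the series in the first equation enters only through the jointly drawn error terms, as the text describes.

```python
import numpy as np

def simulate_var2(B1, B2, Sigma, n=200, burn=50, seed=0):
    """Simulate a bivariate VAR(2) as in (29), with errors drawn jointly
    from N(0, Sigma) so that the covariance matrix (30) induces the
    cross-equation error correlation."""
    rng = np.random.default_rng(seed)
    k = Sigma.shape[0]
    # stability: companion matrix [[B1, B2], [I, 0]]
    top = np.hstack([B1, B2])
    bottom = np.hstack([np.eye(k), np.zeros((k, k))])
    companion = np.vstack([top, bottom])
    assert np.max(np.abs(np.linalg.eigvals(companion))) < 0.850
    eps = rng.multivariate_normal(np.zeros(k), Sigma, size=n + burn)
    y = np.zeros((n + burn, k))
    for t in range(2, n + burn):
        y[t] = B1 @ y[t - 1] + B2 @ y[t - 2] + eps[t]
    return y[burn:]

# parameter matrices of (29) and the covariance matrix (30)
B1 = np.array([[0.70, 0.00], [-0.25, 0.35]])
B2 = np.array([[0.10, 0.00], [0.50, 0.30]])
Sigma = np.array([[2.0, 0.75 * np.sqrt(6)], [0.75 * np.sqrt(6), 3.0]])
y = simulate_var2(B1, B2, Sigma)
r = np.corrcoef(y[:, 0], y[:, 1])[0, 1]  # roughly 0.75 on average
```

A single simulated sample correlation will vary around 0.750, consistent with the spread in the right graph of Figure 3.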

6. Hyperparameters

As mentioned in Section 3.3, Kadiyala and Karlsson (1993, 1997) suggest a set of guidelines to standardize the restrictions on the parameters in the Normal-Wishart prior distribution. This allows the researcher to specify only a small number of hyperparameters. Informally, we can think of the inverse Wishart distribution as a multivariate version of the inverse Gamma distribution. This allows us to align the Normal-Gamma prior in the AR and the Normal-Wishart prior in the VAR, using the guidelines of Kadiyala and Karlsson (1993, 1997).

We align the Normal-Gamma prior to the Normal-Wishart prior in three steps. First, we specify the hyperparameters for the prior mean variance, in (8), in the AR(2) model as

\sigma^2 H = \begin{pmatrix} \lambda_1^2 & 0 \\ 0 & (\lambda_1 / 2^{\lambda_2})^2 \end{pmatrix}

where we specify \sigma^2 = 1 and choose the diagonal of H to have the same variances for the lags as the matrix V(b) = (\alpha - n - 1)^{-1} S \otimes H has for own lags in the Normal-Wishart prior. Second, the prior degrees of freedom \alpha, in (14), are determined by (20), which results in \alpha = 3. Third, the prior scale parameter \theta_0, in (15), is determined by \theta_0 = (\alpha - n - 1)\lambda_0^{-1}\sigma^2, which with \alpha = 3 simplifies to \theta_0 = \lambda_0^{-1}\sigma^2, where \sigma^2 is the OLS estimate.

Now we turn to the guidelines of Kadiyala and Karlsson (1993, 1997) for the Normal-Wishart prior. The prior mean variance V(b) = (\alpha - n - 1)^{-1} S \otimes H and the prior scale matrix S, with diagonal (\alpha - n - 1)\lambda_0^{-1} s_i^2, both depend on the prior degrees of freedom \alpha. By (20) we determine that \alpha = 4, so that the scale matrix S is

\begin{pmatrix} \lambda_0^{-1} s_1^2 & 0 \\ 0 & \lambda_0^{-1} s_2^2 \end{pmatrix}

and the prior mean variance V(b) = S \otimes H is the diagonal matrix

V(b) = \mathrm{diag}\Big( \lambda_1^2,\; \big(\tfrac{s_1\lambda_1}{s_2}\big)^2,\; \big(\tfrac{\lambda_1}{2^{\lambda_2}}\big)^2,\; \big(\tfrac{s_1\lambda_1}{s_2 2^{\lambda_2}}\big)^2,\; \big(\tfrac{s_2\lambda_1}{s_1}\big)^2,\; \lambda_1^2,\; \big(\tfrac{s_2\lambda_1}{s_1 2^{\lambda_2}}\big)^2,\; \big(\tfrac{\lambda_1}{2^{\lambda_2}}\big)^2 \Big).

Notice that the first and third diagonal elements of this matrix are equal to the diagonal elements of the prior mean variance in the Normal-Gamma prior.


This alignment between the Normal-Gamma prior and the Normal-Wishart prior allows us to control the parameter restrictions of both priors by specifying only the prior means, \beta and b_0, in (8) and (18), and the hyperparameters \lambda_0, \lambda_1, \lambda_2 for both priors. For the Normal-Gamma prior we set the prior mean to \beta = (0, 0)' and for the Normal-Wishart prior we set the prior mean to b_0 = (0, 0, 0, 0, 0, 0, 0, 0)'. The hyperparameters are set to \lambda_0 = \lambda_1 = \lambda_2 = 1.
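The alignment can be checked numerically. The sketch below builds V(b) = S ⊗ H with our own variable names and illustrative residual standard deviations; the way the λ0 factor is split between S and H is an assumption chosen so that the product reproduces the diagonal listed above (λ0 = 1 here in any case).

```python
import numpy as np

lam0 = lam1 = lam2 = 1.0      # hyperparameters, as set in the text
s1, s2 = 1.3, 0.7             # illustrative OLS residual std deviations

# Normal-Wishart prior scale matrix S (alpha = 4, n = 2)
S = np.diag([s1**2 / lam0, s2**2 / lam0])
# per-coefficient scaling for [y1 lag1, y2 lag1, y1 lag2, y2 lag2];
# the lam0 factor cancels against S in the Kronecker product
H = lam0 * np.diag([(lam1 / s1)**2,
                    (lam1 / s2)**2,
                    (lam1 / (2**lam2 * s1))**2,
                    (lam1 / (2**lam2 * s2))**2])
V_b = np.kron(S, H)           # 8x8 prior mean variance V(b) = S (x) H

# Normal-Gamma prior (AR(2), sigma^2 = 1): diag(lam1^2, (lam1/2^lam2)^2)
H_ar = np.diag([lam1**2, (lam1 / 2**lam2)**2])

# the first and third diagonal elements of V(b) equal the diagonal of
# the Normal-Gamma prior mean variance, as stated in the text
assert np.isclose(V_b[0, 0], H_ar[0, 0])
assert np.isclose(V_b[2, 2], H_ar[1, 1])
```

The Kronecker ordering is equation-major: the first four diagonal entries belong to the y1 equation, the last four to the y2 equation, matching the listing above.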

7.1 Results

To validate the measurement we want to obtain results corresponding to those in Table 1. We will attempt to verify these by examining posterior variances and the difference in the measurement between two competing models by forecast horizon.

But first we describe the data analysed in this section. In Section 5 we generated the data for the univariate and multivariate models. From these we construct out-of-sample forecasts for the horizons h = 1, 2, ..., 10, leaving T^P = T^S - h = 190 observations for parameter estimation. Predictive distributions are estimated and the expected value of the squared forecast error, i.e. the measurement, is calculated for ten forecast horizons. This step of simulating data and calculating the measurement is repeated one hundred times. Each model thus produces measurement data in the form of two 100 × 10 matrices, where the first matrix is for the univariate simulated data and the second is for the multivariate simulated data.

First, we examine the posterior variance. The results are presented in Tables 3 and 5; the coefficient is the expected value of the mean posterior variance over one hundred simulations of the data. This is motivated by two reasons: (1) to summarize the data, since each simulation of the data yields a mean posterior variance and the corresponding ninety-five percent probability interval of the posterior variance; (2) deviations from Table 1 should be caused by the random correlation in the data due to simulation, so the expected value of the mean posterior variance yields a more robust conclusion.

Second, we examine the difference in the measurement between the two competing models at each forecast horizon. The mean of this difference determines which predictive distribution is most accurate. We represent this by the linear regression described in (1), where the dependent variable is the difference in the measurement between the competing models by forecast horizon and the independent variable is a constant:

(e^{AR(2)}_{t+h})_i - (e^{VAR(2)}_{t+h})_i = \varphi + \varepsilon_i,   i = 1, ..., 100.    (31)

The coefficient of the constant, \varphi, represents the mean difference in the measurement between the competing models by forecast horizon, which enables the following hypothesis test:

H_0 : \varphi = 0
H_A : \varphi \neq 0

If \varphi < 0 then the predictive distribution of the AR(2) is more accurate than that of the VAR(2). If \varphi > 0 then the predictive distribution of the VAR(2) is more accurate than that of the AR(2). If H_0 is not rejected, we cannot conclude that one predictive distribution is more accurate than the other. The estimation of this model is similar to the AR(p) model described in Section 3.1 and the Gibbs sampler described in Table 2. We set the hyperparameters as follows: \beta_0 = 0, \sigma^2 = 1, \theta_0 = \sigma^2 and \alpha = 3, according to (20). The results are presented in Tables 4 and 6.
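As an illustration of the test, the sketch below computes the mean difference in the measurement across simulations together with an interval for it. Note that this uses a simple normal approximation rather than the Bayesian regression actually estimated in the thesis, and all names and toy numbers are our own.

```python
import numpy as np

def mean_difference_interval(e_ar, e_var, z=1.96):
    """Simplified stand-in for the regression in (31): the mean
    difference phi = mean(e_ar - e_var) across simulations and a
    normal-approximation 95% interval for it (the thesis instead
    estimates phi with a Gibbs sampler)."""
    d = np.asarray(e_ar, dtype=float) - np.asarray(e_var, dtype=float)
    phi = d.mean()
    half = z * d.std(ddof=1) / np.sqrt(len(d))
    return phi, (phi - half, phi + half)

# illustrative measurement data: AR slightly smaller on average
rng = np.random.default_rng(0)
e_ar = rng.normal(1.0, 0.3, size=100)
e_var = rng.normal(1.3, 0.3, size=100)
phi, (lo, hi) = mean_difference_interval(e_ar, e_var)
# here phi < 0 with an interval excluding zero, favouring the AR model
```

In the thesis the interval is the ninety-five percent posterior probability interval of φ; the sign logic is the same.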

7.2 Univariate Simulation Results

We start by examining the results of the estimated posterior variance and covariance from the AR(2) and VAR(2) models for the univariate simulations of the data. The first column in Table 1 shows the expected outcomes. The AR model is correctly specified, while the VAR model includes irrelevant independent variables, which is expected to increase the estimated posterior variance.

The results are presented in Table 3. For the variable y1, the expected value of the mean posterior variance is 2.006 for the AR and 2.007 for the VAR. We therefore conclude that the inclusion of irrelevant independent variables does increase the estimated posterior variance, as expected, although the increase is smaller than we expected. For the variable y2, the expected value of the mean posterior variance is 2.962 for the AR model and 2.965 for the VAR model. The increase in estimated posterior variance is somewhat larger than for y1 but still smaller than expected. The expected value of the mean posterior covariance in the VAR model is 0.006, which is close to zero as we expect, since y1 and y2 are simulated independently of each other. From the left graph in Figure 3, the correlation between y1 and y2 is on average -0.008, but ranges approximately from -0.300 to 0.350.

Overall, we conclude that the estimated posterior variances follow the expected theory in Table 1, but the increase in estimated posterior variance due to the inclusion of irrelevant variables in the VAR model is smaller than we expected.

Table 3: Expected Values of Mean Posterior Variance/Covariance.

Univariate Simulation of the Data

                         AR(2)                        VAR(2)
             E[Coef.]  E[95% PI]           E[Coef.]  E[95% PI]
V(y1)        2.006     [ 1.638 ; 2.455]    2.007     [ 1.639 ; 2.456]
COV(y1,y2)   -         -                   0.006     [-0.348 ; 0.361]
V(y2)        2.962     [ 2.419 ; 3.620]    2.965     [ 2.421 ; 3.633]

E[.] is the expected value over 100 simulations of the data.

We now turn to the results for the linear regression (31) of the measurement we have formulated. From our findings about the estimated posterior variance we expect the forecast error probability distributions of the AR model to be tighter and more centred around zero than those of the VAR model, i.e. the measurement should be smaller for the AR model than for the VAR model. The expected value of the estimated posterior covariance between y1 and y2 was close to zero, implying a minimal increase of uncertainty in the predictive distribution of y1 caused by the Cholesky decomposition. Therefore, we expect to reject H_0 and find that the parameter \varphi is negative.

The results are presented in Table 4. For the variable y1, we reject H_0 at one of the ten forecast horizons. We find that \varphi is negative at the second forecast horizon, as expected, i.e. the ninety-five percent probability interval does not cover zero. On average at this forecast horizon, the predictive distribution of the AR model is more accurate than that of the VAR model. For all other forecast horizons H_0 cannot be rejected. For the variable y2, we reject H_0 at one of the ten forecast horizons. We find that \varphi is negative at the first forecast horizon, as expected. On average at this forecast horizon, the predictive distribution of the AR model is more accurate than that of the VAR model. For all other forecast horizons H_0 cannot be rejected.


Overall, we cannot conclude that the AR model produces more accurate predictive distributions than the VAR model for the univariate simulated data. It seems that the small increase in estimated posterior variance caused by the inclusion of irrelevant variables is not large enough to distinguish between the two models' predictive distributions.

Table 4: Regression Results.

Univariate Simulation of the Data

              y1                            y2
       ϕ      95% PI               ϕ      95% PI
h=1   -0.111  [-0.377 ;  0.155]   -0.906  [-1.768 ; -0.063]
h=2   -0.527  [-0.976 ; -0.060]    0.080  [-0.536 ;  0.698]
h=3   -0.235  [-0.615 ;  0.132]   -0.559  [-1.330 ;  0.233]
h=4   -0.197  [-0.568 ;  0.166]    0.261  [-0.263 ;  0.789]
h=5   -0.085  [-0.535 ;  0.358]   -0.448  [-1.268 ;  0.358]
h=6   -0.016  [-0.384 ;  0.347]   -0.157  [-0.597 ;  0.292]
h=7    0.161  [-0.245 ;  0.562]    0.387  [-0.060 ;  0.833]
h=8    0.042  [-0.359 ;  0.449]    0.186  [-0.128 ;  0.501]
h=9    0.004  [-0.434 ;  0.446]    0.005  [-0.264 ;  0.273]
h=10   0.060  [-0.381 ;  0.488]    0.135  [-0.148 ;  0.413]

PI stands for probability interval.

7.3 Multivariate Simulation Results

We start by examining the results of the estimated posterior variance and covariance from the AR(2) and VAR(2) models for the multivariate simulations of the data. The second column in Table 1 shows the expected outcomes. The VAR model is correctly specified, while the AR model is misspecified, which is expected to increase the estimated posterior variance.

The results are presented in Table 5. For the variable y1, which is not determined by y2, the expected value of the mean posterior variance is 1.980 for the AR and 1.983 for the VAR. We conclude that the inclusion of irrelevant independent variables causes an increase in the estimated posterior variance, the same conclusion as in the univariate simulation of the data. For the variable y2, the results follow the statistical theory well. The expected value of the mean posterior variance is 3.291 for the AR model and 3.025 for the VAR model, so the increase in estimated posterior variance due to misspecification is large. The expected value of the mean posterior covariance in the VAR model is 1.821, which is close to the covariance in equation (30), 0.75 × √6 ≈ 1.837. The choice of elements in the parameter matrix B has balanced the correlation between y1 and y2 to the same correlation specified for the error terms. From the right graph in Figure 3, the correlation between y1 and y2 is on average 0.745, but ranges approximately from 0.550 to 0.850.

Overall, we conclude that the estimated posterior variances follow the expected theory in Table 1. The increase in the estimated posterior variance due to misspecification is large, as expected, and we draw the same conclusion about the inclusion of irrelevant variables as in the univariate simulation of the data.

Table 5: Expected Values of Mean Posterior Variance/Covariance.

Multivariate Simulation of the Data

                         AR(2)                        VAR(2)
             E[Coef.]  E[95% PI]           E[Coef.]  E[95% PI]
V(y1)        1.980     [ 1.617 ; 2.420]    1.983     [ 1.619 ; 2.429]
COV(y1,y2)   -         -                   1.821     [ 1.423 ; 2.303]
V(y2)        3.291     [ 2.689 ; 4.027]    3.025     [ 2.469 ; 3.703]

E[.] is the expected value over 100 simulations of the data.

We now turn to the results for the linear regression (31) of the measurement. We expect \varphi to be negative for y1: the AR model does not suffer from misspecification, and the expected value of the mean posterior covariance is large, so we expect the predictive distributions of the VAR model to be wide for y1 due to the Cholesky decomposition. From the finding about the estimated posterior variance for y2, we expect to reject H_0 and find that the parameter \varphi is positive.

The results are presented in Table 6. For the variable y1, we reject H_0 at all forecast horizons except the first. We find that \varphi is negative for the second to tenth forecast horizons, as we expected due to the Cholesky decomposition. On average, the forecast accuracy of the AR relative to the VAR increases with the forecast horizon. For the variable y2, we reject H_0 for three out of ten forecast horizons. We find that \varphi is negative for the eighth to tenth forecast horizons, which is the opposite of what we expected. We also note that the magnitudes of the differences are large for the ninth and tenth forecast horizons.


Table 6: Regression Results.

Multivariate Simulation of the Data

              y1                            y2
       ϕ      95% PI               ϕ      95% PI
h=1   -0.029  [-0.159 ;  0.109]    0.272  [-0.050 ;  0.604]
h=2   -0.251  [-0.467 ; -0.038]    0.050  [-0.257 ;  0.349]
h=3   -0.321  [-0.553 ; -0.087]    0.536  [ 0.145 ;  0.938]
h=4   -0.289  [-0.563 ; -0.030]    0.172  [-0.192 ;  0.539]
h=5   -0.219  [-0.470 ;  0.039]    0.334  [-0.104 ;  0.792]
h=6   -0.338  [-0.597 ; -0.068]   -0.257  [-0.640 ;  0.125]
h=7   -0.308  [-0.614 ; -0.005]   -0.211  [-0.614 ;  0.203]
h=8   -0.333  [-0.598 ; -0.062]   -0.492  [-0.910 ; -0.066]
h=9   -0.483  [-0.755 ; -0.209]   -0.701  [-1.107 ; -0.293]
h=10  -0.466  [-0.718 ; -0.210]   -0.905  [-1.289 ; -0.523]

PI stands for probability interval.

Table 7 shows the same analysis as before, but this time we have changed the order of the variables when estimating the VAR model. For the variable y1 we reject H_0 at the second, fourth, fifth and ninth forecast horizons. We find that \varphi is negative for the second forecast horizon and positive for the fourth, fifth and ninth. For the variable y2 we reject H_0 for five out of the ten forecast horizons. We find that \varphi is negative at the first, sixth, eighth, ninth and tenth forecast horizons, as expected due to the Cholesky decomposition. Both these results show the strong effect of the Cholesky decomposition. The variable y2 handles the extra uncertainty added to the predictive distribution better than y1. This is most likely due to the misspecification of the AR model for the y2 variable.

Table 7: Changed Order of Variables in VAR.

Multivariate Simulation of the Data

              y2                            y1
       ϕ      95% PI               ϕ      95% PI
h=1   -2.054  [-3.041 ; -1.001]   -0.940  [-1.956 ;  0.022]
h=2   -0.165  [-0.946 ;  0.614]   -1.506  [-2.507 ; -0.480]
h=3   -0.415  [-0.836 ;  0.018]    0.132  [-0.282 ;  0.549]
h=4   -0.310  [-0.698 ;  0.086]    0.295  [ 0.077 ;  0.514]
h=5   -0.307  [-1.020 ;  0.409]    0.599  [ 0.088 ;  1.119]
h=6   -0.680  [-1.267 ; -0.089]    0.150  [-0.387 ;  0.666]
h=7   -0.433  [-0.924 ;  0.055]    0.388  [-0.008 ;  0.775]
h=8   -0.569  [-0.966 ; -0.168]    0.158  [-0.150 ;  0.463]
h=9   -0.829  [-1.290 ; -0.367]    0.388  [ 0.049 ;  0.726]
h=10  -0.936  [-1.369 ; -0.512]    0.178  [-0.238 ;  0.579]

PI stands for probability interval.


Overall, we conclude that the AR model produces more accurate predictive distributions than the VAR model for the multivariate simulated data. This was expected for y1, due to the Cholesky decomposition and the fact that neither the AR nor the VAR was misspecified. For y2, however, it seems that the misspecification is too small to have an effect on the predictive distribution; instead the VAR is outperformed by the AR at longer horizons.

Summing up our results: from both the univariate and multivariate simulations we conclude that the estimated posterior variances follow the statistical theory of Table 1, but the magnitudes of these effects are smaller than expected. When analysing the predictive distributions by the measurement for the univariate simulation of the data, we find it difficult to verify the increase in estimated posterior variance in the predictive distributions implied by the statistical theory; only at two horizons are we able to establish that the AR model outperforms the VAR model. This difficulty arises because the increase in the estimated posterior variance is too small to separate the predictive distributions. From the multivariate simulation of the data we conclude that there is a large effect of the Cholesky decomposition in the first equation of the estimated VAR model: the VAR model produces inferior predictive distributions of y1 for all horizons. This result is not as strong for y2, where five of the horizons produce inferior predictive distributions. We also conclude that this effect increases with the forecast horizon.

We conclude that the autoregressive and vector autoregressive models with a lag length of two have difficulty producing dissimilar predictive distributions. This has made it difficult to assess the accuracy of the measurement, but by examining the effect of the Cholesky decomposition in the VAR models we are able to validate the accuracy of the measurement.


8. Conclusion

We conduct simulation studies to formulate a measurement that evaluates the forecast accuracy of predictive distributions. We use Bayesian methods to estimate posterior inference and predictive distributions for the autoregressive model and the vector autoregressive model. Through out-of-sample forecasts and predictive distributions we are able to evaluate the full distribution of forecast errors. We are also able to validate the accuracy of the measurement, in particular by allowing for correlated error terms across equations in the vector autoregressive model.

We formulate a measurement that uses out-of-sample forecasts and predictive distributions to evaluate the full forecast error probability distribution by forecast horizon. The measurement can be used as a forecast evaluation technique for single forecasts or to calibrate forecast models. We recommend, however, that practitioners use the measurement together with several other forecast evaluation techniques.

For further research we recommend that the measurement be evaluated with models that are not from the same family (to ensure differences in predictive distributions), with models that treat conditional heteroskedasticity differently, in case studies of outliers such as financial crises, and against a wide range of forecast evaluation techniques.


Appendix

A1. Gibbs Sampler

To explain the intuition behind the Gibbs sampler we borrow a summary put forward by Ciccarelli and Rebucci (2003), with the mathematical notation adapted to Section 3.1.

In many applications the analytical integration of p(\beta, \sigma^2 | y_t) may be difficult or even impossible to implement. This problem, however, can often be solved by numerical integration based on Monte Carlo simulation methods.

One particular method used in the literature to solve estimation problems similar to those discussed here is the Gibbs sampler. The Gibbs sampler is a recursive Monte Carlo method which only requires that the full conditional posterior distributions of the parameters of interest, p(\beta | \sigma^2, y_t) and p(\sigma^2 | \beta, y_t), are known. The Gibbs sampler starts from an arbitrary value for \beta^{(0)} or (\sigma^2)^{(0)}, and samples alternately from the density of each element of the parameter vector, conditional on the value of the other element sampled in the previous iteration and the data. Thus, the Gibbs sampler samples recursively as follows:

\beta^{(1)} from p(\beta | (\sigma^2)^{(0)}, y_t)
(\sigma^2)^{(1)} from p(\sigma^2 | \beta^{(1)}, y_t)
\beta^{(2)} from p(\beta | (\sigma^2)^{(1)}, y_t)
(\sigma^2)^{(2)} from p(\sigma^2 | \beta^{(2)}, y_t)
...
\beta^{(m)} from p(\beta | (\sigma^2)^{(m-1)}, y_t)
(\sigma^2)^{(m)} from p(\sigma^2 | \beta^{(m)}, y_t)

and so on.

The vectors \vartheta^{(m)} = (\beta^{(m)}, (\sigma^2)^{(m)}) form a Markov chain and, for a sufficiently large number of iterations (say m \geq M), can be regarded as draws from the true joint posterior distribution. Given a large sample of draws from this limiting distribution, any posterior moment or marginal density of interest can then be estimated consistently by the corresponding sample average.
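The recursion can be sketched for a regression y = Xβ + ε with a normal prior on β and an inverse-Gamma prior on σ². Function names, the exact prior choices and the test data below are our own illustrative assumptions, not the precise specification of Section 3.1.

```python
import numpy as np

def gibbs_regression(y, X, M=3000, burn=500, alpha=3.0, theta0=1.0, seed=0):
    """Gibbs sampler sketch for y = X beta + eps with beta ~ N(0, I)
    and sigma^2 ~ inverse-Gamma(alpha/2, theta0/2): alternate draws from
    p(beta | sigma2, y) (normal) and p(sigma2 | beta, y) (inv. gamma)."""
    rng = np.random.default_rng(seed)
    T, k = X.shape
    H_inv = np.eye(k)     # prior precision of beta (prior mean zero)
    sigma2 = 1.0          # arbitrary starting value (sigma2)^(0)
    draws_b, draws_s = [], []
    for m in range(M):
        # beta^(m) from p(beta | (sigma2)^(m-1), y): normal
        V = np.linalg.inv(H_inv + X.T @ X / sigma2)
        beta = rng.multivariate_normal(V @ (X.T @ y / sigma2), V)
        # (sigma2)^(m) from p(sigma2 | beta^(m), y): inverse gamma
        resid = y - X @ beta
        sigma2 = 1.0 / rng.gamma((alpha + T) / 2.0,
                                 2.0 / (theta0 + resid @ resid))
        if m >= burn:     # discard burn-in, keep M - B stored draws
            draws_b.append(beta)
            draws_s.append(sigma2)
    return np.array(draws_b), np.array(draws_s)

# quick check on data simulated from y_t = 0.7 y_{t-1} + 0.1 y_{t-2} + eps
rng = np.random.default_rng(1)
y_full = np.zeros(300)
for t in range(2, 300):
    y_full[t] = 0.7 * y_full[t-1] + 0.1 * y_full[t-2] + rng.normal(0, np.sqrt(2))
y, X = y_full[2:], np.column_stack([y_full[1:-1], y_full[:-2]])
b_draws, s2_draws = gibbs_regression(y, X)
```

The stored draws after burn-in play the role of the M − B samples used in the measurement (26); posterior means are then the corresponding sample averages.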


References

Luc Bauwens, Gary Koop, Dimitris Korobilis, and Jeroen V.K. Rombouts. The contribution of structural break models to forecasting macroeconomic series. Journal of Applied Econometrics, 2014.

Andrew P. Blake and Haroon Mumtaz. Applied Bayesian econometrics for central bankers. Number 4 in Technical Books. Centre for Central Banking Studies, Bank of England, 2012. URL http://ideas.repec.org/b/ccb/tbooks/4.html.

Matteo Ciccarelli and Alessandro Rebucci. BVARs: A Survey of the Recent Literature with an Application to the European Monetary System. Rivista di Politica Economica, 93(5):47-112, September 2003. URL http://ideas.repec.org/a/rpo/ripoec/v93y2003i5p47-112.html.

A. Gelman, J.B. Carlin, H.S. Stern, D.B. Dunson, A. Vehtari, and D.B. Rubin. Bayesian Data Analysis, Third Edition. Chapman & Hall/CRC Texts in Statistical Science. Taylor & Francis, 2013. ISBN 9781439840955. URL http://books.google.se/books?id=ZXL6AQAAQBAJ.

John Geweke and Gianni Amisano. Comparing and evaluating Bayesian predictive distributions of asset returns. Working Paper Series 0969, European Central Bank, November 2008. URL http://ideas.repec.org/p/ecb/ecbwps/20080969.html.

John Geweke and Gianni Amisano. Prediction using several macroeconomic models, 2012.

K. Rao Kadiyala and Sune Karlsson. Forecasting with generalized Bayesian vector autoregressions. Journal of Forecasting, 12(3-4):365-378, 1993.

K. Rao Kadiyala and Sune Karlsson. Numerical Methods for Estimation and Inference in Bayesian VAR-Models. Journal of Applied Econometrics, 12(2):99-132, March-April 1997. URL http://ideas.repec.org/a/jae/japmet/v12y1997i2p99-132.html.

Sune Karlsson. Chapter 15 - Forecasting with Bayesian vector autoregression. In Graham Elliott and Allan Timmermann, editors, Handbook of Economic Forecasting, volume 2, Part B, pages 791-897. Elsevier, 2013. doi: 10.1016/B978-0-444-62731-5.00015-4. URL http://www.sciencedirect.com/science/article/pii/B9780444627315000154.

Gary Koop and Dimitris Korobilis. Bayesian multivariate time series methods for empirical macroeconomics. Now Publishers Inc, 2010.

Dongchu Sun and Shawn Ni. Bayesian analysis of vector-autoregressive models with noninformative priors. Journal of Statistical Planning and Inference, 121(2):291-309, 2004.

Anders Warne, Gunter Coenen, and Kai Christoffel. Predictive likelihood comparisons with DSGE and DSGE-VAR models. Working Paper Series 1536, European Central Bank, April 2013. URL http://ideas.repec.org/p/ecb/ecbwps/20131536.html.
