
MFx – Macroeconomic Forecasting

This training material is the property of the International Monetary Fund (IMF) and is intended for use in IMF Institute for Capacity Development (ICD) courses. Any reuse requires the permission of the ICD.

EViews® is a trademark of IHS Global Inc.

IMFx

Module 9 (Optional): Combination Forecasts

Pat Healy & Carl Sandberg

• M9_Intro. Introduction, Motivations & Outline
• M9_S1. Forecast Combination Basics
• M9_S2. Solving the Theoretical Combination Problem & Implementation Issues
• M9_S3. Methods to Estimate Weights (M < T)
• M9_S4. Methods to Estimate Weights (M > T)
• M9_S5. EViews 1: EViews Workshop 1
• M9_S6. EViews 2: Forecast Combination Tools in EViews
• M9_Conclusion. Conclusions

Roadmap

Module 9

M9_Intro: Introduction, Motivations & Outline

• Generally speaking, multiple forecasts are available to decision makers before they make a policy decision

• Key Question: Given the uncertainty associated with identifying the true DGP, should a single (best) forecast be used? Or should we (somehow) average over all the available forecasts?

Introduction

• There are several disadvantages of using only one forecasting model:
– Model misspecifications of an unknown form
– Implausible that one statistical model would be preferable to others at all points of the forecast horizon

• Combining separate forecasts offers:
– A simple way of building a complex, more flexible forecasting model to explain the data
– Some insurance against "breaks" or other non-stationarities that may occur in the future

Motivations: One vs. Many

1. Forecast Combination Basics
2. Solving the Theoretical Combination Problem & Implementation Issues
3. Methods to Estimate & Assign Weights
4. Workshop 1: Combining Forecasts in EViews
5. Workshop 2: EViews Combination Tool
6. Wrap-Up

Outline

Module 9

Session 1, Part 1: Forecast Combination Basics

• Learning Objectives:

– Look at a general framework and notation for combining forecasts

– Introduce the theoretical forecast combination problem

1. Forecast Combination Basics

• Today (say, at time T) we want to forecast the value that a variable of interest (Y) will take at time T+h

• We have a certain number (M) of forecasts available

• How can we pool, or combine, these M forecasts into an optimal forecast?
– Is there any advantage of pooling instead of just finding and using the "best" one among the M available?

General Framework

• $y_t$ is the value of Y at time t (today is T)

• $x_{t,h,i}$ is an unbiased (point) forecast of $y_{t+h}$ made at time t

– h is the forecasting horizon

– i = 1, …, M is the identifier of the available forecast

• M is the total number of forecasts

Some Notation

• $e_{t+h,i} = y_{t+h} - x_{t,h,i}$ is the forecast (prediction) error

• $\sigma^2_{t+h,i} = \mathrm{Var}(e_{t+h,i})$ is the inverse of precision

• $\sigma_{t+h,i,j} = \mathrm{Cov}(e_{t+h,i},\, e_{t+h,j})$

• $w_{t,h} = [w_{t,h,1}, \dots, w_{t,h,M}]'$ is a vector of weights

• $L(e_{t+h})$ is the loss from making a forecast error

• $E[L(e_{t+h})]$ is the risk associated with a forecast

Some (more) Notation

• A combined forecast is a weighted average of the M forecasts:

$\hat{y}^c_{T,h} = \sum_{i=1}^{M} w_{T,h,i}\, x_{T,h,i}$

• The forecast combination "problem" can be formally stated as:

Problem: choose the weights $w_{T,h,i}$ to minimize $E[L(e^c_{T+h})]$ subject to $\sum_{i=1}^{M} w_{T,h,i} = 1$.

The Forecast Combination Problem
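
As a concrete illustration of the weighted average above, here is a minimal Python sketch (not part of the course's EViews material; the forecast values and weights are made-up numbers for illustration) that forms a combined forecast whose weights sum to one.

```python
import numpy as np

# Three hypothetical point forecasts of Y at T+h (illustrative numbers only)
forecasts = np.array([2.1, 2.6, 1.9])   # x_{T,h,i}, i = 1..M

# Weights must sum to one for the combination to stay unbiased
weights = np.array([0.5, 0.3, 0.2])
assert np.isclose(weights.sum(), 1.0)

# Combined forecast: weighted average of the M individual forecasts
y_hat_combined = weights @ forecasts
print(y_hat_combined)  # 2.21
```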

[Chart: observations on a variable Y up to time T, observations of forecasts of Y with forecasting horizon 1, and the resulting forecasting errors.]

• Question: How much weight shall we give to the current forecast, given past performance and knowing that there will be a forecasting error?

What is the Problem Really About?

[Chart: three competing GDP forecasts, A, B and C, made at time T for T + 1.]

What is the Problem Really About?

Module 9

Session 1, Part 2: Combining Prediction Errors

• Squared error loss: $E[(e^c_{T+h})^2]$

• Absolute error loss: $E[\,|e^c_{T+h}|\,]$

• Linex loss: $E[\exp(\alpha e^c_{T+h}) - \alpha e^c_{T+h} - 1]$

We will focus on mean squared error loss: $E[L(e^c_{T+h})] = E[(e^c_{T+h})^2]$

Examples of Loss Functions

Combining Prediction Errors

Notice that $e^c_{T+h} = y_{T+h} - \hat{y}^c_{T,h}$. Hence, if the weights sum to one, $\sum_{i=1}^{M} w_{T,h,i} = 1$, then:

$e^c_{T+h} = y_{T+h}\sum_{i=1}^{M} w_{T,h,i} - \sum_{i=1}^{M} w_{T,h,i}\, x_{T,h,i} = \sum_{i=1}^{M} w_{T,h,i}\,(y_{T+h} - x_{T,h,i}) = \sum_{i=1}^{M} w_{T,h,i}\, e_{T+h,i}$

so that:

$E[L(e^c_{T+h})] = E\!\left[ L\!\left( \sum_{i=1}^{M} w_{T,h,i}\, e_{T+h,i} \right) \right]$

Combination Problem with MSE

Let w be the (M x 1) vector of weights, e the (M x 1) vector of forecast errors, u an (M x 1) vector of 1s, and $\Sigma = E[\mathbf{e}\mathbf{e}']$ the (M x M) variance-covariance matrix of the errors.

It follows that:

$e^c_{T+h} = \sum_{i=1}^{M} w_{T,h,i}\, e_{T+h,i} = \mathbf{w}'\mathbf{e}$

$(e^c_{T+h})^2 = \mathbf{w}'\mathbf{e}\mathbf{e}'\mathbf{w}$

$E[(e^c_{T+h})^2] = E[\mathbf{w}'\mathbf{e}\mathbf{e}'\mathbf{w}] = \mathbf{w}'\,E[\mathbf{e}\mathbf{e}']\,\mathbf{w} = \mathbf{w}'\Sigma\mathbf{w}$

Problem 1: Choose w to minimize $\mathbf{w}'\Sigma\mathbf{w}$ subject to $\mathbf{u}'\mathbf{w} = \sum_{i=1}^{M} w_{T,h,i} = 1$.

• Must the weights sum to one?
– If forecasts are unbiased, this guarantees an unbiased combination forecast

• How restrictive is pooling the forecasts rather than information sets?
– Pooling information sets is theoretically better but practically difficult or impossible

Issues and Clarification (I)

• Are point forecasts the only forecasts that we can combine?
– We can also combine forecasts of distributions.

• Are Bayesian methods useful?
– Not entirely.

• Is there a difference between averaging across forecasts and across forecasting models?
– If you know the models and the models are linear in the parameters, there is no difference.

Issues and Clarification (II)

1. What are the optimal weights in the population?

2. How can we estimate the optimal weights?

3. Are these estimates good?

Broad Summary of Questions

Module 9

Session 2, Part 1: Solving the Theoretical Combination Problem

• Learning Objectives:

– Create a simple combination of forecasts using 2 forecasts

– Generalize to M forecasts

– Explore key takeaways from combining

2. Solving the Theoretical Combination Problem

For now, let’s assume that we know the distribution of the forecasting errors associated to each forecast.

Simple Combination

T T + 1

What are the optimal

(loss minimizing) weights,

in population?

Simple Combination with M = 2

Consider two point forecasts:

$\hat{y}^c_{T,h} = w\, x_{T,h,1} + (1-w)\, x_{T,h,2}$

$E[(e^c_{T+h})^2] = E[(w\, e_{T+h,1} + (1-w)\, e_{T+h,2})^2] = w^2 \sigma^2_{T+h,1} + (1-w)^2 \sigma^2_{T+h,2} + 2w(1-w)\, \sigma_{T+h,1,2}$

The solution to the Problem (M = 2) is:

$w^* = \dfrac{\sigma^2_{T+h,2} - \sigma_{T+h,1,2}}{\sigma^2_{T+h,1} + \sigma^2_{T+h,2} - 2\sigma_{T+h,1,2}}$  (weight of $x_{T,h,1}$)

$1 - w^* = \dfrac{\sigma^2_{T+h,1} - \sigma_{T+h,1,2}}{\sigma^2_{T+h,1} + \sigma^2_{T+h,2} - 2\sigma_{T+h,1,2}}$  (weight of $x_{T,h,2}$)

Consider:

• a larger weight is assigned to the more precise model

• the weights are the same (w* = 0.5) if and only if $\sigma^2_{T+h,1} = \sigma^2_{T+h,2}$

• if the covariance between the two forecasts increases, greater weight goes to the more precise forecast

$\dfrac{w^*}{1 - w^*} = \dfrac{\sigma^2_{T+h,2} - \sigma_{T+h,1,2}}{\sigma^2_{T+h,1} - \sigma_{T+h,1,2}}$

Interpreting the Optimal Weights
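
A minimal Python sketch of the optimal two-forecast weight (illustrative; the error variances and covariance below are assumed numbers, not population values from the course):

```python
import numpy as np

# Assumed (illustrative) second moments of the two forecast errors
var1, var2 = 1.0, 2.0      # sigma^2_{T+h,1}, sigma^2_{T+h,2}
cov12 = 0.5                # sigma_{T+h,1,2}

# Optimal weight on forecast 1 (the more precise forecast here)
w_star = (var2 - cov12) / (var1 + var2 - 2.0 * cov12)
print(w_star, 1.0 - w_star)   # 0.75 on forecast 1, 0.25 on forecast 2
```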

General Result with M Forecasts:

Result: The vector of optimal weights with M forecasts is

$\mathbf{w}^{*\prime} = \dfrac{\mathbf{u}'\,\Sigma^{-1}_{T,h}}{\mathbf{u}'\,\Sigma^{-1}_{T,h}\,\mathbf{u}}$

The choice of weights reflects the variance-covariance matrix of the forecast errors.
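
The closed form above is easy to evaluate numerically. A minimal Python sketch with an assumed error covariance matrix (illustrative numbers, not the course data):

```python
import numpy as np

# Assumed (illustrative) variance-covariance matrix of three forecast errors
Sigma = np.array([[1.0, 0.3, 0.2],
                  [0.3, 2.0, 0.4],
                  [0.2, 0.4, 1.5]])
u = np.ones(3)

# Optimal weights: w* = Sigma^{-1} u / (u' Sigma^{-1} u)
Sigma_inv_u = np.linalg.solve(Sigma, u)      # avoids forming the inverse explicitly
w_star = Sigma_inv_u / (u @ Sigma_inv_u)
print(w_star, w_star.sum())                  # weights sum to one
```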

Takeaway 1: Combining Forecasts Decreases Risk

Compute the expected loss (MSE) under the optimal weights:

$E[(e^c_{T+h}(w^*))^2] = \dfrac{\sigma^2_{T+h,1}\,\sigma^2_{T+h,2}\,(1-\rho^2_{T+h,1,2})}{\sigma^2_{T+h,1} + \sigma^2_{T+h,2} - 2\sigma_{T+h,1,2}}$

where $\rho_{T+h,1,2}$ is the correlation coefficient between the two forecast errors.

Result: Suppose that $\sigma^2_{T+h,1} \le \sigma^2_{T+h,2}$. Then:

$E[(e^c_{T+h}(w^*))^2] = \dfrac{\sigma^2_{T+h,1}\,\sigma^2_{T+h,2}\,(1-\rho^2_{T+h,1,2})}{\sigma^2_{T+h,1} + \sigma^2_{T+h,2} - 2\sigma_{T+h,1,2}} \le \sigma^2_{T+h,1}$

so that:

$E[(e^c_{T+h}(w^*))^2] \le \min\{\sigma^2_{T+h,1},\, \sigma^2_{T+h,2}\}$

That is, the forecast risk from combining forecasts is no higher than the lowest of the forecasting risks from the single forecasts.
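
A quick numerical check of this takeaway in Python, reusing the same illustrative two-forecast assumptions as in the earlier sketch:

```python
import numpy as np

# Illustrative error variances and covariance (assumed, as in the earlier sketch)
var1, var2, cov12 = 1.0, 2.0, 0.5

# MSE of the optimally combined forecast
rho2 = cov12**2 / (var1 * var2)
mse_combined = var1 * var2 * (1.0 - rho2) / (var1 + var2 - 2.0 * cov12)

print(mse_combined, min(var1, var2))   # 0.875 <= 1.0: combining does not increase risk
```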

Takeaway 2: Bias vs. Variance Tradeoff

Result: Combining forecasts offers a tradeoff between increased overall bias and lower (ex-ante) forecast variance.

The MSE loss function of a forecast has two components (on top of the irreducible noise variance $\sigma^2_y$):

• the squared bias of the forecast

• the (ex-ante) forecast variance

$E[(y_{T+h} - x_{T,h,i})^2] = Bias^2_{T,h,i} + \sigma^2_y + Var(x_{T,h,i})$

For the combined forecast:

$E[(y_{T+h} - \hat{y}^c_{T,h})^2] = \left(\sum_{i=1}^{M} w_{T,h,i}\, bias_{T,h,i}\right)^2 + \sigma^2_y + E\!\left[\left(\sum_{i=1}^{M} w_{T,h,i}\, x_{T,h,i} - E\!\left[\sum_{i=1}^{M} w_{T,h,i}\, x_{T,h,i}\right]\right)^2\right]$

so the choice of weights trades off the combined squared bias against the combined (ex-ante) forecast variance.

What is the Problem Really About?

[Chart: three competing GDP forecasts, A, B and C, made at time T for T + 1.]

Module 9

Session 2, Part 2: Implementation Issues

Issue: What if it is optimal for one weight to be < 0?

Shall we impose the constraints that $w_{T,h,i} \ge 0$?

Consider again the case M = 2. The optimal weights are such that:

$\dfrac{w^*}{1 - w^*} = \dfrac{\sigma^2_{T+h,2} - \sigma_{T+h,1,2}}{\sigma^2_{T+h,1} - \sigma_{T+h,1,2}}$

• If $\sigma_{T+h,1,2} > 0$ and $\sigma^2_{T+h,2} < \sigma_{T+h,1,2} < \sigma^2_{T+h,1}$, then $w^* < 0$

In reality, we do not know Σ: we can only estimate the theoretical weights using the observed past forecast errors.

Another Issue: Estimating Σ

[Charts: the two forecast-error series $e_{t,h,1}$ and $e_{t,h,2}$ plotted over time up to T.]

1) Are the estimates of Σ based on past errors unbiased?

2) Does the population Σ depend on t?

• If not, estimates become better as T increases

• If it does, different issues arise: heteroskedasticity of any sort, serial correlation, etc.

• Do our estimates capture this dependence?

3) Does Σ depend on past realizations of y?

Questions when estimating Σ

4) How good are our estimates of Σ? If M is large relative to T, our estimates are poor!

5) Shall we just focus on weighted averages? Why not consider the median forecast, or trim extreme forecasts?

Questions when estimating Σ

When does it make sense (in terms of minimum squared error) to use equal weights?

• when the variances of the forecast errors are the same

• when all the pairwise covariances across forecast errors are the same

• and the loss function is symmetric

One More Issue: Optimality of Equal Weights?

Result: Equal weights tend to perform better than many estimates of the optimal weights (Stock and Watson 2004, Smith and Wallis 2009)

Module 9

Session 3: Methods to estimate the weights when M is low relative to T

• Learning Objectives:

– Decide when to combine and when not to combine

– Estimate weights using OLS

– Address Sampling Error

Methods of Weighting (M<T)

We need to assess whether one set of forecasts encompasses all the information contained in another set of forecasts.

• Example: for 2 forecasting models, run the regression

$y_{t+h} = \beta_{T,h,0} + \beta_{T,h,1}\, x_{t,h,1} + \beta_{T,h,2}\, x_{t,h,2} + \varepsilon_{t+h}, \quad t = 1, 2, \dots, T - h$

• If you cannot reject

$H_0: (\beta_{T,h,0}, \beta_{T,h,1}, \beta_{T,h,2}) = (0, 1, 0)$  or  $H_0: (\beta_{T,h,0}, \beta_{T,h,1}, \beta_{T,h,2}) = (0, 0, 1)$

then one forecast encompasses the other and there is no gain from combining.

• All other outcomes imply that there is some information in both forecasts that can be used to obtain a lower mean squared error.

To combine or not to combine?
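
A minimal Python sketch of this encompassing regression and test (illustrative only: the course workshops do this in EViews, the data here are simulated, and statsmodels is an assumed tooling choice):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)

# Simulated outcome and two competing forecasts (purely illustrative)
T = 120
y = rng.normal(size=T)
x1 = y + rng.normal(scale=0.5, size=T)   # forecast 1: informative but noisy
x2 = y + rng.normal(scale=1.0, size=T)   # forecast 2: noisier

# Regress y on a constant and the two forecasts
X = sm.add_constant(np.column_stack([x1, x2]))
res = sm.OLS(y, X).fit()

# Test H0: (beta0, beta1, beta2) = (0, 1, 0), i.e. forecast 1 encompasses forecast 2
R = np.eye(3)
q = np.array([0.0, 1.0, 0.0])
print(res.f_test((R, q)))
```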

If we assume a "linear-in-weights" model, OLS can be used to estimate the weights that minimize the MSE using data for t = 1, …, T − h:

$y_{t+h} = w_{T,h,1}\, x_{t,h,1} + w_{T,h,2}\, x_{t,h,2} + \dots + w_{T,h,M}\, x_{t,h,M} + \varepsilon_{t+h}$

$y_{t+h} = w_{T,h,0} + w_{T,h,1}\, x_{t,h,1} + w_{T,h,2}\, x_{t,h,2} + \dots + w_{T,h,M}\, x_{t,h,M} + \varepsilon_{t+h}$

Including a constant allows correcting for the bias in any one of the forecasts (the regression can also be run subject to $\sum_{i=1}^{M} w_{T,h,i} = 1$).

OLS estimates of the weights

If weights sum to one, then the previous equation becomes a regression of a vector of 0s on the past forecasting errors:

$0 = w_{T,h,1}\,(x_{t,h,1} - y_{t+h}) + w_{T,h,2}\,(x_{t,h,2} - y_{t+h}) + \dots + \varepsilon_{t+h}$

$0 = -\,w_{T,h,1}\, e_{t+h,1} - w_{T,h,2}\, e_{t+h,2} - \dots + \varepsilon_{t+h}$

OLS estimates of the weights
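
A minimal Python sketch of the unconstrained OLS weights with a constant (illustrative, simulated data; the sum-to-one version can be obtained analogously by substituting the constraint into the regression):

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated target and M = 3 forecasts over T - h periods (illustrative only)
T_h = 150
y = rng.normal(size=T_h)
X = np.column_stack([y + rng.normal(scale=s, size=T_h) for s in (0.4, 0.8, 1.2)])

# OLS of y on a constant and the M forecasts: the intercept corrects for bias
Z = np.column_stack([np.ones(T_h), X])
coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
const, weights = coef[0], coef[1:]
print(const, weights)   # estimated bias correction and combination weights
```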

Problem 2: Choose w to minimize $\mathbf{w}'\hat{\Sigma}\mathbf{w}$ subject to $\mathbf{u}'\mathbf{w} = 1$ and $w_i \ge 0$, where $\hat{\Sigma} = \sum_{t=1}^{T-h} \mathbf{e}_{t,h}\,\mathbf{e}'_{t,h}$ and $\mathbf{e}_{t,h}$ is a vector that collects the forecast errors of the M forecasts made in t.

Assume that estimates of Σ reflect (in part) sampling error.

• Although optimal weights depend on Σ, it makes sense to reduce the dependence of the weights on such an estimate

One way to achieve this is to "shrink" the optimal weights towards equal weights (Stock and Watson 2004):

$w^s_{T,h,i} = \lambda\, \hat{w}_{T,h,i} + (1-\lambda)\,\frac{1}{M}, \qquad \lambda = \max\!\left(0,\ 1 - \kappa\,\frac{M}{T - h - M - 1}\right)$

Reducing the dependency on sampling errors
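
A minimal Python sketch of this shrinkage rule (illustrative; κ and the estimated weights are assumed inputs, and the degrees-of-freedom term follows the formula above):

```python
import numpy as np

def shrink_towards_equal(w_hat, T, h, kappa=1.0):
    """Shrink estimated combination weights towards equal weights 1/M."""
    M = len(w_hat)
    lam = max(0.0, 1.0 - kappa * M / (T - h - M - 1))
    return lam * np.asarray(w_hat) + (1.0 - lam) / M

# Example: estimated weights from a short sample get pulled towards 1/M
w_hat = np.array([0.9, 0.3, -0.2])
print(shrink_towards_equal(w_hat, T=40, h=1, kappa=1.0))
```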

Module 9

Session 4: Methods to estimate the weights when M is high relative to T

• Learning Objectives:

– Explore shortcomings of OLS weights

– Look at other parametric weights.

– Consider some non-parametric weights and techniques

Methods of Weighting (M>T)

The problem with OLS weights is that:

• If M is large relative to T – h, the estimator loses precision and may not even be feasible (if M > T – h)

• Even if M is low relative to T – h, OLS estimation of weights is subject to sampling error.

Premise: problems with OLS weights

Other Types of Weights

• Relative Performance

• Shrinking Relative Performance

• Recent Performance

• Adaptive Weights

• Non-parametric (trimming and indexing)

A solution to the problem of OLS weights is to ignore the covariance across forecast errors and compute weights based on their relative performance over the past.

For each forecast, compute

$MSE_{T,h,i} = \frac{1}{T-h-1}\sum_{t=1}^{T-h} e^2_{t,h,i}$

MSE weights (or relative performance weights):

$w_{T,h,i} = \dfrac{1/MSE_{T,h,i}}{\sum_{i=1}^{M} 1/MSE_{T,h,i}}$

Relative performance weights

Shrinking relative performance

Consider instead

$w_{T,h,i} = \dfrac{\left(1/MSE_{T,h,i}\right)^{k}}{\sum_{i=1}^{M} \left(1/MSE_{T,h,i}\right)^{k}}$

The parameter k allows attenuating the attention we pay to performance:

• If k = 1 we obtain standard MSE weights

• If k = 0 we obtain equal weights 1/M
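
A minimal Python sketch of these relative-performance weights, including the attenuation exponent k (illustrative; the matrix of past forecast errors is an assumed input, and k = 1 reproduces the plain inverse-MSE weights of the previous slide):

```python
import numpy as np

def mse_weights(errors, k=1.0):
    """Inverse-MSE combination weights from a (T-h) x M matrix of past forecast errors.

    k = 1 gives standard MSE weights; k = 0 gives equal weights 1/M.
    """
    mse = np.mean(errors**2, axis=0)   # MSE of each of the M forecasts
    inv = (1.0 / mse) ** k
    return inv / inv.sum()

# Example with simulated errors: less accurate forecasts get smaller weights
rng = np.random.default_rng(2)
errs = rng.normal(scale=[0.5, 1.0, 2.0], size=(100, 3))
print(mse_weights(errs, k=1.0), mse_weights(errs, k=0.0))
```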

Consider computing a "discounted" MSE:

$MSE_{T,h,i} = \frac{1}{\#\{t:\ \delta(t) \ne 0\}}\sum_{t=1}^{T-h} \delta(t)\, e^2_{t,h,i}$

where δ(t) can be either one of the following:

• rolling window: $\delta(t) = 1$ if $t \ge T - h - v$, and $\delta(t) = 0$ otherwise

• discounted MSE: $\delta(t) = \gamma^{\,T-h-t}$ for a discount factor $\gamma$

Computing MSE weights using either rolling windows or discounting allows paying more attention to recent performance.

Highlighting recent performance
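
A minimal Python sketch of the rolling-window and discounted MSE variants (illustrative; the window length v and the discount factor γ are assumed choices):

```python
import numpy as np

def recent_mse(errors, window=None, gamma=None):
    """MSE per forecast using either a rolling window or geometric discounting.

    errors: (T-h) x M matrix of past forecast errors, oldest first.
    """
    n = errors.shape[0]
    if window is not None:                  # rolling window: only the last `window` errors count
        delta = np.zeros(n)
        delta[-window:] = 1.0
    elif gamma is not None:                 # discounting: weight gamma^(T-h-t) on each past error
        delta = gamma ** np.arange(n - 1, -1, -1)
    else:
        delta = np.ones(n)
    return (delta[:, None] * errors**2).sum(axis=0) / np.count_nonzero(delta)

rng = np.random.default_rng(3)
errs = rng.normal(scale=[0.5, 1.5], size=(80, 2))
print(recent_mse(errs, window=20), recent_mse(errs, gamma=0.95))
```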

Relative performance weights could be sensitive to adding new forecast errors. A possibility is to adapt the previous weights using the most recently computed weights:

$\omega_{T,h,i} = \text{MSE weight (with or without covariance)}$

$w^*_{T,h,i} = \alpha\, w^*_{T-1,h,i} + (1-\alpha)\,\omega_{T,h,i}$

Adaptive weights
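
A minimal Python sketch of this adaptive update (illustrative; α is an assumed smoothing parameter):

```python
import numpy as np

def adapt_weights(prev_weights, new_mse_weights, alpha=0.7):
    """Smooth last period's weights towards the newly computed MSE weights.

    If both inputs sum to one, the adapted weights also sum to one.
    """
    return alpha * np.asarray(prev_weights) + (1.0 - alpha) * np.asarray(new_mse_weights)

print(adapt_weights([0.5, 0.3, 0.2], [0.6, 0.25, 0.15]))
```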

• Often advantageous to discard the models with the worst and best performance when combining forecasts

– Simple averages are easily distorted by extreme forecasts/forecast errors

• Aiolfi and Favero (2003) recommend ranking the individual models by R^2 and discarding the bottom and top 10 percent

Non parametric weights: Trimming

Rank the models by their MSE. Let $R_i$ be the rank of the i-th model; then the index-based weights are:

$w^{index}_{T,h,i} = \dfrac{1/R_i}{\sum_{j=1}^{M} 1/R_j}$

Non parametric weights: Indexing
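
A minimal Python sketch of the trimming and rank-indexing ideas (illustrative; models are ranked here by MSE, whereas the Aiolfi and Favero example ranks by R^2, and the 20 percent trim fraction is an assumed choice):

```python
import numpy as np

def index_weights(mse):
    """Rank-based weights: weight i proportional to 1 / rank of model i (rank 1 = best MSE)."""
    ranks = np.empty(len(mse))
    ranks[np.argsort(mse)] = np.arange(1, len(mse) + 1)
    inv = 1.0 / ranks
    return inv / inv.sum()

def trimmed_average(forecasts, mse, trim_frac=0.1):
    """Equal-weight average after discarding the best- and worst-ranked models."""
    order = np.argsort(mse)                       # best (lowest MSE) first
    k = max(1, int(np.floor(trim_frac * len(mse))))
    keep = order[k:len(mse) - k] if len(mse) > 2 * k else order
    return np.mean(np.asarray(forecasts)[keep])

mse = np.array([0.8, 1.2, 0.5, 2.0, 1.0])
forecasts = np.array([2.1, 2.4, 2.0, 3.5, 2.2])
print(index_weights(mse))
print(trimmed_average(forecasts, mse, trim_frac=0.2))
```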

Module 9

Session 5: Workshop on Combining Forecasts

• Have open “MF_combination_forecasting.wf1”—we’ll be working on the “CF_W1_Combined” pagefile.

• Let’s estimate 4 regression models:

Workshop 1: Combining Forecasts

• Step One: Estimate each of the models using LS and forecast 2008-09 using EViews. Note the RMSE of each.

• Step Two: Calculate combined forecasts from the misspecified models using:
– Equal Weights
– Trimmed Weights
– Inverse MSE weights

Combining Forecasts

• Step Three: Compare the RMSE of the combined forecast models to that of the individual forecast models. – Which is most accurate?

– Do all 3 combined ones outperform the individual ones?

• Step Four: Repeat Steps 1-3 for the true DGP:
– Then forecast 2008-09; what is the RMSE of this full model?

Combining Forecasts
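
The workshop does these steps in EViews. As a rough Python analogue of Step Three (illustrative only, with simulated forecasts standing in for the workshop's models), the RMSE comparison looks like this:

```python
import numpy as np

def rmse(actual, forecast):
    """Root mean squared error of a forecast over the evaluation window."""
    return np.sqrt(np.mean((np.asarray(actual) - np.asarray(forecast)) ** 2))

rng = np.random.default_rng(4)
actual = rng.normal(size=8)                                   # stand-in for the 2008-09 outcomes
individual = [actual + rng.normal(scale=s, size=8) for s in (0.6, 0.9, 1.3)]
combined = np.mean(individual, axis=0)                        # equal-weight combination

for name, f in [("model 1", individual[0]), ("model 2", individual[1]),
                ("model 3", individual[2]), ("equal-weight combo", combined)]:
    print(name, rmse(actual, f))
```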

Module 9

Session 6 : Forecast Combination Tools in EViews

• Have open “MF_M9.wf1”—we’ll be working on the “USA” pagefile.

• Let’s explore different forecast combination techniques!

• Note: This workshop will involve the use of multiple program files. Make sure to have the workfile open at all times.

Workshop 2: Forecast Combination Techniques

• Step One: First, remember to always inspect your data:
– The USA pagefile contains quarterly data for 1959q1-2010q4:
• "rgdp" = real GDP index (2005=100)—in this case we will use the natural log ("lngdp")
• "growth" = growth rate over different time spans (1, 2 and 4 quarters)
• "fh_i_j" = growth forecast indexed by time horizon (i) and model number (j)

• Step Two: Produce a combined forecast for GDP growth in 2004q1—we can compare this to actual GDP growth.
– Use the programs to compute forecasts for the rest of 2004-2007 (tedious).

W2: Forecast Combination Techniques


Module 9

Session 7: Conclusions

• Numerous weighting schemes have been proposed to formulate combined forecasts.

• Simple combination schemes are difficult to beat; why this is the case is not fully understood.

– Simple weights reduce variability with relatively little cost in terms of overall bias

– Simple weights also provide diversification if the pool of models is indeed diverse.

Conclusions

• Results are valid for a symmetric loss function; they may not be valid if the sign of the error matters

• Forecasts based solely on the model with the best in-sample performance often yield poor out-of-sample forecasting performance.

• This reflects the reasonable prior that a "preferred" model is really just an approximation of the true DGP, which can change each period.

• Combined forecasts imply diversification of risk (provided not all the models suffer from the same misspecification problem).

Conclusions

Appendix


Appendix 1

Let e be the (M x 1) vector of the forecast errors. Problem 1: choose the vector w to minimize E[w'ee'w] subject to u'w = 1. Notice that $E[\mathbf{w}'\mathbf{e}\mathbf{e}'\mathbf{w}] = \mathbf{w}'\,E[\mathbf{e}\mathbf{e}']\,\mathbf{w} = \mathbf{w}'\Sigma\mathbf{w}$. The Lagrangean is

$\mathcal{L} = \mathbf{w}'\Sigma\mathbf{w} - \lambda\,[\mathbf{u}'\mathbf{w} - 1]$

and the FOC is

$\Sigma\mathbf{w} - \lambda\mathbf{u} = 0 \;\Rightarrow\; \mathbf{w}^* = \lambda\,\Sigma^{-1}\mathbf{u}$

Using u'w = 1 one can obtain

$\mathbf{u}'\mathbf{w}^* = \lambda\,\mathbf{u}'\Sigma^{-1}\mathbf{u} = 1 \;\Rightarrow\; \lambda = \left(\mathbf{u}'\Sigma^{-1}\mathbf{u}\right)^{-1}$

Substituting back one obtains

$\mathbf{w}^* = \dfrac{\Sigma^{-1}\mathbf{u}}{\mathbf{u}'\,\Sigma^{-1}\mathbf{u}}$


Appendix 1


Let $\Sigma_{T,h}$ be the variance-covariance matrix of the forecasting errors:

$\Sigma_{T,h} = \begin{bmatrix} \sigma^2_{T+h,1} & \sigma_{T+h,1,2} \\ \sigma_{T+h,1,2} & \sigma^2_{T+h,2} \end{bmatrix}$

Consider the inverse of this matrix:

$\Sigma^{-1}_{T,h} = \frac{1}{\det|\Sigma_{T,h}|}\begin{bmatrix} \sigma^2_{T+h,2} & -\sigma_{T+h,1,2} \\ -\sigma_{T+h,1,2} & \sigma^2_{T+h,1} \end{bmatrix}$

Let u' = [1, 1]. The two weights w* and (1 - w*) can be written as

$\begin{bmatrix} w^* \\ 1 - w^* \end{bmatrix} = \dfrac{\Sigma^{-1}_{T,h}\,\mathbf{u}}{\mathbf{u}'\,\Sigma^{-1}_{T,h}\,\mathbf{u}}$


Appendix 2

Notice that

$\sigma^2_{T+h,1} + \sigma^2_{T+h,2} - 2\sigma_{T+h,1,2} = E[(e_{T+h,1} - e_{T+h,2})^2] \ge 0$

and that, since

$0 \le \left(\sigma_{T+h,1} - \rho_{T+h,1,2}\,\sigma_{T+h,2}\right)^2 = \sigma^2_{T+h,1} - 2\sigma_{T+h,1,2} + \rho^2_{T+h,1,2}\,\sigma^2_{T+h,2}$

the following inequality holds:

$\sigma^2_{T+h,2}\,(1-\rho^2_{T+h,1,2}) \le \sigma^2_{T+h,1} + \sigma^2_{T+h,2} - 2\sigma_{T+h,1,2}$

so that

$\dfrac{\sigma^2_{T+h,2}\,(1-\rho^2_{T+h,1,2})}{\sigma^2_{T+h,1} + \sigma^2_{T+h,2} - 2\sigma_{T+h,1,2}} \le 1$

Also notice that $\sigma^2_{T+h,1}\,(1-\rho^2_{T+h,1,2}) \ge 0$.


Appendix 3

The MSE loss function of a forecast has two components:

• the squared bias of the forecast

• the (ex-ante) forecast variance

Writing $y_{T+h} = E(y_{T+h}) + \varepsilon_y$ with $Var(\varepsilon_y) = \sigma^2_y$:

$E[(y_{T+h} - x_{T,h,i})^2] = E[(E(y_{T+h}) + \varepsilon_y - x_{T,h,i})^2]$

$= E[(E(y_{T+h}) - E(x_{T,h,i}) + \varepsilon_y + E(x_{T,h,i}) - x_{T,h,i})^2]$

$= E[(Bias_i + \varepsilon_y + E(x_{T,h,i}) - x_{T,h,i})^2]$

$= Bias_i^2 + \sigma^2_y + Var(x_{T,h,i})$

where $Bias_i = E(y_{T+h}) - E(x_{T,h,i})$ and the cross terms have zero expectation.


Appendix 4

Suppose that $x_{T,h,i} = P\mathbf{y}$, where P is an (m x T) matrix and $\mathbf{y}$ is a (T x 1) vector collecting all $y_t$, $t = 1, \dots, T$. Consider:

$\frac{1}{T-h}\sum_{t=1}^{T-h}\left(x_{t,h,i} - y_{t+h}\right)^2 = \frac{1}{T-h}\sum_{t=1}^{T-h}\left(x_{t,h,i} - E[y_{t+h}] - \varepsilon_{y,t+h}\right)^2$

$= \frac{1}{T-h}\sum_{t=1}^{T-h}\left(x_{t,h,i} - E[y_{t+h}]\right)^2 + \frac{1}{T-h}\sum_{t=1}^{T-h}\varepsilon^2_{y,t+h} - \frac{2}{T-h}\sum_{t=1}^{T-h}\varepsilon_{y,t+h}\left(x_{t,h,i} - E[y_{t+h}]\right)$


Appendix 4

Consider the last (cross) term. In vector form, with $\boldsymbol{\varepsilon} = \mathbf{y} - E[\mathbf{y}]$ and $x_{T,h,i} = P\mathbf{y}$:

$\boldsymbol{\varepsilon}'\left(P\mathbf{y} - E[\mathbf{y}]\right) = \boldsymbol{\varepsilon}'\left(PE[\mathbf{y}] + P\boldsymbol{\varepsilon} - E[\mathbf{y}]\right) = \boldsymbol{\varepsilon}'P\boldsymbol{\varepsilon} - \boldsymbol{\varepsilon}'\left(I - P\right)E[\mathbf{y}]$

Since $E[\boldsymbol{\varepsilon}] = 0$, the second term has zero expectation, so the expectation of the estimated mean squared prediction error, $E[\widehat{MSPE}_{T,h,i}]$, involves the term $\frac{2}{T-h}\,E[\boldsymbol{\varepsilon}'P\boldsymbol{\varepsilon}]$.
