Forecasting without forecasters

Post on 06-May-2015


Description

Keynote talk given at the International Symposium on Forecasting, Seoul, South Korea. 25 June 2013

Transcript


Rob J Hyndman

Forecasting without forecasters

Outline

1 Motivation

2 Exponential smoothing

3 ARIMA modelling

4 Time series with complex seasonality

5 Hierarchical and grouped time series

6 Functional time series


Motivation

1 Common in business to have over 1000 products that need forecasting at least monthly.

2 Forecasts are often required by people who are untrained in time series analysis.

3 Some types of data can be decomposed into a large number of univariate time series that need to be forecast.

Specifications

Automatic forecasting algorithms must:

• determine an appropriate time series model;
• estimate the parameters;
• compute the forecasts with prediction intervals.
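A minimal sketch of what these specifications look like in practice, using the forecast package in R. The products object and the 12-month horizon are illustrative assumptions, not something from the talk: each series gets an automatically selected model, estimated parameters, and forecasts with prediction intervals.

library(forecast)

# Hypothetical monthly demand for three products, one column per product
products <- ts(matrix(rpois(120 * 3, lambda = 20), ncol = 3),
               frequency = 12, start = c(2005, 1))

# Fit an ETS model to each series and forecast 12 months ahead,
# with no analyst intervention
fcasts <- lapply(seq_len(ncol(products)), function(i) {
  fit <- ets(products[, i])   # model form and parameters chosen automatically
  forecast(fit, h = 12)       # point forecasts plus 80% and 95% prediction intervals
})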


Example: Asian sheep

[Figure: "Numbers of sheep in Asia" and "Automatic ETS forecasts"; millions of sheep plotted against Year, 1960–2010.]

Example: Corticosteroid sales

[Figure: "Monthly corticosteroid drug sales in Australia" and "Automatic ARIMA forecasts"; total scripts (millions) plotted against Year, 1995–2010.]

M3 competition


• 3003 time series.
• Early comparison of automatic forecasting algorithms.
• Best-performing methods undocumented.
• Limited subsequent research on general automatic forecasting algorithms.


Exponential smoothing

Classic Reference

Makridakis, Wheelwright and Hyndman (1998) Forecasting: methods and applications, 3rd ed., Wiley: NY.

• "Unfortunately, exponential smoothing methods do not allow the easy calculation of prediction intervals." (MWH, p.177)
• No satisfactory way to select an exponential smoothing method.

Current Reference

Hyndman and Athanasopoulos (2013) Forecasting: principles and practice, OTexts: Australia. OTexts.com/fpp.

Exponential smoothing methods

                               Seasonal Component
Trend Component                N (None)      A (Additive)      M (Multiplicative)
N  (None)                      N,N           N,A               N,M
A  (Additive)                  A,N           A,A               A,M
Ad (Additive damped)           Ad,N          Ad,A              Ad,M
M  (Multiplicative)            M,N           M,A               M,M
Md (Multiplicative damped)     Md,N          Md,A              Md,M

N,N: Simple exponential smoothing
A,N: Holt's linear method
Ad,N: Additive damped trend method
M,N: Exponential trend method
Md,N: Multiplicative damped trend method
A,A: Additive Holt-Winters' method
A,M: Multiplicative Holt-Winters' method

There are 15 separate exponential smoothing methods. Each can have an additive or multiplicative error, giving 30 separate models.

General notation E,T,S (ExponenTial Smoothing): Error, Trend, Seasonal.

Examples:
A,N,N: Simple exponential smoothing with additive errors
A,A,N: Holt's linear method with additive errors
M,A,M: Multiplicative Holt-Winters' method with multiplicative errors
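In the forecast package for R, these three-letter codes map directly onto the model argument of ets(). A minimal sketch, using the built-in AirPassengers series purely for illustration:

library(forecast)

y <- AirPassengers

fit1 <- ets(y, model = "MAM")                 # ETS(M,A,M): multiplicative Holt-Winters'
                                              # method with multiplicative errors
fit2 <- ets(y, model = "AAN", damped = TRUE)  # ETS(A,Ad,N): additive damped trend
fit3 <- ets(y, model = "ZZZ")                 # "Z" components are selected automatically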

Innovations state space models

• All ETS models can be written in innovations state space form (IJF, 2002).
• Additive and multiplicative versions give the same point forecasts but different prediction intervals.
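As a concrete instance (standard ETS notation, not taken from these slides): simple exponential smoothing with additive errors, ETS(A,N,N), has the innovations state space form

    y_t = ℓ_{t−1} + ε_t            (measurement equation)
    ℓ_t = ℓ_{t−1} + α ε_t          (state equation)

with ε_t ~ NID(0, σ²). The multiplicative-error version ETS(M,N,N) is y_t = ℓ_{t−1}(1 + ε_t), ℓ_t = ℓ_{t−1}(1 + α ε_t), which gives the same point forecasts but different prediction intervals.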

Automatic forecasting

From Hyndman et al. (IJF, 2002):

Apply each of the 30 models that are appropriate to the data. Optimize parameters and initial values using MLE (or some other criterion).

Select best method using AIC:

AIC = −2 log(Likelihood) + 2p

where p = # parameters.

Produce forecasts using best method.

Obtain prediction intervals using the underlying state space model.
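This procedure is what ets() in the forecast package implements. A small sketch of fitting and inspecting the selected model; note that recent versions of the package select by the bias-corrected AICc by default, so ic = "aic" is set here to match the criterion shown above:

library(forecast)

fit <- ets(AirPassengers, ic = "aic")  # fit the candidate models, keep the AIC-best one
summary(fit)                           # reports the chosen ETS(.,.,.) form and AIC/AICc/BIC
fit$aic                                # AIC of the selected model
fc <- forecast(fit, h = 24)            # forecasts with prediction intervals from the
                                       # underlying state space model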


Exponential smoothing

library(forecast)
fit <- ets(livestock)    # livestock: annual sheep numbers in Asia
fcast <- forecast(fit)
plot(fcast)

[Figure: Forecasts from ETS(M,A,N); millions of sheep against Year, 1960–2010.]

Exponential smoothing

fit <- ets(h02)          # h02: monthly corticosteroid drug sales in Australia
fcast <- forecast(fit)
plot(fcast)

[Figure: Forecasts from ETS(M,Md,M); total scripts (millions) against Year, 1995–2010.]

M3 comparisons

Method          MAPE    sMAPE   MASE
Theta           17.83   12.86   1.40
ForecastPro     18.00   13.06   1.47
ETS additive    18.58   13.69   1.48
ETS             19.33   13.57   1.59
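Accuracy measures like those in this table can be computed with accuracy() from the forecast package (it reports MAPE and MASE, among others). A rough sketch on a hypothetical train/test split of a built-in series:

library(forecast)
train <- window(AirPassengers, end = c(1957, 12))
test  <- window(AirPassengers, start = c(1958, 1))
fc <- forecast(ets(train), h = length(test))
accuracy(fc, test)    # out-of-sample error measures, including MAPE and MASE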

References

RJ Hyndman, AB Koehler, RD Snyder, and S Grose (2002). "A state space framework for automatic forecasting using exponential smoothing methods". International Journal of Forecasting 18(3), 439–454.

RJ Hyndman, AB Koehler, JK Ord, and RD Snyder (2008). Forecasting with exponential smoothing: the state space approach. Springer-Verlag.

RJ Hyndman and G Athanasopoulos (2013). Forecasting: principles and practice. OTexts. OTexts.com/fpp/.


ARIMA modelling

Classic Reference

Makridakis, Wheelwright and Hyndman (1998) Forecasting: methods and applications, 3rd ed., Wiley: NY.

• "There is such a bewildering variety of ARIMA models, it can be difficult to decide which model is most appropriate for a given set of data." (MWH, p.347)

Auto ARIMA

fit <- auto.arima(livestock)
fcast <- forecast(fit)
plot(fcast)

[Figure: Forecasts from ARIMA(0,1,0) with drift; millions of sheep against Year, 1960–2010.]

Auto ARIMA

fit <- auto.arima(h02)
fcast <- forecast(fit)
plot(fcast)

[Figure: Forecasts from ARIMA(3,1,3)(0,1,1)[12]; total scripts (millions) against Year, 1995–2010.]

How does auto.arima() work?

A non-seasonal ARIMA process:

φ(B)(1 − B)^d y_t = c + θ(B)ε_t

Need to select appropriate orders p, q, d, and whether to include c.

A seasonal ARIMA process:

Φ(B^m)φ(B)(1 − B)^d (1 − B^m)^D y_t = c + Θ(B^m)θ(B)ε_t

Need to select appropriate orders p, q, d, P, Q, D, and whether to include c.

Hyndman & Khandakar (JSS, 2008) algorithm:

• Select the number of differences d via the KPSS unit root test.
• Select D using the OCSB unit root test.
• Select p, q, P, Q, c by minimising the AIC.
• Use a stepwise search to traverse the model space, starting with a simple model and considering nearby variants.

Algorithm choices driven by forecast accuracy.
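A sketch of how these choices surface as arguments to auto.arima() in the forecast package (the series is again AirPassengers, purely for illustration):

library(forecast)
fit <- auto.arima(AirPassengers,
                  ic = "aic",             # minimise AIC when comparing candidate models
                  stepwise = TRUE,        # stepwise search of nearby models
                  approximation = FALSE)  # exact likelihood rather than approximations
summary(fit)
# The differencing orders can also be fixed manually, bypassing the unit root tests,
# e.g. auto.arima(AirPassengers, d = 1, D = 1).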

M3 comparisons

Method          MAPE    sMAPE   MASE
Theta           17.83   12.86   1.40
ForecastPro     18.00   13.06   1.47
BJauto          19.14   13.73   1.55
AutoARIMA       18.98   13.75   1.47
ETS-additive    18.58   13.69   1.48
ETS             19.33   13.57   1.59
ETS-ARIMA       18.17   13.11   1.44

M3 conclusions

MYTHS

• Simple methods do better.
• Exponential smoothing is better than ARIMA.

FACTS

• The best methods are hybrid approaches.
• ETS-ARIMA (the simple average of ETS-additive and AutoARIMA) is the only fully documented method that is comparable to the M3 competition winners.
• I have an algorithm that does better than all of these, but it takes too long to be practical.

References

RJ Hyndman and Y Khandakar (2008). "Automatic time series forecasting: the forecast package for R". Journal of Statistical Software 26(3).

RJ Hyndman (2011). "Major changes to the forecast package". robjhyndman.com/hyndsight/forecast3/.

RJ Hyndman and G Athanasopoulos (2013). Forecasting: principles and practice. OTexts. OTexts.com/fpp/.


Examples

[Figure: US finished motor gasoline products; thousands of barrels per day, weekly, 1992–2004.]

[Figure: Number of calls to a large American bank (7am–9pm); call arrivals per 5-minute interval, 3 March to 12 May.]

[Figure: Turkish electricity demand; daily electricity demand (GW), 2000–2008.]

TBATS model

TBATS:
• Trigonometric terms for seasonality
• Box-Cox transformations for heterogeneity
• ARMA errors for short-term dynamics
• Trend (possibly damped)
• Seasonal (including multiple and non-integer periods)
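A sketch of fitting such a model with the forecast package, using msts() to declare multiple (possibly non-integer) seasonal periods. The half-hourly demand series and its periods here are invented for illustration:

library(forecast)
# Hypothetical half-hourly electricity demand with daily and weekly seasonality
demand <- msts(rnorm(48 * 7 * 8, mean = 20, sd = 2),
               seasonal.periods = c(48, 48 * 7))
fit <- tbats(demand)    # Box-Cox, ARMA errors, damped trend and trigonometric
                        # seasonal terms are all selected automatically
fc <- forecast(fit, h = 48 * 7)
plot(fc)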

Examples

fit <- tbats(gasoline)     # weekly US gasoline data
fcast <- forecast(fit)
plot(fcast)

[Figure: Forecasts from TBATS(0.999, {2,2}, 1, {<52.1785714285714,8>}); thousands of barrels per day.]

fit <- tbats(callcentre)   # 5-minute call arrivals at a large American bank
fcast <- forecast(fit)
plot(fcast)

[Figure: Forecasts from TBATS(1, {3,1}, 0.987, {<169,5>, <845,3>}); number of call arrivals per 5-minute interval.]

fit <- tbats(turk)         # daily Turkish electricity demand
fcast <- forecast(fit)
plot(fcast)

[Figure: Forecasts from TBATS(0, {5,3}, 0.997, {<7,3>, <354.37,12>, <365.25,4>}); electricity demand (GW), 2000–2010.]

References

Automatic algorithm described in: AM De Livera, RJ Hyndman, and RD Snyder (2011). "Forecasting time series with complex seasonal patterns using exponential smoothing". Journal of the American Statistical Association 106(496), 1513–1527.

Slightly improved algorithm implemented in: RJ Hyndman (2012). forecast: Forecasting functions for time series. cran.r-project.org/package=forecast.

More work required!


Introduction

A two-level hierarchy:

Total
  A: AA, AB, AC
  B: BA, BB, BC
  C: CA, CB, CC

Examples

• Manufacturing product hierarchies
• Pharmaceutical sales
• Net labour turnover

Hierarchical/grouped time series

A hierarchical time series is a collection of several time series that are linked together in a hierarchical structure.

Example: Pharmaceutical products are organized in a hierarchy under the Anatomical Therapeutic Chemical (ATC) Classification System.

A grouped time series is a collection of time series that are aggregated in a number of non-hierarchical ways.

Example: daily numbers of calls to HP call centres are grouped by product type and location of call centre.

Hierarchical data

A one-level hierarchy: Total, split into series A, B and C.

Y_t : observed aggregate of all series at time t.
Y_X,t : observation on series X at time t.
B_t : vector of all series at the bottom level in time t.

Stacking all series, top level first:

Y_t = [Y_t, Y_A,t, Y_B,t, Y_C,t]′ = S B_t

where

S =
  1 1 1
  1 0 0
  0 1 0
  0 0 1

and B_t = [Y_A,t, Y_B,t, Y_C,t]′.
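A tiny numeric illustration of Y_t = S B_t for this Total/A/B/C hierarchy, in base R (the bottom-level values are made up):

# Summing matrix for the hierarchy Total -> A, B, C
S <- rbind(c(1, 1, 1),   # Total = A + B + C
           diag(3))      # each bottom-level series maps to itself
Bt <- c(A = 10, B = 25, C = 7)   # hypothetical bottom-level observations at time t
Yt <- S %*% Bt                   # stacked vector (Total, A, B, C)'
Yt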

Grouped data

The same four bottom-level series AX, AY, BX, BY can be aggregated in two non-hierarchical ways:

Grouping 1: Total; A (AX, AY), B (BX, BY).
Grouping 2: Total; X (AX, BX), Y (AY, BY).

Y_t = [Y_t, Y_A,t, Y_B,t, Y_X,t, Y_Y,t, Y_AX,t, Y_AY,t, Y_BX,t, Y_BY,t]′ = S B_t

where

S =
  1 1 1 1
  1 1 0 0
  0 0 1 1
  1 0 1 0
  0 1 0 1
  1 0 0 0
  0 1 0 0
  0 0 1 0
  0 0 0 1

and B_t = [Y_AX,t, Y_AY,t, Y_BX,t, Y_BY,t]′.

Forecasts

Key idea: forecast reconciliation

• Ignore structural constraints and forecast every series of interest independently.
• Adjust forecasts to impose constraints.

Let Ŷ_n(h) be the vector of initial (base) forecasts for horizon h, made at time n, stacked in the same order as Y_t.

Optimal reconciled forecasts:

Ỹ_n(h) = S(S′S)⁻¹ S′ Ŷ_n(h)

Independent of the covariance structure of the hierarchy! The optimal reconciliation weights S(S′S)⁻¹S′ are independent of the data.
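A minimal sketch of the reconciliation step itself, applied to hypothetical base forecasts for the Total/A/B/C hierarchy above. This is just the OLS projection from the formula, not the hts package:

S <- rbind(c(1, 1, 1), diag(3))        # summing matrix: (Total, A, B, C) from (A, B, C)
yhat <- c(45, 12, 24, 6)               # hypothetical base forecasts; 12 + 24 + 6 != 45
P <- S %*% solve(t(S) %*% S) %*% t(S)  # projection matrix S(S'S)^{-1}S'
ytilde <- P %*% yhat                   # reconciled forecasts, now aggregate consistent
sum(ytilde[2:4]) - ytilde[1]           # zero up to floating-point error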

Features

• Forget "bottom up" or "top down". This approach combines all forecasts optimally.
• Method outperforms bottom-up and top-down, especially for middle levels.
• Covariates can be included in base forecasts.
• Adjustments can be made to base forecasts at any level.
• Point forecasts are always aggregate consistent.
• Very simple and flexible method. Can work with any hierarchical or grouped time series.
• Conceptually easy to implement: OLS on base forecasts.

Challenges

• Computational difficulties in big hierarchies due to the size of the S matrix and non-singular behavior of (S′S).
• Need to estimate the covariance matrix to produce prediction intervals.

Example using R

library(hts)

# bts is a matrix containing the bottom-level time series
# g describes the grouping/hierarchical structure
y <- hts(bts, g = c(1, 1, 2, 2))

This creates the hierarchy Total; A (AX, AY), B (BX, BY).

# Forecast 10-step-ahead using the optimal combination method
# (ETS used for each series by default)
fc <- forecast(y, h = 10)

# Select your own methods
ally <- allts(y)
allf <- matrix(NA, nrow = 10, ncol = ncol(ally))
for(i in 1:ncol(ally))
  allf[,i] <- mymethod(ally[,i], h = 10)
allf <- ts(allf, start = 2004)

# Reconcile forecasts so they add up
fc2 <- combinef(allf, Smatrix(y))

References

RJ Hyndman, RA Ahmed, G Athanasopoulos, and HL Shang (2011). "Optimal combination forecasts for hierarchical time series". Computational Statistics and Data Analysis 55(9), 2579–2589.

RJ Hyndman, RA Ahmed, and HL Shang (2013). hts: Hierarchical time series. cran.r-project.org/package=hts.

RJ Hyndman and G Athanasopoulos (2013). Forecasting: principles and practice. OTexts. OTexts.com/fpp/.


Fertility rates


Functional data model

Let f_{t,x} be the observed data in period t at age x, t = 1, . . . , n.

f_t(x) = µ(x) + Σ_{k=1}^{K} β_{t,k} φ_k(x) + e_t(x)

• The decomposition separates time and age to allow forecasting.
• Estimate µ(x) as the mean of f_t(x) across years.
• Estimate β_{t,k} and φ_k(x) using functional (weighted) principal components.
• Univariate models are used for automatic forecasting of the scores {β_{t,k}}.
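To make the decomposition concrete, a rough sketch of the idea using ordinary principal components on a matrix of rates (rows = years t, columns = ages x). The data here are random placeholders; fdm() in the demography package does the real version, with weighting and robust options:

set.seed(1)
rates <- matrix(rnorm(40 * 30, mean = 50, sd = 5), nrow = 40)  # hypothetical: 40 years x 30 ages

mu   <- colMeans(rates)               # mu(x): mean curve across years
pca  <- prcomp(rates, center = TRUE)  # principal components of the centred curves
K    <- 2
phi  <- pca$rotation[, 1:K]           # phi_k(x): basis functions over age
beta <- pca$x[, 1:K]                  # beta_{t,k}: time series of scores, one per k

# Each column beta[, k] is then forecast with a univariate model (e.g. ets or
# auto.arima), and f_t(x) is rebuilt as mu + phi %*% beta_forecast for each horizon.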

Fertility application

[Figure: Australia fertility rates (1921–2006); fertility rate plotted against Age, ages 15–50.]

Fertility model

[Figure: fitted model components; µ(x) and the basis functions φ_1(x), φ_2(x) plotted against Age, and the coefficient series β_1, β_2 plotted against Year.]

Forecasts of f_t(x)

[Figure: Australia fertility rates (1921–2006) with forecasts of f_t(x); fertility rate plotted against Age, with 80% prediction intervals.]

R code

library(demography)
plot(aus.fert)         # aus.fert: Australian age-specific fertility rates, 1921-2006
fit <- fdm(aus.fert)   # fit the functional data model
fc <- forecast(fit)

[Figure: Australia fertility rates (1921–2006); fertility rate plotted against Age.]

References

RJ Hyndman and S Ullah (2007). "Robust forecasting of mortality and fertility rates: A functional data approach". Computational Statistics and Data Analysis 51(10), 4942–4956.

RJ Hyndman and HL Shang (2009). "Forecasting functional time series (with discussion)". Journal of the Korean Statistical Society 38(3), 199–221.

RJ Hyndman (2012). demography: Forecasting mortality, fertility, migration and population data. cran.r-project.org/package=demography.

For further information

robjhyndman.com

Slides and references for this talk.

Links to all papers and books.

Links to R packages.

A blog about forecasting research.
