A Bottoms-Up Approach to Time Series Analysis

Prepared for: 27th International Symposium on Forecasting

June 24, 2007

New York City, N.Y.

David P. Reilly

Automatic Forecasting Systems Inc.

203/19/07

Automatic Forecasting Systems, Inc. (AFS)

Phone: 215-675-0652 email: [email protected] Site: www.autobox.com

Forecasting History is Always Easier Than Forecasting The Future

403/19/07

There are many good pieces of software on the market, and much of what is demonstrated here can be accomplished with your existing software. The objective is to provide a transparent methodology that you can use in your research and, possibly, to help you upgrade your approach.

Since we have Autobox on hand it is natural to use it in our data examples. We will be in the Exhibitor’s Area if you have any questions or want to see a demo.

503/19/07

Statistical packages have enormous influence over analysis, especially over that of the less sophisticated user. There is a tendency for the user to do what is readily available in their software.

In preparing material for this presentation, I reviewed a number of web sites and found that university professors were similarly restricted to the software and methodology that their university provided.

703/19/07

Forecasting

Forecasting is difficult, especially about the future. Victor Borge

803/19/07

Using Good Methods Forecasting Becomes Easier For Example: Good Forecast #1!

Epidemiological Forecasting: Comparing the Forecast Accuracies of Different Forecasting Methods on a "Difficult" Time Series

by

Robert A. Yaffee, Ph.D. New York University, New York, N.Y.

Kostas Nikolopoulos, Ph.D. Manchester Business School, Manchester, U.K.

David P. Reilly, Automatic Forecasting Systems, Hatboro, Pa.

Sven F. Crone, Lancaster University, Lancaster, U.K.

Rick J. Douglass, Ph.D. Montana Technical University, Butte, Mt.

Kent D. Wagoner, Ph.D. Ithaca College, Ithaca, NY.

Brian R. Amman, Ph.D. CDC Special Pathogens Branch, Atlanta, Ga.

Tom Ksiazek, CDC Special Pathogens Branch, Atlanta, Ga.

James N. Mills, Ph.D. CDC Special Pathogens Branch, Atlanta, Ga.

2007 International Symposium on Forecasting

New York, New York

June 26, 2007 Tuesday 3:30pm Hudson

1003/19/07

The Endogenous Series

Abundance of Peromyscus maniculatus (deer mouse) in the Montana Cascade

[Figure: MNA total (vertical axis, 0 to 300) plotted against Date (1995 to 2005)]

1303/19/07

Good Forecast #2 ! Banking Application-Duffy

1403/19/07

Prof. Frost is Also a Faculty Member at Southern Methodist University

1503/19/07

Analytics Push-Out The Onset of a Seasonal Pulse

1603/19/07

Before You Dismiss This talk as Boring Banking Stuff !

1903/19/07

Mark is Interested in Species Other Than Mice

2003/19/07

He Has Collected and Makes Readily Available the Historical Weight of Playboy Bunnies at http://www.mark-frost.com

2103/19/07

As a Professor, Mark Knows that Interesting Data Sets Can Motivate Attentiveness !

A Transfer Function

2903/19/07

Causal Model

The model is written in terms of polynomials in the lag operator L, where

y(t) = dependent series

lagged or led polynomial in L applied to X

nonseasonal moving average polynomial in L

seasonal moving average polynomial in L

first difference

seasonal difference

autoregressive polynomial in L

seasonal autoregressive polynomial in L

X(t-b) = time varying parameters, prewhitened and differenced if necessary

I(t) = computer based automatic intervention detection and modeling (outliers, seasonal pulses, local trends, level shifts, etc.)

e(t) = disturbance

3103/19/07

AFS Philosophy

An unexamined life is not worth living.

Socrates

An unexamined model is not worth using

Dave R

3203/19/07

The Standard Deviation is Ill-Suited To Detect Unusual Behavior

[Figure: Actual series plotted with Upper 2s and Lower 2s limits and a Local Time-trend; vertical axis from -5 to 30]

Cannot assume independence of the observations

Outliers impact standard deviation and mean
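A minimal sketch of the second point, on made-up numbers. The series, the size of the pulses and the use of the median and MAD as a robust comparison are illustrative assumptions, not part of the talk. It shows how a few pulses inflate the standard deviation and widen the 2s limits.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical well-behaved series: 50 points around a level of 10.
clean = 10 + rng.normal(0.0, 1.0, size=50)

# Contaminate it with three large one-time pulses.
dirty = clean.copy()
dirty[[10, 25, 40]] += 15.0

def two_sigma_limits(y):
    """Classical mean +/- 2 standard deviation limits."""
    s = y.std(ddof=1)
    return y.mean() - 2 * s, y.mean() + 2 * s

def robust_limits(y):
    """Median +/- 2 * (1.4826 * MAD), a robust analogue (an assumption here)."""
    med = np.median(y)
    mad = 1.4826 * np.median(np.abs(y - med))
    return med - 2 * mad, med + 2 * mad

lo_c, hi_c = two_sigma_limits(clean)
lo_d, hi_d = two_sigma_limits(dirty)
lo_r, hi_r = robust_limits(dirty)

print("2s width, clean series      :", round(hi_c - lo_c, 2))
print("2s width, with three pulses :", round(hi_d - lo_d, 2))
print("robust width, with pulses   :", round(hi_r - lo_r, 2))
print("shift in the mean           :", round(dirty.mean() - clean.mean(), 2))
```

The widened limits are why anomalies can hide themselves when they are screened against a standard deviation computed from the contaminated data.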

3303/19/07

An Observation at the Mean can be Unusual (Inlier)

3403/19/07

3503/19/07

Errors of Nature, Sports and Monsters

The problem is that you can't catch an outlier without a model (at least a mild one) for your data. Else how would you know that a point violated that model? In fact, the process of growing understanding and finding and examining outliers must be iterative. This isn't a new thought. Bacon, writing in Novum Organum about 400 years ago said: "Errors of Nature, Sports and Monsters correct the understanding in regard to ordinary things, and reveal general forms. For whoever knows the ways of Nature will more easily notice her deviations; and, on the other hand, whoever knows her deviations will more accurately describe her ways."

3703/19/07

Independent Samples

3803/19/07

A More Common Data Set

3903/19/07

Serious Disconnect between the Teaching and Practice of Statistics

99.9% of all academic presentations of statistical tools REQUIRE independent observations

In time series data, this is clearly not the case

A source for spurious correlation is a common cause acting on the variables. Granger & Newbold (Journal of Econometrics, 1974) pointed out that the misleading inference comes about through applying the regression theory for stationary series to non-stationary series.
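The Granger and Newbold point can be illustrated with a small simulation. This is a sketch under assumed settings (series length 100, 500 replications, a rough |t| > 2 rule), none of which come from the talk. Two unrelated random walks are regressed on each other, first in levels and then in first differences.

```python
import numpy as np

def slope_t_stat(y, x):
    """t statistic of the slope in an OLS regression of y on a constant and x."""
    X = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (len(y) - 2)
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return beta[1] / np.sqrt(cov[1, 1])

rng = np.random.default_rng(1974)
n_obs, n_sims = 100, 500
reject_levels = 0
reject_diffs = 0

for _ in range(n_sims):
    # Two random walks built from independent shocks: no true relationship.
    x = np.cumsum(rng.normal(size=n_obs))
    y = np.cumsum(rng.normal(size=n_obs))
    reject_levels += int(abs(slope_t_stat(y, x)) > 2.0)
    # Differencing recovers the independent, stationary shocks.
    reject_diffs += int(abs(slope_t_stat(np.diff(y), np.diff(x))) > 2.0)

frac_levels = reject_levels / n_sims
frac_diffs = reject_diffs / n_sims
print(f"share of 'significant' slopes, levels:      {frac_levels:.2f}")
print(f"share of 'significant' slopes, differences: {frac_diffs:.2f}")
```

In levels the t test declares a relationship far more often than its nominal rate, while in differences the rejection share is close to what the test promises.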

4203/19/07

Tendency To Over-Believe One's Own Eyes

4403/19/07

Visually, we see trend and seasonality

4503/19/07

AND WE EXPECT OUR ANALYTICS TO SUPPORT THIS !

4603/19/07

Bad Analytics Can Support the Bad Eye !

4703/19/07

Select A Model From A List

4803/19/07

Selecting A Model From A List

4903/19/07

An Assumed Model

5003/19/07

GOOD ANALYTICS SEE A LITTLE BIT BETTER (Sometimes Better Than One's Own Eyes)

5103/19/07

A Level Shift Does Not A Trend Make

5203/19/07

Daily Sales of Bud 6 Pack In a Store in Texas

5303/19/07

5403/19/07

Two Approaches

GTS General To Specific THEORY BASED

start with most general model possibly based upon theory or more frequently based upon the Long Lag Strategy and step-down

STG Specific To General

start with an initial theory-based model or allow the data to suggest an initial model and then step-up and step-down

5503/19/07

GTS General To Specific

Top-Down or Stepdown Elimination:

First fit the model with all possible predictors and all possible lags.

Then sequentially eliminate those predictors that are least significant in a last-in (partial) test, conducting a necessity test via t and F tests without verifying that these tests are valid, i.e. are the errors Gaussian? A Major Flaw !

5603/19/07

Let SSE1 be the error sum of squares for the complete model (Y = β0 + β1x1 + β2x2 + β12x1x2).

Let SSE2 be the error sum of squares for the reduced model (Y = β0 + β1x1).

Since Model 1 includes more terms than Model 2, Model 1 fits better or No Worse than Model 2.

Hence we must have SSE1 ≤ SSE2

The difference, SSE2 - SSE1, is a measure of the drop in the error sum of squares attributable to the variables removed from the complete model.

5703/19/07

Define the mean square drop as: MSdrop = (SSE2 - SSE1) / (k - g), where k is the number of terms in the complete model (Model 1) and g (< k) is the number of terms in the reduced model (Model 2).

The mean square error for the complete model is: MSE1 = SSE1 / (n-k-1)

To test the hypothesis that the terms left out of the complete model do not contribute significantly to explaining the variability in y we use the following F statistic.

F = MSdrop/MSE1

Reject Ho: Left out parameters = 0 if F > F(k-g, n-k-1, α)

F-Statistic for Step-Down Models
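The step-down F statistic can be sketched numerically as follows. The simulated data, the coefficient values and the helper name `sse` are illustrative assumptions. The complete model has k = 3 terms (x1, x2, x1x2) and the reduced model has g = 1 term (x1), matching the two models defined above.

```python
import numpy as np

def sse(X, y):
    """Error sum of squares from an OLS fit of y on the columns of X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)

rng = np.random.default_rng(42)
n = 200
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
# Simulated data in which x2 and the interaction truly matter.
y = 1.0 + 2.0 * x1 + 1.5 * x2 + 0.8 * x1 * x2 + rng.normal(size=n)

ones = np.ones(n)
X_complete = np.column_stack([ones, x1, x2, x1 * x2])  # k = 3 terms plus constant
X_reduced = np.column_stack([ones, x1])                 # g = 1 term plus constant
k, g = 3, 1

sse1 = sse(X_complete, y)
sse2 = sse(X_reduced, y)
ms_drop = (sse2 - sse1) / (k - g)
mse1 = sse1 / (n - k - 1)
F = ms_drop / mse1
print(f"SSE1={sse1:.1f}  SSE2={sse2:.1f}  F={F:.1f} on ({k - g}, {n - k - 1}) df")
```

The computed F is then compared with the tabled F(k-g, n-k-1, α) value. The sketch says nothing about whether the errors satisfy the assumptions that make that comparison valid.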

5803/19/07

To test the hypothesis that the terms left out of the complete model do not contribute significantly to explaining the variability in Y we use the following F statistic.

F = MSdrop/MSE1

WHICH REQUIRES THAT THE MEAN OF THE ERRORS FROM THE COMPLETE MODEL DOES NOT DIFFER FROM ZERO … EVERYWHERE

F-Statistic for Step-Down Models

5903/19/07

GTS General To Specific

$$Y_t = \alpha_1 Y_{t-1} + \alpha_2 Y_{t-2} + \dots + \alpha_r Y_{t-r} + \beta_0 X_t + \beta_1 X_{t-1} + \dots + \beta_s X_{t-s} + e_t$$

This approach requires, among other things, that the mean of the errors (e) is 0.0 everywhere; otherwise the F Tests for simplification are not correct, as the sums of squares of the e’s are not distributed as a central chi-square variable. This violation leads to a downward bias in the F Test, as the MSE is larger than it should be, leading to a false acceptance of the Null Hypothesis, or what Prof. Ord of Georgetown University refers to as “The Alice in Wonderland Test”, asserting that all is well !

Tests for Common Factors are conducted in order to simplify the model form and the number of required parameters.

6303/19/07


A Starting Model is used which can be based upon theory or one can simply use the statistical characteristics of the data to suggest “an initial model” as the base point. Residuals from the starting model are used to suggest step-forward augmentation direction culminating in Necessity and Sufficiency Checks.

6403/19/07


• Begin with a Base Model and add structure evident in the noise, thus transferring it to the signal. Each time new structure is added into the model (sufficiency test), check all the other coefficients already in the model with a last-in test to determine if they should continue to be in the model.

• Drop any predictor that cannot pass the last-in test (necessity test).

6703/19/07

If After Fitting a Model

Y(t ) = [W(B)] X(t ) + A(t )

The ACF of the error process A(t ) exhibits structure there are a number of possible remedies

1. Fix the Lag Structure [W(B)] where W(B) = input lag structure reflecting static relationship of Y to X

2. Fix the ARIMA structure [T(B)/P(B)] A(t )

3. Identify and Include Deterministic Series e.g. Pulses, Level Shifts, Seasonal Pulses and/or Local Trends

4. Transform the data in order to achieve constant variance

5. Partition the data in order to locally optimize model form and parameters

6803/19/07

Regression

1. Lag Structure [W(B)]

ARIMA

2. ARIMA structure [T(B)/P(B)] A(t ) to proxy the effect of Unspecified Stochastic Series

Dummy Structure

3. Identification of Interventions to proxy the effect of Unspecified Deterministic Series

…….

4. Transforming the data in order to achieve constant variance of the residuals requires a model to generate these residuals

5. Partition the data in order to locally optimize model form and parameters requires a model to generate these residuals

7203/19/07

6 Permutations To Deal With Model Form

1. Fix Regression First then fix ARIMA then fix Dummy Structure

2. Fix Regression First then fix Dummy Structure then fix ARIMA

3. Fix ARIMA First then fix Regression then fix Dummy Structure

4. Fix ARIMA First then fix Dummy Structure then fix Regression

5. Fix Dummy Structure then fix Regression then fix ARIMA

6. Fix Dummy Structure then fix ARIMA then fix Regression

7303/19/07

BUD 6 PACK FINAL MODEL Optimal Strategy: Fix Regression First then fix ARIMA then fix Dummy Structure

HOLIDAY EFFECTS

7403/19/07

Final Model

WEEK EFFECTS

7503/19/07

Final Model

PECULIAR DAYS

PULSES

7603/19/07

7703/19/07

You Can Relax Help Is On The Way

( AN AFS SENIOR DEVELOPER AFTER A TOUGH DAY’S PROGRAMMING !)

7803/19/07

Hierarchical Structure

•Qualitative

Judgmental

Analogical

•Quantitative

• Causal Models

Smoothing or Memory Models

Trend Decomposition

7903/19/07

123 tttXXY

Accounts for the timing and form of the impact of the known user-suggested cause series

An Example of Causal Modeling

8003/19/07

$$Y_t = [1/3]\,Y_{t-1} + [1/3]\,Y_{t-2} + [1/3]\,Y_{t-3}$$

Accounts for omitted stochastic cause series

An Example of Memory Modeling
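A small sketch of how such a memory model produces forecasts, assuming the equal-weights form in which Y at time t is one third of each of the three previous values. The function name `memory_forecast` and the input numbers are hypothetical.

```python
def memory_forecast(history, steps, weights=(1 / 3, 1 / 3, 1 / 3)):
    """Iterate Y_t = w1*Y_{t-1} + w2*Y_{t-2} + w3*Y_{t-3} forward `steps` periods.

    Forecasts beyond one step ahead reuse earlier forecasts as inputs.
    """
    values = list(history)
    forecasts = []
    for _ in range(steps):
        # values[-1] is the most recent observation (lag 1), values[-2] is lag 2, ...
        nxt = sum(w * values[-(lag + 1)] for lag, w in enumerate(weights))
        values.append(nxt)
        forecasts.append(nxt)
    return forecasts

# Hypothetical last three observations, oldest first.
print(memory_forecast([3.0, 6.0, 9.0], steps=3))
```

With the history 3, 6, 9 the first forecast is their average, 6, and later forecasts average a mix of actuals and earlier forecasts.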

8103/19/07

36162310 ]5.2[]5[.]5[]2[

]5[.]2[

ttttt

t

SPTPL

generallymore

T

Y

Y

Accounts for Dummy Deterministic Series

An Example of Dummy Modeling

8203/19/07

Causal Models

Smoothing or Memory Models

Trend Decomposition

Yt = Causal + Memory + Dummy

Quantitative: Quantitative:

Time Series AnalysisTime Series Analysis

8303/19/07

$$Y_t = W(B)\,X_t + \frac{T(B)}{P(B)}\,e_t + w(B)\,D_t$$

X(t) is a user-suggested cause variable, and

e(t) is a zero mean white noise process.

D(t) is detected via diagnostics



Accounts for omitted deterministic series

The Objective

8403/19/07

The Angels that you know and the Devils that you don’t know

Known: User Suggested Dependent Series (Y)

User Suggested Support Series (X)

Unknown: Lag Structure for (X)

Omitted Stochastic Series (S)

Omitted Deterministic Series (D)

8503/19/07

1. Causal Modeling (Known Series)


2. Memory Component (Y’s and e’s)

Unknown: Omitted Stochastic Series (S)

3. Pulse, Level Shift, Seasonal Pulse, Trend

Unknown: Omitted Deterministic Series (D)


8603/19/07

1. Causal Modeling (Known Series)


2. Memory Component (Y’s and e’s)

Unknown: Omitted Stochastic Series (S)

3. Pulse, Level Shift, Seasonal Pulse, Trend

Unknown: Omitted Deterministic Series (D)

Yt = Known Series + Unknown Stochastic + Unknown Deterministic

Yt = Known Series + Previous Values of Y’s and e’s + Dummies


8703/19/07

$$Y_t = W(B)\,X_t + n_t$$

X(t) are user-suggested cause variables, and

n(t) is a noise process or slack variable representing omitted structure.

Response Function: Accounts for the timing and form of the impact of the known user-suggested cause series

8903/19/07

$$Y_t = W(B)\,X_t + \frac{T(B)}{P(B)}\,e_t$$

Response Function: Accounts for the timing and form of the impact of the known user-suggested cause series

Error Component: Accounts for omitted stochastic cause variables and/or omitted deterministic series

9003/19/07

GTS does not incorporate structure on the error term thus “MASKING” the effect of the omitted stochastic variables by conveniently using a long-lagged model in the known series

$$Y_t = \alpha_1 Y_{t-1} + \alpha_2 Y_{t-2} + \dots + \alpha_r Y_{t-r} + \beta_0 X_t + \beta_1 X_{t-1} + \dots + \beta_s X_{t-s} + e_t$$

“Prevailing general sentiment among econometricians today is that disturbance serial correlation implies misspecification (of the known X’s) , hence the need to rethink the original specification’s characteristics and form rather than to apply an essentially mechanical correction.” C. Renfro

9103/19/07

PARSIMONY IS IN QUESTION !

$$Y_t = \alpha_1 Y_{t-1} + \alpha_2 Y_{t-2} + \dots + \alpha_r Y_{t-r} + \beta_0 X_t + \beta_1 X_{t-1} + \dots + \beta_s X_{t-s} + e_t$$


9403/19/07

$$Y_t = W(B)\,X_t + \frac{T(B)}{P(B)}\,e_t + w(B)\,D_t$$

D(t) is detected via diagnostics



Accounts for omitted deterministic series

The Objective

9503/19/07

tBB

tXY 1

1

The Objective: KNOWN USER SUGGESTED

ttXBY

Accounts for the timing and form of the impact of the known user-suggested cause series. Restated in conventional terms.

9603/19/07

$$Y_t = \frac{T(B)}{P(B)}\,e_t$$

e(t) is a zero mean white noise process.

The Objective: Memory Structure

9803/19/07

$$Y_t = w(B)\,D_t$$

D(t) is a Dummy Series

Accounts for deterministic series

The Objective: Dummy Structure

10003/19/07

What We Don’t Know(1)

Which of the specified input series have an effect and their temporal form i.e. contemporaneous, lead and/or lag of those effects. In other words what lags of the known inputs are needed to render the final model errors to be uncorrelated with all omitted lags of the known input series

10103/19/07


The effect of unusual activity in the mean of the output series due to unspecified stochastic series. In other words, what lags of either Y or the error process are sufficient to render the final model errors serially uncorrelated.

10203/19/07

The Omitted Stochastic Series S

)(

)]([

)(

tBB

tt

ttBB

tt

tBB

t

tttt

aXB

eeeBXB

eeS

eSBXB

YY

Y

10303/19/07


The effect of unusual activity in the mean of the output series due to unspecified deterministic series. In other words what transformation, if any is necessary to render the mean of the final model errors to be homogenous compensating for the effect of unspecified deterministic series.

10403/19/07

The Omitted Deterministic Series D

][ tBB

tt

tttt

aXB

eDBXB

YY

10503/19/07


What transformation, if any is necessary to render the variance of the final model errors to be homogenous.

10603/19/07


What transformation, if any is necessary to render the coefficients of the final model to be locally constant. In other words how many observations should be used as the basis for model identification and parameter estimation as parameters may have varied/changed over time. In our experience, Threshold Autoregressive Models (TAR) or STAR Models have not been found to be effective due in part to inadequate model identification preceding the TAR process.

DOING HARD TIME SERIES

HARD VERSION

Transforming Time Series (Detecting and Remedies for Structural Breaks) to render the distribution of the errors homogeneous

10903/19/07

DRUGS LIKE TRANSFORMATIONS

CAN BE GOOD AND BAD FOR YOU

11003/19/07

$$Y_t = W(B)\,X_t + \frac{T(B)}{P(B)}\,n_t$$

Assumptions:

W(B) is a set of constants

E(n) = 0

V(n) = σ²

X is a matrix of input series

Generalized Linear Model

11103/19/07

Generalized Linear Model Assuming Uncorrelated Residuals and Possible Variance and Parameter Changes

$$Y_t = W(B)\,X_t + \frac{T(B)}{P(B)}\,e_t$$

Assumptions:

1. E(e) = 0

2. V(e) = σ²(i), i = 1,2,…

3. X is a matrix of known input series possibly augmented with D’s

11203/19/07

Errors should display the same spread regardless of the value of the predicted response and for all subsets of time.

1. Zero expectation: E(ei) = 0 for all i. This is NOT automatically satisfied because we include a constant term. What is guaranteed is that the overall mean of the residuals is 0.0, not necessarily the local mean. If one includes the empirically identified D Series then this assumption holds.

2. Constant variance: V(ei) = σ²(e) for all i.

Tools for Assumption Checking

11403/19/07

SCEDASTICITY

In the OLS model, we assume that the variance of the error term is constant (homoscedasticity)

$$E(u_i^2) = \sigma^2, \quad i = 1, 2, \dots, n$$

But, if we have heteroscedasticity, then

$$E(u_i^2) = \sigma_i^2$$

11603/19/07

Generalized Linear Model Assuming Uncorrelated Residuals

$$Y = X\beta + u$$

Assumptions:

1. E(u) = 0

2. V(u) = σ²I

3. X is a matrix of known input series

11703/19/07

Generalized Linear Model Assuming Uncorrelated Residuals and Possible Variance Changes and Parameter Changes

$$Y = X\beta + u$$

There are J distinct groups, thus J sets of β due to Parameter Changes

Assumptions:

1. E(u) = 0

2. V(u) = σ²(i), i = 1,2,…,n

3. X is a matrix of known input series

11903/19/07

Assumptions

Independence: Corr(ei,ej) = 0 for all i<> j.

Mean error constant: E(ei) = 0 for all i.

Variance constant: V(ei) = σ² for all i.

Parameters constant for all i in each of the j groups.

12003/19/07

Stationarity = Constancy

12103/19/07

Statisticians are not “Wordsmiths”

Statisticians use the word “transformation” in many contexts

Y=log(z) to remedy expected value and variance dependency

Y=(1-b)z to remedy autoregressive dependency

Y=(1/2)z to remedy structural variance heterogeneity

Separating data into homogenous regimes due to model/parameter changes

12203/19/07

Transformations

1. To render the MEAN of the residuals constant AND uncorrelated with each other.

2. To render the VARIANCE of the residuals constant

3. To render the COEFFICIENTS of the model constant

12303/19/07

Mean Error Constant: E(ei) = 0 for all i.

Symptoms: Anomalies in the errors

Remedy: Pulse;

Remedy: Level shift;

Remedy: Seasonal pulse;

Remedy: Time trend;

12403/19/07

Independence: Corr(ei,ej)=0 for all i<> j.

Symptoms : ACF shows structure

Remedy: ARIMA or Lag Structure For Known X’s

12503/19/07

With the use of the Autocorrelation Function (with autocorrelations on the y axis and the different time lags on the x axis) it is possible to detect autocorrelated structure requiring remedial action.
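A minimal sketch of the sample autocorrelation function applied to two sets of hypothetical residuals, one white noise and one with leftover AR(1) structure. The AR coefficient of 0.7, the series length and the rough 2/sqrt(n) band are assumptions made for illustration.

```python
import numpy as np

def acf(x, max_lag):
    """Sample autocorrelations r_1 .. r_max_lag of a series."""
    x = np.asarray(x, dtype=float)
    d = x - x.mean()
    denom = d @ d
    return np.array([(d[k:] @ d[:-k]) / denom for k in range(1, max_lag + 1)])

rng = np.random.default_rng(7)
n = 300
white = rng.normal(size=n)

# Hypothetical residuals that still contain AR(1) structure.
ar1 = np.zeros(n)
for t in range(1, n):
    ar1[t] = 0.7 * ar1[t - 1] + white[t]

band = 2 / np.sqrt(n)  # rough +/- 2 standard error limits
for name, series in [("white noise", white), ("AR(1)", ar1)]:
    r = acf(series, 10)
    flagged = (np.flatnonzero(np.abs(r) > band) + 1).tolist()
    print(f"{name:12s} r1={r[0]:+.2f}  lags outside the band: {flagged}")
```

Lags whose autocorrelation falls outside the band are the ones that call for remedial action, either through ARIMA structure or through additional lags of the known X's.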

12603/19/07

Variance Constant: V(ei) = σ² for all i.

Symptoms: Local variances differ

Remedy: Structural breaks; Tsay

Remedy: Level dependency; Box-Cox

Remedy: Stochastic process; GARCH

12703/19/07

Parameters Constant Over All Sub-Groups

Symptoms: Local parameters differ

Remedy: Structural breaks; Chow Test
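A sketch of a Chow-type F statistic for a known candidate break point. The data are simulated and the slope values are illustrative assumptions. The break location of 176 out of 300 observations is chosen only to echo the sample split used later in this talk.

```python
import numpy as np

def sse(X, y):
    """Error sum of squares from an OLS fit."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    r = y - X @ beta
    return float(r @ r)

def chow_f(X, y, split):
    """Chow F statistic comparing one pooled fit with separate fits
    on the first `split` observations and on the remainder."""
    n, p = X.shape
    sse_pooled = sse(X, y)
    sse_split = sse(X[:split], y[:split]) + sse(X[split:], y[split:])
    return ((sse_pooled - sse_split) / p) / (sse_split / (n - 2 * p))

rng = np.random.default_rng(3)
n, split = 300, 176
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])

# Hypothetical data: the slope changes from 1.0 to 2.5 after observation 176.
slope = np.where(np.arange(n) < split, 1.0, 2.5)
y_break = 0.5 + slope * x + rng.normal(size=n)
y_stable = 0.5 + 1.0 * x + rng.normal(size=n)

f_break = chow_f(X, y_break, split)
f_stable = chow_f(X, y_stable, split)
print("Chow F, parameters change:", round(f_break, 1))
print("Chow F, parameters stable:", round(f_stable, 1))
```

The statistic is referred to an F distribution with (p, n - 2p) degrees of freedom. When the break point is not known in advance, the same calculation can be repeated over a range of candidate splits.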

12803/19/07

12903/19/07

Park Test

Glejser Test

White Test

Breusch-Pagan/Godfrey Test

Goldfeld-Quandt Test

Testing for Heteroscedasticity Without Explicit Remedial Action
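As one example from this list, a Goldfeld-Quandt style comparison can be sketched as below. The simulated data, the 20% of central observations dropped and the function name `goldfeld_quandt` are assumptions for illustration. Like the other tests listed, it only detects unequal variance and does not say what to do about it.

```python
import numpy as np

def sse(X, y):
    """Error sum of squares from an OLS fit."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    r = y - X @ beta
    return float(r @ r)

def goldfeld_quandt(X, y, drop_frac=0.2):
    """Ratio of residual variances, late subsample over early subsample.

    Observations are assumed to be already ordered (here, by time) and
    the central `drop_frac` share of the sample is left out.
    """
    n, p = X.shape
    n_side = int(n * (1 - drop_frac) / 2)
    s2_early = sse(X[:n_side], y[:n_side]) / (n_side - p)
    s2_late = sse(X[-n_side:], y[-n_side:]) / (n_side - p)
    return s2_late / s2_early, (n_side - p, n_side - p)

rng = np.random.default_rng(21)
n = 200
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])

# Hypothetical errors: sd 1 in the first half, sd 3 in the second half.
sd_break = np.where(np.arange(n) < n // 2, 1.0, 3.0)
y_hetero = 1.0 + 2.0 * x + sd_break * rng.normal(size=n)
y_homo = 1.0 + 2.0 * x + rng.normal(size=n)

gq_hetero, df = goldfeld_quandt(X, y_hetero)
gq_homo, _ = goldfeld_quandt(X, y_homo)
print("variance ratio, variance break   :", round(gq_hetero, 2), "df", df)
print("variance ratio, constant variance:", round(gq_homo, 2))
```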

13003/19/07

Outliers/Inliers

Model misspecification

Incorrect data transformation

Incorrect combining of data over time

Reasons for Heteroscedasticity

13103/19/07

When Our Assumptions Hold

13203/19/07

Eight Examples of Possible Violations

Mean of the Errors Changes: (Tiao/Box/Chang)

1. A 1 period change in Level (i.e., a Pulse)

2. A contiguous multi-period change in Level ( Intercept Change)

3. Systematically with the Season (Seasonal Pulse)

4. A change in Trend

Variance of the Errors Changes:

5. At Discrete Points in Time (Tsay Test)

6. Linked to the Expected Value (Box-Cox)

7. Can be described as an ARMA Model (GARCH)

8. Due to Parameter Changes (Chow, Tong/TAR Model)

13303/19/07

The Family of Dummy Variables

Pulse Dt = 0,0,0,0,1,0,0,0

Level Shift Dt = 0,0,0,0,1,1,1,1,1,,,,

Seasonal Pulse Dt = 0,1,0,0,0,1,0,0,0,1,,,,,

Time Trend Dt = 0,0,0,0,1,2,3,4,5,,,,,

Note that a Pulse is the difference of a Level Shift

Note that a Level Shift is the difference of a Time Trend
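The four dummy series and the two differencing relationships can be generated as follows. The function names and the 1-based convention for the start period i are assumptions of this sketch.

```python
import numpy as np

def pulse(n, i):
    """1 at period i (1-based), 0 elsewhere."""
    d = np.zeros(n, dtype=int)
    d[i - 1] = 1
    return d

def level_shift(n, i):
    """0 before period i, 1 from period i onward."""
    return (np.arange(1, n + 1) >= i).astype(int)

def seasonal_pulse(n, i, s):
    """1 at period i and every s periods thereafter, 0 elsewhere."""
    t = np.arange(1, n + 1)
    return ((t >= i) & ((t - i) % s == 0)).astype(int)

def time_trend(n, i):
    """0 before period i, then 1, 2, 3, ... from period i onward."""
    t = np.arange(1, n + 1)
    return np.where(t >= i, t - (i - 1), 0)

def first_difference(d):
    """First difference, treating the value before period 1 as 0."""
    return np.diff(d, prepend=0)

n, i = 9, 5
print("Pulse         ", pulse(n, i))
print("Level Shift   ", level_shift(n, i))
print("Seasonal Pulse", seasonal_pulse(10, 2, 4))
print("Time Trend    ", time_trend(n, i))
print("diff(Level Shift) == Pulse:",
      np.array_equal(first_difference(level_shift(n, i)), pulse(n, i)))
print("diff(Time Trend) == Level Shift:",
      np.array_equal(first_difference(time_trend(n, i)), level_shift(n, i)))
```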

13403/19/07

Example of a Pulse Intervention

Dt represents a pulse or a one-time intervention at time period 6.

Dt = 0,0,0,0,0,1,0,0,0

13503/19/07

Modeling Interventions - Level Shift

If there was a level shift and not a pulse then it is clear that a single pulse model would be inadequate thus

0,,,,,,,,,,,,,i-1,i,,,,,,,,,,,,,,,,T

Dt = ,0,0,0,1,1,1,1,1,1,,,,,,,T

or Dt = 0 t < i

Dt = 1 t > i-1

13603/19/07

Traditional Level Shift

13703/19/07

Another Kind Of Level Shift

13803/19/07

Modeling Interventions - Seasonal Pulses

There are other kinds of pulses that might need to be considered otherwise our model may be insufficient.

For example, December sales are high.

D D D

Zt = 0 t <> 12,24,36,48,60

Zt = 1 t = 12,24,36,48,60

13903/19/07

Modeling Interventions – Local Time Trend

The fourth and final form of a deterministic variable is the local time trend. For example,

t = 1, ..., i-1, i, ..., T

Dt = 0 for t < i; Dt = (t-(i-1)) * 1 for t >= i

Dt = 0,0,0,0,0,0,1,2,3,4,5,,,,,

14003/19/07

In Far Away Places !

Some researchers are still using a variable called the COUNTING VARIABLE which assumes that there is one trend and that it has a common effect over all time. This is anachronistic *.

Dt = 1,2,3,4,5,,,,,T

14103/19/07

The Trend Poem

attributed to Sir Alec Cairncross

A Trend is a Trend is a Trend

But the question is

Will it bend?

Will it alter its course

through some unforeseen force

And come

to a premature end?

14203/19/07

When Our Assumptions Hold

14303/19/07

Can You Visually Detect The Violation and Suggest The Remedy

14403/19/07

When Our Assumptions Fail:

14503/19/07

When Our Assumptions Fail: Pulse Interventions Affect the Mean of the Errors

14603/19/07


14703/19/07

When Our Assumptions Fail: Level Shift Interventions Affect the Mean of the Errors

14803/19/07

A Level Shift In A Trended Series

14903/19/07

Random vs. Level Shift Interventions

15003/19/07

Level Shift

15103/19/07

CASE 3

15203/19/07


15303/19/07

When Our Assumptions Fail: Seasonal Pulse Interventions Affect the Mean of the Errors

15403/19/07

Random vs. Seasonal Pulse

15703/19/07

Analytics Push-Out A Possible Structural Break

15803/19/07

CASE 4

15903/19/07

Random vs. Time Trended Residuals

16103/19/07

When Our Assumptions Fail: Original Series Exhibits Two Trends

16203/19/07

User Uses One Trend

16303/19/07

16403/19/07

One Trend

16503/19/07

When Our Assumptions Fail: Trending Residuals : First 100 of 300

16603/19/07

Residuals From the Two-Trended Model

16703/19/07

16803/19/07

16903/19/07

Daily German Telecom Revenue

17003/19/07

Two Trend Model with Daily Series, Holiday Series

17103/19/07

The Residuals

17303/19/07

RESIDUALS FROM AN INADEQUATE MODEL

17403/19/07

The VARIANCE of the errors may CHANGE over time

At Discrete Points in Time

Based Upon Level of the Series

Based Upon A Stochastic Model

Based Upon a Change in Model Parameters

17503/19/07

CASE 5

17603/19/07

When Our Assumptions Fail

17703/19/07

When Our Assumptions Fail: Break-Point: Suggesting Change in Variance

17803/19/07

WEIGHTED LEAST SQUARES

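$$P = \begin{bmatrix} 1/\sigma_1 & 0 & \cdots & 0 \\ 0 & 1/\sigma_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 1/\sigma_n \end{bmatrix}$$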

17903/19/07

WEIGHTED LEAST SQUARES

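$$\frac{y_i}{\sigma_i} = \beta_1 \frac{1}{\sigma_i} + \beta_2 \frac{x_{2i}}{\sigma_i} + \dots + \beta_K \frac{x_{Ki}}{\sigma_i} + \frac{u_i}{\sigma_i}$$

$$y_i^* = \beta_1 x_{1i}^* + \beta_2 x_{2i}^* + \dots + \beta_K x_{Ki}^* + u_i^*$$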
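Weighted least squares divides every term of observation i by its error standard deviation and then applies ordinary least squares to the transformed data. A minimal sketch follows, assuming the standard deviations are known. In practice they would have to be estimated, for example from identified variance break points. The data and the break in the middle of the sample are made up.

```python
import numpy as np

rng = np.random.default_rng(11)
n = 200
x = rng.uniform(1.0, 10.0, size=n)

# Hypothetical variance break: error sd is 1 in the first half, 5 in the second.
sigma = np.where(np.arange(n) < n // 2, 1.0, 5.0)
y = 2.0 + 3.0 * x + sigma * rng.normal(size=n)

X = np.column_stack([np.ones(n), x])

# Divide the constant, the regressor and the response by sigma_i.
X_star = X / sigma[:, None]
y_star = y / sigma

beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
beta_wls, *_ = np.linalg.lstsq(X_star, y_star, rcond=None)

# Residuals of the transformed model should have roughly constant spread.
u_star = y_star - X_star @ beta_wls
sd_first = u_star[: n // 2].std(ddof=1)
sd_second = u_star[n // 2:].std(ddof=1)
print("OLS coefficients:", beta_ols.round(2))
print("WLS coefficients:", beta_wls.round(2))
print("sd of transformed residuals by half:", round(sd_first, 2), round(sd_second, 2))
```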

18003/19/07

Tsay Studied the Daily IBM Series

18103/19/07

A Reasonable ARIMA Model Incorporating Pulses

18203/19/07

A Reasonable Residual ACF

18303/19/07

Identification of Variance Break Points

18403/19/07

The Weights Needed to Stabilize The Variance

18503/19/07

18603/19/07

18703/19/07

Random vs. Break-Point Change in Variance

18803/19/07

CASE 6

18903/19/07

When Our Assumptions Fail: Level Dependent: Suggesting Systematic Change in Variance

19003/19/07

Remedy Via Box-Cox Suggesting Logarithms

19103/19/07

Upwards Trending Actuals (Y=3+2*i)

19203/19/07

Code To Create A Linear Dependency Between the Variance of the Errors and the Level of The Series
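A minimal sketch of such code, assuming the level is the trending series Y = 3 + 2*i and that the error variance is proportional to that level. The proportionality constant of 0.5, the sample size and the seed are arbitrary choices made here.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 300
i = np.arange(1, n + 1)

level = 3 + 2 * i  # the upward trending actuals, Y = 3 + 2*i

# Assumption: the error VARIANCE is 0.5 times the level, so the
# standard deviation is the square root of that.
errors = np.sqrt(0.5 * level) * rng.normal(size=n)
y = level + errors

# Compare the error variance in the first and last thirds of the sample.
var_low = errors[:100].var(ddof=1)
var_high = errors[-100:].var(ddof=1)
print("mean level, first and last 100 points:", level[:100].mean(), level[-100:].mean())
print("error variance, first and last 100   :", round(var_low, 1), round(var_high, 1))
```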

19303/19/07

Random vs. Level Dependent Variance

19403/19/07

CASE 7

19503/19/07

Stochastic Variance: Suggesting Systematic Change in Variance Caused By A Random Walk Model in (Errors)**2

19603/19/07

Code To Create A Set of Errors Whose Squares Follow A Random Walk Model
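A minimal sketch of such code. The starting value, the step size of the random walk and the reflection at zero (needed because a square cannot be negative) are assumptions of this sketch. A random sign is attached so that the errors themselves are centred on zero.

```python
import numpy as np

rng = np.random.default_rng(9)
n = 300

# Random walk for the squared errors, reflected at zero so it stays non-negative.
q = np.empty(n)
q[0] = 1.0
for t in range(1, n):
    q[t] = abs(q[t - 1] + rng.normal(scale=0.3))

# Attach a random sign to the square root of each value.
signs = rng.choice([-1.0, 1.0], size=n)
errors = signs * np.sqrt(q)

# Lag-1 autocorrelation of the squared errors, as a check on persistence.
sq = errors ** 2
d = sq - sq.mean()
r1 = (d[1:] @ d[:-1]) / (d @ d)
print("lag-1 autocorrelation of squared errors:", round(r1, 2))
```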

19703/19/07

Random vs. Stochastic Variance

19803/19/07

CASE 8

19903/19/07

Auto-Correlative Structure Changes Over Time Suggesting Parameter Changes Over

Time (ACTUALS)

20003/19/07

Random vs. Non-Constant Parameter Case

20103/19/07

When Our Assumptions Fail: Auto-correlative Structure Changes

Over Time Suggesting Parameter Changes Over Time

20203/19/07

Based On All 300 Observations

20303/19/07

Based On All 300 Observations

20403/19/07

Local Estimation Suggests Transient Parameters 1-176 Versus 177-300 Provides Maximum Contrast

20503/19/07

Local Estimation Suggests Transient Parameters

20603/19/07

Before and After

20703/19/07

Final Model

20803/19/07

Correlation of Residuals

20903/19/07

Residuals From Final Model

21003/19/07

STORIES TO TELL !

1. THE AIRLINE SERIES

2. THE YAFFEE SERIES

3. SHOPPERS AND THE UNUSUAL

4. USING TOO MANY DATA POINTS

21103/19/07

THE INFAMOUS AIRLINE SERIES

21203/19/07

The Airline Series received a lot of attention initially by R.G. Brown and then by Box and Jenkins. It was modeled using a logarithmic transform as conventional wisdom suggested increasing variability with increasing level.

21303/19/07

WHICH COMES FIRST THE CHICKEN OR THE EGG ?

21403/19/07

WHICH COMES FIRST THE MODEL OR THE TRANSFORM ?

21503/19/07

21603/19/07

21703/19/07

21803/19/07

Visual “proof” of the need to deal with non-stationary variance

21903/19/07

22003/19/07

22103/19/07

Adding a seasonal differencing

22203/19/07


22303/19/07

LOCAL VARIANCE & LOCAL MEAN

22403/19/07

NORMALIZED SCATTER PLOT

22503/19/07

LOCAL VARIANCE AS A FUNCTION OF LOCAL MEAN

22603/19/07


Untreated, one could incorrectly conclude that the variance of the errors was linked with higher levels of Y. This spurious conclusion was reached by the Box-Cox Test, which responded to higher variance of the residuals at the high end of Y but not elsewhere.

22703/19/07

22803/19/07

22903/19/07

No Evidence of Non-Constant Variance

23003/19/07

Implementing a Test For Parameter Changes at Point 92

23303/19/07

THE YAFFEE EXAMPLE

23403/19/07

An Example of STG

23503/19/07

The Output Series

23603/19/07

The Input Series

23703/19/07

Output

Input

23803/19/07

A Simple OLS Model

23903/19/07

Residuals From OLS

24003/19/07

Augmentations

24103/19/07

Residuals From Augmented Model

24203/19/07

Actual and Forecasts From Augmented Model

24303/19/07

An Example: Shoppers Are Creatures of Habit

24403/19/07

SHOPPERS

24503/19/07

Where Are The Anomalies ?

24703/19/07

24803/19/07

The Anomalies are Discovered Which Leads To The Question of Why ?

24903/19/07

Final Model Optimal Strategy: Fix Regression First then fix ARIMA then fix Dummy Structure

25003/19/07

Final Model

WEEK EFFECTS

25103/19/07

Final Model

PECULIAR DAYS

PULSES

25203/19/07

The Residuals Appear To Be Free Of Structure

25303/19/07

Which Was The Objective All Along !

25403/19/07

Statisticians Are Noise Makers !

25503/19/07

Forecasts With Confidence Limits

25603/19/07

Using Too Many Data Points

In a 1973 JRSS paper Chatfield and Prothero reported on a landmark case study (n=83) which raised serious questions about the idea that data transformations were a panacea.

Researchers at that time were strongly suggesting very powerful and potentially dangerous power transformations to render the error process with constant variance.

Their model/data clearly had a violation (symptom) of the constancy of variance assumption but the suggested cure (cause) of taking cube roots or logarithms was correctly deemed inadequate by the authors.

We have taken this data set and have found that perhaps a more plausible explanation is that the parameters had changed over time. Thus the symptom had more than one assignable cause.

25703/19/07

25803/19/07

A Reasonable Model

25903/19/07

Searching For Optimal Breakpoint

26003/19/07

Formal F test Regarding Homogeneity of Parameters

26103/19/07

Final Model Based Upon the Last 33 Observations

26203/19/07

26303/19/07

Residuals From Final Model

26403/19/07

Actual and Forecasts

26503/19/07

Actuals, Fitted Values and Forecasts

26603/19/07

Fit and Forecast

26703/19/07
