A Bottoms-Up Approach to
Time Series Analysis
Prepared for: 27th International Symposium on Forecasting
June 24, 2007
New York City, N.Y.
David P. Reilly
Automatic Forecasting Systems Inc.
203/19/07
Automatic Forecasting Systems, Inc. (AFS)
Phone: 215-675-0652 email: [email protected] Site: www.autobox.com
Forecasting History is Always Easier Than Forecasting The Future
There are many good pieces of software on the market, and much of what is demonstrated here can be accomplished with your existing software. The objective is to provide a transparent methodology that you can use in your research and, possibly, to upgrade your approach.
Since we have Autobox on hand it is natural to use it in our data examples. We will be in the Exhibitor’s Area if you have any questions or want to see a demo.
Statistical packages have enormous influence over analysis, especially over that of the less sophisticated user. There is a tendency for the user to do what is readily available in their software.
In preparing material for this presentation, I reviewed a number of web sites and found that university professors were similarly restricted to the software/methodology that their university provided.
Forecasting
Forecasting is difficult, especially about the future. Victor Borge
Using Good Methods Forecasting Becomes Easier For Example: Good Forecast #1!
Epidemiological Forecasting: Comparing the Forecast Accuracies of Different Forecasting Methods on a "Difficult" Time Series
by
Robert A. Yaffee, Ph.D. New York University, New York, N.Y.
Kostas Nikolopoulos, Ph.D. Manchester Business School, Manchester, U.K.
David P. Reilly, Automatic Forecasting Systems, Hatboro, Pa.
Sven F. Crone, Lancaster University, Lancaster, U.K.
Rick J. Douglass, Ph.D. Montana Technical University, Butte, Mt.
Kent D. Wagoner, Ph.D. Ithaca College, Ithaca, NY.
Brian R. Amman, Ph.D. CDC Special Pathogens Branch, Atlanta, Ga.
Tom Ksiazek, CDC Special Pathogens Branch, Atlanta, Ga.
James N. Mills, Ph.D. CDC Special Pathogens Branch, Atlanta, Ga.
2007 International Symposium on Forecasting
New York, New York
June 26, 2007 Tuesday 3:30pm Hudson
The Endogenous Series
Abundance of Peromyscus maniculatus (deer mouse) in the Montana Cascade
[Figure: MNA total (y-axis, 50 to 300) plotted against Date (x-axis, 1995 to 2005)]
Good Forecast #2 ! Banking Application-Duffy
Prof. Frost is Also a Faculty Member at Southern Methodist University
Analytics Push-Out The Onset of a Seasonal Pulse
Before You Dismiss This talk as Boring Banking Stuff !
Mark is Interested in Species Other Than Mice
He Has Collected and Makes Readily Available the Historical Weight of Playboy
Bunnies at http://www.mark-frost.com
As a Professor, Mark knows that Interesting Data Sets Can Motivate
Attentiveness !
A Transfer Function
Causal Model

$$\nabla_t \nabla_s\, y_t = \frac{\omega(L)}{\delta(L)}\, X_{t-b} + I_t + \frac{\theta(L)\,\Theta(L)}{\phi(L)\,\Phi(L)}\, e_t$$

where
$y_t$ = dependent series
$\omega(L)/\delta(L)$ = lagged or led polynomial of $X_t$
$\theta(L)$ = nonseasonal moving average polynomial
$\Theta(L)$ = seasonal moving average polynomial
$\nabla_t$ = first difference
$\nabla_s$ = seasonal difference
$\phi(L)$ = autoregressive polynomial
$\Phi(L)$ = seasonal autoregressive polynomial
$X_{t-b}$ = time varying parameters, prewhitened and differenced if nec.
$I_t$ = computer based automatic intervention detection and modeling (outliers, seasonal pulses, local trends, level shifts, etc.)
$e_t$ = disturbance
AFS Philosophy
An unexamined life is not worth living.
Socrates
An unexamined model is not worth using.
Dave R
The Standard Deviation is Ill-Suited To Detect Unusual Behavior
[Figure: Actual series with Upper 2s and Lower 2s limits and a Local Time-trend; y-axis from -5 to 30]
Cannot assume independence of the observations
Outliers impact standard deviation and mean
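The second point can be illustrated with a minimal sketch. The data below are hypothetical (not from the talk): a stable series near 10 with one moderate anomaly and one gross outlier. The gross outlier inflates the sample standard deviation so much that the moderate anomaly slips inside the 2s limits.

```python
import statistics

# Hypothetical data (not from the talk): a stable series near 10,
# one moderate anomaly (12.0) and one gross outlier (30.0).
clean = [10.1, 9.8, 10.3, 9.9, 10.0, 10.2, 9.7, 10.1, 9.9, 10.0]
series = clean + [12.0, 30.0]

def two_sigma_flags(values):
    """Flag points farther than 2 sample standard deviations from the sample mean."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [abs(v - mean) > 2 * sd for v in values]

sd_clean = statistics.stdev(clean)   # spread of the well-behaved points
sd_all = statistics.stdev(series)    # spread inflated by the unusual points
flags = two_sigma_flags(series)

print(f"sd without unusual points: {sd_clean:.3f}, with them: {sd_all:.3f}")
print("flagged values:", [v for v, f in zip(series, flags) if f])
```

Relative to the clean points, 12.0 is many standard deviations from the mean, yet the contaminated 2s rule does not flag it.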
An Observation at the Mean can be Unusual (Inlier)
Errors of Nature, Sports and Monsters
The problem is that you can't catch an outlier without a model (at least a mild one) for your data. Else how would you know that a point violated that model? In fact, the process of growing understanding and finding and examining outliers must be iterative. This isn't a new thought. Bacon, writing in Novum Organum about 400 years ago said: "Errors of Nature, Sports and Monsters correct the understanding in regard to ordinary things, and reveal general forms. For whoever knows the ways of Nature will more easily notice her deviations; and, on the other hand, whoever knows her deviations will more accurately describe her ways."
Independent Samples
A More Common Data Set
Serious Disconnect between the Teaching and Practice of Statistics
99.9% of all academic presentations of statistical tools REQUIRE independent observations.
In time series data, this is clearly not the case.
A source of spurious correlation is a common cause acting on the variables. Granger & Newbold (Journal of Econometrics, 1974) pointed out that the misleading inference comes about through applying the regression theory for stationary series to non-stationary series.
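A minimal simulation sketch of this point, under assumptions of my own (200 pairs of independent Gaussian random walks of length 100; the function names are hypothetical): the levels of two unrelated non-stationary series often look strongly correlated, while their first differences do not.

```python
import random

def random_walk(n, rng):
    """Cumulative sum of independent N(0, 1) shocks."""
    level, path = 0.0, []
    for _ in range(n):
        level += rng.gauss(0.0, 1.0)
        path.append(level)
    return path

def corr(a, b):
    """Sample Pearson correlation."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / (saa * sbb) ** 0.5

def diff(a):
    """First differences a[t] - a[t-1]."""
    return [a[i] - a[i - 1] for i in range(1, len(a))]

rng = random.Random(0)
levels, changes = [], []
for _ in range(200):
    x, y = random_walk(100, rng), random_walk(100, rng)  # independent by construction
    levels.append(abs(corr(x, y)))
    changes.append(abs(corr(diff(x), diff(y))))

mean_abs_levels = sum(levels) / len(levels)
mean_abs_changes = sum(changes) / len(changes)
print(f"mean |r| in levels: {mean_abs_levels:.2f}, in differences: {mean_abs_changes:.2f}")
```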
Tendency To Over-Believe One's Own Eyes
Visually, we see trend and seasonality
AND WE EXPECT OUR ANALYTICS TO SUPPORT THIS !
Bad Analytics Can Support the Bad Eye !
Select A Model From A List
Selecting A Model From A List
An Assumed Model
GOOD ANALYTICS SEE A LITTLE BIT BETTER (Sometimes Better Than One's Own Eyes)
A Level Shift Does Not A Trend Make
Daily Sales of Bud 6 Pack In a Store in Texas
Two Approaches
GTS General To Specific THEORY BASED
start with most general model possibly based upon theory or more frequently based upon the Long Lag Strategy and step-down
STG Specific To General
start with an initial theory-based model or allow the data to suggest an initial model and then step-up and step-down
GTS General To Specific
Top-Down or Stepdown Elimination:
First fit the model with all possible predictors and all possible lags.
Then sequentially eliminate those predictors that are least significant in a last-in (partial) test. This necessity test is conducted via t and F tests, without verifying that these tests are valid, i.e. are the errors Gaussian? A Major Flaw!
Let SSE1 be the error sum of squares for the complete model ($Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_{12} x_1 x_2$).
Let SSE2 be the error sum of squares for the reduced model ($Y = \beta_0 + \beta_1 x_1$).
Since Model 1 includes more terms than Model 2, Model 1 fits better, or no worse, than Model 2.
Hence we must have SSE1 ≤ SSE2.
The difference, SSE2 - SSE1, is a measure of the drop in the error sum of squares attributable to the variables removed from the complete model.
Define the mean square drop as: MSdrop = (SSE2 - SSE1) / (k - g), where k is the number of terms in the complete model (Model 1) and g (< k) is the number of terms in the reduced model (Model 2).
The mean square error for the complete model is: MSE1 = SSE1 / (n-k-1)
To test the hypothesis that the terms left out of the complete model do not contribute significantly to explaining the variability in y, we use the following F statistic.
F = MSdrop/MSE1
Reject Ho: Left out parameters = 0 if F > F(k-g, n-k-1, α)
F-Statistic for Step-Down Models
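The step-down F statistic can be sketched in a few lines. This is an illustration on simulated data, not the author's implementation: the helper `ols_sse`, the sample size and the coefficients are assumptions. The complete model has k = 3 terms (x1, x2, x1x2) and the reduced model has g = 1 term (x1).

```python
import random

def ols_sse(columns, y):
    """SSE from least squares of y on an intercept plus the given columns."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    p = len(X[0])
    # Augmented normal equations [X'X | X'y].
    A = [[sum(X[r][i] * X[r][j] for r in range(n)) for j in range(p)]
         + [sum(X[r][i] * y[r] for r in range(n))] for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(p):
        pivot = max(range(c, p), key=lambda r: abs(A[r][c]))
        A[c], A[pivot] = A[pivot], A[c]
        for r in range(p):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    beta = [A[i][p] / A[i][i] for i in range(p)]
    return sum((y[r] - sum(b * x for b, x in zip(beta, X[r]))) ** 2 for r in range(n))

# Simulated data (assumption): only x1 truly matters.
rng = random.Random(42)
n = 60
x1 = [rng.gauss(0, 1) for _ in range(n)]
x2 = [rng.gauss(0, 1) for _ in range(n)]
x12 = [a * b for a, b in zip(x1, x2)]
y = [1.0 + 2.0 * a + rng.gauss(0, 1) for a in x1]

k, g = 3, 1
sse1 = ols_sse([x1, x2, x12], y)   # complete model
sse2 = ols_sse([x1], y)            # reduced model
ms_drop = (sse2 - sse1) / (k - g)
mse1 = sse1 / (n - k - 1)
f_stat = ms_drop / mse1
print(f"SSE1={sse1:.2f} SSE2={sse2:.2f} F={f_stat:.3f} on ({k - g}, {n - k - 1}) df")
```

The computed F would then be compared with a tabulated critical value F(k-g, n-k-1, α). The sketch does nothing to check that the errors have zero mean everywhere, which is the validity condition the test depends on.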
This F statistic (F = MSdrop/MSE1) REQUIRES THAT THE MEAN OF THE ERRORS FROM THE COMPLETE MODEL DOES NOT DIFFER FROM ZERO … EVERYWHERE
GTS General To Specific
$$Y_t = \alpha_1 Y_{t-1} + \alpha_2 Y_{t-2} + \cdots + \alpha_r Y_{t-r} + \beta_0 X_t + \beta_1 X_{t-1} + \cdots + \beta_s X_{t-s} + e_t$$
This approach requires, among other things, that the mean of the errors (e) is 0.0 everywhere; otherwise the F tests for simplification are not correct, as the e’s are not distributed as a central chi-square variable. This violation leads to a downward bias in the F test, as the MSE is larger than it should be, leading to a false acceptance of the null hypothesis, or what Prof. Ord of Georgetown University refers to as “The Alice in Wonderland Test”, asserting that all is well!
Tests for Common Factors are conducted in order to simplify the model form and reduce the number of required parameters.
STG Specific To General
A Starting Model is used which can be based upon theory or one can simply use the statistical characteristics of the data to suggest “an initial model” as the base point. Residuals from the starting model are used to suggest step-forward augmentation direction culminating in Necessity and Sufficiency Checks.
STG Specific To General
• Begin with a Base Model and add structure evident in the noise, thus transferring it to the signal (sufficiency test). Each time new structure is added to the model, check all the other coefficients already in the model with a last-in test to determine whether they should remain in the model.
• Drop any predictor that cannot pass the last-in test (necessity test).
If After Fitting a Model
Y(t) = [W(B)] X(t) + A(t)
the ACF of the error process A(t) exhibits structure, there are a number of possible remedies:
1. Fix the Lag Structure [W(B)], where W(B) = input lag structure reflecting the static relationship of Y to X
2. Fix the ARIMA structure [T(B)/P(B)] A(t)
3. Identify and Include Deterministic Series, e.g. Pulses, Level Shifts, Seasonal Pulses and/or Local Trends
4. Transform the data in order to achieve constant variance
5. Partition the data in order to locally optimize model form and parameters
Regression
1. Lag Structure [W(B)]
ARIMA
2. ARIMA structure [T(B)/P(B)] A(t ) to proxy the effect of Unspecified Stochastic Series
Dummy Structure
3. Identification of Interventions to proxy the effect of Unspecified Deterministic Series
…….
4. Transforming the data in order to achieve constant variance of the residuals requires a model to generate these residuals
5. Partition the data in order to locally optimize model form and parameters requires a model to generate these residuals
6 Permutations To Deal With Model Form
1. Fix Regression First then fix ARIMA then fix Dummy Structure
2. Fix Regression First then fix Dummy Structure then fix ARIMA
3. Fix ARIMA First then fix Regression then fix Dummy Structure
4. Fix ARIMA First then fix Dummy Structure then fix Regression
5. Fix Dummy Structure then fix Regression then fix ARIMA
6. Fix Dummy Structure then fix ARIMA then fix Regression
BUD 6 PACK FINAL MODEL Optimal Strategy: Fix Regression First then fix ARIMA then fix
Dummy Structure
HOLIDAY EFFECTS
Final Model
WEEK EFFECTS
Final Model
PECULIAR DAYS
PULSES
You Can Relax Help Is On The Way
( AN AFS SENIOR DEVELOPER AFTER A TOUGH DAY’S PROGRAMMING !)
Hierarchical Structure
•Qualitative
Judgmental
Analogical
•Quantitative
• Causal Models
Smoothing or Memory Models
Trend Decomposition
$$Y_t = 3X_t + 2X_{t-1}$$
Accounts for the timing and form of the impact of the known user-suggested cause series
An Example of Causal Modeling
$$Y_t = [1/3]\,Y_{t-1} + [1/3]\,Y_{t-2} + [1/3]\,Y_{t-3}$$
Accounts for omitted stochastic cause series
An Example of Memory Modeling
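A minimal sketch of the memory idea, assuming an equally weighted three-period memory; the function name and the sample history are hypothetical. The next value is predicted only from the series' own recent past.

```python
def memory_forecast(history, weights=(1 / 3, 1 / 3, 1 / 3)):
    """One-step forecast: weights applied to Y[t-1], Y[t-2], Y[t-3]."""
    recent = history[-1:-len(weights) - 1:-1]   # most recent value first
    return sum(w * y for w, y in zip(weights, recent))

history = [3.0, 6.0, 9.0, 12.0]       # hypothetical observations
forecast = memory_forecast(history)   # average of 12, 9 and 6
print(forecast)
```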
$$Y_t = [2]\,L_t^{10} + [5]\,P_t^{23} + [.5]\,T_t^{16} + [2.5]\,SP_t^{36}$$
[Equation: "more generally" form in Y and T with coefficients [2] and [.5]]
Accounts for Dummy Deterministic Series
An Example of Dummy Modeling
Causal Models
Smoothing or Memory Models
Trend Decomposition
Yt = Causal + Memory + Dummy
Quantitative: Time Series Analysis
The Objective
$$Y_t = W(B)\,X_t + \frac{T(B)}{P(B)}\,e_t + \omega(B)\,D_t$$
where $X_t$ is a user-suggested cause variable, $e_t$ is a zero mean white noise process, and $D_t$ is detected via diagnostics.
$W(B)\,X_t$: Accounts for the timing and form of the impact of the known user-suggested cause series
$[T(B)/P(B)]\,e_t$: Accounts for omitted stochastic cause series
$\omega(B)\,D_t$: Accounts for omitted deterministic series
The Angels that you know and the Devils that you don’t know
Known: User Suggested Dependent Series (Y)
User Suggested Support Series (X)
Unknown: Lag Structure for (X)
Omitted Stochastic Series (S)
Omitted Deterministic Series (D)
1. Causal Modeling (Known Series)
User Suggested Support Series (X)
2. Memory Component (Y’s and e’s)
Unknown: Omitted Stochastic Series (S)
3. Pulse, Level Shift, Seasonal Pulse, Trend
Unknown: Omitted Deterministic Series (D)
STG Specific To General
Yt = Known Series + Unknown Stochastic + Unknown Deterministic
Yt = Known Series + Previous Values of Y’s and e’s + Dummies
$$Y_t = W(B)\,X_t + n_t$$
where $X_t$ are user-suggested cause variables, and $n_t$ is a noise process or slack variable representing omitted structure.
Response Function: Accounts for the timing and form of the impact of the known user-suggested cause series
$$Y_t = W(B)\,X_t + \frac{T(B)}{P(B)}\,e_t$$
where $X_t$ is a user-suggested cause variable, and $e_t$ is a zero mean white noise process.
Response Function: Accounts for the timing and form of the impact of the known user-suggested cause series
Error Component: Accounts for omitted stochastic cause variables and/or omitted deterministic series
GTS does not incorporate structure on the error term thus “MASKING” the effect of the omitted stochastic variables by conveniently using a long-lagged model in the known series
$$Y_t = \alpha_1 Y_{t-1} + \alpha_2 Y_{t-2} + \cdots + \alpha_r Y_{t-r} + \beta_0 X_t + \beta_1 X_{t-1} + \cdots + \beta_s X_{t-s} + e_t$$
“Prevailing general sentiment among econometricians today is that disturbance serial correlation implies misspecification (of the known X’s), hence the need to rethink the original specification’s characteristics and form rather than to apply an essentially mechanical correction.” C. Renfro
PARSIMONY IS IN QUESTION !
The Objective
$$Y_t = W(B)\,X_t + \frac{T(B)}{P(B)}\,e_t + \omega(B)\,D_t$$
where $X_t$ is a user-suggested cause variable, $e_t$ is a zero mean white noise process, and $D_t$ is detected via diagnostics.
$W(B)\,X_t$: Accounts for the timing and form of the impact of the known user-suggested cause series
$[T(B)/P(B)]\,e_t$: Accounts for omitted stochastic cause series
$\omega(B)\,D_t$: Accounts for omitted deterministic series
The Objective: KNOWN USER SUGGESTED
$$Y_t = W(B)\,X_t$$
Accounts for the timing and form of the impact of the known user-suggested cause series. Restated in conventional terms:
$$Y_t = \frac{\omega(B)}{\delta(B)}\,X_t$$
The Objective: Memory Structure
$$Y_t = \frac{T(B)}{P(B)}\,e_t$$
where $e_t$ is a zero mean white noise process.
Accounts for omitted stochastic cause series
The Objective: Dummy Structure
$$Y_t = \omega(B)\,D_t$$
where $D_t$ is a Dummy Series.
Accounts for deterministic series
What We Don’t Know (1)
Which of the specified input series have an effect, and the temporal form of those effects, i.e. contemporaneous, lead and/or lag. In other words, what lags of the known inputs are needed to render the final model errors uncorrelated with all omitted lags of the known input series.
What We Don’t Know (2)
The effect of unusual activity in the mean of the output series due to unspecified stochastic series. In other words, what lags of either Y or the error process are sufficient to render the final model errors uncorrelated with themselves.
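The Omitted Stochastic Series S
$$Y_t = W(B)\,X_t + S_t + e_t$$
$$S_t = \frac{T(B)}{P(B)}\,e_t - e_t$$
$$Y_t = W(B)\,X_t + \left[\frac{T(B)}{P(B)}\,e_t - e_t\right] + e_t$$
$$Y_t = W(B)\,X_t + \frac{T(B)}{P(B)}\,a_t$$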
What We Don’t Know (3)
The effect of unusual activity in the mean of the output series due to unspecified deterministic series. In other words, what transformation, if any, is necessary to render the mean of the final model errors homogeneous, compensating for the effect of unspecified deterministic series.
The Omitted Deterministic Series D
$$Y_t = W(B)\,X_t + \omega(B)\,D_t + e_t$$
$$Y_t = W(B)\,X_t + \left[\frac{T(B)}{P(B)}\right] a_t$$
What We Don’t Know (4)
What transformation, if any, is necessary to render the variance of the final model errors homogeneous.
What We Don’t Know (5)
What transformation, if any, is necessary to render the coefficients of the final model locally constant. In other words, how many observations should be used as the basis for model identification and parameter estimation, as parameters may have varied/changed over time. In our experience, Threshold Autoregressive Models (TAR) or STAR Models have not been found to be effective, due in part to inadequate model identification preceding the TAR process.
DOING HARD TIME SERIES: HARD VERSION
Transforming Time Series (Detecting and Remedying Structural Breaks) to render the distribution of the errors homogeneous
DRUGS LIKE TRANSFORMATIONS
CAN BE GOOD AND BAD FOR YOU
Generalized Linear Model
$$Y_t = W(B)\,X_t + n_t$$
Assumptions:
W(B) is a set of constants
E(n) = 0
V(n) = $\sigma^2$
X is a matrix of input series
Generalized Linear Model Assuming Uncorrelated Residuals and Possible Variance and Parameter Changes
$$Y_t = W(B)\,X_t + \frac{T(B)}{P(B)}\,e_t$$
Assumptions:
1. E(e) = 0
2. V(e) = $\sigma_i^2$, i = 1, 2, …
3. X is a matrix of known input series possibly augmented with D’s
Tools for Assumption Checking
Errors should display the same spread regardless of the value of the predicted response and for all subsets of time.
1. Zero expectation: E(ei) = 0 for all i.
2. Constant variance: V(ei) = $\sigma_e^2$ for all i.
Assumption 1 is NOT automatically satisfied because we include a constant term. What is guaranteed is that the overall mean of the residuals is 0.0, not necessarily the local mean. If one includes the empirically identified D series, then this assumption holds.
SCEDASTICITY
In the OLS model, we assume that the variance of the error term is constant (homoscedasticity):
$$E(u_i^2) = \sigma^2, \quad i = 1, 2, \ldots, n$$
But, if we have heteroscedasticity, then
$$E(u_i^2) = \sigma_i^2$$
Generalized Linear Model Assuming Uncorrelated Residuals
$$Y = X\beta + u$$
Assumptions:
1. E(u) = 0
2. V(u) = $\sigma^2 I$
3. X is a matrix of known input series
Generalized Linear Model Assuming Uncorrelated Residuals and Possible Variance Changes and Parameter Changes
$$Y = X\beta + u$$
There are J distinct groups, thus J sets of $\beta$ due to Parameter Changes.
Assumptions:
1. E(u) = 0
2. V(u) = $\sigma_i^2$, i = 1, 2, …, n
3. X is a matrix of known input series
Assumptions
Independence: Corr(ei,ej) = 0 for all i <> j.
Mean error constant: E(ei) = 0 for all i.
Variance constant: V(ei) = $\sigma^2$ for all i.
Parameters constant for all i in each of the j groups.
Stationarity = Constancy
Statisticians are not “Wordsmiths”
Statisticians use the word “transformation” in many contexts:
Y=log(z) to remedy expected value and variance dependency
Y=(1-B)z to remedy autoregressive dependency
Y=(1/2)z to remedy structural variance heterogeneity
Separating data into homogeneous regimes due to model/parameter changes
Transformations
1. To render the MEAN of the residuals constant AND uncorrelated with each other.
2. To render the VARIANCE of the residuals constant
3. To render the COEFFICIENTS of the model constant
Mean Error Constant: E(ei) = 0 for all i.
Symptoms: Anomalies in the errors
Remedy: Pulse;
Remedy: Level shift;
Remedy: Seasonal pulse;
Remedy: Time trend;
Independence: Corr(ei,ej)=0 for all i<> j.
Symptoms : ACF shows structure
Remedy: ARIMA or Lag Structure For Known X’s
With the use of the Autocorrelation Function (with autocorrelations on the y axis and the different time lags on the x axis) it is possible to detect autocorrelated structure requiring remedial action.
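A minimal sketch of that check, assuming simulated residuals that follow an AR(1) process with coefficient 0.8 (my choice for illustration) and the usual approximate ±2/√n band for judging individual autocorrelations:

```python
import random

def acf(series, max_lag):
    """Sample autocorrelations for lags 1..max_lag."""
    n = len(series)
    mean = sum(series) / n
    dev = [v - mean for v in series]
    c0 = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t - k] for t in range(k, n)) / c0
            for k in range(1, max_lag + 1)]

# Simulated residuals with leftover AR(1) structure (assumption for illustration).
rng = random.Random(7)
n = 300
residuals = [0.0]
for _ in range(n - 1):
    residuals.append(0.8 * residuals[-1] + rng.gauss(0, 1))

bound = 2 / n ** 0.5                 # approximate 95% band under independence
r = acf(residuals, 5)
significant_lags = [k + 1 for k, v in enumerate(r) if abs(v) > bound]
print("ACF lags 1-5:", [round(v, 2) for v in r])
print("lags outside the band:", significant_lags)
```

Lags falling outside the band suggest that structure remains in the errors and that the ARIMA or lag structure needs attention.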
Variance Constant: V(ei) = $\sigma^2$ for all i.
Symptoms: Local variances differ
Remedy: Structural breaks; Tsay
Remedy: Level dependency; Box-Cox
Remedy: Stochastic process; GARCH
Parameters Constant Over All Sub-Groups
Symptoms: Local parameters differ
Remedy: Structural breaks; Chow Test
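A minimal sketch of a Chow-type F test for a straight-line model with k = 2 parameters, on simulated data whose slope changes after period 60. The data, helper names and candidate range are assumptions for illustration, not the procedure used in Autobox.

```python
import random

def sse_line(x, y):
    """SSE from a least-squares straight line fitted to (x, y)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    return sum((b - intercept - slope * a) ** 2 for a, b in zip(x, y))

def chow_f(x, y, split, k=2):
    """F statistic comparing one pooled fit against separate fits before/after split."""
    pooled = sse_line(x, y)
    separate = sse_line(x[:split], y[:split]) + sse_line(x[split:], y[split:])
    n = len(x)
    return ((pooled - separate) / k) / (separate / (n - 2 * k))

# Simulated series (assumption): slope 0.5 through t = 60, slope 2.0 afterwards.
rng = random.Random(3)
x = list(range(1, 101))
y = [(1 + 0.5 * t if t <= 60 else 31 + 2.0 * (t - 60)) + rng.gauss(0, 1) for t in x]

f_at_break = chow_f(x, y, 60)
# Scan candidate break points and keep the one giving maximum contrast.
best_split = max(range(10, 91), key=lambda s: chow_f(x, y, s))
print(f"F at t=60: {f_at_break:.1f}; split with largest F: {best_split}")
```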
Testing for Heteroscedasticity Without Explicit Remedial Action
Park Test
Glejser Test
White Test
Breusch-Pagan/Godfrey Test
Goldfeld-Quandt Test
Reasons for Heteroscedasticity
Outliers/Inliers
Model misspecification
Incorrect data transformation
Incorrect combining of data over time
When Our Assumptions Hold
Eight Examples of Possible Violations
Mean of the Errors Changes: (Tiao/Box/Chang)
1. A 1 period change in Level (i.e., a Pulse)
2. A contiguous multi-period change in Level (Intercept Change)
3. Systematically with the Season (Seasonal Pulse)
4. A change in Trend
Variance of the Errors Changes:
5. At Discrete Points in Time (Tsay Test)
6. Linked to the Expected Value (Box-Cox)
7. Can be described as an ARMA Model (GARCH)
8. Due to Parameter Changes (Chow, Tong/TAR Model)
The Family of Dummy Variables
Pulse Dt = 0,0,0,0,1,0,0,0
Level Shift Dt = 0,0,0,0,1,1,1,1,1,...
Seasonal Pulse Dt = 0,1,0,0,0,1,0,0,0,1,...
Time Trend Dt = 0,0,0,0,1,2,3,4,5,...
Note that a Pulse is the difference of a Level Shift
Note that a Level Shift is the difference of a Time Trend
Example of a Pulse Intervention
Dt represents a pulse or a one-time intervention at time period 6.
Dt = 0,0,0,0,0,1,0,0,0
Modeling Interventions - Level Shift
If there was a level shift and not a pulse, then a single pulse model would clearly be inadequate. Thus, for t = 1, ..., i-1, i, ..., T:
Dt = 0,0,0,1,1,1,1,1,1,...
or Dt = 0 for t < i
Dt = 1 for t > i-1
Traditional Level Shift
Another Kind Of Level Shift
Modeling Interventions - Seasonal Pulses
There are other kinds of pulses that might need to be considered; otherwise our model may be insufficient.
For example, December sales are high.
Zt = 0 for t <> 12,24,36,48,60
Zt = 1 for t = 12,24,36,48,60
Modeling Interventions – Local Time Trend
The fourth and final form of a deterministic variable is the local time trend. For example, for t = 1, ..., i-1, i, ..., T:
Dt = 0 for t < i
Dt = (t-(i-1)) * 1 for t >= i
Dt = 0,0,0,0,0,0,1,2,3,4,5,...
In Far Away Places !
Some researchers are still using a variable called the COUNTING VARIABLE, which assumes that there is one trend and that it has a common effect over all time. This is anachronistic.
Dt = 1,2,3,4,5,...,T
The Trend Poem
attributed to Sir Alec Cairncross
A Trend is a Trend is a Trend
But the question is
Will it bend?
Will it alter its course
through some unforeseen force
And come
to a premature end?
When Our Assumptions Hold
Can You Visually Detect The Violation and Suggest The Remedy
When Our Assumptions Fail: Pulse Interventions Affect the Mean of the Errors
When Our Assumptions Fail: Level Shift Intervention Affects the Mean of the Errors
A Level Shift In A Trended Series
Random vs. Level Shift Interventions
Level Shift
CASE 3
When Our Assumptions Fail: Seasonal Pulse Interventions Affect the Mean of the Errors
Random vs. Seasonal Pulse
Analytics Push-Out A Possible Structural Break
CASE 4
Random vs. Time Trended Residuals
When Our Assumptions Fail: Original Series Exhibits Two Trends
User Uses One Trend
One Trend
When Our Assumptions Fail: Trending Residuals: First 100 of 300
Residuals From the Two-Trended Model
Daily German Telecom Revenue
Two Trend Model with Daily Series, Holiday Series
The Residuals
RESIDUALS FROM AN INADEQUATE MODEL
The VARIANCE of the errors may CHANGE over time
At Discrete Points in Time
Based Upon Level of the Series
Based Upon A Stochastic Model
Based Upon a Change in Model Parameters
CASE 5
When Our Assumptions Fail: Break-Point: Suggesting Change in Variance
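WEIGHTED LEAST SQUARES
$$P = \begin{bmatrix} 1/\sigma_1 & 0 & \cdots & 0 \\ 0 & 1/\sigma_2 & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & 1/\sigma_n \end{bmatrix}$$
$$\frac{y_i}{\sigma_i} = \beta_1 \frac{1}{\sigma_i} + \beta_2 \frac{x_{i2}}{\sigma_i} + \cdots + \beta_K \frac{x_{iK}}{\sigma_i} + \frac{u_i}{\sigma_i}$$
$$y_i^* = \beta_1 x_{i1}^* + \beta_2 x_{i2}^* + \cdots + \beta_K x_{iK}^* + u_i^*$$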
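A minimal sketch of weighted least squares for a single regressor plus a constant, assuming the error standard deviations are known. The function name and the data are hypothetical. Each observation, including the constant term, is divided by its standard deviation, and ordinary least squares is then applied to the transformed values.

```python
import random

def wls_line(x, y, sigma):
    """Weighted least squares for y = b1 + b2*x with known error sd sigma[i]."""
    c = [1.0 / s for s in sigma]                 # transformed constant: 1/sigma_i
    xs = [a / s for a, s in zip(x, sigma)]       # transformed regressor: x_i/sigma_i
    ys = [b / s for b, s in zip(y, sigma)]       # transformed response: y_i/sigma_i
    # OLS of ys on (c, xs) with no additional intercept: 2x2 normal equations.
    scc = sum(v * v for v in c)
    scx = sum(u * v for u, v in zip(c, xs))
    sxx = sum(v * v for v in xs)
    scy = sum(u * v for u, v in zip(c, ys))
    sxy = sum(u * v for u, v in zip(xs, ys))
    det = scc * sxx - scx * scx
    b1 = (scy * sxx - scx * sxy) / det
    b2 = (scc * sxy - scx * scy) / det
    return b1, b2

# Hypothetical data: true line 3 + 2x, with error sd that doubles halfway through.
rng = random.Random(5)
x = [float(t) for t in range(1, 41)]
sigma = [1.0 if t <= 20 else 2.0 for t in range(1, 41)]
y = [3 + 2 * a + rng.gauss(0, s) for a, s in zip(x, sigma)]

b1, b2 = wls_line(x, y, sigma)
print(f"intercept {b1:.2f}, slope {b2:.2f}")
```

In practice the standard deviations are not known and have to be estimated, for example from the variance break points identified in the residuals.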
Tsay Studied the Daily IBM Series
A Reasonable ARIMA Model Incorporating Pulses
A Reasonable Residual ACF
Identification of Variance Break Points
The Weights Needed to Stabilize The Variance
Random vs. Break-Point Change in Variance
CASE 6
When Our Assumptions Fail: Level Dependent: Suggesting Systematic Change in Variance
Remedy Via Box-Cox Suggesting Logarithms
Upwards Trending Actuals (Y=3+2*i)
Code To Create A Linear Dependency Between the Variance of the Errors and the Level of The Series
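The code shown on the original slide is not reproduced here. The following is a minimal sketch of one way to build such a series, under assumptions of my own: the level follows Y = 3 + 2*i, and the error variance is set to 0.5 times the level (the factor 0.5 and the sample size are arbitrary).

```python
import random

rng = random.Random(11)
n = 200
level = [3 + 2 * i for i in range(1, n + 1)]                # Y = 3 + 2*i
# Error variance proportional to the level: Var(e_i) = 0.5 * level_i (assumed factor).
errors = [rng.gauss(0, (0.5 * m) ** 0.5) for m in level]
actuals = [m + e for m, e in zip(level, errors)]

def mean_square(values):
    """Average squared value, used as a local variance estimate for zero-mean errors."""
    return sum(v * v for v in values) / len(values)

early = mean_square(errors[: n // 2])    # local variance at low levels
late = mean_square(errors[n // 2:])      # local variance at high levels
print(f"error variance, first half: {early:.1f}; second half: {late:.1f}")
```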
Random vs. Level Dependent Variance
CASE 7
Stochastic Variance: Suggesting Systematic Change in Variance Caused By A Random Walk Model in (Errors)**2
Code To Create A Set of Errors Whose Squares Follow A Random Walk Model
Random vs. Stochastic Variance
CASE 8
Auto-Correlative Structure Changes Over Time Suggesting Parameter Changes Over
Time (ACTUALS)
Random vs. Non-Constant Parameter Case
When Our Assumptions Fail: Auto-correlative Structure Changes
Over Time Suggesting Parameter Changes Over Time
Based On All 300 Observations
Local Estimation Suggests Transient Parameters 1-176 Versus 177-300 Provides Maximum Contrast
Local Estimation Suggests Transient Parameters
Before and After
Final Model
Correlation of Residuals
Residuals From Final Model
STORIES TO TELL !
1. THE AIRLINE SERIES
2. THE YAFFEE SERIES
3. SHOPPERS AND THE UNUSUAL
4. USING TOO MANY DATA POINTS
THE INFAMOUS AIRLINE SERIES
The Airline Series received a lot of attention initially by R.G. Brown and then by Box and Jenkins. It was modeled using a logarithmic transform as conventional wisdom suggested increasing variability with increasing level.
WHICH COMES FIRST THE CHICKEN OR THE EGG ?
WHICH COMES FIRST THE MODEL OR THE TRANSFORM ?
Visual “proof” of the need to deal with non-stationary variance
Adding a seasonal differencing
LOCAL VARIANCE & LOCAL MEAN
NORMALIZED SCATTER PLOT
LOCAL VARIANCE AS A FUNCTION OF LOCAL MEAN
Adding a seasonal differencing
Untreated, one could incorrectly conclude that the variance of the errors was linked with higher levels of Y. This spurious conclusion was reached by the Box-Cox Test, which responded to higher variance of the residuals at the high end of Y but not elsewhere.
No Evidence of Non-Constant Variance
Implementing a Test For Parameter Changes at Point 92
THE YAFFEE EXAMPLE
An Example of STG
The Output Series
The Input Series
Output
Input
A Simple OLS Model
Residuals From OLS
Augmentations
Residuals From Augmented Model
Actual and Forecasts From Augmented Model
An Example: Shoppers Are Creatures of Habit
SHOPPERS
Where Are The Anomalies ?
The Anomalies are Discovered Which Leads To The Question of Why ?
Final Model Optimal Strategy: Fix Regression First then fix ARIMA then fix Dummy Structure
Final Model
WEEK EFFECTS
Final Model
PECULIAR DAYS
PULSES
The Residuals Appear To Be Free Of Structure
Which Was The Objective All Along !
Statisticians Are Noise Makers !
Forecasts With Confidence Limits
Using Too Many Data Points
In a 1973 JRSS paper Chatfield and Prothero reported on a landmark case study (n=83) which raised serious questions about the idea that data transformations were a panacea.
Researchers at that time were strongly suggesting very powerful and potentially dangerous power transformations to render the error process with constant variance.
Their model/data clearly had a violation (symptom) of the constancy of variance assumption but the suggested cure (cause) of taking cube roots or logarithms was correctly deemed inadequate by the authors.
We have taken this data set and have found that perhaps a more plausible explanation is that the parameters had changed over time. Thus the symptom had more than one assignable cause.
A Reasonable Model
Searching For Optimal Breakpoint
Formal F test Regarding Homogeneity of Parameters
Final Model Based Upon the Last 33 Observations
Residuals From Final Model
Actual and Forecasts
Actuals, Fitted Values and Forecasts
Fit and Forecast