STABLE SEASONAL PATTERN (SSP) MODELS:
HISTORY AND RECENT DEVELOPMENTS
ISF2003
MERIDA, MEXICO
JUNE 16, 2003
THOMAS B. FOMBY
DEPARTMENT OF ECONOMICS
SOUTHERN METHODIST UNIVERSITY
DALLAS, TEXAS 75275
E-Mail: tfomby@mail.smu.edu
Website: http://faculty.smu.edu/tfomby/
OUTLINE
I. INTRODUCTION
   A. IMPORTANCE OF MODELING SEASONALITY
   B. MANY REPRESENTATIONS
      i. Deterministic - Seasonal Dummies
      ii. Box-Jenkins - Seasonal Unit Roots
      iii. Unobserved Components (A.C. Harvey)
      iv. Periodic Autoregression (P. Franses)
      v. Semiparametric Modeling (Burman and Shumway)
      vi. SSP
II. BRIEF HISTORY OF SSP
   A. EARLY PAPERS
   B. MORE RECENT DEVELOPMENTS
      i. Bayesian Modeling
      ii. Sampling Theoretic Modeling
   C. A NEW DIRECTION: COMPOSITIONAL ANALYSIS
      i. Previous Applications: Statistics, Economics, Marketing
      ii. Direct Application to Modeling Seasonal Proportions
III. THEORETICAL FRAMEWORK (Chen, Cai, Fomby)
   A. Diagnostic Tests
   B. Forecasting
IV. APPLICATION: HAWAIIAN TOURISM DATA
   A. Diagnostic Tests
   B. Forecasting Performance
V. SOME TENTATIVE CONCLUSIONS ON SSP MODELING
Three Graphs of
Hawaiian Tourism Data:
1. Original Data
2. Logged Data
3. 1st and 12th Difference of Logged Data
GAIN IN MODELING SEASONALITY
HAWAIIAN TOURISM DATA (1970 Jan.-1995 Dec.)
In-Sample Data Set = 1970-1984 (180 obs.)
Out-of-Sample Data Set = 1985-1995 (132 obs.)

FORECAST   | BJSEASONAL           | BJNAIVE              | EFFICIENCY RATIOS | Diebold-Mariano Test
HORIZON    | MAE        MSE       | MAE        MSE       | E(MAE)   E(MSE)   | t_{µ,MAE}        t_{µ,MSE}
1 (n=132)  | 19,002.56  6.53E+08  | 43,108.72  2.67E+09  | 2.27     4.09     | -4.53 (<.0001)   -3.53 (0.0006)
6 (n=126)  | 30,213.78  1.62E+09  | 55,147.80  4.22E+09  | 1.83     2.60     | -4.75 (<.0001)   -4.26 (<.0001)
12 (n=120) | 34,025.48  1.79E+09  | 60,166.53  5.27E+09  | 1.77     2.94     | -0.83 (0.4082)   -0.38 (0.7022)
BJNAIVE=log (Tourism)~ARIMA (1,1,1)
BJSEASONAL=log (Tourism)~ARIMA (1,1,1) x SARIMA (0,1,1)
Probability values for the Diebold-Mariano (1995) test are for Pr(|t| > |t_0|), where t_0 = observed t.
Diebold, Francis X. and Mariano, R.S. (1995), "Comparing Predictive Accuracy," Journal of Business & Economic Statistics, 13, 253-263.
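The quantities in the table can be sketched in a few lines of Python. The efficiency ratios are consistent with the naive model's loss divided by the seasonal model's loss (for example 43,108.72 / 19,002.56 ≈ 2.27). The Diebold-Mariano statistic below uses a rectangular-window long-run variance with h-1 autocovariances; that choice, and all function names, are assumptions made for illustration and are not taken from the slides.

```python
import math

def efficiency_ratio(loss_naive, loss_seasonal):
    """Loss of the naive model relative to the seasonal model (>1 favors seasonal)."""
    return loss_naive / loss_seasonal

def diebold_mariano(e_seasonal, e_naive, h=1, loss=abs):
    """DM t-statistic for equal predictive accuracy at horizon h.

    d_t = loss(seasonal error) - loss(naive error); a negative statistic
    means the seasonal model has the smaller average loss.
    """
    d = [loss(a) - loss(b) for a, b in zip(e_seasonal, e_naive)]
    n = len(d)
    d_bar = sum(d) / n

    def gamma(k):
        # sample autocovariance of d_t at lag k
        return sum((d[t] - d_bar) * (d[t - k] - d_bar) for t in range(k, n)) / n

    # rectangular window with h-1 lags (an assumption, not from the slides)
    long_run_var = gamma(0) + 2.0 * sum(gamma(k) for k in range(1, h))
    return d_bar / math.sqrt(long_run_var / n)

# Efficiency ratios recomputed from the 1-step row of the table
print(round(efficiency_ratio(43108.72, 19002.56), 2))  # MAE ratio: 2.27
print(round(efficiency_ratio(2.67e9, 6.53e8), 2))      # MSE ratio: 4.09
```

Passing `loss=lambda e: e * e` gives the squared-error version of the test. With h > 1 the rectangular-window variance can turn out non-positive in small samples, which this sketch does not guard against.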
MANY REPRESENTATIONS
OF
SEASONAL TIME SERIES
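DETERMINISTIC SEASONAL DUMMIES
$$y_t = \beta_0 + \beta_1 t + \gamma_1 D_{1t} + \gamma_2 D_{2t} + \cdots + \gamma_l D_{lt} + \varepsilon_t$$
where
$$\gamma_1 + \gamma_2 + \cdots + \gamma_l = 0$$
$$\varepsilon_t = \rho_1 \varepsilon_{t-1} + \rho_2 \varepsilon_{t-2} + \cdots + \rho_m \varepsilon_{t-m} + u_t$$
$$u_t \sim wn$$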
BOX-JENKINS MULTIPLICATIVE SEASONAL MODEL
$$\phi(B)\,\Phi(B)\,\Delta^{d}\Delta_{s}^{D}(y_t - \mu) = \theta(B)\,\Theta(B)\,a_t$$
Non-seasonal AR (p): $\phi(B) = 1 - \phi_1 B - \cdots - \phi_p B^p$
Seasonal AR (P): $\Phi(B) = 1 - \Phi_1 B - \cdots - \Phi_P B^P$
Non-seasonal MA (q): $\theta(B) = 1 - \theta_1 B - \cdots - \theta_q B^q$
Seasonal MA (Q): $\Theta(B) = 1 - \Theta_1 B - \cdots - \Theta_Q B^Q$
In the seasonal polynomials $B$ is understood as the seasonal lag $B^s$, where $s$ is the number of seasons per year ($s = 12$ for monthly data) and $\Delta_s = 1 - B^s$.
All polynomials $\phi(B)$, $\Phi(B)$, $\theta(B)$, and $\Theta(B)$ have roots outside the unit circle, and there are no common roots across the AR and MA polynomials.
d = order of first differencing
D = order of seasonal differencing
Orders of d and D are often determined by the OCSB test. See
Osborn, D.R., A. P. L. Chui, J. P. Smith and C. R. Birchenhall (1988), “Seasonality and the Order of Integration for Consumption,” Oxford Bulletin of Economics and Statistics, 50, 361-377.
The %Logtest macro in SAS is often used to determine if the data should be logged before analysis. See SAS/ETS Online DOC (Version 8, 1999) Chapter 4, “LOGTEST MACRO,” pp. 138-140.
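As a small illustration of the "1st and 12th difference of logged data" transformation used with d = D = 1, the sketch below applies (1 - B)(1 - B^12) to the log of a monthly series. The function name and the toy series are hypothetical; the toy series is built so that the transformation removes both its growth and its fixed seasonal pattern.

```python
import math

def log_diff_1_12(series, season=12):
    """Apply (1 - B)(1 - B^season) to the logged series."""
    logged = [math.log(v) for v in series]
    # seasonal difference: log y_t - log y_{t-season}
    seasonal = [logged[t] - logged[t - season] for t in range(season, len(logged))]
    # first difference of the seasonally differenced series
    return [seasonal[t] - seasonal[t - 1] for t in range(1, len(seasonal))]

# Toy monthly series: constant growth rate times a fixed seasonal pattern
pattern = [0.8, 0.9, 1.0, 1.1, 1.2, 1.3, 1.3, 1.2, 1.1, 1.0, 0.9, 0.8]
toy = [1000.0 * (1.01 ** t) * pattern[t % 12] for t in range(48)]
w = log_diff_1_12(toy)
print(len(w))  # 48 - 12 - 1 = 35 observations remain
```

For this toy series every element of `w` is zero up to rounding, since a deterministic log-linear trend and a fixed multiplicative seasonal pattern are both annihilated by the two differences.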
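UNOBSERVABLE COMPONENTS MODEL
(A. C. HARVEY “STRUCTURAL” TIME SERIES MODEL (1989))
$$y_t = \mu_t + \varphi_t + \gamma_t + \varepsilon_t \qquad \text{(Measurement Eq.)} \quad (1)$$
$\mu_t$ = trend, $\varphi_t$ = cycle, $\gamma_t$ = seasonal, $\varepsilon_t$ = irregular
$$\mu_t = \mu_{t-1} + \beta_{t-1} + \eta_t \qquad \text{(Level)} \quad (2)$$
$$\beta_t = \beta_{t-1} + \zeta_t \qquad \text{(Slope)} \quad (3)$$
$$\begin{pmatrix} \varphi_t \\ \varphi_t^* \end{pmatrix} = \rho \begin{pmatrix} \cos\lambda_c & \sin\lambda_c \\ -\sin\lambda_c & \cos\lambda_c \end{pmatrix} \begin{pmatrix} \varphi_{t-1} \\ \varphi_{t-1}^* \end{pmatrix} + \begin{pmatrix} \kappa_t \\ \kappa_t^* \end{pmatrix} \qquad \text{(Cycle)} \quad (4)$$
$$\gamma_t = \sum_{j=1}^{[s/2]} \left( \gamma_j \cos\lambda_j t + \gamma_j^* \sin\lambda_j t \right) \qquad \text{(Seasonal)} \quad (5)$$
$\varepsilon_t \sim NIID(0, \sigma_\varepsilon^2)$, $\eta_t \sim NIID(0, \sigma_\eta^2)$, $\zeta_t \sim NIID(0, \sigma_\zeta^2)$, $\kappa_t \sim NIID(0, \sigma_\kappa^2)$, $\kappa_t^* \sim NIID(0, \sigma_{\kappa^*}^2)$.
$\varepsilon_t \perp \eta_t \perp \zeta_t \perp \kappa_t \perp \kappa_t^*$ (An Orthogonal Decomposition)
Harvey, A. C. (1989), Forecasting, Structural Time Series Models and the Kalman Filter (Cambridge Univ. Press: Cambridge). Also see his STAMP© program.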
PERIODIC AUTOREGRESSIVE MODEL [PAR (p)]
(TIAO and GRUPE (1980), FRANSES (1996))
$$y_t = \mu_s + \phi_{1,s}\, y_{t-1} + \cdots + \phi_{p,s}\, y_{t-p} + \varepsilon_t$$
s = 1, 2, …, S (Seasons of year)
$\varepsilon_t \sim wn(0, \sigma_\varepsilon^2)$ or $\varepsilon_t \sim wn(0, \sigma_s^2)$,
t (mod S) = s
Tiao, G. C. and Grupe, M. R. (1980), “Hidden Periodic Autoregressive-Moving Average Models in Time Series Data,” Biometrika, 67, 365-373.
Franses, P. H. (1996), Periodicity and Stochastic Trends in Economic Time Series, Oxford: Oxford University Press.
SEMIPARAMETRIC MODELING OF SEASONAL TIME SERIES
(BURMAN AND SHUMWAY (1998))
$$y_{ij} = \alpha(t_i) + \beta(t_i)\,\gamma_j + e_{ij},$$
i = 1, 2, …, T (years), j = 1, 2, …, S (seasons)
where $\sum_{j=1}^{S} \gamma_j = 0$ and $\sum_{j=1}^{S} \gamma_j^2 = 1$.
$\alpha(t_i)$ = semiparametric (spline) trend function
$\beta(t_i)$ = semiparametric (spline) modulation function
$\gamma_j$ = fixed seasonal effects
$e_{ij}$ = white noise
Candidate for $\alpha(t_i)$: $b_1(t_i)'\theta_1$
Candidate for $\beta(t_i)$: $b_2(t_i)'\theta_2$
where
$$b(t)' = \left(1,\ t,\ t^2,\ t^3,\ (t-u_1)_+^3,\ (t-u_2)_+^3,\ \ldots,\ (t-u_k)_+^3\right)$$
and $(t-u_i)_+^3 = 0$ for $t \le u_i$ and $(t-u_i)^3$ otherwise. The spline breakpoints $u_i$ must be specified a priori (not always easy to do, even with data inspection).
Burman, P. and Shumway, R. H. (1998), “Semiparametric Modeling of Seasonal Time Series,” Journal of Time Series Analysis, 19, 127-145.
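To make the truncated cubic spline basis concrete, here is a minimal sketch that builds one row b(t)' and evaluates b(t)'θ. The knot locations and coefficient vector passed in are hypothetical inputs; nothing here estimates θ or chooses the breakpoints.

```python
def truncated_cubic_basis(t, knots):
    """Row b(t)' = (1, t, t^2, t^3, (t-u_1)_+^3, ..., (t-u_k)_+^3)."""
    row = [1.0, t, t ** 2, t ** 3]
    # (t - u)_+^3 is zero to the left of the knot u and (t - u)^3 to the right
    row.extend(max(t - u, 0.0) ** 3 for u in knots)
    return row

def spline_value(t, knots, theta):
    """Evaluate b(t)' theta for a given coefficient vector theta."""
    return sum(b * c for b, c in zip(truncated_cubic_basis(t, knots), theta))

print(truncated_cubic_basis(2.0, [1.0, 3.0]))
# [1.0, 2.0, 4.0, 8.0, 1.0, 0.0]
```

The knot at 3.0 contributes nothing at t = 2.0, which is why the basis lets the fitted curve change shape only to the right of each breakpoint.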
"DEFINITION" OF STABLE SEASONALPATTERNS
Robert M. Oliver (1987), “Bayesian Forecasting with Stable Seasonal Patterns,” JBES, 5, 77-85, provides a description of what is now referred to as STABLE SEASONAL PATTERN (SSP) modeling.
“In many seasonal forecasting problems the fraction of events that occur by a certain time within the season are stable over time; that is, the pattern and shape of event occurrence is similar from season to season although the end-of-season totals and the cumulative count of events within any given season are uncertain.” p. 77 (Note: the term “season” can be thought of as “year”.)
“There have been numerous references in the scientific literature to seasonal models in which the demand in a specified period of time (such as a month), is viewed as a fraction of total demand in the entire season (such as a year) …” p. 77
Various informal methods have been used to inspect seasonal data for stable seasonal patterns:
i.) Plot of Monthly Proportions versus Year, e.g. Burman and Shumway (1998), Chen and Fomby (1999). Three-dimensional plot of P (z) vs. Month (x) vs. Year (y).
ii.) Plot of Cumulative Proportions (months 1-12) by year. Two-dimensional plots overlaid. See De Alba and Mendoza (2001) and De Alba and Nieto de Pascual (2003).
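The two informal checks rest on the same simple quantities: each year's monthly proportions and their running (cumulative) sums. A minimal sketch, with hypothetical function names and toy data in which the second year is exactly twice the first, so that the seasonal pattern is perfectly stable:

```python
def monthly_proportions(year_rows):
    """year_rows: one list of monthly values per year; returns proportions by year."""
    return [[x / sum(row) for x in row] for row in year_rows]

def cumulative_proportions(year_rows):
    """Running sums of the monthly proportions, one list per year."""
    out = []
    for props in monthly_proportions(year_rows):
        running, cum = 0.0, []
        for p in props:
            running += p
            cum.append(running)
        out.append(cum)
    return out

# Toy data: year 2 is year 1 scaled by 2, so the proportions coincide
rows = [[float(v) for v in range(1, 13)], [2.0 * v for v in range(1, 13)]]
props = monthly_proportions(rows)
cums = cumulative_proportions(rows)
print(round(props[0][0], 4), round(props[1][0], 4))
```

Plotting each row of `props` (or `cums`) against month, one line per year, gives the overlaid two-dimensional plot described in ii.); lines that nearly coincide suggest a stable seasonal pattern.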
HISTORY OF THOUGHT
SSP MODELS
(Selective Review)
Early Works
Chang, S. H. and E. E. Fyffe (1971), “Estimation of Forecast Errors for Seasonal Style-Goods Sales,” Management Science, B, 16, 89-96. Comment (due to Oliver (1987)): Assumed random end-of-year totals but that the underlying noise process was stationary and homoscedastic, even though one of their two proposed methods for estimating the seasonal fractions implicitly used a binomial noise variance.
Fildes, R. and Stevens, C. (1978), “Look-No Data: Bayesian Forecasting and the Effects of Prior Knowledge,” in Forecasting and Planning, eds. R. Fildes and D. Wood, London: Saxon House and Gower Press, 119-136. Comment (due to Oliver (1987)): Studied the case where cumulative monthly sales are expressed as a fraction of annual sales. The analysis and results, however, contain the incorrect assumption that cumulative seasonal errors are independent and that variances are stationary from period to period within a season.
Olsen, J. A. (1982), “Generalized Least Squares and Maximum Likelihood Estimation of the Logistic Function for Technology Diffusion,” Technological Forecasting and Social Change, 21, 241-249. Comment (Oliver (1987)): Built a model of adopters of a new technology. He was concerned with forecasting the number of future adopters by Gaussian approximation but did not provide an updating formula for use as new data become available.
More Recent Work
Bayesian
• Oliver (1987). Used a Kalman Filter approach based on a linear Bayes model with Gaussian priors and the end-of-year total following a random walk without drift. (Note: Therefore, the model is not strictly applicable to data that exhibit drift. No structured modeling of the mean. TF)
• De Alba and Mendoza (1996) (Discrete-Small Sample Case). Bayesian approach using a binomial distribution for the remaining year occurrences, a Jeffreys prior for the remaining year proportion, and a vague prior for the end-of-year total. The marginal posterior distribution of the end-of-year total turns out to be of the Beta-Pascal form (Raiffa and Schlaifer, 1961). Assuming quadratic loss, the Bayes estimator becomes the posterior mean, which can be solved for numerically. (Note: Although the results are exact, the method requires an informative prior. Uninformative priors (such as the Laplace prior or the Jeffreys-Bernardo prior) and conjugate analysis are not available in this model, as the implied parameter values result in posteriors without finite moments.)
• De Alba and Mendoza (2001) (Continuous-Large Sample Case). Use a Bayesian approach to get a posterior predictive distribution for the end-of-year total based on the inverted Pareto distribution. The approach produces highly competitive point-wise forecasts despite the fact that the predictive model has no moments due to its heavy right tail. Unfortunately, the HPD predictive intervals are, for most practical purposes, useless. (No trend model.)
• Mendoza and De Alba (2003), working paper. Corrects some of the problems of the model proposed by De Alba and Mendoza (2001). The model maintains the level of simplicity and does not require any kind of approximation. The point-wise forecasts are comparable to those obtained in De Alba and Mendoza (2001), but the posterior predictive intervals are much improved. (No trend model.)
Sampling Theoretic
• Guerrero, V. M. and Elizondo, J. A. (1997), “Forecasting a Cumulative Variable Using Its Partially Accumulated Data,” Management Science, 43, 879-889. Use simple regression techniques to project accumulated sums into year-end totals. Method is ad hoc (no known optimality properties). No trend model.
• Chen and Fomby (1999). Developed CPAR(p) and CPAR(p)X models. Trend model is included.
• De Alba and Nieto de Pascual (2003). Examine a Ratio Estimator suggested by the cluster sampling literature (Kish, 1965). No trend model.
• Chen, Cai, and Fomby (research in progress). Use of compositional analysis to model SSP data. Trend model is included.
SOME RECENT
SAMPLING THEORETIC
ESTIMATORS
THE RATIO ESTIMATOR
De Alba, Enrique and Jose Nieto de Pascual (2003), “A Forecasting Method for Events with Stable Seasonality,” Agrociencia, 37, 33-44.
$\pi_i$ = proportion of the total (year-end) outcome that is expected to occur in the i-th period within the year, with
$$\sum_{i=1}^{r} \pi_i = 1$$
$R_i$ = quantity observed in the i-th interval
$$\sum_{i=1}^{r} R_i = R = \text{total quantity for the year}$$
$p_i$ = estimator of $\pi_i$ for i = 1, 2, …, r
The Direct Estimator of the year-end total, R, is
$$\hat{T}_j = \frac{R_j}{p_j}; \quad (j = 1, 2, \ldots, m),\ m < r \qquad (1)$$
The Compound Estimator of the year-end total, R, is
$$\hat{T}_m^c = \frac{1}{m} \sum_{j=1}^{m} \hat{T}_j = \frac{1}{m} \sum_{j=1}^{m} \frac{R_j}{p_j} \qquad (2)$$
(My comment: It follows that the Direct Estimator of the next period's total, $R_{m+1}$, is
$$p_{m+1} \cdot \hat{T}_m = \hat{R}_{m+1}$$
while the Compound Estimator of the next period's total is
$$p_{m+1} \cdot \hat{T}_m^c = \hat{R}_{m+1}^c .)$$
Using Taylor series approximations, De Alba and Nieto de Pascual show that the Compound Estimator is “approximately” unbiased. The “approximate” variance of the Compound Estimator of the year-end total is also derived. Through the same approximate arguments it is shown that
$$Var(\hat{T}_m^c) \le Var(\hat{T}_j), \quad j = 1, 2, \ldots, m$$
and, “approximately speaking,” there is an efficiency gain in using the Compound Estimator to estimate the year-end total. (My Comment: It would seem that similar arguments would lead to the conclusion that next period's total, or for that matter the total of any other yet-to-be-realized period within the year, would be more efficiently estimated by using the Compound Estimator.)
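The Direct and Compound Estimators are simple enough to compute by hand; the sketch below does so for a hypothetical year with r = 5 periods, of which m = 3 have been observed. The proportions `p` and observed quantities are made-up illustrative numbers, and the function names are not from the paper.

```python
def direct_estimators(observed, p):
    """T_j = R_j / p_j for each of the m periods observed so far."""
    return [r / p[j] for j, r in enumerate(observed)]

def compound_estimator(observed, p):
    """Simple average of the m direct estimators of the year-end total."""
    t = direct_estimators(observed, p)
    return sum(t) / len(t)

p = [0.10, 0.15, 0.25, 0.30, 0.20]   # hypothetical seasonal proportions (r = 5)
observed = [11.0, 15.0, 24.0]        # R_1, R_2, R_3 observed so far (m = 3)

t_hat = direct_estimators(observed, p)    # roughly 110, 100, 96
t_c = compound_estimator(observed, p)     # roughly 102
next_period = p[len(observed)] * t_c      # compound forecast of R_4, roughly 30.6
print(round(t_c, 2), round(next_period, 2))
```

Each direct estimator uses a single period's information, so they can disagree noticeably (110 versus 96 here); averaging them is what the variance inequality above is about.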
HOW DO YOU ESTIMATE $\pi_j$? Assuming stable seasonal patterns one can use the estimator
$$p_j = \frac{\sum_{i=1}^{n} R_{ij}}{\sum_{i=1}^{n} R_i}, \quad j = 1, 2, \ldots, r.$$
MY NOTE: The Compound Estimator $\hat{T}_m^c$ is easy to compute and use. However, not much is known about its finite sample properties or its efficiency relative to the Direct Estimator $\hat{T}_j$ in varied settings. Still, when one has only a few years of observations on which to base forecasts and it appears that seasonality is stable, the Compound Estimator would seem to fit the desired criterion of “ease of implementation.” Quoting from Guerrero and Elizondo (1997):
“A difficulty with the Bayesian approach is that the resulting methods usually require a large amount of expertise from the analyst in order to implement them. Thus we follow in this article the idea of Kekre, Morton, and Smunt (1990) that some new approaches are needed to provide ‘simpler forecasting methods in hopes of minimizing implementation problems.’ ”
ANOTHER NOTE: The ratio estimator does not incorporate a model for the year-end total. As a result, when standing at the end of a given year, there exists no method for estimating next year's end-of-year total.
FOMBY ADDITIONAL REMARKS
ON RATIO ESTIMATORS
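The "pooled" estimator of $\pi_j$ suggested by De Alba and Nieto de Pascual (2003) is
$$p_j = \frac{\sum_{i=1}^{n} R_{ij}}{\sum_{i=1}^{n} R_i}, \quad j = 1, 2, \ldots, r.$$
Another possible estimator:
$$p_j^* = \left( \frac{R_{1j}}{R_1} + \frac{R_{2j}}{R_2} + \cdots + \frac{R_{nj}}{R_n} \right) \Big/ n = \left( p_j^{(1)} + p_j^{(2)} + \cdots + p_j^{(n)} \right) / n = \bar{p}_j .$$
If $R_1 = R_2 = \cdots = R_n = \bar{R}$ then $p_j^* = p_j$. Otherwise the two generally differ, because $p_j$ weights each year's proportion $p_j^{(i)}$ by that year's total $R_i$ while $p_j^*$ weights all years equally.
Which one is to be preferred and under what circumstances?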
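A two-year toy example makes the difference between the pooled estimator and the simple average of yearly proportions visible. The data and function names below are hypothetical; the second year has twice the total of the first and a different seasonal split, so the two estimators disagree.

```python
def pooled_proportions(rows):
    """p_j = sum_i R_ij / sum_i R_i (each year weighted by its total)."""
    grand = sum(sum(r) for r in rows)
    return [sum(r[j] for r in rows) / grand for j in range(len(rows[0]))]

def mean_of_proportions(rows):
    """p*_j = (1/n) sum_i R_ij / R_i (each year weighted equally)."""
    n = len(rows)
    return [sum(r[j] / sum(r) for r in rows) / n for j in range(len(rows[0]))]

rows = [[10.0, 30.0, 60.0],   # year 1, total 100
        [60.0, 60.0, 80.0]]   # year 2, total 200

print([round(v, 4) for v in pooled_proportions(rows)])   # [0.2333, 0.3, 0.4667]
print([round(v, 4) for v in mean_of_proportions(rows)])  # [0.2, 0.3, 0.5]
```

The pooled estimate is pulled toward the larger year's pattern. Both sets of proportions sum to one, so either can be plugged into the Direct or Compound Estimator.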
FOMBY REMARKS ON
RATIO ESTIMATORS CONTINUED
Consider the Compound Estimator of the year-end total recommended by De Alba and Nieto de Pascual (2003):
$$\hat{T}_m^c = \frac{1}{m} \sum_{j=1}^{m} \hat{T}_j = \frac{1}{m} \sum_{j=1}^{m} \frac{R_j}{p_j} .$$
Why not consider a "weighted" Compound Estimator of the year-end total like
$$\hat{T}_m^{c*} = w_1 \hat{T}_1 + w_2 \hat{T}_2 + \cdots + w_m \hat{T}_m$$
where $w_1 + w_2 + \cdots + w_m = 1$ but the $w_i$ are not necessarily equal. Or, if we thought one or more of the $\hat{T}_i$ were substantially biased, we could consider
$$\hat{T}_m^{c**} = w_0 + w_1 \hat{T}_1 + w_2 \hat{T}_2 + \cdots + w_m \hat{T}_m$$
where $w_0 \ne 0$ and $\sum_{i=1}^{m} w_i \ne 1$. The first weighted Compound Estimator is in the spirit of the Nelson (1972) combination methods and the second follows the spirit of the Granger-Ramanathan (1984) combination method. These methods choose the weights based on $Var(\hat{T}_j)$ and $Cov(\hat{T}_j, \hat{T}_k)$ for $j, k = 1, 2, \ldots, m$ but $j \ne k$.
DO THESE ESTIMATORS IMPROVE ON THE COMPOUND ESTIMATOR $\hat{T}_m^c$?
Of course, if you only had two or three years of data then these combination methods would probably not be very useful, but with 15 or more years of data these weighted compound estimators might prove to be useful.
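For intuition about weights that sum to one, consider the simplest case of m = 2 direct estimators. The sketch below uses the classical minimum-variance combination of two unbiased estimators, w_1 = (σ_2² - σ_12) / (σ_1² + σ_2² - 2σ_12); this is offered as one familiar way to choose such weights, not as the specific procedure of Nelson (1972) or Granger-Ramanathan (1984). The variances passed in are hypothetical and would in practice be estimated from past forecast errors.

```python
def min_variance_weights(var1, var2, cov12=0.0):
    """Weights (w1, w2), w1 + w2 = 1, minimizing Var(w1*T1 + w2*T2)."""
    denom = var1 + var2 - 2.0 * cov12
    w1 = (var2 - cov12) / denom
    return w1, 1.0 - w1

def combined_variance(w1, var1, var2, cov12=0.0):
    """Variance of w1*T1 + (1 - w1)*T2."""
    w2 = 1.0 - w1
    return w1 * w1 * var1 + w2 * w2 * var2 + 2.0 * w1 * w2 * cov12

w1, w2 = min_variance_weights(4.0, 1.0)     # T1 is four times as noisy as T2
print(w1, w2)                               # 0.2 0.8
print(combined_variance(w1, 4.0, 1.0))      # weighted combination
print(combined_variance(0.5, 4.0, 1.0))     # equal weights, as in the Compound Estimator
```

In this toy case the weighted combination has variance of about 0.8 against 1.25 for equal weights. The gain depends on knowing the variances well, which is why many years of data would be needed.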
PARTIAL-SUM REGRESSIONS
Guerrero, V.M. and J. Alan Elizondo (1997), “Forecasting a Cumulative Variable Using Its Partially Accumulated Data,” Management Science, 43, 879-889.
1. Regression Model for Total Accumulated Variables with Constant Level:
$$R_t = \beta_{0k} + \beta_{1k} R_{t,k} + \varepsilon_{t,k} \qquad (3)$$
for $k = 1, 2, \ldots, r$ and $t = 1, 2, \ldots, n$,
where Guerrero and Elizondo assume that $\varepsilon_{t,k}$ is iid. Applying least squares to (3) provides the following end-of-year total estimate, given the accumulated sum $R_{t,k}$:
$$\hat{R}_t^{(k)} = \hat{\beta}_{0k} + \hat{\beta}_{1k} R_{t,k} \quad \text{for } k = 1, 2, \ldots, r. \qquad (4)$$
2. Regression Model for Total Accumulated Variables with Non-constant Level:
First Model (in differences):
$$R_t - R_{t-1} = \gamma_{0k} + \gamma_{1k} (R_{t,k} - R_{t-1,k}) + \varepsilon_{t,k} \qquad (5)$$
for $k = 1, 2, \ldots, r$ and $t = 1, 2, \ldots, n$,
where Guerrero and Elizondo assume that $\varepsilon_{t,k}$ is iid.
Second Model (deterministic trend):
$$R_t = \beta_{0k} + \beta_{1k} R_{t,k} + \beta_{2k} t + \varepsilon_{t,k} \qquad (6)$$
for $k = 1, 2, \ldots, r$ and $t = 1, 2, \ldots, n$,
where Guerrero and Elizondo assume that $\varepsilon_{t,k}$ is iid.
NOTE: The method is essentially ad hoc. In addition, the assumption that $\varepsilon_{t,k}$ is iid may not always be tenable. Unlike the Ratio estimator, there is an explicit allowance for trend, although the trend characterization is simple and does not involve possibly useful exogenous variables. Also, in the spirit of the Compound Estimator of De Alba and Nieto de Pascual, one could take the simple average of the k partial-sum predictions of the year-end total, $\hat{R}_t^{(k)}$, $k = 1, 2, \ldots, r$, and probably obtain a more accurate end-of-year total estimator. Moreover, one might exploit a Seemingly Unrelated Regression framework for the k partial-sum regressions and use a simple average of the k SUR predictions of the end-of-year total to obtain an even more efficient estimator.
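The constant-level regression (3)-(4) is an ordinary least-squares line fitted separately for each k. A minimal sketch for one value of k follows; the historical partial sums and totals are invented so that the fit is exact, and the function name is hypothetical.

```python
def ols_line(x, y):
    """Least-squares intercept and slope for y = b0 + b1 * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    b1 = sxy / sxx
    return my - b1 * mx, b1

# Past years: accumulated sum through period k, and the year-end total
partial = [30.0, 33.0, 36.0, 40.0]      # R_{t,k}
totals = [100.0, 109.0, 118.0, 130.0]   # R_t (here exactly 10 + 3 * R_{t,k})

b0, b1 = ols_line(partial, totals)
forecast = b0 + b1 * 42.0   # year-end estimate given this year's partial sum of 42
print(round(b0, 6), round(b1, 6), round(forecast, 6))
```

Repeating the fit for each k and averaging the resulting forecasts gives the simple-average estimator suggested in the note above.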
FORECASTING WITH
STABLE SEASONAL PATTERN MODELS
WITH AN APPLICATION TO
HAWAIIAN TOURISM DATA
Rong Chen, Department of Statistics, University of Illinois-Chicago
Tom Fomby, Department of Economics, Southern Methodist University
Reference: JBES, Oct. 1999, Vol. 17, No. 4, 497 - 504.
BASIC PREMISE:
END-OF-YEAR FORECASTING
Given last year's tourist travel to Hawaii, what will this year's total tourist travel be? Given last year's tourist travel and the first two months' travel this year, what will the end-of-year total be? The Hawaiian Tourism Bureau and its Association Members would like to know.
A STABLE SEASONAL PATTERN APPROACH
Let there be 12 months in the season, s = 1, 2, …, d = 12, and let $X_{ts}$ = the number of tourists arriving in Hawaii during month s of year t = 1, 2, … .
Then the end-of-year forecasting problem is to forecast the end-of-year total $y_t$:
$$\hat{y}_t = \sum_{i=1}^{k} X_{ti} + \text{Rest-of-Year Forecast}$$
$$\hat{y}_t = \sum_{i=1}^{k} X_{ti} + \sum_{i=k+1}^{d} \hat{X}_{ti}$$
With a stable seasonal pattern the fractions of the year's total falling within the various seasons of the year,
$$\theta_i, \quad i = 1, 2, \ldots, d,$$
remain constant, and a plausible formula would be given by
$$\hat{y}_t = \sum_{i=1}^{k} X_{ti} + \sum_{l=k+1}^{d} \theta_l\, y_t$$
But how do we determine the unobservables $\theta_l$, $y_t$?
One could use $\theta_l$'s obtained from the previous years' tourism per month vis-à-vis a multinomial distribution. The $y_t$ could be modeled using the average of the previous year-end totals, assuming stationarity of the totals, or, as we do here, based upon a conditional Poisson distribution conditioned on previous year-end totals and possibly other exogenous variables.
ILLUSTRATION OF THE GENERALSSP MODEL STRUCTURE
(Chen and Fomby JBES (1999))
      $(x_{t1}, \ldots, x_{tD})$            $(x_{t+1,1}, \ldots, x_{t+1,D})$
            ⇑ $(M_1)$                            ⇑ $(M_1)$
⇒          $y_t$          ⇒ $(M_2)$ ⇒          $y_{t+1}$          ⇒
FORECASTING WITH SSP MODEL: DISCRETE CASE
$x_{tk}$ are integer-valued seasonal time series.
Assume the seasonal pattern is stable over time.
1. One obvious choice for model $M_1$ is the Multinomial Model:
$$[x_{t1}, \ldots, x_{tD} \mid y_t] \sim \text{Multinomial}(y_t, \theta)$$
$\theta = (\theta_1, \theta_2, \ldots, \theta_D)$, $0 \le \theta_i \le 1$ and $\sum_{i=1}^{D} \theta_i = 1$
2. Next step is to choose a model for the seasonal total time series ($M_2$).
If we assume that the inter-arrival times between tourists are independent and exponentially distributed, then we can assume a non-homogeneous Poisson model for the season total, where the arrival rate is a function of the lagged values of the season totals and some exogenous variables. Let $\mathbf{y}_t = (y_t, y_{t-1}, \ldots, y_1)$ be the lag values and $z_t$ be a set of exogenous variables observable at time t.
Define a conditional Poisson model as
$$[y_t \mid \mathbf{y}_{t-1}, z_t, \beta] \sim \text{Poisson}(\lambda_t) = M_2$$
where $\lambda_t = f(\mathbf{y}_{t-1}, z_t, \beta)$ is a positive function of the past observations and/or some observable exogenous variables and $\beta$ is the associated parameter vector.
Two variants for $M_2$, the yearly totals model.
CPAR(p) Model
$$[y_t \mid \mathbf{y}_{t-1}, \beta] \sim \text{Poisson}\left(\beta_0 + \sum_{i=1}^{p} \beta_i y_{t-i}\right)$$
where $\beta_0 > 0$ and $\beta_i > 0$, $i = 1, \ldots, p$.
CPAR(p)X Model
$$[y_t \mid \mathbf{y}_{t-1}, \mathbf{z}_t] \sim \text{Poisson}(\lambda_t)$$
where
$$\lambda_t = a_0 + \sum_{i=1}^{p} a_i y_{t-i} + \sum_{i=0}^{q} b_i (z_{t-\delta-i} + C)$$
Note $a_0 > 0$, $a_i \ge 0$ and $b_i \ge 0$. The constant C is used so that $z_t + C \ge 0$ for all t and can be viewed as the intercept term. Note the above setting is proper only when the y series and z series are positively correlated. To incorporate a negatively correlated exogenous variable, one can impose the restrictions $b_i \le 0$ and $z_t + C \le 0$. The parameter $\delta > 0$ is the lag time for changes in the exogenous variable to have an effect on $y_t$.
Given the conditional independence between $M_1$ and $M_2$, the conditional mean prediction is
$$\hat{y}_t(k) = \sum_{i=1}^{k} x_{ti} + \sum_{i=k+1}^{D} \hat{\theta}_i \hat{\lambda}_t$$
where the $\hat{\theta}_i$ are obtained from the maximum likelihood estimation of $\theta$ in the multinomial distribution ($M_1$) while $\hat{\lambda}_t$ is obtained from the CPAR(p) model or, in the presence of exogenous variables, the CPAR(p)X model.
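The conditional mean prediction, the months observed so far plus the estimated remaining fractions times the estimated yearly rate, can be sketched as follows. The multinomial MLE with a common θ across years is the pooled proportion Σ_t x_ti / Σ_t y_t; the CPAR coefficients `beta` are hypothetical values standing in for estimates that would come from fitting the Poisson model, and the toy data use D = 4 seasons for brevity.

```python
def theta_mle(rows):
    """Multinomial MLE with common theta: theta_i = sum_t x_ti / sum_t y_t."""
    grand = sum(sum(r) for r in rows)
    return [sum(r[i] for r in rows) / grand for i in range(len(rows[0]))]

def cpar_mean(beta, past_totals):
    """lambda_t = beta_0 + sum_i beta_i * y_{t-i}; past_totals[0] is y_{t-1}."""
    return beta[0] + sum(b * y for b, y in zip(beta[1:], past_totals))

def ssp_forecast(observed, theta, lam):
    """y_hat_t(k) = sum of the k observed seasons + (sum of remaining theta_i) * lambda_t."""
    k = len(observed)
    return sum(observed) + sum(theta[k:]) * lam

history = [[10, 20, 30, 40], [20, 40, 60, 80]]   # two past years, totals 100 and 200
theta = theta_mle(history)                       # [0.1, 0.2, 0.3, 0.4]
lam = cpar_mean([50.0, 1.0], [200.0])            # hypothetical CPAR(1) coefficients -> 250
print(round(ssp_forecast([24, 52], theta, lam), 6))   # 76 observed + 0.7 * 250
```

With no seasons observed (k = 0) the forecast reduces to λ_t itself, and as k approaches D it is dominated by the realized data.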
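FORECASTING WITH SSP MODEL: CONTINUOUS CASE
Define the Gaussian-Multinomial Distribution ($M_1$)
A D-dimensional random variable $(x_1, \ldots, x_D)$ is said to follow G-MN$(y, \theta_1, \ldots, \theta_D, \sigma^2)$ if
$$[x_1, \ldots, x_{D-1}] \sim N(\mu, \sigma^2 \Sigma)$$
and
$$x_D = y - \sum_{i=1}^{D-1} x_i$$
where
$$\Sigma = \begin{pmatrix} \theta_1(1-\theta_1) & -\theta_1\theta_2 & \cdots & -\theta_1\theta_{D-1} \\ -\theta_2\theta_1 & \theta_2(1-\theta_2) & \cdots & -\theta_2\theta_{D-1} \\ \vdots & \vdots & \ddots & \vdots \\ -\theta_{D-1}\theta_1 & -\theta_{D-1}\theta_2 & \cdots & \theta_{D-1}(1-\theta_{D-1}) \end{pmatrix}$$
and, consistent with the partial-sum result below, $\mu = y\,(\theta_1, \ldots, \theta_{D-1})'$.
This distribution can be viewed as a continuous version of the multinomial distribution.
It can be shown in this distribution that
$$\sum_{i=1}^{k} x_i \sim N\left(y\gamma_k,\ \sigma^2 \gamma_k (1 - \gamma_k)\right)$$
where $\gamma_k = \sum_{i=1}^{k} \theta_i$.
$$\hat{\gamma}_k = \frac{\sum_{t=1}^{T} y_t \left( \sum_{i=1}^{k} x_{ti} \right)}{\sum_{t=1}^{T} y_t^2}$$
$M_2$: $[y_t \mid \mathbf{y}_{t-1}] \sim N(\mu_t, \sigma_y^2)$
and
$$\mu_t = f(\mathbf{y}_{t-1}, \mathbf{z}_t, \beta)$$
$$\mu_{t,k} = \frac{1}{\gamma_k + c(1-\gamma_k)} \sum_{i=1}^{k} x_{ti} + \frac{c}{\gamma_k + c(1-\gamma_k)} \sum_{i=k+1}^{D} \theta_i \mu_t$$
where $c = \sigma^2 / \sigma_y^2$.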
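One way to read the continuous-case prediction is as the conditional mean of the yearly total given the first k seasonal values. If the partial sum S_k is N(γ_k y_t, σ²γ_k(1-γ_k)) given y_t, and y_t is N(μ_t, σ_y²), the usual normal conditioning formula gives E[y_t | S_k] = (S_k + c(1-γ_k)μ_t) / (γ_k + c(1-γ_k)) with c = σ²/σ_y². The sketch below implements that expression; the function name and numerical inputs are hypothetical, and treating the prediction as this conditional mean is an interpretation rather than something stated on the slide.

```python
def gaussian_ssp_forecast(partial_sum, gamma_k, mu_t, c):
    """E[y_t | first k seasons] under the Gaussian SSP model, c = sigma^2 / sigma_y^2.

    Equivalent to partial_sum / denom + c * (1 - gamma_k) * mu_t / denom,
    where (1 - gamma_k) is the sum of theta_i over the unobserved seasons.
    """
    denom = gamma_k + c * (1.0 - gamma_k)
    return (partial_sum + c * (1.0 - gamma_k) * mu_t) / denom

# c = 0: the seasonal split is exact, so the forecast is the scaled-up partial sum
print(gaussian_ssp_forecast(30.0, 0.25, 100.0, 0.0))    # 120.0
# large c: within-year noise dominates, so the forecast falls back to mu_t
print(round(gaussian_ssp_forecast(30.0, 0.25, 100.0, 1e9), 4))
# partial sum exactly on the expected path: forecast equals mu_t for any c
print(round(gaussian_ssp_forecast(25.0, 0.25, 100.0, 2.0), 4))
```

The ratio c therefore acts as a shrinkage parameter between the pure ratio forecast S_k/γ_k and the yearly trend model's mean μ_t.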
Figures 4, 5, and 6 of Chen and Fomby, JBES, 1999 were displayed here.
Figure 4. Prediction Using the CPAR Model. The straight lines are the true yearly totals. The x's are the predictions using a CPAR(2) model and the o's are predictions using a CPAR(1)X model.
Figure 5. Prediction Using Gaussian AR Model and ARIMA Model. The straight lines are the true yearly totals. The x's are predictions using an SSP model with AR(1) for the annual series and the o's are predictions using an ARIMA model.
Figure 6. Mean Squared Error (a) and Mean Absolute Error (b) of Different Prediction Procedures for Out-of-Sample Period 1985-1995. (Lines shown: ARIMA, Gaussian AR(1), CPAR(1)X.)
ANOTHER APPROACH TO
STABLE SEASONAL PATTERN
MODELING:
COMPOSITIONAL ANALYSIS
• ANALYZE SEASON PROPORTIONS USING COMPOSITIONAL ANALYSIS
• COMPOSITIONAL ANALYSIS REPRESENTS OLD AND ESTABLISHED STRANDS OF THE FOLLOWING LITERATURES (AMONG MANY OTHERS THAT I AM NOT AWARE OF):
  A) ECONOMICS
  B) MARKETING
  C) GEOLOGY
  D) BIOLOGY
  E) ECOLOGY
  F) STATISTICS
ECONOMICS
REFERENCES:
• Bewley, Ronald, Allocation Models: Specification, Estimation, and Applications (Ballinger Publishing Co.: Cambridge, Mass.), 1986.
• Pollak, Robert and Wales, Terence J., Demand System Specification and Estimation (Oxford University Press: Oxford, UK), 1992.
AREAS:
• Consumer demand models are subject to a budget constraint, and total household expenditure is allocated among a number of goods and services. See Brown and Deaton (1972), Barten (1977), and Deaton (1986), for example.
• Asset demand models are subject to a balance sheet constraint, and wealth is distributed among a variety of assets and liabilities. See Parkin (1970) and White (1975), for example.
• International trade models are (implicitly) constrained by the fact that total world exports equal total world imports. See Theil (1980) for a discussion of the system-wide approach to international economics.
• Regional migration models utilize the fact that within a closed system, total emigration equals total immigration. Greenwood and Hunt (1984) discuss the role of identities in spatial models.
MARKETING
REFERENCES:
• Ghosh, A., Neslin, S. and Shoemaker, R. (1984), "A Comparison of Market Share Models and Estimation Procedures," Journal of Marketing Research, 21, 202-210.
• Cooper, L.G. and Nakanishi, M. (1988), Market-Share Analysis, (Boston: Kluwer Academic Publishers).
AREA:
Market share models are subject to the constraint that all shares sum to one. Interest lies in determining the effects of market variables and strategic business decisions on the market shares of companies in particular markets.
STATISTICS - THEORETICAL ANALYSIS
REFERENCE:
• Aitchison, J. (1986), The Statistical Analysis of Compositional Data, (Chapman and Hall: New York).
Aitchison's book discusses the theoretical aspects of the statistical analysis of compositional data. It defines and studies the concepts of Bases, Compositions, Subcompositions, Amalgamations, and Partitions. The book's emphasis is on the logratio analysis of compositions. Aitchison covers examples in geology, biology, and ecology, among other fields of application. See his book for references.
Compositional Data Analysis
Applied to SSP Modeling
Rong Chen
Airong Cai
Department of Information and Decision Sciences
University of Illinois at Chicago
Chicago, Illinois
Tom Fomby
Department of Economics
Southern Methodist University
Dallas, Texas
Compositional Forecasting
$x_{tk}$ are integer-valued seasonal time series.
Assume the seasonal pattern is stable over time.
1. One-Step Ahead Forecasting (Model 1)
Consider the following two-part decomposition: $(y_{ti}, x_{ti})$, where
$$y_{ti} = \sum_{j=i+1}^{D} x_{(t-1),j} + \sum_{j=1}^{i-1} x_{tj} .$$
Define $r_{ti} = \log(X_{ti} / Y_{ti})$. Using the marginal normal distribution tests of Aitchison (1986), we could test the assumption $r_{ti} \sim N(\mu_i, \sigma_i^2)$ with estimates $\hat{\mu}_i$, $\hat{\sigma}_i$. Once this hypothesis is accepted, $X_{ti}$ can easily be forecast given $Y_{ti}$. Once $y_{ti}$ is given:
$$r_{ti} \sim N(\mu_i, \sigma_i^2) \ \Rightarrow\ X_{ti}/y_{ti} \sim \Lambda\left( e^{\mu_i + \sigma_i^2/2},\ e^{2\mu_i + \sigma_i^2}\left(e^{\sigma_i^2} - 1\right) \right) \ \Rightarrow$$
$$E(X_{ti}/y_{ti}) = e^{\mu_i + \sigma_i^2/2} \ \Rightarrow\ \hat{x}_{ti} = y_{ti}\, e^{\hat{\mu}_i + \hat{\sigma}_i^2/2}, \quad t = 2, \ldots, n,\ i = 1, \ldots, D$$
2. To Make Multiple Forecasts, say $x_{ti}, x_{t,i+1}, \ldots, x_{tk}$ (Model 2), construct the following composition:
$$(y_{ti}, x_{ti}, \ldots, x_{tk})$$
where
$$y_{ti} = \sum_{j=k+1}^{D} x_{(t-1),j} + \sum_{j=1}^{i-1} x_{tj}$$
and proceed as in Model 1.
3. To Forecast the end-of-period total $y_t$ after observing $(y_{ti}, x_{ti}, \ldots, x_{tk})$ (Model 3), construct the following composition:
$$(y_{tk},\ y_t - y_{tk})$$
where
$$y_{tk} = \sum_{j=1}^{k} x_{tj} \quad \text{and} \quad y_t = \sum_{j=1}^{D} x_{tj}$$
4. To forecast the end-of-period total $y_t$ with the help of a yearly total (trend) model, construct a two-component model $(y_{tk},\ y_t - y_{tk})$, where
$$y_{tk} = \sum_{j=1}^{k} x_{tj} \quad \text{and} \quad y_t = \sum_{j=1}^{D} x_{tj} .$$
In this model we assume
$$r_{tk} = \log\left( \frac{y_t - y_{tk}}{y_{tk}} \right) \sim N(\mu_k, \sigma_k^2)$$
and
$y_t = f(y_{t-1}) + e_t$ for some model f. Here we also assume $e_t$ is normally distributed with mean 0.
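When $y_{tk}$ and $y_{t-1}$ are given, with estimates $\hat{\mu}_k$, $\hat{\sigma}_k^2$, $\hat{\sigma}_y^2$, we get:
$$\begin{aligned} P(y_t \mid y_{tk}, y_{t-1}) &\propto p(y_{tk} \mid y_t, y_{t-1})\, p(y_t \mid y_{t-1})\, I(y_t > y_{tk}) \\ &\propto p(y_{tk} \mid y_t)\, p(y_t \mid y_{t-1})\, I(y_t > y_{tk}) \\ &\propto \exp\left( -\frac{1}{2\sigma_k^2} \left[ \log\left( \frac{y_t}{y_{tk}} - 1 \right) - \mu_k \right]^2 \right) \exp\left( -\frac{1}{2\sigma_y^2} \left( y_t - f(y_{t-1}) \right)^2 \right) I(y_t > y_{tk}) \\ &\propto p_1(y_t \mid y_{tk})\, p_2(y_t \mid y_{t-1})\, I(y_t > y_{tk}) \end{aligned}$$
Using Monte Carlo methods, the mean and variance of $y_t$ can easily be estimated. Suppose that at each time we simulate m samples $\hat{y}_{tk,i} > y_{tk}$, $i = 1, \ldots, m$, from $p_1(\hat{y}_{tk,i} \mid y_{tk})$ given $y_{tk}$. Then, given $y_{t-1}$ at the same time,
$$\begin{aligned} E(h(y_t) \mid y_{tk}, y_{t-1}) &= \int h(y_t)\, p(y_t \mid y_{tk}, y_{t-1})\, I(y_t > y_{tk})\, dy_t \\ &= \frac{\int h(y_t)\, p_1(y_t \mid y_{tk})\, p_2(y_t \mid y_{t-1})\, I(y_t > y_{tk})\, dy_t}{\int p_1(y_t \mid y_{tk})\, p_2(y_t \mid y_{t-1})\, I(y_t > y_{tk})\, dy_t} \\ &\approx \frac{\sum_{i=1}^{m} h(\hat{y}_{tk,i})\, p_2(\hat{y}_{tk,i} \mid y_{t-1})}{\sum_{i=1}^{m} p_2(\hat{y}_{tk,i} \mid y_{t-1})} = \frac{\sum_{i=1}^{m} h(\hat{y}_{tk,i})\, w_{tk,i}}{\sum_{i=1}^{m} w_{tk,i}} \end{aligned}$$
The random samples $\hat{y}_{tk,i}$ could be generated using the formula
$$\hat{y}_{tk,i} = \left\{ \exp(\hat{\mu}_k + \hat{\sigma}_k \varepsilon_{tk,i}) + 1 \right\} y_{tk}, \quad t = 1, \ldots, n,\ k = 1, \ldots, d,\ i = 1, \ldots, m$$
where $\varepsilon_{tk,i} \sim N(0, 1)$.
In this way, functions of $y_t$, including the mean and variance, can easily be estimated.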
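The Monte Carlo step can be sketched as a self-normalized importance sampler: draw yearly totals from the compositional part of the model, then weight each draw by the normal density implied by the yearly-total model. Everything numeric below (the partial sum, the log-ratio mean and standard deviation, the trend-model prediction f(y_{t-1}) and its standard deviation) is a hypothetical input, and the function name is illustrative only.

```python
import math
import random

def model4_forecast(y_tk, mu_k, sigma_k, f_prev, sigma_y, m=10000, seed=0):
    """Importance-sampling estimate of the mean and variance of y_t.

    Draws y = (exp(mu_k + sigma_k * eps) + 1) * y_tk with eps ~ N(0, 1),
    so every draw exceeds y_tk, then weights each draw by the normal
    density of y around f_prev = f(y_{t-1}).
    """
    rng = random.Random(seed)
    draws, weights = [], []
    for _ in range(m):
        y = (math.exp(mu_k + sigma_k * rng.gauss(0.0, 1.0)) + 1.0) * y_tk
        # the normalizing constant of the normal density cancels in the ratio
        w = math.exp(-0.5 * ((y - f_prev) / sigma_y) ** 2)
        draws.append(y)
        weights.append(w)
    total = sum(weights)
    mean = sum(w * y for w, y in zip(weights, draws)) / total
    var = sum(w * (y - mean) ** 2 for w, y in zip(weights, draws)) / total
    return mean, var

# Hypothetical inputs: 40 observed so far, remainder about 1.5 times that,
# and a yearly-total model that predicts 105 with standard deviation 10
mean, var = model4_forecast(40.0, math.log(1.5), 0.1, 105.0, 10.0, m=5000)
print(round(mean, 1), round(var, 1))
```

The estimate lands between the compositional forecast (about 100 here) and the trend-model prediction (105), weighted by their relative precision. If the two sources disagree strongly, most weights become tiny and a larger m is needed for a stable estimate.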
HAWAIIAN TOURISM DATA
Figure 1. Monthly Hawaiian Tourist Series (Monthly, West-bound monthly and East-bound monthly)
Figure 2. Yearly Hawaiian Tourist Series (Total, West-bound total and East-bound total)
Figure 1 shows the monthly observations from January 1971 through December 1995. Figure 2 shows the aggregated yearly totals for the same period.
We apply the above compositional models to the Hawaiian tourism series and focus on out-of-sample prediction performance. For this purpose, we use the data from 1971 to 1985 for initial parameter estimation. Starting in January 1986, when each monthly observation is added to the series, we re-estimate the parameters and obtain the corresponding predictions, monthly or yearly total. These predictions are then compared with the true monthly or yearly total data.
Log-normality test of $R_i$
The computed marginal test statistics are presented in the following tables:

Table 3: Marginal test statistics

Marginal i          1       2       3       4       5       6
Cramer-von Mises    0.0900  0.0792  0.1036  0.0814  0.0883  0.0695
Watson              0.0752  0.0685  0.0905  0.0735  0.0829  0.0647

Marginal i          7       8       9       10      11      12
Cramer-von Mises    0.0573  0.0718  0.0576  0.0901  0.0862  0.1093
Watson              0.0528  0.0694  0.0530  0.0867  0.0800  0.1016

Table 4: Cramer-von Mises Bivariate test statistics

    2       3       4       5       6       7       8       9       10      11
1   0.0287  0.1641  0.0883  0.0667  0.0516  0.0327  0.0713  0.0993  0.0821  0.0432
2           0.1218  0.03    0.0242  0.037   0.0397  0.0031  0.032   0.0318  0.0538
3                   0.008   0.0276  0.066   0.0307  0.0147  0.0492  0.0463  0.0221
4                           0.0187  0.0226  0.0346  0.0339  0.0445  0.0456  0.0721
5                                   0.0694  0.0161  0.033   0.0534  0.0587  0.0264
6                                           0.0462  0.0353  0.0494  0.0557  0.0652
7                                                   0.0473  0.0498  0.1378  0.0252
8                                                           0.0356  0.0274  0.0222
9                                                                   0.0893  0.0364
10                                                                          0.0531

Table 5: Watson Bivariate test statistics

    2       3       4       5       6       7       8       9       10      11
1   0.04    0.0407  0.042   0.0629  0.0438  0.0438  0.0655  0.0707  0.0539  0.0488
2           0.0271  0.0227  0.0317  0.0481  0.0453  0.0139  0.0359  0.0345  0.0635
3                   0.016   0.0355  0.0432  0.0317  0.017   0.0321  0.021   0.0331
4                           0.0289  0.0332  0.042   0.0301  0.0393  0.0334  0.059
5                                   0.0256  0.0275  0.0176  0.0382  0.0263  0.0329
6                                           0.0534  0.044   0.0514  0.0336  0.0513
7                                                   0.0332  0.058   0.0274  0.0364
8                                                           0.036   0.0215  0.0322
9                                                                   0.0245  0.0302
10                                                                          0.0299

Table 6: Radius test statistics

Cramer-von Mises    0.0536
Watson              0.0638

Comparison of these computed values with the corresponding critical values in Table 2 shows no significant departure from log-normality at the 5% level by any of these tests. Thus the composition w is taken to follow an additive logistic normal distribution, and it is reasonable to apply compositional data analysis.
Composition data structure analysis
Let $r_{ti} = \log(X_{ti} / X_{t1})$.
1. Seasonal effect test
A test for equality of variances gives an insignificant p-value of 0.208, so we do not reject the null hypothesis of equal variances and apply the ANOVA. The F-statistic is 22.06 and the reported p-value is 0, which shows that there are seasonal effects in the Hawaiian composition data. The boxplot of $r_{ti}$ in Figure 3 also shows the seasonal effect.
Figure 3. Boxplot of $r_{ti}$
2. Stability test
Since there are only 25 years of Hawaiian data, we divide them into just 2 groups, with 12 years of data in the first group and 13 years in the second. Applying a MANOVA test gives a p-value of 0.20156, which is not significant, so the mean vectors µ of the two groups can be considered equal at the 5% level.
3. Forecasting and Comparison
Table 7: Marginal test of Model 1

Marginal i          1       2       3       4       5       6
Cramer-von Mises    0.046   0.0894  0.2363  0.0382  0.0416  0.1054
Watson              0.043   0.0805  0.2176  0.0375  0.0403  0.0866

Marginal i          7       8       9       10      11      12
Cramer-von Mises    0.0748  0.0309  0.0381  0.0225  0.0266  0.0977
Watson              0.0738  0.0285  0.0378  0.0223  0.024   0.0875

Table 8: Marginal test of Model 3 and Model 4

Marginal k          1       2       3       4       5       6
Cramer-von Mises    0.088   0.1482  0.2146  0.1921  0.1413  0.1818
Watson              0.08    0.1302  0.1898  0.1748  0.121   0.1492

Marginal k          7       8       9       10      11
Cramer-von Mises    0.1078  0.0394  0.1032  0.036   0.0674
Watson              0.0903  0.0364  0.0799  0.0357  0.0659

For Models 1, 3 and 4 we calculated the marginal modified statistics. In Table 7, all statistics are insignificant except for i = 3, i.e. $r_{t3}$ fails to follow a normal distribution even at the 1% level. For Models 3 and 4, in Table 8, taking the two statistics together, the values for k = 3, 4 are significant at the 1% level, but most of the others are insignificant at the 1% level. Hence, in spite of the significant values, we still accept the null hypothesis of normality.
Then we applied the models for forecasting. For Model 1, we did 1-step ahead forecasting. For comparison, we also applied a seasonal ARIMA model for 1-step ahead forecasting. Using standard Box and Jenkins techniques and data from 1971 to 1984, we found the following model to be reasonable for the monthly Hawaiian tourist series:
$$(1 - \phi B)(1 - B^{12}) \log(X_{ti}) = (1 - \theta_1 B)(1 - \theta_{12} B^{12})\, e_{ti} .$$
Figure 4. 1-step ahead forecast of monthly data 1986-1995: Truedata Model1 ARIMA,−, ; ;∗,
Figure 5. MSE of forecast of monthly data 1986-1995: Model1 ARIMA, ;∗,
Figure 4 shows model 1’s forecasts have bigger variability than ARIMA’s for mostmonths and model 1 is more reluctant to change the trend of forecast than ARIMA. Figure 5 gives a clear picture of ARIMA’s better forecasts than model 1’s, esp. for month2 and 3. For the last several months, model 1 performs as well as ARIMA.
For model 3, we also use ARIMA model to forecast yearly total for comparison. At thek th month, we use the estimated seasonal ARIMA model to obtain 1-step to (12 )k− -step-ahead predictions for months 1k + to 12. The predictions are then transformed backto the original scale, with bias adjustments.
Figure 6. Forecasts of yearly data, 1986-1995. Series plotted: true data, Model 3, ARIMA.
Figure 7. MSE of forecasts of yearly data, 1986-1995. Series plotted: Model 3, ARIMA.
Figure 6 presents the predictions of model 3 against those of ARIMA, and figure 7 presents the MSE comparison of forecasts for model 3 and ARIMA. The two plots demonstrate that model 3 performs a little better than ARIMA, especially at months 3 and 4. For the last several months, the yearly-total forecasts from model 3 are again better than those from the ARIMA model. Moreover, note that the forecasting procedure for model 3 is much simpler and more straightforward than that of ARIMA.
For model 4, we generated m=10000 samples, and we assume an AR(1) model for $f(z_{t-1})$, i.e., $z_t = c + \phi z_{t-1} + e_t$. Similarly, two plots are presented to compare model 4 and ARIMA.
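The simulation step for the AR(1) component can be illustrated with a short sketch. It is not the authors' sampler: it only draws m values of $z_t$ given $z_{t-1}$ from $z_t = c + \phi z_{t-1} + e_t$, assuming Gaussian errors with a known standard deviation. The error distribution, the parameter values, and the function name are all assumptions made for illustration.

```python
import random


def simulate_ar1_next(z_prev, c, phi, sigma, m=10000, seed=0):
    """Draw m samples of z_t = c + phi * z_prev + e_t.

    ASSUMPTION: e_t ~ N(0, sigma^2); the source does not state the
    error distribution. A fixed seed keeps the draws reproducible.
    """
    rng = random.Random(seed)
    return [c + phi * z_prev + rng.gauss(0.0, sigma) for _ in range(m)]


# Hypothetical parameter values, for illustration only.
draws = simulate_ar1_next(z_prev=1.0, c=0.5, phi=0.8, sigma=0.1)
point_forecast = sum(draws) / len(draws)  # Monte Carlo mean of z_t
```

The Monte Carlo mean approximates $c + \phi z_{t-1}$, and the spread of the draws gives a simulated predictive distribution for $z_t$.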
Figure 8. Forecasts of yearly data, 1986-1995. Series plotted: true data, Model 4, ARIMA.
Figure 9. MSE of forecasts of yearly data, 1986-1995. Series plotted: Model 4, ARIMA.
Evidently, model 4 performs even better relative to ARIMA. In most months the forecasts from model 4 are better, especially at the beginning of the year.
Figure 10. MSE of forecasts of yearly data, 1986-1995. Series plotted: Model 4, Model 3, ARIMA.
Figure 11. MAE of forecasts of yearly data, 1986-1995. Series plotted: Model 4, Model 3, ARIMA.
To compare models 3 and 4 with ARIMA, the MSE and MAE plots are presented. Although the forecasts from models 3 and 4 converge as the end of the year approaches, model 4 clearly forecasts better than model 3, as can easily be seen from figure 10 and figure 11. The overall performance of both model 3 and model 4 is better than that of ARIMA.
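The month-by-month MSE and MAE comparisons plotted in the figures can be computed as in the sketch below. The data layout (one row per year, one column per forecast month) and the function name are assumptions made for illustration; the numbers in the usage example are made up and are not the Hawaiian tourism data.

```python
def mse_mae_by_month(actual, forecast):
    """Return (mse, mae) lists with one entry per month.

    actual, forecast : lists of rows, one row per year; column j holds the
    value for forecast month j. Errors are averaged across years.
    """
    n_years = len(actual)
    n_months = len(actual[0])
    mse, mae = [], []
    for j in range(n_months):
        errors = [forecast[y][j] - actual[y][j] for y in range(n_years)]
        mse.append(sum(e * e for e in errors) / n_years)
        mae.append(sum(abs(e) for e in errors) / n_years)
    return mse, mae


# Made-up example: 2 years, 3 forecast months.
actual = [[10, 20, 30], [12, 18, 33]]
forecast = [[11, 20, 28], [10, 19, 33]]
mse, mae = mse_mae_by_month(actual, forecast)
# mse == [2.5, 0.5, 2.0], mae == [1.5, 0.5, 1.0]
```

Running the same function on each model's forecasts gives the curves that are compared across models.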
CONCLUSIONS
• From the perspective of the ease of communication of statistical results between the time series analyst and the lay-user, it would seem that the idea of a SSP approach using seasonal proportions makes a lot of sense. Proportions are easy for the layman to understand. Seasonal unit roots, unobserved components, or semi-parametric decompositions are not!
• In terms of choosing between the various SSP models for forecasting purposes, it seems that two considerations loom large: (1) the amount of data available and (2) the sophistication of the analyst and the amount of time the analyst has to completely analyze and fit a complicated model before a deadline passes.
Consider the following two-by-two decision matrix:
DECISION MATRIX

                 Small Sample Size           Large Sample Size
Complex Model    Bayesian Model Using        Bayesian Model w/ uninformative prior
                 Informative Prior           or CPAR(p)X or G-MN model
Simple Model     Ratio Estimator             CPAR(p) model or Ratio Estimator
Question of Efficiency Gain?
• Models (like the Ratio Estimator and the Bayesian models examined so far) that do not have an explicit model of trend which takes into account possible autocorrelation in year-end totals and/or allow for exogenous variables that potentially exhibit strong signal-to-noise ratios will likely be relatively inefficient forecasters in large sample sizes.
• More work needs to be done on testing the null hypothesis of Stable Seasonality. If Stable Seasonality is rejected, how does one go about robustifying the SSP models to, say, "slowly-evolving" seasonal patterns?
• With respect to comparing the forecasting desirability of the various models of seasonality, it would seem the time is appropriate for a Makridakis et al. (1982) type of forecasting "bake off" competition so that we might come to know more extensively the strengths and weaknesses of the competing models. There are several classic textbook data sets that would lend themselves to such a "bake off."
Reference: Makridakis, S., A. Andersen, R. Carbone, R. Fildes, M. Hibon, R. Lewandowski, J. Newton, E. Parzen, and R. Winkler (1982), "The Accuracy of Extrapolation (Time Series) Methods: Results of a Forecasting Competition," Journal of Forecasting, 1, 111-153. (Seven experts in each of 24 methods forecasted up to 1001 series for six up to eighteen time horizons and comparisons of forecasting accuracy measures across techniques were made.)
• In the spirit of combination forecasting, it is quite likely that weighted combination forecasts based on two or more of the best seasonal forecasting models will do substantially better in forecasting than the best single seasonal model in a majority of cases. For the best of all forecasting worlds, we need to take on the data-mining mentality of breaking our data into training, validation, and test parts to come up with a short list of viable seasonal forecasting methods that are combined using the validation data and then judged unconditionally (in terms of "true" forecasting accuracy) over the test data set. In the parlance of the data-miner, "committee rules (i.e. combination methods) invariably rule." DEMOCRACY IS A GOOD THING.
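The validation-then-test workflow in the last bullet can be sketched as follows. The weighting scheme is an assumption: the sketch uses inverse-MSE weights estimated on the validation data, one simple choice among many, since the source does not name a combination method. All function names and numbers are hypothetical.

```python
def mse(actual, forecast):
    """Mean squared error of one forecast series."""
    return sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual)


def inverse_mse_weights(actual_val, forecasts_val):
    """Weights proportional to 1/MSE on the validation data (ASSUMED scheme).

    A model with zero validation MSE would need special handling.
    """
    inverse = [1.0 / mse(actual_val, f) for f in forecasts_val]
    total = sum(inverse)
    return [w / total for w in inverse]


def combine(forecasts, weights):
    """Weighted combination of several forecast series, period by period."""
    horizon = len(forecasts[0])
    return [
        sum(w * f[t] for w, f in zip(weights, forecasts))
        for t in range(horizon)
    ]


# Made-up numbers: two models, weights fitted on validation, judged on test.
actual_val = [10.0, 12.0, 14.0]
model_a_val = [11.0, 13.0, 15.0]   # validation MSE 1
model_b_val = [8.0, 10.0, 12.0]    # validation MSE 4
weights = inverse_mse_weights(actual_val, [model_a_val, model_b_val])

actual_test = [20.0, 22.0]
model_a_test = [21.0, 23.0]
model_b_test = [18.0, 20.0]
combined_test = combine([model_a_test, model_b_test], weights)
test_mse = mse(actual_test, combined_test)
```

In this made-up case the two models err in opposite directions, so the combination has a lower test MSE than either model alone; with real data the gain depends on how correlated the models' errors are.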