Post on 07-Jul-2018
8/19/2019 SAS Introduction to Time Series Forecasting-libre
Quick Review: How to Use SAS to Analyze Time Series Data
1. Get to know SAS
How to Start SAS?
If you use a computer in this laboratory, please start SAS from the Desktop or from Start > Programs.
You can use SAS in the laboratory of our university's Computer Center, or on the university server if you have permission.
You can obtain a temporary SAS license by contacting our computer assistant.
Five main windows
Program Editor: edit SAS programs.
Log: records the messages produced while a SAS session runs; very helpful for debugging programs.
Output: displays the output of SAS procedures.
Explorer: manages SAS data sets and creates new libraries.
Results: shows a tree-like summary of the contents of the Output window.
Several important shortcuts
Open a new Program Editor window
Open a previously saved SAS program
Save your program as an external file
Create a new library
Open the Explorer window to manage SAS data sets
Submit the whole program, or only the selected lines, to the SAS System
2. How to use SAS
Two important concepts
SAS library: a folder in which SAS data sets are stored. You can create a new library with the LIBNAME statement or with the shortcut.
SAS data set: either temporary or permanent.
Structure of SAS program
DATA step: creates or modifies SAS data sets, for example by converting raw data into a SAS data set that the SAS System can read and that PROC steps can analyze.
=====================================
DATA dataset name;
INPUT variable;
CARDS;
(data lines)
;
=====================================
The data set name may contain letters (a, b, ...), digits (0, 1, ...) and underscores (_), and must begin with a letter or an underscore. Releases before SAS Version 7 limited names to 8 characters; current releases allow up to 32.
PROC step: analyzes a SAS data set and outputs the results of the analysis.
=====================================
PROC procedure name DATA= dataset name;
RUN;
=====================================
The procedure name is the name of a SAS procedure, such as PRINT, PLOT, GPLOT or INSIGHT.
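As a minimal sketch of the two-step structure above, the program below reads three made-up observations with a DATA step and lists them with PROC PRINT. The data set name sales and its variables month and amount are hypothetical.

```sas
/* DATA step: turn in-stream raw data into a temporary SAS data set */
data sales;                /* hypothetical data set name */
   input month $ amount;   /* $ marks month as a character variable */
   cards;
Jan 120
Feb 135
Mar 128
;
run;

/* PROC step: analyze (here, simply list) the data set */
proc print data=sales;
run;
```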
3. Change raw data into a SAS data set
Create a new library
Using the shortcut: in the New Library dialog, enter the library name (e.g. Lib1) and its physical path (e.g. D:\example).
Using a SAS program:
libname lib1 'D:\example';
SAS data set name
library_name.dataset_name
For example, lib1.blood means that the data set blood is saved in the library lib1.
The library_name can be one of the built-in libraries sashelp, sasuser, maps and work, or a library you created, such as lib1. You choose the dataset_name, for example blood.
When library_name is work, the data set work.dataset_name is a temporary SAS data set, which is deleted automatically when you exit SAS. In this case the prefix work. can be omitted; for example, blood and work.blood name the same data set.
Three methods to deal with data through the DATA step
Method 1: the raw data set is small.
DATA dataset name;
INPUT variable;
CARDS;
(data lines)
;
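The sketch below assumes that the folder D:\example already exists. It creates the library lib1 with LIBNAME, stores a permanent data set lib1.blood, and copies it into the temporary data set blood (the same as work.blood). The variables id and weight are hypothetical.

```sas
/* Assumption: the folder D:\example already exists */
libname lib1 'D:\example';

/* Permanent data set: stored in D:\example, kept after SAS exits */
data lib1.blood;
   input id weight;        /* hypothetical variables */
   cards;
1 62.5
2 70.1
3 58.3
;
run;

/* Temporary data set: stored in WORK, deleted when SAS exits */
data blood;                /* same as: data work.blood; */
   set lib1.blood;
run;

proc print data=work.blood;
run;
```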
OUTLIER options;
FORECAST options;
RUN;
QUIT;
BY
A BY statement can be used in the ARIMA procedure to process a data set in groups of
observations defined by the BY variables. Note that all IDENTIFY, ESTIMATE, and FORECAST
statements specified are applied to all BY groups.
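Here is a minimal sketch of BY-group processing. It assumes a hypothetical data set panel with a grouping variable region, a time index t and a series z. The data must first be sorted by the BY variable. Because every statement applies to every group, each region is fitted with the same model orders.

```sas
/* Hypothetical input: data set panel with variables region, t and z */
proc sort data=panel;
   by region t;            /* sort by the BY variable, and by time within it */
run;

proc arima data=panel;
   by region;              /* the statements below are applied to each region */
   identify var=z nlag=12;
   estimate p=1 method=ml;
   forecast lead=6 id=t out=panel_fc;   /* panel_fc is a hypothetical name */
run;
quit;
```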
IDENTIFY
ALPHA= significance-level: The ALPHA= option specifies the significance level for tests in the
IDENTIFY statement. The default is 0.05.
ESACF: computes the extended sample autocorrelation function and uses these estimates to tentatively identify the autoregressive and moving average orders of mixed models. The ESACF option generates two tables. The first table displays the extended sample autocorrelation estimates, and the second table displays probability values that can be used to test the significance of these estimates. The P=(pmin:pmax) and Q=(qmin:qmax) options determine the size of the table.
NLAG= number: indicates the number of lags to consider in computing the autocorrelations and
cross-correlations.
STATIONARITY=(ADF=AR orders DLAG=s) or STATIONARITY=(DICKEY=AR orders DLAG=s): performs augmented Dickey-Fuller tests. If the DLAG=s option is specified with s greater than one, seasonal Dickey-Fuller tests are performed. The maximum allowable value of s is 12; the default value is one.
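For reference, the augmented Dickey-Fuller regressions behind these tests can be written in standard notation as below. This is a sketch, not copied from the SAS documentation. It assumes that each AR order listed in ADF= or DICKEY= corresponds to the number k of lagged differences. The three versions correspond to the zero-mean, single-mean and trend cases that PROC ARIMA reports. Here the first difference is defined as Z_t - Z_{t-1}.

```latex
\begin{aligned}
\text{Zero mean:}   &\quad \nabla Z_t = \delta Z_{t-1} + \sum_{j=1}^{k}\gamma_j \nabla Z_{t-j} + a_t \\
\text{Single mean:} &\quad \nabla Z_t = \mu + \delta Z_{t-1} + \sum_{j=1}^{k}\gamma_j \nabla Z_{t-j} + a_t \\
\text{Trend:}       &\quad \nabla Z_t = \mu + \beta t + \delta Z_{t-1} + \sum_{j=1}^{k}\gamma_j \nabla Z_{t-j} + a_t \\
                    &\quad H_0:\ \delta = 0 \ (\text{unit root}) \quad\text{vs.}\quad H_1:\ \delta < 0
\end{aligned}
```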
VAR= variable ( d1, d2, ..., dk ) : names the variable containing the time series to analyze. The
VAR= option is required. A list of differencing lags can be placed in parentheses after the
variable name to request that the series be differenced at these lags. For example, VAR=X(1)
takes the first differences of X. VAR=X(1,1) requests that X be differenced twice, both times with
lag 1, producing a second difference series, which is $(X_t - X_{t-1}) - (X_{t-1} - X_{t-2}) = X_t - 2X_{t-1} + X_{t-2}$.
VAR=X(2) differences X once at lag two, giving $X_t - X_{t-2}$. If differencing is specified, it is the
differenced series that is processed by any subsequent ESTIMATE statement.
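The differencing lags in VAR= correspond to factors of the backshift operator B, where B X_t = X_{t-1}. The block below expands the three examples above. It also includes VAR=X(1,12), the combination of regular and seasonal differencing applied later to the airline data.

```latex
\begin{aligned}
\text{VAR=X(1):}    &\quad (1-B)X_t = X_t - X_{t-1} \\
\text{VAR=X(1,1):}  &\quad (1-B)^2 X_t = X_t - 2X_{t-1} + X_{t-2} \\
\text{VAR=X(2):}    &\quad (1-B^2)X_t = X_t - X_{t-2} \\
\text{VAR=X(1,12):} &\quad (1-B)(1-B^{12})X_t = X_t - X_{t-1} - X_{t-12} + X_{t-13}
\end{aligned}
```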
ESTIMATE
METHOD=ML | ULS | CLS: specifies the estimation method to use. METHOD=ML specifies the
maximum likelihood method. METHOD=ULS specifies the unconditional least-squares method.
METHOD=CLS specifies the conditional least-squares method. METHOD=CLS is the default.
P= order: specifies the autoregressive part of the model. By default, no autoregressive parameters are fit. P=(l1, l2, ..., lk) defines a model with autoregressive parameters at the specified lags. P= order is equivalent to P=(1, 2, ..., order). A concatenation of parenthesized lists specifies a factored model. For example, P=(1,2,5)(6,12) specifies the autoregressive model $(1 - \phi_{1,1}B - \phi_{1,2}B^2 - \phi_{1,3}B^5)(1 - \phi_{2,1}B^6 - \phi_{2,2}B^{12})$, where $B$ is the backshift operator ($BX_t = X_{t-1}$).
Q= order: specifies the moving average part of the model. Lag lists and factored lists such as Q=(1)(12) work in the same way as for P=.
NOCONSTANT/NOINT: suppresses the fitting of a constant (or intercept) parameter in the model; that is, the mean parameter MU is omitted.
PLOT: plots the residual autocorrelation functions. The sample autocorrelation, the sample
inverse autocorrelation, and the sample partial autocorrelation functions of the model residuals
are plotted.
FORECAST
ALPHA= n: sets the size of the forecast confidence limits. The ALPHA= value must be between 0 and 1. When you specify ALPHA=α, the upper and lower confidence limits have a 1-α confidence level. The default is ALPHA=0.05, which produces 95% confidence intervals. ALPHA values are rounded to the nearest hundredth.
ID= variable: names a variable in the input data set that identifies the time periods associated
with the observations.
INTERVAL= interval or INTERVAL= n: specifies the time interval between observations, for example INTERVAL=MONTH for monthly data.
LEAD= n: specifies the number of multistep forecast values to compute.
OUT= SAS-data-set: writes the forecast (and other values) to an output data set.
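To show the options above working together, here is a sketch of one PROC ARIMA run on sashelp.air, the airline data shipped with SAS (variables DATE and AIR). The option values are arbitrary illustrations, not a recommended model, and work.airfc is a hypothetical output name.

```sas
proc arima data=sashelp.air;
   /* IDENTIFY: regular and seasonal (lag 12) differencing, 24 lags of
      correlations, an ESACF table for p=0..4 and q=0..4, and
      augmented Dickey-Fuller tests for AR orders 0, 1 and 2 */
   identify var=air(1,12) nlag=24 esacf p=(0:4) q=(0:4)
            stationarity=(adf=(0,1,2));
run;
   /* ESTIMATE: factored AR model (1 - phi1*B)(1 - phi12*B**12), ML, no constant */
   estimate p=(1)(12) method=ml noint plot;
run;
   /* FORECAST: 12 months ahead with 90% limits (ALPHA=0.10), monthly ID values */
   forecast lead=12 alpha=0.10 id=date interval=month out=work.airfc;
run;
quit;
```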
Fitting the ARIMA Model to a Simulated Time Series
0. Simulate an AR(2) time series
The model: Z(t)=0.5*Z(t-1)+0.4*Z(t-2)+a(t), where a(t) is Gaussian white noise with variance 1.
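The following quick check is an addition and not part of the original output. It confirms that this AR(2) model is stationary, which requires the roots of its AR polynomial to lie outside the unit circle.

```latex
\begin{aligned}
\phi(B) &= 1 - 0.5B - 0.4B^2 = 0
  \;\Longleftrightarrow\; 0.4B^2 + 0.5B - 1 = 0 \\
B &= \frac{-0.5 \pm \sqrt{0.25 + 1.6}}{0.8} = \frac{-0.5 \pm \sqrt{1.85}}{0.8}
  \approx 1.075 \ \text{or}\ -2.325
\end{aligned}
```

Both roots exceed 1 in absolute value, so the process is stationary. Equivalently, φ1+φ2 = 0.9 < 1, φ2-φ1 = -0.1 < 1 and |φ2| = 0.4 < 1. One root (about 1.075) lies close to the unit circle, so the series behaves much like a unit-root process. This may help explain why differenced models also appear among the candidates below.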
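The SAS program:
/* Create a new library */
libname ts 'D:/TimeSeries';
/* Simulate an AR(2) process; t = -50 to 0 is a burn-in and is not output */
data ts.ar;
   z1=0; z2=0;                  /* z1 = Z(t-1), z2 = Z(t-2) */
   do t = -50 to 200;
      a = rannor( 32565 );      /* standard normal white noise */
      z = z1*0.5 + z2*0.4 + a;
      if t > 0 then output;
      z2=z1; z1=z;              /* shift the lagged values */
   end;
   keep z t;
run;
Simulate an MA(2): Z(t)=a(t)+0.2*a(t-1)+0.5*a(t-2)
/* Simulate an MA(2) process */
data ts.ma;
   a1=0; a2=0;                  /* a1 = a(t-1), a2 = a(t-2) */
   do t = -50 to 200;
      a = rannor( 32565 );
      z = a + a1*0.2 + a2*0.5;
      if t > 0 then output;
      a2=a1; a1=a;              /* shift the lagged shocks */
   end;
   keep z t;
run;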
Simulate an ARMA(1,1): Z(t)=0.5*Z(t-1)+a(t)+0.3*a(t-1)
/* Simulate an ARMA(1,1) process */
data ts.arma;
   z1=0; a1=0;                  /* z1 = Z(t-1), a1 = a(t-1) */
   do t = -50 to 200;
      a = rannor( 32565 );
      z = z1*0.5 + a + a1*0.3;
      if t > 0 then output;
      a1=a; z1=z;
   end;
   keep z t;
run;
1. Draw the time plot
The SAS program:
/* Draw the time plot */
symbol i=join v=none;
proc gplot data=ts.ar;
   plot z*t;
run;
quit;
The result:
[Figure: Simulated AR(2) Time Series]
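Only the AR(2) series is plotted above. To view all three simulated series on one graph, the sketch below merges them by t (renaming z in each) and overlays them. The data set ts.sims and the variables z_ar, z_ma and z_arma are hypothetical names.

```sas
/* Combine the three simulated series side by side, matched on t */
data ts.sims;
   merge ts.ar  (rename=(z=z_ar))
         ts.ma  (rename=(z=z_ma))
         ts.arma(rename=(z=z_arma));
   by t;                    /* all three data sets are already in t order */
run;

/* One SYMBOL definition per plotted series */
symbol1 i=join v=none c=black;
symbol2 i=join v=none c=red;
symbol3 i=join v=none c=blue;
proc gplot data=ts.sims;
   plot z_ar*t z_ma*t z_arma*t / overlay legend;
run;
quit;

/* Clear these SYMBOL definitions so later plots are unaffected */
goptions reset=symbol;
```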
2. Identify some suitable models
The SAS program:
/* Identify some suitable models with minimum requirement */
proc arima data=ts.ar;
   identify alpha=0.05 var=z nlag=20;
run;
   /* Use ESACF to identify the orders of ARMA models */
   identify alpha=0.05 var=z nlag=20 esacf p=(0:6) q=(0:8);
run;
   /* Use Dickey-Fuller unit root tests to check the stationarity */
   identify alpha=0.05 var=z nlag=20 stationarity=(dickey=(1, 2, 4));
run;
   /* Take differencing on the data and analyze again */
   identify alpha=0.05 var=z(1) nlag=20 stationarity=(dickey=5);
run;
quit;
The summary of the output:
The detailed output without differencing:
[Figure: Series Correlation Panel. The Augmented Dickey-Fuller Unit Root Tests table is annotated to show the 3 deterministic trends (Zero Mean, Single Mean, Trend), the different values of k (lags), and the 3 different tests (Rho, Tau, F).]
The detailed output after first differencing:
[Figure: Series Correlation Panel]
We may reach three possible models:
ARIMA(3,0,0); ARIMA(0,1,1); and ARIMA(2,1,0).
3. Estimate the models
Candidate models: AR(3), ARMA(3,1) with the AR coefficient at lag 2 suppressed, and ARIMA(2,1,0) without an intercept.
The SAS program:
/* Identify some suitable models with minimum requirement */
proc arima data=ts.ar;
identify alpha=0.05 var=z nlag=20;
run;
/* Use EACF to identify the orders of ARMA models */
identify alpha=0.05 var=z nlag=20 esacf p=(0:6) q=(0:8);
run;
/* Use Dickey-Fuller unit root tests to check the stationarity */
identify alpha=0.05 var=z nlag=20 stationarity=(dickey=(1, 2, 4));
run;
/* Take differencing on the data and analyze again */
identify alpha=0.05 var=z(1) nlag=20 stationarity=(dickey=5);
run;
/* Use CLS method to estimate the AR(3) model */
identify var=z;
run;
estimate method=cls p=3 plot;
run;
/* Use ULS method to estimate the ARMA(3,1) model */
/* with the AR coefficient at lag 2 suppressed */
estimate method=uls p=(1,3) q=1 plot;
run;
/* Use ML method to estimate the ARIMA(2,1,0) model without
intercept */
identify var=z(1);
run;
estimate method=ml p=2 noint plot;
run;
quit;
The summary of the output:
The estimated AR(3) model:
The important outputs for the fitted AR(3) model, as annotated on the output: the mean, the estimated parameters, the p-values of their significance tests, the intercept, the variance of the white noise, and the standard deviation of the white noise.
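PROC ARIMA writes a model with a mean in deviation form. This is why its output reports both the mean MU and a separate constant (intercept) estimate. For the AR(3) model, the two forms are related as shown below. The Variance Estimate in the output estimates σ_a², and the Std Error Estimate estimates σ_a.

```latex
\begin{aligned}
(1 - \phi_1 B - \phi_2 B^2 - \phi_3 B^3)(Z_t - \mu) &= a_t \\
\Longleftrightarrow\quad Z_t &= c + \phi_1 Z_{t-1} + \phi_2 Z_{t-2} + \phi_3 Z_{t-3} + a_t,
\qquad c = \mu\,(1 - \phi_1 - \phi_2 - \phi_3), \\
\operatorname{Var}(a_t) &= \sigma_a^2
\end{aligned}
```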
Outputs for ARMA(3,1) with AR coefficient at lag 2 suppressed:
Outputs for ARIMA(2,1,0) without intercept:
4. Diagnostic checking for the fitted ARIMA(2,1,0)
The SAS program:
/* Diagnostic checking for the fitted ARIMA(2,1,0) */
proc arima data=ts.ar;
   identify var=z(1);
run;
   estimate method=ml p=2 noint plot;
run;
   /* LEAD=0: no future forecasts; OUT= keeps the one-step-ahead residuals */
   forecast out=ts.dc lead=0 id=t;
run;
quit;
/* Draw the time plot of the residuals */
symbol i=join v=none;
proc gplot data=ts.dc;
   plot residual*t;
run;
quit;
/* Perform the normality test */
proc univariate data=ts.dc normal plot;
   var residual;
run;
The summary of the output:
The time plot:
A normality test:
Distribution plot and Q-Q plot for normality:
Sample autocorrelation function (ACF) of the residuals and Sample partial ACF of the residuals:
Ljung-Box test:
[Output: the Autocorrelation Check of Residuals table, annotated to show the test statistic, the degrees of freedom and the p-values]
Analysis of over-parameterized models:
o The SAS program:
/* Analysis of over-parameterized models */
proc arima data=ts.ar;
   identify var=z(1) nlag=20;
run;
   estimate method=ml p=2 noint plot;
run;
   estimate method=ml p=(1,2)(6) noint plot;
run;
   estimate method=ml p=2 q=(6) noint plot;
run;
quit;
o The first over-parameterized model, based on the sample partial ACF:
o The second over-parameterized model, based on the sample ACF:
o Three fitted models:
The conclusion is that the fitted ARIMA(2,1,0) model is not adequate!
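The Ljung-Box statistic used in the residual checks is shown below in standard notation. Here n is the number of residuals, r_k is their lag-k sample autocorrelation, and m is the largest lag in a group (PROC ARIMA reports groups ending at lags 6, 12, 18, ...). The degrees of freedom subtract the number of estimated ARMA parameters p + q, so for the fitted ARIMA(2,1,0) model they are m - 2.

```latex
Q(m) = n(n+2)\sum_{k=1}^{m}\frac{r_k^2}{n-k}
\;\sim\; \chi^2_{\,m-p-q}\quad\text{approximately, if the model is adequate}
```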
5. Do forecasting with the fitted ARIMA(2,1,0) model
The SAS program:
/* Do forecasting by using the fitted ARIMA(2,1,0) model */
proc arima data=ts.ar;
   identify var=z(1) nlag=20;
run;
   estimate method=ml p=2 noint plot;
run;
   forecast out=ts.out lead=50 id=t;
run;
quit;
/* Draw the time plot: data (1), forecasts (2) and 95% limits (3) */
symbol i=join v=none;
proc gplot data=ts.out;
   plot z*t=1 forecast*t=2 l95*t=3 u95*t=3/overlay;
run;
quit;
The results:
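For each time t, the OUT= data set holds the series z together with the variables FORECAST, STD, L95, U95 and RESIDUAL. The short sketch below lists only the 50 out-of-sample forecasts. It assumes the observed series ends at t = 200, as in the simulation above.

```sas
/* List the 50 forecasts beyond the end of the simulated data (t = 200) */
proc print data=ts.out;
   where t > 200;
   var t forecast std l95 u95;
run;
```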
Fitting the Seasonal ARIMA Model to the Airline Passenger Data
0. The data
The airline passenger data records the number of passengers traveling by air each month from January 1949 to December 1960.
It is given as Series G in Box and Jenkins (1976) and has been used throughout the time series literature as a standard example of a non-stationary seasonal time series.
1. Draw the time plot
The SAS program:
/* Create a new library */
libname ts 'D:/TimeSeries';
/* Draw the time plot */
symbol i=join v=none;
proc gplot data=sashelp.air;
   plot air*date;
run;
quit;
The time plot:
Taking the log transformation and drawing the time plot again.
The SAS program:
/* Take log transformation */
data ts.lair;
   set sashelp.air;
   lair=log(air);
run;
/* Draw the time plot */
symbol i=join v=none;
proc gplot data=ts.lair;
   plot lair*date;
run;
quit;
The time plot:
2. Identify some suitable models
The SAS program:
/* Identify some suitable models */
proc arima data=ts.lair;
   identify alpha=0.05 var=lair;
run;
   /* Take differencing since the sample ACF decays slowly */
   identify alpha=0.05 var=lair(1);
run;
   /* Take seasonal differencing since the sample ACF still decays slowly,
      especially at multiples of the seasonal period 12 */
   identify alpha=0.05 var=lair(1,12);
run;
quit;
The sample ACF of original sequence:
The sample ACF of the sequence after common differencing:
The sample ACF of the sequence after both common differencing and seasonal differencing:
3. Estimate the seasonal ARIMA(0,1,1)X(0,1,1)12 model
The SAS program:
proc arima data=ts.lair;
   identify alpha=0.05 var=lair(1,12);
run;
   /* Fit the ARIMA(0,1,1)X(0,1,1)12 model to the data */
   estimate method=ml q=(1)(12) plot;
run;
quit;
The estimated model:
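Written out, the model fitted by identify var=lair(1,12) and estimate q=(1)(12) is shown below. It uses the PROC ARIMA sign convention for moving-average parameters and includes the constant μ, which is estimated because NOINT is not specified. Here L_t = log(AIR_t) and B is the backshift operator.

```latex
(1 - B)(1 - B^{12})\,L_t = \mu + (1 - \theta_1 B)(1 - \Theta_1 B^{12})\,a_t,
\qquad L_t = \log(\mathrm{AIR}_t)
```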
4. Diagnostic checking of the fitted seasonal ARIMA(0,1,1)X(0,1,1)12 model
The SAS program:
proc arima data=ts.lair;
   identify alpha=0.05 var=lair(1,12);
run;
   /* Fit the ARIMA(0,1,1)X(0,1,1)12 model to the data */
   estimate method=ml q=(1)(12) plot;
run;
   /* Diagnostic checking by overfitting the AR part */
   estimate method=ml p=(9) q=(1)(12) plot;
run;
   /* Diagnostic checking by overfitting the MA part */
   estimate method=ml q=(1)(12)(23) plot;
run;
   /* FORECAST uses the most recently estimated model, so refit the
      ARIMA(0,1,1)X(0,1,1)12 model before exporting its residuals */
   estimate method=ml q=(1)(12) noprint;
run;
   /* Export the data to do further diagnostic checking */
   forecast out=ts.out lead=0 id=date;
run;
quit;
/* Draw the time plot of the residuals */
symbol i=join v=none;
proc gplot data=ts.out;
   plot residual*date;
run;
quit;
/* Perform the normality test */
proc univariate data=ts.out normal plot;
   var residual;
run;
The sample ACF of residuals:
The sample PACF of residuals:
Ljung-Box test:
Diagnostic checking by overfitting the AR part and the MA part:
Compare the estimated coefficients, and compare the model criteria:
The time plot of the residuals:
Normality tests:
Distribution plot and Q-Q plot for normality:
5. Do forecasting with the fitted seasonal ARIMA(0,1,1)X(0,1,1)12 model
The SAS program:
/* Do forecasting with the fitted seasonal ARIMA(0,1,1)X(0,1,1)12 model */
proc arima data=ts.lair;
identify alpha=0.05 var=lair(1,12);
run;
estimate method=ml q=(1)(12) plot;
run;
forecast out=ts.out lead=24 id=date interval=month;
run;
quit;
/* Draw the time plot */
symbol i=join v=none;
proc gplot data=ts.out;
plot lair*date=1 forecast*date=2 l95*date=3 u95*date=3/overlay;
run;
quit;
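The forecasts above are on the log scale. The sketch below converts them back to passenger counts, using the variable names that PROC ARIMA writes to the OUT= data set (FORECAST, STD, L95, U95). The output data set ts.fc and the new variable names are hypothetical. Note that exp(FORECAST) estimates the median of the forecast distribution; adding STD**2/2 inside exp() gives the lognormal mean instead.

```sas
/* Back-transform the log-scale forecasts to the original scale */
data ts.fc;
   set ts.out;
   air_median = exp(forecast);              /* median forecast */
   air_mean   = exp(forecast + std**2/2);   /* lognormal mean forecast */
   air_l95    = exp(l95);                   /* exp is monotone, so limits carry over */
   air_u95    = exp(u95);
run;

/* Show the 24 months forecast beyond December 1960 */
proc print data=ts.fc;
   where date >= '01JAN1961'd;
   var date air_median air_mean air_l95 air_u95;
   format date monyy7.;
run;
```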
The result: