Posted: 23-Aug-2014
Toward Automatic Time-Series Forecasting Using Neural Networks - Weizhong Yan
Presenter: Sean Golliher
1 / 19
Relationship to Research
Currently analyzing the performance of NEAT for Time Series Forecasting (TSF)
The paper summarizes common approaches to, and issues with, using ANNs for TSF
2 / 19
Claims of the Paper
Develops an automatic TSF model using a Generalized Regression Neural Network (GRNN)
Shows promising results by winning the NN3 time-series competition against 60 different models
3 / 19
General Problems with ANN
Most approaches are ad hoc, meaning they apply some type of preprocessing to the data
Typically try different ANN architectures to see which one performs better
Nelson et al.: ANN inconsistency on TSF is the result of different preprocessing strategies
Balkin et al.: ANNs require a large number of samples to be trained; real-world examples (financial, etc.) offer only short training samples
4 / 19
RBF
An RBF network can be viewed as a local linear regression model
Apply a Gaussian kernel to the input data. All inputs go to a node of the form:
G(x) = \exp\left(-\frac{\|x - c\|^2}{\sigma^2}\right)   (1)
Find center points by assigning a center c to each point in the data set (each input is measured by its distance to the center point)
This is equivalent to doing a local regression (σ affects the smoothing of the approximation)
The output layer (the weights) is trained using least-squares regression
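A minimal NumPy sketch of this construction (illustrative, not the paper's code): a Gaussian kernel matrix over the training points, with every training point used as a center, and output weights fit by least squares on toy data.

```python
import numpy as np

def gaussian_rbf(X, centers, sigma):
    # Kernel matrix: exp(-||x - c||^2 / sigma^2) for each input/center pair
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / sigma**2)

# Toy data: a noisy sine wave; every training point serves as a center
rng = np.random.default_rng(0)
X = np.linspace(0, 2 * np.pi, 40)[:, None]
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(40)

Phi = gaussian_rbf(X, X, sigma=0.5)          # hidden-layer activations
w, *_ = np.linalg.lstsq(Phi, y, rcond=None)  # least-squares output weights
y_hat = Phi @ w                              # fitted values
```

Here σ plays exactly the smoothing role noted above: larger values blur neighbouring points together into a smoother local fit.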
5 / 19
Generalized Definition for Regression
Computation of the most probable value of Y for each value of X, based on a finite number of possibly noisy measurements of X
The conditional mean of y given X (the regression of y on X) is given by:
E[y \mid X] = \frac{\int_{-\infty}^{\infty} y\, f(X, y)\, dy}{\int_{-\infty}^{\infty} f(X, y)\, dy}   (2)
Since we don't typically know the density function f(X, y), it can be estimated using a Parzen window density estimator.
6 / 19
Generalized Definition for Regression
The generalized definition yields the following regression function:
\hat{Y}(X) = \frac{\sum_{i=1}^{n} Y_i \exp\left(-\frac{D_i^2}{2\sigma^2}\right)}{\sum_{i=1}^{n} \exp\left(-\frac{D_i^2}{2\sigma^2}\right)}   (3)

where D_i^2 = (X - X_i)^T (X - X_i)
In the case of the GRNN, X is the input data and the X_i are the centers.
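Equation (3) can be sketched directly in NumPy (an illustration; the toy data and bandwidth are made up):

```python
import numpy as np

def grnn_predict(X_train, y_train, X_query, sigma):
    """GRNN / kernel-regression estimate per Eq. (3): a Gaussian-weighted
    mean of the training targets, with D_i^2 the squared distance from the
    query to each center X_i."""
    d2 = ((X_query[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=2)
    w = np.exp(-d2 / (2 * sigma**2))
    return (w @ y_train) / w.sum(axis=1)

rng = np.random.default_rng(1)
X = rng.uniform(0, 1, size=(200, 1))  # training inputs double as the centers
y = np.sin(2 * np.pi * X[:, 0]) + 0.05 * rng.standard_normal(200)
preds = grnn_predict(X, y, np.array([[0.25], [0.75]]), sigma=0.05)
```

Note there is no weight training at all: the targets Y_i sit directly in the numerator, which is why the spread σ is the only real design choice.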
7 / 19
GRNN
G(x, x_i) are the standard radial basis functions
w_i are the weights given by the generalized regression equation
The spread factor σ dictates the performance
8 / 19
Claimed Benefits of GRNN
Easy to train
Can accurately approximate functions from sparse and noisy data
Note: a recent paper (Ahmed et al.) claims the GRNN is inferior to the MLP for TSF
9 / 19
Methodology Requirements
Minimal human intervention
Computationally efficient for a large number of series
Good forecasting over a range of data sets
10 / 19
Preprocessing: Outliers
Real-world time series have outliers
Outliers are identified by

|x_i| \ge 4 \max(|m_a|, |m_b|)   (4)

where m_a = median(x_{i-3}, x_{i-2}, x_{i-1}) and m_b = median(x_{i+1}, x_{i+2}, x_{i+3})
If x_i is an outlier, its value is replaced with the average value of the two points before and after x_i
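The outlier rule can be sketched as follows (one reading of the slide: the replacement averages the immediate neighbours; that convention is an assumption, as is applying the rule only where both medians exist):

```python
import numpy as np

def replace_outliers(x):
    """Flag x_i as an outlier when |x_i| >= 4 * max(|m_a|, |m_b|), where
    m_a and m_b are medians of the three preceding/following points.
    Replacing by the mean of the immediate neighbours is an assumed reading
    of 'average value of two points before and after x'."""
    x = np.asarray(x, dtype=float).copy()
    for i in range(3, len(x) - 3):
        ma = np.median(x[i - 3:i])      # median of x_{i-3}, x_{i-2}, x_{i-1}
        mb = np.median(x[i + 1:i + 4])  # median of x_{i+1}, x_{i+2}, x_{i+3}
        if abs(x[i]) >= 4 * max(abs(ma), abs(mb)):
            x[i] = (x[i - 1] + x[i + 1]) / 2.0
    return x
```

Using medians of the surrounding windows makes the threshold itself robust to a second nearby outlier.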
11 / 19
Preprocessing: Trends
Real-world time series have trends, which could be due to seasonality or other factors.
Common approaches are curve fitting, filtering, and differencing.
Identifying trends is difficult to do algorithmically
Proposes a detrending scheme:
Split the series into segments: 12 if monthly, 4 if quarterly
The mean of the historical observations within each segment is subtracted from every historical observation in that segment
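A sketch of that segment-mean detrending (assuming "split into 12" means 12 roughly equal segments; the exact segmentation is not specified on the slide):

```python
import numpy as np

def detrend_by_segments(x, n_segments):
    """Subtract each segment's mean from the observations in that segment
    (e.g. n_segments=12 for monthly series, 4 for quarterly)."""
    x = np.asarray(x, dtype=float)
    out = x.copy()
    for idx in np.array_split(np.arange(len(x)), n_segments):
        out[idx] -= x[idx].mean()
    return out
```

Each segment becomes zero-mean, so a slow trend is removed piecewise rather than by fitting a global curve.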
If x is an outlier the value is replaced with average value of two pointsbefore and after x
12 / 19
Preprocessing: Seasonality
Identifying seasonality is typically a manual process
The author used a simple approach and defined short series as n ≤ 60 and long as n ≥ 60
Uses autocorrelation coefficients at one and two seasonal lags to decide if a series is seasonal
Uses a standard method for subtracting out seasonality from the series data
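A sketch of the autocorrelation check (the 0.3 threshold and the "both lags clearly positive" rule are assumptions; the paper's exact test statistic is not given on the slide):

```python
import numpy as np

def autocorr(x, lag):
    # Sample autocorrelation at the given lag
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    return float((x[:-lag] * x[lag:]).sum() / (x * x).sum())

def looks_seasonal(x, period=12, threshold=0.3):
    """Call the series seasonal when the autocorrelations at one and two
    seasonal lags are both clearly positive (threshold is illustrative)."""
    return autocorr(x, period) > threshold and autocorr(x, 2 * period) > threshold
```

Requiring both lags guards against a single spurious spike in the autocorrelation function.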
13 / 19
ANN Modeling
Aspects of ANN modeling
Spread factor: typically found empirically, since no good analytic approach has been found. Some guidance was given by Haykin: σ = d_max / √(2n), where d_max is the maximum distance between the training points.
Proposes that the spread factor be set to d_50, d_75, d_95 (percentiles) of the nearest distance of all training samples to the rest of the points.
Uses three GRNNs that all take the same input and are combined to give the final output.
The choice of combining three GRNNs is based on previous success in the literature.
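The percentile-based spread candidates can be sketched as (function name and toy input are illustrative):

```python
import numpy as np

def candidate_spreads(X):
    """Distance from each training sample to its nearest neighbour, then the
    50th/75th/95th percentiles of those distances as the three candidate
    spreads (one per GRNN in the ensemble)."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    np.fill_diagonal(d2, np.inf)        # exclude each point's self-distance
    nearest = np.sqrt(d2.min(axis=1))
    return np.percentile(nearest, [50, 75, 95])
```

The three GRNNs, one per spread, would then be combined (e.g. averaged) to give the final forecast.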
14 / 19
ANN Modeling Cont’d
Input selection is considered one of the most important aspects of TSF
Two general approaches: filter and wrapper
Filtering selects features based on the data itself (independent of the learning algorithm)
Wrapper approaches use the learning algorithm itself; the wrapper typically performs better
The author uses contiguous lags and limits them to one full season for monthly (12-month) data
15 / 19
Experimental Results
Uses the NN3 time-series competition dataset, which is composed of Dataset A and Dataset B
Dataset A is 111 monthly time series drawn from empirical business time series
Dataset B is a small subset of Dataset A which consists of 11 time series
Error is measured using sMAPE
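sMAPE, in the symmetric form commonly used for the NN3 competition, can be written as (a standard formulation, not copied from the paper):

```python
import numpy as np

def smape(y_true, y_pred):
    """Symmetric mean absolute percentage error, in percent:
    mean of |y - yhat| / ((|y| + |yhat|) / 2)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(100.0 * np.mean(np.abs(y_true - y_pred)
                                 / ((np.abs(y_true) + np.abs(y_pred)) / 2)))
```

Dividing by the average of actual and forecast (rather than the actual alone) keeps the score bounded and penalizes over- and under-forecasts more symmetrically than plain MAPE.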
16 / 19
Experimental Results
B indicates a statistical model and C indicates a computational intelligence model
17 / 19
Ablation Studies
SP: Spread, MSA: Multiple Step Ahead
18 / 19
Discussion
Are TSF competitions just a demonstration of the no free lunch theorem? Why is the theorem not mentioned?
Did he prove his approach was “better” or did this approach just outperform on a particular contest?
Why doesn’t the training of the GRNN factor out outliers and seasonality on its own? Isn’t that what training is for?
Why did he choose a GRNN? Previous papers said they perform poorly.
What kind of bias does the detrending scheme introduce?
The paper was “rule of thumb” oriented. Is there a way to make an automatic approach more rigorous?
19 / 19