On the robustness of sluggish state-based neural networks for providing useful insight into the output gap
Jane Binner, Jon Tepper, & Logan Kelly
1. Motivation & Literature
2. Objectives
3. Artificial Neural Networks
4. Methodology & Constraints
5. Data
6. Results
7. Conclusions
Motivation
• Central banks need to put remedial measures in place at least 12 months ahead of any significant deviation from targets
• Models that can forecast significant volatility in key US macroeconomic data at such horizons provide policy makers with an early-warning system
• To investigate whether instability exists in an unstructured VAR using advances in econometrics
Motivation
• Previous success!
• Can we test the stability of the output gap using artificial intelligence techniques?
• Multi-recurrent neural networks (MRNs) are a powerful class of non-linear ARMA model that use multi-stage learning to form sluggish state spaces able to capture complex temporal dependencies
• Is it possible to extract rules from the "black box" to better understand complex macro relationships, e.g. between the output gap and prices?
Data Definitions (monthly, 1961 – 2016)
Source: FRED; quarterly series converted to monthly; all seasonally adjusted
Variable   Description
gdpgap     Monthly GDP gap
mprgdp     Monthly Potential Real Gross Domestic Product
mrgdp      Monthly Real Gross Domestic Product
ophnfb     Nonfarm Business Sector: Real Output Per Hour of All Persons
pce        Personal Consumption Expenditures: Chain-type Price Index
oil        Spot Crude Oil Price: West Texas Intermediate (WTI)
ff         Effective Federal Funds Rate
Artificial Neural Networks

[Figure: a feed-forward network. A temporal input window x(t−1), x(t−2), …, x(t−p) feeds the hidden state h(t) through weights W_f(t); the output response ŷ(t) is produced from the hidden state through weights W_g(t).]

$$\hat{y}(t) = g\big(f\big(x(t-1), x(t-2), \ldots, x(t-p);\, W_f(t)\big);\, W_g(t)\big)$$
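A minimal sketch of this windowed forecaster, assuming a tanh hidden layer and a linear output (the figure does not specify the activation choices):

```python
import numpy as np

def forecast_one_step(x_window, W_f, b_f, W_g, b_g):
    """One-step-ahead forecast from a temporal input window.
    x_window: concatenated lags x(t-1), ..., x(t-p), shape (p*d,)."""
    h = np.tanh(W_f @ x_window + b_f)   # hidden state h(t)
    return W_g @ h + b_g                # output response y_hat(t)
```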
• Graphical models with layers of interconnected simple processing units
• Most common: feed-forward and recurrent multi-layered perceptrons (MLPs) using non-linear activation functions such as the sigmoid or hyperbolic tangent
• General recursive rules adapt the connection weights and biases to minimize a cost function E(W, b)
  – Well-understood training algorithms (backpropagation, conjugate gradient descent, Levenberg-Marquardt)
• Universal function approximators (Hornik, 1991)
• BUT overfitting and local minima are common
Cost function (squared error plus weight decay):

$$E(W,b) = \frac{1}{p}\sum_{t=1}^{p}\frac{1}{2}\big(\hat{y}(t)-y(t)\big)^{2} + \frac{\lambda}{2}\sum_{l=1}^{n_{layers}}\sum_{i=1}^{m_l}\sum_{j}\big(W_{ji}^{(l)}\big)^{2}$$

Gradient-descent updates:

$$W_{ij}^{(l)} \leftarrow W_{ij}^{(l)} - \eta\,\frac{\partial E(W,b)}{\partial W_{ij}^{(l)}},\qquad b_{i}^{(l)} \leftarrow b_{i}^{(l)} - \eta\,\frac{\partial E(W,b)}{\partial b_{i}^{(l)}}$$

Hidden unit activation (hyperbolic tangent) and its derivative:

$$net_{j} = b_{j} + \sum_{i=1}^{m} x_{i}\,W_{ij}^{f},\qquad h_{j}(net_{j}) = \tanh(net_{j}) = \frac{e^{net_{j}}-e^{-net_{j}}}{e^{net_{j}}+e^{-net_{j}}},\qquad h'_{j}(net_{j}) = 1-\tanh^{2}(net_{j})$$

Output unit:

$$\hat{y}_{k} = b_{k} + \sum_{j=1}^{n} h_{j}\,W_{jk}^{g}$$
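A minimal sketch of these update rules for a single tanh hidden layer with a linear output; the function and parameter names are ours, not the authors':

```python
import numpy as np

def grad_step(x, y, W_f, b_f, W_g, b_g, eta=0.01, lam=0.001):
    """One gradient-descent step on E(W, b) with weight decay lam."""
    # forward pass: tanh hidden layer, linear output
    net = W_f @ x + b_f
    h = np.tanh(net)
    y_hat = W_g @ h + b_g
    err = y_hat - y                         # gradient of the squared-error term
    # backward pass (backpropagation), including the weight-decay term
    dW_g = np.outer(err, h) + lam * W_g
    db_g = err
    delta = (W_g.T @ err) * (1.0 - h ** 2)  # tanh'(net) = 1 - tanh(net)^2
    dW_f = np.outer(delta, x) + lam * W_f
    db_f = delta
    # in-place gradient-descent updates with learning rate eta
    W_g -= eta * dW_g; b_g -= eta * db_g
    W_f -= eta * dW_f; b_f -= eta * db_f
    return 0.5 * float(err @ err)           # squared-error part of the cost
```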
Recurrent Neural Networks
• Typically extensions of the FF-MLP trained with backpropagation
  – Unit activations feed back to the same or preceding layer(s), implementing 'infinite impulse response' filters
  – Output feedback = non-linear moving-average (MA) estimator
  – Hidden-layer and/or input-layer feedback = non-linear autoregressive (AR) estimator
  – Hidden + output feedback results in a non-linear ARMA[p, q]
  – State units store information from previous time steps and collectively form an internal dynamic memory
  – Can theoretically represent universal Turing machines (Siegelmann & Sontag, 1991)
  – Theoretically able to model non-stationary processes without requiring any preliminary transformation (Virili and Freisleben, 2000)
  – Commonly trained with backpropagation through time (BPTT) (Rumelhart & McClelland, 1986; Williams & Peng, 1990)
• Known limitations
  – Very difficult to train: long training times, local minima, overfitting
  – Error gradients in BPTT degrade rapidly over time, so the network quickly loses information about past inputs (Bengio et al., 1994)
BUT embedding memory banks within an RNN architecture may compensate for some of the deficiencies of BPTT (Lin et al., 1998; Tepper et al., 2016)
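To make the ARMA reading above concrete, here is a minimal sketch (all names illustrative) of one recurrent step with both hidden feedback (the non-linear AR part) and output feedback (the non-linear MA part):

```python
import numpy as np

def rnn_step(x_t, h_prev, y_prev, W_x, W_h, W_y, W_out, b_h, b_out):
    # hidden feedback h_prev = AR term; output feedback y_prev = MA term
    h_t = np.tanh(W_x @ x_t + W_h @ h_prev + W_y @ y_prev + b_h)
    y_t = W_out @ h_t + b_out
    return h_t, y_t
```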
Multi-Recurrent Networks (Ulbricht, 1994)
• Complex state-space model that combines input, hidden and output feedback
• The context vector represents an internal dynamic memory of varying rigidity
• Coarse- or fine-grained integration of temporal information is determined by the number of memory banks

where $c_i(t) = f\big(v_{ji}\,a_j(t-1) + z_i\,c_i(t-1)\big)$

• Simulates a non-linear ARMA[p, q] with p, q > 1
• Outperforms Echo State Networks (Tepper et al., 2016)
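A minimal sketch of the sluggish context update above, under the common assumption (not stated on the slide) that f is the identity and v = 1 − z, so each memory bank keeps an exponential trace of past activations; the bank count and z values here are illustrative:

```python
import numpy as np

def update_banks(a_prev, banks, z=(0.25, 0.5, 0.75)):
    """Per-bank sluggish update: c(t) = (1 - z)*a(t-1) + z*c(t-1).
    Higher z = more rigid (slower-changing) memory; the number of
    banks sets how coarsely or finely time is integrated."""
    return [(1.0 - zi) * a_prev + zi * ci for zi, ci in zip(z, banks)]

# usage: the concatenated banks re-enter the network as context inputs
a_prev = np.zeros(8)                       # previous hidden activations
banks = [np.zeros(8) for _ in range(3)]    # one state vector per bank
banks = update_banks(a_prev, banks)
context = np.concatenate(banks)            # context vector fed back at time t
```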
MRN Methodology
Model fitting
  – Task: learn to forecast five variables one step (t+1) ahead using the in-sample data
  – Algorithm: backpropagation through time (BPTT) with a decaying learning rate η and regularisation term λ = 0.001
  – Training data chunked into independent windows of 12 months; a single t+1 forecast is produced for each of these t-lag windows (sketched below)
  – Finite-memory model: memory is reset after each t-lag window
  – 5 to 20 trial runs per model configuration with random initial weights; training stopped at an RMSE of 0.01 or after 2000 epochs
Model selection
  – A form of cross-validation: three separate models (A, B, C), each with its own training and validation samples
  – For each model, an ensemble of the top 3 MRNs is identified by performance on the respective validation set
  – Out-of-sample forecasts are the average of the individual model forecasts
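A sketch of the chunking and ensemble-averaging scheme described above; helper names are ours, not the authors':

```python
import numpy as np

def make_windows(series, window=12):
    # independent, non-overlapping 12-month windows; the observation that
    # follows each window is its single t+1 target, and the network's
    # memory is reset between windows (finite-memory model)
    X, y = [], []
    for i in range(0, len(series) - window, window):
        X.append(series[i:i + window])
        y.append(series[i + window])
    return np.array(X), np.array(y)

def ensemble_forecast(model_forecasts):
    # out-of-sample forecast = average over the models' forecasts
    return np.mean(model_forecasts, axis=0)
```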
In Sample: Mar '61 – Dec '06
Training: Mar’61 – Sep’2006 Val: Oct – Dec’2006
Model A
Training: Jan’70 – Dec’2004 Val: Jan’2005 – Dec’ 2006
Model B
Training: Jan’80 – Dec’2005 Val: Jan’2006 – Dec’ 2006
Model C
Out of Sample: Jan’07– Dec’16
Model Evaluation (holistic)
• Comparison of the MRN ensemble with a simple linear VAR model
• Euclidean distance of min-max normalised forecasts
• Individual variable forecast performance evaluated via RMSE
• Errors evaluated over the first 12 forecasts and over the full out-of-sample period
(Normalised Euclidean distance)
          First 12   Full Out-of-Sample
MRN       0.102      0.281
VAR       0.149      0.261
$$\arg\min_{w}\ \big\lVert y_j - x_j w_j \big\rVert, \quad j = 1, 2, \ldots, m$$
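A sketch of the two evaluation measures; the exact min-max normalisation window (per series, over the evaluation period) is our assumption:

```python
import numpy as np

def minmax(a):
    # min-max normalise each series (column) to [0, 1]
    lo, hi = a.min(axis=0), a.max(axis=0)
    return (a - lo) / (hi - lo)

def euclidean_score(forecast, actual):
    # Euclidean distance between min-max normalised forecast and actual paths
    return float(np.linalg.norm(minmax(forecast) - minmax(actual)))

def rmse(forecast, actual):
    # per-variable root mean squared error
    return float(np.sqrt(np.mean((forecast - actual) ** 2)))
```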
But excluding the FFR: MRN fitted to the four-variable system

(Normalised Euclidean distance)
          First 12   Full Out-of-Sample
MRN       0.0752     0.195
VAR       0.149      0.261

Or with 100 hidden nodes (45,105 parameters):

(Normalised Euclidean distance)
          First 12   Full Out-of-Sample
MRN       0.0794     0.126
VAR       0.149      0.261
Model Evaluation (gdpgap)

(RMSE)
          First 12   Full Out-of-Sample
MRN       0.0072     0.0247
VAR       0.0107     0.0237

But excluding the FFR: MRN fitted to the four-variable system (the graph does not show the plot for MRN Model A)

(RMSE)
              First 12   Full Out-of-Sample
MRN Model A   0.0036     0.0084
MRN           0.0046     0.0158
VAR           0.0107     0.0237

Or with 100 hidden nodes (45,105 parameters), four-variable system excluding the FFR (the graph does not show the plot for MRN Model A):

(RMSE)
              First 12   Full Out-of-Sample
MRN Model A   0.0055     0.0114
MRN           0.0046     0.0158
VAR           0.0107     0.0237
Model Evaluation (ophnfb)

(RMSE)
          First 12   Full Out-of-Sample
MRN       1.593      3.343
VAR       0.496      13.108
Model Evaluation (pce)

(RMSE)
          First 12   Full Out-of-Sample
MRN       3.091      9.480
VAR       0.9935     4.007
Model Evaluation (oil)

(RMSE)
          First 12   Full Out-of-Sample
MRN       11.568     31.566
VAR       19.079     57.966

But excluding the FFR: MRN fitted to the four-variable system (the graph does not show the plot for MRN Model A)

(RMSE)
              First 12   Full Out-of-Sample
MRN Model A   6.585      13.264
MRN           8.486      15.186
VAR           19.079     57.966

Or with 100 hidden nodes (45,105 parameters), four-variable system excluding the FFR (the graph does not show the plot for MRN Model A):

(RMSE)
              First 12   Full Out-of-Sample
MRN Model A   7.190      22.336
MRN           8.486      15.186
VAR           19.079     57.966
Model Evaluation (ffr)

(RMSE)
          First 12   Full Out-of-Sample
MRN       0.2854     0.3527
VAR       0.5611     1.3506
Results (MRN Model Responses)
Conclusions
• Early indications are that the MRNs provide stable and robust output gap forecasts that fit the data well and outperform the VAR throughout the entire test period
• MRNs prefer more training data and less validation data during training; over-fitting can be better controlled with the regularisation term or 'dropout'
• The FFR affected the MRN's ability to capture general non-linearities; this needs investigating
• We need to calculate "with shock" and "without shock" forecasts and take their difference, to explore the methodology's potential for estimating impulse response functions (sketched below)
• Future work: opening up the MRN to test underlying assumptions, e.g. along the lines of DSGE and business-cycle models
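A sketch of the "with shock minus without shock" idea; the forecast_fn interface and names are hypothetical, not the authors' code:

```python
def impulse_response(forecast_fn, x_path, shock, horizon):
    # forecast_fn(path, horizon) -> forecast array (assumed interface);
    # x_path and shock are NumPy arrays of the same shape
    base = forecast_fn(x_path, horizon)             # "without shock" forecast
    shocked = forecast_fn(x_path + shock, horizon)  # "with shock" forecast
    return shocked - base                           # approximate impulse response
```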
Thank you! Any Questions?