On the robustness of sluggish state-based neural networks for providing useful insight into the output gap
Jane Binner, Jon Tepper, & Logan Kelly
1. Motivation & Literature
2. Objectives
3. Artificial Neural Networks
4. Methodology & Constraints
5. Data
6. Results
7. Conclusions
Motivation
• Central banks need to put remedial measures in place at least 12 months ahead of any significant deviation from targets
• Models that can forecast significant volatility in key US macroeconomic data at such horizons provide policy makers with an early-warning system
• To investigate whether instability exists in an unstructured VAR using advances in econometrics
Motivation
• Previous success!
• Can we test the stability of the output gap using artificial intelligence techniques?
• Multi-recurrent neural networks (MRNs) are a powerful class of non-linear ARMA model that use multi-stage learning to form sluggish state spaces able to capture complex temporal dependencies
• Is it possible to extract rules from the "black box" to better understand complex macro relationships, e.g. between the output gap and prices?
Data Definitions (monthly, 1961 – 2016)
Source: FRED; quarterly series converted to monthly; all seasonally adjusted
Variable   Description
gdpgap     Monthly GDP gap
mprgdp     Monthly Potential Real Gross Domestic Product
mrgdp      Monthly Real Gross Domestic Product
ophnfb     Nonfarm Business Sector: Real Output Per Hour of All Persons
pce        Personal Consumption Expenditures: Chain-type Price Index
oil        Spot Crude Oil Price: West Texas Intermediate (WTI)
ff         Effective Federal Funds Rate
Artificial Neural Networks

[Figure: a feed-forward network. A temporal input window x(t−1), x(t−2), …, x(t−p) feeds the hidden state h(t) through weights W_f(t); the output response ŷ(t) is produced from the hidden state through weights W_g(t).]

$$\hat{y}(t) = g\big(f\big(x(t-1), x(t-2), \ldots, x(t-p);\, W_f(t)\big);\, W_g(t)\big)$$
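A minimal sketch of this windowed forecaster, assuming a tanh hidden layer and a linear output (the figure does not specify the activation choices):

```python
import numpy as np

def forecast_one_step(x_window, W_f, b_f, W_g, b_g):
    """One-step-ahead forecast from a temporal input window.
    x_window: concatenated lags x(t-1), ..., x(t-p), shape (p*d,)."""
    h = np.tanh(W_f @ x_window + b_f)   # hidden state h(t)
    return W_g @ h + b_g                # output response y_hat(t)
```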
• Graphical models with layers of interconnected simple processing units
• Most common: feed-forward and recurrent multi-layered perceptrons (MLPs) using non-linear activation functions such as the sigmoid or hyperbolic tangent
• General recursive rules adapt the connection weights and biases to minimize a cost function E(W, b)
  – Well-understood training algorithms (backpropagation, conjugate gradient descent, Levenberg-Marquardt)
• Universal function approximators (Hornik, 1991)
• BUT overfitting and local minima are common
Cost function (squared error plus weight decay):

$$E(W,b) = \frac{1}{p}\sum_{t=1}^{p}\frac{1}{2}\big(\hat{y}(t)-y(t)\big)^{2} + \frac{\lambda}{2}\sum_{l=1}^{n_{layers}}\sum_{i=1}^{m_l}\sum_{j}\big(W_{ji}^{(l)}\big)^{2}$$

Gradient-descent updates:

$$W_{ij}^{(l)} \leftarrow W_{ij}^{(l)} - \eta\,\frac{\partial E(W,b)}{\partial W_{ij}^{(l)}},\qquad b_{i}^{(l)} \leftarrow b_{i}^{(l)} - \eta\,\frac{\partial E(W,b)}{\partial b_{i}^{(l)}}$$

Hidden unit activation (hyperbolic tangent) and its derivative:

$$net_{j} = b_{j} + \sum_{i=1}^{m} x_{i}\,W_{ij}^{f},\qquad h_{j}(net_{j}) = \tanh(net_{j}) = \frac{e^{net_{j}}-e^{-net_{j}}}{e^{net_{j}}+e^{-net_{j}}},\qquad h'_{j}(net_{j}) = 1-\tanh^{2}(net_{j})$$

Output unit:

$$\hat{y}_{k} = b_{k} + \sum_{j=1}^{n} h_{j}\,W_{jk}^{g}$$
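A minimal sketch of these update rules for a single tanh hidden layer with a linear output; the function and parameter names are ours, not the authors':

```python
import numpy as np

def grad_step(x, y, W_f, b_f, W_g, b_g, eta=0.01, lam=0.001):
    """One gradient-descent step on E(W, b) with weight decay lam."""
    # forward pass: tanh hidden layer, linear output
    net = W_f @ x + b_f
    h = np.tanh(net)
    y_hat = W_g @ h + b_g
    err = y_hat - y                         # gradient of the squared-error term
    # backward pass (backpropagation), including the weight-decay term
    dW_g = np.outer(err, h) + lam * W_g
    db_g = err
    delta = (W_g.T @ err) * (1.0 - h ** 2)  # tanh'(net) = 1 - tanh(net)^2
    dW_f = np.outer(delta, x) + lam * W_f
    db_f = delta
    # in-place gradient-descent updates with learning rate eta
    W_g -= eta * dW_g; b_g -= eta * db_g
    W_f -= eta * dW_f; b_f -= eta * db_f
    return 0.5 * float(err @ err)           # squared-error part of the cost
```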
Recurrent Neural Networks
• Typically extensions of the FF-MLP trained with backpropagation
  – Unit activations feed back to the same or preceding layer(s), implementing 'infinite impulse response' filters
  – Output feedback = non-linear moving-average (MA) estimator
  – Hidden-layer and/or input-layer feedback = non-linear autoregressive (AR) estimator
  – Hidden + output feedback results in a non-linear ARMA[p, q]
  – State units store information from previous time steps and collectively form an internal dynamic memory
  – Can theoretically represent universal Turing machines (Siegelmann & Sontag, 1991)
  – Theoretically able to model non-stationary processes without requiring any preliminary transformation (Virili and Freisleben, 2000)
  – Commonly trained with backpropagation through time (BPTT) (Rumelhart & McClelland, 1986; Williams & Peng, 1990)
• Known limitations
  – Very difficult to train: long training times, local minima, overfitting
  – Error gradients in BPTT degrade rapidly over time, so the network quickly loses information about past inputs (Bengio et al., 1994)
BUT embedding memory banks within an RNN architecture may compensate for some of the deficiencies of BPTT (Lin et al., 1998; Tepper et al., 2016)
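To make the ARMA reading above concrete, here is a minimal sketch (all names illustrative) of one recurrent step with both hidden feedback (the non-linear AR part) and output feedback (the non-linear MA part):

```python
import numpy as np

def rnn_step(x_t, h_prev, y_prev, W_x, W_h, W_y, W_out, b_h, b_out):
    # hidden feedback h_prev = AR term; output feedback y_prev = MA term
    h_t = np.tanh(W_x @ x_t + W_h @ h_prev + W_y @ y_prev + b_h)
    y_t = W_out @ h_t + b_out
    return h_t, y_t
```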
Multi-Recurrent Networks (Ulbricht, 1994)
• Complex state-space model that combines input, hidden and output feedback
• The context vector represents an internal dynamic memory of varying rigidity
• Coarse- or fine-grained integration of temporal information is determined by the number of memory banks

where $c_i(t) = f\big(v_{ji}\,a_j(t-1) + z_i\,c_i(t-1)\big)$

• Simulates a non-linear ARMA[p, q] with p, q > 1
• Outperforms Echo State Networks (Tepper et al., 2016)
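A minimal sketch of the sluggish context update above, under the common assumption (not stated on the slide) that f is the identity and v = 1 − z, so each memory bank keeps an exponential trace of past activations; the bank count and z values here are illustrative:

```python
import numpy as np

def update_banks(a_prev, banks, z=(0.25, 0.5, 0.75)):
    """Per-bank sluggish update: c(t) = (1 - z)*a(t-1) + z*c(t-1).
    Higher z = more rigid (slower-changing) memory; the number of
    banks sets how coarsely or finely time is integrated."""
    return [(1.0 - zi) * a_prev + zi * ci for zi, ci in zip(z, banks)]

# usage: the concatenated banks re-enter the network as context inputs
a_prev = np.zeros(8)                       # previous hidden activations
banks = [np.zeros(8) for _ in range(3)]    # one state vector per bank
banks = update_banks(a_prev, banks)
context = np.concatenate(banks)            # context vector fed back at time t
```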
MRN Methodology
Model fitting
  – Task: learn to forecast five variables one step (t+1) ahead using the in-sample data
  – Algorithm: backpropagation through time (BPTT) with a decaying learning rate η and regularisation term λ = 0.001
  – Training data chunked into independent windows of 12 months; a single t+1 forecast is produced for each of these t-lag windows (sketched below)
  – Finite-memory model: memory is reset after each t-lag window
  – 5 to 20 trial runs per model configuration with random initial weights; training stopped at an RMSE of 0.01 or after 2000 epochs
Model selection
  – A form of cross-validation: three separate models (A, B, C), each with its own training and validation samples
  – For each model, an ensemble of the top 3 MRNs is identified by performance on the respective validation set
  – Out-of-sample forecasts are the average of the individual model forecasts
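A sketch of the chunking and ensemble-averaging scheme described above; helper names are ours, not the authors':

```python
import numpy as np

def make_windows(series, window=12):
    # independent, non-overlapping 12-month windows; the observation that
    # follows each window is its single t+1 target, and the network's
    # memory is reset between windows (finite-memory model)
    X, y = [], []
    for i in range(0, len(series) - window, window):
        X.append(series[i:i + window])
        y.append(series[i + window])
    return np.array(X), np.array(y)

def ensemble_forecast(model_forecasts):
    # out-of-sample forecast = average over the models' forecasts
    return np.mean(model_forecasts, axis=0)
```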
In Sample: Mar '61 – Dec '06
Training: Mar’61 – Sep’2006 Val: Oct – Dec’2006
Model A
Training: Jan’70 – Dec’2004 Val: Jan’2005 – Dec’ 2006
Model B
Training: Jan’80 – Dec’2005 Val: Jan’2006 – Dec’ 2006
Model C
Out of Sample: Jan’07– Dec’16
Model Evaluation (holistic)
• Comparison of the MRN ensemble with a simple linear VAR model
• Euclidean distance of min-max normalised forecasts
• Individual variable forecast performance evaluated via RMSE
• Errors evaluated over the first 12 forecasts and over the full out-of-sample period
(Normalised Euclidean distance)
          First 12   Full Out-of-Sample
MRN       0.102      0.281
VAR       0.149      0.261
$$\arg\min_{w}\ \big\lVert y_j - x_j w_j \big\rVert, \quad j = 1, 2, \ldots, m$$
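A sketch of the two evaluation measures; the exact min-max normalisation window (per series, over the evaluation period) is our assumption:

```python
import numpy as np

def minmax(a):
    # min-max normalise each series (column) to [0, 1]
    lo, hi = a.min(axis=0), a.max(axis=0)
    return (a - lo) / (hi - lo)

def euclidean_score(forecast, actual):
    # Euclidean distance between min-max normalised forecast and actual paths
    return float(np.linalg.norm(minmax(forecast) - minmax(actual)))

def rmse(forecast, actual):
    # per-variable root mean squared error
    return float(np.sqrt(np.mean((forecast - actual) ** 2)))
```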
But excluding the FFR: MRN fitted to the four-variable system

(Normalised Euclidean distance)
          First 12   Full Out-of-Sample
MRN       0.0752     0.195
VAR       0.149      0.261

Or with 100 hidden nodes (45,105 parameters):

(Normalised Euclidean distance)
          First 12   Full Out-of-Sample
MRN       0.0794     0.126
VAR       0.149      0.261
Model Evaluation (gdpgap)

(RMSE)
          First 12   Full Out-of-Sample
MRN       0.0072     0.0247
VAR       0.0107     0.0237

But excluding the FFR: MRN fitted to the four-variable system (the graph does not show the plot for MRN Model A)

(RMSE)
              First 12   Full Out-of-Sample
MRN Model A   0.0036     0.0084
MRN           0.0046     0.0158
VAR           0.0107     0.0237

Or with 100 hidden nodes (45,105 parameters), four-variable system excluding the FFR (the graph does not show the plot for MRN Model A):

(RMSE)
              First 12   Full Out-of-Sample
MRN Model A   0.0055     0.0114
MRN           0.0046     0.0158
VAR           0.0107     0.0237
Model Evaluation (ophnfb)

(RMSE)
          First 12   Full Out-of-Sample
MRN       1.593      3.343
VAR       0.496      13.108
Model Evaluation (pce)

(RMSE)
          First 12   Full Out-of-Sample
MRN       3.091      9.480
VAR       0.9935     4.007
Model Evaluation (oil)

(RMSE)
          First 12   Full Out-of-Sample
MRN       11.568     31.566
VAR       19.079     57.966

But excluding the FFR: MRN fitted to the four-variable system (the graph does not show the plot for MRN Model A)

(RMSE)
              First 12   Full Out-of-Sample
MRN Model A   6.585      13.264
MRN           8.486      15.186
VAR           19.079     57.966

Or with 100 hidden nodes (45,105 parameters), four-variable system excluding the FFR (the graph does not show the plot for MRN Model A):

(RMSE)
              First 12   Full Out-of-Sample
MRN Model A   7.190      22.336
MRN           8.486      15.186
VAR           19.079     57.966
Model Evaluation (ffr)

(RMSE)
          First 12   Full Out-of-Sample
MRN       0.2854     0.3527
VAR       0.5611     1.3506
Results (MRN Model Responses)
Conclusions
• Early indications are that the MRNs provide stable and robust output gap forecasts that fit the data well and outperform the VAR throughout the entire test period
• MRNs prefer more training data and less validation data during training; over-fitting can be better controlled with the regularisation term or 'dropout'
• The FFR affected the MRN's ability to capture general non-linearities; this needs investigating
• We need to calculate "with shock" and "without shock" forecasts and take their difference, to explore the methodology's potential for estimating impulse response functions (sketched below)
• Future work: opening up the MRN to test underlying assumptions, e.g. along the lines of DSGE and business-cycle models
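A sketch of the "with shock minus without shock" idea; the forecast_fn interface and names are hypothetical, not the authors' code:

```python
def impulse_response(forecast_fn, x_path, shock, horizon):
    # forecast_fn(path, horizon) -> forecast array (assumed interface);
    # x_path and shock are NumPy arrays of the same shape
    base = forecast_fn(x_path, horizon)             # "without shock" forecast
    shocked = forecast_fn(x_path + shock, horizon)  # "with shock" forecast
    return shocked - base                           # approximate impulse response
```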
Thank you! Any Questions?