Transcript
  • On the robustness of sluggish state-based neural networks for providing useful insight into the output gap

    Jane Binner, Jon Tepper, & Logan Kelly

    1. Motivation & Literature
    2. Objectives
    3. Artificial Neural Network
    4. Methodology & Constraints
    5. Data
    6. Results
    7. Conclusions

  • Motivation

    • Central Banks need to put remedial measures into place at least 12 months ahead of any significant perturbation from targets

    • Models that can forecast significant volatility in key US macroeconomic data at such horizons will provide policy makers with an early warning system

    • To investigate whether instability exists in an unstructured VAR using advances in econometrics


  • Motivation

    • Previous success!

    • Can we test the stability of the output gap using artificial intelligence techniques?

    • Multi-recurrent neural networks (MRNs) are a powerful class of non-linear ARMA model that utilise multi-stage learning to form sluggish state spaces capable of capturing complex temporal dependencies

    • Is it possible to extract rules from the “black box” to better understand the complex macro relationships between, e.g., the output gap and prices?

  • Data Definitions Monthly: 1961 - 2016

    Source: FRED; quarterly series converted to monthly; all seasonally adjusted

    Variable   Description
    gdpgap     Monthly GDP gap
    mprgdp     Monthly Potential Real Gross Domestic Product
    mrgdp      Monthly Real Gross Domestic Product
    ophnfb     Nonfarm Business Sector: Real Output Per Hour of All Persons
    pce        Personal Consumption Expenditures: Chain-type Price Index
    oil        Spot Crude Oil Price: West Texas Intermediate (WTI)
    ff         Effective Federal Funds Rate
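    A minimal sketch of assembling comparable series from FRED with pandas_datareader; the mnemonics used below (OPHNFB, PCEPI, WTISPLC, FEDFUNDS) and the monthly interpolation are illustrative assumptions, not the authors' exact pipeline, and the monthly GDP-gap construction is not reproduced here.

```python
# Sketch: download candidate FRED series and align them to a monthly frequency.
# Series mnemonics and resampling choices are assumptions for illustration.
import pandas as pd
from pandas_datareader import data as pdr

series = {
    "ophnfb": "OPHNFB",    # nonfarm business real output per hour (quarterly)
    "pce":    "PCEPI",     # PCE chain-type price index (monthly)
    "oil":    "WTISPLC",   # WTI spot crude oil price (monthly)
    "ff":     "FEDFUNDS",  # effective federal funds rate (monthly)
}

frames = {name: pdr.DataReader(code, "fred", "1961-01-01", "2016-12-31")
          for name, code in series.items()}
df = pd.concat(frames.values(), axis=1)
df.columns = list(frames.keys())

# Quarterly series are interpolated to a monthly frequency.
df = df.resample("MS").mean().interpolate()
print(df.tail())
```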

  • Artificial Neural Networks

    [Figure: feed-forward network with a temporal input window x(t−1), x(t−2), …, x(t−p), a hidden state h(t) computed with weights W_f(t), and an output response ŷ(t) computed with weights W_g(t)]

    $\hat{y}(t) = g\big(f\big(x(t-1), x(t-2), \ldots, x(t-p);\, \mathbf{W}_f(t)\big);\, \mathbf{W}_g(t)\big)$

    • Graphical models with layers of inter-connected simple processing units

    • Most common: feed-forward and recurrent Multi-Layered Perceptrons (MLPs) using non-linear activation functions such as sigmoid or hyperbolic tangent

    • General recursive rules are applied to adapt these weights to minimize some cost function E(W, b)

      – Well-understood training algorithms (Backprop, Conjugate Gradient Descent, Levenberg-Marquardt)

    • Universal function approximators (Hornik, 1991)

    • BUT overfitting and local minima are common

    Sum-of-squares cost with an L2 weight penalty:

    $E(\mathbf{W}, \mathbf{b}) = \frac{1}{p}\sum_{t=1}^{p} \tfrac{1}{2}\big(\hat{y}(t) - y(t)\big)^2 + \frac{\lambda}{2}\sum_{l=1}^{L}\sum_{i=1}^{m_l}\sum_{j=1}^{m_{l+1}} \big(W_{ji}^{(l)}\big)^2$

    Gradient-descent updates for weights and biases:

    $W_{ij}^{(l)} \leftarrow W_{ij}^{(l)} - \eta\,\frac{\partial E(\mathbf{W}, \mathbf{b})}{\partial W_{ij}^{(l)}}, \qquad b_{i}^{(l)} \leftarrow b_{i}^{(l)} - \eta\,\frac{\partial E(\mathbf{W}, \mathbf{b})}{\partial b_{i}^{(l)}}$

    Hidden-unit net input, hyperbolic-tangent activation, and its derivative:

    $net_j = b_j + \sum_{i=1}^{m} x_i W_{ij}, \qquad h_j(net_j) = \tanh(net_j) = \frac{e^{net_j} - e^{-net_j}}{e^{net_j} + e^{-net_j}}, \qquad h'_j(net_j) = 1 - \tanh^2(net_j)$

    Output units:

    $\hat{y}_k = b_k + \sum_{j=1}^{n} h_j W_{jk}$
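    A minimal numpy sketch of the cost and updates above, assuming one tanh hidden layer, linear output units, and full-batch gradient descent; the layer sizes, learning rate, and synthetic data are illustrative assumptions, not the models used in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def init(n_in, n_hid, n_out):
    # Small random weights, zero biases.
    return {"Wf": 0.1 * rng.standard_normal((n_in, n_hid)), "bf": np.zeros(n_hid),
            "Wg": 0.1 * rng.standard_normal((n_hid, n_out)), "bg": np.zeros(n_out)}

def forward(p, X):
    net = X @ p["Wf"] + p["bf"]      # net_j = b_j + sum_i x_i W_ij
    h = np.tanh(net)                 # h_j = tanh(net_j)
    yhat = h @ p["Wg"] + p["bg"]     # linear output units
    return h, yhat

def cost(p, X, Y, lam):
    _, yhat = forward(p, X)
    mse = 0.5 * np.mean(np.sum((yhat - Y) ** 2, axis=1))
    l2 = 0.5 * lam * (np.sum(p["Wf"] ** 2) + np.sum(p["Wg"] ** 2))
    return mse + l2

def gradient_step(p, X, Y, lam, eta):
    h, yhat = forward(p, X)
    n = X.shape[0]
    d_out = (yhat - Y) / n                       # dE/d(yhat)
    d_hid = (d_out @ p["Wg"].T) * (1 - h ** 2)   # tanh'(net) = 1 - tanh^2(net)
    p["Wg"] -= eta * (h.T @ d_out + lam * p["Wg"])
    p["bg"] -= eta * d_out.sum(axis=0)
    p["Wf"] -= eta * (X.T @ d_hid + lam * p["Wf"])
    p["bf"] -= eta * d_hid.sum(axis=0)
    return p

# Tiny illustration: a 12-lag window of 5 variables flattened into one input vector.
X = rng.standard_normal((200, 60))
Y = rng.standard_normal((200, 5))
params = init(60, 10, 5)
for epoch in range(200):
    params = gradient_step(params, X, Y, lam=0.001, eta=0.01)
print("final cost:", cost(params, X, Y, lam=0.001))
```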


  • Recurrent Neural Networks

    • Typically extensions of FF-MLPs trained with Backpropagation

      – Unit activations feed back to the same or preceding layer(s), implementing ‘infinite impulse response filters’

      – Output feedback = non-linear Moving Average estimator

      – Hidden-layer feedback and/or input-layer feedback = non-linear Auto-Regressive estimator

      – Hidden + output feedback results in a non-linear ARMA[p, q]

      – State units store information from previous time steps and collectively form an internal dynamic memory

      – Can theoretically represent universal Turing Machines (Siegelmann et al., 1991)

      – Theoretically able to model non-stationary processes without requiring any preliminary transformation (Virili and Freisleben, 2000)

      – Commonly trained with Backpropagation Through Time (BPTT) (Rumelhart & McClelland, 1986; Williams & Peng, 1990)

    • Known limitations

      – Very difficult to train: long training times, local minima, overfitting

      – Error gradients in BPTT degrade rapidly over time, so the network quickly loses information about past input data (Bengio, 1994)

    BUT embedding memory banks within an RNN architecture may compensate for some of the deficiencies of BPTT (Lin et al. 1998; Tepper et al. 2016)

  • Multi-Recurrent Networks (Ulbricht, 1994)

    • Complex state-space model that combines input, hidden and output feedback

    • Context vector represents an internal dynamic memory of varying rigidity

    • Coarse- or fine-grained integration of temporal information determined by the number of memory banks, where each context unit is updated as

      $c_i(t) = f\big(a_j(t-1)\, v_{ji} + c_i(t-1)\, z_i\big)$

    • Simulates a non-linear ARMA[p, q] where p, q > 1

    • Outperforms Echo State Networks (Tepper et al. 2016)
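    A rough numpy sketch of the sluggish-state idea, simplified from the update above: each memory bank blends the previous hidden activation with its own previous context at a different rigidity (the weight matrix v and the squashing f are omitted); the number of banks and the rigidities are illustrative assumptions.

```python
import numpy as np

def update_context(banks, h_prev):
    """One time step of 'sluggish' context memory.

    banks: list of (context_vector, rigidity) pairs, rigidity z in [0, 1].
    h_prev: previous hidden-layer activation a(t-1).
    Each bank computes c(t) = z * c(t-1) + (1 - z) * a(t-1), so high-rigidity
    banks change slowly (coarse memory) while low-rigidity banks track
    recent activity (fine memory).
    """
    return [(z * c + (1.0 - z) * h_prev, z) for c, z in banks]

rng = np.random.default_rng(1)
n_hidden = 8
# Four memory banks with increasing rigidity.
banks = [(np.zeros(n_hidden), z) for z in (0.0, 0.5, 0.75, 0.9)]

for t in range(24):
    h_prev = np.tanh(rng.standard_normal(n_hidden))  # stand-in for a(t-1)
    banks = update_context(banks, h_prev)

# The concatenated bank contents would be fed back as extra inputs
# alongside x(t) at the next step.
context_input = np.concatenate([c for c, _ in banks])
print(context_input.shape)   # (32,)
```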

  • MRN Methodology

    Model Fitting

    – Task: learn to forecast the five variables one step ahead (t+1) using the In-Sample data

    – Algorithm: Backpropagation Through Time with a decaying learning rate η and regularization λ = 0.001

    – Training data chunked into independent windows of 12 months; a single t+1 forecast is produced for each of these t-lag windows (see the sketch after this slide)

    – Finite-memory model, as memory is reset after each t-lag window

    – 5 to 20 trial runs per model configuration with initial random values; training stopped at an RMSE of 0.01 or 2,000 epochs

    Model Selection

    – A type of cross-validation implemented: three separate models (A, B, C), each with its own training and validation samples

    – For each model, a top-3 ensemble of MRNs is identified based on performance on the respective validation set

    – Out-of-Sample forecasts based on the average of the different model forecasts

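    A minimal sketch of the windowing just described: the monthly series are chunked into 12-month lag windows, each paired with the following observation as the t+1 target; the non-overlapping stride, array names, and synthetic data are assumptions for illustration.

```python
import numpy as np

def make_windows(series, lag=12):
    """Chunk a (T, n_vars) array into independent lag-month windows.

    Each window of `lag` consecutive months is paired with the observation
    immediately following it as the t+1 target; the network's internal
    memory would be reset between windows.
    """
    X, y = [], []
    for start in range(0, len(series) - lag, lag):
        X.append(series[start:start + lag])   # (lag, n_vars) inputs
        y.append(series[start + lag])         # single t+1 target
    return np.stack(X), np.stack(y)

rng = np.random.default_rng(2)
monthly = rng.standard_normal((550, 5))       # ~Mar'61 to Dec'06, 5 variables
X, y = make_windows(monthly, lag=12)
print(X.shape, y.shape)                       # (45, 12, 5) (45, 5)
```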

  • In Sample: Mar’61 – Dec’06


    Model A: Training Mar’61 – Sep’06, Validation Oct’06 – Dec’06
    Model B: Training Jan’70 – Dec’04, Validation Jan’05 – Dec’06
    Model C: Training Jan’80 – Dec’05, Validation Jan’06 – Dec’06
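    A small sketch expressing the three train/validation splits above as pandas date slices; the DataFrame `df` is assumed to be the monthly data assembled earlier, and the helper name is an assumption.

```python
import pandas as pd

# Assumed: `df` is the monthly DataFrame indexed by date (see the data sketch).
splits = {
    "A": {"train": ("1961-03", "2006-09"), "val": ("2006-10", "2006-12")},
    "B": {"train": ("1970-01", "2004-12"), "val": ("2005-01", "2006-12")},
    "C": {"train": ("1980-01", "2005-12"), "val": ("2006-01", "2006-12")},
}

def split_frames(df, spec):
    """Return (train, val) DataFrames for one of the models A, B, C."""
    (t0, t1), (v0, v1) = spec["train"], spec["val"]
    return df.loc[t0:t1], df.loc[v0:v1]

# Example usage:
# train_a, val_a = split_frames(df, splits["A"])
# out_of_sample = df.loc["2007-01":"2016-12"]
```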

  • Out of Sample: Jan’07 – Dec’16


  • Model Evaluation (holistic)

    • Comparison of the MRN Ensemble with a simple linear VAR model

    • Euclidean distance of min-max normalised forecasts (min-max normalisation applied to each variable j = 1, 2, …, m); see the evaluation sketch below

    • Individual variable forecast performance evaluated via RMSE

    • Errors evaluated for the first 12 forecasts and then for the full Out-of-Sample period

          First 12   Full Out-of-Sample
    MRN   0.102      0.281
    VAR   0.149      0.261

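    A sketch of the holistic comparison as described: forecasts and actuals are min-max normalised per variable and compared via Euclidean distance, with per-variable RMSE alongside. Taking the normalisation ranges from the actuals and all function/variable names are assumptions; the commented statsmodels call shows one way a linear VAR baseline could be fitted.

```python
import numpy as np

def minmax(a, lo, hi):
    """Min-max normalise the columns of `a` using per-variable ranges lo, hi."""
    return (a - lo) / (hi - lo)

def holistic_distance(forecast, actual):
    """Mean Euclidean distance between min-max normalised forecast and actual rows."""
    lo, hi = actual.min(axis=0), actual.max(axis=0)
    f, a = minmax(forecast, lo, hi), minmax(actual, lo, hi)
    return np.mean(np.linalg.norm(f - a, axis=1))

def rmse(forecast, actual):
    """Per-variable root mean squared error."""
    return np.sqrt(np.mean((forecast - actual) ** 2, axis=0))

# Errors are reported for the first 12 out-of-sample months and the full span, e.g.:
# holistic_distance(mrn_forecast[:12], actual[:12]); holistic_distance(mrn_forecast, actual)
#
# A linear VAR baseline could be fitted with statsmodels, e.g.:
# from statsmodels.tsa.api import VAR
# var_res = VAR(train_df).fit(maxlags=12, ic="aic")
```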

  • But exclude FFR…

    MRN fitted to a four-variable system, excluding the Federal Funds Rate

          First 12   Full Out-of-Sample
    MRN   0.0752     0.195
    VAR   0.149      0.261

  • OR 100 hidden nodes (45,105 parameters)

          First 12   Full Out-of-Sample
    MRN   0.0794     0.126
    VAR   0.149      0.261

  • Model Evaluation (gdpgap)

          First 12   Full Out-of-Sample
    MRN   0.0072     0.0247
    VAR   0.0107     0.0237

  • But exclude FFR…

    MRN fitted to a four-variable system, excluding the Federal Funds Rate
    (graph does not show the plot for MRN Model A)

                  First 12   Full Out-of-Sample
    MRN Model A   0.0036     0.0084
    MRN           0.0046     0.0158
    VAR           0.0107     0.0237

  • OR 100 hidden nodes (45,105 parameters)

    MRN fitted to a four-variable system, excluding the Federal Funds Rate
    (graph does not show the plot for MRN Model A)

                  First 12   Full Out-of-Sample
    MRN Model A   0.0055     0.0114
    MRN           0.0046     0.0158
    VAR           0.0107     0.0237

  • Model Evaluation (ophnfb)

          First 12   Full Out-of-Sample
    MRN   1.593      3.343
    VAR   0.496      13.108

  • Model Evaluation (pce)

          First 12   Full Out-of-Sample
    MRN   3.091      9.480
    VAR   0.9935     4.007

  • Model Evaluation (oil)

          First 12   Full Out-of-Sample
    MRN   11.568     31.566
    VAR   19.079     57.966

  • But exclude FFR…

    MRN fitted to a four-variable system, excluding the Federal Funds Rate
    (graph does not show the plot for MRN Model A)

                  First 12   Full Out-of-Sample
    MRN Model A   6.585      13.264
    MRN           8.486      15.186
    VAR           19.079     57.966

  • OR 100 hidden nodes (45,105 parameters)

    MRN fitted to a four-variable system, excluding the Federal Funds Rate
    (graph does not show the plot for MRN Model A)

                  First 12   Full Out-of-Sample
    MRN Model A   7.190      22.336
    MRN           8.486      15.186
    VAR           19.079     57.966

  • Model Evaluation (ffr)

          First 12   Full Out-of-Sample
    MRN   0.2854     0.3527
    VAR   0.5611     1.3506

  • Results (MRN Model Responses)


  • Results (MRN Model Responses)


  • Conclusions

    • Early indications are that the MRNs provide stable and robust output-gap forecasts that fit the data well and outperform the VAR throughout the entire test period

    • MRNs prefer more available data and less validation during training; over-fitting can be better controlled with the regularization term or ‘dropout’

    • The FFR affected the ability of the MRN to capture general non-linearities; this needs investigating


  • Conclusions

    • Early indications are that the MRNs provide stable and robust output-gap forecasts that fit the data well and outperform the VAR throughout the entire test period

    • We need to calculate “with shock” and “without shock” forecasts and take the difference, to explore the potential of the methodology to estimate “impulse response functions”

    • Future work: opening up the MRN to test the assumptions underlying, e.g., DSGE models and business cycle models



  • Thank you! Any Questions?

