Deep Energy-Based NARX Models
Johannes N. Hendriks¹, Fredrik K. Gustafsson², Antonio H. Ribeiro², Adrian G. Wills¹, Thomas B. Schon²
¹The University of Newcastle, Australia
²Uppsala University, Sweden
Workshop on Nonlinear System Identification Benchmarks, 2021
Motivation
Common performance criteria, such as maximum-likelihood or prediction-error criteria, usually involve assumptions about uncertainty, whether explicit or implicit.
Nonlinear SysId Benchmarks, 2021 1 / 13 Hendriks, Gustafsson, Ribeiro, Wills, Schon
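Nonlinear ARX model (Gaussian noise)
• Data model: $y_t = f_\theta(y_{t-1}, u_{t-1}) + e_t$, where $e_t \sim \mathcal{N}(0, \sigma)$.
• $f_\theta \leadsto$ model structure.
• Maximum likelihood estimator:
$$\hat{\theta} = \arg\min_\theta \sum_{t=1}^{T} \left\| y_t - f_\theta(y_{t-1}, u_{t-1}) \right\|^2.$$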
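The step from the Gaussian data model to the least-squares criterion can be written out as follows. This is a short sketch under stated assumptions that are not on the slide: the output is scalar, $\sigma$ is read as the noise standard deviation, and the likelihood is conditioned on the initial value $y_0$ and the inputs.

```latex
\begin{aligned}
-\log p_\theta(y_{1:T} \mid y_0, u_{0:T-1})
  &= \sum_{t=1}^{T} -\log \mathcal{N}\!\left(y_t \mid f_\theta(y_{t-1}, u_{t-1}),\, \sigma^2\right) \\
  &= \frac{T}{2}\log\!\left(2\pi\sigma^2\right)
     + \frac{1}{2\sigma^2} \sum_{t=1}^{T} \left(y_t - f_\theta(y_{t-1}, u_{t-1})\right)^2 .
\end{aligned}
```

The first term does not depend on $\theta$ and the factor $1/(2\sigma^2)$ is positive, so minimising the negative log-likelihood over $\theta$ is the same as minimising the sum of squared one-step-ahead errors.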
Energy-based NARX models
• Arbitrary distributions:
$$y_t \mid (y_{t-1}, u_{t-1}) \sim p_\theta(y_t \mid y_{t-1}, u_{t-1}),$$
• Energy-based model:
$$p_\theta(y_t \mid y_{t-1}, u_{t-1}) = \frac{e^{g_\theta(y_t, y_{t-1}, u_{t-1})}}{\int e^{g_\theta(\gamma, y_{t-1}, u_{t-1})}\, d\gamma},$$
Gustafsson, F.K., Danelljan, M., Bhat, G., and Schon, T.B. (2020). Energy-based models for deep probabilistic regression. In Proceedings of the European Conference on Computer Vision (ECCV).
• $g_\theta \leadsto$ highly flexible structure: in our case a neural network.
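To make the normalisation concrete, here is a minimal sketch that evaluates an energy-based conditional density on a grid of candidate outputs. The function `energy` is a hypothetical hand-written stand-in for the neural network $g_\theta$, and the grid limits and Riemann-sum integration are assumptions chosen for illustration only.

```python
import numpy as np

def energy(y, x):
    """Hypothetical scalar energy g(y, x); a stand-in for a neural network."""
    # Two bumps at y = +x and y = -x, giving a bimodal conditional density.
    return np.logaddexp(-0.5 * ((y - x) / 0.2) ** 2,
                        -0.5 * ((y + x) / 0.2) ** 2)

def conditional_density(x, grid):
    """Approximate p(y | x) = exp(g) / integral of exp(g) on a uniform grid."""
    g = energy(grid, x)
    dy = grid[1] - grid[0]
    w = np.exp(g - g.max())        # subtract the maximum for numerical stability
    return w / (w.sum() * dy)      # Riemann-sum approximation of the integral

grid = np.linspace(-3.0, 3.0, 1201)
p = conditional_density(1.0, grid)
mass = p.sum() * (grid[1] - grid[0])   # total probability mass on the grid
```

Because only the ratio matters, any constant added to the energy cancels, which is why subtracting the maximum before exponentiating is safe.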
Model training
• Maximum likelihood, with $x_t = (y_{t-1}, u_{t-1})$:
$$\hat{\theta} = \arg\min_\theta \sum_{t=1}^{T} -\log p_\theta(y_t \mid y_{t-1}, u_{t-1}) = \arg\min_\theta \sum_{t=1}^{T} \left( -g_\theta(y_t, x_t) + \log \int e^{g_\theta(\gamma, x_t)}\, d\gamma \right)$$
• Noise contrastive estimation:
Gutmann, M. and Hyvarinen, A. (2010). Noise-contrastive estimation: A new estimation principle for unnormalized statistical models. In Proceedings of the International Conference on Artificial Intelligence and Statistics (AISTATS), 297–304.
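The integral in the likelihood is generally intractable, which is where contrastive training comes in. Below is a minimal NumPy sketch of a ranking-style NCE loss of the kind used in the energy-based regression literature. It is not the binary formulation of Gutmann and Hyvarinen, and it is an assumption that it resembles the loss used in this work. The Gaussian proposal centred on the observed output, the names `nce_loss`, `num_noise` and `noise_std`, and the two toy energies are all hypothetical.

```python
import numpy as np

def log_gauss(y, mean, std):
    """Log-density of a Gaussian proposal (assumed noise distribution)."""
    return -0.5 * ((y - mean) / std) ** 2 - np.log(std * np.sqrt(2.0 * np.pi))

def nce_loss(g, y, x, num_noise=64, noise_std=0.5, rng=None):
    """Ranking-style NCE loss: classify the observed y among noise samples."""
    rng = np.random.default_rng(0) if rng is None else rng
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    # Column 0 holds the observed output, columns 1..M hold noise samples.
    noise = y[:, None] + noise_std * rng.standard_normal((y.size, num_noise))
    cand = np.concatenate([y[:, None], noise], axis=1)
    # Energy minus proposal log-density for every candidate.
    logits = g(cand, x[:, None]) - log_gauss(cand, y[:, None], noise_std)
    # Negative log-softmax of the observed output among all candidates.
    m = logits.max(axis=1, keepdims=True)
    log_norm = m[:, 0] + np.log(np.exp(logits - m).sum(axis=1))
    return float(np.mean(log_norm - logits[:, 0]))

# Toy data from y = 0.95 x + noise, and two hand-written energies.
rng = np.random.default_rng(0)
x = rng.uniform(-2.0, 2.0, size=200)
y = 0.95 * x + 0.1 * rng.standard_normal(200)
g_good = lambda y, x: -0.5 * ((y - 0.95 * x) / 0.1) ** 2   # matches the data
g_bad = lambda y, x: -0.5 * ((y + 0.95 * x) / 0.1) ** 2    # wrong sign
loss_good = nce_loss(g_good, y, x, rng=np.random.default_rng(1))
loss_bad = nce_loss(g_bad, y, x, rng=np.random.default_rng(1))
```

In practice the energy would be a neural network and the loss would be minimised with automatic differentiation; the sketch only shows that an energy matching the data yields a lower loss than a mismatched one, without ever computing the normalising integral.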
Example 1: AR model
$$y_t = 0.95\, y_{t-1} + e_t.$$
Figure: Gaussian error $e_t$
Figure: Gaussian mixture error $e_t$
Figure: Cauchy error $e_t$
Figure: Gaussian error $e_t$ with conditional variance
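The four noise settings can be reproduced in spirit with a short simulation. This is a sketch only: the slides give no noise parameters, so every scale, mixture location and the form of the state-dependent standard deviation below is a hypothetical choice, as is the helper name `simulate_ar`.

```python
import numpy as np

def simulate_ar(num_steps, noise, a=0.95, y0=0.0):
    """Simulate y_t = a * y_{t-1} + e_t, where e_t = noise(y_{t-1})."""
    y = np.empty(num_steps)
    prev = y0
    for t in range(num_steps):
        prev = a * prev + noise(prev)
        y[t] = prev
    return y

rng = np.random.default_rng(0)
# Each entry maps the previous output to one noise draw (assumed parameters).
noises = {
    "gaussian": lambda prev: 0.3 * rng.standard_normal(),
    "gaussian_mixture": lambda prev: rng.choice([-0.4, 0.4])
                                     + 0.1 * rng.standard_normal(),
    "cauchy": lambda prev: 0.1 * rng.standard_cauchy(),
    # Standard deviation grows with |y_{t-1}| (conditional variance).
    "conditional_variance": lambda prev: (0.1 + 0.2 * abs(prev))
                                         * rng.standard_normal(),
}
series = {name: simulate_ar(500, fn) for name, fn in noises.items()}
```

Only the first case satisfies the Gaussian assumption behind the least-squares criterion; the other three are the situations where learning the full conditional distribution is expected to matter.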
Example 2: ARX model
$$y_t = 1.5\, y_{t-1} - 0.7\, y_{t-2} + u_{t-1} + 0.5\, u_{t-2} + e_t$$
(a) Sequence (b) t=56
Figure: Estimates of $p_\theta(y_t \mid x_t)$ for validation data.
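As a point of reference for this example, the sketch below simulates the ARX system and builds the regressor $x_t = (y_{t-1}, y_{t-2}, u_{t-1}, u_{t-2})$, then recovers the coefficients by ordinary least squares. The white Gaussian input, the noise level of 0.1 and the sequence length are assumptions; the slide does not state them.

```python
import numpy as np

rng = np.random.default_rng(1)
T = 2000
u = rng.standard_normal(T)           # assumed input signal
e = 0.1 * rng.standard_normal(T)     # assumed Gaussian noise level
y = np.zeros(T)
for t in range(2, T):
    y[t] = 1.5 * y[t - 1] - 0.7 * y[t - 2] + u[t - 1] + 0.5 * u[t - 2] + e[t]

# Regressor rows x_t = (y_{t-1}, y_{t-2}, u_{t-1}, u_{t-2}) for t = 2..T-1.
X = np.column_stack([y[1:-1], y[:-2], u[1:-1], u[:-2]])
target = y[2:]

# Least-squares point estimate of the four coefficients.
theta, *_ = np.linalg.lstsq(X, target, rcond=None)
```

The same regressor matrix `X` is what an energy-based model would take as its conditioning input, with the point estimate replaced by a density over `target`.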
Example 3: nonlinear model
Model:
$$y^*_t = \left(0.8 - 0.5\, e^{-y^{*2}_{t-1}}\right) y^*_{t-1} - \left(0.3 + 0.9\, e^{-y^{*2}_{t-1}}\right) y^*_{t-2} + u_{t-1} + 0.2\, u_{t-2} + 0.1\, u_{t-1} u_{t-2} + v_t,$$
$$y_t = y^*_t + w_t$$
Process and output error:
$$v_t \sim \mathcal{N}(0, \sigma_v^2), \qquad w_t \sim \mathcal{N}(0, \sigma_v^2)$$
Figure: System only with process noise. Input in blue and output in red.
Chen, S., Billings, S.A., and Grant, P.M. (1990). Non-Linear System Identification Using Neural Networks. International Journal of Control, 51(6), 1191–1214.
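The model above can be simulated directly. In this sketch the function name `simulate_nonlinear`, the Gaussian input and the separate `sigma_w` argument are assumptions made for illustration; the slide writes the same variance symbol for both noise sources.

```python
import numpy as np

def simulate_nonlinear(u, sigma_v, sigma_w, rng):
    """Simulate the latent output y_star and the measured output y."""
    T = len(u)
    y_star = np.zeros(T)
    v = sigma_v * rng.standard_normal(T)          # process noise
    for t in range(2, T):
        decay = np.exp(-y_star[t - 1] ** 2)
        y_star[t] = ((0.8 - 0.5 * decay) * y_star[t - 1]
                     - (0.3 + 0.9 * decay) * y_star[t - 2]
                     + u[t - 1] + 0.2 * u[t - 2]
                     + 0.1 * u[t - 1] * u[t - 2] + v[t])
    y = y_star + sigma_w * rng.standard_normal(T)  # output (measurement) noise
    return y_star, y

rng = np.random.default_rng(2)
u = rng.standard_normal(200)                       # assumed input signal
# Process noise only, as in the figure: measured and latent outputs coincide.
y_star, y = simulate_nonlinear(u, sigma_v=0.1, sigma_w=0.0, rng=rng)
```

Setting `sigma_w` above zero adds measurement noise on top of the latent output, which a NARX regressor built from measured outputs cannot represent exactly.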
Example 3: nonlinear model
Table: Simulated nonlinear MSE on the validation set for the fully connected network (FCN) NARX model and the EB-NARX model

             N = 100            N = 250            N = 500
             FCN    EB-NARX     FCN    EB-NARX     FCN    EB-NARX
σ = 0.1      0.122  0.099       0.069  0.070       0.057  0.054
σ = 0.3      0.398  0.390       0.353  0.354       0.289  0.308
σ = 0.5      0.860  0.869       0.809  0.822       0.754  0.779
Example 3: nonlinear model
(a) Sequence (b) t=56
Figure: Estimates of $p_\theta(y_t \mid x_t)$ for validation data.
Example 4: Coupled Electric Drives
Figure: Illustration of the CE8 coupled electric drives system
Wigren, T. and Schoukens, M. (2017). Coupled electric drives data set and reference models. Technical Report, Uppsala Universitet.
Example 4: Coupled Electric Drives
Figure: $p_\theta(y_t \mid x_t)$ sequence
Figure: $t = 40$
Figure: $t = 57$
Figure: $t = 60$
Conclusion
• Energy-based NARX learns the full conditional distribution rather than a point estimate.
• The current paper only considers one-step-ahead predictions and not multi-step-ahead predictions.
• For multi-step-ahead prediction: propagate MAP point estimates vs. probabilities.
• Studying energy-based models for nonlinear ARMAX, output error and other types of models that can handle more general noise types.
Thank you!
To appear in the 19th IFAC Symposium on System Identification.
Paper: https://arxiv.org/abs/2012.04136
Code: https://github.com/jnh277/ebm_arx