A STUDY OF
NEURAL NETWORKS AND MULTIPLE NEURAL
NETWORKS
IN MAKING SHORT-TERM AND LONG-TERM TIME-SERIES
PREDICTION
OF PETROLEUM PRODUCTION AND GAS CONSUMPTION
A Thesis
Submitted to the Faculty of Graduate Studies and Research
In Partial Fulfillment of the Requirements
for the Degree of
Master of Science
in
Computer Science
University of Regina
by
Hanh Hong Nguyen
Regina, Saskatchewan
November, 2002
Copyright 2002: H.H. Nguyen
Reproduced with permission of the copyright owner. Further reproduction prohibited without permission.
National Library of Canada
Acquisitions and Bibliographic Services
395 Wellington Street Ottawa ON K1A 0N4 Canada
Bibliotheque nationale du Canada
Acquisitions et services bibliographiques
395, rue Wellington Ottawa ON K1A 0N4 Canada
The author has granted a non-exclusive licence allowing the National Library of Canada to reproduce, loan, distribute or sell copies of this thesis in microform, paper or electronic formats.
The author retains ownership of the copyright in this thesis. Neither the thesis nor substantial extracts from it may be printed or otherwise reproduced without the author's permission.
Your file Votre reference ISBN: 0-612-82633-3 Our file Notre reference ISBN: 0-612-82633-3
L'auteur a accorde une licence non exclusive permettant a la Bibliotheque nationale du Canada de reproduire, preter, distribuer ou vendre des copies de cette these sous la forme de microfiche/film, de reproduction sur papier ou sur format electronique.
L'auteur conserve la propriete du droit d'auteur qui protege cette these. Ni la these ni des extraits substantiels de celle-ci ne doivent etre imprimes ou autrement reproduits sans son autorisation.
UNIVERSITY OF REGINA
FACULTY OF GRADUATE STUDIES AND RESEARCH
CERTIFICATION OF THESIS WORK
We, the undersigned, certify that Hanh Hong Nguyen, candidate for the Degree of Master of Science, has presented a thesis titled A Study of Neural Networks and Multiple Neural Networks in Making Short-Term and Long-Term Time-Series Prediction of Petroleum Production and Gas Consumption, that the thesis is acceptable in form and content, and that the student demonstrated a satisfactory knowledge of the field covered by the thesis in an oral examination held December 13, 2002.
External Examiner: Gordon Huang, Faculty of Engineering
Internal Examiners: Dr. Christine Chan, Supervisor
UNIVERSITY OF REGINA
FACULTY OF GRADUATE STUDIES AND RESEARCH
PERMISSION TO USE POSTGRADUATE THESIS
TITLE OF THESIS: A Study of Neural Networks and Multiple Neural Networks in Making Short-Term and Long-Term Time-Series Prediction of Petroleum Production and Gas Consumption
NAME OF AUTHOR: Hanh Hong Nguyen
DEGREE: Master of Science
In presenting this thesis in partial fulfillment of the requirements for a postgraduate degree from the University of Regina, I agree that the Libraries of this University shall make it freely available for inspection. I further agree that permission for extensive copying of this thesis for scholarly purposes may be granted by the professor or professors who supervised my thesis work, or in their absence, by the Head of the Department or the Dean of the Faculty in which my thesis work was done. It is understood that, with the exception of UMI Dissertations Publishing (UMI), any copying, publication or use of this thesis or parts thereof for financial gain shall not be allowed without my written permission. It is also understood that due recognition shall be given to me and to the University of Regina in any scholarly use which may be made of my material in my thesis.
SIGNATURE:
DATE: December 13, 2002
Abstract
Modeling data is difficult when the data for some variables are wholly or partially unavailable over the examined time span. Lacking those data, it is at times impossible to model causal relationships between those variables and the variable to be forecast. In such cases, a possible solution is univariate time series modeling, in which the historical data of the variable of interest alone are used to develop a model. In this thesis, a univariate time series approach, using solely the petroleum production data and the gas flow rate data respectively, is taken to construct two stand-alone feed-forward neural network forecasting models. The neural network approach was chosen for these tasks because of its ability to handle non-linearity and its freedom from a priori selection of mathematical models. The results of the experiments suggest that one-step-ahead forecasts can be made with reasonable accuracy.
A relatively novel outcome of this thesis is the integration of individual artificial neural networks into a single model that may produce better long-term predictions. Each component network is constructed to make direct forecasts a different number of time intervals ahead. The combination of individual artificial neural networks, called a multiple neural network model, propagates forward in steps of different lengths to make forecasts. Because of the varied step lengths, the number of recursion steps is expected to be smaller, and hence the cumulative error lower.
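The step-length idea can be illustrated with a small sketch. This is hypothetical code, not part of the thesis: the available step lengths (1, 2 and 4) and the greedy planner are assumptions made purely for illustration of why longer direct steps reduce the number of recursions.

```python
# Hypothetical illustration of the multiple-neural-network idea:
# given component networks that directly forecast 1, 2 or 4
# intervals ahead, cover a long horizon with as few recursive
# steps as possible, since each recursion compounds error.

def plan_steps(horizon, step_lengths=(1, 2, 4)):
    """Greedily decompose `horizon` into the available step
    lengths, longest first, minimizing the number of recursions."""
    plan = []
    remaining = horizon
    for step in sorted(step_lengths, reverse=True):
        while remaining >= step:
            plan.append(step)
            remaining -= step
    return plan

# A single one-step network needs 7 recursions to reach 7
# intervals ahead; the multiple-network plan needs only 3.
print(plan_steps(7))  # [4, 2, 1]
```

Each entry in the returned plan corresponds to one application of a component network, so a shorter plan means fewer opportunities for prediction error to accumulate.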
Acknowledgements
I wish to express my sincere thanks to Dr. Christine W. Chan for her supervision,
encouragement and financial support.
Thanks to Saskatchewan Energy and Mines and SaskEnergy for providing the data on the petroleum and gas consumption domains.
Special thanks go to Dr. Malcolm Wilson, Erik Nickel, Chris Gilboy and Dr. Gang Zhao
for providing their wise suggestions and expertise on the petroleum domain.
I am grateful to the Faculty of Graduate Studies and Research of the University of Regina
for providing the scholarships.
A warm word of thanks goes to my friends Tran Thi Minh Chau, Linhui Jiang and the current and past students at the ERU lab for their friendship and useful discussions.
Last but not least, I wish to express my gratitude to my parents and sister for their incredible encouragement and emotional support.
Table of Contents
ABSTRACT ii
ACKNOWLEDGEMENT iii
TABLE OF CONTENTS iv
LIST OF TABLES viii
LIST OF FIGURES ix
1. INTRODUCTION 1
1.1 MOTIVATION AND RESEARCH OBJECTIVES 1
1.2 THESIS STRUCTURE 3
2. BACKGROUND ON FORECASTING AND TIME SERIES FORECASTING 4
2.1 OVERVIEW OF FORECASTING 4
2.1.1 Forecasting System Framework 4
2.1.2 Forecasting Applications 6
2.1.3 Classification of Forecasting Models 7
2.1.4 Selection of a Forecasting Model 9
2.2 OVERVIEW OF TIME SERIES AND TIME SERIES FORECASTING 10
2.2.1 Time Series 10
2.2.2 Decompositions of Time Series 12
2.2.3 Performance Criteria 16
2.2.4 Time Series Forecast Techniques 18
2.2.4.1 Simple Exponential Smoothing 18
2.2.4.2 Holt-Winter 19
2.2.4.3 Univariate Box-Jenkins 20
2.2.4.4 Memory Based Reasoning 22
2.2.4.5 Artificial Neural Networks 25
3. METHODOLOGY: NEURAL NETWORKS AND MULTIPLE-NEURAL-NETWORK FRAMEWORK 28
3.1 BACKGROUND ON NEURAL NETWORKS 28
3.1.1 Artificial Neurons 29
3.1.2 Transfer Functions 31
3.2 BACK-PROPAGATION LEARNING PROCEDURE 32
3.2.1 Generalized Delta Rule and Gradient Descent 32
3.2.2 Back-propagation Formulae 33
3.2.3 Back-propagation Procedure 37
3.3 CONSIDERATIONS ON NEURAL NETWORK TOPOLOGY AND TRAINING PARAMETERS 38
3.4 LITERATURE REVIEW: MULTIPLE NEURAL NETWORKS APPROACHES 39
3.5 MULTIPLE NEURAL NETWORK APPROACH 45
3.5.1 Motivation 45
3.5.2 Structure of a Multiple Neural Network Model 47
3.6 TOOLS 49
3.6.1 NeurOn-line Tool-kit 49
3.6.2 Multiple Neural Network Tool 50
4. CASE STUDIES 61
4.1 PETROLEUM PRODUCTION PREDICTION 61
4.1.1 Data 64
4.1.1.1 Data Collection 64
4.1.1.2 Data Cleaning and Transformation 64
4.1.1.3 Data Set Manipulation 66
4.1.2 Using NeurOn-line 66
4.1.2.1 Development of a Model of Production Time Series and Geoscience Parameters 67
4.1.2.2 Development of a Model of Production Time Series Only 69
4.1.3 Using Multiple Neural Network 69
4.1.4 Results 71
4.1.4.1 NOL Model 71
4.1.4.2 Multiple-ANN and Single-ANN Models 73
4.1.5 Discussions 75
4.1.6 Conclusion and Future Works 76
4.2 HOURLY GAS FLOW PREDICTION 76
4.2.1 Data Collection and Pre-processing 79
4.2.2 Training and Validation 79
4.2.3 Testing 81
4.2.4 Discussions 83
4.2.5 Conclusion and Future Works 83
5. OBSERVATIONS AND DISCUSSIONS 84
5.1 DISCUSSIONS ON SUITABILITY OF TIME SERIES MODELLING IN FORECASTING 84
5.2 DISCUSSIONS ON USING THE NOL TOOL-KIT 85
5.3 DISCUSSIONS ON USING THE MNN TOOL 85
5.3.1 Reusing weights of lower-ordered ANNs 85
5.3.2 Using multi-step validation 86
5.3.3 Setting training parameters 87
5.3.4 Updating training parameters 88
6. CONCLUSION AND FUTURE WORKS 90
6.1 CONCLUDING SUMMARY 90
6.2 FUTURE WORKS 92
BIBLIOGRAPHY 95
APPENDIX A - RUNNING THE MNN TOOL 100
APPENDIX B - FORMATS OF PARAMETER AND DATA FILES FOR THE MNN TOOL 101
APPENDIX C - SAMPLE DATA 104
List of Tables
Table 4.1.1 Network Configuration - model 1 68
Table 4.1.2 Network configuration - model 2 69
Table 4.1.3 Sensitivities - model 1 73
Table 4.1.4 Sensitivities - model 2 73
Table C.1 Sample of oil production data 104
Table C.2 Sample of raw core analysis data 106
Table C.3 Sample of pressure data 107
Table C.4 Sample of flow rate data at Melfort station 107
List of Figures
Figure 2.1 Conceptual framework of a forecasting system 5
Figure 2.2 A broad classification of forecasting methods 8
Figure 2.3 Australian monthly production of basic iron 13
Figure 2.4 The time series from the previous figure after removing trend effect 13
Figure 2.5 A time series with seasonal effect 14
Figure 2.6 A sample k-d tree 24
Figure 3.1 A Multi-layer Artificial Neural Network 29
Figure 3.2 An Artificial Neuron 30
Figure 3.3 Activation functions 31
Figure 3.4 Layers in a feed-forward neural network 34
Figure 3.5 A sample MNN model 48
Figure 3.6 Classes of the neural network system of the MNN tool 50
Figure 3.7 Screen for inputting training parameters 53
Figure 3.8 Screen for inputting training parameters of component neural network 55
Figure 3.9 Screen for inputting testing parameters 57
Figure 3.10 Screen for inputting parameters for prediction 58
Figure 3.11 Screen for training output 59
Figure 3.12 Screen for inputting testing output 59
Figure 3.13 Screens for prediction output 60
Figure 4.1.1 Well production history 63
Figure 4.1.2 Elimination of incomplete records 65
Figure 4.1.3 Predicted vs. target - model 1 71
Figure 4.1.4 Predicted vs. target - model 2 72
Figure 4.1.5 Test errors for MNN and Single ANN for different prediction periods 74
Figure 4.1.6 Desired vs. predicted outputs 74
Figure 4.2.1 Schematic of St. Louis East system 77
Figure 4.2.2 Hourly flow during a day 78
Figure 4.2.3 Validated RMSE of 5 models for 24 hour period 80
Figure 4.2.4 Test errors for MNN and single ANN for 24 hour period 81
Figure 4.2.5 Predicted vs. actual for 24 hours ahead 82
Figure 4.2.6 Predicted vs. actual for 6 hours ahead 82
Figure 5.1 Side effect of large validation window 88
Figure A.1 Main screen of the MNN tool 100
Figure 4.1.2 Elimination of incomplete records......................................................................65
Figure 4.1.3 Predicted vs. target - model 1............................................................................. 71
Figure 4.1.4 Predicted vs. target - model 2............................................................................. 72
Figure 4.1.5 Test errors for MNN and Single ANN for different prediction periods 74
Figure 4.1.6 Desired vs. predicted outputs.............................................................................. 74
Figure 4.2.1 Schematic of St. Louis East system.................................................................... 77
Figure 4.2.2 Hourly flow during a day.....................................................................................78
Figure 4.2.3 Validated RMSE of 5 models for 24 hour period..............................................80
Figure 4.2.4 Test errors for MNN and single ANN for 24 hour period............................... 81
Figure 4.2.5 Predicted vs. actual for 24 hours ahead.............................................................. 82
Figure 4.2.6 Predicted vs. actual for 6 hours ahead................................................................ 82
Figure 5.1 Side effect of large validation window................................................................. 88
Figure A .l Main screen of the MNN tool.............................................................................. 100
Reproduced with permission of the copyright owner. Further reproduction prohibited without permission.
X
Chapter 1
Introduction
1.1 Motivation and Research Objectives
Forecasting is a key element of decision-making. The effectiveness of a decision often
depends heavily on events that occur after the decision. Therefore, the ability to
accurately predict the uncontrollable aspects of these events should improve the choice
that the decision-maker makes.
Time series modeling is a quantitative forecasting method. A time series is a
collection of observations made sequentially in time [Ch75]. In time series forecasting,
historical data is analyzed to identify common data patterns and develop a model that can
later be used for prediction of future values. Time series modeling has been applied in
various areas of business, engineering and science. Our study focuses on univariate time
series forecasting, where the input variables are delay lags of the outputs.
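The delay-lag formulation above can be sketched in a few lines of Python. The function name `make_lagged` and the window length `p` are illustrative choices, not part of this thesis:

```python
# Build (input, target) pairs for univariate forecasting, where the
# inputs are the p most recent lagged values of the series itself.
# A minimal sketch; the function name and window length are illustrative.

def make_lagged(series, p):
    """Return (inputs, targets): inputs[i] holds p consecutive lags,
    targets[i] is the value one step after that window."""
    inputs, targets = [], []
    for i in range(len(series) - p):
        inputs.append(series[i:i + p])   # x(t-p), ..., x(t-1)
        targets.append(series[i + p])    # x(t)
    return inputs, targets

# Example: a short monthly-production-like series with p = 3 lags.
series = [10.0, 9.5, 9.1, 8.8, 8.6, 8.5]
X, y = make_lagged(series, 3)
# X[0] == [10.0, 9.5, 9.1] and y[0] == 8.8
```

Each input/target pair produced this way becomes one training example for a network with p input units and one output unit.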
The neural network technique was chosen for this research mainly because it is free
from a priori selection of mathematical models while Box-Jenkins, one of the most
widely used statistical techniques for forecasting, requires it. Model selection involves
examining various graphs based on the transformed data to try to identify potential
mathematical models that might provide a good fit to the data. Other advantages of neural
networks include the ability to learn from examples, the ability to capture non-linear
structure, their parallel computations, and their fault tolerance via redundant information
coding.
Neural computing is an area of artificial intelligence first developed in the 1940s. It
suffered a period of stagnation in the early 1970s after some limitations of the simple
perceptron were exposed in 1969 [Wi92]. In the late 1970s, neural networks received a
considerable renewal of interest due to several improvements in network structures and
algorithms. The back-propagation algorithm utilized in this thesis is one
paradigm developed in this period.
There have been a reasonable number of successful neural network applications in
time series forecasting, e.g., a financial application in [Wa01] and an industrial application
in [DSMV01]. This thesis examines whether the neural network technique is suitable
for the applications of petroleum production and gas consumption prediction.
Both the petroleum production and gas consumption prediction applications require
prediction of multiple units ahead. During the development process, we found that
recursively applying a neural network that makes one-step-ahead forecasts is not sufficient
for long-term or multiple-step-ahead forecasts. The short-term and long-term trends of a
time series are often different from each other. If we use only one short-term neural
network recursively to predict long term, the results could be very inaccurate. Therefore,
we propose a multiple neural network model in which short-term and long-term neural
networks are combined to estimate over a wide range of prediction terms.
A multiple neural network (MNN) is a group of neural networks, each of which is
trained for the purpose of predicting different terms. The ultimate goal of this
combination is to see if we can improve the accuracy of long-term forecasts.
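The idea can be sketched as follows, with each component network reduced to a plain Python function. The dispatch rule, the horizon boundary, and all names here are illustrative, not the exact design developed later in the thesis:

```python
# Sketch of a multiple neural network (MNN): each component model is
# trained for a different prediction term, and a forecast q steps ahead
# is delegated to the component whose term range covers q.
# The component models are stubbed as simple functions for illustration.

def short_term_net(history):
    # Stand-in for a network trained on short-term (one-step) patterns:
    # extrapolates the most recent change.
    return history[-1] + (history[-1] - history[-2])

def long_term_net(history):
    # Stand-in for a network trained on the long-term trend:
    # returns the historical mean.
    return sum(history) / len(history)

def mnn_predict(history, q, short_horizon=6):
    """Route the q-step-ahead forecast to the appropriate component."""
    if q <= short_horizon:
        return short_term_net(history)
    return long_term_net(history)
```

In the real model each stub would be a trained feed-forward network; only the routing by prediction term is the point of this sketch.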
In summary, the objectives of this research include:
• Investigating the feasibility of using feed-forward neural networks and time series
modeling in two applications to forecast petroleum production and gas consumption,
and
• Investigating whether grouping neural networks into a model improves the forecast
performance of a single neural network in long-term forecasting.
1.2 Thesis Structure
This thesis consists of six chapters. Chapter 1 gives a brief introduction of the thesis.
Chapter 2 provides an overview of forecasting in general and time-series forecasting in
particular. Chapter 3 presents the fundamentals of the artificial neural network technique
and reviews existing multiple neural network approaches in the literature. Chapter 4
contains details of the two case studies of developing neural network applications for
prediction of petroleum production and gas consumption. Chapter 5 provides some
discussion based on the case studies in Chapter 4. Chapter 6 draws conclusions for
this thesis and gives recommendations for further research work.
Chapter 2
Background on Forecasting and Time Series Forecasting
This chapter provides background literature on forecasting and time series forecasting.
Section 2.1 gives an overview on a framework for and a classification of forecasting
systems, and reviews some forecasting applications. Section 2.2 focuses on a type of
forecasting called time series modeling. Several time series forecasting techniques will be
reviewed and discussed.
2.1 Overview of Forecasting
According to [CN03], forecasting and prediction belong to a sub-category within the
general taxonomy of tasks and share the objective of foretelling future events. They
involve making decisions when one does not know with certainty the effect of those
decisions due to, for example, the randomness of future events. Often, prediction makes use
of past and current data with known values to assign explicit values on some unknown or
future data. When expert opinion or heuristics are combined with historical data, it is
called forecasting. For example, a forecast or predictive model can be built using the
payment history of people to whom you have given loans to help identify people who are
likely to default on loans. In this study, the terms forecasting and prediction are used
interchangeably.
2.1.1 Forecasting System Framework
In general, development of a forecasting system consists of two main phases: modeling
(or training) and forecasting (or transfer) [Ru95] [Po89]. At the modeling phase, a
forecasting model is constructed from available data and theory. In some cases, a theory
exists that can suggest particular models. In most situations, however, an empirical model
is built from historical data. At the forecasting phase, the model is used to forecast. The
stability of the forecasting model can be assessed by checking the forecasts against
observations. If forecast errors are high, it is possible that the forecast environment is
different from the model development environment. In this case, adaptation of the model
to the new situation is needed. The modeling and forecasting phases may not be explicitly
separated. They can in fact be combined by presenting unfamiliar stimuli at several
points during the training phase so that the model's knowledge of the patterns is tested as
it progresses in learning [Po89]. If the model is unsatisfactory, it has to be re-specified,
tested again and so on until an adequate model is found. Figure 2.1 illustrates the
conceptual framework of a forecasting system.
[Figure: flowchart of the modeling phase (theory and/or previous study, model specification, data, model estimation, adequacy check) feeding the forecasting phase (forecast generation, new observation, stability check)]
Figure 2.1 Conceptual framework of a forecasting system [Ru95]
One approach for building a forecasting model is to use past data to construct a
function that can be used to make predictions under very general circumstances [Ru95].
However, this approach cannot always be carried out in practice. In some cases, the
underlying principles are unknown or poorly understood because the system of interest is
very complicated. Another problem with this approach is that even when the basic laws
are known, it is often not possible to forecast without detailed information about initial
values and boundary conditions. Forecasting models are often based on an assumption
that a well-defined relationship exists between the past and future values of a single
observable [Ru95].
2.1.2 Forecasting Applications
Forecasting is a key element of management decision-making. Since the ultimate
effectiveness of any decision depends upon a sequence of events following the decision,
the ability to predict the uncontrollable aspects of these events prior to making the
decision should permit an improved choice over that which would otherwise be made
[MJG90].
Examples of situations where forecasts are useful are production planning, financial
planning, staff scheduling and facilities planning. To plan the manufacture of a product
line, it could be necessary to forecast unit sales for each item by delivery period in the future.
This forecast can then be converted into requirements for materials, labor, facilities, etc.,
so that the entire manufacturing system can be scheduled and the required investment can
be justified. Forecasting also plays an important part in process control. By simulating the
future behavior of a process, it may be possible to determine the optimal time and the
level of control actions.
Forecasting technology, principles and applications have been developed in many
different fields such as economics, meteorology, environmental management and control.
Some sample applications are listed as follows. Wu and Lu [WL93] forecasted the trend
of stock market performance. Tangang et al. [THT97] used neural networks to forecast the
sea surface temperatures of the equatorial Pacific. Gardner and Dorling [GD99], Boznar
et al. [BLM93] and Yi and Prybutok [YP92] modeled and predicted short-term air
concentration and ozone concentration based on basic meteorological data. In Kao and
Huang [KH00], a model was developed relating peak pollutant concentrations to
meteorological and emission variables and indices. Guhathakurta et al. [GRT99] and
Shahai et al. [SSS00] made forecasts on Indian summer monsoon rainfall crucial for
proper agriculture planning. Yasdi [Ya99] predicted daily road traffic flow in an effort to
assist the traffic control center. Swiercz et al. [SMKLS00] made predictions on
intracranial pressure, which provided valuable information on the condition of
neurosurgical patients. Utility demand forecasts are discussed in Lertpalangsunti and
Chan [LC98], Lertpalangsunti et al. [LCMT99] and Chiu et al. [CLC97].
2.1.3 Classification of Forecasting Models
Forecasting models can be broadly classified as qualitative or quantitative, depending
upon the extent to which mathematical and statistical methods are used. Quantitative
models belong to either time series or causal categories. Figure 2.2 illustrates a broad
classification of forecasting methods described in [Od83]. Each of the methods is
discussed as follows.
[Figure: tree diagram dividing forecasting methods into qualitative methods and quantitative methods, with quantitative methods subdivided into time series and causal]
Figure 2.2 A broad classification of forecasting methods
• Qualitative forecasting methods generally use the intuitive opinions of experts to
predict future events subjectively. These opinions may or may not depend on past
data. Usually someone else cannot reproduce these forecasts because the forecaster
does not specify explicitly how the available information is incorporated into the
forecast.
• Quantitative forecasting methods are based on mathematical or statistical models.
They involve the analysis of historical data in an attempt to predict future values of a
variable of interest. Once the underlying model has been chosen, the future forecasts
are determined automatically; they are fully reproducible by any forecaster. Basically,
quantitative forecasting models fall into two fairly well defined categories: the time
series model and the explanatory or causal model.
• In time series models, historical data on the predicted variable are analyzed in an
attempt to identify a data pattern. Then assuming that it will continue in the future,
this pattern is extrapolated to produce a forecast.
• Causal models relate the dependent variables to a number of independent variables.
After a model that describes the relationship between these variables has been
developed, it can be used to forecast the values of the dependent variables of interest.
The empirical evidence reported suggests that causal models do not provide
significantly more accurate forecasts than the time series models, even though the former
are more complex and expensive [HG93].
2.1.4 Selection of a Forecasting Model
The following are some of the main considerations in choosing a forecasting model
[Ru95].
• Required degree of accuracy
• Forecasting horizon
• Forecasting cost
• Degree of complexity
• Availability of data
Some techniques are better than others in making short-term or long-term forecasts.
Hence the forecasting horizon should be taken into consideration when the forecasting
techniques are determined.
In some cases only coarse forecasts are required, in others highly accurate forecasts
are essential. The degree of accuracy depends on the consequence of making wrong
forecasts.
The purpose of forecasting is to reduce the risk in decision-making. Forecasts are
usually erroneous, but the magnitude of the errors depends upon the forecasting system
used. By investing more resources in forecasting, the forecasting accuracy may be
improved, thereby eliminating some of the loss due to uncertainty in the decision-
making process. The optimal situation occurs when the total cost of the resources used for
forecasting and the loss due to bad forecasting is minimal.
However, additional resources devoted to forecasting do not always bring an
improvement in accuracy. In the case where two models give similarly good results, the
less complex model should be chosen [KR94].
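This accuracy-versus-complexity trade-off can be illustrated with a small selection rule: among candidate models whose validation error is within a tolerance of the best, prefer the least complex. The tolerance value and the use of parameter count as the complexity measure are illustrative assumptions, not criteria prescribed in this thesis:

```python
# Choose a forecasting model: among candidates whose validation RMSE is
# within `tol` of the best, prefer the one with the fewest parameters.
# A sketch only; `tol` and the (rmse, n_params) figures are illustrative.

def select_model(candidates, tol=0.05):
    """candidates: dict mapping model name -> (validation_rmse, n_params)."""
    best_rmse = min(rmse for rmse, _ in candidates.values())
    near_best = {name: v for name, v in candidates.items()
                 if v[0] <= best_rmse + tol}
    # Tie-break on complexity: fewest parameters wins.
    return min(near_best, key=lambda name: near_best[name][1])

candidates = {
    "small_net": (0.42, 25),   # slightly worse RMSE, far fewer weights
    "large_net": (0.40, 400),  # best RMSE, many more weights
}
# select_model(candidates) -> "small_net"
```

The same rule generalizes to any complexity measure (training cost, data requirements) by swapping the tie-break key.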
To construct accurate empirical forecasting models, suitable data should be available.
However, it is not always possible to obtain the necessary data with reasonable cost.
2.2 Overview of Time Series and Time Series
Forecasting
2.2.1 Time Series
A time series is a collection of observations made sequentially in time [Ch75]. In formal
terms, a time series is a sequence of vectors, depending on time t [Do96]:
x(t), t = 0, 1, ...
The components of the vectors can be any observable variable, such as the
temperature of a building, the total monthly production of an oil well, the gas
consumption in a given area, or the population of a certain country. Strictly speaking, the
time index t must be a non-negative integer but this restriction can be relaxed sometimes
and in some literature, t can have negative values.
A time series is said to be continuous when observations are made continuously in
time and discrete when observations are taken only at specific times, usually equally
spaced. Discrete time series can arise in several ways. One way is to sample a
continuous time series, usually at equal intervals of time; the result is called a sampled
time series. Another type of discrete series occurs when a variable does not have an
instantaneous value but we can aggregate or accumulate the values over equal intervals of
time.
Time series forecasting consists of estimating the unknown parameters in the
appropriate model and using these estimated parameters, projecting the model into the
future to obtain a forecast [MJG90]. Suppose x(t), t = 0, 1, ..., n is an observed time series.
The problem is to estimate x_{n+q}. The prediction of x_{n+q} made at time n, q steps
ahead, will be denoted x̂(n, q). The integer q is called the lead time.
To forecast time series, it is necessary to represent the behavior of the process with a
mathematical model that can be extended into the future. It is required that the model be a
good representation of the observations in any local segment of the time close to the
present.
If a time series can be predicted exactly, it is said to be deterministic. But most real-
world time series are stochastic in the sense that the future is only partly determined by
past values. Exact predictions are impossible for such time series. Unknown and
uncontrollable factors, called noise, account for the errors. In some studies, the
characteristics of the noise are assumed in order to include noise in the modeling process.
The forecasting period is the basic unit of time for which the forecasts are made. For
example, when forecasts are made every week, the period is a week.
The forecasting horizon is the number of periods in the future covered by the forecast.
When a forecast is required for the next 10 weeks, broken down by week, the period is a
week and the horizon is 10 weeks. Forecasts typically become less accurate with
increasing forecast horizon. Sometimes the term lead time is used in place of forecast
horizon.
The forecasting interval is the frequency with which new forecasts are prepared. In
most cases the forecasting interval is the same as the forecasting period.
2.2.2 Decomposition of Time Series
Generally speaking, every real-life time series has fluctuations. The fluctuations in the data
are caused by many diverse and complex factors. Decomposition is a basic method to
analyze a time series, which attempts to group these factors into categories. A time series
is usually regarded as the combination of four meaningful components:
1. Trend component
2. Cyclical component
3. Seasonal component
4. Irregular or random component
Trend refers to the general direction in which the plot of a time series appears to be
rising or falling over a long period of time. Trend is a result of factors that produce a
steady and gradual change over time. A linear trend can be removed from a time series
x by replacing it with a series x' consisting of the differences between subsequent
values:
x'(t) = x(t) - x(t - 1)
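The differencing step above can be sketched in a few lines; this is a minimal illustration (the function name and the sample series are mine, not from the thesis):

```python
def difference(x, lag=1):
    """Return the differenced series x'(t) = x(t) - x(t - lag).

    The result is `lag` elements shorter than the input.
    """
    return [x[t] - x[t - lag] for t in range(lag, len(x))]

# A series with a pure linear trend, x(t) = 3t + 2:
x = [3 * t + 2 for t in range(6)]   # [2, 5, 8, 11, 14, 17]
print(difference(x))                # [3, 3, 3, 3, 3] -- the trend is removed
```

On a purely linear series the differenced values are constant, which is exactly what "removing the trend" means here.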
Figure 2.3 shows the monthly production of basic iron from 1956 to 1995 in Australia
(source: http://www-personal.buseco.monash.edu.au/~hyndman/TSDL/production.html,
file BASIRON.DAT, downloaded April 2002). The time series exhibits a close-to-linear
rising trend.
[Figure: line plot titled "Australian monthly production of basic iron 1956-1995"; x-axis: months, y-axis: production]
Figure 2.3 Australian monthly production of basic iron
After removing the trend effect, the plot fluctuates around the X-axis as shown in
Figure 2.4.
[Figure: line plot titled "Trend effect removed"; x-axis: month]
Figure 2.4 The time series from the previous figure after removing trend effect
While the trend is moving slowly upward or downward, there are oscillations or
fluctuations in a wave-like manner above and below the long-term trend line. These
wave-like cycles are called cyclical effects. These cycles tend to be recurrent but not
periodic, i.e. they may or may not follow exactly the same pattern after equal intervals of
time. An important example of a cyclical effect in a time series is the business cycle, which
represents intervals of prosperity, recession, depression and recovery.
Seasonal effect is not only recurrent but also periodic, and therefore predictable.
Seasonal component refers to the identical or almost identical patterns that a time series
appears to follow during corresponding months of successive years. Such movements are
due to periodic influencing factors; e.g., the Christmas holiday influences gift sales. A seasonal
effect is also easy to eliminate by computing the differences between corresponding
sequence elements:
x'(t) = x(t) - x(t - s)
where s is the seasonal period. For example, the water consumption in a customer area
shows similar patterns on corresponding days of the week, and s is 7 in this case.
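Seasonal differencing is the same operation with the seasonal period as the lag; a minimal sketch with a weekly (s = 7) pattern (names and data are mine, not from the thesis):

```python
def seasonal_difference(x, s):
    """Return x'(t) = x(t) - x(t - s), removing a seasonal effect of period s."""
    return [x[t] - x[t - s] for t in range(s, len(x))]

# Two weeks of daily values with a weekly (s = 7) pattern plus a level shift.
week = [5, 4, 4, 4, 4, 6, 7]
x = week + [v + 1 for v in week]            # second week is one unit higher
print(seasonal_difference(x, 7))            # [1, 1, 1, 1, 1, 1, 1]
```

The repeating weekly shape cancels out, leaving only the level change between the two weeks.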
Figure 2.5 A time series with seasonal effect
All other kinds of fluctuations are grouped into a category called irregular or random
component. This refers to the erratic motions of time series due to the unusual and
unpredictable events, such as natural disasters or wars. Although the duration of the
irregulars is usually short, it may be severe in amplitude.
A time series may contain any combination of the four components, or none at all.
The analysis of a time series consists of a description of the components present. There
are many ways to formulate a model of a time series. The two most common
mathematical models are:
• Additive Model: Y = T + C + S + R, and
• Multiplicative Model: Y = T x C x S x R ,
where: T: the trend component
C: the cyclical component
S: the seasonal component
R: the random component
In practice, mixtures of multiplicative and additive components are also possible. The additive
model can be easier to handle, but the multiplicative model may often be more
appropriate. In practice, the decision as to which method of decomposition should be
assumed depends on the degree of success achieved in applying the assumption [Ru95].
A multiplicative model may be handled within the additive framework by taking
logarithms of the components. The seasonal effect in Figure 2.4 is multiplicative.
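The log-transform trick can be checked numerically; a minimal sketch with hypothetical, strictly positive components (the values are illustrative only):

```python
import math

# Hypothetical strictly positive components of a multiplicative series Y = T x S x R
T = [10.0, 11.0, 12.0]     # trend
S = [1.2, 0.8, 1.0]        # seasonal factor
R = [1.01, 0.99, 1.0]      # random factor

Y = [t * s * r for t, s, r in zip(T, S, R)]

# After taking logarithms the model is additive: log Y = log T + log S + log R
for y, t, s, r in zip(Y, T, S, R):
    assert abs(math.log(y) - (math.log(t) + math.log(s) + math.log(r))) < 1e-12
print("multiplicative model becomes additive in logs")
```

Note the transform requires every component to be strictly positive, which is typical for production or consumption series.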
Distinguishing between the components is usually not easy. Often the components are
so integrated that they are inseparable [Ru95].
The trend, cyclical and seasonal components are considered deterministic while the
random component is at best probabilistic. Accurate forecasts of future values can be
expected only when the random variation, as measured by its variance, is small.
Otherwise, the fluctuations of the random variation over time may overwhelm the effect
of the other components or even cancel them out entirely [Ru95].
2.2.3 Performance Criteria
There are many different ways to define prediction error; each has advantages
and disadvantages and is used in different circumstances. Generally speaking, the closer
the forecasts ŷ_t are to the actual values y_t of the series, the more accurate the forecasting
model is.
The most fundamental way to measure error is to calculate the difference between
actual and forecast values. The result is called absolute true error (ATE).
ATE = y_t - ŷ_t
The weakness of ATE is that it does not give any idea of how serious the error is relative
to the magnitude of the variable to be predicted. For example, an error of a meter is
unlikely to be a problem when estimating the dimensions of a wheat field, but it could be
significant when estimating the dimensions of a table.
Relative true error (RTE) gives an idea of the relative magnitude of the error. It is defined
as the ratio of the absolute true error to the actual value.
RTE = (y_t - ŷ_t) / y_t
One drawback of using RTE arises when the actual value is extremely small, since the
division by this value will tend to seriously inflate RTE.
Based on the above two basic error measurements, the following are the most
commonly used measures of forecast accuracy.
• Mean Absolute Error (MAE) is defined as the average of the magnitudes of the
absolute true errors.
MAE = (1/n) Σ_{t=1}^{n} |y_t - ŷ_t|
• Mean Absolute Percentage Error (MAPE) is defined as the average of the magnitudes
of the relative true errors:
MAPE = (1/n) Σ_{t=1}^{n} (|y_t - ŷ_t| / y_t) × 100%
• Mean Square Error (MSE) is defined as the mean of the squared residuals:
MSE = (1/n) Σ_{t=1}^{n} (y_t - ŷ_t)^2
• Root Mean Square Error (RMSE) is defined as the positive square root of the mean
square error. It is also called the standard error of estimate.
RMSE = √((1/n) Σ_{t=1}^{n} (y_t - ŷ_t)^2)
The basic difference between MAE and MSE (or RMSE) is that the latter squares the
ATE and, by doing so, penalizes large errors more heavily than the former does. Thus,
MAE is an appropriate measure of forecast accuracy when the costs of forecast errors
increase linearly with the size of the error, while MSE and RMSE are better when large
errors are disproportionately costly [Ru95].
Whereas the MAE, MSE and RMSE have dimensions, the MAPE is unit-less.
Therefore, it is particularly useful for comparing the performance of a model on many
different time series. However, due to the drawback of the RTE component in the MAPE
equation, it is not advisable to use MAPE in circumstances where a series has
extremely small terms.
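The four accuracy measures can be computed together; this is a hedged sketch (the function name and sample values are mine), following the definitions above, with the actual value y_t in the MAPE denominator:

```python
def forecast_errors(actual, forecast):
    """Return (MAE, MAPE, MSE, RMSE) for paired actual/forecast values.

    MAPE is in percent and, per the caveat above, misbehaves when
    actual values are extremely small.
    """
    n = len(actual)
    mae = sum(abs(a - f) for a, f in zip(actual, forecast)) / n
    mape = 100.0 * sum(abs(a - f) / a for a, f in zip(actual, forecast)) / n
    mse = sum((a - f) ** 2 for a, f in zip(actual, forecast)) / n
    rmse = mse ** 0.5
    return mae, mape, mse, rmse

actual = [100.0, 200.0, 300.0]
forecast = [110.0, 190.0, 300.0]
print(forecast_errors(actual, forecast))
# MAE ≈ 6.67, MAPE = 5.0 (%), MSE ≈ 66.67, RMSE ≈ 8.16
```

Note how MSE weighs the two equal-sized errors the same way MAE does here, but a single large error would dominate MSE far more than MAE.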
2.2.4 Time Series Forecast Techniques
No single technique can be applied in every situation. A few of the widely used
forecasting techniques for univariate time series are outlined below
[Od93][Ch75][CN03].
2.2.4.1 Simple Exponential Smoothing
Smoothing techniques remove random variation and reveal trends and cyclic
components. The simple exponential smoothing method can be applied only to stationary
time series, that is, time series with trend and seasonal effects removed. In this technique,
a weighted average of recently observed values of the variable of interest is used as a
forecast.
x̂(n, 1) = c_0 x_n + c_1 x_{n-1} + c_2 x_{n-2} + ...   (2.1)
Weights {c_i} from the most recent to the older values are calculated as exponentially
decreasing values. The weights are expressed as below in order to have a total sum of one:
c_i = α(1 - α)^i,   i = 0, 1, ...
where α is a constant in the open range (0, 1). Equation (2.1) becomes
x̂(n, 1) = αx_n + α(1 - α)x_{n-1} + α(1 - α)^2 x_{n-2} + ...   (2.2)
or
x̂(n, 1) = αx_n + (1 - α)[αx_{n-1} + α(1 - α)x_{n-2} + ...] = αx_n + (1 - α)x̂(n - 1, 1)   (2.3)
If we set x̂(1, 1) = x_1, then equation (2.3) can be used recursively to compute forecasts. A
forecast is calculated based on the latest observation and the previous forecast. The
choice of a is made to minimize the MSE on past data.
This method takes little time to develop, requires a minimal amount of data, and is
easily understood by users. It is fully automatic. There is no need for expert opinion in
developing such a model. Simple exponential smoothing is widely used in immediate and
short-term forecasting because this method is fast and relatively inexpensive.
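The recursion in equation (2.3) lends itself to a few lines of code; a sketch under the initialization x̂(1, 1) = x_1 described above (the function name and sample data are mine):

```python
def ses_forecasts(x, alpha):
    """One-step-ahead forecasts from equation (2.3):
    xhat(n, 1) = alpha * x[n] + (1 - alpha) * xhat(n - 1, 1),
    initialized with xhat(1, 1) = x[0]. forecasts[i] predicts x[i + 1].
    """
    forecasts = [x[0]]                  # xhat(1, 1) = x_1
    for t in range(1, len(x)):
        forecasts.append(alpha * x[t] + (1 - alpha) * forecasts[-1])
    return forecasts

print(ses_forecasts([10.0, 12.0, 11.0, 13.0], alpha=0.5))   # [10.0, 11.0, 11.0, 12.0]
```

In practice α would be chosen, as the text notes, to minimize the MSE of these one-step-ahead forecasts on past data.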
2.2.4.2 Holt-Winters
This method is a more sophisticated and generalized version of exponential smoothing in
which allowance is made for trend and seasonal patterns in the data. The Holt-Winters
method has three updating equations to smooth three components: level or overall, trend
and seasonal effect. The equations are intended to give more weight to recent
observations and less weight to observations further in the past. These weights are
geometrically decreasing by a constant ratio. Each equation has coefficients based on
constants that range from 0 to 1.
The sets of equations are different for the additive and multiplicative models. For the
multiplicative model, the basic equations are as follows.
m_t = α x_t / s_{t-s} + (1 - α)(m_{t-1} + r_{t-1})   Overall smoothing
s_t = β x_t / m_t + (1 - β) s_{t-s}   Seasonal smoothing
r_t = γ(m_t - m_{t-1}) + (1 - γ) r_{t-1}   Trend smoothing
x̂(t, h) = (m_t + h r_t) s_{t-s+h}   Forecast
where
• x is the observation
• m is the smoothed overall (level) component
• s_t is the smoothed seasonal index
• r is the trend factor
• s is the seasonal period, i.e., the number of observations covered by one season
• x̂(t, h) is the forecast at h periods ahead
• t is an index denoting a time period
• α, β and γ are constants that must be estimated in such a way that the MSE of the
error is minimized
This method has all the advantages of simple exponential smoothing, but it tends to
be more accurate.
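The three updating equations and the forecast equation can be sketched as follows. The initialization scheme here (level from the first season's mean, zero initial trend, seasonal indices from the first season) is my own simple choice, not prescribed by the thesis, and in practice α, β and γ would be chosen to minimize the MSE on past data:

```python
def holt_winters_multiplicative(x, s, alpha, beta, gamma, horizon):
    """Multiplicative Holt-Winters smoothing and forecasting (a sketch).

    x: observed series, s: seasonal period, horizon: steps ahead to forecast.
    """
    m = sum(x[:s]) / s                          # initial level m
    r = 0.0                                     # initial trend r
    season = [x[i] / m for i in range(s)]       # initial seasonal indices s_t
    for t in range(s, len(x)):
        m_prev = m
        # overall smoothing: m_t = alpha*x_t/s_{t-s} + (1-alpha)*(m_{t-1} + r_{t-1})
        m = alpha * x[t] / season[t % s] + (1 - alpha) * (m_prev + r)
        # seasonal smoothing: s_t = beta*x_t/m_t + (1-beta)*s_{t-s}
        season[t % s] = beta * x[t] / m + (1 - beta) * season[t % s]
        # trend smoothing: r_t = gamma*(m_t - m_{t-1}) + (1-gamma)*r_{t-1}
        r = gamma * (m - m_prev) + (1 - gamma) * r
    # forecast: xhat(t, h) = (m_t + h*r_t) * s_{t-s+h}
    return [(m + h * r) * season[(len(x) + h - 1) % s] for h in range(1, horizon + 1)]

# A perfectly periodic series (period 2): the forecasts reproduce the pattern.
series = [10.0, 20.0, 10.0, 20.0, 10.0, 20.0]
print(holt_winters_multiplicative(series, s=2, alpha=0.5, beta=0.3, gamma=0.2, horizon=2))
# ≈ [10.0, 20.0]
```

On a purely periodic series with no trend the level and seasonal indices stay fixed, so the forecasts simply continue the pattern regardless of the smoothing constants.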
2.2.4.3 Univariate Box-Jenkins
In the Box-Jenkins approach, a class of models referred to as autoregressive integrated
moving average (ARIMA) models is examined and an appropriate model is selected for
forecasting. In an autoregressive process, each observation is made up of a random error
component and a linear combination of prior observations. In a moving average process,
each observation is made up of a random error component and a linear combination of
previous random error components. These two processes are independent of each
other.
The Box-Jenkins modeling procedure involves five steps: data preparation, model
selection, parameter estimation, model validation, and forecasting.
Step 1: Data preparation involves transformations and differencing.
Transformation operations such as square roots or logarithms can stabilize the variance
in a series where the variation changes with the level. Then the data are differenced until
patterns such as trend or seasonality are totally removed from the data. Differencing
means taking the difference between consecutive observations or between observations a
time period apart. The stationary data are often easier to model than the original data.
Step 2: Model selection involves examining various graphs of the transformed and
differenced data to try to identify potential ARIMA processes that might provide a
good fit to the data.
Step 3: Parameter estimation means finding the values of the model coefficients
that provide the best fit to the data. There are sophisticated computational algorithms
designed to do this.
Step 4: Model validation involves testing the assumptions of the model to identify
any areas where the model is inadequate. If the model is found to be inadequate, it is
necessary to go back to step 2 and try to identify a better model.
Step 5: Forecasting is the step after the model has been selected, estimated and
validated. In this step, forecasts are computed using the model.
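As a toy illustration of step 3 (parameter estimation), the simplest autoregressive model, AR(1) with x(t) = φ x(t-1) + e(t), can be fitted by least squares. The function and data are illustrative only; real ARIMA estimation uses the more sophisticated algorithms mentioned above:

```python
def fit_ar1(x):
    """Least-squares estimate of phi in x(t) = phi * x(t-1) + e(t)
    (a zero-mean, or mean-removed, series is assumed):
    phi = sum x(t)*x(t-1) / sum x(t-1)^2."""
    num = sum(x[t] * x[t - 1] for t in range(1, len(x)))
    den = sum(x[t - 1] ** 2 for t in range(1, len(x)))
    return num / den

# Noise-free AR(1) with phi = 0.5: each value is half the previous one.
series = [16.0, 8.0, 4.0, 2.0, 1.0]
phi = fit_ar1(series)
print(phi)                    # 0.5
print(phi * series[-1])       # one-step-ahead forecast: 0.5
```

With noise present the estimate would only approximate the true coefficient, and model validation (step 4) would check the residuals for remaining structure.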
The contribution of Box and Jenkins was in developing a systematic methodology for
identifying and estimating models that could incorporate both autoregressive and moving
average approaches. This makes Box-Jenkins models a powerful class of models. They
are quite flexible due to the inclusion of both autoregressive and moving average terms. It
is usually possible to find a process that provides an adequate description of the data
[Ch75]. This method provides the most accurate forecasts for immediate and short-term
forecasting [Od93].
A disadvantage of the Box-Jenkins approach is that experts' experience is required in
identifying a suitable model. Unlike the Holt-Winters method, the Box-Jenkins process is not
automatic. Another disadvantage of ARIMA models is that there is not a convenient way
to update the model parameters when new observations arrive. This method also requires
a moderately long series to fit a model to the data. Montgomery [MLJ90] recommends at
least 50 and preferably 100 observations.
2.2.4.4 Memory-Based Reasoning
Memory-Based Reasoning (MBR), also known as Case-Based Reasoning, is a form of the k-Nearest Neighbor (KNN) technique. MBR attempts to classify new data points by finding
their nearest neighbors in the state space. The concept of nearest neighbor is based on the
similarity between the pattern of interest and the patterns in the historical database.
A number of distance metrics such as Hamming and Euclidean can be used to
measure the level of similarity. There are several algorithms to reduce the search
complexity for the nearest neighbors. Two classical algorithms are bucketing and k-d
trees.
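Before turning to those accelerated searches, the basic MBR step of ranking stored patterns by distance and letting the nearest k vote can be sketched as follows; the data, labels, and choice of k here are illustrative, not from the thesis:

```python
# Hypothetical sketch of Memory-Based Reasoning as k-nearest-neighbor
# classification; the "historical database" and k are made-up values.
import math
from collections import Counter

def euclidean(a, b):
    # Euclidean distance between two equal-length patterns.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_classify(query, database, k=3):
    # database: list of (pattern, label) pairs.
    # Rank all stored patterns by distance to the query pattern.
    neighbors = sorted(database, key=lambda rec: euclidean(query, rec[0]))[:k]
    # Majority vote among the k nearest neighbors.
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

history = [((0.0, 0.0), "low"), ((0.2, 0.1), "low"),
           ((1.0, 1.0), "high"), ((0.9, 1.1), "high"), ((1.2, 0.8), "high")]
print(knn_classify((0.1, 0.1), history))   # nearest patterns are labeled "low"
```

Note that this brute-force ranking examines every stored pattern; the bucketing and k-d tree algorithms below exist precisely to avoid that cost.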
In the bucketing algorithm, the space is divided into identical cells, and the points in each cell are stored in a list. The cells are examined in order of increasing distance from the query point. For each cell, the distance between its internal points and the query point is computed. The search terminates when the distance from the query point to the current cell is greater than the distance to the closest point already visited.
A k-d tree is a generalization of a binary search tree to high dimensions. Each internal node in a k-d tree is a hyper-rectangle split by a hyper-plane orthogonal to one of the coordinate axes. The hyper-plane divides the hyper-rectangle into two parts associated
with the child nodes. The partitioning process continues until the number of points in each hyper-rectangle is smaller than a given threshold. The purpose of the k-d tree is to partition the sample space according to the distribution of the data; the partitioning is finer where the density of points is higher. To locate the nearest neighbors of a query point, the tree is first descended to find the data points that lie in the same hyper-rectangle as the query point. Then the surrounding cells are examined if they overlap the sphere centered at the query point and containing the closest data point found so far.
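A minimal sketch of this descend-then-check-overlap search follows; the structure and names are my own illustration, not from the thesis:

```python
# Illustrative k-d tree: each internal node splits its hyper-rectangle with a
# plane orthogonal to one coordinate axis; the search descends to the query's
# cell, then visits a neighboring cell only if it can intersect the sphere
# around the best point found so far.
import math

def build(points, depth=0):
    if not points:
        return None
    axis = depth % len(points[0])          # cycle through coordinate axes
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    return {"point": points[mid], "axis": axis,
            "left": build(points[:mid], depth + 1),
            "right": build(points[mid + 1:], depth + 1)}

def nearest(node, query, best=None):
    if node is None:
        return best
    dist = math.dist(query, node["point"])
    if best is None or dist < best[1]:
        best = (node["point"], dist)
    diff = query[node["axis"]] - node["point"][node["axis"]]
    near, far = (node["left"], node["right"]) if diff < 0 else (node["right"], node["left"])
    best = nearest(near, query, best)
    # Visit the far side only if the splitting plane is closer than the
    # current best distance (the sphere/cell overlap test).
    if abs(diff) < best[1]:
        best = nearest(far, query, best)
    return best

pts = [(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)]
tree = build(pts)
print(nearest(tree, (9, 2))[0])
```
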
Figure 2.6 A sample k-d tree
MBR is considered a lazy learning algorithm because it defers the data processing
until it receives a request to classify an unlabeled point. No models are created. Also,
confidence levels can be generated using relative distances to matching and non-
matching neighbors.
Farmer and Sidorowich [FS87] attempted to predict the behavior of a time series generated by a chaotic system using the MBR approach. The time series was transformed into a reconstructed state space using a delay space embedding. In the delay space embedding, each point in the state space is a vector X composed of time series values corresponding to a sequence of d delay lags: x1(t) = x(t), x2(t) = x(t - τ), ..., xd(t) = x(t - (d - 1)τ). The nearest k (> d) neighbors in the state space representation were then located. A local linear map was created for the k neighbors and applied to the value to be forecast. Although a higher-order mapping could be used, Farmer and Sidorowich did not find significant improvements over the linear map. For chaotic time series, they found this approach more accurate than standard forecasting techniques such as global linear autoregressive models.
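A rough sketch of this idea under simplifying assumptions follows: a synthetic sine series stands in for chaotic data, d, tau, and k are arbitrary choices, and the local linear map is fit by least squares:

```python
# Sketch of delay-space embedding with a local linear map, loosely following
# the Farmer-Sidorowich idea; all parameter values here are illustrative.
import numpy as np

def embed(series, d, tau):
    # Each row is X(t) = [x(t), x(t - tau), ..., x(t - (d-1)*tau)].
    start = (d - 1) * tau
    return np.array([[series[t - j * tau] for j in range(d)]
                     for t in range(start, len(series))])

def local_linear_forecast(series, d=3, tau=1, k=6):
    X = embed(series[:-1], d, tau)            # historical state vectors
    y = series[(d - 1) * tau + 1:]            # value one step after each vector
    query = embed(series, d, tau)[-1]         # most recent state vector
    idx = np.argsort(np.linalg.norm(X - query, axis=1))[:k]  # k nearest states
    A = np.c_[X[idx], np.ones(k)]             # affine local model on neighbors
    coef, *_ = np.linalg.lstsq(A, y[idx], rcond=None)
    return float(np.append(query, 1.0) @ coef)

t = np.arange(200)
series = np.sin(0.3 * t)                      # a smooth, predictable series
print(local_linear_forecast(series))          # close to sin(0.3 * 200)
```
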
The advantages of MBR are its understandability and its low computational cost during training. Its disadvantages are a greater storage requirement and a higher computational cost on recall.
2.2.4.5 Artificial Neural Networks
The strategy of artificial neural networks is the opposite of the lazy algorithm above. An artificial neural network compiles the historical data into a model and discards the training data after the model has been built. Once a model has been developed, forecasts can be computed quickly.
An artificial neural network (ANN) is a system composed of many interconnected
processing elements operating in parallel. Its function is determined by network structure,
connection strengths, and the processing performed at computing elements or neurons.
Artificial neural networks are categorized based on their topology and learning rule.
The ANN technique has been applied to an increasing number of real-world problems of considerable complexity. Neural networks are especially prominent in solving problems that are too complex for conventional technologies, such as problems that do not have an algorithmic solution or for which an algorithmic solution is too complex to find. These problems include pattern and trend recognition. With the remarkable ability to derive meaning from
complicated or imprecise data, neural networks can be used to extract patterns and detect
trends that are too complex for either humans or other computer techniques to notice. A
trained neural network can then be used to provide projections in new situations.
An ANN model correlates a dependent vector to an independent vector. Each vector consists of one or more variables. In time series forecasting, the independent variables are delay lags of the dependent variable. A number of adjoining data points of the time series Xt-k+1, Xt-k+2, ..., Xt form the input window or vector, and a point in the future, Xt+m, is the output. Here k is the window size and m is the lead time. The lead time refers to the period of time in the future for which a prediction is made. The value of m depends on whether the forecasting is long-term or short-term; if the prediction is made two months ahead, for example, then the lead time m is two months. The search for k is based on domain knowledge and often requires several trial-and-error experiments.
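The windowing scheme can be sketched as follows; the series, window size k, and lead time m are illustrative values:

```python
# Illustrative helper that pairs each input window [x(t-k+1), ..., x(t)]
# with its forecasting target x(t+m); k is the window size, m the lead time.
def make_windows(series, k, m):
    pairs = []
    for t in range(k - 1, len(series) - m):
        pairs.append((series[t - k + 1: t + 1], series[t + m]))
    return pairs

data = [10, 11, 12, 13, 14, 15, 16]
for window, target in make_windows(data, k=3, m=2):
    print(window, "->", target)
```

Each printed pair is one training example for the network: the window is the input vector and the target is the value m steps ahead.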
The ANN technique has the following advantages. Firstly, it can learn how to perform tasks based on training data. Secondly, it can capture non-linear structure. Thirdly, parallel computation makes it suitable for real-time operation. Fourthly, it is fault tolerant via redundant information coding. Finally, it is free from a priori selection of
mathematical models. Although it is necessary to select neural network parameters and structures before training, this information is not as crucial as the mathematical model is in other statistical methods.
However, the ANN technique also has some disadvantages. Firstly, the neural
interconnections and their logical meaning are complex and difficult to understand.
Secondly, the process of finding a suitable topology and identifying the network
parameters is empirical and often time-consuming. Thirdly, neural networks usually
require long training time on a serial computer simulation although the resultant network
can perform in real-time situations. Fourthly, a neural network can over-fit the data, i.e.
the network memorizes the training data but has low forecast ability. Finally, a neural
network can easily become stuck in a local minimum.
More details on ANN technique will be provided in the next chapter.
Chapter 3
Methodology: Neural Networks and Multiple-Neural-Network
Framework
This chapter provides background literature on neural network methodology and introduces the multiple neural network structure used in this project. Section 3.1 provides some general ideas about neural networks. Section 3.2 focuses on the back-propagation learning procedure. Some practical considerations in neural network topology and training are discussed in section 3.3. Literature on multiple neural network approaches is reviewed in section 3.4. Section 3.5 introduces the multiple-neural-network framework, including its motivation and structure. Lastly, section 3.6 discusses tools for developing neural network applications.
3.1 Background on Neural Networks
An artificial neural network (ANN), or neural network (NN) in short, is an information-
processing paradigm that is inspired by the way biological nervous systems, such as the
brain, process information. The key element of this paradigm is the novel structure of the
information processing system. It is composed of a large number of highly interconnected
processing elements called neurons working together to solve specific problems (Figure
3.1). Neural networks have the ability to learn from examples. An ANN is configured for a
specific application, such as pattern recognition or data classification, through a learning
process.
Figure 3.1 A Multi-layer Artificial Neural Network (input layer, hidden layer, output layer)
3.1.1 Artificial Neurons
Artificial neural networks typically consist of artificial neurons as shown in Figure 3.2.
Figure 3.2 An Artificial Neuron (dendrites, cell body with summation and threshold, axon)
The artificial neuron is viewed as a node or cell body connected to other nodes via
links that correspond to axon-synapse-dendrite connections. Each link is associated with
a weight. Similar to a synapse in a biological neuron, the weight determines the influence
or strength of one node on another. If a weight is negative, then the connection is
inhibitory, i.e. decreasing the activity of the target unit; if it is positive, it has an
excitatory, i.e. activity enhancing, effect. The influence received from an input link is
called the weighted input and is the product of the corresponding input and the weight of
the link.
At each node, the following computational model is applied.
y = θ( Σi=1..n wixi + t )
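A minimal sketch of this node model, assuming a simple step threshold function; the inputs, weights, and bias are made-up values:

```python
# A single artificial neuron: weighted sum of inputs plus bias, passed
# through an illustrative step threshold function.
def neuron(inputs, weights, t):
    s = sum(w * x for w, x in zip(weights, inputs)) + t
    return 1 if s > 0 else 0      # fires only above the threshold

# An inhibitory (negative) weight lowers the activation; raising the
# (negative) bias magnitude can keep the neuron from firing at all.
print(neuron([1.0, 1.0], [0.7, -0.2], t=-0.3))   # 0.7 - 0.2 - 0.3 > 0 -> fires
print(neuron([1.0, 1.0], [0.7, -0.2], t=-0.6))   # sum is negative -> silent
```
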
Here wixi is a weighted input, t is the node's threshold or bias, y is the output and θ is the threshold function.
3.1.2 Transfer Functions
Each node combines the separate influences received on its input links into an overall influence using a transfer function or activation function θ. A transfer function is usually non-linear.
One simple transfer function passes the sum of the weighted inputs through a
threshold function to determine the node's output. The output is either 0 or 1 depending
on whether the sum of the input is below or above the threshold value used by the
threshold function.
Other transfer functions include piece-wise linear, sigmoid and Gaussian, as shown in Figure 3.3.
Figure 3.3 Activation functions (threshold, piecewise linear, sigmoid, Gaussian)
A linear activation function is often used for output units. For hidden units, the logistic sigmoid function is probably the most frequently used in ANNs. It is a strictly increasing function that exhibits smoothness.
y = 1 / (1 + e^(-kx))
In this equation, k is the slope factor.
This function has the desirable property that its gradient can be expressed as a simple function of the output: y'(x) = ky(x)(1 - y(x)). The gradient is used in the gradient descent algorithm, which is part of the back-propagation learning procedure.
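This identity can be verified numerically; the slope factor k and the evaluation point below are arbitrary choices:

```python
# Numerical check of the gradient identity y'(x) = k*y(x)*(1 - y(x))
# for the logistic sigmoid, using a central finite difference.
import math

def sigmoid(x, k=1.5):
    return 1.0 / (1.0 + math.exp(-k * x))

k, x, h = 1.5, 0.7, 1e-6
numeric = (sigmoid(x + h, k) - sigmoid(x - h, k)) / (2 * h)
analytic = k * sigmoid(x, k) * (1 - sigmoid(x, k))
print(abs(numeric - analytic) < 1e-6)   # True
```
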
3.2 Back-Propagation Learning Procedure
The backward-error-propagation procedure (back-propagation for short) is the most
widely used learning procedure for neural networks. This procedure is simple but
relatively efficient. The learning rule in this procedure is called the generalized delta rule.
The generalized delta rule does hill climbing by gradient descent.
3.2.1 Generalized Delta Rule and Gradient Descent
The delta rule developed by Widrow and Hoff (cited in [Ru95]) is one of the most
commonly used learning rules. For a given input vector, the output vector is compared to
the correct answer. The weights are then adjusted to reduce the difference if there is any.
It is an error correcting procedure. The change in weight from a unit in layer i to a unit in layer j is given by
Δwi→j = η(dj - oj)oi = ηδjoi
where η is the learning rate, dj is the desired output and oj is the actual output of the unit in layer j, oi represents the output of the unit in layer i, and δj = dj - oj is sometimes called the error at the unit in layer j.
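As a sketch, the delta rule can train a single unit with no hidden layer; the task below (fitting OR targets with a linear output), the learning rate, and the epoch count are all illustrative choices:

```python
# Delta-rule training of one linear unit: for each sample, compare the
# output to the desired answer and adjust each weight by eta * delta * o_i.
samples = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
w = [0.0, 0.0]
bias = 0.0
eta = 0.1                                   # learning rate

for _ in range(200):                        # repeated passes over the data
    for (x1, x2), d in samples:
        o = w[0] * x1 + w[1] * x2 + bias    # actual (linear) output
        delta = d - o                       # error at the output unit
        w[0] += eta * delta * x1
        w[1] += eta * delta * x2
        bias += eta * delta                 # bias treated as a weight on input 1

print(round(w[0] * 1 + w[1] * 0 + bias))    # rounded prediction for input (1, 0)
```
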
The delta rule works well for neural networks without hidden layers. However, with
hidden layers, the desired outputs of the hidden units are not known, and in fact can only
be computed after the best set of weights has been found, thus the weight adjustments cannot be calculated. Rumelhart et al. (cited in [Ru95]) developed a generalized form of the delta rule that is suited for networks with hidden layers. They showed that the method works for the class of semi-linear activation functions, which are non-decreasing and differentiable. To get the error generated from the output of a middle layer, the back-propagation procedure backtracks through the middle layer to the units that are responsible for generating that output. The error generated from the middle layer can then be used with the delta rule to adjust the weights.
The back-propagation procedure is based on the hill climbing process. Hill climbing
is a process of making small changes toward a solution. Each change makes the solution
slightly better, until no further improvements are possible. One way to do hill climbing is
to measure the effects of changing one weight at a time while keeping all the other
weights constant. Then, only the weight that does the most good will be changed.
However, better performance can be obtained if the hill is a sufficiently smooth function
of the weights. In this case, it is possible to proceed in the direction of the most rapid
performance improvement by varying all the weights simultaneously in proportion to
how much good is done by individual changes. This procedure is called gradient descent.
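The simultaneous-update idea can be illustrated on an invented smooth error surface; the quadratic function and step size below are arbitrary:

```python
# Gradient descent sketch on a smooth two-weight error surface
# E(w1, w2) = (w1 - 3)**2 + 2*(w2 + 1)**2, whose minimum lies at (3, -1).
def grad(w1, w2):
    # Partial derivatives of E with respect to w1 and w2.
    return (2 * (w1 - 3), 4 * (w2 + 1))

w1, w2, step = 0.0, 0.0, 0.1
for _ in range(100):
    g1, g2 = grad(w1, w2)
    # All weights move simultaneously, each in proportion to its own
    # partial derivative, rather than one weight at a time.
    w1 -= step * g1
    w2 -= step * g2
print(round(w1, 3), round(w2, 3))   # approaches the minimum at (3, -1)
```
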
The mathematical explanations for the back-propagation learning procedure are provided in the section below, which is based mostly on Winston [Wi92].
3.2.2 Back-Propagation Formulae
The purpose of making adjustments to the weights is to improve the performance of the
network. One way to measure the performance of a network is to calculate the negative of
total square error:
P = -Σs Σz (dsz - osz)^2
where
P is the measured performance,
s is an index that ranges over sample inputs,
z is an index that ranges over all output nodes,
dsz is the desired output for sample input s at the zth node,
osz is the actual output for sample input s at the zth node.
Figure 3.4 Layers in a feed-forward neural network (hidden layers i, j, k and output layer z)
The gradient descent rule suggests that the best improvement in performance is
achieved when all the weights are altered in proportion to the corresponding partial
derivative.
The partial derivative of the performance with respect to a particular weight can be
computed by adding up the partial derivative for each input pattern separately. Thus, we
can focus on an input pattern one at a time and then at the end each weight will be
adjusted by summing the adjustments derived from each input pattern.
Consider, then, the partial derivative

∂P/∂w_{i→j}

where the weight w_{i→j} is the weight of a link connecting the i layer of nodes to the j layer of nodes.

The effect of w_{i→j} on performance P is through the intermediate variable o_j, the output of the j node. Using the chain rule:

∂P/∂w_{i→j} = (∂P/∂o_j)(∂o_j/∂w_{i→j}) = (∂o_j/∂w_{i→j})(∂P/∂o_j)
Now consider the first quotient on the right-hand side. We know that o_j is the result of passing the sum of weighted inputs to an activation function, o_j = f(Σ_i o_i w_{i→j}).

Treating the sum as an intermediate variable σ_j and applying the chain rule again:

∂o_j/∂w_{i→j} = (df(σ_j)/dσ_j)(∂σ_j/∂w_{i→j}) = (df(σ_j)/dσ_j) o_i = o_i df(σ_j)/dσ_j

Substituting this result into the previous equation, we have key equation (i):

∂P/∂w_{i→j} = o_i (df(σ_j)/dσ_j)(∂P/∂o_j)     (i)

Note that the last quotient on the right-hand side can be expressed in terms of the partial derivatives ∂P/∂o_k in the next layer k:

∂P/∂o_j = Σ_k (∂P/∂o_k)(∂o_k/∂o_j) = Σ_k (∂o_k/∂o_j)(∂P/∂o_k)
We know that o_k = f(Σ_j o_j w_{j→k}), where f is the activation function. Treating the sum as an intermediate variable σ_k and applying the chain rule:

∂o_k/∂o_j = (df(σ_k)/dσ_k)(∂σ_k/∂o_j) = (df(σ_k)/dσ_k) w_{j→k} = w_{j→k} df(σ_k)/dσ_k

Substituting this result back into the equation for ∂P/∂o_j yields the following key equation (ii):

∂P/∂o_j = Σ_k w_{j→k} (df(σ_k)/dσ_k)(∂P/∂o_k)     (ii)
The two key equations (i) and (ii) have two important consequences. First, the partial
derivative of performance with respect to a weight depends on a partial derivative of
performance with respect to the following output. Second, the partial derivative of
performance with respect to one output depends on the partial derivatives of performance
with respect to the outputs in the next layer. That is the reason why we need to compute
error backward from the last layer to the initial layer.
The partial derivative of performance with respect to each output in the final layer is:
∂P/∂o_z = ∂(-(d_z - o_z)²)/∂o_z = 2(d_z - o_z)     (iii)

Using equation (iii) to compute backward, the only unresolved factor in equations (i) and (ii) is the derivative of the activation function. For the logistic sigmoid f(σ) = 1/(1 + e^(-kσ)), the derivative can be computed easily:

df(σ)/dσ = d/dσ [1/(1 + e^(-kσ))] = k(1 + e^(-kσ))^(-2) e^(-kσ) = kf(σ)(1 - f(σ)) = ko(1 - o)
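As an illustration, the closed-form derivative kf(σ)(1 - f(σ)) can be verified numerically against a finite-difference estimate. The following Python sketch does so for a few sample points; the function names are choices made for this sketch only.

```python
import math

def sigmoid(sigma, k=1.0):
    """Logistic sigmoid f(sigma) = 1 / (1 + e^(-k*sigma))."""
    return 1.0 / (1.0 + math.exp(-k * sigma))

def sigmoid_derivative(sigma, k=1.0):
    """Closed form k*f(sigma)*(1 - f(sigma)) = k*o*(1 - o)."""
    o = sigmoid(sigma, k)
    return k * o * (1.0 - o)

# Central-difference check of the closed form at a few points.
for sigma in (-2.0, 0.0, 1.5):
    h = 1e-6
    numeric = (sigmoid(sigma + h) - sigmoid(sigma - h)) / (2.0 * h)
    assert abs(numeric - sigmoid_derivative(sigma)) < 1e-8
```

The cheap form ko(1 - o) is what makes the sigmoid attractive in formulae (1) and (2) below: the derivative is obtained from the node output alone, without re-evaluating the exponential.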
Finally, weight changes should depend on a learning rate parameter η. Denote β_n = ∂P/∂o_n and absorb the constants into the learning rate; then we have the following set of formulae:

(1) Δw_{i→j} = η o_i o_j(1 - o_j) β_j
(2) β_j = Σ_k w_{j→k} o_k(1 - o_k) β_k
(3) β_z = d_z - o_z

where

η is the learning rate,
o_i, o_j, o_k are the actual outputs of nodes in hidden layers i, j and k respectively,
o_z is the observed output of a node in output layer z,
d_z is the desired output of a node in output layer z,
β_n is a factor that measures how beneficial a change is to a node in layer n,
w_{j→k} is the weight of a link connecting a node in layer j to a node in layer k,
Δw_{i→j} is the weight change from layer i to layer j.

Formula (3) computes the benefit to an output node, while formula (2) does the same for a hidden node. Formula (1) calculates the weight change from a node in layer i to a node in layer j.
The back-propagation learning procedure using the above formulae is described in the
following section.
3.2.3 Back-Propagation Procedure
There are two phases in a back-propagation procedure. In the first phase, called feed-forward, the output is calculated from an input pattern. In the second phase, the procedure computes changes to the weights in the final layer first, reuses much of the same computation to
compute changes to the weights in the previous layer, and ultimately returns to the initial
layer. This is what gives back-propagation its name. Using formulae (1), (2) and (3) from
section 3.2.2, the procedure is described as follows.
• Let η be the learning rate
• Set all weights and biases to small random values
• Until total error is small enough do
• For each input vector
• Do feed forward pass to get outputs
• Compute benefit β for output nodes using (3)
• Compute benefit β for hidden nodes, working from the last layer to the first layer using (2)
• Compute and store weight changes for all weights using (1)
• Add up weight changes for all input vectors and change the weights
The procedure above is called batch back-propagation. In online back-propagation,
weights are updated after each input vector is fed instead of at the end of the whole cycle.
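For illustration, the batch procedure and formulae (1), (2) and (3) can be sketched in Python for a network with a single hidden layer. This is a minimal sketch, not production software: the function names are arbitrary, and explicit bias terms are omitted (a constant 1 can be appended to each input vector as a crude substitute).

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def feed_forward(x, w_ih, w_ho):
    """Phase one: compute hidden outputs h and final outputs o."""
    h = [sigmoid(sum(xi * w[j] for xi, w in zip(x, w_ih)))
         for j in range(len(w_ih[0]))]
    o = [sigmoid(sum(hj * w[z] for hj, w in zip(h, w_ho)))
         for z in range(len(w_ho[0]))]
    return h, o

def train_batch(samples, n_in, n_hid, n_out, eta=0.5, cycles=1000):
    """Batch back-propagation using formulae (1)-(3): weight changes
    are summed over all input vectors, then applied once per cycle."""
    random.seed(0)  # set all weights to small random values
    w_ih = [[random.uniform(-0.5, 0.5) for _ in range(n_hid)] for _ in range(n_in)]
    w_ho = [[random.uniform(-0.5, 0.5) for _ in range(n_out)] for _ in range(n_hid)]
    for _ in range(cycles):
        dw_ih = [[0.0] * n_hid for _ in range(n_in)]
        dw_ho = [[0.0] * n_out for _ in range(n_hid)]
        for x, d in samples:
            h, o = feed_forward(x, w_ih, w_ho)
            # (3) benefit for output nodes
            beta_z = [d[z] - o[z] for z in range(n_out)]
            # (2) benefit for hidden nodes, computed backward
            beta_j = [sum(w_ho[j][z] * o[z] * (1 - o[z]) * beta_z[z]
                          for z in range(n_out)) for j in range(n_hid)]
            # (1) store the weight changes for this input vector
            for j in range(n_hid):
                for z in range(n_out):
                    dw_ho[j][z] += eta * h[j] * o[z] * (1 - o[z]) * beta_z[z]
            for i in range(n_in):
                for j in range(n_hid):
                    dw_ih[i][j] += eta * x[i] * h[j] * (1 - h[j]) * beta_j[j]
        # add up the changes for all input vectors and change the weights
        for i in range(n_in):
            for j in range(n_hid):
                w_ih[i][j] += dw_ih[i][j]
        for j in range(n_hid):
            for z in range(n_out):
                w_ho[j][z] += dw_ho[j][z]
    return w_ih, w_ho
```

Training this sketch on a simple separable function such as OR, with a constant 1 appended to each input vector, gives a quick sanity check. Online back-propagation would instead apply each stored change immediately inside the inner loop rather than summing them per cycle.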
3.3 Considerations on Neural Network Topology and
Training Parameters
A major disadvantage of the neural network approach is that the process of finding a
suitable topology and identifying the network parameters is empirical and often time-
consuming. To develop the best neural network for a particular problem, both the
network topology and the network parameters need to be optimized. Bad choice of
parameters can cause network training to be stuck in a local minimum or oscillate around
a minimum.
The numbers of input and output variables in a causal model can often be determined
provided good domain knowledge is available. The outputs are usually the predicted
variables. The inputs are the parameters that influence the outputs and for which data are
either available or easy to obtain. Although it is feasible to use a neural network approach
without an analysis of the independent variables involved, it is generally desirable to
establish a priori the significant variables. This can simplify the data collection process
and reduce the complexity of the networks. However, if data gathering and processing is
relatively inexpensive, a common approach is to input all available process parameters to
the network and then let it learn which variables are important. While it may take several
experiments involving different sets of input variables and a number of sensitivity tests
before the most significant input variables can be identified, this approach ensures all
possible variables are examined in the model. The number of hidden nodes is often
determined through a trial-and-error process.
3.4 Literature Reviews: Multiple Neural Networks
Approaches
The multiple neural network method has been explored by several researchers. There are
various ways to integrate individual neural networks into one model, each aiming at a
different purpose. This section reviews a few approaches that are not necessarily
related to time series modeling, but which explore different aspects related to neural
networks.
For a given prediction problem, several neural network solutions can be obtained. The
network resulting in the least testing error is usually chosen. However, that network may
not be optimal when it is applied to the whole population. Hashem et al. [HSY94]
proposed using optimal linear combinations of a number of trained neural networks
instead of using a single best network. Each component network can have a different
architecture and/or training parameters. Optimal linear combinations are constructed by
forming weighted sums of the corresponding outputs of the networks. The combination
weights are selected to minimize the mean squared error with respect to the distribution
of random inputs. Combining the trained networks may help integrate the knowledge
acquired by the component networks and thus improve model accuracy. From a neural
network perspective, combining the corresponding outputs of a number of trained
networks is similar to creating a large neural network in which the component neural
networks are sub-networks operating in parallel, and the combination weights are the
connection weights of the output layer.
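The combination step itself amounts to a least-squares problem. The sketch below is illustrative only: each column of synthetic, noisy predictions stands in for the output of one trained component network, and the combination weights minimizing the mean squared error are obtained with NumPy's least-squares solver.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins for trained component networks: noisy predictions of a
# chosen target function, one column per "network".
x = np.linspace(0.0, 1.0, 200)
target = np.sin(2.0 * np.pi * x)
outputs = np.column_stack(
    [target + rng.normal(0.0, s, x.size) for s in (0.3, 0.4, 0.5)])

# Combination weights selected to minimize the mean squared error.
weights, *_ = np.linalg.lstsq(outputs, target, rcond=None)
combined = outputs @ weights

mse_best_single = min(((outputs[:, i] - target) ** 2).mean() for i in range(3))
mse_combined = ((combined - target) ** 2).mean()
assert mse_combined <= mse_best_single  # in-sample, combining cannot do worse
```

Because the least-squares solution is optimal over all weight vectors, including the one that selects a single column, the combined in-sample error can never exceed that of the best single network; the interesting empirical question in [HSY94] is how much of this gain carries over to unseen data.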
Hashem et al. conducted an experiment to evaluate the effectiveness of using optimal
linear combination of neural networks in function approximation. Ten independent
training sets were generated using a chosen function. Each data set was used to train six
neural networks. At the end, ten replications of six trained neural networks were
produced. For each replication, combination weights were estimated. The results
demonstrated that the combinations of neural networks could substantially improve
model accuracy as compared to the approach using individual neural networks.
Another interesting observation in [HSY94] is that the model accuracy achieved by
combining the poorly trained component networks was better than that achieved by
combining the well-trained component networks. Hashem et al. interpreted this as an
indication that the effectiveness of the combined model does not depend on accuracy of
individual component neural networks. In the case where the component neural networks
were poorly trained, the combination weights play a significant role in the model. That is,
the poorly trained networks are given less weight. When the component networks were
well trained, the combination weights only assume the "fine-tuning" role.
The ensemble neural network system introduced by Hashem et al. can be used for
prediction problems only. Cho and Kim [CB95] presented a method using fuzzy integral
to combine multiple neural networks for classification problems. Unlike other methods
such as the majority voting¹ or the Borda count², the proposed method not only combines
the results from the component networks but also considers the relative importance of
each network. For each new input datum, each trained component neural network
calculates the degree of certainty h that the object belongs to a class. Next, the degree of
importance g of each component network in the recognition of a class is calculated. The
fuzzy integral of each class is then computed based on the values of h and g. Finally, the
class with the largest integral value is chosen as the output class.
Cho and Kim conducted an experiment to recognize Arabic numerals and uppercase and
lowercase letters from handwritten characters. Three component neural networks with
different sizes of input vectors were trained. Each network reflects a different view of the
input, from coarse to fine. The results show that the overall rates of correct classification
for the fuzzy integral are higher than those of the other methods, including the majority
voting and Borda count methods as well as the individual networks. In some cases, the fuzzy
¹ The majority voting rule chooses the classification made by more than half of the networks.
integral made correct decisions although the partial decisions from individual component
networks were completely inconsistent.
Lee [Le96] focused more on the data and its distribution. Lee introduced a multiple
neural network approach in which each network handles a different set of the input data
with different weights. In this approach, a sub-network is created when the main network
has little confidence in its decision. The main network and sub-network share the same
input vector but each network has its own hidden and output layers. The main network
handles most of the cases while sub-networks handle more or less irregular parts of the
data. The algorithm works as follows. First, a confidence level between 0 and 1 is
chosen. A system with a confidence level of 1.0 is equivalent to a general multilayer
neural network. After the main neural network is trained, the confidence level of each
training data point is evaluated. If it is unsatisfactory according to the chosen confidence
level, the data point is extracted and moved to the unsatisfactory data set. When all the
training data points have been examined, a sub-network is generated using the
unsatisfactory data set. The procedure is repeated until the unsatisfactory data set reaches
the pre-defined minimum size, or until the depth limit is reached. The output of the
system is selected among multiple outputs from the neural networks using a preference
vector. A preference vector has the form P = (p_1, ..., p_n), where each p_i is the preference value for the network N_i and each p_i has the value 0 or 1. The best output is chosen as the one with the best confidence among all the outputs whose preference value is 1.
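The cascaded training loop can be sketched generically as follows. In this sketch (my own illustration, not Lee's code), train_fn and confidence_fn are hypothetical stand-ins for training a network on a data set and evaluating its confidence on a data point; the actual network internals are immaterial to the control flow.

```python
def build_network_chain(train_fn, confidence_fn, data,
                        conf_level, min_size, depth_limit):
    """Train a main network, move low-confidence points into an
    unsatisfactory set, and train sub-networks on that set until it
    shrinks to min_size or the depth limit is reached."""
    networks = []
    remaining = list(data)
    for _ in range(depth_limit):
        model = train_fn(remaining)
        networks.append(model)
        unsatisfactory = [p for p in remaining
                          if confidence_fn(model, p) < conf_level]
        if len(unsatisfactory) <= min_size:
            break
        remaining = unsatisfactory
    return networks
```

Prediction would then select, among the networks enabled by the preference vector, the output with the highest confidence.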
An application of letter recognition from phonemes was developed in [Le96] based
on the proposed approach. The results indicate that the performance of the proposed
² The Borda count of a class is the sum of the number of classes ranked below that class by each network. The class with the largest Borda count is chosen.
approach is better than that of a general multi-layer neural network. The proposed
approach achieved a 10% improvement in the rate of correct classification of the test
data. In the experiment, a two-level generation process was used. The preference vector
of (1, 1) gave the best result, and a confidence level of 0.9 gave the best generalization
performance. The new approach also improved the vowel-recognizing ability: the vowel
recognition error was reduced from 27% to 15%, while the correctness rate increased
from 87% to 93%.
The advantages of the proposed approach include: (1) it provides a way to handle the
problem of local minima, (2) each network specializes in only a subset of the data
distribution while passing the unsatisfactory instances to the next sub-network to handle,
and (3) it reduces the training time compared with a neural network which has the same
number of hidden units.
Kadaba et al. [KNJK89] developed a multiple neural network system to improve
accuracy by decreasing the input and output cardinalities. They used back-propagation
self-organizing networks to compress data records and then used the concentrated low-
cardinality data records to feed a classification neural network. In their case study, a
multiple neural network system was developed to select the appropriate combination of
selection rules and insertion rules for the Traveling Salesman Problem (TSP). Since there
are 6 selection rules and 6 insertion rules, there are 36 combinations. Hence, if a single
neural network is used, there should be 36 output variables. Each input point is also
represented with a vector of length 30. Kadaba et al. used two self-organizing neural
networks to compress both the original input and output vectors into vectors of length 4.
The multiple neural network system achieved an accuracy rate of 94% in contrast to the
single high-dimension back-propagation neural network, which only managed to give an
accuracy rate of 10-20%.
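The compress-then-classify pipeline can be illustrated with a linear stand-in for the compression stage. The SVD projection and the synthetic records below are illustrative assumptions, not the self-organizing networks actually used by Kadaba et al.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic stand-in for the 30-dimensional TSP instance records.
records = rng.normal(size=(100, 30))

# Linear "autoencoder": project onto the top 4 principal directions,
# playing the role of the 30 -> 4 input-compression network.
centered = records - records.mean(axis=0)
_, _, vt = np.linalg.svd(centered, full_matrices=False)
compressed = centered @ vt[:4].T          # shape (100, 4)
assert compressed.shape == (100, 4)
```

The compressed records would then feed the classification network, whose low-cardinality output is decoded back, by the second compression network run in reverse, to one of the 36 rule combinations.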
The work by Duhoux et al. [DSMV01] is most relevant to the study area of this thesis.
Duhoux et al. compared two methods for long-term prediction with neural network
chains. The classical method makes one-step-ahead predictions recursively. In this
method, only a single one-step-ahead neural network is trained, and it is used iteratively p
times to predict p steps ahead. The input is shifted correspondingly at each iteration step.
The proposed method, on the other hand, uses p different trained neural networks with
different sizes of input vectors ranging from 1 to p. The input of a network is the same as
that of the previous network plus the predicted output from the previous network.
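The two strategies can be contrasted abstractly as follows. In this sketch, model and the entries of models are hypothetical one-step predictors (any callables), not the networks of [DSMV01].

```python
def recursive_forecast(model, history, p, window):
    """Classical method: apply one trained one-step-ahead model p
    times, shifting each prediction back into the input window."""
    values = list(history[-window:])
    predictions = []
    for _ in range(p):
        y = model(values)
        predictions.append(y)
        values = values[1:] + [y]   # shift the input window
    return predictions

def chain_forecast(models, base_input):
    """Chain method: each successive model receives the previous
    model's inputs plus its predicted output, so the s-th model
    predicts s steps ahead from measured data only."""
    inputs = list(base_input)
    predictions = []
    for model in models:
        y = model(inputs)
        predictions.append(y)
        inputs = inputs + [y]       # grow the input vector by one
    return predictions
```

With a toy predictor such as lambda w: w[-1] + 1, recursive_forecast on history [1, 2, 3] with p = 3 and window = 2 yields [4, 5, 6]. The structural difference is visible in the signatures: the recursive method reuses one model and contaminates its inputs with its own predictions, while the chain method trains a separate model per horizon at the cost of p networks.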
Duhoux et al. conducted an experiment to predict the hot metal temperature in an
industrial furnace installation three hours in advance. Since each data sample was taken
every 15 minutes, a prediction has to be made 12 steps ahead in the three-hour interval.
Twelve neural networks were used which predict the temperature from 1 to 12 steps
ahead. Four out of 35 measured variables including the temperature itself were relevant
as input. Twelve previous samples of each of the four signals were used as input for the
first neural network. Each subsequent neural network adds an extra input which is the
output of the previous networks. Hence, the input of the 12th neural network includes 24
samples of temperature signal. The training set contains 1300 data points and the testing
set contains 500 data points. The proposed model gave much better results than the
recursive model. Duhoux et al. also reported some disadvantages of the proposed
method: (1) the model requires a large number of neural networks and input
variables, and (2) a priori knowledge about the signal tendencies is not used.
In summary, several approaches have been proposed to improve the power of neural
networks by integrating them. Hashem et al. [HSY94] and Cho and Kim [CB95] use
parallel neural networks to improve the accuracy of prediction or classification. Lee [Le96]
uses neural networks in a multi-level hierarchy to deal with data distribution. Kadaba et al.
[KNJK89] use supplemental neural networks to compress input and output dimensions.
These methods could be useful in making a direct forecast but are not relevant to time-series
forecasting. The work by Duhoux et al. [DSMV01] is most related to long-term
forecasting and hence most relevant to our work. However, it is impossible to apply this
method to our application because it would require too many neural networks, since the
number of steps ahead to be predicted is high. Nevertheless, this work suggests that we
could develop different neural networks for different prediction terms ahead.
Our neural network approach will be introduced in the next section.
3.5 The Proposed Multiple Neural Network Approach
This section presents our multiple-neural-network approach, including its motivation and
structure.
3.5.1 Motivation
A method that is suited for short-term forecasting may not be suited for long-term
forecasting. Tang et al. [TAF91] examined the ability of Box-Jenkins and neural
networks in short-term and long-term forecasting. The results of the experiments on the
three sets of data show that with one-period-ahead and six-period-ahead forecasts, the
Box-Jenkins models outperform the neural networks, while with the 12-period-ahead and
24-period-ahead forecasts, the neural network is superior. The relative performance of the
neural network improves as the forecast horizon increases, which suggests that neural
networks are a better choice for long-term forecasting.
One common problem with time-series forecasting models is the low accuracy of long-term
forecasts. The estimated value of a variable may be reasonably reliable for short
terms into the future, but for longer terms, the estimate is liable to become less accurate.
There are several reasons for this inaccuracy. One reason is that the
environment in which the model was developed has changed over time, and therefore the
assumptions that held during the development process are no longer true after some
time. Another reason is that the model itself was not well developed: the inaccuracy
arises from immature training or a lack of appropriate training data. The trained
model may cover the surrounding neighborhood but fail to model cyclic changes of trend
or seasonal patterns in the data. The third cause of inaccuracy is error propagation during
recursive model predictions. Usually a model is built to predict one unit of time ahead
and used recursively 10 times when a 10-unit-ahead prediction is required. Every model
is likely to be associated with an error. For short-term prediction, the error can be less
than an acceptable threshold, but for long-term prediction, this error is accumulated and
can increase beyond the threshold.
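A toy calculation illustrates this accumulation (the assumptions are purely illustrative: each recursion contributes roughly the same error, and the errors add):

```python
def accumulated_error(per_step_error, steps):
    """Toy model, not a formal error analysis: if each one-step
    prediction is off by about per_step_error and the errors compound
    additively through the recursion, the k-step-ahead error grows
    linearly with the forecast horizon k."""
    return per_step_error * steps

# A per-step error of 0.5 stays under a threshold of 1.0 for a
# one-step forecast, but a 10-step recursive forecast accumulates
# an error of about 5.0.
```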
The multiple-neural-network (MNN) approach presented in this study attempts to
deal with the third problem by reducing the number of recursions necessary. In this
approach, several neural networks built to predict from short- to long-term are combined
into one model.
The assumptions in this approach are as follows.
• The patterns that repeatedly appear in historical data will appear again in the future.
This is an assumption in time series modeling.
• The short-term and long-term trends are different. While this assumption is not
necessary, the MNN approach is most useful when it holds.
3.5.2 Structure of a Multiple Neural Network Model
A MNN model is a group of ANNs working together to perform a task. Each ANN is
developed to predict a different time period ahead. The prediction terms are powers of 2,
that is, the first ANN predicts 1 unit ahead, the second predicts 2 units ahead, the third
predicts 4 units ahead, and so forth. Hereafter an ANN that predicts 2^n units ahead is
referred to as an n-ordered ANN. There are two reasons supporting the choice of binary
exponents. First, big gaps between two consecutive neural networks are not desirable;
the smaller the gaps are, the fewer steps the model needs to take in order to make a
forecast. Second, binary exponents do not introduce bias in the roles of the networks: a
higher-ordered model tends to use more lower-ordered neural networks in order to
propagate ahead.
A MNN prediction model can be viewed as a single partially connected ANN as
illustrated in Figure 3.5. However, unlike a complex single ANN requiring a long time to
train, a MNN breaks down the training into sub-ANNs and trains separately. Figure 3.5
shows a sample MNN with two sub-ANNs.
Figure 3.5 A sample MNN model
To make a prediction, the neural network with the highest possible order is used first.
For example, to predict 7 units ahead, a 2-ordered neural network is used first. Assume
that the time at present is t. The values of x_t and x_{t-1} are already known. We wish to
predict x_{t+7}. Let us denote the function that an n-ordered network models as f_n, and assume
that every network has two input variables; then the value of the output 7 units ahead is
computed as follows.

x_{t+7} = f_2(x_{t+3}, x_{t+2})
        = f_2(f_1(x_{t+1}, x_t), f_1(x_t, x_{t-1}))
        = f_2(f_1(f_0(x_t, x_{t-1}), x_t), f_1(x_t, x_{t-1}))
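The composition above can be written as a small recursive routine. This is an illustrative Python sketch under the stated assumptions (nets[n] models f_n and takes two inputs); the names are hypothetical, and the actual MNN tool is implemented in Java:

```python
def mnn_predict(nets, series, k):
    """Predict x_{t+k}. nets[n] models f_n, which maps (x_s, x_{s-1})
    to x_{s + 2**n}; series[-1] is x_t, series[-2] is x_{t-1}, etc."""
    def value(j):                        # returns x_{t+j}
        if j <= 0:                       # already-observed history
            return series[len(series) - 1 + j]
        n = j.bit_length() - 1           # highest order with 2**n <= j
        step = 1 << n                    # that net's horizon, 2**n
        return nets[n](value(j - step), value(j - step - 1))
    return value(k)
```

For k = 7 the routine reproduces the expansion above: it calls the 2-ordered net on x_{t+3} and x_{t+2}, which are in turn produced by the 1-ordered and 0-ordered nets.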
The training of individual component neural networks can be dependent on or
independent of the training of the other networks. In the development of a MNN, it may
be necessary to implement multi-step validation. One-step-ahead validation does not take
into account the model's sensitivity to errors that arise due to multi-step predictions
[MSV99]. In our MNN tool, multi-step validation was implemented, but the validation
window for each ANN is different. The validation error of the n-ordered network is
calculated as the average of the root mean square errors (RMSE) of the (2^n)-step-ahead to the
(2^(n+1)-1)-step-ahead predictions. In order to calculate these steps, a higher-ordered ANN needs to use
the prediction values of the lower-ordered ANNs.
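This windowed validation error can be sketched as follows (Python for illustration; the data layout is an assumption, with residuals_by_step[h] holding the h-step-ahead residuals on the validation set):

```python
import math

def windowed_validation_error(residuals_by_step, n):
    """Validation error of the n-ordered network: the average of the
    RMSEs of the 2**n-step-ahead through (2**(n+1) - 1)-step-ahead
    forecasts on the validation set."""
    rmses = []
    for h in range(2 ** n, 2 ** (n + 1)):     # steps 2**n .. 2**(n+1)-1
        residuals = residuals_by_step[h]
        rmses.append(math.sqrt(sum(e * e for e in residuals) / len(residuals)))
    return sum(rmses) / len(rmses)
```

For the 1-ordered network, for example, the window covers the 2-step-ahead and 3-step-ahead forecasts.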
3.6 Tools
This section discusses the two tools used in our project to develop neural
network applications.
3.6.1 NeurOn-Line Tool-kit
NeurOnline (NOL) from Gensym Corporation is adopted as the tool kit. It is a complete
graphical, object-oriented software tool kit for building neural network applications
which can be applied to dynamic environments. NOL includes facilities for managing
data sets, training the network, testing the fit between model and data, and deploying the
application in the operation environment. Using the NOL tool kit to develop ANN
applications is straightforward and typically involves three steps: cloning blocks,
connecting them and configuring their behavior. NOL is an application layer built on top
of the G2* Expert System shell. Thus it can be deployed and integrated with G2. Hybrid
neural, expert and fuzzy logic systems are simple to configure in NOL.
However, NOL also has some limitations. NOL implements only four types of neural
networks: Back-propagation, Radial Basis Function, Rho, and Auto-associative
Networks. Users are allowed to configure only a small number of parameters,
which limits their control over the behavior of the constructed networks.
Also, while familiarity with G2 is not necessary for using NOL, it is a requirement for
full utilization of the tool.
* Trademark of Gensym Corp. USA
3.6.2 Multiple Neural Network Tool
A tool was created to assist the development of multiple-neural-network applications.
This section provides the implementation and usage details of the tool.
The multiple-neural-network (MNN) tool was written in the Java language using the
JBuilder 4 development tool. Refer to Appendix A for instructions on how to start the
tool. The MNN system consists of two main parts: the user interface and the neural
network system. The user interface includes a number of screens for receiving input and
displaying output. The neural network system includes several classes implementing
methods for training and testing neural networks, as well as for making forecasts.
Figure 3.6 Classes of the neural network system of the MNN tool
The main classes implemented in the neural network system are illustrated in Figure
3.6. The Synapse class calculates weighted input of a neuron and performs weight
changes. The Neuron class includes functions to connect with another neuron in the
network and to activate transfer functions. Several neurons are connected together to
form a back-propagation neural network (BPNN) which is also a multi-layer perceptron.
The BPNN class implements methods for training, testing and forecasting. If the
multiple-step-ahead validation mode is triggered, an individual network communicates
with other lower-ordered neural networks during the training process. There is also a
class that manages all the neural networks in an array. This class communicates with the
Data-Set class and activates the necessary methods in the component networks to conduct
overall training, testing and forecasts. The Data-Set class reads time-series points from
text files and creates training and testing records. The size of these records depends on
user-input parameters and the topology of each component neural network.
All neural networks in the current MNN system are multi-layer perceptrons
with only one hidden layer. The user can determine the neural network structures by
setting parameters such as the numbers of input, output, and hidden units. The
connection weights and biases are initialized with small random values. Users can train a
MNN, test an existing MNN or use a MNN to make forecasts. Training is the heaviest
task in developing a neural network application. For each task, the tool first asks the user
to enter necessary parameters. Then it executes the task and displays the results.
The component neural networks in the system are trained with the back-propagation
algorithm. Each training cycle in this algorithm has two phases. In the first phase,
historical data records are fed one by one into the neural network. The signals go forward
from input to output to compute the output at each hidden and output unit. In the second
phase, the output values are compared with the expected values. If there is a difference,
the algorithm adjusts the connection weights of the neural network to minimize the
prediction error. The system repeats the training cycles until one of the following occurs:
the error reaches an acceptable threshold, the number of cycles reaches a pre-set
maximum value, or over-fitting occurs.
Over-fitting happens when a neural network learns the training patterns well but has
poor generalization ability. Over-fitting is usually detected by dividing the historical data
into two sets. The training set is used to train the neural net. The validation set is used to
determine the performance of a neural network on patterns that are not used during
learning. Training and validation occur simultaneously. When the error from the validation
runs starts to increase, training is stopped because over-fitting has begun. In our
implementation, the MNN system determines over-fitting has occurred when the
validation error monotonically increases for a certain number of cycles.
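The stopping logic described above can be sketched as follows (illustrative Python; the parameter names mirror the tool's training parameters, but this is not the actual Java implementation, and the acceptable-error-threshold condition is omitted for brevity):

```python
def train_with_stopping(train_cycle, validate, max_cycles,
                        min_cycles=0, interval=1, window=3):
    """Run training cycles until over-fitting is detected (the
    validation error is non-decreasing for `window` consecutive
    validation runs) or `max_cycles` is reached. Returns the cycle
    at which training stopped."""
    best_error = float("inf")
    non_decreasing = 0
    for cycle in range(1, max_cycles + 1):
        train_cycle()                          # one back-propagation cycle
        if cycle % interval != 0:
            continue                           # validate every `interval` cycles
        error = validate()
        non_decreasing = non_decreasing + 1 if error >= best_error else 0
        best_error = min(best_error, error)
        if cycle >= min_cycles and non_decreasing >= window:
            return cycle                       # stop: over-fitting detected
    return max_cycles                          # stop: cycle budget exhausted
```

Setting min_cycles above zero keeps the early rise in validation error from being mistaken for over-fitting, as discussed with the training parameters below.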
The user interface of the system consists of two parts for input and output. The input
screens are shown in Figure 3.7 to Figure 3.10 and the output screens are shown in Figure
3.11 to Figure 3.13. Details of the input parameters and output are described below.
Figure 3.7 Screen for inputting training parameters
Users have two choices for input, either from a parameter file or manually. The
parameter file is a text file with a specified structure. The training parameter file contains
values for all the input items in Figures 3.7 and 3.8.
Most of the input items in the input screens are self-explanatory. Some additional
explanations are provided as follows.
Training Parameters (Figure 3.7)
• Lead time: the number of steps ahead to be forecasted. This is an obscure parameter
and will be eliminated in later versions.
• Number of input variables, or the size of the input vector
• Number of neural networks in the MNN to be trained
• Minimum number of training cycles: In most cases this value is set to zero. However,
in some cases the minimum number of training cycles needs to be set to a number
greater than zero for the following reason. The validation error often increases at the
beginning of the training process and then eventually decreases. However, the MNN
tool may mistake the increasing error for over-fitting and halt the training. Enabling
the user to set the minimum number of training cycles ensures the MNN tool
continues training past this period without stopping due to misperceived over-fitting.
• Maximum number of training cycles: When the neural network has not yet over-fitted
the data and the validation error is still higher than the threshold set by users, the
MNN system would continue training until the maximum number of cycles is
reached.
• Validation interval: The validation interval is the interval between two consecutive
validations in terms of training cycles. For example, if the validation interval is 4,
validation is performed every 4 training cycles. The default value for this parameter is
1. Setting this parameter with a higher value reduces necessary training time because
validation errors are calculated less frequently during the training process.
• Validation window size: The validation window size is the number of consecutive
non-decreasing validation errors needed before the system decides that over-fitting
has occurred.
• Using multi-step validation: Users can choose one-step validation or multi-step
validation by deselecting or selecting this radio button. In one-step validation, the
lead times for validation and training are the same. Each neural network is validated
by itself. In multi-step validation, the lead times for validation spread over longer
ranges (refer to Chapter 3). In the latter case, the training of higher-ordered neural
networks requires the existence of trained lower-ordered neural networks to calculate
the predicted output for a validation input vector. Hence, the training quality of the
higher-ordered networks depends on the training quality of the lower-ordered neural
networks.
• Training data file name
• Validation data file name
• Training output file name
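Taken together, these parameters define an early-stopping training loop. The following sketch illustrates one plausible form of that logic; it is a hypothetical illustration, not the tool's actual source, and the two callables (`train_one_cycle`, `validation_error`) stand in for one training pass and one validation run:

```python
def train_with_early_stopping(train_one_cycle, validation_error,
                              min_cycles, max_cycles,
                              val_interval, window_size, threshold):
    """Hypothetical sketch of an early-stopping loop like the MNN tool's."""
    errors = []                          # validation errors, most recent last
    for cycle in range(1, max_cycles + 1):
        train_one_cycle()                # one pass over the training set
        if cycle % val_interval != 0:
            continue                     # validate only every val_interval cycles
        err = validation_error()
        if err <= threshold:
            return "threshold reached"   # validation error is low enough: stop
        errors.append(err)
        if cycle < min_cycles:
            continue                     # ignore error increases early in training
        # over-fitting: window_size consecutive non-decreasing validation errors
        recent = errors[-(window_size + 1):]
        if len(recent) == window_size + 1 and all(
                a <= b for a, b in zip(recent, recent[1:])):
            return "over-fitting detected"
    return "max cycles reached"
```

Note how the minimum-cycle check suppresses the over-fitting test during the initial period in which validation error is expected to rise.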
Figure 3.8. Screen for inputting training parameters of component neural network
Figure 3.8 illustrates the input screen for setting the training parameters of each
individual neural network. This screen, here for the 0-ordered neural network, is opened when the
user clicks the "Edit each neural network parameter" button in the screen shown in Figure
3.7. After setting the parameters for one neural network, the user can click the "Next"
button to advance to the next neural network.
Neural network parameters for training (Figure 3.8)
• Load neural network from file: The user can continue to train a previously trained neural
network by selecting this radio button and entering the name of the neural network
file.
• Neural network file name: to be entered only when the "load neural network" radio
button is checked.
• Train neural network: Since there are several neural networks in an MNN system, the
user may want to continue training only one component network. In this
case, the user can choose whether to train a component neural network by selecting or
deselecting this radio button in the respective screen for that network.
• Number of hidden units
• Error threshold (in percentage): When the validation error is less than or equal to this
value, the training is stopped.
• Learning rate: The learning rate is a scaling factor that determines how fast an algorithm
learns. A higher learning rate improves the learning speed, but if it is too high,
the algorithm will overshoot the optimum weights.
• Momentum: The momentum adds a contribution from the previous step when a
weight is updated.
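The standard back-propagation update with momentum has the generic form Δw(t) = −η·∂E/∂w + α·Δw(t−1), where η is the learning rate and α the momentum. A minimal sketch of this generic formula (an illustration, not the tool's code):

```python
def update_weight(w, gradient, prev_delta, learning_rate, momentum):
    """Generic back-propagation weight update with momentum.

    The new step combines the scaled negative gradient with a
    contribution (momentum * prev_delta) from the previous step.
    Returns the updated weight and the step taken.
    """
    delta = -learning_rate * gradient + momentum * prev_delta
    return w + delta, delta
```

With momentum set to zero this reduces to plain gradient descent; a nonzero momentum smooths the trajectory by carrying part of the previous step forward.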
Figure 3.9. Screen for inputting testing parameters
Figure 3.9 illustrates the input screen for setting the testing parameters. Similar to the
training parameters, the user can set the testing parameters either with a parameter file or
manually.
Testing parameters (Figure 3.9)
• Lead time: the number of steps ahead to be forecasted.
• Number of input variables
• Number of neural networks
• Test data file name
• Test output file name
Figure 3.10. Screen for inputting parameters for prediction
Figure 3.10 illustrates the input screen for setting the prediction parameters. The user
can set the prediction parameters either with a parameter file or manually.
Prediction parameters (Figure 3.10)
• Lead time: the number of steps ahead to be forecasted.
• Number of input variables
• Number of neural networks
• Prediction data file name: This file contains input vector values.
• Prediction output file name: This file contains predicted outputs.
[Training Results dialog, neural network #0: Training RMSE 0.057, Validation RMSE 0.009, Validation MAPE 2.561, with a log field and a Save Neural Network button.]

Figure 3.11 Screen for training output

[Test Results dialog: RMSE 0.032, MAPE 10.676, with a log field.]

Figure 3.12 Screen for testing output
Figure 3.13 Screen for prediction output
Training, validation, and testing results and errors are reported to the user in the output
screens shown in Figures 3.11 to 3.13. Figure 3.11 shows the
screen for training results, which include the training and validation errors. Users are
advised to save the trained neural networks to a file by clicking the "Save neural
network" button; otherwise, the neural networks are saved in default temporary files that
can easily be overwritten. Neural network files are binary Java-object files. Predicted
output values are written into external text files. Figure 3.12 shows the screen for the
test results, which include the testing errors. The predicted output
based on the testing data is recorded in separate text files. Figure 3.13 shows the screen
informing the user that the prediction results have been written to an external file. Apart
from message dialogs, the MNN tool also outputs any errors or exceptions to the log
fields.
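The RMSE and MAPE figures reported in these screens follow the standard definitions of those error measures; the sketch below assumes those standard definitions, since the formulas are not restated here:

```python
import math

def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    n = len(actual)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n)

def mape(actual, predicted):
    """Mean absolute percentage error, in percent (assumes no zero actuals)."""
    n = len(actual)
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / n
```

RMSE is expressed in the units of the predicted quantity, while MAPE is scale-free, which is why the tool reports both.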
Chapter 4
Case Study
In this chapter, the method and framework discussed in Chapter 3 are applied to two
problem domains. The first application predicts the monthly production of oil wells in the
southeastern region of Saskatchewan. The second application predicts the hourly
flow rate at a gas station in Saskatchewan.
4.1 Petroleum Production Prediction
This section is compiled from three sources [NC00], [NCM02], and [CN03].
Estimation of monthly production rate of in-fill wells is important for cost-effective
operations of the petroleum industry. In our project, the artificial neural network
technique was adopted to obtain functional relationships between production time series,
core analysis and drill stem test (DST) results. Such empirical correlations can be used to
assist petroleum engineers in designing production equipment and surface facilities,
planning future production, and making economic forecasts.
Reservoir engineers typically predict primary performance through curve fitting to
existing production data. Experience from past production, particularly from wells within
the same or similar pools (i.e., pools with similar oil and geological characteristics) can
lead to reasonable predictions of primary performance. The decision to make the
transition to secondary and tertiary production requires a more time-consuming and
complex use of reservoir simulators that utilize reservoir characteristics based on core
and log analysis, as well as historical production. This also requires significant
computational capacity. Hence an alternative automation approach is desirable. The ANN
approach was therefore adopted.
Time series modeling is also applied in conjunction with the ANN approach. There is a
huge amount of data readily available from within companies and from public sources
that is barely used to understand more about the production process, to optimize timing
for initiation of advanced recovery processes and, potentially, to identify candidate wells
for production or injection. Thus the set of time-series data can be used to build a model
to predict production and opportunities. The historical data are analyzed to identify data
patterns and, assuming the patterns will continue into the future, they are extrapolated in
order to produce forecasts.
The following are the key concepts from petroleum engineering relevant to this
project:
• Petroleum: Petroleum includes oil and gas. Oil can be categorized into four crude
types based on density: light, medium, heavy, and bitumen. Since the oil type strongly
influences production as well as the fluid parameters, petroleum experts suggested that
one model should be built for each crude type. In this study, only medium oil was
considered.
• Reservoir: A petroleum reservoir is a volume of porous sedimentary rock that has
been filled by petroleum and possibly other fluids. Oil, along with varying amounts of
water and gas, resides in the pore spaces of the rock.
• Well: Many wells can be drilled to recover fluids, including oil and gas, inside the
boundary of each reservoir. Wells can be categorized according to their usages. In
this study, the term 'well' means producer wells.
• Horizon: Each horizon is a formation layer with unique geological characteristics.
The data used in this study were taken from one horizon to ensure all wells have the
same geological characteristics.
• Production history: The production rate fluctuates with time as shown in Figure
4.1.1. Fluctuation can be due to the following reasons. Activities such as well
stimulation can create fractures in the near well-bore area, which increases
production. Production decline is often caused by pressure decline in a reservoir or
deterioration of the mechanical condition of the production wells. An effective way of
slowing the decline may be supplemental recovery operations such as water flooding.
Another method is to shut down the well for a period of time to regain pressure.
While production drops to nil during the shut-in period, it usually goes up afterwards.
Eventually, however, the decline will recur [Di85].
[Bar chart: monthly production (in m3), from 0 to 12000, plotted against months.]

Figure 4.1.1 Well production history
The following sections present two ANN approaches for the prediction of oil production
rate that have been developed and tested on four pools in the southeastern region of
Saskatchewan, Canada.
4.1.1 Data
4.1.1.1 Data Collection
Saskatchewan Energy and Mines supplied the data sets used in this study. The entire data
set contains 14538 production rates and 49 core analysis and pressure data points
recorded from 49 oil producer wells. These 49 wells are located in four independent
reservoirs in the southeastern region of Saskatchewan, namely Flat Lake, Hoffer,
Neptune, and Skinner Lake, which produce the same medium type of crude oil from the
same Ratcliffe horizon.
The production data were collected over a period of about 30 years, from the 1960s to
1995. While there were sufficient data to develop an accurate time series model, the total
number of available data patterns for core and DST analysis is only 49. Since insufficient
core and DST data may mean that meaningful correlation between production rate and
analysis data cannot be found, it would be desirable to increase the number of reservoirs
to enhance validity of the model.
4.1.1.2 Data Cleaning and Transformation
A number of preprocessing steps were taken as follows. The permeability and porosity
data are raw data obtained from core logs. Permeability was averaged from horizontal
and vertical permeability. For each well, permeability and porosity values are measured
from core samples taken from different depths. If those values fall below a cut-off value
set by the petroleum engineering expert, they are ignored. Then the remaining values
taken from the same wells are averaged. Reservoir angle was ignored in the calculation
since it is relatively small in the area selected.
[Figure: left, the original time series (indices 1-15, productions P1-P15); right, the derived length-five records, each with inputs Var 1-Var 3 and outputs Output 1 and Output 2 (P1-P5, P2-P6, ..., P11-P15); records containing the missing month are shown as gray rows.]

Figure 4.1.2 Elimination of incomplete records
The monthly productions were scaled to 600 hours a month, which is the mean of the
number of production hours. All the months with zero production were eliminated. This
process of elimination has the drawback of producing discontinuities in the time series data,
since individual months with missing production data are ignored. As a result, the
number of long-term records is significantly reduced. A record contains both the input
and the expected output. For example, if production of the three months of January,
February and March are to be used to predict production two months ahead, the record
should contain production from January to May. If there was no production in the 7th
month (P7 = 0), P7 was eliminated, and therefore the following records of five-month
duration were also eliminated: P3-P7, P4-P8, P5-P9, P6-P10, and P7-P11, which are illustrated as
gray rows in Figure 4.1.2. The longer the duration of records is, the more likely the
number of eliminated records would be high. In the future, this drawback can be
avoided by repairing the missing values based on neighboring values
instead of eliminating the entire record.
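The elimination step described above amounts to sliding a fixed-length window over the production series and discarding any window that touches a zero-production month. A sketch of this step (hypothetical helper, not the project's actual code):

```python
def build_records(series, length=5):
    """Slide a window of `length` months over the series and drop any
    record that would contain a zero-production (missing) month."""
    records = []
    for i in range(len(series) - length + 1):
        window = series[i:i + length]
        if all(p != 0 for p in window):   # eliminate records touching a missing month
            records.append(window)
    return records
```

For the example in the text, a zero at P7 removes the five windows P3-P7 through P7-P11, leaving only P1-P5 and P2-P6 among the first windows.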
Since the sigmoid activation function, which returns a number in the range
[0, 1], was used, all monthly productions were normalized to this range. The following
equation is used for normalization:
x = (x - min) / (max - min),
where min and max are the estimated minimum and maximum boundaries of the monthly
productions, not the actual boundaries in the training data set.
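As a sketch, this min-max normalization with estimated bounds is simply (hypothetical helper):

```python
def normalize(x, est_min, est_max):
    """Min-max normalization into [0, 1], using estimated bounds
    rather than the actual extremes of the training set."""
    return (x - est_min) / (est_max - est_min)
```

Using estimated rather than observed bounds leaves headroom for future values that exceed anything seen in the training data.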
4.1.1.3 Data Set Manipulation
The data set was divided into three subsets in the proportion 60:20:20 for training,
validation, and testing. The training set is used to train the neural network. The validation set is
used to determine the performance of a neural network on patterns that are not used
during learning. Training and validation occur simultaneously, and the two sets of data are
used for exploration of the parameter values of a network configuration. When the error from
validation runs starts to increase, training is stopped because over-fitting has begun. The test
set is used for a final check of the overall performance of a neural network once the parameter
values have been determined in the model.
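A 60:20:20 split as described can be sketched as follows (hypothetical helper, not the project's code); the records are split in order, which preserves chronology for time-series data:

```python
def split_data(records):
    """Split records 60:20:20 into training, validation, and test subsets."""
    n = len(records)
    n_train = int(n * 0.6)
    n_valid = int(n * 0.2)
    return (records[:n_train],                      # training set
            records[n_train:n_train + n_valid],     # validation set
            records[n_train + n_valid:])            # test set
```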
4.1.2 Using NeurOn-Line
The initial modeling was conducted using NeurOn-Line (NOL, from Gensym
Corporation, USA), a tool-kit for neural network modeling. The NOL tool is introduced in
Chapter 3, Section 3.6.1. NOL supports fast development of a neural network application
to enable rapid assessment of whether a meaningful model can be built on the available
data and whether the set of chosen variables is suitable for the task.
4.1.2.1 Development of a Model of Production Time Series and
Geoscience Parameters
The first neural network model developed on NOL includes geoscience parameters as
input. The factors that have been identified to influence production include permeability,
porosity, viscosity, density, fluid compressibility, oil saturation, pressure and well
location. However, since not all the associated parameters for a well are available, only
the following three parameters that are easily obtainable are included in the model:
1. Permeability (k) describes the relative ease with which fluids can move through the
reservoir and is therefore a factor in determining well productivity.
2. Porosity (φ) is an expression of the volume of void space in the rocks and thus is
related to the volume of oil or gas that can be recovered from the reservoir.
3. First Shut-in Pressure (p) is used as a proxy variable for initial formation pressure.
The permeability and porosity values are obtained from laboratory core analysis, and
the first shut-in pressure data are derived from drill stem test (DST) analysis. In
addition to the above parameters, production time series data were used as a source of
input. The production rates of the three months prior to the target prediction month are
included as input variables. If Pt denotes the production of the target month t for which a
prediction is made, then the productions of the three previous months are Pt-3, Pt-2 and
Pt-1.
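The construction of these lagged input patterns from a production series can be sketched as follows (a hypothetical helper, not code from the thesis):

```python
def make_lag_patterns(series, n_lags=3):
    """Turn a monthly production series into (inputs, target) pairs,
    where the inputs are the productions of the n_lags previous months
    (Pt-3, Pt-2, Pt-1) and the target is the current month's production Pt."""
    patterns = []
    for t in range(n_lags, len(series)):
        patterns.append((series[t - n_lags:t], series[t]))
    return patterns
```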
In the ANN model, the six conditional variables are permeability, porosity, pressure
and the oil production volumes of the three previous months and the consequent variable
is production of the current month. Since there are only six input variables and one output
variable, it was assumed that the required number of hidden neurons is also small. In practice, it is
best to have as few hidden nodes as possible because fewer weights need to be
determined.
A scaled data set was used to select the best network configuration. The same training
and validation sets were applied to train and validate five different back-propagation
networks with 2, 3, 4, 5 and 6 hidden units respectively. As can be seen in Table 4.1.1, the
network with 4 hidden nodes produced the least validation error, and was the best model
found. The training error was 0.034 and the validation error was 0.032.
Table 4.1.1. Network configuration — model 1

# Hidden Units   Training RMSE   Validation RMSE
      2              0.034            0.034
      3              0.034            0.033
      4              0.034            0.032
      5              0.034            0.034
      6              0.033            0.034
With the training and validation sets specified above, the back-propagation neural
network was trained three times with different initial weights. During training, the root
mean square error (RMSE) on the training set declined steadily but the amount of
decrease became insignificant after the first few cycles. The ANN was saved every 300
cycles. The validation error started to increase between cycles 300 and 600, which
indicated over-fitting had occurred. The saved 600-cycle ANN was the final model. We
interpreted the fact that three training runs gave similar results to indicate that the global
minimum had been reached. The training error was 0.029 while the validation error was
0.04.
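The RMSE figures quoted throughout this section follow the standard definition, which can be sketched as:

```python
import math

def rmse(targets, predictions):
    """Root mean square error between target and predicted values."""
    squared = [(t - p) ** 2 for t, p in zip(targets, predictions)]
    return math.sqrt(sum(squared) / len(squared))
```

Because the productions are normalized to [0, 1], these RMSE values can be read roughly as fractions of the estimated production range.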
4.1.2.2 Development of a Model of Production Time Series Only
A sensitivity test was conducted to measure the impact of each input variable on the
output in the first ANN model developed on NOL described in Section 4.1.2.1. The results
showed that all the three geoscience variables had less than 5% influence on the
production. This confirmed our concern earlier about the limited amount of geoscience
data. Hence a second model was developed which consists of the three conditional
variables of the oil production volumes of the three previous months, and the output or
consequent variable is production of the current month.
A similar preprocessing, configuration, training and validation process was conducted
as for the first model. As can be seen in Table 4.1.2, the network with 3 hidden nodes produced
the least validation error, and was found to be the best model. The training error was 0.035
while the validation error was 0.03.
Table 4.1.2. Network configuration — model 2

# Hidden Units   Training RMSE   Validation RMSE
      2              0.033            0.036
      3              0.034            0.03
      4              0.035            0.031
      5              0.033            0.035
      6              0.034            0.034
4.1.3 Using Multiple Neural Network
A question that confronts an engineer is how long it takes for a well to dry out. To answer
this question, forecasts of not only one but several months ahead need to be made. The
models presented in section 4.1.2 could not make long term prediction with reasonable
accuracy. Therefore we proposed the multiple neural network (MNN) approach for time
series modeling to make long term predictions.
The MNN was trained with the following parameters:
• Number of maximum training cycles: 3000
• Validation error threshold: 5%
• Number of hidden units for each ANN: 5
• Number of input variables for each ANN: 3
• Lead time: 100 months
• Number of ANNs: 7
• Initial learning rate: 0.7
• Momentum: 0.3
The number of ANNs in the MNN was determined based on the length of the
prediction term. Since 2^6 < 100 < 2^7, seven ANNs were used to predict 100 months ahead.
However, if not enough data are available for training high ordered ANNs, this number
can be set smaller. The weights of the first ANN were initialized with small random
values. The initial weights of subsequent ANNs were copied from the previously trained
ANNs.
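Under the assumption that the component ANNs have lead times of 1, 2, 4, ... months, which the relation 2^6 < 100 < 2^7 suggests, the required number of component ANNs can be sketched as follows (a hypothetical helper):

```python
import math

def num_component_anns(lead_time_months):
    """Number of component ANNs needed for a prediction horizon, assuming
    the component lead times double (1, 2, 4, ... months): the smallest n
    with 2^n >= lead_time_months. E.g. 2^6 < 100 <= 2^7 gives 7 ANNs."""
    return max(1, math.ceil(math.log2(lead_time_months)))
```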
Validation was done every four training cycles; this interval was selected to limit the
time spent on validation. The training process halts under one of the following conditions:
• The number of cycles is equal to the maximum number of training cycles allowed
• The training and validation errors are smaller than or equal to the validation error
threshold set by the user
• The values of the last n validation errors increase monotonically, which indicates over-
fitting is likely to have begun. In our experiments, n = 10 was used, and the ANN that
produces the least validation error is saved.
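The third stopping condition can be sketched as follows (a minimal illustration with a hypothetical function name):

```python
def should_stop(val_errors, n=10):
    """Return True if the last n validation errors increase strictly
    monotonically, which suggests over-fitting has begun."""
    if len(val_errors) < n:
        return False
    tail = val_errors[-n:]
    return all(tail[i] < tail[i + 1] for i in range(n - 1))
```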
The first component ANN which predicts one month ahead was used as the single
ANN in our comparison.
4.1.4 Results
4.1.4.1 NOL Models
The test data set was run with the saved ANN models. The testing error rates found were
0.04 for the first model developed on NOL, which incorporates both time series and
geoscience data, and 0.033 for the second model on NOL with only time series data.
Figures 4.1.3 and 4.1.4 show the predicted values of the two models (indicated by the line)
versus the target values (indicated by the dots).
Figure 4.1.3. Predicted vs. target — model 1
Figure 4.1.4 Predicted vs. target — model 2
Sensitivity tests were conducted over the two trained models to identify input
variables that have strong influence on an output variable, or inputs that have little or no
influence on the output variable. Sensitivity testing is useful for understanding the
correlations in the data, which may lead to a greater understanding of the physical
causality of the process. Sensitivities (or influences of parameters) are obtained by taking
the average of the local derivative information. In our experiments, the sensitivities were
calculated with the NOL tool. They are calculated via the following process [Ge95]:
1. Select a random data point from the data series.
2. Generate the outputs for the data point using the model.
3. Derange the jth input by a small amount, and recalculate the output.
4. For each output, estimate the local derivative at the selected data point:
Sij = (output_i' - output_i) / (input_j' - input_j),
where the prime in the indices indicates the deranged input and the recalculated output.
5. Repeat from step 3 for each input.
6. Repeat for another random data point.
The sensitivity value of output i with respect to input j is then calculated by taking the
average of the absolute values of Sij over the sample of randomly selected data points.
Finally, each sensitivity is normalized by dividing the sensitivity by the standard
deviation of the respective input variable.
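The six-step procedure above can be sketched as follows. This is a simplified single-output illustration; NOL's actual implementation may differ, and the `model` and `data` structures here are assumptions.

```python
import random

def sensitivities(model, data, n_samples=100, eps=1e-3):
    """Estimate the sensitivity of a single-output model to each input
    via the derange-and-average procedure: perturb one input at a time,
    average the absolute local derivatives over random data points, then
    divide each average by the standard deviation of that input."""
    n_inputs = len(data[0])
    sums = [0.0] * n_inputs
    for _ in range(n_samples):
        x = list(random.choice(data))        # step 1: random data point
        y = model(x)                         # step 2: model output
        for j in range(n_inputs):
            x_pert = list(x)
            x_pert[j] += eps                 # step 3: derange the jth input
            s_ij = (model(x_pert) - y) / eps # step 4: local derivative
            sums[j] += abs(s_ij)
    avg = [s / n_samples for s in sums]
    # normalize by the standard deviation of each input variable
    result = []
    for j in range(n_inputs):
        col = [row[j] for row in data]
        mean = sum(col) / len(col)
        std = (sum((v - mean) ** 2 for v in col) / len(col)) ** 0.5
        result.append(avg[j] / std if std > 0 else 0.0)
    return result
```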
In the first model, the sensitivities of the six inputs over production are as follows.
Table 4.1.3 Sensitivities — model 1

  k      φ      p     Pt-3    Pt-2    Pt-1
 3.5%   1.3%   0.6%   9.6%   21.9%    63%
As can be seen in Table 4.1.3, the influence of the core (k and φ) and DST (p)
analysis on the production is very small (less than 5%). The production of the most recent
month has the strongest effect at 63%, and the productions of the previous two months
are also significant at 9.6% and 21.9%. From this result, it was decided the second model
should include only production time series data.
The sensitivities of the three input variables over the output variable in the second
model are similar to the previous model as can be seen in Table 4.1.4.
Table 4.1.4 Sensitivities — model 2

 Pt-3    Pt-2    Pt-1
11.3%   27.6%   61.2%
4.1.4.2 Multiple-ANN and Single-ANN Models
In order to facilitate a comparison between a MNN and a single ANN, the same test
set of data was applied to the MNN and the single ANN to predict monthly production up
to 100 months ahead. The average RMSE was 0.053.
Figure 4.1.5 illustrates the errors for different prediction periods from 1 to 100 months ahead.
Figure 4.1.5 Test errors for MNN and Single ANN for different prediction periods
Figure 4.1.5 indicates that the MNN generally performs slightly better than the single
first-ordered ANN. As the prediction term increases, the difference becomes more significant.
This indicates that a MNN performs better than a single ANN in long-term forecasting.
[Chart: "Desired vs. Predicted" — the desired output and the outputs predicted by the ANN and the MNN, plotted against record number]
Figure 4.1.6 Desired vs. predicted outputs
Figure 4.1.6 illustrates the desired and predicted outputs from MNN and ANN for a
prediction term of 100 months. With the exception of approximately the first 100 values
in the graph, the predictions from the ANN and MNN model are quite close to the desired
results.
4.1.5 Discussions
The results of the models developed on NOL indicate that the production time series
model performs comparably to the mixed causal and time series model. The fact that
geoscience data has insignificant influence on production rates can be explained as
follows. Firstly, core analysis is taken at the well bore and may not represent the real
permeability and porosity values over the entire well. Secondly, pressure usually changes
over a well's lifecycle but only information about the initial pressure is available for the
study. Thirdly, the time series data may already incorporate all the information of the
core and DST because there are correlations between previous productions and a well's
parameters. Lastly, there may not be sufficient core and DST analysis data points to study
the influence of these parameters on the production.
The NOL tool-kit is a convenient and generic tool to develop and deploy an ANN
application. However, modifying network structures or deploying the application in an environment
that involves non-Gensym products is more complicated. The MNN tool is a program
developed for the specific purpose of combining ANNs into a MNN. Currently, only
back-propagation ANN is included. However, it is easy to add more network types into
the system as Java is an object-oriented language.
It is observed that the MNN approach has some disadvantages. First, a MNN is more
complex than a single ANN, although only linearly so.
Secondly, a high-ordered ANN requires more data to train and validate.
4.1.6 Conclusion and Future Works
This section presents two ANN approaches for prediction of petroleum production. The
results show that ANN can be used for petroleum prediction. The models are efficient
and adaptable.
Another remark from the experiment is that core analysis and DST data contribute little
to the model output of petroleum production, and a univariate time series is
sufficient to develop a meaningful model.
The MNN model shows superior performance over the single ANN model in long
term prediction. Aside from ANN, it is possible to use other numerical prediction
techniques in a multiple-order model to perform long term prediction.
Future research includes comparing the MNN technique with other statistical curve
fitting techniques.
4.2 Hourly Gas Flow Prediction
The second case study is to predict future hourly gas flow through the Melfort
compressor station. This station is a part of the gas pipeline distribution system at St.
Louis East.
[Schematic: St. Louis Station and Melfort Station, with the Nipawin, St. Brieux and Hudson Bay consumption areas]
Figure 4.2.1. Schematic of St. Louis East system
Figure 4.2.1 illustrates the gas stations and their service areas of the St. Louis East gas
pipeline distribution system. The system consists of two stations located at Melfort and
St. Louis. The Melfort station receives gas from the St. Louis station and transmits it to
the surrounding consumption areas of Nipawin and Hudson Bay. It is important to ensure
that customer demand is fulfilled. This means that there should be a sufficient number of
compressors running at the Melfort station and sufficient gas input to the Melfort station.
Dispatchers at the Melfort station need to make decisions to turn compressors on or off,
or to adjust the compression level in order to reach the necessary pressure while not
wasting resources. The decision has a significant impact on the effectiveness of the
natural gas pipeline operation. When the customer demand increases, a dispatcher adds
compression to the pipeline system by turning on one or more compressors. On the other
hand, the dispatcher turns off one or more compressors to reduce compression in the
pipeline system when the customer demand decreases. Incorrect decisions made by the
dispatcher will cause substantial economic loss.
The purpose of this study is to aid the dispatcher in optimizing natural gas pipeline
operations in order to satisfy customer demand with minimal operating costs. A
dispatcher needs to know ahead of time when the largest volume requirement will occur
and to be ready for it. Otherwise, the system pressures at Nipawin and Hudson Bay will
be below the required minimum. Since consumption is only available monthly from
billing records, it cannot be used for the task of predicting hourly demand. Therefore, we
use the flow rate at the Melfort station as a substitute variable for the demand.
Figure 4.2.2. Hourly flow during a day
The flow rate at the Melfort station more or less reflects the consumption patterns of
customers at Nipawin and Hudson Bay. As illustrated in Figure 4.2.2, the natural gas
flow rate fluctuates during a day. For example, the demand is usually low at night. In the
morning, the demand is higher because residential customers start cooking and industrial
customers start their machines. In the afternoon, the demand decreases since the facilities
are already heated up. After work hours, industrial customers' demand becomes lower
while the residential customers' demand gets higher. The demand for natural gas also
fluctuates depending on the season. In the winter, the demand for natural gas is usually
higher than in the summer. Special occasions such as public holidays are also a factor that
affects demand patterns.
4.2.1 Data Collection and Preprocessing
The data was obtained from SaskEnergy/Transgas. Hourly flow rates in the period from
December 2001 to mid August 2002 were collected with an interruption from March 14th
to May 27th. Fall (from September to November) and spring (from March to May) data
was not available. This is a disadvantage since we could not divide the data set into four
seasonal data sets for separate treatment. There were several hourly flow rates with values of zero in the data set; these are either missing or abnormal data, and all such values were eliminated from the data set. The total number of data points is approximately 3,500.
Since the sigmoid activation function, which returns values in the range [0, 1], was used, all hourly flow rates had to be normalized to this range. The following equation was used for normalization:

x' = (x - min) / (max - min),

where min and max are the estimated minimum and maximum boundaries of the hourly flow rates, not the actual boundaries in the training data set. By examining the plot of the historical data set, the min and max values were estimated as 0 and 600 (10³ m³).
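This normalization and its inverse can be sketched as follows. The bounds 0 and 600 are the estimates stated above; the function names are illustrative, not part of any tool used in this study.

```python
MIN_FLOW, MAX_FLOW = 0.0, 600.0  # estimated bounds (10^3 m^3), not data extremes

def normalize(x, lo=MIN_FLOW, hi=MAX_FLOW):
    """Map an hourly flow rate into [0, 1], the sigmoid output range."""
    return (x - lo) / (hi - lo)

def denormalize(y, lo=MIN_FLOW, hi=MAX_FLOW):
    """Map a network output back to flow-rate units."""
    return y * (hi - lo) + lo

print(normalize(300.0))  # 0.5
```

Using estimated rather than observed extremes keeps future values that exceed the training-set range from falling outside [0, 1].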
The data set was divided into three subsets for training, validation and testing in the
proportion of 5:1:1. The training set contains approximately 2500 data points. The
validation and test sets each contain only around 500 data points.
4.2.2 Training and Validation
The chosen input size was six, as six hours are a quarter of a day. A shorter period may not contain enough information to predict 24 hours ahead, while a longer period may make the neural networks too complex and therefore require more data to train.
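The construction of training records with a six-hour input window can be sketched as follows; the function name and record layout are illustrative assumptions, not the actual format used by the tool.

```python
def make_records(series, input_size=6, lead=1):
    """Build (input, target) pairs: each input is `input_size` consecutive
    hourly flow rates, and the target is the value `lead` hours after the window."""
    records = []
    for t in range(len(series) - input_size - lead + 1):
        window = series[t:t + input_size]
        target = series[t + input_size + lead - 1]
        records.append((window, target))
    return records

hours = list(range(100, 130))  # toy stand-in for 30 hourly flow rates
pairs = make_records(hours, input_size=6, lead=24)
print(pairs[0])  # ([100, 101, 102, 103, 104, 105], 129)
```

Note that a longer lead leaves fewer usable records, which is one reason long-term prediction demands more data.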
The number of ANNs in the MNN was determined based on the length of the expected prediction term. Since 2⁴ < 24 < 2⁵, a maximum of five ANNs were used to predict 24 hours ahead. However, this number can be set smaller based on validation errors produced by different combinations of ANNs.
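Assuming, as in the MNN design described earlier, that the ANN of order k predicts 2^k steps ahead, n ANNs jointly cover leads up to 1 + 2 + ... + 2^(n-1) = 2^n - 1. The required count for a given lead can then be computed as a small sketch (the function name is mine):

```python
import math

def anns_needed(lead):
    """Smallest n such that ANNs predicting 1, 2, 4, ..., 2**(n-1) steps
    ahead can jointly reach `lead` steps, since their sum is 2**n - 1."""
    return math.ceil(math.log2(lead + 1))

print(anns_needed(24))  # 5, because 2**4 = 16 < 24 < 32 = 2**5
```

Four ANNs reach only 15 steps ahead, which is why a fifth is needed for the 24-hour horizon.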
The weights of the first ANN were initialized with small values in the range from 0 to 0.5. Each subsequent ANN was initialized with the previous ANN's weights in order to reduce training time.
Validation was done every four training cycles. Single step validation showed better
results than multiple step validation.
After five neural networks had been trained, five combinations of them, comprising 1, 2, 3, 4 and 5 neural networks, were validated on the validation set to predict 24 hours ahead. The combination with the lowest error rate out of the last four was chosen as the final MNN; the combination with only one neural network served as the single ANN.
Figure 4.2.3. Validated RMSE of 5 models for 24 hour period
As can be seen, the MNNs with 4 and 5 neural networks consistently performed better than the single ANN model. Meanwhile, the MNNs with 2 and 3 neural networks gave good results at first but became less and less effective as the prediction period got longer. The model with 5 neural networks performed only marginally better than the
one with 4 neural networks. The average MAPEs for the five models with 1 to 5 neural networks were 11.7%, 40.3%, 11.02%, 8.84% and 8.76%, respectively. The one with 5
neural networks was chosen as the final MNN for testing.
4.2.3 Testing
In order to facilitate a comparison between a MNN and a single ANN, the same test
set of data was applied to the MNN and the single ANN to predict hourly flow rate for
different leads from 1 to 24 hours. Figure 4.2.4 summarizes the results.
Figure 4.2.4. Test errors for MNN and single ANN for 24 hour period
Figure 4.2.4 indicates that the MNN consistently performs better than the single first-ordered ANN. As the prediction term increases, the difference becomes more significant. This indicates that a MNN performs better than a single ANN in long-term forecasting.
Average MAPEs were calculated as follows, where MAPE(i) is the mean absolute
percentage error for lead i.
Average_MAPE = (1/24) Σ_{i=1}^{24} MAPE(i)
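The computation above can be sketched directly; the helper names and the toy data are illustrative, not the study's actual values.

```python
def mape(actual, predicted):
    """Mean absolute percentage error for one prediction lead."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)

def average_mape(actual_by_lead, predicted_by_lead):
    """Average of MAPE(i) over all given leads i."""
    pairs = list(zip(actual_by_lead, predicted_by_lead))
    return sum(mape(a, p) for a, p in pairs) / len(pairs)

a = [[100.0, 200.0], [100.0, 200.0]]  # actual flows for two leads (toy data)
p = [[110.0, 180.0], [90.0, 220.0]]   # predictions for the same leads
print(average_mape(a, p))  # 10.0
```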
For 24 hours, the average errors were 12.38% with the single ANN and 8.736% with
the MNN. Figure 4.2.5 illustrates the desired output and predicted outputs from the MNN
and ANN model for a prediction lead of 24 hours.
[Figure omitted: actual hourly flow and the 24-hour-ahead predictions by the MNN and the single ANN over the test set.]
Figure 4.2.5. Predicted vs. actual for 24 hours ahead
As can be seen, neither model's performance was reasonably good. The prediction by the MNN looked like a delayed version of the actual outputs, and the prediction by the single ANN was rather random.
For the first 6 hours, the average errors were 5.75% with the single ANN and 4.971%
with the MNN. An error of 5% or less was considered acceptable. Figure 4.2.6 illustrates
the desired output and predicted outputs from MNN and ANN for a prediction lead of 6
hours.
[Figure omitted: actual hourly flow and the 6-hour-ahead predictions by the MNN and the single ANN over the test set.]
Figure 4.2.6. Predicted vs. actual for 6 hours ahead
As can be seen, both predicted lines are quite close to the actual line, but the MNN predicted better than the single ANN.
4.2.4 Discussions
The poor performance of both ANN and MNN models on the 24-hour prediction can be
explained as follows. Firstly, there may not be enough data points for training; the data used in this study was collected over less than a year. Secondly, special occasions such as holidays and weekends have not been considered, and the gas usage patterns on such occasions may differ from those of a normal day. Thirdly, seasonal effects may play a role: a network trained on a data set for summer may not generalize well in winter.
4.2.5 Conclusion and Future Works
The case study indicates that a MNN model shows superior performance over a single
ANN model in long-term prediction. However, if the period is too long, neither model
can predict well. Incorporating more neural networks also does not guarantee a lower error. In the study above, the model with two neural networks showed less satisfactory performance overall than the single neural network. Using more than five neural networks to predict 24 periods ahead is unavailing because neural networks with orders greater than or equal to five predict 32 or more periods ahead.
Future research can include collecting more data and subdividing the problem into
sub-problems. Classification can be based on seasons, weekends or weekdays.
Chapter 5
Observations and Discussions
This chapter presents some observations and discussions derived from the development of the case studies in chapter 4.
5.1 Discussions on Suitability of Time Series Modeling
in Forecasting
The two case studies presented in chapter 4 are two satisfying applications of time series modeling in forecasting. However, not all time series can be used to build a
meaningful forecasting model.
Sometimes crucial information is missing from a time series. For example,
temperature is a factor that influences gas consumption but it is not coded in the gas
consumption time series. In literature, there are two solutions for this kind of problem.
One is to use multivariate time series modeling [Ch94] (cited in [Ru95]). In this
approach, the temperature time series is included in the model as an independent variable.
The other approach is to classify the time series into several classes and apply univariate
time series modeling to each class [LC98], [LCMT99]. For example, based on the date
that the gas consumption was recorded, a data point can be classified into a hot-season class or a cold-season class. Separate models will be built from each of these classes. However,
these two methods may require data which is not always obtainable.
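The class-based approach can be sketched minimally as follows; the May-to-October split is an illustrative assumption of mine, not a rule taken from the cited work.

```python
def season_class(month):
    """Assign a data point to a hot- or cold-season class by its month.
    The May-October boundary is an illustrative assumption."""
    return "hot" if 5 <= month <= 10 else "cold"

# Route each (month, consumption) record to its class before model building.
records = [(1, 520.0), (7, 180.0), (12, 560.0)]
classes = {"hot": [], "cold": []}
for month, value in records:
    classes[season_class(month)].append(value)
print(classes)  # {'hot': [180.0], 'cold': [520.0, 560.0]}
```

A separate univariate model would then be trained on each class's series.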
5.2 Discussions on Using the NOL Tool-kit
In my opinion, NOL is a useful tool-kit for industrial users who have little knowledge of
neural network structures and algorithms but who wish to develop a neural network
application. NOL allows quick and simple development of a neural network application.
Parameters such as learning rate are adjusted automatically during the training process. It
requires only a little training to be able to use the basic features of the tool-kit. However,
in order to utilize the tool-kit fully, knowledge of G2 is mandatory and it could take some
effort to find out the meaning and usage of various NOL blocks.
There are a number of other disadvantages. Since the source code is not available, it
is difficult to modify or improve a neural network's parameters and algorithms, or to
deploy the neural network in an environment that involves products other than Gensym's. While most other simulators allow users to run a neural network after a few configuration steps, NOL requires users to build a neural network by connecting its
components together.
Nevertheless, NOL is still useful for developers to build prototypes of neural network models or investigate the feasibility of building a neural network application. NOL also includes a facility to conduct sensitivity testing, which is very useful in selecting input variables.
5.3 Discussions on Using the MNN Tool
5.3.1 Reusing weights of lower-ordered ANNs
It has not been determined whether reusing the weights of a lower-ordered ANN to initialize a higher-ordered ANN is desirable. On the positive side, reusing the weights reduces the
85
Reproduced with permission of the copyright owner. Further reproduction prohibited without permission.
5.2 Discussions on Using the NOL Tool-kit
In my opinion, NOL is a useful tool-kit for industrial users who have little knowledge on
neural network structures and algorithms but who wish to develop a neural network
application. NOL allows quick and simple development of a neural network application.
Parameters such as learning rate are adjusted automatically during the training process. It
requires only a little training to be able to use the basic features of the tool-kit. However,
in order to utilize the tool-kit fully, knowledge of G2 is mandatory and it could take some
effort to find out the meaning and usage of various NOL blocks.
There are a number of other disadvantages. Since the source code is not available, it
is difficult to modify or improve a neural network’s parameters and algorithms, or to
deploy the neural network in an environment that involves other than Gensym’s products.
While most other simulator allows users to run a neural network after a few
configurations, NOL requires users to build a neural network by connecting its
components together.
Nevertheless, NOL is still useful for developers to build prototypes of neural network
model or investigate the feasibility of building a neural network application. NOL also
includes a facility to conduct sensibility testing which is very useful in selecting input
variables.
5.3 Discussions on Using the MNN Tool
5.3.1 Reusing weights of lower-ordered ANNs
It is not determined whether to re-use the weights of low-ordered ANN to initialize
higher-ordered ANN is desirable. On the positive side, reusing the weights reduces the
85
Reproduced with permission of the copyright owner. Further reproduction prohibited without permission.
time necessary to train higher-ordered ANNs. It could also be beneficial when the amount of available data is low or the data has a high level of discontinuity due to missing values. In this case, the number of data records for training a higher-ordered ANN may not be sufficient to train the ANN from scratch. Since the weights of lower-ordered ANNs are expected to be more or less close to the optimal weights of higher-ordered ANNs, it could be better to reuse the weights. However, this is possible only when the higher-ordered and lower-ordered ANNs have the same number of hidden units. A disadvantage
of reusing weights is that the training of the higher-ordered ANN can easily be stuck in a
local minimum close to the place where the training of the lower-ordered ANN stops.
Therefore, in the case where there is sufficient data, it is recommended to initialize all
ANNs with random weights. Training the ANNs separately can increase the diversity
among the ANNs, which could be a factor that can increase the generalization ability of a
MNN.
5.3.2 Using multi-step validation
Similarly, multi-step validation does not always improve the training results. In the case
where the data is discontinuous due to missing values, multi-step validation reduces the
number of validation records. For example, for a 2nd-ordered network, we need records of
length 4 to perform single-step validation and records of length 7 for multi-step
validation (Refer to 3.5.2). Moreover, if one low-ordered neural network is not trained
well, the following higher-ordered networks are also affected because the training is
dependent on the previous network. Therefore, it is recommended to train the ANNs well, one by one, from the lowest order to the highest order.
5.3.3 Setting training parameters
Size of Input Vector
There are two ways to choose the size of an input vector. One is based on domain
experts' opinions. The other is by fixing the number of hidden units and varying the size
of input vector to choose the one with the lowest error.
Number of hidden units
After the size of an input vector has been chosen, the number of hidden units of each
ANN should be determined by trial and error. In the beginning, the number of hidden
units should be initialized with a small value. If the performance is poor, then increase this number. On the other hand, if there is evidence of over-fitting, then decrease the number of hidden units.
Maximum number of training cycles
Users set the maximum number of training cycles before training. This number should be
large enough to reduce users' interaction. On the other hand, this number should be small
enough so that the users can update the parameters such as learning rate and momentum
when necessary (Refer to 5.3.4).
Choosing the size of validation windows
Validation window is a window of validation errors that is use to detect over-fitting. If
the chosen size is too small, the training process could stop at the first local minimum
that it reaches. On the other hand, if the validation window is too large, the training can
easily miss a minimum located near a second higher minimum. Since the condition for
claiming over-fitting is that the errors in the validation window increase monotonically,
the training process in Figure 5.1 skips the first minimum. The method using validation
window in this tool is just a coarse-grained solution to detect over-fitting and it needs to
be improved in the future.
[Figure omitted: validation-error curve with a first and a second minimum, both spanned by a large validation window.]
Figure 5.1 Side effect of large validation window
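The monotonic-increase condition for claiming over-fitting can be sketched as follows; the function name and window size are illustrative, not the tool's internals.

```python
def overfitting(val_errors, window=5):
    """Flag over-fitting when the last `window` validation errors
    increase strictly monotonically."""
    if len(val_errors) < window:
        return False
    tail = val_errors[-window:]
    return all(tail[i] < tail[i + 1] for i in range(window - 1))

print(overfitting([0.9, 0.5, 0.3, 0.31, 0.33, 0.36, 0.40], window=4))  # True
print(overfitting([0.9, 0.5, 0.3, 0.31, 0.30, 0.36, 0.40], window=4))  # False
```

Because a single dip inside the window resets the condition, a large window can overlook the first minimum exactly as Figure 5.1 illustrates.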
5.3.4 Updating training parameters
Using the MNN tool to train neural networks is only semi-automatic, as the users are still
responsible for choosing the best values for training parameters. However, once the
parameters have been set, the tool will automatically detect over-fitting or stop training
when the error goes below a threshold set by the user. Users do not need to watch error graphs during a training process. However, it is recommended that users examine the
error trend every time the maximum number of training cycles is reached and update the
parameters if necessary to get better training performance.
Parameters that can be updated are the learning rate and the momentum. A large
learning rate allows fast convergence but also can cause the model to oscillate around a
minimum. The momentum factor tends to keep the weight changes moving in the same direction, hence allowing the algorithm to skip over small local minima. It can also improve
the speed of learning. However, as in the case of learning rate, a large momentum factor
88
Reproduced with permission of the copyright owner. Further reproduction prohibited without permission.
window in this tool is just a coarse-grained solution to detect over-fitting and it needs to
be improved in the future.
First minimum Second minimum
Valic ation win dow
Figure 5.1 Side effect of large validation window
5.3.4 Updating training parameters
Using the MNN tool to train neural networks is only semi-automatic, as the users are still
responsible for choosing the best values for the training parameters. However, once the
parameters have been set, the tool automatically detects over-fitting or stops training
when the error falls below a threshold set by the user. Users do not need to watch error
graphs during a training process. It is nevertheless recommended that users examine the
error trend every time the maximum number of training cycles is reached and update the
parameters if necessary to obtain better training performance.
Parameters that can be updated are the learning rate and the momentum. A large
learning rate allows fast convergence but can also cause the model to oscillate around a
minimum. The momentum factor tends to keep the weight changes moving in the same
direction and hence allows the algorithm to skip over small local minima; it can also
speed up learning. However, as with the learning rate, a large momentum factor
may cause the network to overshoot. In the studies in this thesis, the momentum was
fixed and the learning rate was adjusted by trial and error.
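The interaction of the two parameters can be sketched as a generic gradient-descent-with-momentum update. This is an illustrative sketch, not the MNN tool's implementation; the one-dimensional quadratic error surface and the parameter values are assumptions chosen for the example.

```python
def momentum_step(w, grad, velocity, learning_rate=0.1, momentum=0.9):
    """One gradient-descent update with momentum: the momentum term
    keeps the weight change moving in the direction of the previous
    step, which can carry the search over small local minima, but a
    large learning rate or momentum factor can make the update
    overshoot a minimum."""
    velocity = momentum * velocity - learning_rate * grad
    return w + velocity, velocity

# Illustrative run on a one-dimensional error surface E(w) = w^2.
w, v = 5.0, 0.0
for _ in range(100):
    grad = 2.0 * w                      # dE/dw for E(w) = w^2
    w, v = momentum_step(w, grad, v)
print(round(w, 4))                      # w approaches the minimum at 0
```

With a larger momentum or learning rate the same loop oscillates more widely around the minimum before settling, which is the overshooting behaviour described above.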
Chapter 6
Conclusion and Future Work
6.1 Concluding Summary
One of the research objectives was to develop neural network models for two prediction
applications. The first application predicts the monthly oil production of a well and the
second predicts hourly gas consumption.
The first step of the research project was to determine whether the time series alone is
sufficient to develop a good model. For the first application, two neural network models
with different input vectors were developed. One includes only time-series lags as input;
the other also includes additional variables. The results show that the more sophisticated
model did not perform better than the univariate time-series model. Sensitivity testing on
the mixed model also confirmed that the time-series lags had a higher influence on the
output than the additional variables. The reasonable errors also suggest that neural
networks are a promising technique for a problem that petroleum engineering has not
successfully dealt with using conventional techniques.
As the next step in the research project, the models for both applications were
extended to make longer-term predictions. We proposed a multiple-neural-network
structure in an attempt to reduce the error that accumulates in the recursive propagation
process. The multiple-neural-network model propagates ahead in steps of different
lengths to make forecasts. The experimental results were in favor of the proposed
structure. A disadvantage of multiple neural
network techniques is that they require a longer and more continuous time series in order
to build a model.
The contribution of this work is the modification of the recursive neural network
approach and the successful application of this method in the two industrial problem
domains. The proposed multiple-neural-network method generated more accurate
long-term forecasts than the single neural network.
The idea of using multiple neural networks is not new. Several methods for
combining evidence produced by multiple sources into one final result have been
developed [HSY94][CK95]. Multiple-neural-network methods have also been applied in
time series modeling to improve the accuracy of long-term forecasts [DSMV01]. The
novelty of the approach proposed in this thesis is the use of different exponential
prediction terms for the component neural networks. The variety of prediction terms
across the component networks allows the combined model to cover both short-term and
long-term trends. Moreover, this method does not require many component networks.
Since the prediction terms of the component networks increase exponentially, and the
length of the longest prediction term determines the exponent of the highest-ordered
component network, the number of component networks remains small even when the
prediction term is long.
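The logarithmic relationship between the forecast horizon and the number of component networks can be illustrated with a short sketch. The powers-of-two decomposition below is one plausible reading of the scheme described above, not the thesis's exact implementation, and the 365-step horizon is an arbitrary example.

```python
import math

def component_terms(max_horizon):
    """Prediction terms of the component networks: 1, 2, 4, ..., up to
    the largest power of two not exceeding max_horizon. The number of
    components grows only logarithmically with the horizon, which is
    why few networks are needed even for long prediction terms."""
    n = int(math.log2(max_horizon)) + 1
    return [2 ** k for k in range(n)]

def decompose_horizon(horizon, terms):
    """Greedily cover `horizon` steps with the available prediction
    terms (largest first), so that a few long-range jumps replace many
    short recursive steps and less error accumulates."""
    steps = []
    for t in sorted(terms, reverse=True):
        while horizon >= t:
            steps.append(t)
            horizon -= t
    return steps

terms = component_terms(365)
print(terms)                           # nine component networks suffice
print(decompose_horizon(365, terms))   # six jumps reach a 365-step horizon
```

A single recursive network would instead propagate 365 one-step predictions, each feeding its error into the next.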
Since neural networks have the ability to handle redundant and missing information,
the application models are expected to be robust and reliable. The models are also
reusable in a changing situation, since the training of neural networks has a generational
property. After a training process, a generation of a neural network is created. When the
situation changes, the neural network can resume training with new data to generate
another generation that can adapt to the new situation. The new generation therefore
inherits the characteristics of the previous generation.
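The generational property described above amounts to warm-starting: a new generation resumes from the previous generation's weights instead of a random initialization. The sketch below uses a hypothetical one-weight linear "network" and squared error purely for illustration; it is not the thesis's model.

```python
def train(weight, data, learning_rate=0.05, cycles=200):
    """Resume gradient training from an existing weight (the previous
    generation); returns the new generation's weight."""
    for _ in range(cycles):
        for x, y in data:
            pred = weight * x
            weight -= learning_rate * 2 * (pred - y) * x  # d/dw (pred - y)^2
    return weight

old_data = [(1.0, 2.0), (2.0, 4.0)]    # situation where y = 2x
gen1 = train(0.0, old_data)            # first generation, from scratch
new_data = [(1.0, 2.5), (2.0, 5.0)]    # situation changes to y = 2.5x
gen2 = train(gen1, new_data)           # second generation inherits gen1
print(round(gen1, 3), round(gen2, 3))
```

Because `gen2` starts from `gen1` rather than from scratch, it inherits the previous generation's fit and only adjusts to the change in the data.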
We observed some limitations that cause forecasting inaccuracy no matter how well a
model is trained.
• There is genuine random noise in the data due to errors in the recording process. In
both time series, we noticed several non-zero but abnormal data points. It is not known
whether they were incorrectly recorded or whether they are irregularities in the data.
• The sample data set is not evenly spread in the feature space, and it is only a partial
representation of the population. In the time-series context, this means that the length
of the time series under investigation is insufficient to represent all patterns in the
problem space. For example, the gas consumption time series does not contain enough
seasons for a seasonal factor to be included.
• The factors that significantly influence the variable to be forecasted are unavailable,
either completely or partially, within the examined time span. An example of such a
parameter is the temperature: when the temperature declines, customers tend to use
more gas than when it is warm, but future temperature is itself hard to predict. The
permeability parameter in the petroleum production application is another example;
since it is costly to measure, only one or two data points are available during a well's
life.
6.2 Future Work
Despite the satisfactory performance of the MNN application in petroleum production
prediction, some experts commented that the model should be built on individual wells
because each well has unique characteristics, and that the results obtained in the
experiments described were good by chance. The problem with developing an individual
model for each well is the serious lack of data. A well could last up to 30 years; if a
model is built after 5 years, there are only 60 monthly production records. A portion of
this set should be withheld for testing, which leaves us with about 40 data points. If this
approach is followed, an attempt could be made using a combination of techniques such
as in [Ru95]. In that study, Rumantir used a statistical technique to model the trend and
seasonal factors and a neural network to model the irregularities. Since the statistical
technique requires little data, the problem of insufficient data is overcome.
For the problem of predicting gas consumption, it is difficult to make any
improvement unless more data is collected. Further research can include dividing the
problem into sub-problems based on season of the year. We estimate that two to five
years of data is necessary in order to build accurate seasonal models.
A weakness in the reported work is that only the simple hold-out validation method is
employed in the current systems. The data set was divided into two portions, for training
and testing. However, this method of evaluation can have a high variance: the evaluation
may depend heavily on which data points end up in the training set and which end up in
the test set, and thus may differ significantly depending on how the division is made.
A future improvement to the systems could be cross-validation. K-fold cross-
validation is one way to improve on the hold-out method. The data set is divided into k
subsets, and the hold-out method is repeated k times. Each time, one of the k subsets is
used as the test set and the other k-1 subsets are put together to form the training set. Then
the average error over all k test sets is computed. The advantage of the k-fold cross-
validation method is that it matters less how the data is divided: every data point is in a
test set exactly once and in a training set k-1 times. As k is increased, the variance of the
evaluation decreases. The disadvantage of this method is that the training algorithm has
to be rerun k times, so an evaluation takes k times as much computation.
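The k-fold procedure just described can be sketched as follows. The `train` and `error` callables stand in for the systems' actual training and evaluation routines, and the mean-of-targets "model" in the usage example is a placeholder, not the thesis's neural network.

```python
def k_fold_errors(data, k, train, error):
    """k-fold cross-validation: each of the k subsets serves as the
    test set exactly once while the other k-1 form the training set;
    the k test errors are averaged."""
    folds = [data[i::k] for i in range(k)]   # k roughly equal subsets
    errors = []
    for i in range(k):
        test = folds[i]
        training = [p for j, f in enumerate(folds) if j != i for p in f]
        model = train(training)              # rerun training k times
        errors.append(error(model, test))
    return sum(errors) / k

# Illustrative use: a "model" that is just the mean of the training
# targets, scored with mean squared error on the held-out fold.
data = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
mean_model = lambda training: sum(training) / len(training)
sq_err = lambda m, test: sum((y - m) ** 2 for y in test) / len(test)
print(k_fold_errors(data, 3, mean_model, sq_err))
```

For time series, the folds would more realistically be contiguous blocks rather than the interleaved slices used in this sketch.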
The current MNN tool still needs much improvement. Considering the serious loss in
the number of data records when records with missing data points are eliminated, a
method to fill in the missing points should be applied. A simple approach could be to
replace a missing value with the average of its neighboring points.
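The suggested fill-in step can be sketched as below, with None marking a missing point. Averaging the nearest known neighbours is the simple approach mentioned above; the handling of gaps at the series boundary is an assumption of this sketch.

```python
def fill_missing(series):
    """Replace each missing point (None) with the average of the
    nearest known values on either side; a gap at the boundary is
    filled with the single nearest known value."""
    filled = list(series)
    for i, v in enumerate(series):
        if v is not None:
            continue
        left = next((series[j] for j in range(i - 1, -1, -1)
                     if series[j] is not None), None)
        right = next((series[j] for j in range(i + 1, len(series))
                      if series[j] is not None), None)
        known = [x for x in (left, right) if x is not None]
        filled[i] = sum(known) / len(known)
    return filled

print(fill_missing([10.0, None, 14.0, None, None, 20.0]))
```

This keeps every record usable at the cost of smoothing over whatever actually happened at the missing points.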
A future topic to investigate would be a more automatic training strategy that reduces
the users' effort. A number of methods for adapting the learning rate, such as bold driver
[Sa99] and annealing [BA98], have been proposed in the literature. In the bold-driver
method, after each training cycle the training error is compared to its previous value. If
the error has decreased, the learning rate is increased slightly; if the error has increased
significantly, the last weight changes are discarded and the learning rate is decreased
sharply. The bold-driver method thus keeps growing the learning rate slowly until it takes
a step that has clearly gone too far, up onto the opposite slope of the error function. The
annealing method gradually lowers the global learning rate.
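The bold-driver behaviour described above can be sketched as follows. The growth and shrink factors (1.05 and 0.5) and the quadratic error surface are illustrative assumptions of this sketch; [Sa99] should be consulted for the method's actual recommendations.

```python
def train_bold_driver(w, grad_fn, err_fn, lr=0.9, cycles=100,
                      grow=1.05, shrink=0.5):
    """Gradient descent with bold-driver learning-rate adaptation:
    after each cycle the error is compared to its previous value."""
    prev_err = err_fn(w)
    for _ in range(cycles):
        candidate = w - lr * grad_fn(w)
        err = err_fn(candidate)
        if err <= prev_err:
            w, prev_err = candidate, err
            lr *= grow       # error decreased: grow the rate slightly
        else:
            lr *= shrink     # error increased: discard the step,
                             # shrink the rate sharply and retry
    return w, lr

# Illustrative quadratic error E(w) = (w - 3)^2 with gradient 2(w - 3).
w, lr = train_bold_driver(10.0, lambda w: 2 * (w - 3),
                          lambda w: (w - 3) ** 2)
print(round(w, 6))
```

The learning rate keeps creeping up until a step overshoots onto the opposite slope, at which point the step is discarded and the rate is cut, so no hand-tuning is needed.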
Another more ambitious improvement to the MNN system could be implementing
different kinds of training algorithms for the component neural networks.
Bibliography
[BA98] Bos S. and Amari S., Annealed online learning in multilayer neural networks,
1998, downloaded from citeseer.nj.nec.com/147107.html in October 2002
[BLM93] Boznar M., Lesjak M. and Mlakar P., A Neural Network Based Method for
Short-Term Predictions of Ambient SO2 Concentrations in Highly Polluted Industrial
Areas of Complex Terrain, Atmospheric Environment, vol. 27B, no. 2, 1993, pp. 221-230
[CLC97] Chih-Chou Chiu, Ling-Jing Kao and Cook D. F., Combining a Neural Network
with a Rule-Based Expert System Approach for Short-Term Power Load Forecasting in
Taiwan, Expert Systems With Applications, vol. 13, no. 4, 1997, pp. 299-305
[CN03] Chan C.W. and Nguyen H.H., Artificial Intelligence Techniques in Forecasting
Applications, In Leondes C.T, ed, Intelligent systems: Technology and Applications,
CRC Press, Boca Raton, London, New York, Washington D.C., 2003, vol. 5, ch. 5, pp.
115-152
[Ch94] Chakraborty K. et al., Forecasting the behavior of multivariate time series using
neural networks, in V. Rao Vemuri and Robert D. Rogers, eds, Artificial Neural
Networks : Forecasting Time Series, IEEE Computer Society Press, Los Alamitos,
California, 1994, pp. 51-60
[Ch75] Chatfield C., The Analysis of Time Series: Theory and Practice, London:
Chapman & Hall, New York: Halsted Press, 1975
[CK95] Cho S.B. and Kim J.H., Combining Multiple Neural Networks by Fuzzy Integral
for Robust Classification, IEEE Transactions on Systems, Man, and Cybernetics, vol. 25,
no. 2, 1995, pp. 380-384
[Di85] Dikkers A. J., Geology in Petroleum Production, Amsterdam, New York:
Elsevier, 1985
[Do96] Dorffner G., Neural Networks for Time Series Processing, Neural Network
World, vol. 6, no.4, 1996, pp. 447-468
[DSMV01] Duhoux M, Suykens J.A.K, De Moor B., and Vandewalle J., Improved Long-
Term Temperature Prediction by Chaining of Neural Networks, International Journal of
Neural Systems, vol. 11, no. 1, 2001, pp. 1-10
[FS87] Farmer J.D. and Sidorowich J.J, Predicting Chaotic Time-Series, Physical Review
Letters, vol. 59, no. 8, 1987, pp. 845-848
[GD99] Gardner M.W, Dorling S.R., Neural Network Modeling and Prediction of Hourly
NOx and NO2 Concentrations in Urban Air in London, Atmospheric Environment, vol.
33, no. 5, 1999, pp. 709-719
[Ge95] Gensym Corporation, NeurOn-Line Reference Manual 1.1, 1995
[GRT99] Guhaathakurta P., Rajeevan M. and Thapliyal V., Long Range Forecasting
Indian Summer Monsoon Rainfall by a Hybrid Principal Component Neural Network
Model, Meteorology and Atmospheric Physics, vol. 71, 1999, pp. 255-266
[HG93] Harrison H.C. and Gong Qizhong, An Intelligent Business Forecasting Systems,
ACM Conference on Computer Science, Indianapolis, IN, USA, 1993, pp. 229-236
[HSY94] Hashem S., Schemeiser B. and Yih Y., Optimal Linear Combinations of Neural
Networks: An Overview, Proceedings of the 1994 IEEE International Conference on
Neural Networks (ICNN'94), vol. 3, Orlando, FL, 1994, pp. 1507-1512
[KH00] Kao J.J. and Huang S.S., Forecasts Using Neural Network versus Box-Jenkins
Methodology for Ambient Air Quality Monitoring Data, Journal of the Air & Waste
Management Association, vol. 50, 2000, pp. 219-226
[KNJK89] Kadaba N, Nygard K.E., Juell P.L., and Kangas L., Modular Back-
Propagation Neural Networks for Large Domain Pattern Classification, Proceedings of
the International Joint Conference on Neural Networks IJCNN'89, Washington DC, USA,
1989, vol. 2, pp. 607-610
[KR94] Kolarik T. and Rudorfer G., Time Series Forecasting Using Neural Networks,
ACM SIGAPL APL Quote Quad, vol. 25, no. 1, 1994, pp. 86-94
[LC98] Lertpalangsunti N. and Chan C.W, An Architectural Framework for Construction
of Hybrid Intelligent Forecasting Systems: Application for Electricity Demand
Prediction, Engineering Applications of Artificial Intelligence, vol. 11, 1998, pp. 549-565
[LCMT99] Lertpalangsunti N., Chan C.W., Mason R., and Tontiwachwuthikul P., A
toolset for construction of hybrid intelligent forecasting systems: application for water
demand prediction, Artificial Intelligence in Engineering, vol. 13, no. 1, 1999, pp. 21-42
[Le96] Lee B. J., Applying Parallel Learning Models of Artificial Neural Networks to
Letters Recognition from Phonemes, Proceedings of the Conference on Integrating
Multiple Learned Models for Improving and Scaling Machine Learning Algorithms,
Portland, Oregon, 1996, pp.66-71
[MJG90] Montgomery D. C., Johnson L. A., Gardiner J. S., Forecasting & Time Series
Analysis, 2nd ed., New York: McGraw-Hill Inc., 1990
[MSV99] McNames J., Suykens J.A.K. and Vandewalle J., Winning Entry of the K. U.
Leuven Time Series Prediction Competition, International Journal of Bifurcation and
Chaos, vol. 9, no. 8, 1999, pp. 1485-1500
[NC00] Nguyen H.H. and Chan C.W., Petroleum Production Prediction: A Neural
Network Approach, International Joint Conference on Engineering Design and
Automation 2001 (EDA 2001), 5-8 August 2001, Las Vegas, USA, pp. 85-90
[NCM02] Nguyen H.H., Chan C.W., and Malcolm W., Prediction of Oil Well Production
using Multi-Neural Network, Proceedings of the 2002 IEEE Canadian Conference on
Electrical & Computer Engineering, Winnipeg, Canada, May 2002, pp. 798-802
[Od83] O'Donovan T. M., Short Term Forecasting: An Introduction to the Box-Jenkins
Approach, Chichester, New York: John Wiley & Sons, 1983
[Po89] Posner M.I., Foundations of Cognitive Science, The MIT Press, Cambridge, 1989
[Ru95] Rumantir G.W., A Hybrid Statistical and Feedforward Network Model for
Forecasting with a Limited Amount of Data: Average Monthly Water Demand Time-
series, Minor Master Thesis in Computer Science, RMIT (Royal Melbourne Institute of
Technology) University, 1995
[Sa99] Sarkar D., Methods to speed up error back-propagation learning algorithm, ACM
Computing Surveys (CSUR), vol. 27, no. 4, 1995, pp. 519-542
[SMKLS00] Swiercz M., Mariak Z., Krejza J. and Szydlik P., Intracranial Pressure
Processing with Artificial Neural Networks: Prediction of ICP Trends, Acta
Neurochirurgica, vol. 142, 2000, pp. 401-406
[SSS00] Sahai A.K., Soman M.K. and Satyan V., All India Summer Monsoon Rainfall
Prediction Using an Artificial Neural Network, Climate Dynamics, vol. 16, 2000, pp.
291-302
[TAF91] Tang Z., Almeida C. and Fishwick P. A., Time Series Forecasting Using Neural
Networks vs. Box-Jenkins Methodology, Simulation, vol. 57, no. 5, 1991, pp. 303-310
[THT97] Tangang F.T., Hsieh W.W. and Tang B., Forecasting the Equatorial Pacific Sea
Surface Temperatures by Neural Network Models, Climate Dynamics, vol. 13, 1997, pp.
135-147
[Th95] Thearling K., Massively Parallel Architectures and Algorithms for Time Series
Analysis, In Nadel L. and Stein D., eds., 1993 Lectures in Complex Systems, Redwood
City, California: Addison-Wesley, 1995, pp. 381-396
[Wa01] Walczak S., An Empirical Analysis of Data Requirements for Financial
Forecasting with Neural Networks, Journal of Management Information Systems, vol. 17,
no. 4, 2001, pp. 203-222
[Wi92] Winston P. H., Artificial Intelligence, 3rd ed., Reading, Mass.:Addison-Wesley
Publishing Company, 1992.
[WL93] Wu S. and Lu R.P., Combining Artificial Neural Networks and Statistics for
Stock-Market Forecasting, Proceedings of the 1993 ACM Conference on Computer Science,
Indianapolis, Indiana, United States, 1993, pp. 257-264
[Ya99] Yasdi R., Prediction of Road Traffic Using a Neural Network Approach, Neural
Computing & Applications, vol. 8, 1999, pp. 135-142
[YP96] Yi J. and Prybutok V.R., A Neural Network Model Forecasting for Prediction of
Daily Maximum Ozone Concentration in an Industrialized Urban Area, Environmental
Pollution, vol. 92, no. 3, 1996, pp. 349-357
APPENDIX A - RUNNING THE MNN TOOL
The classes of the MNN tool and the libraries it requires are bundled into a JAR (Java
Archive) file called mnn.jar. A Java Runtime Environment must be installed for the MNN
tool to run.
To start the tool, type "java -jar mnn.jar" at a DOS command line from the directory
that contains the archive. The main screen will appear.
[Screenshot: a window titled "Multiple Neural Networks" with four buttons: Train
Neural Networks, Test Existing Neural Networks, Predict Using Existing Neural
Networks, and Exit]
Figure A.1 Main screen of the MNN tool
When the user chooses to train or test neural networks, or to use existing neural
networks for prediction, a window opens in which the user enters the necessary
parameters. Refer to Section 3.6.2 for descriptions of these parameters.
APPENDIX B - FORMATS OF PARAMETER AND DATA
FILES FOR THE MNN TOOL
The symbol ( )+ indicates that the term inside the brackets is repeated one or more times.
Training, testing and validation data file format
Estimated lower bound of data values
Estimated upper bound of data values
#
(
(
A data point
// Note: the data points in this group must be continuous in time
)+
#
)+
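For illustration, a training data file holding two short segments of monthly oil-production values (taken from Table C.1 in Appendix C) might look like this; the bounds of 0 and 25000 are hypothetical, and each segment ends with a # line:

```
0
25000
#
6941
4345
3275
#
8181
7623
9490
#
```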
Prediction input data file format
Estimated lower bound of data values
Estimated upper bound of data values
(
An input vector, values separated by spaces
)+
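The segment structure of the data files can be read with a few lines of Java. The sketch below is purely illustrative and is not code from the MNN tool itself; the class and method names are hypothetical, and it assumes that a # on a line by itself separates segments of time-continuous data points:

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only: NOT part of the MNN tool.
// Splits the lines of a data file (in the format described above)
// into segments of time-continuous data points.
public class DataFileSketch {

    public static List<List<Double>> parseSegments(String[] lines) {
        // The first two lines hold the estimated lower and upper bounds;
        // the tool would use them to normalize the data (not shown here).
        double lowerBound = Double.parseDouble(lines[0].trim());
        double upperBound = Double.parseDouble(lines[1].trim());

        List<List<Double>> segments = new ArrayList<>();
        List<Double> current = new ArrayList<>();
        for (int i = 2; i < lines.length; i++) {
            String line = lines[i].trim();
            if (line.equals("#")) {
                // A '#' line closes the current segment
                if (!current.isEmpty()) {
                    segments.add(current);
                    current = new ArrayList<>();
                }
            } else if (!line.isEmpty()) {
                current.add(Double.parseDouble(line));
            }
        }
        if (!current.isEmpty()) {
            segments.add(current); // in case the trailing '#' is omitted
        }
        return segments;
    }

    public static void main(String[] args) {
        String[] lines = {"0", "25000", "#", "6941", "4345", "#", "8181", "7623", "#"};
        List<List<Double>> segments = parseSegments(lines);
        System.out.println(segments.size());   // prints 2
        System.out.println(segments.get(0));   // prints [6941.0, 4345.0]
    }
}
```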
Training parameter file format
Lead
Size of input vector
Number of neural networks
Minimum number of training cycles
Maximum number of training cycles
Validation interval
Size of validation windows
Using multi-validation or not? (Y/N)
Path of training data file
Path of validation data file
Path of training output file
(
#
Load this neural network from file or not? (Y/N)
Path of the neural network file
Train this neural network or not? (Y/N)
Number of hidden neurons
MAPE threshold
Learning rate
Momentum
)+
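As a concrete example, a training parameter file describing two neural networks might look as follows; every path and numeric value here is hypothetical:

```
1
12
2
500
5000
100
10
Y
train.dat
valid.dat
train.out
#
N
net1.dat
Y
8
5.0
0.2
0.7
#
N
net2.dat
Y
10
5.0
0.2
0.7
```

This file would ask the tool to train two new networks (with 8 and 10 hidden neurons respectively) for lead-1 prediction on 12-element input vectors, running between 500 and 5000 training cycles and validating every 100 cycles on a window of 10 points.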
Testing parameter file format
Lead
Size of input vector
Number of neural networks
Path of test data file
Path of test output file
(
Path of neural network file
)+
Prediction parameter file format
Lead
Size of input vector
Number of neural networks
Path of prediction input file
Path of prediction output file
(
Path of neural network file
)+
APPENDIX C - SAMPLE DATA
Sample Petroleum Data
Table C.1 Sample of oil production data
Location Production Date Hours on Production Oil Volume
01/11-25-001-15W2/0 196506 336 6941
01/11-25-001-15W2/0 196507 336 4345
01/11-25-001-15W2/0 196508 240 3275
01/11-25-001-15W2/0 196509 216 2868
01/11-25-001-15W2/0 196510 216 2800
01/11-25-001-15W2/0 196511 624 7750
01/11-25-001-15W2/0 196512 528 7992
01/11-25-001-15W2/0 196601 744 10432
01/11-25-001-15W2/0 196602 672 6012
01/11-25-001-15W2/0 196603 624 9533
01/11-25-001-15W2/0 196604 360 5298
01/11-25-001-15W2/0 196605 624 8227
01/11-25-001-15W2/0 196606 384 7973
01/11-25-001-15W2/0 196607 504 6797
01/11-25-001-15W2/0 196608 672 9349
01/11-25-001-15W2/0 196609 600 8211
01/11-25-001-15W2/0 196610 600 7323
01/11-25-001-15W2/0 196611 600 8958
01/11-25-001-15W2/0 196612 744 9284
01/11-25-001-15W2/0 196701 720 9325
01/11-25-001-15W2/0 196702 552 6903
01/11-25-001-15W2/0 196703 216 3145
01/11-25-001-15W2/0 196704 600 8610
01/11-25-001-15W2/0 196705 648 8252
01/11-25-001-15W2/0 196706 648 8464
01/11-25-001-15W2/0 196707 552 6903
01/11-25-001-15W2/0 196708 696 8451
01/11-25-001-15W2/0 196709 456 5516
01/11-25-001-15W2/0 196710 624 7844
01/11-25-001-15W2/0 196711 720 10002
01/11-25-001-15W2/0 196712 648 7434
01/11-25-001-15W2/0 196801 336 7731
01/11-25-001-15W2/0 196802 696 9978
01/11-25-001-15W2/0 196803 384 5058
01/11-25-001-15W2/0 196804 576 8154
01/11-25-001-15W2/0 196805 504 6987
01/11-25-001-15W2/0 196806 624 9075
01/11-25-001-15W2/0 196807 552 8181
01/11-25-001-15W2/0 196808 480 7623
01/11-25-001-15W2/0 196809 672 9490
01/11-25-001-15W2/0 196810 432 6027
01/11-25-001-15W2/0 196811 576 8675
01/11-25-001-15W2/0 196812 528 7248
01/11-25-001-15W2/0 196901 720 10032
01/11-25-001-15W2/0 196902 600 8305
01/11-25-001-15W2/0 196903 504 6528
01/11-25-001-15W2/0 196904 480 6884
01/11-25-001-15W2/0 196905 528 6882
01/11-25-001-15W2/0 196906 600 8314
01/11-25-001-15W2/0 196907 696 8572
01/11-25-001-15W2/0 196908 576 10763
01/11-25-001-15W2/0 196909 504 5908
01/11-25-001-15W2/0 196910 744 9052
01/11-25-001-15W2/0 196911 456 5473
01/11-25-001-15W2/0 196912 648 7809
01/11-25-001-15W2/0 197001 720 8287
01/11-25-001-15W2/0 197002 672 8228
01/11-25-001-15W2/0 197003 552 6730
01/11-25-001-15W2/0 197004 384 4766
01/11-25-001-15W2/0 197005 624 9280
01/11-25-001-15W2/0 197006 720 10334
01/11-25-001-15W2/0 197007 720 11399
01/11-25-001-15W2/0 197008 672 7539
01/11-25-001-15W2/0 197009 720 11600
01/11-25-001-15W2/0 197010 744 10995
01/11-25-001-15W2/0 197011 720 10892
01/11-25-001-15W2/0 197012 744 17687
01/11-25-001-15W2/0 197101 744 16752
01/11-25-001-15W2/0 197102 672 14219
01/11-25-001-15W2/0 197103 744 16249
01/11-25-001-15W2/0 197104 720 14084
01/11-25-001-15W2/0 197105 744 14787
01/11-25-001-15W2/0 197106 720 13142
01/11-25-001-15W2/0 197107 744 13836
01/11-25-001-15W2/0 197108 720 15066
01/11-25-001-15W2/0 197109 720 16523
01/11-25-001-15W2/0 197110 744 19001
01/11-25-001-15W2/0 197111 576 15235
01/11-25-001-15W2/0 197112 624 14605
01/11-25-001-15W2/0 197201 744 16867
01/11-25-001-15W2/0 197202 696 15600
01/11-25-001-15W2/0 197203 600 14078
01/11-25-001-15W2/0 197204 720 13650
01/11-25-001-15W2/0 197205 672 11020
01/11-25-001-15W2/0 197206 360 15023
01/11-25-001-15W2/0 197207 744 24237
01/11-25-001-15W2/0 197208 600 15362
01/11-25-001-15W2/0 197209 648 14057
01/11-25-001-15W2/0 197210 648 12662
01/11-25-001-15W2/0 197211 672 12527
01/11-25-001-15W2/0 197212 744 12231
01/11-25-001-15W2/0 197301 744 13242
01/11-25-001-15W2/0 197302 600 9277
01/11-25-001-15W2/0 197303 744 11443
01/11-25-001-15W2/0 197304 672 9651
01/11-25-001-15W2/0 197305 504 6266
01/11-25-001-15W2/0 197306 528 6758
01/11-25-001-15W2/0 197307 480 6242
01/11-25-001-15W2/0 197308 216 2926
01/11-25-001-15W2/0 197309 384 4578
Table C.2 Sample of raw core analysis data
Location  Sample #  Formation  Horizontal Permeability (mD)  Vertical Permeability (mD)  Porosity
21/07-02-001-16W2/0 2 RATCLIFF 0.06 0.08 0.034
21/07-02-001-16W2/0 3 RATCLIFF 0.05 0 0.025
21/07-02-001-16W2/0 4 RATCLIFF 0.21 0.08 0.074
21/07-02-001-16W2/0 5 RATCLIFF 0.41 0.8 0.082
21/07-02-001-16W2/0 6 RATCLIFF 0.09 0 0.042
21/07-02-001-16W2/0 7 RATCLIFF 1.2 5.2 0.101
21/07-02-001-16W2/0 8 RATCLIFF 0.91 0.48 0.156
21/07-02-001-16W2/0 9 RATCLIFF 0.13 0.06 0.065
21/07-02-001-16W2/0 10 RATCLIFF 0.18 0.15 0.066
21/07-02-001-16W2/0 11 RATCLIFF 0 0 0
21/07-02-001-16W2/0 12 RATCLIFF 0.43 0.52 0.13
21/07-02-001-16W2/0 13 RATCLIFF 3 0.47 0.191
21/07-02-001-16W2/0 14 RATCLIFF 3.6 1.7 0.182
21/07-02-001-16W2/0 15 RATCLIFF 1.1 0.28 0.198
21/07-02-001-16W2/0 16 RATCLIFF 3.6 1.6 0.152
21/07-02-001-16W2/0 17 RATCLIFF 2.6 0.65 0.126
21/07-02-001-16W2/0 18 RATCLIFF 0.8 0.5 0.117
21/07-02-001-16W2/0 19 RATCLIFF 0.32 0.08 0.137
21/07-02-001-16W2/0 20 RATCLIFF 0.87 0.37 0.115
21/07-02-001-16W2/0 21 RATCLIFF 0.36 0.23 0.074
Table C.3 Sample of pressure data
Location  First Shut-in Pressure
21/05-02-001-16W2/0 19512
21/05-03-001-16W2/0 19209
21/07-03-001-16W2/0 19691
01/10-03-001-16W2/0 19443
01/12-03-001-16W2/0 19167
01/02-04-001-16W2/0 18478
01/02-04-001-16W2/0 18768
01/02-04-001-16W2/0 19864
01/04-04-001-16W2/0 19671
01/04-04-001-16W2/0 20009
01/10-04-001-16W2/0 18554
01/12-04-001-16W2/0 18947
01/06-05-001-16W2/0 10004
01/06-05-001-16W2/0 9818
01/08-05-001-16W2/0 19753
01/08-05-001-16W2/0 19836
01/16-05-001-16W2/0 20043
01/10-08-001-16W2/0 20202
01/10-08-001-16W2/0 20250
01/02-09-001-16W2/0 19781
Gas Consumption Data
Table C.4 Sample of flow rate data at Melfort station
Date  Time  Melfort Flow
12/3/01 7:02:48 313.498
12/3/01 8:02:48 302.869
12/3/01 9:02:48 299.551
12/3/01 10:02:48 298.992
12/3/01 11:02:48 298.433
12/3/01 12:02:48 296.643
12/3/01 13:02:48 294.375
12/3/01 14:02:48 292.086
12/3/01 15:02:48 289.27
12/3/01 16:02:48 287.285
12/3/01 17:02:48 286.631
12/3/01 18:02:48 285.179
12/3/01 19:02:48 283.278
12/3/01 20:02:48 281.662
12/3/01 21:02:48 280.284
12/3/01 22:02:48 279.989
12/3/01 23:02:48 280.502
12/4/01 0:02:48 279.909
12/4/01 1:02:48 278.911
12/4/01 2:02:48 277.122
12/4/01 3:02:48 275.748
12/4/01 4:02:48 274.825
12/4/01 5:02:48 274.501
12/4/01 6:02:48 272.838
12/4/01 7:02:48 267.886
12/4/01 8:02:48 259.44
12/4/01 9:02:48 258.686
12/4/01 10:02:48 257.922
12/4/01 11:02:48 259.766
12/4/01 12:02:48 260.665
12/4/01 13:02:48 263.435
12/4/01 14:02:48 264.775
12/4/01 15:02:48 265.253
12/4/01 16:02:48 264.418
12/4/01 17:02:48 260.613
12/4/01 18:02:48 258.936
12/4/01 19:02:48 259.931
12/4/01 20:02:48 259.742
12/4/01 21:02:48 262.267
12/4/01 22:02:48 269.248
12/4/01 23:02:48 277.601
12/5/01 0:02:48 396.097
12/5/01 1:02:48 439.517
12/5/01 2:02:48 422.462
12/5/01 3:02:48 417.804
12/5/01 4:02:48 414.955
12/5/01 5:02:48 412.88
12/5/01 6:02:48 409.965
12/5/01 7:02:48 406.003
12/5/01 8:02:48 400.458
12/5/01 9:02:48 397.828
12/5/01 10:02:48 399.274
12/5/01 11:02:48 400.686
12/5/01 12:02:48 403.374
12/5/01 13:02:48 252.343
12/5/01 14:02:48 132.44
12/5/01 15:02:48 233.556
12/5/01 16:02:48 259.07
12/5/01 17:02:48 267.627
12/5/01 18:02:48 274.7
12/5/01 19:02:48 278.981
12/5/01 20:02:48 277.827
12/5/01 21:02:48 276.722
12/5/01 22:02:48 276.011
12/5/01 23:02:48 277.28
12/6/01 0:02:48 280.312
12/6/01 1:02:48 283.357
12/6/01 2:02:48 284.738
12/6/01 3:02:48 286.074
12/6/01 4:02:48 286.101
12/6/01 5:02:48 285.916
12/6/01 6:02:48 285.767
12/6/01 7:02:48 281.365
12/6/01 8:02:48 275.539
12/6/01 9:02:48 274.457
12/6/01 10:02:48 273.49
12/6/01 11:02:48 273.429
12/6/01 12:02:48 280.909
12/6/01 13:02:48 285.983
12/6/01 14:02:48 289.991
12/6/01 15:02:48 289.406
12/6/01 16:02:48 288.704
12/6/01 17:02:48 287.357
12/6/01 18:02:48 289.722
12/6/01 19:02:48 292.257
12/6/01 20:02:48 293.333
12/6/01 21:02:48 294.406
12/6/01 22:02:48 295.003
12/6/01 23:02:48 371.902
12/7/01 0:02:48 473.774
12/7/01 1:02:48 440.975
12/7/01 2:02:48 421.134
12/7/01 3:02:48 404.733
12/7/01 4:02:48 392.364
12/7/01 5:02:48 381.867
12/7/01 6:02:48 371.74
12/7/01 7:02:48 360.038
12/7/01 8:02:48 340.814
12/7/01 9:02:48 0
12/7/01 10:02:48 66.8873