
B.ENG. IN ELECTRONIC ENGINEERING

PROJECT REPORT

Development of Neural Networks

For

System Identification

Siobhan Murphy 98276506


Acknowledgements

I would like to thank Ms Jennifer Bruton for her time and invaluable guidance during this project. I would also like to thank my friends and family for their ordinary laughter and most especially my mom for “hanging in there” with me.

Declaration

I hereby declare that, except where otherwise indicated, this document is entirely my own work and has not been submitted in whole or in part to any other university.

Signed: ...................................................................... Date: ...............................


Abstract

This project outlines the development of a neural network model for system identification. It traces the growth of neural networks from their humble beginnings as single-layer perceptrons into more complex network models; both multi-layer and recurrent network models are examined and their merits as system identifiers discussed. The system chosen as the basis for the empirical data collection is the anti-lock brake system, which exhibits highly non-linear behaviour and lends itself to neural network modelling for system identification purposes. The backpropagation algorithm is used in the development of the neural network. Until recently, backpropagation neural networks made up 80% of all neural network applications [1]; the use of backpropagation has since declined because of the relatively long training times required by the iterative algorithm. Genetic algorithms are discussed as a possible alternative.


Table of Contents

Acknowledgements
Declaration
Abstract
Table of Contents
Table of Figures
Introduction
1.1 Artificial Neural Networks
1.1.1 Background
1.1.2 How the Human Brain Learns
1.2 Artificial Neuron and Activation Function
1.2.1 Linear Activation Function
1.2.2 Non-Linear Activation Functions
1.2.3 Neural Network Matlab Toolbox
1.3 Summary
The Perceptron
2.1 Implementing a Single Layer Perceptron in Matlab
2.1.1 Single Layer Perceptron Designed without Neural Network Toolbox
2.1.2 Designing and Training using the Neural Network Toolbox
2.2 Multi-Layer Perceptron
2.2.1 Implementing a Multi-Layer Perceptron in Matlab - XOR Classification
2.3 Summary
Anti-Lock Braking System
4.1 ABS Model
4.1.1 Basic Steps of System Identification
4.1.2 The Simulink Model
4.1.3 Pseudo Random Binary Sequence Input
4.2 Data
4.2.1 Data Collection
4.2.2 Loading Data
4.3 Summary
Building Neural Network – the design detail
5.1 Design Detail
5.1.1 Pre & Post processing
5.1.2 Neural Network Model Structure
5.1.3 Types of Neural Networks
5.1.4 Training Algorithms – multi-layer network results
5.2 Test Procedure for Multi-Layer Neural Network
5.2.1 Over-fitting
5.2.2 Post Training Analysis
5.3 Summary
Recurrent Neural Networks
6.1 Structure of Recurrent Neural Network – design detail
6.2 The Elman Structure
6.2.1 Building the structure
6.2.2 Results
6.3 Overall Analysis of Networks
6.3.1 Comparison
6.3.2 Conclusion and Future Directions
References
Appendix 1
Appendix 2
Appendix 3


Table of Figures

Figure 1 Components of biological neuron [2]
Figure 2 Components of the synapse [2]
Figure 3 Linear activation function, equation 1
Figure 4 Log sigmoid activation function, equation 2
Figure 5 Tan-sigmoid activation function, equation 3
Figure 6 Single layer perceptron architecture [6]
Figure 7 Input vectors of the SLP plotted
Figure 8 Classification plot with new input correctly plotted in red
Figure 9 Multi-layer perceptron architecture [6]
Figure 10 Output plot of XOR inputs into SLP, which was unable to perform classification
Figure 11 Model of MLP that solves EXOR classification difficulties
Table 1 The XOR truth table
Table 2 Truth table for the neuron with strong negativity N1 and the neuron with strong positivity N2 [5]
Figure 12 Overall classification of XOR problem [5]
Figure 13 Training of the XOR network, mean square error plot over 70 epochs [5]
Figure 14 Training of XOR network with performance goal met, mean square error plot until convergence [5]
Figure 15 ABS model with pseudo random binary sequence input
Figure 16 Pseudo random binary sequence
Figure 17 A visual representation of input and output data
Figure 18 Parallel identification model [3]
Figure 19 Series-parallel identification model [3]
Figure 20 Training plot of Traingd
Figure 21 Traingdm plot with momentum constant of 0.9
Figure 22 Traingdm with mu=0, training plot similar to Traingd plot as weight change based on gradient
Figure 23 Training plot of Traingda
Figure 24 Variable learning rate plotted against each epoch iteration
Figure 25 Trainlm performance training plot
Figure 26 Training, testing and validation data plot using Trainlm, highlighting over-fitting
Figure 27 Training, testing and validation data plot using Traingdm, highlighting over-fitting
Figure 28 Post training analysis plot for Trainlm algorithm
Figure 29 Post training analysis plot for Traingdm algorithm
Figure 31 Poor performance of recurrent neural network with Trainlm algorithm
Figure 32 Traingdx algorithm performance plot
Figure 33 Deteriorated performance of recurrent neural network
Figure 34 Sum square error plot of recurrent network with pre and post processing implemented
Figure 35 The basic concepts behind genetic algorithms [7]


Chapter 1

Introduction

System identification, using either conventional or neural network methods, is the development of a mathematical model of a dynamic system from empirical data. The choice of identifier structure is based on well-established results in linear systems theory and can be applied with great success to the development of non-linear neural network identifiers. This is the basis of neural network system identification and is the technique applied to the anti-lock braking system identification model. Before neural network system identification and its merits are examined, artificial neural networks and their underlying concepts are described, along with the development of these simple structures into more complex recurrent neural networks.

1.1 Artificial Neural Networks

1.1.1 Background

Artificial Neural Networks (ANNs) can be likened to collections of identical mathematical models that emulate some of the observed properties of biological nervous systems and draw on the analogies of adaptive biological learning. The key element of an artificial neural network is its structure: it is composed of a number of interconnected processing elements tied together with weighted connections, which take their inspiration from biological neurons. As in a biological system, learning takes place through training, or exposure to a set of input and output data, during which the training algorithm iteratively adjusts the weights.

Artificial neural networks are good pattern recognition engines and robust classifiers, with the ability to make decisions about imprecise input data. This ability makes them extremely useful as a medical analysis tool, for example. There is no need to provide a specific algorithm describing how to identify a disease; neural networks learn by example, so the details of how to recognise the disease are not needed. What is needed is a set of examples that is representative of all the variations of the disease; the quantity of examples matters more than their individual quality. For this reason artificial neural networks are used


extensively for system modelling where the physical processes are not understood fully or

are highly complex.

1.1.2 How the Human Brain Learns

The success of artificial neural networks at modelling highly complex physical processes can be attributed to the original architecture on which they are based: the human brain. At present, brain function is not fully understood. A brain neuron collects signals from other neurons of the Central Nervous System (CNS) through structures called dendrites (Figure 1). The neuron sends out spikes of electrical activity through a long thin strand called an axon, which splits into thousands of branches. At the end of each branch, a structure called a synapse converts the activity from the axon into electrical effects that may excite or inhibit activity in the connected neurons. When a neuron receives an excitatory input that is sufficiently large compared with its inhibitory input, it sends a spike of electrical activity down its axon. Learning occurs by changing the effectiveness of the synapses so that the influence of one neuron on another changes. [2]

Figure 1 Components of biological neuron [2]

The structure of the human brain neuron is the template for artificial learning. However, lack of knowledge leads to approximations and assumptions in the general architecture of an artificial neural network: knowledge of neurons is incomplete and computing power is limited, so models are often idealisations of real networks of neurons.


Figure 2 Components of the synapse [2]

1.2 Artificial Neuron and Activation Function

The artificial neuron, like the biological neuron described in Figures 1 and 2, is a processing element. The output of an artificial neuron is calculated by multiplying its inputs by a weight vector, summing the results and applying an activation function to the sum. The activation function transforms the activation level of a unit (a neuron) into an output signal. Typically, activation functions have a "squashing" effect: they contain the output within a range.

1.2.1 Linear Activation Function

There are many activation functions that can be applied to neural networks; three main activation functions are dealt with in this project [3].

The first is the linear transfer function, or purelin function. It is defined as

f(x) = x    equation 1

Neurons of this type are used as linear approximators.


Figure 3 Linear activation function, equation 1

1.2.2 Non-Linear Activation Functions

There are several types of non-linear activation functions; the two most common are the log-sigmoid transfer function and the tan-sigmoid transfer function. Plots of these differentiable, non-linear activation functions are shown in Figures 4 and 5. They are commonly used in networks trained with backpropagation. The networks referred to in this project are generally backpropagation models, and they mainly use log-sigmoid and tan-sigmoid activation functions. The logistic (log-sigmoid) activation function is defined by the equation

logsig(x) = 1 / (1 + exp(-βx))    equation 2

By default β = 1, though it can be changed, which in turn changes the shape of the sigmoid. As β tends towards infinity the function behaves more and more like a hard limiter. For finite β the slope is non-zero and the output is contained between 0 and 1.


Figure 4 Log sigmoid activation function, equation 2

Figure 5 Tan-sigmoid activation function, equation 3

The tan-sigmoid function tansig(x) is equivalent to tanh(x) and is defined as

f(x) = tanh(x) = (e^x - e^(-x)) / (e^x + e^(-x))    equation 3

In Matlab, tansig(x) runs faster than tanh(x), so it is a good choice when speed is an important factor.
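As a quick illustration (a sketch only, assuming the Neural Network Toolbox is installed so that purelin, logsig and tansig are on the path), the three activation functions discussed above can be plotted side by side:

% Sketch: plot the three activation functions over an arbitrary input range.
x = -5:0.1:5;
plot(x, purelin(x), x, logsig(x), x, tansig(x));
legend('purelin (linear)', 'logsig (outputs 0 to 1)', 'tansig (outputs -1 to 1)');
xlabel('Input'); ylabel('Output');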


1.2.3 Neural Network Matlab Toolbox

These activation functions were built using Matlab. MATLAB, which stands for "matrix laboratory", is an interactive system originally written as software for matrix computation. It has evolved into a testing and analysis research tool used in engineering, mathematics and science. Matlab toolboxes are collections of functions used to solve particular classes of problems. In this project the Matlab Neural Network Toolbox is used to build, train and test system identification neural network models.

1.3 Summary

Artificial neural networks take the central nervous system (CNS) of living creatures as the basis for their architecture. Developing an artificial neural network requires an activation function, either linear or non-linear, which transforms the activation level of a unit into an output signal. An activation function is applied in every neural network, including the simplest of them, the single layer perceptron.


Chapter 2

The Perceptron

A single layer perceptron (SLP) is the simplest form of artificial neural network that can be built, and this chapter discusses it in detail. It consists of one or more artificial neurons in parallel. Each neuron in the single layer provides one network output and is usually connected to all of the external inputs (Figure 6). The diagram below illustrates a very simple neural network; it consists of a single neuron in the output layer.

Figure 6 Single layer Perceptron Architecture [6]

There are n neurons in the input layer; each circle represents a neuron. The total input stimulus to the neuron in the output layer is

z_in = sum(i = 0 to n) of x_i w_i = x_0 w_0 + x_1 w_1 + x_2 w_2 + ... + x_n w_n    equation 4

and the output of the neuron is y = f(z_in). The input x_0 is a special input, referred to as the bias input; its value is normally fixed at +1. Its associated weight w_0 is referred to as the bias weight.
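As a small worked illustration of equation 4 (the weight and input values below are arbitrary examples, not taken from the project), the output of a single neuron with a bias input can be computed directly:

% Sketch: output of one artificial neuron (equation 4) with a bias input.
x = [1; 0.5; -0.3];      % x_0 = +1 is the bias input
w = [0.2; 0.7; -0.4];    % w_0 is the bias weight
z_in = w' * x;           % weighted sum of the inputs
y = logsig(z_in)         % activation function applied to the sum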


2.1 Implementing a Single Layer Perceptron in Matlab

2.1.1 Single Layer Perceptron Designed without Neural Network Toolbox

A single layer perceptron can be built in Matlab without the use of the neural network toolbox [Appendix 1]. This approach to building a single layer perceptron encourages a greater understanding of the concepts relating to neural networks. The single layer perceptron implements a form of supervised learning. Supervised neural networks are trained to produce desired outputs when specific inputs are applied to the system, and they are particularly well suited to modelling and controlling dynamic systems, classifying noisy data, and predicting future events. Building without the toolbox creates a less powerful but functioning SLP. When designing the SLP structure the weights are assigned small random values, and input and target output patterns are applied. The output of the perceptron is calculated from

y(k) = f(w(k)^T x(k))    equation 5

where y is the output, w the weight vector and x the input vector. The weights are adapted using the error until Δw = 0. The update of the weights is as follows:

w(k+1) = w(k) + µ e(k) x(k)    equation 6

e(k) = Γ(k) - y(k)    equation 7

where e is the error, Γ the target value and µ a fixed learning rate.

A hard limit activation function is used to calculate y(k). This threshold activation function is implemented in Matlab using the sign function, which limits the output to between minus one and one: it returns one if its argument is greater than zero, zero if the argument equals zero, and minus one if the argument is less than zero. A zero output will never be produced provided the target is never set to zero. This network is in effect a binary output perceptron, and it can only classify input patterns that are linearly separable. Frank Rosenblatt first developed this perceptron architecture in 1958 [3].
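A minimal sketch of this training rule (equations 5 to 7) is shown below; the patterns, targets and learning rate are arbitrary examples, and the full implementation used in the project is the one in Appendix 1.

% Sketch of the perceptron learning rule (equations 5-7) without the toolbox.
X  = [1 1 1 1; 0 0 1 1; 0 1 0 1];   % first row is the +1 bias input
T  = [-1 -1 -1 1];                  % example targets in -1/+1 form (never zero)
w  = 0.1*rand(3,1);                 % small random initial weights
mu = 0.1;                           % fixed learning rate
for epoch = 1:50
    for k = 1:size(X,2)
        y = sign(w' * X(:,k));      % hard-limit (sign) activation, equation 5
        e = T(k) - y;               % error between target and output, equation 7
        w = w + mu * e * X(:,k);    % weight update, equation 6
    end
end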


2.1.2 Designing and Training using the Neural Network Toolbox

Although it has been shown that a neural network can be implemented without the neural network toolbox, such a network is limited in its applications. Building a single layer neural network is most successfully done with the toolbox, using the function newp() [Appendix 2]. This function has a default hard limit activation function. Using newp(), inputs can be classified according to Boolean AND logic (Figure 7). Inputs are fed into the newly created neural network and targets are applied; these targets are based on the outputs of an AND gate. When the inputs are first presented no classification takes place; the network must first be trained. This model is based on a demonstration model in the toolbox itself called demop1.

Figure 7 Input vectors of the SLP plotted.

The network trains so that it behaves like an AND gate. The outputs are linearly separable, so the network can classify each input as a one or a zero, like binary logic. A classification line is drawn across the linear plane, shown in blue in Figure 8. If a new input is applied, the newly trained network is simulated and this new point is classified. In this case the new input is [0.7; 1.2]; it is correctly classified as a one and shown in red on the right side of the classification line.


Figure 8 Classification plot with new input correctly plotted in red.

After one training cycle of the network the correct classification is not always achieved. It

can take several training cycles or epochs to modify the weights until the correct

classification of the problem is achieved. This design structure is the basis for all artificial

neural networks. The SLP leads to the creation of multi-layer perceptrons, which are

structures of multiple single layer perceptrons.

2.2 Multi-Layer Perceptron

A multi-layer perceptron builds on the architecture of the single layer perceptron. The single layer perceptron is not very useful because of its limited mapping ability; it is only really applicable to linearly separable inputs and will fail if the inputs are not linearly separable. The SLP can, however, be used as a building block for larger, much more practical structures. Using multi-layer architectures, non-binary activation functions and more complex training algorithms means that the limitations of a simple perceptron can be overcome. A typical multi-layer perceptron (MLP) network consists of a set of source nodes forming the input layer, one or more hidden layers of computation nodes, and an output layer of nodes, as illustrated in Figure 9. The input signal propagates through the


network layer by layer. The computations performed by this feed-forward network, with a single hidden layer, non-linear activation functions and a linear output layer, can be written mathematically as

x = f(s) = B φ(As + a) + b    equation 8

where

• s = inputs
• x = outputs
• A = weight matrix of the first layer
• a = bias vector of the first layer
• B = weight matrix of the second layer
• b = bias vector of the second layer
• φ = non-linearity (activation) function.

Figure 9 Multi-layer perceptron architecture [6].

It has been proven that this architecture can approximate any continuous function to any degree of accuracy on a compact set; for this reason the multi-layer perceptron has been termed a universal approximator. However, it is never known exactly how many hidden layers of


neurons will ensure optimum network convergence, or whether the weight matrix that corresponds to a given error goal can be found. These solutions are unique to each neural network and to the input and output data applied [4]. To begin with, the MLP architecture is applied to the EXOR problem. Historically, it was this problem that first exhibited the limitations of the SLP and led to the development of more complex multi-layer perceptrons. Minsky and Papert (1969) believed that in their "intuitive judgement the extension (to multi-layer systems would be) sterile". This opinion was based on the inability of the SLP to classify the EXOR problem and other such linearly non-separable problems [6].

2.2.1 Implementing a Multi-Layer Perceptron in Matlab - XOR Classification

The opinion of Minsky and Papert has since been discarded and the XOR problem solved using multi-layer perceptrons. The XOR problem is linearly non-separable, so when it is applied to the single layer perceptron no classification line can be plotted because the linear plane cannot be divided, as shown in Figure 10 below.

Figure 10 Output plot of XOR inputs into SLP, which was unable to perform classification

A new MLP must be built using the newff() function [5]. This creates a new network, which has an input layer, a hidden layer and an output layer (Figure 11).


Figure 11 Model of MLP that solves EXOR classification difficulties

The essence of this problem is to build a perceptron network that takes two Boolean inputs and outputs the XOR of them. The XOR truth table is shown below in Table 1.

X1  X2  Desired Output
0   0   0
0   1   1
1   0   1
1   1   0

Table 1 The XOR truth table

The first neuron is designed with strong negativity, the second neuron with strong positivity, and the third neuron must discriminate between the two of them.

X1  X2  N1  N2  Y
0   0   0   0   0
0   1   0   1   1
1   0   0   1   1
1   1   1   1   0

Table 2 Truth table for the neuron with strong negativity N1 and the neuron with strong positivity N2 [5].


This problem is now linearly separable and classification can be achieved. Matlab produces a plot of the overall classification (Figure 12). There is a classification line from (0,0) to (1,1), indicating that the output is 0 for both of these inputs, and another classification line from (0,1) to (1,0), indicating that both of these inputs produce 1 as an output.

Figure 12 Overall classification of XOR problem [5]

The neural network does not automatically classify the inputs correctly. When the input data is first applied the output is incorrect; the network has to be trained to recognise the inputs and perform as an XOR gate. The training of the data takes place over 70 epochs. Figure 13 shows that the minimum gradient is reached but the performance goal is not met.


Figure 13 Training of the XOR network, mean square error plot over 70 epochs [5]

The network is trained again, this time with a specific performance goal of 0.0037^2. It takes only four epochs for this goal to be achieved, and correct classification then takes place (Figure 14). The hidden layer behaves like a little black box, hence its name: its behaviour is hidden from view and can only be approximated, so it may behave slightly differently each time the network and training algorithm are run, producing different results on each run.

Figure 14 Training of XOR network with performance goal met, mean square error plot until convergence [5]
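A hedged sketch of how such an XOR network can be built with newff() is shown below; the layer sizes, training function and goal are illustrative choices rather than the exact settings used in [5].

% Sketch: a small MLP trained on the XOR problem of table 1.
P = [0 0 1 1; 0 1 0 1];                            % XOR inputs
T = [0 1 1 0];                                     % XOR targets
net = newff(minmax(P), [2 1], {'tansig','purelin'}, 'trainlm');
net.trainParam.goal = 0.0037^2;                    % performance goal, as in figure 14
net.trainParam.epochs = 70;
net = train(net, P, T);
Y = sim(net, P)                                    % should be close to [0 1 1 0]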


2.3 Summary

The single layer perceptron is the simplest form of artificial neural network. It is possible to implement the SLP without the neural network toolbox, but such a perceptron is not as powerful as one created with the toolbox. Using the toolbox, the single layer perceptron can perform classification on linearly separable inputs. The next step is to extend the single layer perceptron architecture to solve more challenging problems.

This leads to the development of multi-layer perceptrons, whose structure can be applied to more difficult problems which the SLP cannot solve. The XOR classification problem is one such example; it clearly illustrates the benefits of multi-layer perceptrons in the solution of classification problems. An MLP can classify non-linear problems successfully. The next step is to apply multi-layer perceptrons to system identification. The system chosen to test MLP applications is the Anti-Lock Braking System (ABS).


Chapter 4

Anti-Lock Braking System

The ABS model is a demonstration model found in Matlab Simulink. Like many other models, it can be modelled or identified using multi-layer perceptrons. A typical anti-lock braking system senses when wheel lock-up is about to occur, releases the brakes for a very short time, and reapplies them when the wheel spins up again. ABS greatly reduces the possibility of skidding during hard braking and also lets the driver steer while braking. This ability to steer during braking is one of the main benefits of ABS; in a hard braking situation without ABS the wheels may skid and lose traction between the tyres and the road, which could result in accidents. Neural networks have already been used with great success to develop a genetic neural fuzzy controller that finds the optimal wheel slips maximising the road adhesion coefficient [7]. The anti-lock brake system lends itself to neural network modelling and fuzzy logic control because of its need to constantly alter its response to variations in its inputs [8]. It exhibits highly non-linear behaviour, and artificial neural modelling of ABS results in applications implemented in the real world. For this reason, and also because an externally controlled input can be applied in the form of a Pseudo-Random Binary Sequence (PRBS), the ABS system is chosen as the model. This model provides the input and output data used in an artificial neural network built with the Matlab neural network toolbox.

4.1 ABS Model

4.1.1 Basic Steps of System Identification

There are three phases of system identification:

• collect experimental input/output data,

• select and estimate the model structures used to build the neural network,

• validate the models and select the best model.

These are the steps followed in the development of multi-layer perceptrons for the ABS

model.


4.1.2 The Simulink Model

This ABS model is a simplified version of a normal ABS design (Figure 15). It captures the essential features of the process, and it is reasonable to assume that it behaves as a real ABS would [7]. It is possible to develop a set of input equations to model this system; recent studies of the ABS model derive these equations in full from the tractive and normal forces acting on the tyres and from other elements such as adhesion and angular velocity [7]. The model used in that recent research is very similar (albeit more simplistic) to the model used in this project, which is shown below.

Figure 15 ABS Model with pseudo random binary sequence input


4.1.3 Pseudo Random Binary Sequence Input

The controlled input to this model is a PRBS. Within Matlab/Simulink a PRBS can be generated in the Frequency Domain System Identification Toolbox using mlbs (maximum length binary sequence), or in the System Identification Toolbox version 4.0 using idinput. As its name suggests, the pseudo-random binary sequence generator produces the pseudo random binary output shown below (Figure 16). This output is used as a controlled input to the ABS model. The input is either 0 or 1, which provides random excitation. The data must be persistently exciting, so that the training set is representative of the entire class of inputs that may excite the system.

Figure 16 Pseudo random binary sequence.
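A sketch of how a PRBS of this kind can be generated with idinput() from the System Identification Toolbox is shown below; the sequence length and frequency band are arbitrary examples.

% Sketch: generate a 0/1 pseudo random binary sequence for the ABS input.
u = idinput(1023, 'prbs', [0 1], [0 1]);   % levels switch between 0 and 1
plot(u(1:100))                             % inspect the first 100 samples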

4.2 Data

4.2.1 Data Collection

Data must be collected from the model during simulation. Input and output data are collected using a workspace sink. The data is categorised under three headings: training data, testing data and validation data. Generally 60% of the input and output data is used for training, 20% for testing and 20% for validation. Previous research using neural networks for feature extraction and temporal segmentation of acoustic signals used 80% of the collected data for training the system and 20% for testing [9]. Testing and validation of a network are very important aspects of developing an effective neural network, so when modelling the ABS 60% of the data is used for training, and validation and testing files are each created from 20% of the collected data. This set of data is reused each


time a new training algorithm is implemented, to ensure that the training, testing and validation conditions remain constant throughout the experiment. Any variation in results can then only be related to the algorithms or the network architecture, as opposed to a different input data sequence and its corresponding output data.

4.2.2 Loading Data

The data must be loaded into Matlab before the neural network can be run or trained; the network must, in a sense, be able to see the input and target data, since it models itself on this data. Loading the data is done with the following code:

load input_data
load output_data

Figure 17 illustrates the response of the ABS to the random excitation input signal. This is a visual representation of a small section of the data that is loaded before the network can be run or trained.

Figure 17 A visual representation of input and output data


4.3 Summary

The ABS model exhibits a high level of non-linearity; this is the main reason for its choice as a model for system identification. It is also easy to modify the Simulink model so that a PRBS input can be applied. Data is collected from the ABS Simulink model and applied to the design of the neural network. This is the first step in the system identification process.


Chapter 5

Building Neural Network – the design detail

System identification is carried out in phases. The first is the data collection process, which is outlined in the previous section (Section 4.2 Data). Next, this data is processed to filter it and remove any outliers; processing can improve the overall performance of a model [11]. A model structure is then selected and the best parameters for this structure computed. Finally, the model's properties and convergence results are examined and analysed. The Matlab neural network toolbox provides all the functions necessary to follow these procedures.

5.1 Design Detail

5.1.1 Pre & Post processing

Network training can be made more efficient if certain processing steps are performed on the network inputs and targets. Two types of pre and post processing are implemented in testing [11]:

• scaling – known as min and max,
• normalisation of the mean and standard deviation of the training set.

Scaling

The function premnmx() is used to scale the inputs and targets so that they fall within a specified range. The network is then trained to produce outputs in the (-1, 1) range, and these are converted back into the same units that were used for the original targets.

Mean and Standard Deviation

The second approach is normalisation of the mean and standard deviation of the training set.

This is done using prestd(). It normalises the inputs and targets so they will have zero mean

and unity standard deviation. The outputs are converted back into the same units that are

used for the original targets using poststd.
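The two processing options can be applied as follows (a sketch; teach1 and teach2 are the input and target matrices loaded from input_data and output_data, as in the MLP code later in this chapter, and the commented lines show how the outputs are mapped back after training):

% Sketch: the two pre/post processing options from the toolbox.
% (1) Scaling inputs and targets into the range [-1, 1]:
[pn, minp, maxp, tn, mint, maxt] = premnmx(teach1, teach2);
% ... train on pn and tn, then map the simulated outputs back:
% an = sim(net, pn);  a = postmnmx(an, mint, maxt);

% (2) Normalising to zero mean and unity standard deviation:
[pn, meanp, stdp, tn, meant, stdt] = prestd(teach1, teach2);
% ... train on pn and tn, then convert the outputs back with poststd:
% an = sim(net, pn);  a = poststd(an, meant, stdt);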


5.1.2 Neural Network Model Structure

There are two basic neural network model structures: the parallel identification structure and the series-parallel structure. The parallel identification structure has direct feedback from the network outputs to its inputs (Figure 18). It estimates the outputs and uses these estimates to predict future outputs. However, because of the feedback this structure does not guarantee stability, and it also requires dynamic backpropagation training. It is only used if the actual plant outputs are not available.

Figure 18 Parallel identification model [3]

The series-parallel identification structure does not use feedback (Figure 19). Instead, it uses the actual plant output to predict the future outputs. Static backpropagation is used, and stability and convergence are generally guaranteed with this method [3].


Figure 19 Series-parallel identification model [3]

Like a conventional system identification model, a neural network model structure is defined by its inputs and also by the neural network architecture: the type of network, the number of hidden layers and the number of hidden nodes. In this case the series-parallel identification model is used as the neural network model structure, because of its stability and convergence properties and because it can be used off line [10].

5.1.3 Types of Neural Networks

Two types of neural network are used to construct this series-parallel identification model: recurrent neural networks and multi-layer networks. First the series-parallel identification model is constructed in Matlab using a feed-forward backpropagation network, which is a multi-layer network [Appendix 3]. This feed-forward backpropagation model, which uses static backpropagation, is built using newff(), as shown in the example MLP code. A neural network is not programmed but 'trained': the algorithm used to adjust the weights of the links so as to produce the desired output is known as the training algorithm, and backpropagation involves performing these computations backwards through the neural network. There are several variations of the basic training algorithm of the backpropagation neural network. These


variations form the basis of the test procedures used to evaluate the most effective way to model the ABS.

MLP Code

%Designing Neural Network
close all   % close all open figures
clear all   % clear all old variables, to reduce the risk of confusing errors

tic;
load input_data
load output_data
teach1=teach1';
teach2=teach2';

net = newff(minmax(teach1),[5,2],{'tansig','purelin'},'traingd');

%Training the Neural Network
net=init(net);
Y = sim(net,teach1);
[pn,minp,maxp,tn,mint,maxt] = premnmx(teach1,teach2);
%net.trainParam.show=5;
net.trainParam.epochs=200;
net.trainParam.lr=0.02;
net=train(net,pn,tn);

Y = sim(net,teach1);

%plot(tr.epoch,tr.perf,tr.epoch,tr.vperf,tr.epoch,tr.tperf)
%legend('Training','Validation','Test',-1);
%ylabel('Squared Error'); xlabel('Epoch')

toc


5.1.4 Training Algorithms – multi-layer network results

The ABS is modelled using a number of training algorithms; the first is steepest descent, the simplest implementation of backpropagation learning. It updates the network weights and biases in the direction in which the performance function decreases most rapidly. The update is represented as

x_(k+1) = x_k - α_k g_k    equation 9

where x_k is the vector of current weights and biases, g_k is the current gradient and α_k is the learning rate.

This is known as the steepest gradient descent training rule. The changes to the weights and biases are obtained by multiplying the negative gradient by the learning rate: the higher the learning rate, the larger the step taken. If the learning rate is set too large the algorithm can become unstable; if it is set too small the algorithm will take too long to converge.

Traingd implements the steepest descent algorithm. Figure 20 shows the training plot of

the artificial neural network using Traingd. The learning rate is set to 0.02. The

performance of the network is measured in this case according to the mean square error (mse).

Figure 20 Training plot of Traingd


Traingdm implements steepest descent with momentum. Momentum allows the network to respond not only to the local gradient but also to recent trends in the error surface, and it helps prevent the network from getting stuck in a local minimum. The momentum constant µ is a number between 0 and 1. The training plot in Figure 21 shows the ABS data modelled with the traingdm algorithm using a momentum constant of 0.9. When the momentum constant is 1, the new weight change is set equal to the last weight change and the gradient is simply ignored. When the momentum constant µ is 0, a weight change is based solely on the gradient and traingdm behaves exactly as the traingd algorithm would (Figure 22).

Figure 21 Traingdm plot with momentum constant of 0.9


Figure 22 Traingdm with mu=0, training plot similar to Traingd plot as weight change based on gradient
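Selecting traingdm and its momentum constant is done through the network's training parameters, as in this sketch (net, pn and tn are assumed to come from the MLP code and premnmx step shown earlier):

% Sketch: steepest descent with momentum.
net.trainFcn = 'traingdm';
net.trainParam.lr = 0.02;        % learning rate
net.trainParam.mc = 0.9;         % momentum constant, as used for figure 21
net.trainParam.epochs = 200;
net = train(net, pn, tn);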

Traingda implements the steepest descent training function with a variable learning rate. If the learning rate is set too large the algorithm can oscillate and become unstable, but if it is set too small the algorithm will take too long to converge. With traingda the learning rate is allowed to change during the training process in response to the complexity of the local error surface. This procedure increases the learning rate, but only to the extent that the network can learn without large increases in error, so near-optimal learning is achieved for the local terrain. When a larger learning rate could still result in stable learning the learning rate is increased; when the learning rate is too high to guarantee a decrease in error it is decreased until stable learning is achieved again. In Figure 23 the minimum gradient is reached by epoch 66, so the learning rate variation and the training stop at this epoch.


Figure 23 Training plot of Traingda

The variation in learning rate is plotted in Figure 24; it terminates at epoch 66 when the training stops. The training plots produced with these steepest gradient descent algorithms all reach a performance of around 0.477 mse. The trainlm algorithm, a different type of algorithm, is implemented next, but it does not improve the mse performance either.

Figure 24 Variable learning rate plotted against each epoch iteration
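The adaptive learning rate behaviour is controlled through the traingda training parameters; the values below are illustrative, not the exact ones used for figures 23 and 24.

% Sketch: steepest descent with an adaptive learning rate.
net.trainFcn = 'traingda';
net.trainParam.lr = 0.02;          % initial learning rate
net.trainParam.lr_inc = 1.05;      % factor applied while the error keeps falling
net.trainParam.lr_dec = 0.7;       % factor applied when the error rises too much
net.trainParam.epochs = 200;
[net, tr] = train(net, pn, tn);
plot(tr.epoch, tr.lr)              % variable learning rate per epoch, as in figure 24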


Trainlm implements the Levenberg-Marquardt algorithm. Trainlm was designed to avoid having to compute the Hessian matrix (second derivatives) of the performance index at the current values of the weights and biases. This algorithm appears to be a faster method for training moderately sized feed-forward neural networks: in this case the training takes just 9.0470 seconds, compared with approximately 15 seconds for the traingda algorithm. Trainlm is a very efficient Matlab implementation, since the solution of the matrix equation is a built-in function, so its advantages become even more pronounced in a Matlab setting [11].

Figure 25 Trainlm performance training plot
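The run-time comparison quoted above can be reproduced by wrapping the training call in tic/toc, for example:

% Sketch: timing the Levenberg-Marquardt training run.
net.trainFcn = 'trainlm';
tic; net = train(net, pn, tn); toc   % roughly 9 s in the run reported above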

The performance of the network remains constant at approximately 0.477 mse; even with the most efficient training algorithm in the toolbox the performance is unchanged. The efficiency of this algorithm is therefore concluded to relate to run time, its ability to complete training more rapidly than the other algorithms.


5.2 Test Procedure for Multi-Layer Neural Network

5.2.1 Over-fitting

The testing data set is now used with the previously described procedure; the whole collection of data is applied, including the training, testing and validation sets. The resulting plots highlight how the neural model is performing. When the data sets are used without pre-processing the performance is 0.6667 mse (Figure 26), which is worse than the 0.477 mse achieved with pre and post processing of the data set. Over-fitting, however, cannot be held responsible for this neural network's poor performance. Over-fitting occurs when the error on the training set is driven to a very small value but the error is large when new data is presented to the network: the network has memorised the training examples but has not learned to generalise to new situations. These data sets do not show signs of over-fitting, which is typically indicated by the validation error rising and converging at a higher level than the training error [11]. This is not the case in the traingdm plot, where it is the test set that rises above the validation and training sets (Figure 27). If over-fitting were to occur, early stopping could be implemented. In this case more data is easily collected from the ABS model and the size of the training set increased, so there is little risk of over-fitting.

Figure 26 Training, testing and validation data plot using Trainlm, highlighting over-fitting


Figure 27 Training, testing and validation data plot using Traingdm, highlighting over-fitting

5.2.2 Post Training Analysis

The neural network appears to be producing a poor response, as is evident from the training, testing and validation plots. In order to examine exactly how poor the response of the neural network is, post training analysis is carried out. Post training, or regression, analysis is performed between the network response and the corresponding targets. The following code produces a plot for post training analysis:

[a]=postmnmx(Y,mint,maxt);
[m,b,r]=postreg(a(2,:),teach2(2,:));

m and b correspond to the slope and the y-intercept of the best linear regression relating the targets to the network outputs. If there were a perfect fit, i.e. if the outputs exactly equalled the targets, the slope would be 1 and the y-intercept would be 0. The third variable returned, the R-value, is the correlation coefficient between the outputs and targets. It is a measure of how well


the variation in output is explained by the targets. If this number is equal to 1, then there is

perfect correlation between targets and outputs. These are the post training analysis outputs

for the Trainlm algorithm.

m = 6.9188e-005
b = 63.2698
r = 0.0083

Figure 28 Post training analysis plot for Trainlm algorithm.

The R-value is extremely low and indicates a very poor linear fit, as shown in Figure 28. A similar plot is obtained for the Traingdm algorithm, indicating the overall weakness of this neural network at performing system identification (Figure 29). The R-value in that case is a negative number, but the system still exhibits a poor linear fit.


Figure 29 Post training analysis plot for Traingdm algorithm.

The analysis of this system indicates a very poorly functioning neural network identifier. The system could be improved by changing the architecture, for example by adding more hidden layers and increasing the number of neurons in each layer; the actual optimum structure is found through trial and error. Some changes are made to the structure, but no significant improvement in performance results. The hidden layer of the network designed with the Trainlm algorithm is increased from 5 neurons to 22 neurons, and the output training performance plot shows no significant change (Figure 30).

Figure 30 Performance training plot with 22 hidden neurons


5.3 Summary

Building a neural network follows a number of systematic procedures, but adherence to these procedures does not necessarily guarantee a highly effective neural network model. The model requires rigorous testing to obtain the optimal architecture, as the number of hidden layers and the number of neurons in each layer determine the performance of the network. The MLP architecture in this study does not reach its optimal potential. However, the structure provides the basis for a recurrent neural network.


Chapter 6

Recurrent Neural Networks

6.1 Structure of Recurrent Neural Network – design detail

Although multi-layer networks and recurrent neural networks have different structures, they may be viewed similarly, and the two have the potential to be used in unison in systems with dynamic elements and feedback [10]. In effect, recurrent neural networks used for identification or model based predictive control are multi-layer neural networks with a delay element in their feedback loop. Recurrent neural networks could be built with multi-layer networks in their feedback loop, creating a system in which the structures compute in tandem and which combines dynamic elements with feedback. That is beyond the scope of the structures examined here: multi-layer perceptrons and recurrent neural networks are tested with the ABS data as separate entities and their results compared. There are two recurrent neural network structures available in the Matlab neural network toolbox, the Hopfield and the Elman structure. The Elman structure is chosen as the architecture of the recurrent network used to model the ABS, because the Hopfield architecture is seldom used in practice; even the best Hopfield designs may produce spurious results that can lead to incorrect answers [11]. Elman networks are two-layer backpropagation networks with the addition of a feedback connection from the output of the hidden layer to its input.

6.2 The Elman Structure

6.2.1 Building the structure

The structure of the Elman recurrent network takes its skeletal shape from the multi-layer architecture. The Matlab function newelm() is used in a similar way to newff(); it includes a delay in the feedback loop calculations, hence creating a recurrent


neural network architecture. The Elman Code listing below is an example of the code used to test the Elman structure.

Elman Code

%Designing Recurrent Neural Network
close all   % close all open figures
clear all   % clear all old variables, to reduce the risk of confusing errors

tic;
load input_data
load output_data
teach1=teach1';   %converting input sequence into columns
teach2=teach2';   %converting the target to columns

net=newelm([0 1],[5,2],{'tansig','tansig'},'traingdx');
teach1seq=con2seq(teach1);
teach2seq=con2seq(teach2);
net=init(net);

net.trainParam.epochs=300;
net.trainParam.show=5;
net.trainParam.goal=0.01;
net.performFcn='sse';
[pn,minp,maxp,tn,mint,maxt]=premnmx(teach1,teach2);
pnseq=con2seq(pn);
tnseq=con2seq(tn);

[net,tr]=train(net,pnseq,tnseq);

toc
hold on;
semilogy(tr.epoch,tr.perf)
title('Sum squared error of Elman Network')
xlabel('Epoch')
ylabel('Sum squared error')
Y=sim(net,pnseq);


The recurrent connection present in the Elman network allows the network to detect and generate time-varying patterns. The Elman structure differs from conventional two-layer networks in that the first layer has a recurrent connection. The delay in this connection stores values from the previous time step, which can be used in the current time step. This property can give rise to results that are difficult to reproduce: even if two Elman networks with the same weights and biases are given identical inputs at a given time step, their outputs can differ because their feedback states differ. The network has proved effective at storing information for future reference, and that is why it is tested for identification of the ABS model. Different training algorithms are tested and the results compared with those of the multi-layer structures.

6.2.2 Results

Trainlm is the first algorithm used to train the network; it is the quickest of all the algorithms. It tends to proceed so rapidly that it does not necessarily do well when implemented in Elman structures. Quickest is a relative term here: the algorithm takes 75.0630 minutes to run 100 epochs, compared with the multi-layer network's run time of 28.6410 seconds for the trainlm algorithm. The performance results were also very poor: the mean square error performance measurement was 3954.82. Figure 31 highlights the network's poor performance.

Figure 31 Poor performance of recurrent neural network with Trainlm algorithm


Traingdx is implemented next to see if the trainlm performance can be bettered. It takes 2.7670e+003 minutes to run 100 epochs, which is significantly longer than the trainlm run time of 75.0630 minutes, and its performance error is only slightly better than that of trainlm by the time its maximum epoch is reached.

Figure 32 Traingdx algorithm performance plot

These results are inadequate, so pre and post processing is implemented to see if improvements can be made. First, the mean and standard deviation of the input and target data are normalised, so that they have zero mean and unity standard deviation; after training, the inputs and outputs are scaled back into the original units. This does not improve performance; in fact Figure 33 shows that performance has deteriorated.


Figure 33 Deteriorated performance of recurrent neural network

A second type of pre and post processing, scaling, is implemented because of the lack of success with the mean and standard deviation method. The function premnmx() scales the data for training and postmnmx() converts the data back to its original state after the algorithm has run. The resultant plot, shown in Figure 34, does not show any significant difference in performance compared with the runs in which mean and standard deviation processing was carried out on the data.


Figure 34 Sum square error plot of recurrent network with pre and post processing implemented

6.3 Overall Analysis of Networks

6.3.1 Comparison

Neither of the systems tested, the MLP or the recurrent network, performs to its optimum potential. The multi-layer network outperforms the recurrent network in terms of both run time and square error performance. This result is not wholly unexpected, because both structures tested had just one hidden layer with a maximum of 5 neurons in that layer. For an Elman network to have the best chance of learning a problem it needs more hidden neurons in its hidden layer than are actually required for a solution by any other method: with fewer neurons the Elman network is less able to find the appropriate weights for the hidden neurons, since the error gradient is approximated [11]. Extensive testing is needed to improve the performance of both networks, because sometimes only a very slight modification of the architecture produces a huge performance improvement. For recurrent networks this testing is restricted by the length of time it takes the networks to converge using the backpropagation algorithm; the structures sometimes have to be left overnight to train because of their long running times. The Genetic Algorithm (GA) is a possible alternative to

Page 48: Project Report

42

the backpropagation training algorithm because it is not based on error gradient and does

not require as much computational time when the neuron number is high [12].

Development of genetic algorithms for identification and training purposes is a relatively

new direction and could produce extremely interesting results.

6.3.2 Conclusion and Future Directions

Genetic algorithms implemented in recent research have shown that the training cost in terms of run time remains manageable as the number of neurons increases [12]. Genetic algorithms are based on a different concept from the backpropagation training algorithm and offer an exciting future direction for this research. The GA starts off with a population of randomly generated chromosomes and then advances towards better chromosomes by applying genetic operators. During successive iterations, or generations, the chromosomes are evaluated as possible solutions. Based on these evaluations, a new population is formed using a mechanism of selection and applying genetic operators such as crossover and mutation.
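As an indication of the concept only, the following is a minimal sketch of a real-coded GA evolving the weight vector of a fixed-structure network, with fitness taken as the sum squared error on the training data. It is not the algorithm of [12]; the population size, generation count and mutation settings are purely illustrative, and the network net, scaled inputs pn and targets tn are assumed to exist already (getx, setx and sse are toolbox functions):

% Sketch of a real-coded GA for training a fixed-structure network
popSize=20; nGen=50; pMut=0.1; sigma=0.2;    % illustrative GA parameters
nW=length(getx(net));                        % number of weights and biases
pop=randn(popSize,nW);                       % random initial chromosomes
for gen=1:nGen
    err=zeros(popSize,1);
    for i=1:popSize
        net=setx(net,pop(i,:)');             % load chromosome i into the network
        err(i)=sse(tn-sim(net,pn));          % evaluate it on the training data
    end
    [err,order]=sort(err);                   % rank chromosomes, lowest error first
    pop=pop(order,:);
    for i=popSize/2+1:popSize                % replace the worse half of the population
        p1=pop(ceil(rand*popSize/2),:);      % select two parents from the better half
        p2=pop(ceil(rand*popSize/2),:);
        mask=rand(1,nW)>0.5;                 % uniform crossover
        child=mask.*p1+(~mask).*p2;
        mutate=rand(1,nW)<pMut;              % mutate each gene with probability pMut
        child=child+sigma*randn(1,nW).*mutate;
        pop(i,:)=child;
    end
end
net=setx(net,pop(1,:)');                     % keep the best chromosome found

Because no error gradient is required, the cost of each generation is dominated by the popSize network simulations, which is why this approach can remain manageable as the neuron count grows.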

Figure 35 illustrates the basic concepts behind genetic algorithm operators; these are the operators of a genetic algorithm used for optimising the fuzzy rule base of the fuzzy component of an ABS controller [7].


Figure 35 The basic concepts behind genetic algorithms [7]

Future development of the project will not be limited to the use of genetic algorithms and the improvement of the structures which use backpropagation. An architecture may also be developed that includes both multi-layer and recurrent networks, hence maximising the strength of each individual architecture in one unified unit. The multi-layer network's strength lies in its success at pattern recognition problems, while the recurrent network's strength lies in its solution of optimisation problems. The Matlab toolbox has proved a very powerful tool for building each of the architectures separately; its capabilities may be investigated and perhaps extended to build a more complex model. In this study the development of research and testing has been progressive, tracing the development of the SLP through its growth into recurrent networks. Testing highlights the flaws in all the architectures, such as the SLP's inability to perform non-linear classification, the MLP's poor error performance, and the recurrent network's poor error performance and long training durations. Possible solutions are offered and interesting future directions are discussed in the form of genetic algorithm development and architecture modification.


References

[1] Bruce D. Baker & Craig E. Richards (in press), "Exploratory application of neural networks to school finance: forecasting educational spending".

[2] Arthur W. Ham (1974), Histology, Seventh Edition, J.B. Lippincott Company, Philadelphia and Toronto.

[3] J. Wesley Hines (1997), Fuzzy and Neural Approaches in Engineering, A Wiley-Interscience Publication, John Wiley & Sons, Inc.

[4] S. Haykin (1994), Neural Networks: A Comprehensive Foundation, Macmillan, New York.

[5] Jennifer Bruton, course notes and reference code mlpeg1.

[6] Chris Stergiou, "Historical Background of Neural Networks", http://www.doc.ic.ac.uk/~nd/surprise_96/journal/vol1/cs11/article1.html

[7] Yonggon Lee & Stanislaw H. Zak (2001), "Designing a Genetic Neural Fuzzy Anti-Lock Brake System Controller", IEEE Transactions on Evolutionary Computation.

[8] W.K. Lennon & K.M. Passino (1995), "Intelligent control for brake systems", IEEE Transactions on Fuzzy Systems, Vol. 3, pp. 381-388.

[9] S. Rossignol, X. Rodet, J. Soumagne, J.-L. Collette & P. Depalle, "Feature extraction and temporal segmentation of acoustic signal", CNET/RENNES (Centre National d'Études des Télécommunications), France.

[10] Kumpati S. Narendra & Kannan Parthasarathy (1990), "Identification and Control of Dynamical Systems Using Neural Networks", IEEE Transactions on Neural Networks, Vol. 1, No. 1.

[11] The MathWorks, MATLAB online documentation, http://www.mathworks.com/access/helpdesk/help/helpdesk.shtml

[12] A. Blanco, M. Delgado & M.C. Pegalajar (2001), "A real-coded genetic algorithm for training recurrent neural networks", Neural Networks, Vol. 14, pp. 93-95.


Appendix 1

% The AND-gate problem again, this time with 12 cycles
clear
w1=[0 1 -1]';                          % initial weight vector (bias weight first)
b=1;                                   % bias input
k=1;
x1=[-1 -1]'; x2=[-1 1]'; x3=[1 -1]'; x4=[1 1]';    % input patterns
tau1=-1; tau2=-1; tau3=-1; tau4=1;                 % target outputs
tau=[tau1 tau2 tau3 tau4];
p=[[b;x1] [b;x2] [b;x3] [b;x4]];       % augmented input matrix (bias included)
mu=0.2;                                % learning rate

new_w(:,k)=w1;
y(k)=sign(w1'*p(:,k))
e(k)=tau(:,k)-y(k);
new_w(:,k+1)=w1+(mu*e(k)*p(:,k));

k=0
while k<12;
    for i=1:4;
        y(i)=sign(new_w(:,k+i)'*p(:,i));
        e(i)=tau(:,i)-y(i);
        new_w(:,k+i+1)=new_w(:,k+i)+(mu*e(i)*p(:,i));
    end
    k=k+4;
end


Appendix 2

P=[-0.5 -0.5  0.3 0.1;                 %inputs
   -0.5  0.5 -0.5 1.0];

T=[0 0 0 1];                           %targets
plotpv(P,T);                           %vectors plotted
net=newp(minmax(P),1);                 %network created with one layer (slp)
plotpv(P,T);                           %vectors replotted with network's
                                       %attempt at classification
net.b{1}=1;                            %bias

plotpc(net.IW{1},net.b{1});            %plotted with weights and values
                                       %weights are set to zero so no
                                       %classification line appears

%the network is now trained and a classification line is produced

E=1;
while (sse(E));
    [net,Y,E]=adapt(net,P,T);
    clf;
    plotpv(P,T);
    plotpc(net.IW{1},net.b{1});
    drawnow;
end

% a new point is classified with this network

p=[0.7;1.2];
a=sim(net,p);
plotpv(p,a);
Point=findobj(gca,'type','line');
set(Point,'color','red');
hold on;
plotpv(P,T);
plotpc(net.IW{1},net.b{1});


Appendix 3

%Designing Neural Network
close all                              % close all open figures
clear all                              % clear all old variables, to reduce the risk of confusing errors

tic;
load input_data
load output_data
teach1=teach1';
teach2=teach2';

net=newff(minmax(teach1),[5,2],{'tansig','purelin'},'traingd');

%Training the Neural Network
net=init(net);
Y=sim(net,teach1);
[pn,minp,maxp,tn,mint,maxt]=premnmx(teach1,teach2);
%net.trainParam.show=5;
net.trainParam.epochs=200;
net.trainParam.lr=0.02;
net=train(net,pn,tn);

Y=sim(net,teach1);

%plot(tr.epoch,tr.perf,tr.epoch,tr.vperf,tr.epoch,tr.tperf)
%legend('Training','Validation','Test',-1);
%ylabel('Squared Error'); xlabel('Epoch')

toc

