NEURAL-NETWORK ANALYSIS
& its applications
DATA FILTERS
Saint-Petersburg State University JASS 2006
About me
Name: Alexey Minin
Place of study: Saint-Petersburg State University
Current semester: 7th
Fields of interest: Neural Nets, Data filters for Optics (Holography), Computational Physics, Econophysics
Content:
What is a Neural Net & its applications
Neural Net analysis
Self-organizing Kohonen maps
Data filters
Obtained results
What is a Neural Net & its applications
Recognition of images
Processing of noisy signals
Addition of images
Associative search
Classification
Scheduling
Optimization
Forecasting
Diagnostics
Prediction of risks
What is a Neural Net & its applications
Recognition of images
[Image: example of image recognition — "M-X2 9980"]
What is a Neural Net & its applications
Paradigms of neurocomputing
Neural Net analysis
Connectionism
Localness and parallelism of calculations
Training based on data (instead of programming)
Universality of training algorithms
Neural Net analysis
What is a Neuron?
A typical formal neuron performs the most elementary operation: it weighs the values of its inputs with locally stored weights and applies a nonlinear transformation to their sum:

$$y = f(u), \qquad u = w_0 + \sum_i w_i x_i$$

[Figure: a formal neuron with inputs $x_1, \dots, x_n$, weighted sum $u$, and output $y$]

A neuron applies a nonlinear operation to a linear combination of its inputs.
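Not from the slides, but as a minimal sketch of the formal neuron just defined (assuming the logistic activation function that is mentioned later in the talk):

```python
import numpy as np

def formal_neuron(x, w, w0):
    """Formal neuron: nonlinear transform of a weighted sum of inputs."""
    u = w0 + np.dot(w, x)            # u = w0 + sum_i w_i x_i
    return 1.0 / (1.0 + np.exp(-u))  # logistic activation f(u)

# Example: 3 inputs with arbitrary weights
x = np.array([0.5, -1.0, 2.0])
w = np.array([0.1, 0.4, -0.3])
print(formal_neuron(x, w, w0=0.2))
```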
Neural Net analysis
Global connections
Formal neurons
Layers
Connectionism
Neural Net analysis
Localness and parallelism of calculations
Localness of information processing: every neuron reacts only to the information from the neurons connected to it, without appeal to a global plan of calculations.
Parallelism of calculations: neurons are capable of functioning in parallel.
Comparison of ANN & BNN

                     BRAIN               PC (IBM)
Clock frequency      ~100 Hz             ~10^9 Hz
Number of elements   N = 10^10-10^11     N = 10^9
Propagation speed    v = 100 m/s         v = 3*10^8 m/s

The degree of parallelism is ~10^14: the brain works like 10^14 processors with 100 Hz frequency, with ~10^4 connections active at the same time.
Training based on data (instead of programming)
Neural Net analysis
Absence of a global plan: the information is distributed over the network, with corresponding adaptation of the neurons.
The algorithm is not set in advance; it is generated by the data.
Training of a network takes place on a small share of all possible situations; the trained network is then capable of functioning over a much wider range of patterns.
Each neuron locally changes its selected parameters: the synaptic weights.
Training of a network = patterns on which the network is trained + an ability for generalization.
Neural Net analysis
Universality of training algorithms
The single principle of learning is to find the minimum of the empirical error $E(W)$, where $W$ is the set of synaptic weights and $E(W)$ is the error function.
The task is to find the global minimum.
Stochastic optimization is a way not to get stuck in a local minimum.
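A toy sketch of this principle (the error surface and the annealed noise schedule are illustrative assumptions, not the talk's actual optimizer): gradient descent on an empirical error with added decaying noise, so the search has a chance to escape local minima:

```python
import numpy as np

rng = np.random.default_rng(0)

def E(w):
    """Toy empirical-error surface with several local minima."""
    return np.sum(w**2) + np.sum(np.cos(3 * np.pi * w))

def dE(w):
    return 2 * w - 3 * np.pi * np.sin(3 * np.pi * w)

w = rng.normal(size=2)            # random initial synaptic weights
eta, temperature = 0.01, 1.0
for step in range(5000):
    # gradient step plus decaying noise -> chance to escape local minima
    w -= eta * dE(w) + np.sqrt(temperature) * eta * rng.normal(size=2)
    temperature *= 0.999          # anneal the noise away
print("w =", w, "E(w) =", E(w))
```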
Neural Net analysis
BASIC NEURAL NETS
Perceptron
Hopfield network
Kohonen maps
Probabilistic NNets
General regression NN (GRNN)
Polynomial nets
The architecture of NN
Neural Net analysis
LAYERED (level-by-level), WITHOUT FEEDBACK
RECURRENT, WITH FEEDBACK (Elman-Jordan)
These are the prototypes of any neural architecture.
Classification of NN
Neural Net analysis
By type of training:
with tutor (supervised): $E(W) = E\{x, y', y(x, W)\}$
without tutor (unsupervised): $E(W) = E\{x, y(x, W)\}$

In the unsupervised case the network is asked to find the hidden laws in the data file on its own. Redundancy of the data allows compression of the information, and the network can learn to find the most compact representation of such data, i.e. to perform optimal coding of the given kind of input information.
Methodology of self-organizing maps
Self-organizing Kohonen maps are a type of neural network trained without a teacher. The network forms its outputs independently, adapting to the signals arriving at its input. The only "teacher" of the network is the data itself, i.e. the information contained in it: the laws that distinguish the input data from random noise.
Maps combine in themselves two types of information compression:
Lowering the dimension of the data with minimal loss of information
Reducing the variety of the data by allocating a finite set of prototypes and assigning each data point to one of these types
Schematic representation of a self-organizing network
Methodology of self-organizing maps
The neurons in the output layer are ordered and correspond to the cells of a two-dimensional map, which can be colored according to the affinity of attributes.
Hebb training rule
(Hebb, 1949) The change of a weight at the presentation of the i-th example is proportional to the neuron's input and output:

$$\Delta w_j = \eta\, y\, x_j, \qquad \text{in vector form: } \Delta \mathbf{w} = \eta\, y\, \mathbf{x}$$

If training is formulated as an optimization problem, a neuron trained by Hebb's rule aspires to increase the amplitude of its output:

$$E(\mathbf{w}) = -\tfrac{1}{2} \langle y^2 \rangle_{\mathbf{x}}, \qquad \Delta \mathbf{w} = -\eta\, \frac{\partial E}{\partial \mathbf{w}},$$

where the averaging $\langle \cdot \rangle_{\mathbf{x}}$ is taken over the training sample.

Hebbian training in the form described above is not useful in practice, since it leads to unlimited growth of the weight amplitudes.
NB: in this case the error function has no minimum.
Oja training rule

$$\Delta w_j = \eta\, y\,(x_j - y\, w_j), \qquad \text{in vector form: } \Delta \mathbf{w} = \eta\, y\,(\mathbf{x} - y\,\mathbf{w})$$

A term is added to the Hebb rule that prevents the unlimited growth of the weights.

Oja's rule maximizes the sensitivity of the neuron's output at limited amplitude of the weights. It is easy to verify: set the average change of the weights to zero and multiply the right-hand side of the equality by $\mathbf{w}$. In equilibrium

$$\langle y^2 \rangle\,(1 - |\mathbf{w}|^2) = 0,$$

so the weights of the trained neuron lie on the hypersphere $|\mathbf{w}| = 1$.

Under Oja training the weight vector settles on the hypersphere $|\mathbf{w}| = 1$, in the direction maximizing the projection of the input vectors.
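A minimal numerical sketch of Oja's rule (the data and learning rate are illustrative assumptions): trained on correlated inputs, the weight vector ends up with unit norm, pointing along the leading principal direction:

```python
import numpy as np

rng = np.random.default_rng(1)
# Correlated 2-D data; the leading principal direction is ~(1, 1)/sqrt(2)
X = rng.normal(size=(5000, 2)) @ np.array([[1.0, 0.9], [0.9, 1.0]])

w = rng.normal(size=2)
eta = 0.01
for x in X:
    y = w @ x
    # Plain Hebb (dw = eta*y*x) would blow up; the -y*w decay term keeps |w| ~ 1
    w += eta * y * (x - y * w)

print("|w| =", np.linalg.norm(w))   # ~1: the weights land on the unit hypersphere
print("direction:", w / np.linalg.norm(w))
```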
Competition of neurons: the winner takes all

Base algorithm. Each neuron of the competitive layer computes its response to the input $x_1, \dots, x_d$:

$$y_i = \sum_{j=1}^{d} w_{ij}\, x_j$$

Winner:

$$i^{*}: \quad |\mathbf{w}_{i^{*}} - \mathbf{x}| \le |\mathbf{w}_{i} - \mathbf{x}| \quad \forall i,$$

i.e. the winner is the neuron giving the greatest response to the given input stimulus. The outputs are set to $y_{i^{*}} = 1$ and $y_i = 0$ for $i \ne i^{*}$.

Training of the winner (the weights of all other neurons remain constant):

$$\Delta \mathbf{w}_{i^{*}} = \eta\,(\mathbf{x} - \mathbf{w}_{i^{*}})$$

The winner takes all.
One variant of modifying the base training rule of a competitive layer consists in training not only the neuron-winner, but also its "neighbors", though at a smaller rate. This approach of "pulling up" the neurons nearest to the winner is applied in topographic Kohonen maps.

Kohonen's modified training rule:

$$i^{*}: \quad |\mathbf{x} - \mathbf{w}_{i^{*}}| = \min_i |\mathbf{x} - \mathbf{w}_i|$$

$$\mathbf{w}_i(t+1) = \mathbf{w}_i(t) + \eta(t)\,\Lambda\big(|i - i^{*}|, t\big)\,\big(\mathbf{x} - \mathbf{w}_i(t)\big)$$

The neighborhood function $\Lambda$ is equal to one for the neuron-winner $i^{*}$ and gradually falls off with distance from the winner.

Kohonen training resembles stretching an elastic grid of prototypes over the data of the training sample.
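A compact sketch of the modified Kohonen rule above (the Gaussian neighborhood function and the linearly decaying rates are assumptions; the slides leave both unspecified):

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.random((2000, 3))                  # three-dimensional data in [0,1]^3

rows, cols = 10, 10                        # two-dimensional map of prototypes
W = rng.random((rows, cols, 3))
grid = np.dstack(np.meshgrid(np.arange(rows), np.arange(cols), indexing="ij"))

n_steps = 5000
for t, x in enumerate(X[rng.integers(0, len(X), n_steps)]):
    frac = t / n_steps
    eta = 0.5 * (1 - frac)                 # decaying learning rate eta(t)
    sigma = max(rows / 2 * (1 - frac), 1)  # shrinking neighborhood radius
    # winner: prototype nearest to x
    d2 = np.sum((W - x) ** 2, axis=2)
    i_star = np.unravel_index(np.argmin(d2), d2.shape)
    # neighborhood Lambda(|i - i*|, t): 1 at the winner, decays with distance
    lam = np.exp(-np.sum((grid - np.array(i_star)) ** 2, axis=2) / (2 * sigma**2))
    W += eta * lam[..., None] * (x - W)    # pull the winner and its neighbors toward x
```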
A two-dimensional topographic map of a set of three-dimensional data: each point of the three-dimensional space falls into the cell of the grid whose neuron (on the two-dimensional map) is nearest to it.

A convenient tool for visualizing data is coloring the topographic map, similarly to how it is done on ordinary geographical maps. Each attribute of the data generates its own coloring of the map cells: by the average value of this attribute over the data that fell into the given cell.
Visualization of a topographic map induced by the i-th component of the input data.
Collecting together the maps of all attributes of interest, we obtain a topographic atlas that gives an integrated representation of the structure of the multivariate data.
Classified SOM for the NASDAQ100 index for the period from 10-Nov-1997 to 27-Aug-2001
Methodology of self-organizing maps
[Plot: Ln Y(t) vs. time] Change in time of the log-price of the stocks of JP Morgan Chase (top curve) and American Express (bottom curve) for the period from 10-Jan-1994 to 27-Oct-1997.
[Plot: Ln Y(t) vs. time] Change in time of the log-price of the stocks of JP Morgan Chase (top curve) and Citigroup (bottom curve) for the period from 10-Nov-1997 to 27-Aug-2001.
How to choose a variant?
Annual prediction
[Plot: annual Caspian Sea level (CSL), years 1988-2038, showing the TEST PREDICTION]
This is the forecast of the (Caspian) sea level.
DATA FILTERS
Custom filters (e.g. Fourier filter)
Adaptive filters (e.g. Kalman filter)
Empirical mode decomposition
Holder exponent
Adaptive filters
In what follows, keep in mind that we are going to make forecasts; that is why we need filters which do not change the phase of the signal.
[Block diagram: a direct-form IIR filter built from unit delays $z^{-1}$, with feed-forward coefficients $b(1), \dots, b(nb{+}1)$ applied to $x(n), \dots, x(n-nb)$ and feedback coefficients $-a(2), \dots, -a(na{+}1)$ applied to $y(n-1), \dots, y(n-na)$]

$$y(n) = b(1)x(n) + b(2)x(n-1) + \dots + b(nb{+}1)x(n-nb) - a(2)y(n-1) - \dots - a(na{+}1)y(n-na)$$
Adaptive filters
All the maxima are preserved; there is no phase distortion.
[Plot: Siemens stock value, adjusted close (scaled), raw vs. filtered]
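The slides do not name the tool used for zero-phase filtering; one standard recipe (what MATLAB's filtfilt does, mirrored here with SciPy) runs an IIR filter forward and then backward over the data, so the phase shifts cancel:

```python
import numpy as np
from scipy.signal import butter, filtfilt

# Synthetic stand-in for a price series (the slides use Siemens adj. close)
t = np.arange(1000)
x = np.cumsum(np.random.default_rng(3).normal(size=t.size))

b, a = butter(4, 0.05)   # low-pass IIR filter: coefficients b(k), a(k) above
y = filtfilt(b, a, x)    # forward-backward run cancels the phase shift
```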
Adaptive filters
Let's try to predict the next value using the zero-phase filter, given the information about the historical price. I used a perceptron with 3 hidden layers, a logistic activation function, the rotation algorithm, and 20 min of training.
Adaptive filters
Kalman filter

$$x(n) = a\,x(n-1) + w(n-1) \quad \text{(model generating the signal, } w(n) \text{ white noise)}$$

$$y(n) = c\,x(n) + n(n) \quad \text{(signal after the neural net, } n(n) \text{ white noise)}$$

The estimate is

$$\hat{x}(n) = a\,\hat{x}(n-1) + K(n)\,\big[\,y(n) - c\,a\,\hat{x}(n-1)\,\big],$$

where $K(n)$ is the Kalman gain.

[Block diagram: $y(n)$ enters through the gain $K(n)$; the estimate $\hat{x}(n)$ is fed back through a unit delay $z^{-1}$, scaled by $a$ in the prediction branch and by $c\,a$ in the correction branch]
Adaptive filters
Let's use the Kalman filter as the error estimator for the forecast of the zero-phase filtered data.
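A scalar Kalman filter matching the model above, as a hedged sketch (the noise variances q and r are assumed known here; in practice they must be estimated):

```python
import numpy as np

def kalman_1d(y, a, c, q, r, x0=0.0, p0=1.0):
    """Scalar Kalman filter for x(n) = a*x(n-1) + w(n-1), y(n) = c*x(n) + noise.

    q, r are the variances of the process and observation noise (assumed known).
    """
    x_hat, p = x0, p0
    out = np.empty(len(y))
    for n, yn in enumerate(y):
        # predict
        x_pred = a * x_hat
        p_pred = a * a * p + q
        # correct with gain K(n)
        k = p_pred * c / (c * c * p_pred + r)
        x_hat = x_pred + k * (yn - c * x_pred)
        p = (1 - k * c) * p_pred
        out[n] = x_hat
    return out
```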
Empirical Mode Decomposition
What is it?
We can heuristically define a (local) high-frequency part {d(t), t− ≤ t ≤ t+}, or local detail, which corresponds to the oscillation terminating at the two minima and passing through the maximum which necessarily exists between them. For the picture to be complete, one still has to identify the corresponding (local) low-frequency part m(t), or local trend, so that we have x(t) = m(t) + d(t) for t− ≤ t ≤ t+.
Eventually, the original signal $x(t)$ is first decomposed through the main loop as

$$x(t) = d_1(t) + m_1(t),$$

and the first residual $m_1(t)$ is itself decomposed as

$$m_1(t) = d_2(t) + m_2(t),$$

so that

$$x(t) = d_1(t) + d_2(t) + \dots + d_K(t) + m_K(t).$$
Empirical Mode Decomposition
Algorithm
Given a signal x(t), the effective algorithm of EMD can be summarized as follows:
1. identify all extrema of x(t)
2. interpolate between the minima (resp. maxima), ending up with an envelope e_min(t) (resp. e_max(t))
3. compute the mean m(t) = (e_min(t) + e_max(t))/2
4. extract the detail d(t) = x(t) − m(t)
5. iterate on the residual m(t)
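A rough Python sketch of this sifting loop (the boundary handling, the fixed sift count, and the stopping test are simplified assumptions; real EMD implementations treat all three more carefully):

```python
import numpy as np
from scipy.interpolate import CubicSpline
from scipy.signal import argrelextrema

def sift_once(x):
    """One sifting step: detail d(t) = x(t) minus the mean of the two envelopes."""
    t = np.arange(len(x))
    mx = argrelextrema(x, np.greater)[0]   # 1. identify maxima ...
    mn = argrelextrema(x, np.less)[0]      #    ... and minima
    if len(mx) < 2 or len(mn) < 2:
        return None, x                     # no oscillation left -> residual
    e_max = CubicSpline(mx, x[mx])(t)      # 2. interpolate the envelopes
    e_min = CubicSpline(mn, x[mn])(t)
    m = (e_min + e_max) / 2                # 3. local trend m(t)
    return x - m, m                        # 4. detail d(t), trend

def emd(x, n_imfs=6, n_sift=8):
    imfs, resid = [], x.astype(float)
    for _ in range(n_imfs):                # 5. iterate on the residual
        d, _ = sift_once(resid)
        if d is None:
            break
        for _ in range(n_sift - 1):        # refine the detail by repeated sifting
            d_next, _ = sift_once(d)
            if d_next is None:
                break
            d = d_next
        imfs.append(d)
        resid = resid - d
    return imfs, resid
```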
[Figure: test signal — panels: tone, chirp, tone + chirp]
[Animation frames: the EMD sifting process on the tone + chirp signal — panels "IMF 1; iteration 0" through "IMF 2; iteration 5", each shown with the current residue]
Empirical Mode Decomposition
[Figure: the resulting decomposition of the tone + chirp signal into imf1-imf6 and the final residual]
Empirical Mode Decomposition
Let's do it for the Siemens index.
[Plots: imf3, imf4 and the residual of the Siemens series over ~1400 samples]
Empirical Mode Decomposition
Let's do it for the Siemens index.
[Plot: the reconstruction imf3 + imf4 + residual vs. the actual series]
All the strong maxima are preserved and there is no phase distortion.
Empirical Mode Decomposition
Let's make a forecast for the Siemens index.
[Plot: forecast vs. actual over ~250 samples]
THERE WAS NO DELAY IN THE FORECAST AT ALL!!!
Holder exponent

The main idea is as follows. Consider a function $f(t)$. Holder derived that

$$|f(t + \Delta t) - f(t)| \le \text{const} \cdot |\Delta t|^{h(t)}, \qquad h(t) \in [0, 1],$$

where $h(t) = 1$ means that the increment behaves as $O(\Delta t)$ (the function is smooth there), and $h(t) = 0$ means that we have a break (a discontinuity).

So this formula is a kind of bridge between "bad" functions and "good" functions. Looking at it more closely, we notice that we can catch the moments in time when our function "knows" that it is going to change its behavior from one regime to another. This means that today we can make a forecast of tomorrow's behavior. One should mention, however, that we do not know the sign of the coming change of behavior.
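A rough sketch of estimating a local Holder exponent numerically (the oscillation-based method and the choice of scales are assumptions on my part; FracLab, listed at the end, provides proper estimators):

```python
import numpy as np

def local_holder(x, i, scales=(1, 2, 4, 8, 16)):
    """Rough local Holder exponent at index i via a log-log fit of the
    oscillation, using |f(t + dt) - f(t)| ~ const * dt**h(t)."""
    osc = []
    for s in scales:
        lo, hi = max(i - s, 0), min(i + s, len(x) - 1)
        osc.append(x[lo:hi + 1].max() - x[lo:hi + 1].min() + 1e-12)
    h, _ = np.polyfit(np.log(scales), np.log(osc), 1)  # slope = h(t)
    return h

# Example on Brownian-like data (theoretical h ~ 0.5)
x = np.cumsum(np.random.default_rng(4).normal(size=4096))
print(local_holder(x, 2048))
```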
Results
Thank You! Any QUESTIONS? SUGGESTIONS? IDEAS?

Software I'm using:
1) MatLab
2) NeuroShell
3) FracLab
4) Statistica
5) Builder C++