Date post: | 10-Nov-2015 |
Category: |
Documents |
Upload: | gowthamucek |
UNIT-1
ARCHITECTURE
www.Vidyarthiplus.com
What are Neural Networks?
- Simple computational elements forming a large network
- Emphasis on learning (pattern recognition)
- Local computation (neurons)
- The definition of NNs is vague
- Often, but not always, inspired by the biological brain

Machine Learning
Machine learning involves adaptive mechanisms that enable computers to learn from experience, learn by example and learn by analogy. Learning capabilities can improve the performance of an intelligent system over time. The most popular approaches to machine learning are artificial neural networks and genetic algorithms. This lecture is dedicated to neural networks.

Biological neural network
[Figure: two connected biological neurons, labeled with soma, dendrites, axon, and synapse]

The neuron as a simple computing element
[Figure: diagram of a neuron Y with input signals x1, x2, ..., xn, weights w1, w2, ..., wn, and output signals Y]

Architecture of a typical artificial neural network
[Figure: input layer, middle layer, and output layer; input signals feed the input layer and output signals leave the output layer]

- A neural network can be defined as a model of reasoning based on the human brain. The brain consists of a densely interconnected set of nerve cells, or basic information-processing units, called neurons.
- The human brain incorporates nearly 10 billion neurons and 60 trillion connections, synapses, between them. By using multiple neurons simultaneously, the brain can perform its functions much faster than the fastest computers in existence today.

- Each neuron has a very simple structure, but an army of such elements constitutes a tremendous processing power.
- A neuron consists of a cell body, soma, a number of fibers called dendrites, and a single long fiber called the axon.

- Our brain can be considered as a highly complex, non-linear and parallel information-processing system.
- Information is stored and processed in a neural network simultaneously throughout the whole network, rather than at specific locations. In other words, in neural networks, both data and its processing are global rather than local.
- Learning is a fundamental and essential characteristic of biological neural networks. The ease with which they can learn led to attempts to emulate a biological neural network in a computer.

- An artificial neural network consists of a number of very simple processors, also called neurons, which are analogous to the biological neurons in the brain.
- The neurons are connected by weighted links passing signals from one neuron to another.

Network Structure
The output signal is transmitted through the neuron's outgoing connection. The outgoing connection splits into a number of branches that transmit the same signal. The outgoing branches terminate at the incoming connections of other neurons in the network.

Analogy between biological and artificial neural networks

Biological Neural Network | Artificial Neural Network
Soma                      | Neuron
Dendrite                  | Input
Axon                      | Output
Synapse                   | Weight

[Figure: a biological neural network (soma, dendrites, axon, synapses) beside an artificial neural network (input layer, middle layer, output layer)]

Course Topics: Learning Tasks

Supervised
- Data: labeled examples (input, desired output)
- Tasks: classification, pattern recognition, regression
- NN models: perceptron, adaline, feed-forward NN, radial basis function, support vector machines

Unsupervised
- Data: unlabeled examples (different realizations of the input)
- Tasks: clustering, content addressable memory
- NN models: self-organizing maps (SOM), Hopfield networks

Network architectures
Three different classes of network architectures:
- single-layer feed-forward
- multi-layer feed-forward
- recurrent
In the two feed-forward classes, neurons are organized in acyclic layers.
The architecture of a neural network is linked with the learning algorithm used to train it.

Single Layer Feed-forward
[Figure: an input layer of source nodes connected to an output layer of neurons]

Multi layer feed-forward
[Figure: a 3-4-2 network with an input layer, a hidden layer, and an output layer]

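As a rough illustration of what a 3-4-2 feed-forward network computes, the sketch below pushes one input vector through a hidden layer of 4 neurons and an output layer of 2 neurons. The random weights, the logistic activation, and the helper names `make_layer` and `layer_forward` are assumptions made for this example only; the slides do not specify them.

```python
import math
import random


def layer_forward(inputs, weights, biases):
    """One fully connected layer with a logistic activation (an assumed choice)."""
    outputs = []
    for w_row, b in zip(weights, biases):
        v = sum(w * x for w, x in zip(w_row, inputs)) + b  # induced field
        outputs.append(1.0 / (1.0 + math.exp(-v)))
    return outputs


def make_layer(n_in, n_out, rng):
    """Hypothetical helper: random weights and biases in [-1, 1]."""
    weights = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_out)]
    biases = [rng.uniform(-1, 1) for _ in range(n_out)]
    return weights, biases


rng = random.Random(0)
hidden_w, hidden_b = make_layer(3, 4, rng)   # 3 inputs -> 4 hidden neurons
output_w, output_b = make_layer(4, 2, rng)   # 4 hidden -> 2 output neurons

x = [0.5, -0.2, 0.1]                         # illustrative input
hidden = layer_forward(x, hidden_w, hidden_b)
y = layer_forward(hidden, output_w, output_b)
```

Signals flow strictly from one layer to the next, which is what makes the layers acyclic.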
Recurrent network
Recurrent network with hidden neurons: the unit delay operator $z^{-1}$ is used to model a dynamic system.
[Figure: recurrent network with input, hidden, and output neurons and three $z^{-1}$ delay elements]

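The Neuron
[Figure: neuron model. The input values x1, x2, ..., xm are multiplied by the weights w1, w2, ..., wm; the summing function adds the bias b to give the local field v; the activation function $\varphi(\cdot)$ maps v to the output y]
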
The Neuron
The neuron is the basic information processing unit of a NN. It consists of:
1. A set of links, describing the neuron inputs, with weights $w_1, w_2, \dots, w_m$
2. An adder function (linear combiner) for computing the weighted sum of the inputs (real numbers): $u = \sum_{j=1}^{m} w_j x_j$
3. An activation function (squashing function) for limiting the amplitude of the neuron output: $y = \varphi(u + b)$

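The three components listed above can be sketched in a few lines. The input values, weights, and bias below are hypothetical numbers chosen for illustration, and the logistic function is just one possible choice of activation.

```python
import math


def neuron_output(inputs, weights, bias, activation):
    """Return y = activation(u + b), where u is the weighted sum of the inputs."""
    u = sum(w * x for w, x in zip(weights, inputs))  # adder (linear combiner)
    return activation(u + bias)                      # squashing step


def logistic(v):
    return 1.0 / (1.0 + math.exp(-v))


# Hypothetical values, not taken from the slides.
x = [1.0, 0.5, -1.0]
w = [0.2, -0.4, 0.1]
b = 0.1
y = neuron_output(x, w, b, logistic)
```

Here u = 0.2 - 0.2 - 0.1 = -0.1, so u + b is (up to rounding) 0 and the logistic output is about 0.5.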
Bias of a Neuron
The bias b has the effect of applying an affine transformation to the weighted sum u:
$v = u + b$
v is called the induced field of the neuron.
[Figure: the parallel lines $x_1 - x_2 = -1$, $x_1 - x_2 = 0$, and $x_1 - x_2 = 1$ in the $(x_1, x_2)$ plane]

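Bias as extra input
The bias is an external parameter of the neuron. It can be modeled by adding an extra input:
$v = \sum_{j=0}^{m} w_j x_j$, with $w_0 = b$ and $x_0 = +1$
[Figure: neuron model in which the input signals x1, x2, ..., xm and the extra input x0 = +1 are multiplied by the synaptic weights w1, w2, ..., wm and w0, summed to give the local field v, and passed through the activation function $\varphi(\cdot)$ to produce the output y]
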
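A small check that the two formulations agree: adding b to the weighted sum gives the same induced field as prepending an input x0 = +1 with weight w0 = b. The function names and the numbers are illustrative assumptions.

```python
def induced_field_with_bias(x, w, b):
    """v = sum_j w_j x_j + b, with the bias kept as a separate parameter."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def induced_field_augmented(x, w, b):
    """Same v, computed with the bias folded in as weight w0 on input x0 = +1."""
    x_aug = [1.0] + list(x)   # x0 = +1
    w_aug = [b] + list(w)     # w0 = b
    return sum(wj * xj for wj, xj in zip(w_aug, x_aug))


# Hypothetical values for illustration.
x = [2.0, -1.0]
w = [0.5, 0.25]
b = -0.75
v_separate = induced_field_with_bias(x, w, b)
v_augmented = induced_field_augmented(x, w, b)
```

Treating the bias as an ordinary weight means a learning rule only has to update one weight vector.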
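Activation Function
There are different activation functions used in different applications. The most common ones are:
- Hard-limiter: $\varphi(v) = 1$ if $v \ge 0$; $\varphi(v) = 0$ if $v < 0$
- Piecewise linear: $\varphi(v) = 1$ if $v \ge \frac{1}{2}$; $\varphi(v) = v$ if $\frac{1}{2} > v > -\frac{1}{2}$; $\varphi(v) = 0$ if $v \le -\frac{1}{2}$
- Sigmoid: $\varphi(v) = \dfrac{1}{1 + \exp(-a v)}$
- Hyperbolic tangent: $\varphi(v) = \tanh(v)$
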
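Three of these activation functions are short enough to write out directly. This is a minimal sketch; the function names and the default slope `a = 1.0` are assumptions for the example.

```python
import math


def hard_limiter(v):
    """1 for v >= 0, 0 otherwise."""
    return 1.0 if v >= 0 else 0.0


def sigmoid(v, a=1.0):
    """Logistic function with slope parameter a."""
    return 1.0 / (1.0 + math.exp(-a * v))


def tanh_activation(v):
    """Hyperbolic tangent, with outputs in (-1, 1)."""
    return math.tanh(v)


samples = [-2.0, 0.0, 2.0]
table = [(v, hard_limiter(v), sigmoid(v), tanh_activation(v)) for v in samples]
```

The hard-limiter and the sigmoid map into [0, 1], while tanh maps into (-1, 1), which matters when targets are coded as -1/+1.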
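Neuron Models
The choice of $\varphi$ determines the neuron model. Examples:
- step function: $\varphi(v) = a$ if $v < c$; $\varphi(v) = b$ if $v > c$
- ramp function: $\varphi(v) = a$ if $v < c$; $\varphi(v) = b$ if $v > d$; $\varphi(v) = a + \dfrac{(v - c)(b - a)}{d - c}$ otherwise
- sigmoid function with z, x, y parameters: $\varphi(v) = z + \dfrac{1}{1 + \exp(-x v + y)}$
- Gaussian function: $\varphi(v) = \dfrac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\dfrac{1}{2}\left(\dfrac{v - \mu}{\sigma}\right)^2\right)$
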
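The parametrized step, ramp, and Gaussian models can be sketched as below. The parameter names follow the usual conventions (a and b for the two output levels, c and d for the thresholds, `mu` and `sigma` for the Gaussian mean and width); returning b at exactly v = c in the step function is an assumption, since that case is left open.

```python
import math


def step(v, a, b, c):
    """Output a below the threshold c and b above it (b at v == c, by assumption)."""
    return a if v < c else b


def ramp(v, a, b, c, d):
    """Output a below c, b above d, and a straight line from a to b in between."""
    if v < c:
        return a
    if v > d:
        return b
    return a + (v - c) * (b - a) / (d - c)


def gaussian(v, mu, sigma):
    """Gaussian bump centered at mu with width sigma."""
    return math.exp(-0.5 * ((v - mu) / sigma) ** 2) / (math.sqrt(2 * math.pi) * sigma)


mid_ramp = ramp(0.5, a=0.0, b=1.0, c=0.0, d=1.0)   # halfway up the ramp
peak = gaussian(0.0, mu=0.0, sigma=1.0)            # value at the center
```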
Learning Algorithms
Depend on the network architecture:
- Error correcting learning (perceptron)
- Delta rule (AdaLine, Backprop)
- Competitive Learning (Self Organizing Maps)

Applications
- Classification: image recognition, speech recognition, diagnostics, fraud detection
- Regression: forecasting (prediction on the basis of past history)
- Pattern association: retrieve an image from a corrupted one
- Clustering: client profiles, disease subtypes

Supervised Learning
- Training and test data sets
- Training set: input & target

Perceptron: architecture
- We consider the architecture: feed-forward NN with one layer
- It is sufficient to study single layer perceptrons with just one neuron

Single layer perceptrons
Generalization to single layer perceptrons with more neurons is easy because:
- The output units are independent of each other
- Each weight only affects one of the outputs

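Perceptron: Neuron Model
The (McCulloch-Pitts) perceptron is a single layer NN with a non-linear $\varphi$, the sign function:
$\varphi(v) = +1$ if $v \ge 0$; $\varphi(v) = -1$ if $v < 0$
[Figure: perceptron with inputs x1, x2, ..., xn, weights w1, w2, ..., wn, and bias b, producing the induced field v and the output $y = \varphi(v)$]
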
Perceptron for Classification
The perceptron is used for binary classification.
Given training examples of classes C1, C2, train the perceptron in such a way that it classifies correctly the training examples:
- If the output of the perceptron is +1 then the input is assigned to class C1
- If the output is -1 then the input is assigned to C2

Perceptron Training
How can we train a perceptron for a classification task?
- We try to find suitable values for the weights in such a way that the training examples are correctly classified.
- Geometrically, we try to find a hyper-plane that separates the examples of the two classes.

Perceptron Geometric View
The equation below describes a (hyper-)plane in the input space consisting of real valued 2D vectors. The plane splits the input space into two regions, each of them describing one class.
$\sum_{i=1}^{2} w_i x_i + w_0 = 0$
[Figure: the $(x_1, x_2)$ plane with classes C1 and C2 separated by the decision boundary $w_1 x_1 + w_2 x_2 + w_0 = 0$; the decision region for C1 is $w_1 x_1 + w_2 x_2 + w_0 \ge 0$]

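The decision rule reads directly as code: a point is assigned to C1 when it lies on the non-negative side of the line, and to C2 otherwise. The weights below are hypothetical and place the boundary at x1 + x2 = 1.

```python
def classify(x1, x2, w1, w2, w0):
    """Return 'C1' when w1*x1 + w2*x2 + w0 >= 0, else 'C2'."""
    return "C1" if w1 * x1 + w2 * x2 + w0 >= 0 else "C2"


# Hypothetical weights: the decision boundary is the line x1 + x2 = 1.
w1, w2, w0 = 1.0, 1.0, -1.0

above = classify(1.0, 1.0, w1, w2, w0)        # well inside the C1 region
below = classify(0.0, 0.0, w1, w2, w0)        # inside the C2 region
on_boundary = classify(0.5, 0.5, w1, w2, w0)  # exactly on the line, so C1
```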
Example: AND
Here is a representation of the AND function.
- White means false, black means true for the output
- -1 means false, +1 means true for the input

-1 AND -1 = false
-1 AND +1 = false
+1 AND -1 = false
+1 AND +1 = true

Example: AND continued
A linear decision surface (i.e. a plane in 3D space) intersecting the feature space (i.e. the 2D plane where z=0) separates false from true instances.

Example: AND continued
Watch a perceptron learn the AND function:

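In place of the animation, here is a minimal sketch of a perceptron learning AND with the -1/+1 coding used above. It assumes the standard error-correction update w <- w + eta (d - y) x, zero initial weights, and eta = 1; none of these settings come from the slides.

```python
def sign(v):
    """Sign function used by the perceptron: +1 for v >= 0, -1 otherwise."""
    return 1 if v >= 0 else -1


def train_perceptron(examples, eta=1.0, max_epochs=100):
    """Error-correction training; returns weights, bias, and mistakes in the last epoch."""
    w = [0.0, 0.0]
    b = 0.0
    mistakes = 0
    for _ in range(max_epochs):
        mistakes = 0
        for x, d in examples:
            y = sign(w[0] * x[0] + w[1] * x[1] + b)
            if y != d:
                mistakes += 1
                # Move the weights toward the desired output d.
                w = [wi + eta * (d - y) * xi for wi, xi in zip(w, x)]
                b += eta * (d - y)
        if mistakes == 0:      # every example classified correctly
            break
    return w, b, mistakes


# AND with -1 = false and +1 = true for inputs and outputs.
AND = [((-1, -1), -1), ((-1, 1), -1), ((1, -1), -1), ((1, 1), 1)]
w, b, mistakes = train_perceptron(AND)
```

Because AND is linearly separable, the loop stops after a pass with no mistakes.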
Example: XOR
Here's the XOR function:

-1 XOR -1 = false
-1 XOR +1 = true
+1 XOR -1 = true
+1 XOR +1 = false

Perceptrons cannot learn functions like this one, which are not linearly separable.

Example: XOR continued
Watch a perceptron try to learn XOR:

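The same experiment on XOR can be sketched by counting the mistakes made in each training pass. The update rule, zero initial weights, eta = 1, and the 50-epoch budget are assumptions for illustration. Since no line separates the XOR classes, every pass contains at least one mistake.

```python
def sign(v):
    return 1 if v >= 0 else -1


def mistakes_per_epoch(examples, eta=1.0, epochs=50):
    """Run error-correction training and record the mistakes made in each pass."""
    w = [0.0, 0.0]
    b = 0.0
    history = []
    for _ in range(epochs):
        mistakes = 0
        for (x1, x2), d in examples:
            y = sign(w[0] * x1 + w[1] * x2 + b)
            if y != d:
                mistakes += 1
                w[0] += eta * (d - y) * x1
                w[1] += eta * (d - y) * x2
                b += eta * (d - y)
        history.append(mistakes)
    return history


# XOR with -1 = false and +1 = true.
XOR = [((-1, -1), -1), ((-1, 1), 1), ((1, -1), 1), ((1, 1), -1)]
history = mistakes_per_epoch(XOR)
```

The weights keep cycling instead of settling, which is the failure to converge that motivates gradient-descent training of a linear unit.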
Perceptron: Limitations
- The perceptron can only model linearly separable classes, like (those described by) the following Boolean functions: AND, OR, COMPLEMENT
- It cannot model the XOR.
- You can experiment with these functions in the Matlab practical lessons.

Gradient Descent Learning Rule
- The perceptron learning rule fails to converge if the examples are not linearly separable
- Gradient descent: consider a linear unit without threshold and with continuous output o (not just -1, +1):
  $o(x) = w_0 + w_1 x_1 + \dots + w_n x_n$
- Update the $w_i$ such that they minimize the squared error
  $E[w_1, \dots, w_n] = \frac{1}{2} \sum_{(x,d) \in D} (d - o(x))^2$
  where D is the set of training examples

Adaline
- Replace the step function in the perceptron with a continuous (differentiable) function f, e.g. the simplest is the linear function
- With or without the threshold, the Adaline is trained based on the output of the function f rather than the final output.
[Figure: Adaline unit whose function f(x) feeds an optional +/- threshold]

Incremental Stochastic Gradient Descent
- Batch mode: gradient descent over the entire data D
  $w = w - \eta \nabla E_D[w]$, with $E_D[w] = \frac{1}{2} \sum_d (t_d - o_d)^2$
- Incremental mode: gradient descent over individual training examples d
  $w = w - \eta \nabla E_d[w]$, with $E_d[w] = \frac{1}{2} (t_d - o_d)^2$
- Incremental Gradient Descent can approximate Batch Gradient Descent arbitrarily closely if $\eta$ is small enough

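The two modes can be compared on a tiny regression problem. This is a sketch under assumptions: the three data points (generated from t = 1 + 2x), the learning rate 0.05, and the 2000 passes are made up for the example.

```python
def predict(w, x):
    """Linear unit: o(x) = w0 + w1*x1 + ... + wn*xn."""
    return w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))


def squared_error(w, data):
    """E_D[w] = 1/2 * sum over examples of (t - o)^2."""
    return 0.5 * sum((t - predict(w, x)) ** 2 for x, t in data)


def batch_step(w, data, eta):
    """One gradient step using the gradient summed over the whole data set."""
    grad = [0.0] * len(w)
    for x, t in data:
        err = t - predict(w, x)
        grad[0] -= err
        for i, xi in enumerate(x):
            grad[i + 1] -= err * xi
    return [wi - eta * gi for wi, gi in zip(w, grad)]


def incremental_epoch(w, data, eta):
    """One pass that updates the weights after every single example."""
    for x, t in data:
        err = t - predict(w, x)
        w = [w[0] + eta * err] + [wi + eta * err * xi for wi, xi in zip(w[1:], x)]
    return w


data = [((0.0,), 1.0), ((1.0,), 3.0), ((2.0,), 5.0)]  # illustrative: t = 1 + 2x
w_batch = [0.0, 0.0]
w_inc = [0.0, 0.0]
for _ in range(2000):
    w_batch = batch_step(w_batch, data, eta=0.05)
    w_inc = incremental_epoch(w_inc, data, eta=0.05)
```

With a small learning rate both variants approach the same weights, here w0 = 1 and w1 = 2.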
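Weights Update Rule: incremental mode
Computation of the gradient of E, for $E = \frac{1}{2} e^2$ with error $e = d - w^T x$:
$\nabla E(w) = \dfrac{\partial E}{\partial w} = e \dfrac{\partial e}{\partial w} = e\,[-x^T]$
Delta rule for weight update:
$w(n+1) = w(n) + \eta\, e(n)\, x(n)$
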
LMS learning algorithm
n = 1;
initialize w(n) randomly;
while (E_tot unsatisfactory and n
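
A minimal sketch of an LMS training loop in the spirit of the pseudocode above, using the delta rule w(n+1) = w(n) + eta e(n) x(n). The stopping tolerance `tol`, the cap `max_epochs`, the initialization range, and the sample data are assumptions for this example, not values from the slides.

```python
import random


def lms_train(samples, eta=0.1, tol=1e-6, max_epochs=10000, seed=0):
    """Train a linear unit with the delta rule until E_tot < tol or the cap is hit."""
    rng = random.Random(seed)
    n_inputs = len(samples[0][0])
    w = [rng.uniform(-0.5, 0.5) for _ in range(n_inputs)]  # random initial weights
    e_tot = float("inf")
    epoch = 0
    while e_tot >= tol and epoch < max_epochs:
        epoch += 1
        for x, d in samples:
            e = d - sum(wi * xi for wi, xi in zip(w, x))     # error e(n)
            w = [wi + eta * e * xi for wi, xi in zip(w, x)]  # delta rule
        # Total squared error over the training set after this pass.
        e_tot = 0.5 * sum(
            (d - sum(wi * xi for wi, xi in zip(w, x))) ** 2 for x, d in samples
        )
    return w, e_tot, epoch


# Each input starts with x0 = 1 so that w[0] plays the role of the bias.
samples = [((1.0, 0.0), 1.0), ((1.0, 1.0), 3.0), ((1.0, 2.0), 5.0)]
w, e_tot, epochs_used = lms_train(samples)
```

The targets here follow d = 1 + 2x exactly, so the loop ends with weights close to (1, 2) well before the epoch cap.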

Perceptron Learning Rule vs. Gradient Descent Rule
The perceptron learning rule is guaranteed to succeed if:
- Training examples are linearly separable
- The learning rate is sufficiently small
The linear unit training rule uses gradient descent:
- Guaranteed to converge to the hypothesis with minimum squared error
- Given a sufficiently small learning rate
- Even when the training data contains noise
- Even when the training data is not separable by H

Outline
- INTRODUCTION
- ADALINE
- MADALINE
- Least-Square Learning Rule
- The proof of Least-Square Learning Rule

Bernard Widrow and Ted Hoff introduced the Least-Mean-Square algorithm (a.k.a. delta rule or Widrow-Hoff rule) and used it to train the Adaline (ADAptive Linear Neuron) (Widrow and Hoff, 1960).
- The Adaline was similar to the perceptron, except that it used a linear activation function instead of the threshold
- The LMS algorithm is still heavily used in adaptive signal processing
MADALINE: Many ADALINEs; Network of ADALINEs

Perceptron vs. ADALINE
- Perceptron: LTU, empirical Hebbian assumption
- ADALINE: LGU, gradient descent
- LTU (Linear Threshold Unit): sign function; +/- (positive/negative)
- LGU (Linear Graded Unit): continuous and differentiable activation function, including the linear function
[Figure: a unit with inputs x0, x1, ..., xn and weights w0, w1, ..., wn that forms the weighted sum s and the output y = f(s); plot of f(s) between -1 and 1 for sgn(s), tanh(s), and linear(s)]

ADALINE
ADALINE (Adaptive Linear Neuron) is a network model proposed by Bernard Widrow in 1959.
[Figure: a single processing element (PE) with inputs X1, X2, X3]

Method
The value in each unit must be +1 or -1.
$net = \sum_i X_i W_i = X_0 W_0 + X_1 W_1 + X_2 W_2 + \dots + X_n W_n$
$Y = +1$ if $net \ge 0$; $Y = -1$ if $net < 0$
This is different from the perceptron's transfer function.

Method
$\Delta W_i = \eta\,(T - Y)\,X_i$, where T is the expected output
$W_i = W_i + \Delta W_i$
ADALINE can solve only linear problems (the limitation).

MADALINE
MADALINE is composed of many ADALINEs (Multilayer Adaline).
[Figure: inputs Xi, ..., Xn connected through weights Wij to ADALINE units with net inputs netj and outputs Yj]
After the second layer, the majority vote is used:
if more than half of the $net_j \ge 0$, then output +1; otherwise, output -1

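The majority vote can be sketched as follows. The three weight vectors are hypothetical, and each input vector starts with x0 = 1 so that the first weight acts as a bias.

```python
def adaline_net(x, w):
    """Net input of one ADALINE: the weighted sum of its inputs."""
    return sum(wi * xi for wi, xi in zip(w, x))


def madaline_output(x, adaline_weights):
    """Output +1 if more than half of the ADALINE net inputs are >= 0, else -1."""
    nets = [adaline_net(x, w) for w in adaline_weights]
    votes = sum(1 for net in nets if net >= 0)
    return 1 if votes > len(nets) / 2 else -1


# Hypothetical weights for three ADALINEs (first component is the bias weight).
adaline_weights = [
    (-0.5, 1.0, 0.0),
    (-0.5, 0.0, 1.0),
    (0.5, -1.0, -1.0),
]

y_on = madaline_output((1.0, 1.0, 1.0), adaline_weights)   # nets 0.5, 0.5, -1.5
y_off = madaline_output((1.0, 0.0, 0.0), adaline_weights)  # nets -0.5, -0.5, 0.5
```

With two of three units voting non-negative the first input gives +1; with only one such vote the second gives -1.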
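Least-Square Learning Rule (1/2)
Input patterns: $X_j = (X_0, X_1, \dots, X_n)^t$, i.e. $X_j = \begin{pmatrix} X_0 \\ X_1 \\ \vdots \\ X_n \end{pmatrix}$, for $1 \le j \le p$
Weights: $W = (W_0, W_1, \dots, W_n)^t$, i.e. $W = \begin{pmatrix} W_0 \\ W_1 \\ \vdots \\ W_n \end{pmatrix}$
Net input: $Net_j = W^t X_j = \sum_{i=0}^{n} W_i X_i = W_0 X_0 + W_1 X_1 + \dots + W_n X_n$
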
Least-Square Learning Rule (2/2)
By applying the least-square learning rule, the weights are given by
$R W^* = P$ or $W^* = R^{-1} P$
where R is the correlation matrix:
$R = \dfrac{R'_1 + R'_2 + \dots + R'_p}{p}$, with $R'_j = X_j X_j^t$
$P^t = \dfrac{1}{p} \sum_{j=1}^{p} T_j X_j^t$

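Exercise: Use Adaline (1/4)
Example: $X_1 = \begin{pmatrix} 1 \\ 1 \\ 0 \end{pmatrix}$, $X_2 = \begin{pmatrix} 1 \\ 0 \\ 1 \end{pmatrix}$, $X_3 = \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}$, with targets $T_1 = 1$, $T_2 = 1$, $T_3 = -1$

Pattern | X1 | X2 | X3 | Tj
X_1     | 1  | 1  | 0  | 1
X_2     | 1  | 0  | 1  | 1
X_3     | 1  | 1  | 1  | -1
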
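Sol. First calculate R:
$R'_1 = X_1 X_1^t = \begin{pmatrix} 1 \\ 1 \\ 0 \end{pmatrix} \begin{pmatrix} 1 & 1 & 0 \end{pmatrix} = \begin{pmatrix} 1 & 1 & 0 \\ 1 & 1 & 0 \\ 0 & 0 & 0 \end{pmatrix}$
$R'_2 = X_2 X_2^t = \begin{pmatrix} 1 \\ 0 \\ 1 \end{pmatrix} \begin{pmatrix} 1 & 0 & 1 \end{pmatrix} = \begin{pmatrix} 1 & 0 & 1 \\ 0 & 0 & 0 \\ 1 & 0 & 1 \end{pmatrix}$
$R'_3 = X_3 X_3^t = \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix} \begin{pmatrix} 1 & 1 & 1 \end{pmatrix} = \begin{pmatrix} 1 & 1 & 1 \\ 1 & 1 & 1 \\ 1 & 1 & 1 \end{pmatrix}$
$R = \dfrac{1}{3} (R'_1 + R'_2 + R'_3) = \dfrac{1}{3} \begin{pmatrix} 3 & 2 & 2 \\ 2 & 2 & 1 \\ 2 & 1 & 2 \end{pmatrix} = \begin{pmatrix} 1 & \frac{2}{3} & \frac{2}{3} \\ \frac{2}{3} & \frac{2}{3} & \frac{1}{3} \\ \frac{2}{3} & \frac{1}{3} & \frac{2}{3} \end{pmatrix}$
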
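Then calculate P:
$P_1^t = T_1 X_1^t = 1 \cdot (1, 1, 0) = (1, 1, 0)$
$P_2^t = T_2 X_2^t = 1 \cdot (1, 0, 1) = (1, 0, 1)$
$P_3^t = T_3 X_3^t = -1 \cdot (1, 1, 1) = (-1, -1, -1)$
$P^t = \dfrac{1}{3} (P_1^t + P_2^t + P_3^t) = \dfrac{1}{3} (1, 0, 0) = \left(\dfrac{1}{3}, 0, 0\right)$
Solve $R W^* = P$:
$\begin{pmatrix} 1 & \frac{2}{3} & \frac{2}{3} \\ \frac{2}{3} & \frac{2}{3} & \frac{1}{3} \\ \frac{2}{3} & \frac{1}{3} & \frac{2}{3} \end{pmatrix} \begin{pmatrix} W_1 \\ W_2 \\ W_3 \end{pmatrix} = \begin{pmatrix} \frac{1}{3} \\ 0 \\ 0 \end{pmatrix}$
$3 W_1 + 2 W_2 + 2 W_3 = 1$
$2 W_1 + 2 W_2 + W_3 = 0$
$2 W_1 + W_2 + 2 W_3 = 0$
$W_1 = 3$, $W_2 = -2$, $W_3 = -2$
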
Verify the net:
(1, 1, 0): net = 3X1 - 2X2 - 2X3 = 1, Y = 1, ok
(1, 0, 1): net = 3X1 - 2X2 - 2X3 = 1, Y = 1, ok
(1, 1, 1): net = 3X1 - 2X2 - 2X3 = -1, Y = -1, ok
[Figure: ADALINE with inputs X1, X2, X3, weights 3, -2, -2, and output Y]

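The exercise can be checked numerically. The sketch below builds R and P from the three training patterns and solves R W = P with exact fractions; the helper names `outer` and `solve` are assumptions for this example, and the hand-rolled Gauss-Jordan elimination is just one convenient way to solve a 3x3 system.

```python
from fractions import Fraction


def outer(x):
    """Outer product x x^t as a list of rows."""
    return [[xi * xj for xj in x] for xi in x]


def solve(a, b):
    """Solve a w = b by Gauss-Jordan elimination on exact fractions."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]   # augmented matrix
    for col in range(n):
        pivot = next(r for r in range(col, n) if m[r][col] != 0)
        m[col], m[pivot] = m[pivot], m[col]
        m[col] = [v / m[col][col] for v in m[col]]
        for r in range(n):
            if r != col:
                factor = m[r][col]
                m[r] = [vr - factor * vc for vr, vc in zip(m[r], m[col])]
    return [row[n] for row in m]


patterns = [(1, 1, 0), (1, 0, 1), (1, 1, 1)]
targets = [1, 1, -1]
p = len(patterns)

R = [[Fraction(0)] * 3 for _ in range(3)]
P = [Fraction(0)] * 3
for x, t in zip(patterns, targets):
    rj = outer(x)                                  # R'_j = X_j X_j^t
    for i in range(3):
        P[i] += Fraction(t * x[i], p)              # P = (1/p) sum T_j X_j
        for k in range(3):
            R[i][k] += Fraction(rj[i][k], p)       # R = (1/p) sum R'_j

W = solve(R, P)
nets = [sum(w * xi for w, xi in zip(W, x)) for x in patterns]
```

`W` comes out as (3, -2, -2) and the net inputs as 1, 1, -1, matching the targets.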
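Proof of Least Square Learning Rule (1/3)
Let us use Least Mean Square Error to ensure the minimum total error. As long as the total error approaches zero, the best solution is found. Therefore, we are looking for the minimum of $\langle \varepsilon_k^2 \rangle$, where $\varepsilon_k = T_k - Y_k$ and $\langle \cdot \rangle$ denotes the mean over the L training patterns.
Proof:
$\langle \varepsilon_k^2 \rangle = \dfrac{1}{L} \sum_{k=1}^{L} \varepsilon_k^2 = \dfrac{1}{L} \sum_{k=1}^{L} (T_k - Y_k)^2$
$= \dfrac{1}{L} \sum_{k=1}^{L} (T_k^2 - 2 T_k Y_k + Y_k^2)$
$= \dfrac{1}{L} \sum_{k=1}^{L} T_k^2 - \dfrac{2}{L} \sum_{k=1}^{L} T_k Y_k + \dfrac{1}{L} \sum_{k=1}^{L} Y_k^2$
$= \langle T_k^2 \rangle - \dfrac{2}{L} \sum_{k=1}^{L} T_k Y_k + W^t \left[ \dfrac{1}{L} \sum_{k=1}^{L} X_k X_k^t \right] W$
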
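Proof of Least Square Learning Rule (2/3)
P.S. $Y_k = \sum_{i=0}^{n} w_i x_{ik} = W^t X_k$, so
$\sum_{k=1}^{L} Y_k^2 = \sum_{k=1}^{L} (W^t X_k)(X_k^t W) = W^t \left( \sum_{k=1}^{L} X_k X_k^t \right) W$
Therefore
$\langle \varepsilon_k^2 \rangle = \langle T_k^2 \rangle - \dfrac{2}{L} \sum_{k=1}^{L} T_k X_k^t W + W^t \left[ \dfrac{1}{L} \sum_{k=1}^{L} X_k X_k^t \right] W$
$= \langle T_k^2 \rangle - 2 \langle T_k X_k^t \rangle W + W^t \langle X_k X_k^t \rangle W$
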
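Proof of Least Square Learning Rule (3/3)
Let $R'_k = X_k X_k^t$ for $k = 1, 2, \dots, L$; each $R'_k$ is an $(n+1) \times (n+1)$ matrix, also called a correlation matrix.
Let $R' = R'_1 + R'_2 + \dots + R'_L$ and $R = R'/L$, which is $\langle X_k X_k^t \rangle$.
Then
$\langle \varepsilon_k^2 \rangle = \langle T_k^2 \rangle - 2 \langle T_k X_k^t \rangle W + W^t R W$
We want to find W such that $\langle \varepsilon_k^2 \rangle$ is minimal:
$\dfrac{\partial \langle \varepsilon_k^2 \rangle}{\partial W} = 2 R W - 2 \langle T_k X_k \rangle$
Let $P = \langle T_k X_k \rangle$. If $2 R W^* - 2 P = 0$, then
$R W^* = P$, i.e. $W^* = R^{-1} P$
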
Comparison of Perceptron and Adaline

                   | Perceptron                                 | Adaline
Architecture       | Single-layer                               | Single-layer
Neuron model       | Non-linear                                 | Linear
Learning algorithm | Minimize number of misclassified examples  | Minimize total error
Application        | Linear classification                      | Linear classification and regression