UNIT-1

ARCHITECTURE

www.Vidyarthiplus.com

What are Neural Networks?

- Simple computational elements forming a large network
- Emphasis on learning (pattern recognition)
- Local computation (neurons)

The definition of NNs is vague: they are often, but not always, inspired by the biological brain.


Machine Learning

Machine learning involves adaptive mechanisms that enable computers to learn from experience, learn by example and learn by analogy. Learning capabilities can improve the performance of an intelligent system over time. The most popular approaches to machine learning are artificial neural networks and genetic algorithms. This lecture is dedicated to neural networks.


Biological neural network

[Figure: two connected biological neurons, with the soma, dendrites, axon, and synapses labeled.]


The neuron as a simple computing element

[Figure: diagram of a neuron Y with input signals x1, x2, ..., xn, weights w1, w2, ..., wn, and output signals Y.]


Architecture of a typical artificial neural network

[Figure: input signals enter the input layer, pass through the middle layer, and leave the output layer as output signals.]


- A neural network can be defined as a model of reasoning based on the human brain. The brain consists of a densely interconnected set of nerve cells, or basic information-processing units, called neurons.

- The human brain incorporates nearly 10 billion neurons (more recent estimates put the figure closer to 86 billion) and 60 trillion connections, synapses, between them. By using multiple neurons simultaneously, the brain can perform its functions much faster than the fastest computers in existence today.


- Each neuron has a very simple structure, but an army of such elements constitutes a tremendous processing power.

- A neuron consists of a cell body, soma, a number of fibers called dendrites, and a single long fiber called the axon.


- Our brain can be considered as a highly complex, non-linear and parallel information-processing system.

- Information is stored and processed in a neural network simultaneously throughout the whole network, rather than at specific locations. In other words, in neural networks, both data and its processing are global rather than local.

- Learning is a fundamental and essential characteristic of biological neural networks. The ease with which they can learn led to attempts to emulate a biological neural network in a computer.


- An artificial neural network consists of a number of very simple processors, also called neurons, which are analogous to the biological neurons in the brain.

- The neurons are connected by weighted links passing signals from one neuron to another.


Network Structure

The output signal is transmitted through the neuron's outgoing connection. The outgoing connection splits into a number of branches that transmit the same signal. The outgoing branches terminate at the incoming connections of other neurons in the network.


Analogy between biological and artificial neural networks

Biological Neural Network    Artificial Neural Network
Soma                         Neuron
Dendrite                     Input
Axon                         Output
Synapse                      Weight

[Figure: the biological neuron diagram (soma, dendrites, axon, synapses) shown next to the layered artificial network (input layer, middle layer, output layer).]


Course Topics: Learning Tasks

Supervised
- Data: labeled examples (input, desired output)
- Tasks: classification, pattern recognition, regression
- NN models: perceptron, adaline, feed-forward NN, radial basis function, support vector machines

Unsupervised
- Data: unlabeled examples (different realizations of the input)
- Tasks: clustering, content addressable memory
- NN models: self-organizing maps (SOM), Hopfield networks


Network architectures

Three different classes of network architectures:

- single-layer feed-forward
- multi-layer feed-forward
- recurrent

In the two feed-forward classes, neurons are organized in acyclic layers.

The architecture of a neural network is linked with the learning algorithm used to train it.


Single Layer Feed-forward

[Figure: an input layer of source nodes connected directly to an output layer of neurons.]


Multi layer feed-forward

[Figure: a 3-4-2 network with an input layer of 3 source nodes, a hidden layer of 4 neurons, and an output layer of 2 neurons.]
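The layered structure of a 3-4-2 network can be sketched as a forward pass in plain Python. This is only an illustration: the weight values, the zero biases, and the choice of a logistic activation are assumptions made for the example, not values from the slides.

```python
import math

def sigmoid(v):
    """Logistic activation, used here only as an illustrative choice."""
    return 1.0 / (1.0 + math.exp(-v))

def layer_forward(inputs, weights, biases):
    """Compute one layer: each neuron outputs sigmoid(w . x + b)."""
    return [sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]

# Hypothetical weights for a 3-4-2 network (3 source nodes, 4 hidden, 2 output).
hidden_w = [[0.1, -0.2, 0.3]] * 4          # 4 hidden neurons, 3 inputs each
hidden_b = [0.0] * 4
output_w = [[0.5, -0.5, 0.5, -0.5]] * 2    # 2 output neurons, 4 inputs each
output_b = [0.0] * 2

x = [1.0, 0.0, -1.0]                        # one input pattern
hidden = layer_forward(x, hidden_w, hidden_b)
y = layer_forward(hidden, output_w, output_b)
print(len(hidden), len(y))                  # sizes of the hidden and output layers
```

Signals only flow forward here, from the source nodes through the hidden layer to the output layer, which is what makes the network feed-forward.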


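Recurrent network

Recurrent network with hidden neurons: the unit delay operator $z^{-1}$ is used to model a dynamic system.

[Figure: a recurrent network with input, hidden, and output units, with feedback connections passing through three unit delay operators $z^{-1}$.]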


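The Neuron

[Figure: input values x1, x2, ..., xm are multiplied by weights w1, w2, ..., wm and combined with the bias b in the summing function; the resulting local field v passes through the activation function $\varphi(\cdot)$ to give the output y.]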


The Neuron

The neuron is the basic information processing unit of a NN. It consists of:

1. A set of links, describing the neuron inputs, with weights $W_1, W_2, \ldots, W_m$.

2. An adder function (linear combiner) for computing the weighted sum of the inputs (real numbers):

$$u = \sum_{j=1}^{m} w_j x_j$$

3. An activation function (squashing function) $\varphi$ for limiting the amplitude of the neuron output:

$$y = \varphi(u + b)$$
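The three components listed above can be put together in a few lines of Python. This is a minimal sketch: the function name `neuron_output`, the default choice of `tanh` as the squashing function, and the sample numbers are assumptions made for illustration.

```python
import math

def neuron_output(x, w, b, phi=math.tanh):
    """Return y = phi(u + b), where u = sum_j w_j * x_j."""
    u = sum(wj * xj for wj, xj in zip(w, x))   # adder (linear combiner)
    return phi(u + b)                           # activation (squashing) function

# Hypothetical inputs and weights: u = 0.5*1.0 + (-0.25)*2.0 = 0.0
y = neuron_output(x=[1.0, 2.0], w=[0.5, -0.25], b=0.0)
print(y)
```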


Bias of a Neuron

The bias b has the effect of applying an affine transformation to the weighted sum u:

$$v = u + b$$

v is called the induced field of the neuron.

[Figure: the lines x1 - x2 = -1, x1 - x2 = 0, and x1 - x2 = 1 plotted in the (x1, x2) plane.]


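Bias as extra input

The bias is an external parameter of the neuron. It can be modeled by adding an extra input:

$$v = \sum_{j=0}^{m} w_j x_j, \qquad w_0 = b, \quad x_0 = +1$$

[Figure: the input signals x1, x2, ..., xm with synaptic weights w1, w2, ..., wm, plus the extra input x0 = +1 with weight w0, feed the summing function; the local field v passes through the activation function $\varphi(\cdot)$ to give the output y.]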
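That the two formulations give the same induced field can be checked numerically. The sketch below uses hypothetical helper names and sample values; it only illustrates that prepending $x_0 = +1$ with $w_0 = b$ reproduces $v = u + b$.

```python
def induced_field(x, w, b):
    """v = sum_{j=1..m} w_j x_j + b, with the bias kept as a separate term."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b

def induced_field_augmented(x, w, b):
    """Same v, computed with x_0 = +1 prepended to x and w_0 = b prepended to w."""
    x_aug = [1.0] + list(x)
    w_aug = [b] + list(w)
    return sum(wj * xj for wj, xj in zip(w_aug, x_aug))

# Hypothetical values chosen so the arithmetic is exact in floating point.
x, w, b = [2.0, -1.0], [0.5, 0.25], 0.75
print(induced_field(x, w, b), induced_field_augmented(x, w, b))
```

Treating the bias as one more weight is convenient because a learning rule then needs only a single update formula for all parameters.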


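Activation Function

There are different activation functions used in different applications. The most common ones are:

Hard-limiter:

$$\varphi(v) = \begin{cases} 1 & \text{if } v \ge 0 \\ 0 & \text{if } v < 0 \end{cases}$$

Piecewise linear:

$$\varphi(v) = \begin{cases} 1 & \text{if } v \ge \tfrac{1}{2} \\ v & \text{if } \tfrac{1}{2} > v > -\tfrac{1}{2} \\ 0 & \text{if } v \le -\tfrac{1}{2} \end{cases}$$

Sigmoid:

$$\varphi(v) = \frac{1}{1 + \exp(-av)}$$

Hyperbolic tangent:

$$\varphi(v) = \tanh(v)$$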


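Neuron Models

The choice of $\varphi$ determines the neuron model. Examples:

Step function:

$$\varphi(v) = \begin{cases} a & \text{if } v < c \\ b & \text{if } v > c \end{cases}$$

Ramp function:

$$\varphi(v) = \begin{cases} a & \text{if } v < c \\ b & \text{if } v > d \\ a + \dfrac{(v - c)(b - a)}{d - c} & \text{otherwise} \end{cases}$$

Sigmoid function, with z, x, y parameters:

$$\varphi(v) = z + \frac{1}{1 + \exp(-xv + y)}$$

Gaussian function:

$$\varphi(v) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{1}{2}\left(\frac{v - \mu}{\sigma}\right)^2\right)$$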
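The four neuron models above translate directly into Python functions. This is a sketch under stated assumptions: the default parameter values are arbitrary, and the step function returns b at v = c, a case the definition above leaves open.

```python
import math

def step(v, a=0.0, b=1.0, c=0.0):
    """Step function: a below the threshold c, b above it (b at v == c, by assumption)."""
    return a if v < c else b

def ramp(v, a=0.0, b=1.0, c=-0.5, d=0.5):
    """Ramp function: a below c, b above d, linear interpolation in between."""
    if v < c:
        return a
    if v > d:
        return b
    return a + (v - c) * (b - a) / (d - c)

def sigmoid(v, x=1.0, y=0.0, z=0.0):
    """Sigmoid with parameters z, x, y: z + 1 / (1 + exp(-x*v + y))."""
    return z + 1.0 / (1.0 + math.exp(-x * v + y))

def gaussian(v, mu=0.0, sigma=1.0):
    """Gaussian function centred at mu with width sigma."""
    return math.exp(-0.5 * ((v - mu) / sigma) ** 2) / (math.sqrt(2 * math.pi) * sigma)

for phi in (step, ramp, sigmoid, gaussian):
    print(phi.__name__, phi(0.0))
```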


Learning Algorithms

Depend on the network architecture:

- Error correcting learning (perceptron)
- Delta rule (AdaLine, Backprop)
- Competitive Learning (Self Organizing Maps)


Applications

Classification:
- Image recognition
- Speech recognition
- Diagnostics
- Fraud detection

Regression:
- Forecasting (prediction on the basis of past history)

Pattern association:
- Retrieve an image from a corrupted one

Clustering:
- client profiles
- disease subtypes


Supervised Learning

- Training and test data sets
- Training set: input & target


Perceptron: architecture

- We consider the architecture: feed-forward NN with one layer.
- It is sufficient to study single layer perceptrons with just one neuron.


Single layer perceptrons

Generalization to single layer perceptrons with more neurons is easy because:

- the output units are independent of each other
- each weight only affects one of the outputs


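Perceptron: Neuron Model

The (McCulloch-Pitts) perceptron is a single layer NN with a non-linear $\varphi$, the sign function:

$$\varphi(v) = \begin{cases} +1 & \text{if } v \ge 0 \\ -1 & \text{if } v < 0 \end{cases}$$

[Figure: inputs x1, x2, ..., xn with weights w1, w2, ..., wn and bias b produce the local field v and the output $y = \varphi(v)$.]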


Perceptron for Classification

The perceptron is used for binary classification.

Given training examples of classes C1, C2, train the perceptron in such a way that it classifies the training examples correctly:

- If the output of the perceptron is +1, then the input is assigned to class C1.
- If the output is -1, then the input is assigned to C2.


Perceptron Training

How can we train a perceptron for a classification task?

- We try to find suitable values for the weights in such a way that the training examples are correctly classified.
- Geometrically, we try to find a hyper-plane that separates the examples of the two classes.


Perceptron Geometric View

The equation below describes a (hyper-)plane in the input space consisting of real valued 2D vectors. The plane splits the input space into two regions, each of them describing one class.

$$\sum_{i=1}^{2} w_i x_i + w_0 = 0$$

[Figure: the (x1, x2) plane with classes C1 and C2 separated by the decision boundary $w_1 x_1 + w_2 x_2 + w_0 = 0$; the decision region for C1 is $w_1 x_1 + w_2 x_2 + w_0 \ge 0$.]
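The decision regions can be read off directly from the sign of $w_1 x_1 + w_2 x_2 + w_0$. In the sketch below, the function name `classify` and the boundary $x_1 + x_2 - 1 = 0$ are hypothetical, chosen only to illustrate the geometry.

```python
def classify(x1, x2, w1, w2, w0):
    """Return 'C1' if w1*x1 + w2*x2 + w0 >= 0, otherwise 'C2'."""
    return "C1" if w1 * x1 + w2 * x2 + w0 >= 0 else "C2"

# Hypothetical boundary: x1 + x2 - 1 = 0
w1, w2, w0 = 1.0, 1.0, -1.0
for point in [(1.0, 1.0), (0.0, 0.0), (1.0, 0.0)]:
    print(point, classify(point[0], point[1], w1, w2, w0))
```

Points lying exactly on the boundary fall into C1 here, following the "greater than or equal to" convention of the decision region for C1.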


Example: AND

Here is a representation of the AND function:

- White means false, black means true for the output.
- -1 means false, +1 means true for the input.

-1 AND -1 = false
-1 AND +1 = false
+1 AND -1 = false
+1 AND +1 = true


Example: AND continued

A linear decision surface (i.e. a plane in 3D space) intersecting the feature space (i.e. the 2D plane where z=0) separates false from true instances.


Example: AND continued

[Animation: a perceptron learning the AND function.]
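In place of the animation, the learning process for AND can be sketched in a few lines of Python. The error-correcting update used here (add $\eta\, d\, x$ to the weights whenever an example is misclassified), the zero initial weights, the learning rate of 1, and the epoch cap are assumptions made for the illustration.

```python
def sign(v):
    """Sign activation: +1 if v >= 0, otherwise -1."""
    return 1 if v >= 0 else -1

def train_perceptron(examples, eta=1.0, max_epochs=100):
    """Error-correcting perceptron learning. Returns (weights, bias, converged)."""
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(max_epochs):
        errors = 0
        for x, d in examples:
            y = sign(w[0] * x[0] + w[1] * x[1] + b)
            if y != d:                                  # misclassified: correct the weights
                w = [wi + eta * d * xi for wi, xi in zip(w, x)]
                b += eta * d
                errors += 1
        if errors == 0:                                 # a full pass with no mistakes
            return w, b, True
    return w, b, False

# AND with -1 = false and +1 = true, as in the truth table above.
AND = [((-1, -1), -1), ((-1, 1), -1), ((1, -1), -1), ((1, 1), 1)]
w, b, converged = train_perceptron(AND)
print(w, b, converged)
```

Because AND is linearly separable, the loop stops after a pass in which every example is classified correctly.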


Example: XOR

Here's the XOR function:

-1 XOR -1 = false
-1 XOR +1 = true
+1 XOR -1 = true
+1 XOR +1 = false

Perceptrons cannot learn such linearly inseparable functions.


Example: XOR continued

[Animation: a perceptron trying to learn XOR.]
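The failure on XOR can be observed by counting the misclassifications in every training pass. The sketch below assumes the same error-correcting update as for a single perceptron with sign activation, zero initial weights, a learning rate of 1, and an arbitrary cap of 50 epochs.

```python
def sign(v):
    """Sign activation: +1 if v >= 0, otherwise -1."""
    return 1 if v >= 0 else -1

def perceptron_epochs(examples, max_epochs=50):
    """Run the error-correcting rule and return the error count of each epoch."""
    w, b = [0, 0], 0
    history = []
    for _ in range(max_epochs):
        errors = 0
        for x, d in examples:
            if sign(w[0] * x[0] + w[1] * x[1] + b) != d:
                w = [w[0] + d * x[0], w[1] + d * x[1]]   # correct the weights
                b += d
                errors += 1
        history.append(errors)
        if errors == 0:                                   # would mean XOR was separated
            break
    return history

# XOR with -1 = false and +1 = true.
XOR = [((-1, -1), -1), ((-1, 1), 1), ((1, -1), 1), ((1, 1), -1)]
history = perceptron_epochs(XOR)
print(min(history), len(history))
```

An error-free epoch would require a line that separates the two classes, and no such line exists for XOR, so the loop always runs to the epoch cap.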


Perceptron: Limitations

The perceptron can only model linearly separable classes, like (those described by) the following Boolean functions:

- AND
- OR
- COMPLEMENT

It cannot model XOR.

You can experiment with these functions in the Matlab practical lessons.


Gradient Descent Learning Rule

The perceptron learning rule fails to converge if the examples are not linearly separable.

Gradient descent: consider a linear unit without threshold and with continuous output o (not just -1, +1):

$$o(x) = w_0 + w_1 x_1 + \cdots + w_n x_n$$

Update the $w_i$ such that they minimize the squared error

$$E[w_1, \ldots, w_n] = \frac{1}{2} \sum_{(x,d) \in D} (d - o(x))^2$$

where D is the set of training examples.


Replace the step function in the perceptron with a continuous (differentiable) function f; the simplest choice is a linear function.

With or without the threshold, the Adaline is trained based on the output of the function f rather than the final output.

[Figure: an Adaline unit whose summed input passes through f(x) before the +/- threshold.]


Incremental Stochastic Gradient Descent

Batch mode: gradient descent over the entire data D

$$w = w - \eta \nabla E_D[w], \qquad E_D[w] = \frac{1}{2} \sum_{d} (t_d - o_d)^2$$

Incremental mode: gradient descent over individual training examples d

$$w = w - \eta \nabla E_d[w], \qquad E_d[w] = \frac{1}{2} (t_d - o_d)^2$$

Incremental gradient descent can approximate batch gradient descent arbitrarily closely if $\eta$ is small enough.
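The two modes can be compared on a small regression problem. This is a sketch under stated assumptions: the data set (three samples of t = 2x + 1), the learning rate of 0.05, and the number of passes are invented for the illustration.

```python
def output(w, x):
    """Linear unit o(x) = w0 + w1*x1 + ... (x excludes the constant input)."""
    return w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))

def batch_step(w, data, eta):
    """One batch update: w <- w - eta * grad E_D[w], summed over all of D."""
    grad = [0.0] * len(w)
    for x, t in data:
        err = t - output(w, x)
        for i, xi in enumerate([1.0] + list(x)):
            grad[i] -= err * xi            # dE_D/dw_i = -sum_d (t_d - o_d) * x_id
    return [wi - eta * gi for wi, gi in zip(w, grad)]

def incremental_epoch(w, data, eta):
    """One pass of incremental updates: w <- w - eta * grad E_d[w] per example."""
    for x, t in data:
        err = t - output(w, x)
        w = [wi + eta * err * xi for wi, xi in zip(w, [1.0] + list(x))]
    return w

# Hypothetical training set: three samples of t = 2x + 1.
data = [((0.0,), 1.0), ((1.0,), 3.0), ((2.0,), 5.0)]
w_batch = w_inc = [0.0, 0.0]
for _ in range(2000):
    w_batch = batch_step(w_batch, data, eta=0.05)
    w_inc = incremental_epoch(w_inc, data, eta=0.05)
print(w_batch, w_inc)
```

With this small learning rate both variants settle on the same weights, close to w0 = 1 and w1 = 2.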


Weights Update Rule: incremental mode

Computation of Gradient(E):

$$\nabla E(w) = \frac{\partial E}{\partial w} = e \frac{\partial e}{\partial w} = e[-x^T]$$

Delta rule for weight update:

$$w(n+1) = w(n) + \eta\, e(n)\, x(n)$$


LMS learning algorithm

n = 1;
initialize w(n) randomly;
while (E_tot unsatisfactory and n ...
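The pseudocode above is cut off in the source, so the following Python version is only one plausible reading of it. The iteration cap `max_iterations`, the tolerance on the total error, the cyclic choice of examples, the seed, and the sample data are all assumptions; the weight update itself is the delta rule $w(n+1) = w(n) + \eta\, e(n)\, x(n)$.

```python
import random

def lms(examples, eta=0.05, max_iterations=5000, tolerance=1e-8, seed=0):
    """LMS (delta rule) sketch: w(n+1) = w(n) + eta * e(n) * x(n)."""
    rng = random.Random(seed)
    dim = len(examples[0][0])
    w = [rng.uniform(-0.5, 0.5) for _ in range(dim)]    # initialize w randomly
    n = 1
    while n < max_iterations:
        # Total squared error over the training set, used as the stopping test.
        e_tot = sum((d - sum(wi * xi for wi, xi in zip(w, x))) ** 2
                    for x, d in examples)
        if e_tot < tolerance:
            break
        x, d = examples[n % len(examples)]               # select an example
        e = d - sum(wi * xi for wi, xi in zip(w, x))     # e(n) = d(n) - w(n)^T x(n)
        w = [wi + eta * e * xi for wi, xi in zip(w, x)]  # delta rule update
        n += 1
    return w

# Hypothetical data: inputs carry x0 = 1 for the bias; targets follow d = 1 + 2*x1.
examples = [((1.0, 0.0), 1.0), ((1.0, 1.0), 3.0), ((1.0, 2.0), 5.0)]
w = lms(examples)
print(w)
```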

Perceptron Learning Rule vs. Gradient Descent Rule

The perceptron learning rule is guaranteed to succeed if:

- the training examples are linearly separable
- the learning rate is sufficiently small

Linear unit training rules use gradient descent, which is guaranteed to converge to the hypothesis with minimum squared error:

- given a sufficiently small learning rate
- even when the training data contains noise
- even when the training data is not separable by H


Outline

- INTRODUCTION
- ADALINE
- MADALINE
- Least-Square Learning Rule
- The proof of Least-Square Learning Rule


Bernard Widrow and Ted Hoff introduced the Least-Mean-Square (LMS) algorithm (a.k.a. the delta rule or Widrow-Hoff rule) and used it to train the Adaline (ADAptive Linear Neuron) (Widrow and Hoff, 1960).

- The Adaline was similar to the perceptron, except that it used a linear activation function instead of the threshold.
- The LMS algorithm is still heavily used in adaptive signal processing.

MADALINE: Many ADALINEs; a network of ADALINEs


Perceptron vs. ADALINE

[Figure: a unit with inputs x0, x1, ..., xn, weights w0, w1, ..., wn, and output y = f(s), acting as a Linear Threshold Unit (LTU) or Linear Graded Unit (LGU); the plot of f(s) against s shows sgn(s), tanh(s), and linear(s), with levels -1 and 1 marked.]

- Perceptron: LTU; empirical Hebbian assumption
- ADALINE: LGU; gradient descent

LTU: sign function; +/- (positive/negative)
LGU: continuous and differentiable activation function, including the linear function


ADALINE

ADALINE (Adaptive Linear Neuron) is a network model proposed by Bernard Widrow in 1959.

[Figure: a single processing element (PE) with inputs X1, X2, X3.]


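Method

The value in each unit must be +1 or -1.

$$net = \sum_i X_i W_i$$

$$X_0 = 1: \quad net = W_0 + W_1 X_1 + W_2 X_2 + \cdots + W_n X_n$$

$$Y = \begin{cases} 1 & \text{if } net \ge 0 \\ -1 & \text{if } net < 0 \end{cases}$$

This is different from the perceptron's transfer function.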


Method

$$\Delta W_i = \eta\,(T - Y)\,X_i, \qquad T: \text{expected output}$$

$$W_i = W_i + \Delta W_i$$

ADALINE can solve only linear problems (the limitation).

51

MADALINE MADALINE It is composed of many ADALINEMultilayer Adaline.

YjNo Wij

netjWij

nX

Xi

After the second layer, the majority vote is used.

if more than half of netj 0,thenoutput1,otherwise, output1
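The majority vote can be written as a short function. The name `madaline_output` and the sample net values are hypothetical; the rule itself follows the statement above, so an exact tie does not count as "more than half".

```python
def madaline_output(nets):
    """Majority vote: +1 if more than half of the net_j are >= 0, otherwise -1."""
    non_negative = sum(1 for net in nets if net >= 0)
    return 1 if non_negative > len(nets) / 2 else -1

print(madaline_output([0.3, -1.0, 2.0]))   # two of three units are non-negative
print(madaline_output([-1.0, -2.0, 5.0]))  # only one of three is non-negative
```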


Least-Square Learning Rule (1/2)

$$X_j = (X_0, X_1, \ldots, X_n)^t, \qquad 1 \le j \le p$$

$$W = (W_0, W_1, \ldots, W_n)^t$$

$$Net_j = W^t X_j = \sum_{i=0}^{n} W_i X_i = W_0 X_0 + W_1 X_1 + \cdots + W_n X_n$$


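Least-Square Learning Rule (2/2)

By applying the least-square learning rule, the weights are given by

$$R W^* = P \quad \text{or} \quad W^* = R^{-1} P$$

where R is the correlation matrix:

$$R = \frac{R'_1 + R'_2 + \cdots + R'_p}{p} = \frac{1}{p} \sum_{j=1}^{p} X_j X_j^t$$

$$P^t = \frac{1}{p} \sum_{j=1}^{p} T_j X_j^t$$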


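Exercise: Use Adaline (1/4)

Example:

$$X_1 = \begin{pmatrix} 1 \\ 1 \\ 0 \end{pmatrix}, \quad X_2 = \begin{pmatrix} 1 \\ 0 \\ 1 \end{pmatrix}, \quad X_3 = \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix}, \qquad T_1 = 1, \quad T_2 = 1, \quad T_3 = -1$$

Each row of the table is one training example; the first three columns are its components and the last column is its target Tj.

        X1   X2   X3   Tj
X1      1    1    0    1
X2      1    0    1    1
X3      1    1    1    -1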


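Sol. First calculate R:

$$R'_1 = \begin{pmatrix} 1 \\ 1 \\ 0 \end{pmatrix} \begin{pmatrix} 1 & 1 & 0 \end{pmatrix} = \begin{pmatrix} 1 & 1 & 0 \\ 1 & 1 & 0 \\ 0 & 0 & 0 \end{pmatrix}$$

$$R'_2 = \begin{pmatrix} 1 \\ 0 \\ 1 \end{pmatrix} \begin{pmatrix} 1 & 0 & 1 \end{pmatrix} = \begin{pmatrix} 1 & 0 & 1 \\ 0 & 0 & 0 \\ 1 & 0 & 1 \end{pmatrix}$$

$$R'_3 = \begin{pmatrix} 1 \\ 1 \\ 1 \end{pmatrix} \begin{pmatrix} 1 & 1 & 1 \end{pmatrix} = \begin{pmatrix} 1 & 1 & 1 \\ 1 & 1 & 1 \\ 1 & 1 & 1 \end{pmatrix}$$

$$R = \frac{1}{3} \begin{pmatrix} 3 & 2 & 2 \\ 2 & 2 & 1 \\ 2 & 1 & 2 \end{pmatrix} = \begin{pmatrix} 1 & \tfrac{2}{3} & \tfrac{2}{3} \\ \tfrac{2}{3} & \tfrac{2}{3} & \tfrac{1}{3} \\ \tfrac{2}{3} & \tfrac{1}{3} & \tfrac{2}{3} \end{pmatrix}$$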

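Next calculate P:

$$P_1^t = 1 \cdot (1, 1, 0) = (1, 1, 0), \quad P_2^t = 1 \cdot (1, 0, 1) = (1, 0, 1), \quad P_3^t = -1 \cdot (1, 1, 1) = (-1, -1, -1)$$

$$P^t = \frac{1}{3} \left( P_1^t + P_2^t + P_3^t \right) = \frac{1}{3} (1, 0, 0) = \left( \tfrac{1}{3}, 0, 0 \right)$$

Then solve $R W^* = P$:

$$\begin{pmatrix} 1 & \tfrac{2}{3} & \tfrac{2}{3} \\ \tfrac{2}{3} & \tfrac{2}{3} & \tfrac{1}{3} \\ \tfrac{2}{3} & \tfrac{1}{3} & \tfrac{2}{3} \end{pmatrix} \begin{pmatrix} W_1 \\ W_2 \\ W_3 \end{pmatrix} = \begin{pmatrix} \tfrac{1}{3} \\ 0 \\ 0 \end{pmatrix}$$

$$\begin{cases} 3W_1 + 2W_2 + 2W_3 = 1 \\ 2W_1 + 2W_2 + W_3 = 0 \\ 2W_1 + W_2 + 2W_3 = 0 \end{cases} \quad \Rightarrow \quad W_1 = 3, \quad W_2 = -2, \quad W_3 = -2$$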

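Verify the net:

(1,1,0): net = 3X1 - 2X2 - 2X3 = 1, Y = 1, ok
(1,0,1): net = 3X1 - 2X2 - 2X3 = 1, Y = 1, ok
(1,1,1): net = 3X1 - 2X2 - 2X3 = -1, Y = -1, ok

[Figure: the resulting ADALINE with inputs X1, X2, X3, weights 3, -2, -2, and output Y.]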
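The exercise can be re-checked numerically. The sketch below builds R and P from the three training examples and solves $R W^* = P$ with exact fractions; the helper `solve` (a small Gauss-Jordan elimination) is written for this illustration and is not part of the original exercise.

```python
from fractions import Fraction

def solve(a, b):
    """Solve a*w = b by Gauss-Jordan elimination on exact fractions."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]        # augmented matrix
    for col in range(n):
        pivot = next(r for r in range(col, n) if m[r][col] != 0)
        m[col], m[pivot] = m[pivot], m[col]               # bring a non-zero pivot up
        m[col] = [v / m[col][col] for v in m[col]]        # normalize the pivot row
        for r in range(n):
            if r != col:
                factor = m[r][col]
                m[r] = [vr - factor * vc for vr, vc in zip(m[r], m[col])]
    return [row[n] for row in m]

X = [(1, 1, 0), (1, 0, 1), (1, 1, 1)]    # training inputs from the exercise
T = [1, 1, -1]                           # targets
p = len(X)

# R = (1/p) * sum_j X_j X_j^t and P = (1/p) * sum_j T_j X_j
R = [[Fraction(sum(x[i] * x[k] for x in X), p) for k in range(3)] for i in range(3)]
P = [Fraction(sum(t * x[i] for x, t in zip(X, T)), p) for i in range(3)]

W = solve(R, P)
nets = [sum(w * xi for w, xi in zip(W, x)) for x in X]
print(W, nets)
```

The computed weights and the three net values can be compared with the hand calculation and the verification above.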


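Proof of Least Square Learning Rule (1/3)

Let us use the Least Mean Square Error to ensure the minimum total error. As long as the total error approaches zero, the best solution is found. Therefore, we are looking for the minimum of $\langle \varepsilon_k^2 \rangle$, where $\varepsilon_k = T_k - Y_k$.

Proof:

$$\langle \varepsilon_k^2 \rangle = \text{mean of } \varepsilon_k^2 = \frac{1}{L} \sum_{k=1}^{L} \varepsilon_k^2 = \frac{1}{L} \sum_{k=1}^{L} (T_k - Y_k)^2 = \frac{1}{L} \sum_{k=1}^{L} \left( T_k^2 - 2 T_k Y_k + Y_k^2 \right)$$

$$= \frac{1}{L} \sum_{k=1}^{L} T_k^2 - \frac{2}{L} \sum_{k=1}^{L} T_k Y_k + \frac{1}{L} \sum_{k=1}^{L} Y_k^2$$

Let $\langle T_k^2 \rangle$ be the mean of $T_k^2$, i.e. $\frac{1}{L} \sum_{k=1}^{L} T_k^2$. Then

$$\langle \varepsilon_k^2 \rangle = \langle T_k^2 \rangle - \frac{2}{L} \sum_{k=1}^{L} T_k Y_k + W^t \left[ \frac{1}{L} \sum_{k=1}^{L} X_k X_k^t \right] W$$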


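Proof of Least Square Learning Rule (2/3)

Note that

$$Y_k = \sum_{i}^{n} w_i x_{ik} = W^t X_k = X_k^t W$$

so that

$$\sum_{k=1}^{L} Y_k^2 = \sum_{k=1}^{L} (W^t X_k)(X_k^t W) = W^t \left( \sum_{k=1}^{L} X_k X_k^t \right) W$$

Therefore

$$\langle \varepsilon_k^2 \rangle = \langle T_k^2 \rangle - \frac{2}{L} \sum_{k=1}^{L} T_k X_k^t W + W^t \left[ \frac{1}{L} \sum_{k=1}^{L} X_k X_k^t \right] W$$

$$= \langle T_k^2 \rangle - 2 \left[ \frac{1}{L} \sum_{k=1}^{L} T_k X_k^t \right] W + W^t \langle X_k X_k^t \rangle W$$

$$= \langle T_k^2 \rangle - 2 \langle T_k X_k^t \rangle W + W^t \langle X_k X_k^t \rangle W$$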

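Proof of Least Square Learning Rule (3/3)

Let $R'_k = X_k X_k^t$, i.e., $R'_k$ is an $n \times n$ matrix, also called the correlation matrix.

Let $R' = R'_1 + R'_2 + \cdots + R'_k + \cdots + R'_L$.

Let $R = R'/L$, which is $\langle X_k X_k^t \rangle$.

Let $P^t = \langle T_k X_k^t \rangle$. Then

$$\langle \varepsilon_k^2 \rangle = \langle T_k^2 \rangle - 2 P^t W + W^t R W$$

We want to find W such that $\langle \varepsilon_k^2 \rangle$ is minimal:

$$\frac{\partial \langle \varepsilon_k^2 \rangle}{\partial W} = \frac{\partial}{\partial W} \left[ \langle T_k^2 \rangle - 2 P^t W + W^t R W \right] = -2P + 2RW$$

If $\dfrac{\partial \langle \varepsilon_k^2 \rangle}{\partial W} = 0$, then

$$2 R W^* - 2P = 0 \quad \Rightarrow \quad R W^* = P \quad \Rightarrow \quad W^* = R^{-1} P$$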

Comparison of Perceptron and Adaline

                      Perceptron                                   Adaline
Architecture          Single-layer                                 Single-layer
Neuron model          Non-linear                                   Linear
Learning algorithm    Minimize number of misclassified examples    Minimize total error
Application           Linear classification                        Linear classification and regression


Date post:	10-Nov-2015
Upload:	gowthamucek