UNIT-1: ARCHITECTURE

Transcript
  • 1 UNIT-1: ARCHITECTURE

  • 2 What are Neural Networks?

    Simple computational elements forming a large network
    Emphasis on learning (pattern recognition)
    Local computation (neurons)

    Definition of NNs is vague
    Often, but not always, inspired by the biological brain

  • 3 Machine Learning

    Machine learning involves adaptive mechanisms that enable computers to learn from experience, learn by example and learn by analogy. Learning capabilities can improve the performance of an intelligent system over time. The most popular approaches to machine learning are artificial neural networks and genetic algorithms. This lecture is dedicated to neural networks.

  • 4 Biological neural network

    [Figure: a biological neuron and its connections, showing the soma, dendrites, axon, and synapses]

  • 5 The neuron as a simple computing element

    [Figure: diagram of a neuron. Input signals x1, x2, ..., xn arrive through weighted connections w1, w2, ..., wn; neuron Y produces the output signal Y, copied onto each outgoing branch]

  • 6 Architecture of a typical artificial neural network

    [Figure: input signals enter the input layer, pass through the middle layer, and leave the output layer as output signals]

  • 7 A neural network can be defined as a model of reasoning based on the human brain. The brain consists of a densely interconnected set of nerve cells, or basic information-processing units, called neurons.

    The human brain incorporates nearly 10 billion neurons and 60 trillion connections, synapses, between them. By using multiple neurons simultaneously, the brain can perform its functions much faster than the fastest computers in existence today.

  • 8 Each neuron has a very simple structure, but an army of such elements constitutes a tremendous processing power.

    A neuron consists of a cell body, soma, a number of fibers called dendrites, and a single long fiber called the axon.

  • 9 Our brain can be considered as a highly complex, non-linear and parallel information-processing system.

    Information is stored and processed in a neural network simultaneously throughout the whole network, rather than at specific locations. In other words, in neural networks, both data and its processing are global rather than local.

    Learning is a fundamental and essential characteristic of biological neural networks. The ease with which they can learn led to attempts to emulate a biological neural network in a computer.

  • 10

    An artificial neural network consists of a number of very simple processors, also called neurons, which are analogous to the biological neurons in the brain.

    The neurons are connected by weighted links passing signals from one neuron to another.

  • 11

    Network Structure: The output signal is transmitted through the neuron's outgoing connection. The outgoing connection splits into a number of branches that transmit the same signal. The outgoing branches terminate at the incoming connections of other neurons in the network.

  • 12

    Analogy between biological and artificial neural networks:

    Biological Neural Network | Artificial Neural Network
    Soma                      | Neuron
    Dendrite                  | Input
    Axon                      | Output
    Synapse                   | Weight

    [Figures: the biological neuron and the layered artificial network shown on the earlier slides]

  • 13

    Course Topics: Learning Tasks

    Supervised
      Data: labeled examples (input, desired output)
      Tasks: classification, pattern recognition, regression
      NN models: perceptron, adaline, feed-forward NN, radial basis function, support vector machines

    Unsupervised
      Data: unlabeled examples (different realizations of the input)
      Tasks: clustering, content-addressable memory
      NN models: self-organizing maps (SOM), Hopfield networks

  • 14

    Network architectures

    Three different classes of network architectures:
      single-layer feed-forward
      multi-layer feed-forward
      recurrent
    In the feed-forward classes, neurons are organized in acyclic layers.

    The architecture of a neural network is linked with the learning algorithm used to train it.

  • 15

    Single Layer Feed-forward

    [Figure: an input layer of source nodes projecting onto an output layer of neurons]

  • 16

    Multi layer feed-forward

    [Figure: a 3-4-2 network with an input layer, a hidden layer of four neurons, and an output layer of two neurons]

  • 17

    Recurrent network with hidden neurons: the unit-delay operator z^-1 is used to model a dynamic system.

    [Figure: a recurrent network with input, hidden, and output units; z^-1 delay elements feed outputs back into the network]

  • 18

    The Neuron

    [Figure: input values x1, x2, ..., xm with weights w1, w2, ..., wm feed a summing function; together with the bias b this produces the local field v, which passes through the activation function phi(.) to give the output y]

  • 19

    The Neuron

    The neuron is the basic information processing unit of a NN. It consists of:

    1 A set of links, describing the neuron inputs, with weights $W_1, W_2, \ldots, W_m$

    2 An adder function (linear combiner) for computing the weighted sum of the inputs (real numbers):
      $u = \sum_{j=1}^{m} w_j x_j$

    3 Activation function (squashing function) $\varphi$ for limiting the amplitude of the neuron output:
      $y = \varphi(u + b)$

  • 20

    Bias of a Neuron

    The bias b has the effect of applying an affine transformation to the weighted sum u:
      $v = u + b$
    v is called the induced local field of the neuron.

    [Figure: in the (x1, x2) plane, the lines x1 - x2 = 0, x1 - x2 = 1, and x1 - x2 = -1 illustrate how the bias shifts the line u = x1 - x2]

  • 21

    Bias as extra input

    The bias is an external parameter of the neuron. It can be modeled by adding an extra input:
      $v = \sum_{j=0}^{m} w_j x_j$, with $x_0 = +1$ and $w_0 = b$

    [Figure: the same neuron diagram, with the extra input x0 = +1 carrying the synaptic weight w0 = b into the summing function]

  • 22

    Activation Function

    There are different activation functions used in different applications. The most common ones are:

    Hard limiter:
      $\varphi(v) = 1$ if $v \ge 0$; $0$ if $v < 0$

    Piecewise linear:
      $\varphi(v) = 1$ if $v \ge \tfrac{1}{2}$; $v$ if $-\tfrac{1}{2} < v < \tfrac{1}{2}$; $0$ if $v \le -\tfrac{1}{2}$

    Sigmoid:
      $\varphi(v) = \dfrac{1}{1 + \exp(-a v)}$

    Hyperbolic tangent:
      $\varphi(v) = \tanh(v)$

  • 23

    Neuron Models

    The choice of $\varphi$ determines the neuron model. Examples:

    step function:
      $\varphi(v) = a$ if $v < c$; $b$ if $v \ge c$

    ramp function:
      $\varphi(v) = a$ if $v \le c$; $b$ if $v \ge d$; $a + \dfrac{(v - c)(b - a)}{d - c}$ otherwise

    sigmoid function, with $z, x, y$ parameters:
      $\varphi(v) = z + \dfrac{1}{1 + \exp(-x v + y)}$

    Gaussian function:
      $\varphi(v) = \dfrac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\dfrac{1}{2}\left(\dfrac{v - \mu}{\sigma}\right)^2\right)$

  • 24

    Learning Algorithms

    Depend on the network architecture:
      Error correcting learning (perceptron)
      Delta rule (AdaLine, Backprop)
      Competitive learning (Self Organizing Maps)

  • 25

    Applications

    Classification:
      image recognition, speech recognition, diagnostics, fraud detection

    Regression:
      forecasting (prediction on the basis of past history)

    Pattern association:
      retrieve an image from a corrupted one

    Clustering:
      client profiles, disease subtypes

  • 26

    Supervised Learning

    Training and test data sets
    Training set: input & target

  • 27

    Perceptron: architecture

    We consider the architecture: feed-forward NN with one layer.
    It is sufficient to study single-layer perceptrons with just one neuron:

  • 28

    Single layer perceptrons

    Generalization to single-layer perceptrons with more neurons is easy because:
      The output units are independent of each other
      Each weight only affects one of the outputs

  • 29

    Perceptron: Neuron Model

    The (McCulloch-Pitts) perceptron is a single-layer NN with a non-linear activation, the sign function:
      $\varphi(v) = +1$ if $v \ge 0$; $-1$ if $v < 0$

    [Figure: inputs x1, x2, ..., xn with weights w1, w2, ..., wn and bias b produce the local field v, which passes through phi(v) to give y]

  • 30

    Perceptron for Classification

    The perceptron is used for binary classification.

    Given training examples of classes C1, C2, train the perceptron in such a way that it classifies correctly the training examples:
      If the output of the perceptron is +1, then the input is assigned to class C1
      If the output is -1, then the input is assigned to C2

  • 31

    Perceptron Training

    How can we train a perceptron for a classification task?

    We try to find suitable values for the weights in such a way that the training examples are correctly classified.

    Geometrically, we try to find a hyper-plane that separates the examples of the two classes.

  • 32

    Perceptron: Geometric View

    The equation below describes a (hyper-)plane in the input space consisting of real-valued 2D vectors. The plane splits the input space into two regions, each of them describing one class.

      $\sum_{i=1}^{2} w_i x_i + w_0 = 0$

    [Figure: the decision boundary w1 x1 + w2 x2 + w0 = 0 separating classes C1 and C2 in the (x1, x2) plane; the decision region for C1 is w1 x1 + w2 x2 + w0 >= 0]

  • 33

    Example: AND

    Here is a representation of the AND function (white means false, black means true for the output; -1 means false, +1 means true for the input):

      -1 AND -1 = false
      -1 AND +1 = false
      +1 AND -1 = false
      +1 AND +1 = true

  • 34

    Example: AND continued

    A linear decision surface (i.e. a plane in 3D space) intersecting the feature space (i.e. the 2D plane where z = 0) separates false from true instances.

  • 35

    Example: AND continued

    Watch a perceptron learn the AND function:
    [Animation in the original slides]

  • 36

    Example: XOR

    Here's the XOR function:

      -1 XOR -1 = false
      -1 XOR +1 = true
      +1 XOR -1 = true
      +1 XOR +1 = false

    Perceptrons cannot learn such linearly inseparable functions.

  • 37

    Example: XOR continued

    Watch a perceptron try to learn XOR:
    [Animation in the original slides]

  • 38

    Perceptron: Limitations

    The perceptron can only model linearly separable classes, like (those described by) the following Boolean functions: AND, OR, COMPLEMENT. It cannot model XOR.

    You can experiment with these functions in the Matlab practical lessons.

  • 39

    Gradient Descent Learning Rule

    The perceptron learning rule fails to converge if examples are not linearly separable.

    Gradient Descent: consider a linear unit without threshold and with continuous output o (not just -1, 1):
      $o(x) = w_0 + w_1 x_1 + \cdots + w_n x_n$

    Update the $w_i$ such that they minimize the squared error
      $E[w_1, \ldots, w_n] = \tfrac{1}{2} \sum_{(x,d) \in D} (d - o(x))^2$
    where D is the set of training examples.

  • 40

    Replace the step function in the perceptron with a continuous (differentiable) function f, e.g. the simplest is the linear function.

    With or without the threshold, the Adaline is trained based on the output of the function f rather than the final output.

    [Figure: the Adaline, with the linear output f(x) feeding a +/- quantizer]

  • 41

    Incremental Stochastic Gradient Descent

    Batch mode: gradient descent over the entire data D:
      $w = w - \eta \nabla E_D[w]$, with $E_D[w] = \tfrac{1}{2} \sum_d (t_d - o_d)^2$

    Incremental mode: gradient descent over individual training examples d:
      $w = w - \eta \nabla E_d[w]$, with $E_d[w] = \tfrac{1}{2} (t_d - o_d)^2$

    Incremental gradient descent can approximate batch gradient descent arbitrarily closely if $\eta$ is small enough.

  • 42

    Weights Update Rule: incremental mode

    Computation of the gradient of E, with error $e = t - o$ and $o = w^T x$:
      $\nabla E(w) = e\, \nabla e = -e\, x$

    Delta rule for weight update:
      $w(n+1) = w(n) + \eta\, e(n)\, x(n)$

  • 43

    LMS learning algorithm

    n = 1;
    initialize w(n) randomly;
    while (E_tot unsatisfactory and n < max_iterations)
      ...
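    The transcript cuts the pseudocode off after the while-condition. A minimal completion in Python, assuming training stops when the total squared error E_tot falls below a tolerance or a maximum number of epochs is reached:

```python
import numpy as np

def lms(X, d, eta=0.05, tol=1e-3, max_epochs=1000):
    """Widrow-Hoff LMS: delta-rule updates until E_tot is satisfactory."""
    rng = np.random.default_rng(0)
    w = rng.normal(scale=0.1, size=X.shape[1])   # initialize w randomly
    for _ in range(max_epochs):
        for x, t in zip(X, d):                   # incremental mode
            e = t - x @ w                        # e(n) = t(n) - o(n)
            w += eta * e * x                     # w(n+1) = w(n) + eta e(n) x(n)
        E_tot = 0.5 * np.sum((d - X @ w) ** 2)
        if E_tot < tol:                          # E_tot satisfactory: stop
            break
    return w
```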

  • 44

    Perceptron Learning Rule vs. Gradient Descent Rule

    The perceptron learning rule is guaranteed to succeed if:
      Training examples are linearly separable
      The learning rate is sufficiently small

    The linear unit training rule uses gradient descent:
      Guaranteed to converge to the hypothesis with minimum squared error
      Given a sufficiently small learning rate
      Even when training data contains noise
      Even when training data is not separable by H

  • 45

    Outline

      INTRODUCTION
      ADALINE
      MADALINE
      Least-Square Learning Rule
      The proof of the Least-Square Learning Rule

  • 46

    Bernard Widrow and Ted Hoff introduced the Least-Mean-Square algorithm (a.k.a. delta rule or Widrow-Hoff rule) and used it to train the Adaline (ADAptive Linear Neuron).
      The Adaline was similar to the perceptron, except that it used a linear activation function instead of the threshold.
      The LMS algorithm is still heavily used in adaptive signal processing.

    Widrow and Hoff, 1960

    MADALINE: Many ADALINEs; a network of ADALINEs.

  • 47

    Perceptron vs. ADALINE

    Perceptron: LTU; empirical Hebbian assumption; f(s) = sgn(s).
    ADALINE: LGU; gradient descent; e.g. f(s) = tanh(s) or f(s) = linear(s).

    LTU: Linear Threshold Unit; sign function; +/- (positive/negative)
    LGU: Linear Graded Unit; continuous and differentiable activation function, including the linear function

    MADALINE: Many ADALINEs; a network of ADALINEs.

    [Figure: a unit with inputs x0, x1, ..., xn and weights w0, w1, ..., wn feeding the activation f to give the output y]

  • 48

    ADALINE

    ADALINE (Adaptive Linear Neuron) is a network model proposed by Bernard Widrow in 1959. It has a single processing element.

    [Figure: inputs X1, X2, X3 feeding a single processing element (PE)]

  • 49

    Method

    The value in each unit must be +1 or -1.
      $net = \sum_i W_i X_i = W_0 X_0 + W_1 X_1 + W_2 X_2 + \cdots + W_n X_n$
      $Y = 1$ if $net \ge 0$; $-1$ if $net < 0$

    This is different from the perceptron's transfer function.

  • 50

    Method (continued)

    Weight update, where T is the expected output:
      $\Delta W_i = \eta\,(T - Y)\,X_i, \quad W_i \leftarrow W_i + \Delta W_i$

    ADALINE can solve only linear problems (this is its limitation).

  • 51

    MADALINE

    MADALINE is composed of many ADALINEs (Multilayer Adaline).

    [Figure: inputs X1, ..., Xn feed a layer of ADALINEs through weights Wij, producing local fields net_j and outputs Y_j]

    After the second layer, the majority vote is used: if more than half of the net_j >= 0, output +1; otherwise output -1.

  • 52

    Least-Square Learning Rule (1/2)

      $Net_j = W^t X_j = W_0 X_{j0} + W_1 X_{j1} + \cdots + W_n X_{jn}$

      $X_j = (X_{j0}, X_{j1}, \ldots, X_{jn})^t$, with $X_{j0} = 1$ (the bias input) and $1 \le j \le p$

      $W = (W_0, W_1, \ldots, W_n)^t$, where $W_0$ is the bias weight

  • 53

    Least-Square Learning Rule (2/2)

    By applying the least-square learning rule, the weights are
      $W^* = R^{-1} P$, where R is the correlation matrix:
      $R = \dfrac{1}{p}\sum_{j=1}^{p} X_j X_j^t = \dfrac{1}{p}(R'_1 + R'_2 + \cdots + R'_p)$, with $R'_j = X_j X_j^t$
      $P = \dfrac{1}{p}\sum_{j=1}^{p} T_j X_j$
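    The closed-form rule translates directly into code; a sketch that solves R W = P instead of inverting R explicitly (numerically preferable, same result; names are illustrative):

```python
import numpy as np

def least_square_weights(X, T):
    """W* = R^{-1} P, with R = (1/p) sum_j X_j X_j^t and P = (1/p) sum_j T_j X_j."""
    p = len(X)
    R = sum(np.outer(x, x) for x in X) / p      # correlation matrix R
    P = sum(t * x for x, t in zip(X, T)) / p    # cross-correlation vector P
    return np.linalg.solve(R, P)                # W* such that R W* = P
```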

  • 54

    Exercise: Use Adaline (1/4)

    Example: three training vectors (the first component, X1 = 1, is the bias input) with targets Tj:

          X1  X2  X3 | Tj
    X1:    1   1   0 |  1
    X2:    1   0   1 |  1
    X3:    1   1   1 | -1

  • Sol. First calculate R:

      $R'_1 = X_1 X_1^t = \begin{bmatrix} 1 & 1 & 0 \\ 1 & 1 & 0 \\ 0 & 0 & 0 \end{bmatrix}$, $\quad R'_2 = X_2 X_2^t = \begin{bmatrix} 1 & 0 & 1 \\ 0 & 0 & 0 \\ 1 & 0 & 1 \end{bmatrix}$, $\quad R'_3 = X_3 X_3^t = \begin{bmatrix} 1 & 1 & 1 \\ 1 & 1 & 1 \\ 1 & 1 & 1 \end{bmatrix}$

      $R = \dfrac{1}{3}(R'_1 + R'_2 + R'_3) = \dfrac{1}{3}\begin{bmatrix} 3 & 2 & 2 \\ 2 & 2 & 1 \\ 2 & 1 & 2 \end{bmatrix} = \begin{bmatrix} 1 & 2/3 & 2/3 \\ 2/3 & 2/3 & 1/3 \\ 2/3 & 1/3 & 2/3 \end{bmatrix}$

  • Next calculate P and solve for W*:

      $P_1 = T_1 X_1 = 1 \cdot (1, 1, 0)^t$
      $P_2 = T_2 X_2 = 1 \cdot (1, 0, 1)^t$
      $P_3 = T_3 X_3 = -1 \cdot (1, 1, 1)^t$
      $P = \dfrac{1}{3}(P_1 + P_2 + P_3) = \dfrac{1}{3}(1, 0, 0)^t$

    Solve $R W^* = P$:
      $3 W_1 + 2 W_2 + 2 W_3 = 1$
      $2 W_1 + 2 W_2 + W_3 = 0$
      $2 W_1 + W_2 + 2 W_3 = 0$

      $\Rightarrow W_1 = 3,\; W_2 = -2,\; W_3 = -2$

  • Verify the net, with $net = 3 X_1 - 2 X_2 - 2 X_3$:

      (1, 1, 0): net = 1, Y = 1, ok
      (1, 0, 1): net = 1, Y = 1, ok
      (1, 1, 1): net = -1, Y = -1, ok

    [Figure: the trained ADALINE with inputs X1, X2, X3, weights 3, -2, -2, and output Y]
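    Running the least-square sketch above on the exercise data reproduces both the weights and this verification table:

```python
X = np.array([[1, 1, 0], [1, 0, 1], [1, 1, 1]])
T = np.array([1, 1, -1])
W = least_square_weights(X, T)   # -> approximately [ 3., -2., -2.]
net = X @ W                      # -> [ 1.,  1., -1.]
Y = np.where(net >= 0, 1, -1)    # matches T on all three examples
```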

  • 58

    Proof of Least Square Learning Rule (1/3)

    Let us use the least mean square error to ensure the minimum total error. As long as the total error approaches zero, the best solution is found. Therefore, we are looking for the minimum of $\langle \varepsilon_k^2 \rangle$.

    Proof:
      $\langle \varepsilon_k^2 \rangle = \dfrac{1}{L} \sum_{k=1}^{L} \varepsilon_k^2 = \dfrac{1}{L} \sum_{k=1}^{L} (T_k - Y_k)^2$
      $= \dfrac{1}{L} \sum_{k=1}^{L} (T_k^2 - 2 T_k Y_k + Y_k^2)$
      $= \dfrac{1}{L} \sum_{k=1}^{L} T_k^2 - \dfrac{2}{L} \sum_{k=1}^{L} T_k Y_k + \dfrac{1}{L} \sum_{k=1}^{L} Y_k^2$
    where, letting $Y_k = X_k^t W$, the last term is $\dfrac{1}{L} \sum_{k=1}^{L} W^t X_k X_k^t W$.

  • 59

    Proof of Least Square Learning Rule (2/3)

    P.S. $Y_k = \sum_i w_i x_{ki} = X_k^t W$, so $Y_k^2 = (X_k^t W)^2 = W^t X_k X_k^t W$.

    Hence
      $\langle \varepsilon_k^2 \rangle = \dfrac{1}{L} \sum_{k=1}^{L} T_k^2 - 2\left[\dfrac{1}{L} \sum_{k=1}^{L} T_k X_k^t\right] W + W^t \left[\dfrac{1}{L} \sum_{k=1}^{L} X_k X_k^t\right] W$

  • 60

    Proof of Least Square Learning Rule (3/3)

    Let $R'_k = X_k X_k^t$ (an $n \times n$ matrix, also called the correlation matrix), and let
      $R = \dfrac{1}{L}(R'_1 + R'_2 + \cdots + R'_L), \quad P = \dfrac{1}{L} \sum_{k=1}^{L} T_k X_k$

    We want to find the W such that $\langle \varepsilon_k^2 \rangle$ is minimal:
      $\dfrac{\partial \langle \varepsilon_k^2 \rangle}{\partial W} = -2P + 2RW$

    If $-2P + 2RW^* = 0$, then $R W^* = P$, i.e. $W^* = R^{-1} P$.

  • 61

    Comparison of Perceptron and Adaline

                          Perceptron                  Adaline
    Architecture          Single-layer                Single-layer
    Neuron model          Non-linear                  Linear
    Learning algorithm    Minimize number of          Minimize total error
                          misclassified examples
    Application           Linear classification       Linear classification and regression

