Page 1: Neural Networks

Feed-forward Neural Nets & Self-Organising Maps

R. Akerkar, TMRF, Kolhapur, India


Page 2: Neural Networks

Feed–Forward Neural Networks

HISTORICAL BACKGROUND

1943 McCulloch and Pitts proposed the first computational model of a neuron
1949 Hebb proposed the first learning rule
1958 Rosenblatt's work on perceptrons
1969 Minsky and Papert's paper exposed limitations of the theory
1970s Decade of dormancy for neural networks
1980–90s Neural network return (self-organisation, back-propagation algorithms, etc.)


Page 3: Neural Networks

SOME FACTS

The human brain contains about 10¹¹ neurons. Each neuron is connected to about 10⁴ others.

Some scientists have compared the brain with a "complex, nonlinear, parallel computer". The largest modern neural networks achieve a complexity comparable to the nervous system of a fly.


Page 4: Neural Networks

Neuron

The main purpose of neurons is to receive, analyse and transmit further the information in the form of signals (electric pulses).

When a neuron sends the information, we say that the neuron "fires".


Page 5: Neural Networks

EXCITATION AND INHIBITION

The receptors of a neuron are called synapses, and they are located on many branches called dendrites. There are many types of synapses, but roughly they can be divided into two classes:

Excitatory – a signal received at this synapse "encourages" the neuron to fire.

Inhibitory – a signal received at this synapse will try to make the neuron "shut up".

The neuron analyses all the signals received at its synapses. If most of them are encouraging, then the neuron gets "excited" and fires its own message along a single wire called the axon. The axon may have branches to reach as many other neurons as possible.


Page 6: Neural Networks

A MODEL OF A SINGLE NEURON (UNIT)

In 1943 McCulloch and Pitts proposed the following idea:

Denote the incoming signals by x = (x1, x2, . . . , xn) (the input), and the output of a neuron by y (the output y = f(x)).


Page 7: Neural Networks

WEIGHTED INPUT

Synapses (receptors) of a neuron have weights w = (w1, w2, . . . , wn), which can have positive (excitatory) or negative (inhibitory) values. Each incoming signal is multiplied by the weight of the receiving synapse (wixi). Then all the "weighted" inputs are added together into a weighted sum v:

v = w1x1 + w2x2 + · · · + wnxn = Σᵢ wixi = (w, x)

Example: Let x = (0, 1, 1) and w = (1, −2, 4). Then v = 1 · 0 − 2 · 1 + 4 · 1 = 2.
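As a quick check, here is a minimal Python sketch of this weighted sum, reproducing the example values above:

```python
def weighted_sum(w, x):
    """v = w1*x1 + w2*x2 + ... + wn*xn = (w, x)"""
    return sum(wi * xi for wi, xi in zip(w, x))

# Example from the slide: x = (0, 1, 1), w = (1, -2, 4)
print(weighted_sum((1, -2, 4), (0, 1, 1)))  # 1*0 + (-2)*1 + 4*1 = 2
```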


Page 8: Neural Networks

ACTIVATION (TRANSFER) FUNCTION

The output of a neuron y is decided by the activation function ϕ (also called the transfer function), which uses the weighted sum v as its argument: y = ϕ(v)

The most popular is the step function (threshold function):

If the weighted sum v is large enough (e.g. v = 2 > 0), then the neuron fires (y = ϕ(2) = 1).
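A small sketch of this step activation in Python; the convention that the neuron fires only for v strictly greater than 0 matches the examples on these slides, but the behaviour exactly at the threshold is an assumption:

```python
def step(v, threshold=0.0):
    """Step (threshold) activation: fire (1) when v exceeds the threshold, else 0."""
    return 1 if v > threshold else 0

print(step(2))   # the neuron fires: y = phi(2) = 1
print(step(-1))  # the neuron stays silent: y = 0
```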


Page 9: Neural Networks

EXAMPLES OF ACTIVATION FUNCTIONS


Page 10: Neural Networks

FEED–FORWARD NEURAL NETWORKS

A collection of neurons connected together in a network can be represented by a directed graph:

Nodes and arrows represent neurons and links, with the direction of signal flow between them. Each node has its number, and a link between two nodes will have a pair of numbers (e.g. (1, 4) connecting nodes 1 and 4).

A neural network that does not contain cycles (feedback loops) is called a feed–forward network (or perceptron).


Page 11: Neural Networks

INPUT AND OUTPUT NODES

Input nodes receive the signal directly from the environment (nodes 1, 2 and 3). They do not compute anything, but simply transfer the input values.

Output nodes send the signal directly to the environment (nodes 4 and 5).


Page 12: Neural Networks

HIDDEN NODES AND LAYERS

A network may have hidden nodes — they are not connected directly to the environment (“hidden” inside the network):

We may organise nodes in layers: input (1, 2, 3), hidden (4, 5) and output (6, 7) layers. Some feed-forward networks can have several hidden layers.


Page 13: Neural Networks

WEIGHTS

Each jth node in a network has a set of weights wij. For example, node 4 has a set of weights w4 = (w14, w24, w34).

A network is defined if we know its topology (its graph), the set of all weights wij and the transfer functions ϕ of all nodes.
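As a rough illustration of this definition, one way to hold a network in Python is a topology plus weights plus transfer functions; the numeric values below are purely illustrative, not taken from the slides:

```python
# A network = topology (links), weights w_ij, and a transfer function per node.
network = {
    "links": [(1, 4), (2, 4), (3, 4)],                    # directed links (i, j)
    "weights": {(1, 4): 0.5, (2, 4): -1.0, (3, 4): 2.0},  # w_ij (illustrative values)
    "transfer": {4: lambda v: 1 if v > 0 else 0},         # step function at node 4
}
```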


Page 14: Neural Networks

Example

What will be the network output if the inputs are x1 = 1 and x2 = 0?


Page 15: Neural Networks

Answer: Calculate the weighted sums in the first hidden layer:

v3 = w13x1 + w23x2 = 2 · 1 − 3 · 0 = 2
v4 = w14x1 + w24x2 = 1 · 1 + 4 · 0 = 1

Apply the transfer function: y3 = ϕ(2) = 1, y4 = ϕ(1) = 1

Thus, the input to the output layer (node 5) is (1, 1). Now, calculate the weighted sum of node 5:

v5 = w35y3 + w45y4 = 2 · 1 − 1 · 1 = 1

The output is y5 = ϕ(1) = 1.
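A short Python sketch of this forward pass, using the weights that appear in the calculation above (w13 = 2, w23 = −3, w14 = 1, w24 = 4, w35 = 2, w45 = −1) and the step activation:

```python
def step(v):
    """Step activation, as in the earlier slides: 1 if v > 0, else 0."""
    return 1 if v > 0 else 0

def forward(x1, x2):
    # Hidden layer (nodes 3 and 4)
    y3 = step(2 * x1 + (-3) * x2)    # v3 = w13*x1 + w23*x2
    y4 = step(1 * x1 + 4 * x2)       # v4 = w14*x1 + w24*x2
    # Output layer (node 5)
    return step(2 * y3 + (-1) * y4)  # v5 = w35*y3 + w45*y4

print(forward(1, 0))  # 1, matching the answer y5 = 1
```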


Page 16: Neural Networks

TRAINING

Let us invert the previous problem:

Suppose that the inputs to the network are x1 = 1 and x2 = 0, and ϕ is a step function as in the previous example. Find values of weights wij such that the output of the network is y5 = 0.

This problem is much more difficult, because it has an infinite number of solutions. The process of finding a set of weights such that for a given input the network produces the desired output is called training.

Algorithms for training neural networks can be supervised (with a "teacher") or unsupervised (self-organising).


Page 17: Neural Networks

SUPERVISED LEARNING

A set of pairs of inputs with their corresponding desired outputs is called a training set. We may think of a training set as a set of examples. Supervised learning can be described by the following procedure (a code sketch is given after the list):

1. Initially set all the weights to some random values
2. Feed the network with an input from one of the examples in the training set
3. Compare the output of the network with the desired output
4. Correct the error by adjusting the weights of the nodes
5. Repeat from step 2 with another example from the training set
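A minimal sketch of this procedure for a single unit. The slides do not specify the weight-correction rule, so the classic perceptron update wi ← wi + α(d − y)xi, the learning rate α, and the toy OR data below are all assumptions:

```python
import random

def step(v):
    return 1 if v > 0 else 0

def train_unit(training_set, n_inputs, alpha=0.1, epochs=50):
    weights = [random.uniform(-0.5, 0.5) for _ in range(n_inputs)]   # step 1
    for _ in range(epochs):
        for x, desired in training_set:                              # step 2
            y = step(sum(w * xi for w, xi in zip(weights, x)))
            error = desired - y                                      # step 3
            weights = [w + alpha * error * xi                        # step 4 (perceptron rule)
                       for w, xi in zip(weights, x)]
    return weights                                                   # step 5: loop over examples

# Toy training set (assumed): the logical OR of two inputs
data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
print(train_unit(data, n_inputs=2))
```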


Page 18: Neural Networks

Lab 12 (a)

Consider the unit shown in the figure. Suppose that the weights corresponding to the three inputs have the following values:

w1 = 2, w2 = −4, w3 = 1

and the activation of the unit is given by the step function:

Calculate what will be the output value y of the unit for each of the following input patterns:


Page 19: Neural Networks

Solution 12 (a)

To find the output value y for each pattern we have to:
a) Calculate the weighted sum: v = Σᵢ wixi = w1x1 + w2x2 + w3x3
b) Apply the activation function to v.
The calculations for each input pattern are:
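The input patterns and the worked calculations for this lab appear only in the original figures, so the patterns in this sketch are illustrative; the weights (2, −4, 1) and the two steps a) and b) are from the slide:

```python
def step(v):
    return 1 if v > 0 else 0

w = (2, -4, 1)                                # w1, w2, w3 from the lab
patterns = [(1, 0, 0), (0, 1, 0), (1, 1, 1)]  # illustrative input patterns only

for x in patterns:
    v = sum(wi * xi for wi, xi in zip(w, x))  # a) weighted sum
    y = step(v)                               # b) activation
    print(x, "-> v =", v, ", y =", y)
```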


Page 20: Neural Networks

Lab 12 (b)


Page 21: Neural Networks

Solution 12 (b)


Continued…

Page 22: Neural Networks

Solution 12 (b)


Page 23: Neural Networks

Self–Organising Maps (SOM)

HISTORICAL BACKGROUND

1960s Vector quantisation problems studied by mathematicians (Glienn, 1964; Stratonowitch, 1966).
1973 von der Malsburg did the first computer simulation demonstrating self-organisation.
1976 Willshaw and von der Malsburg suggested the idea of SOM.
1980s Kohonen further developed and studied computational algorithms for SOM.

Page 24: Neural Networks

EUCLIDEAN SPACE

Points in Euclidean space have coordinates (e.g. x, y, z) represented by real numbers R. We denote n-dimensional space by Rⁿ.

Every point in Rⁿ is defined by n coordinates {x1, . . . , xn}

or by an n-dimensional vector

x = (x1, . . . , xn)


Page 25: Neural Networks

EXAMPLES

Example 1: In R¹ (one-dimensional space, or a line) points are represented by just one number, such as a = (2) or b = (−1).

Example 2: In R³ (three-dimensional space) points are represented by three coordinates x, y and z (or x1, x2 and x3), such as a = (2, −1, 3).


Page 26: Neural Networks

EUCLIDEAN DISTANCE

The distance between two points a = (a1, . . . , an) and b = (b1, . . . , bn) in Euclidean space Rⁿ is calculated as:

d(a, b) = √((a1 − b1)² + (a2 − b2)² + · · · + (an − bn)²)
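A direct Python translation of this distance, checked on the points from the examples above (the R³ distance shown is to the origin, which is not on the slide):

```python
from math import sqrt

def euclidean_distance(a, b):
    """d(a, b) = sqrt((a1-b1)^2 + ... + (an-bn)^2)"""
    return sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

print(euclidean_distance((2,), (-1,)))            # 3.0, points a and b from Example 1
print(euclidean_distance((2, -1, 3), (0, 0, 0)))  # sqrt(14) ~ 3.74 (distance to the origin)
```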


Page 27: Neural Networks

EXAMPLES


Page 28: Neural Networks

MULTIDIMENSIONAL DATA IN BUSINESS

A bank gathered information about its customers:

We may consider each entry as a coordinate xi and all the information about one customer as a point in Rⁿ (n-dimensional space).

How to analyse such data?


Page 29: Neural Networks

CLUSTERS

Multivariate analysis offers a variety of methods to analyse multidimensional data (e.g. neural networks). SOM is one such technique. One of the main goals is to find clusters of points.

Clusters are groups of points close to each other. "Similar" customers would have a small Euclidean distance between them and would belong to the same group (cluster).



Page 30: Neural Networks

SOM ARCHITECTURE

SOM uses a neural network without a hidden layer, with the neurons in the output layer competing with each other, so that only one neuron (the winner) can fire at a time.


Page 31: Neural Networks

SOM ARCHITECTURE (CONT.)

The input layer has n nodes. We can represent an input pattern by an n-dimensional vector x = (x1, . . . , xn) ∈ Rⁿ.

Each neuron j in the output layer is connected to all input nodes, so each neuron has n weights. We represent them by an n-dimensional vector wj = (w1j, . . . , wnj) ∈ Rⁿ.

Usually the neurons in the output layer are arranged in a line (a one-dimensional lattice) or in a plane (two-dimensional).

SOM uses an unsupervised learning algorithm, which organises the weights wj in the output lattice so that they "mimic" the characteristics of the input patterns.


Page 32: Neural Networks

HOW DOES AN SOM WORK

The algorithm consists of three processes: competition, cooperation and adaptation.

Competition: The input pattern x = (x1, . . . , xn) is compared with the weight vector wj = (w1j, . . . , wnj) of every neuron in the output layer. The winner is the neuron whose weight wj is closest to the input x in terms of Euclidean distance: the winning node i is the one minimising ‖x − wj‖ over all output nodes j.


Page 33: Neural Networks

Example

Consider an SOM with three inputs and two output nodes (A and B). Let wA = (2, −1, 3) and wB = (−2, 0, 1).

Find which node wins if the input is x = (1, −2, 2).

Solution:

What if x = (−1, −2, 0)?
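A sketch of the competition step for this example; running it shows that node A wins for x = (1, −2, 2) (distance √3 ≈ 1.73 against √14 ≈ 3.74) and node B wins for x = (−1, −2, 0):

```python
from math import sqrt

def euclidean_distance(a, b):
    return sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

weights = {"A": (2, -1, 3), "B": (-2, 0, 1)}   # weight vectors from the example

for x in [(1, -2, 2), (-1, -2, 0)]:
    dists = {node: euclidean_distance(x, w) for node, w in weights.items()}
    winner = min(dists, key=dists.get)
    print(x, "-> winner:", winner, dists)
```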

Page 34: Neural Networks

Cooperation: The winner helps its neighbours in the output lattice. Those nodes which are closer to the winner in the lattice get more help; those which are further away get less.

If the winner is node i, then the amount of help to node j is calculated using the neighbourhood function hij(dij), where dij is the distance between i and j in the lattice. A good example of hij(d) is the Gaussian function hij(d) = exp(−d² / (2σ²)), where σ controls the width of the neighbourhood.

Note that the winner also helps itself more than it helps others (since dii = 0).


Page 35: Neural Networks

Adaptation

After the input x has been presented to the SOM, the weights wj of the nodes are adjusted so that they become "closer" to the input. The exact formula for the adaptation of weights is:

w’j = wj + αhij [x − wj ] ,

where α is the learning rate coefficient.

One can see that the amount of change depends on the neighbourhood hij of the winner. So, the winner helps itself and its neighbours to adapt.

Finally, the neighbourhood hij is also a function of time, such that the neighbourhood shrinks with time (e.g. σ decreases with t).


Page 36: Neural Networks

Example

Let us adapt the winning node from the earlier example (wA = (2, −1, 3) for x = (1, −2, 2)) if α = 0.5 and h = 1:

w′A = wA + αh[x − wA] = (2, −1, 3) + 0.5 · (−1, −1, −1) = (1.5, −1.5, 2.5)


Page 37: Neural Networks

TRAINING PROCEDURE

1. Initially set all the weights to some random values
2. Feed a set of data into the network
3. Find the winner
4. Adjust the weights of the winner and its neighbours to be more like the input
5. Repeat from step 2 until the network stabilises
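A compact sketch of this whole loop on toy 2-D data with a one-dimensional output lattice. The Gaussian neighbourhood, the decay schedules for α and σ, the fixed number of epochs, and the random data are assumptions, since the slides leave those details open:

```python
import math
import random

def train_som(data, n_nodes, n_inputs, epochs=100):
    # 1. Initially set all the weights to some random values
    w = [[random.uniform(-0.1, 0.1) for _ in range(n_inputs)] for _ in range(n_nodes)]
    for t in range(epochs):
        alpha = 0.5 * (1.0 - t / epochs)                     # learning rate (assumed schedule)
        sigma = 1.0 + (n_nodes / 2.0) * (1.0 - t / epochs)   # neighbourhood width shrinks with time
        for x in data:                                       # 2. feed the data into the network
            # 3. find the winner (smallest Euclidean distance)
            dists = [sum((xi - wji) ** 2 for xi, wji in zip(x, wj)) for wj in w]
            i = dists.index(min(dists))
            # 4. adjust the winner and its neighbours: w'_j = w_j + alpha * h_ij * (x - w_j)
            for j in range(n_nodes):
                h = math.exp(-((i - j) ** 2) / (2.0 * sigma ** 2))
                w[j] = [wji + alpha * h * (xi - wji) for wji, xi in zip(w[j], x)]
    # 5. repeat until the network stabilises (here: a fixed number of epochs)
    return w

# Toy data (assumed): two clusters of 2-D points
data = ([(random.gauss(0, 0.1), random.gauss(0, 0.1)) for _ in range(50)]
        + [(random.gauss(3, 0.1), random.gauss(3, 0.1)) for _ in range(50)])
print(train_som(data, n_nodes=4, n_inputs=2))
```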


Page 38: Neural Networks

APPLICATIONS OF SOM IN BUSINESS

SOM can be very useful during the intelligence phase of decision making. It helps to analyse and understand rather complex and large amounts of information (data).

The ability to visualise multi-dimensional data can be used for presentations and reports.

Identifying clusters in the data (e.g. typical groups of customers) can help optimise the distribution of resources (e.g. advertising, product selection, etc.).

Can be used to identify credit–card fraud, errors in data, etc.


Page 39: Neural Networks

USEFUL PROPERTIES OF SOM

Reducing dimensions (indeed, SOM is a map f : Rⁿ → Zᵐ)
Visualisation of clusters
Ordered display
Handles missing data
The learning algorithm is unsupervised.


Page 40: Neural Networks

Similarities and differences between feed-forward neural networks and self-organising maps

Similarities are:
Both are feed-forward networks (no loops).
Nodes have weights corresponding to each link.
Both networks require training.


Page 41: Neural Networks

The main differences are:

Self-organising maps (SOM) use just a single output layer; they do not have hidden layers.

In feed-forward neural networks (FFNN) we have to calculate the weighted sums of the nodes. There are no such calculations in SOM; weights are only compared with the input patterns using Euclidean distance.

In FFNN the output values of nodes are important, and they are defined by the activation functions. In SOM nodes do not have any activation functions, and the output values are not important.

In FFNN all the output nodes can fire, while in SOM only one can.

The output of an FFNN can be a complex pattern consisting of the values of all the output nodes. In SOM we only need to know which of the output nodes is the winner.

Training of FFNN usually employs supervised learning algorithms, which require a training set. SOM uses an unsupervised learning algorithm.


There are, however, unsupervised training methods for FFNN as well.

Page 42: Neural Networks

Lab 13 (a)

Consider the self-organising map: The output layer of this map consists of six nodes, A, B, C, D, E and F,

which are organised into a two-dimensional lattice with neighbours connected by lines. Each of the output nodes has two inputs x1 and x2 (not shown on the diagram). Thus, each node has two weights corresponding to these inputs: w1 and w2. The values of the weights for all output nodes in the SOM are given in the table below:

Calculate which of the six output nodes is the winner if the input pattern is x = (2, −4)?

Page 43: Neural Networks

Solution 13 (a)

First, we calculate the distance for each node:

The winner is the node with the smallest distance from x. Thus, in this case the winner is node C (because 5 is the smallest distance here).


Page 44: Neural Networks

Lab 13 (b)


Page 45: Neural Networks

Solution 13 (b)


