Artificial Neural Networks

Introduction and Perceptron Learning

CPSC 565 - Winter 2003

Christian Jacob ©

Department of Computer Science

University of Calgary

Canada

The Brain Paradigm

Example: Visual Cortex of a Cat

‡ Image 1

CPSC 565 - Winter 2003 - Emergent Computing

‡ Image 2

ANN Image Processing Example


Brains vs. Digital Computers

- Computers require hundreds of cycles to simulate a firing of a neuron. (What does the "firing" pattern of a neuron look like?)

- Computers are good at symbol processing.

  - Are "life" and "mind" reducible to "symbol processing"?

- Brains perform extremely well at highly parallel pattern recognition tasks:

  - face recognition,

  - language processing,

  - language understanding (!),

  - creativity, inventing, use of tools, ...

  - self-reflection, self-awareness, ...

‡ Computers versus Human Brains

Human Brain:

- Grown by cell differentiation and iterated cell division (instead of constructed from pre-fabricated building blocks)

- Rather simple "processing elements"

- High degree of interconnectivity (adaptive!)

- Adaptive and hierarchical architecture

- Highly parallel and distributed information processing

- Redundant information storage and processing

- Functionality is both pre-programmed (to some degree) and "programmable"

- "Algorithms" are designed through learning, not programming.


‡ Computers versus Human Brains: Hard- / Software and Processing

                          Computer                          Human Brain

Computational units       1 CPU, 10^5 gates                 10^11 neurons

Storage units             10^9 bits RAM, 10^10 bits disk    10^11 neurons, 10^14 synapses

Cycle time                10^-9 sec                         10^-3 sec

Neuron updates per sec    10^5                              10^14

Networks of Neurons

Dendrites, Synapses, Cell Body, Axon

Dendrites, synapses, cell body, and axon are the four elements that are usually adopted from the biological model in order to build artificial neural networks.

Artificial neurons for computing will have

- input channels,

- a cell body, and

- an output channel.

Synapses are simulated by contact points between the cell body and input or output connections.

A weight will be associated with these points.


Figure 1. A typical motor neuron

Transmission of Information

A fundamental problem of any information processing system is how information is transmitted through the system.

Neurons transmit information using electrical signals.

However, in biological structures this cannot be done by simple electronic transport as in metallic cables.

Evolution arrived at another solution, one involving ions and semi-permeable membranes.

‡ Charged Cells

Our body consists mainly of water, 55% of which is contained within the cells and 45% of which forms their environment.

The cells preserve their identity and biological components by enclosing the protoplasm in a membrane.

Membranes are made of a double layer of molecules that form a diffusion barrier.

Some salts, present in our body, dissolve in the intra- and extracellular fluid and dissociate into negative and positive ions.

Ions present in the cells that play an important role for neurons and their information processing are

- sodium ions ($\mathrm{Na}^+$), chloride ions ($\mathrm{Cl}^-$), potassium ions ($\mathrm{K}^+$), and calcium ions ($\mathrm{Ca}^{2+}$).

The membranes of the cells exhibit different degrees of permeability for each of these ions.


The permeability is determined by the number and size of pores in the membrane, the so-called ionic channels.

The specific permeability of the membrane leads to different distributions of ions in the interior and the exterior of the cells.

‡ Action Potential

In particular, differences in membrane permeability lead to the interior of neurons being negatively charged with respect to the extracellular fluid.

An action potential is produced by an initial depolarization of the cell membrane.

Figure 2. Typical form of the action potential

The potential increases from -70mV to +40mV.

After some time, the potential becomes negative again, but it overshoots.

Gradually, the cell recovers and the cell membrane returns to the initial potential.


‡ Transmission of an Action Potential

Figure 3. Transmission of an action potential

Information Processing at the Synapses

Neurons transmit information using action potentials.

The processing of this information at the interfaces between neurons, the synapses, involves a combination of electrical and chemical processes.

‡ Directed Transmission of Information

Synapses determine a direction for the transmission of information.


Signals flow from one cell to another in a well-defined manner.

Figure 4. Chemical signaling at the synapse

When an electric impulse arrives at the synapse, the synaptic vesicles fuse with the cell membrane.

The transmitters flow into the synaptic gap and some attach to the ionic channels.

This opens the ionic channels such that more ions can now flow from the exterior to the interior of the cell.

This way, the cell's potential is altered.

If the potential of the cell's interior is increased, this helps prepare an action potential, and the synapse causes an excitation of the cell.

Storage of Information and Learning

NMDA receptors help to understand some forms of learning (among many others) in neurons (NMDA = N-methyl-D-aspartate).

NMDA receptors are ionic channels permeable for different kinds of molecules (sodium, calcium, or potassium ions).

Figure 5. Unblocking of an NMDA receptor

These channels are blocked by a magnesium ion, such that the permeability for sodium and potassium is low.


If the cell has reached a certain excitation level, the ionic channels lose the magnesium ions and become unblocked.

The permeability for $\mathrm{Ca}^{2+}$ ions increases immediately, which starts a chain of reactions resulting in a durable change of the threshold level of the cell.

Artificial Neural Networks: Introductory Concepts

Definition of an ANN

- A neural network is a system composed of (usually a large number of) simple processing elements (neurons).

- Ideally, the processing elements operate asynchronously and in parallel.

- ANNs can be used to acquire (through training, learning), store, and utilize experiential knowledge.

Mathematically, a neural network is a "mapping machine" capable of modeling a function

$F : \mathbb{R}^n \to \mathbb{R}^m$

That is, a network maps an n-dimensional real input vector $(x_1, x_2, \ldots, x_n)$ to an m-dimensional real output vector $(y_1, y_2, \ldots, y_m)$.

ANN Architectures

‡ Feed-forward networks:

- Neurons are arranged in layers.

- Links run in one direction only, namely from the input layer to the output layer.

- Usually, a unit is linked only to units in the following layer(s).

- Units within the same layer are not linked.

- Signal (and error) propagation as well as weight updating can proceed uniformly from the input to the output layer.


Figure 6. Example of a feed-forward network with a single hidden layer

‡ Recurrent networks:

- Links can connect any pair of neurons and can form arbitrary topologies.

- Can implement more complex neural architectures.

- Internal states with memory can be modelled.

- A stable internal state and output might not be reached.

Figure 7. McCulloch-Pitts network for a binary scaler. For example, it translates the binary sequence 00110110 into the sequence 00100100.


A Generic Neuron Model

Generic Model of a Neuron Processing Unit

‡ A typical model of a neural processing unit:

‡ A more detailed model of a neural processing unit:

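Input function:

$\mathit{in}_i = \sum_j w_{ji} a_j$

Activation function:

$g(\mathit{in}_i) = g\left(\sum_j w_{ji} a_j\right)$

Output function:

$a_i = \mathit{out}(g(\mathit{in}_i)) = \mathit{out}\left(g\left(\sum_j w_{ji} a_j\right)\right)$

‡ Activation Functions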

(1) Step Function:

$\mathrm{step}(x, t) = \begin{cases} 1 & \text{if } x \ge t \\ 0 & \text{if } x < t \end{cases}$

[Plot: step function with t = 1]

(2) Sign Function:

$\mathrm{sign}(x) = \begin{cases} 1 & \text{if } x \ge 0 \\ -1 & \text{if } x < 0 \end{cases}$

[Plot: sign function]

(3) Sigmoid Function:

$\mathrm{sigmoid}(x, a) = \frac{1}{1 + e^{-a x}}$

[Plot: sigmoid function]

The parameter a determines the slope of the sigmoid function:

- $0.1 \le a \le 1$: $\mathrm{sigmoid}(x, a) = \frac{1}{1 + e^{-a x}}$

  [Plot: sigmoid function with a = 0.1]

- $1 \le a \le 10$: $\mathrm{sigmoid}(x, a) = \frac{1}{1 + e^{-a x}}$

  [Plot: sigmoid function with a = 1]

This allows the sigmoid function to approximate both the step and the sign function.
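
As a minimal sketch, the three activation functions and the generic processing unit could be written in Python as follows. The slides do not prescribe a language; the function names and the choice of the identity as default output function are assumptions made for illustration.

```python
import math

def step(x, t):
    """Step function: 1 if x >= t, else 0."""
    return 1 if x >= t else 0

def sign(x):
    """Sign function: 1 if x >= 0, else -1."""
    return 1 if x >= 0 else -1

def sigmoid(x, a=1.0):
    """Sigmoid function 1 / (1 + e^(-a x)); a controls the slope."""
    return 1.0 / (1.0 + math.exp(-a * x))

def neuron_output(weights, activations, g, out=lambda v: v):
    """Generic unit: a_i = out(g(sum_j w_ji * a_j)).

    The identity default for `out` is an assumption of this sketch.
    """
    in_i = sum(w * a for w, a in zip(weights, activations))
    return out(g(in_i))

# A larger slope a pushes the sigmoid towards a step at x = 0.
print(sigmoid(0.5, a=1))    # about 0.62
print(sigmoid(0.5, a=10))   # about 0.99
print(neuron_output([1, 1], [1, 0], lambda x: step(x, 1.5)))  # 0
```

Passing the activation function `g` as an argument keeps the unit generic, so the same code serves step, sign, and sigmoid units.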


Neurons in Action: Logic Gates

‡ Neurons as Logic Gates

Individual units, representing Boolean functions, can act as logic gates, given appropriate thresholds and weights.

Activation function: $\mathrm{step}(x, t)$

[Plot: step function with t = 1]

‡ (1) Which logic function?

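(a) Two inputs with weights w = 1 and w = 1, threshold t = 1.5

(b) Two inputs with weights w = 1 and w = 1, threshold t = 0.5

(c) One input with weight w = -1, threshold t = -0.5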
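
One way to answer the question is to tabulate each unit's output over all binary inputs. The sketch below assumes inputs in {0, 1} and the step activation with threshold t; the helper names `unit` and `truth_table` are hypothetical.

```python
from itertools import product

def unit(weights, t, inputs):
    """Threshold unit: outputs 1 if the weighted input sum reaches t."""
    total = sum(w * x for w, x in zip(weights, inputs))
    return 1 if total >= t else 0

def truth_table(weights, t):
    """Outputs for all binary input combinations, in counting order."""
    return [unit(weights, t, xs) for xs in product([0, 1], repeat=len(weights))]

print(truth_table([1, 1], 1.5))   # [0, 0, 0, 1]  -> AND
print(truth_table([1, 1], 0.5))   # [0, 1, 1, 1]  -> OR
print(truth_table([-1], -0.5))    # [1, 0]        -> NOT
```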

Specific Neuron Models

McCulloch-Pitts Units

McCulloch-Pitts processing units are the simplest neuron models, which produce and transmit only binary information.

Figure 8. McCulloch-Pitts unit

The rule for evaluating the input of a McCulloch-Pitts unit (MP unit) is as follows:

- The MP unit gets two sorts of input:

  - input $x_1, x_2, \ldots, x_n$ through n excitatory edges

  - input $y_1, y_2, \ldots, y_m$ through m inhibitory edges.

- If $m \ge 1$ and at least one of the signals $y_1, y_2, \ldots, y_m$ is 1, the unit is inhibited and the output is 0.

- Otherwise, the total excitation $x = \sum_{i=1}^{n} x_i = x_1 + x_2 + \ldots + x_n$ is computed and compared to the threshold $\theta$:

  $\text{output} = \begin{cases} 1 & \text{if } x \ge \theta \\ 0 & \text{if } x < \theta \end{cases}$
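
The evaluation rule above can be sketched in a few lines of Python. The function name `mp_unit` and the list-based interface are assumptions for illustration, not part of the original slides.

```python
def mp_unit(excitatory, inhibitory, theta):
    """McCulloch-Pitts unit with binary inputs.

    excitatory: list of 0/1 signals x_1..x_n
    inhibitory: list of 0/1 signals y_1..y_m (may be empty)
    theta: threshold
    """
    # Any active inhibitory signal forces the output to 0.
    if any(y == 1 for y in inhibitory):
        return 0
    # Otherwise compare the total excitation with the threshold.
    return 1 if sum(excitatory) >= theta else 0

print(mp_unit([1, 1], [], 2))   # 1: AND of two inputs (theta = 2)
print(mp_unit([1, 0], [], 1))   # 1: OR of two inputs (theta = 1)
print(mp_unit([1, 1], [1], 1))  # 0: inhibited
```

Note that a single active inhibitory edge vetoes the unit regardless of how large the excitation is.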


‡ Conjunction and Disjunction

Figure 9. Generalized AND and OR gates as McCulloch-Pitts units

‡ Negation and More Logical Functions

Figure 10. Logical functions and their realizations as McCulloch-Pitts neurons

‡ What Do MP Units Compute?

For visualization purposes, we consider the function space of logical functions of three variables.

Figure 11. Function values of a logical function of three variables $(x_1, x_2, x_3)$

McCulloch-Pitts units divide the input space into two half-spaces.

For a given input $(x_1, x_2, x_3)$ and a threshold $\theta$ the condition

$x_1 + x_2 + x_3 \ge \theta$

is tested, which is true for all points to one side of the plane defined by $x_1 + x_2 + x_3 = \theta$ and false for all points to the other side.

Figure 12. Separation of the input space for the OR function

The majority function (with threshold $\theta = 2$) of three variables divides the input space in a similar manner, but the separating plane is given by the equation $x_1 + x_2 + x_3 = 2$.

Figure 13. Separating planes of the OR and majority functions

The planes are always parallel in the case of McCulloch-Pitts units.
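
To make the half-space picture concrete, the following sketch lists the corners of the unit cube that lie on the firing side of the plane for the OR function (threshold 1) and for the majority function (threshold 2). Both tests use the same sum $x_1 + x_2 + x_3$, so only the offset of the plane changes, which is why the planes are parallel. The function name is hypothetical.

```python
from itertools import product

def positive_side(theta):
    """Corners (x1, x2, x3) of the unit cube with x1 + x2 + x3 >= theta."""
    return [p for p in product([0, 1], repeat=3) if sum(p) >= theta]

or_points = positive_side(1)        # every corner except (0, 0, 0)
majority_points = positive_side(2)  # corners with at least two 1s

print(len(or_points), len(majority_points))  # 7 4
```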


The Perceptron

Today, the perceptron is one of the classic models of neural network processing elements and architectures.

Its use in practical applications is limited. However, due to its simplicity (both in its structure and its learning algorithm), it provides a good model for studying the basics and problems of connectionist information processing.

‡ The Classical Perceptron

The perceptron was probably the first computation device inspired by neural networks.

The perceptron was developed in 1958 by the American psychologist Frank Rosenblatt.

Rosenblatt used the perceptron for image processing and image classification tasks.

Figure 14. The classical perceptron architecture as proposed by Frank Rosenblatt

‡ Minsky-and-Papert Perceptron

Minsky and Papert distilled the essential features from Rosenblatt's model in order to study the computational capabilities of the perceptron under different assumptions.

A retina is directly connected to logic elements called predicates, which can compute a single bit according to their input.

These predicates can be as computationally complex as we like. For example, each predicate could perform a filter function on the pixel image.


Figure 15. Predicates and weights of a perceptron.

Each predicate, however, is limited in its diameter or the number of input pixels. No predicate sees the whole retina.

A threshold unit, which receives weighted inputs from the predicates, is used to compute the final output of the perceptron.

‡ Limitations*

‡ A Perceptron Cell

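[Diagram: a perceptron cell with inputs $x_1, x_2, \ldots, x_n$ and the constant input $x_{n+1} = 1$, weights $w_1, w_2, \ldots, w_n$ and $w_{n+1} = -\theta$, threshold 0, and output y]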

‡ Perceptron with a Bias

In many cases it is more convenient to deal with perceptrons of threshold zero only. This corresponds to linear separations which go through the origin of the input space.

Any perceptron with threshold $\theta$ can be converted into an equivalent perceptron with threshold zero, which has an additional input called the bias, weighted by $-\theta$.

Figure 17. A perceptron with a bias

Most learning algorithms can be stated more concisely by transforming thresholds into biases.

The input and weight vectors must be extended:

- extended input vector: $(x_1, x_2, \ldots, x_n, 1)$

- extended weight vector: $(w_1, w_2, \ldots, w_n, w_{n+1})$ with $w_{n+1} = -\theta$.

‡ From Inputs to Output

The perceptron calculates its output value as follows:

$y = \begin{cases} 1 & \text{if } \sum_{i=1}^{n+1} w_i \cdot x_i \ge 0 \\ 0 & \text{if } \sum_{i=1}^{n+1} w_i \cdot x_i < 0 \end{cases}$
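
The equivalence between the threshold form and the bias form can be checked with a short sketch. The function names and the example weights (which realize AND) are assumptions chosen for illustration.

```python
def perceptron_threshold(weights, theta, x):
    """Output 1 if sum_i w_i x_i >= theta, else 0."""
    return 1 if sum(w * xi for w, xi in zip(weights, x)) >= theta else 0

def perceptron_bias(ext_weights, x):
    """Same unit with threshold 0: x is extended by a constant input 1."""
    ext_x = list(x) + [1]
    return 1 if sum(w * xi for w, xi in zip(ext_weights, ext_x)) >= 0 else 0

weights, theta = [1.0, 1.0], 1.5   # example values, chosen for illustration
ext_weights = weights + [-theta]   # w_{n+1} = -theta

for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    # Both formulations agree on every input.
    assert perceptron_threshold(weights, theta, x) == perceptron_bias(ext_weights, x)
```

Moving the threshold into the weight vector means a learning algorithm only has to adjust one vector instead of treating the threshold as a special case.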


What Do Perceptrons Compute?

‡ Geometric Interpretation

A simple perceptron is a computing unit with threshold $\theta$.

Receiving the n real inputs $x_1, x_2, \ldots, x_n$ through edges with the associated weights $w_1, w_2, \ldots, w_n$, a perceptron computes its output as follows:

$\text{output} = \begin{cases} 1 & \text{if } \sum_{i=1}^{n} w_i x_i \ge \theta \\ 0 & \text{otherwise.} \end{cases}$

The perceptron thus separates the input space into two half-spaces. The following figure shows this separation of the input space for weights $(w_1, w_2) = (0.9, 2)$ and threshold $\theta = 1$.

Figure 18. Separation of input space with a perceptron testing the condition $0.9 x_1 + 2 x_2 \ge 1$

‡ Linearly Separable Functions

A perceptron network is capable of computing any logical function.

If we reduce the network to a single perceptron, which functions are still computable?

The 16 Boolean functions of two variables:

x1  x2 | f0  f1  f2  f3  f4  f5  f6  f7  f8  f9  f10 f11 f12 f13 f14 f15

0   0  | 0   1   0   1   0   1   0   1   0   1   0   1   0   1   0   1

0   1  | 0   0   1   1   0   0   1   1   0   0   1   1   0   0   1   1

1   0  | 0   0   0   0   1   1   1   1   0   0   0   0   1   1   1   1

1   1  | 0   0   0   0   0   0   0   0   1   1   1   1   1   1   1   1

Perceptron-computable functions are those for which the points whose function value is 0 can be separated from the points whose function value is 1 using a single line.

Figure 19. Linear separations of input space corresponding to OR and AND

Two sets of points A and B in an n-dimensional space are called linearly separable if n + 1 real numbers $w_1, \ldots, w_{n+1}$ exist, such that every point $(x_1, x_2, \ldots, x_n) \in A$ satisfies $\sum_{i=1}^{n} w_i x_i \ge w_{n+1}$ and every point $(x_1, x_2, \ldots, x_n) \in B$ satisfies $\sum_{i=1}^{n} w_i x_i < w_{n+1}$.
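
Which of the 16 functions a single perceptron can compute may be explored by brute force. The sketch below searches a small grid of weights and thresholds; the grid is an assumption of this example (it happens to suffice for two binary inputs), and the function name is hypothetical.

```python
from itertools import product

INPUTS = [(0, 0), (0, 1), (1, 0), (1, 1)]  # same row order as the table

def computable_by_single_perceptron():
    """Collect the truth tables realizable as 'w1*x1 + w2*x2 >= theta'.

    Only a small grid of weights and thresholds is searched, which is an
    assumption of this sketch.
    """
    found = set()
    for w1, w2 in product([-1, 0, 1], repeat=2):
        for theta in [-1.5, -0.5, 0.5, 1.5]:
            table = tuple(1 if w1 * x1 + w2 * x2 >= theta else 0
                          for x1, x2 in INPUTS)
            found.add(table)
    return found

separable = computable_by_single_perceptron()
all_tables = set(product([0, 1], repeat=4))

print(len(separable))                  # 14
print(sorted(all_tables - separable))  # [(0, 1, 1, 0), (1, 0, 0, 1)]
```

The two missing truth tables are XOR and its negation (columns f6 and f9 in the table), which no single line can separate.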

‡ Duality of Input Space and Weight Space

Figure 20. Duality of input and weight space

The computation performed by a perceptron can be visualized as a linear separation of input space.

When trying to find the appropriate weights for a perceptron, the search process can be better visualized in weight space.


‡ The Error Function in Weight Space

Assume that the set A of input vectors in n-dimensional space must be separated from the set B of input vectors such that a perceptron computes the binary function $f_w$ with

$f_w(x) = \begin{cases} 1 & \text{if } x \in A \\ 0 & \text{if } x \in B. \end{cases}$

The function $f_w$ depends on the set $w = (w_1, \ldots, w_N)$ of weights (including the threshold).

The error function value is the number of false classifications for a particular weight vector w:

$E(w) = \sum_{x \in A} (1 - f_w(x)) + \sum_{x \in B} f_w(x)$.

Since $E(w)$ is positive or zero, we want to reach the global minimum where $E(w) = 0$.

Consequently, the aim of perceptron learning is to find a weight vector for which $E(w) = 0$.
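
As an illustration, the error function can be evaluated for the AND function with a few candidate weight vectors. This sketch assumes extended weight vectors of the form $(w_1, w_2, -\theta)$; the names `f_w` and `error` are chosen to mirror the formulas and are not from the original slides.

```python
def f_w(w, x):
    """Perceptron output for extended weights w = (w_1, ..., w_n, -theta)."""
    ext_x = list(x) + [1]
    return 1 if sum(wi * xi for wi, xi in zip(w, ext_x)) >= 0 else 0

def error(w, A, B):
    """E(w): points of A classified as 0 plus points of B classified as 1."""
    return sum(1 - f_w(w, x) for x in A) + sum(f_w(w, x) for x in B)

# AND function: A holds the inputs mapped to 1, B those mapped to 0.
A = [(1, 1)]
B = [(0, 0), (0, 1), (1, 0)]

print(error((1, 1, -1.5), A, B))  # 0: a solution
print(error((1, 1, -0.5), A, B))  # 2: this unit computes OR instead
print(error((0, 0, 0), A, B))     # 3: every input is classified as 1
```

Because E(w) only counts misclassifications, it is piecewise constant in weight space, with flat regions separated by jumps.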

The optimization problem, which the learning algorithm has to solve, can be understood as a descent on the error surface.

Figure 21. Error function for the AND function (for a perceptron with two inputs $(x_1, x_2)$ and constant threshold $\theta = 1$).

Here is an example of such a descent path through successive weight settings $w_0, w_1, w_2, w^*$.


Figure 22. Iteration steps to the region of minimal error

The Perceptron Learning Algorithm

‡ Optimization Problem: Definition

The optimization problem, which the learning algorithm has to solve, can be understood as descent on the error surface.

But we can also look at the problem as a search for an inner point of the solution region (a polytope in the case of the perceptron).

For example, let's have a look at the separation corresponding to the AND function:

$P = \{(1, 1)\}$

$N = \{(0, 0), (1, 0), (0, 1)\}$

Here P and N are the two sets of points to be separated.

The set P must be classified in the positive and the set N in the negative half-space.

‡ Optimization Problem: Analytical Solution

Three weights $w_1$, $w_2$, and $w_3 = -\theta$ are needed to implement the desired separation with a generic perceptron.

With the extended input vector ($x_3 = 1$), the following four inequalities have to be fulfilled for the AND function:

$(0, 0, 1) \cdot (w_1, w_2, w_3) < 0$

$(1, 0, 1) \cdot (w_1, w_2, w_3) < 0$

$(0, 1, 1) \cdot (w_1, w_2, w_3) < 0$

$(1, 1, 1) \cdot (w_1, w_2, w_3) > 0$

After multiplying the first three inequalities by -1, these inequalities can be written in a simpler matrix form:

$\begin{pmatrix} 0 & 0 & -1 \\ -1 & 0 & -1 \\ 0 & -1 & -1 \\ 1 & 1 & 1 \end{pmatrix} \begin{pmatrix} w_1 \\ w_2 \\ w_3 \end{pmatrix} > \begin{pmatrix} 0 \\ 0 \\ 0 \\ 0 \end{pmatrix}$

This can be written as

$A \cdot \vec{w} > 0$,

where $A$ is the $4 \times 3$ matrix and $\vec{w}$ the weight vector (written as a column vector).

This inequality describes all points in the interior of a convex polytope.

The sides of the polytope are delimited by the planes defined by each of the inequalities above.

Any point in the interior of the polytope represents a solution for the learning problem.

Figure 23. Solution polytope for the AND function in weight space
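
Whether a given weight vector lies inside the solution polytope can be checked directly against the four inequalities. The following sketch is only an illustration; the names `A_MATRIX` and `in_solution_polytope` and the two candidate weight vectors are assumptions.

```python
# Rows are the extended inputs; the three points of N are multiplied by -1
# so that every condition reads "row . w > 0".
A_MATRIX = [
    [0, 0, -1],
    [-1, 0, -1],
    [0, -1, -1],
    [1, 1, 1],
]

def in_solution_polytope(w):
    """True if w = (w1, w2, w3) satisfies all four strict inequalities."""
    return all(sum(a * wi for a, wi in zip(row, w)) > 0 for row in A_MATRIX)

print(in_solution_polytope((1, 1, -1.5)))   # True
print(in_solution_polytope((1, 1, -0.5)))   # False
```

The second candidate fails because with threshold 0.5 the inputs (1, 0) and (0, 1) would already fire, which corresponds to OR rather than AND.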


‡ Optimization Problem: Learning Algorithm

The following procedure describes the learning algorithm for a single perceptron cell.

Given are two sets of points P and N , which the perceptron should learn to classify.

- Start: Generate an initial vector of weights $\vec{w}_0$.

  $t = 0$; $\vec{w} = \vec{w}_0$

- Testing: Select $\vec{x} \in P \cup N$.

  If $\vec{x} \in P$ and $\vec{w}_t \cdot \vec{x} > 0$: goto Test for End

  If $\vec{x} \in P$ and $\vec{w}_t \cdot \vec{x} \le 0$: goto Addition

  If $\vec{x} \in N$ and $\vec{w}_t \cdot \vec{x} < 0$: goto Test for End

  If $\vec{x} \in N$ and $\vec{w}_t \cdot \vec{x} \ge 0$: goto Subtraction

- Addition: $\vec{w}_{t+1} = \vec{w}_t + \vec{x}$; $t = t + 1$

  goto Testing

- Subtraction: $\vec{w}_{t+1} = \vec{w}_t - \vec{x}$; $t = t + 1$

  goto Testing

- Test for End: Are all $\vec{x} \in P \cup N$ correctly classified?

  Yes: END

  No: goto Testing

Note: The perceptron learning procedure only works if the point sets are linearly separable.
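
The procedure above could be sketched in Python as follows. Two details are assumptions of this sketch rather than part of the slides: the points are visited in a fixed cyclic order (the slides leave the selection open), and a cap `max_updates` is added so that the loop stops on data that is not linearly separable.

```python
def dot(w, x):
    """Scalar product of two vectors of equal length."""
    return sum(wi * xi for wi, xi in zip(w, x))

def perceptron_learning(P, N, w0, max_updates=1000):
    """Perceptron learning for extended input vectors.

    P, N: lists of extended input vectors (last component 1).
    w0: initial weight vector.
    max_updates: safety cap added for this sketch.
    Returns (weights, number of updates).
    """
    w = list(w0)
    updates = 0
    while updates < max_updates:
        changed = False
        for x in P:
            if dot(w, x) <= 0:                       # Addition
                w = [wi + xi for wi, xi in zip(w, x)]
                updates += 1
                changed = True
        for x in N:
            if dot(w, x) >= 0:                       # Subtraction
                w = [wi - xi for wi, xi in zip(w, x)]
                updates += 1
                changed = True
        if not changed:                              # Test for End
            return w, updates
    raise RuntimeError("no separating weight vector found within max_updates")

# AND function with extended inputs (x1, x2, 1).
P = [(1, 1, 1)]
N = [(0, 0, 1), (1, 0, 1), (0, 1, 1)]
w, updates = perceptron_learning(P, N, w0=[0, 0, 0])
print(w, updates)
```

A full pass without any correction plays the role of "Test for End": at that point every point of P has a positive and every point of N a negative scalar product with the weight vector.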


‡ Example

The following example illustrates the convergence behavior of the perceptron learning algorithm.

Figure 24. Initial Configuration

Figure 25. After correction with $\vec{x}_1$




Adaptive "Programming" of ANNs through Learning

ANN Learning

A learning algorithm is an adaptive method by which a network of computing units self-organizes to implement the desired behavior.

[Diagram: a cycle of three steps: testing input/output examples, calculating network errors, changing network parameters]

Figure 28. Learning process in a parametric system

In some learning algorithms, examples of the desired input-output mapping are presented to the network.

A correction step is executed iteratively until the network learns to produce the desired response.


Learning Schemes

‡ Supervised Learning

Some input vectors are collected and presented to the network. The output computed by the network is observed and the deviation from the expected answer is measured. The weights are corrected (= learning algorithm) according to the magnitude of the error.

- Error-correction Learning:

  The magnitude of the error, together with the input vector, determines the magnitude of the corrections to the weights.

  Examples: Perceptron learning, backpropagation.

- Reinforcement Learning:

  After each presentation of an input-output example we only know whether the network produces the desired result or not. The weights are updated based on this Boolean decision (true or false).

  Example: Learning how to ride a bike.

‡ Unsupervised Learning

For a given input, the exact numerical output a network should produce is unknown. Since no "teacher" is available, the network must organize itself (e.g., in order to associate clusters with units).

Examples: Clustering with self-organizing feature maps, Kohonen networks.

Figure 29. Three clusters and a classifier network


References

Rojas, R. (1996). Neural Networks: A Systematic Introduction. Berlin; New York, Springer-Verlag.

Kasabov, N. K. (1998). Foundations of Neural Networks, Fuzzy Systems, and Knowledge Engineering. Cambridge, MA, MIT Press.

Nilsson, N. J. (1998). Artificial Intelligence: A New Synthesis. San Francisco, Calif., Morgan Kaufmann Publishers.

Negnevitsky, M. (2002). Artificial Intelligence: A Guide to Intelligent Systems. Harlow, England; Toronto, Addison-Wesley.

