Page 1: Artificial Neurons

ARTIFICIAL NEURONS


Page 2: Artificial Neurons

Artificial Neurons

An artificial neuron is a mathematical function conceived as a counterpart of a biological neuron. It is alternatively named an elementary processing unit, binary neuron, node, linear threshold function, or McCulloch-Pitts (MCP) neuron.

An artificial neuron receives one or more weighted inputs (representing the dendrites and synaptic weights) and sums them (analogous to the spatio-temporal summation of signals at the soma). The sum is then passed through a nonlinear function known as an activation or transfer function. If a certain threshold level is exceeded, the neuron fires and sends a signal to the neighboring cells.

The transfer functions usually have a sigmoid shape, though they may also take other nonlinear forms, such as piecewise-linear or step functions.

Generally, the transfer functions are monotonically increasing, continuous, differentiable, and bounded.

Simple artificial neurons, such as the McCulloch-Pitts model, are sometimes characterized as caricature models: they are intended to capture only a few neurophysiological characteristics, without aiming at a realistic, full representation of their biological counterparts.

Page 3: Artificial Neurons

Structure of Artificial Neurons

An artificial neuron is a more or less simplified mathematical (computational) model of a biological neuron.

These models mimic the real-life behavior of neurons with respect to the electro-chemical messages they exchange: input (afferent signals arriving at the dendrites), signal processing at the level of the soma, and output (efferent action potentials delivered by the axonal boutons).

Page 4: Artificial Neurons

Structure of Artificial Neurons

Consider an artificial neuron k and let there be P + 1 inputs, with signals $x_0$ through $x_P$ and weights $w_{k0}$ through $w_{kP}$.

Usually, the $x_0$ input is assigned the value +1, which makes it a bias input with $w_{k0} = b_k$. This leaves only P actual inputs to the neuron: $x_1$ to $x_P$.

Page 5: Artificial Neurons

Structure of Artificial Neurons

The input signals and the output of a single neuron indexed by k are given by the following equations:

$$u_k = \sum_{j=1}^{P} w_{kj} x_j, \qquad y_k = \varphi(u_k) = \varphi\!\left(\sum_{j=1}^{P} w_{kj} x_j\right)$$

Here $u_k$ refers to the spatiotemporal sum of all the weighted inputs of the neuron.

In most cases, it is useful to include a threshold $\theta_k$ for each neuron:

$$y_k = \varphi(u_k) = \varphi\!\left(\sum_{j=1}^{P} w_{kj} x_j - \theta_k\right)$$

In vector notation, for each neuron k,

$$\mathbf{x} = [x_1, x_2, \ldots, x_P] \quad \text{(the vector of input signals)}$$

$$\mathbf{w}_k = [w_{k1}, w_{k2}, \ldots, w_{kP}] \quad \text{(the vector of synaptic weights)}$$
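As a minimal sketch, a single neuron of this kind can be written in a few lines of Python; the logistic activation and the example weights and threshold below are arbitrary choices for illustration:

```python
import math

def neuron_output(x, w, theta):
    """Single artificial neuron: y = phi(sum_j w_j * x_j - theta),
    here with a logistic sigmoid as the activation function phi."""
    u = sum(w_j * x_j for w_j, x_j in zip(w, x))   # weighted sum of the inputs
    return 1.0 / (1.0 + math.exp(-(u - theta)))    # activation applied to u - theta

# Arbitrary example values
x = [0.5, -1.0, 2.0]    # input signals x_1 .. x_P
w = [0.8, 0.2, -0.5]    # synaptic weights w_k1 .. w_kP
print(neuron_output(x, w, theta=0.1))
```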

Page 6: Artificial Neurons

History of Artificial Neurons

The first artificial neuron was the Threshold Logic Unit (TLU), proposed by Warren McCulloch and Walter Pitts in 1943. As a transfer function, it employed a threshold, equivalent to using the Heaviside step function.

Initially, a simple model was considered, with binary inputs and outputs, yet it was noticed that any Boolean function could be implemented by networks of such devices.

Cyclic networks, with feedback through neurons, could define dynamical systems with memory, but most of the research concentrated strictly on feedforward networks because of easier mathematical tractability.

An artificial neural network (ANN) that used the linear threshold function was the perceptron, developed by Frank Rosenblatt. This model already allowed more flexible weight values in the neurons, and it was used in machines with adaptive capabilities.

The representation of the threshold values as a bias term was introduced by Bernard Widrow in 1960.

In the late 1980s, neurons with more continuous activation shapes were considered, and optimization algorithms such as gradient descent were used to adjust the weights.

ANNs also started to be used as general function approximation models.

Page 7: Artificial Neurons

Transfer Function

The transfer function of a neuron is chosen to have a number of properties that either enhance or simplify the network containing the neuron.

A sigmoid function is a bounded differentiable real function that is defined for all real input values and has a positive derivative at each point.

Sigmoid functions are often normalized so that their slope at the origin is 1.

Page 8: Artificial Neurons

Sigmoid Function

A sigmoid function is a function having an "S" shape (sigmoid curve).

Often, "sigmoid function" refers to a special case such as the logistic function, defined by the formula:

$$S(t) = \frac{1}{1 + \exp(-t)}$$

Another example is the Gompertz curve, which is used in modeling systems that saturate at large values of t.
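As a sketch, the logistic function transcribes directly into Python:

```python
import math

def logistic(t):
    """Logistic sigmoid S(t) = 1 / (1 + exp(-t))."""
    return 1.0 / (1.0 + math.exp(-t))

# S is bounded in (0, 1), with S(0) = 0.5 and slope 1/4 at the origin
print(logistic(0.0))    # 0.5
print(logistic(6.0))    # close to 1
print(logistic(-6.0))   # close to 0
```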

Page 9: Artificial Neurons

Gompertz Curve or Function

A Gompertz function is a sigmoid function that models time series in which growth is slowest at the start and at the end of a time period. The Gompertz curve is used in modeling systems that saturate at large values of t, and it is a special case of the generalized logistic function:

$$X(t) = a \exp\!\left(b \exp(ct)\right)$$

where a is the upper asymptote, b and c are negative numbers, b sets the displacement along the x axis (translating the graph to the left or right), and c sets the growth rate (y scaling).

Examples of usage include modeling the growth of tumors, or of populations in a confined space where the availability of nutrients is limited.

The upper (future value) asymptote of the function is approached much more gradually by the curve than the lower (left-hand) asymptote, in contrast to the simple logistic function, in which both asymptotes are approached symmetrically.
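A minimal sketch of the Gompertz curve in Python; the parameter values a, b, c below are arbitrary illustrative choices:

```python
import math

def gompertz(t, a=1.0, b=-5.0, c=-0.5):
    """Gompertz curve X(t) = a * exp(b * exp(c * t)), with b and c negative."""
    return a * math.exp(b * math.exp(c * t))

# Growth is slowest at the start and at the end, saturating toward a
for t in (0, 2, 5, 10, 20):
    print(t, gompertz(t))
```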

Page 10: Artificial Neurons

Error Function

The error function (also called the Gauss error function) is a special function (non-elementary) of sigmoid shape:

$$\operatorname{erf}(x) = \frac{2}{\sqrt{\pi}} \int_0^x \exp(-t^2)\, \mathrm{d}t$$

The complementary error function is defined as:

$$\operatorname{erfc}(x) = 1 - \operatorname{erf}(x) = 1 - \frac{2}{\sqrt{\pi}} \int_0^x \exp(-t^2)\, \mathrm{d}t$$
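Python's standard library exposes both functions, so a sketch needs no hand-written integration:

```python
import math

# erf has a sigmoid shape: it is odd and bounded in (-1, 1)
for x in (-2.0, 0.0, 2.0):
    print(x, math.erf(x), math.erfc(x))   # erfc(x) == 1 - erf(x)
```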

Page 11: Artificial Neurons

Gudermannian Function

The Gudermannian function, named after Christoph Gudermann (1798–1852), relates the circular functions and hyperbolic functions without using complex numbers.

The Gudermannian function, usually denoted gd, and its inverse are defined as:

$$gd(x) = \int_0^x \frac{\mathrm{d}t}{\cosh t}, \qquad gd^{-1}(x) = \int_0^x \frac{\mathrm{d}t}{\cos t}$$
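The integrals above have closed forms, gd(x) = 2 arctan(tanh(x/2)) and gd^{-1}(x) = 2 artanh(tan(x/2)), which give a short sketch:

```python
import math

def gd(x):
    """Gudermannian function, via the closed form gd(x) = 2*atan(tanh(x/2))."""
    return 2.0 * math.atan(math.tanh(x / 2.0))

def gd_inv(x):
    """Inverse Gudermannian, gd^{-1}(x) = 2*atanh(tan(x/2)), for |x| < pi/2."""
    return 2.0 * math.atanh(math.tan(x / 2.0))

print(gd(1.0))           # ~0.8658
print(gd_inv(gd(1.0)))   # ~1.0, round trip
```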

Page 12: Artificial Neurons

Heaviside Function

The Heaviside step function, or the unit step function, usually denoted by H, is a discontinuous function. The output y of the H transfer function is binary, depending on the specified threshold θ:

$$y = \begin{cases} 1 & \text{if } u \ge \theta \\ 0 & \text{if } u < \theta \end{cases}$$

It seldom matters what value is used for H(0), since H is mostly used as a distribution.

The Heaviside function is the integral of the Dirac delta function δ, although this expansion may not hold (or even make sense) for x = 0, depending on which formalism is used to give meaning to integrals involving δ:

$$H'(x) = \delta(x) \iff H(u) = \int_{-\infty}^{u} \delta(s)\, \mathrm{d}s$$
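A sketch of the binary threshold output in Python; taking H(0) = 1, i.e. firing exactly at threshold, is just the convention chosen here:

```python
def heaviside_output(u, theta):
    """Binary neuron output: 1 if the activation u reaches the threshold theta."""
    return 1 if u >= theta else 0

print(heaviside_output(0.7, theta=0.5))   # 1: threshold exceeded
print(heaviside_output(0.3, theta=0.5))   # 0: below threshold
```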

Page 13: Artificial Neurons

Heaviside Function

The Heaviside step function is used in some neuromorphic models as well. It can be approximated by other sigmoidal functions when the weights are assigned large values. It performs a division of the space of inputs by a hyperplane.

An affine hyperplane is an affine subspace of codimension 1 in an affine space. In Cartesian coordinates, such a hyperplane is described by a linear equation, where at least one of the $a_i$ is non-zero:

$$a_1 x_1 + a_2 x_2 + \cdots + a_n x_n = b$$

The Heaviside function is especially useful in the last layer of a multilayered network intended to perform binary classification of the inputs.

Affine hyperplanes are used to define decision boundaries in many machine learning algorithms, such as decision trees and perceptrons.

In the case of a real affine space (when the coordinates are real numbers), the hyperplane separates the space into two half-spaces, which are the connected components of the complement of the hyperplane, given by the inequalities:

$$a_1 x_1 + a_2 x_2 + \cdots + a_n x_n < b$$
$$a_1 x_1 + a_2 x_2 + \cdots + a_n x_n > b$$
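A small sketch of a hyperplane decision boundary in Python, combining the half-space test with a binary output; the coefficients are arbitrary examples:

```python
def half_space(x, a, b):
    """Classify a point x by the side of the hyperplane a . x = b it lies on."""
    s = sum(a_i * x_i for a_i, x_i in zip(a, x))
    return 1 if s > b else 0   # 1 for the half-space a . x > b, 0 otherwise

# Hyperplane x_1 + 2*x_2 = 1 in the plane (arbitrary example)
a, b = [1.0, 2.0], 1.0
print(half_space([2.0, 1.0], a, b))   # 1: 2 + 2 = 4 > 1
print(half_space([0.0, 0.0], a, b))   # 0: 0 < 1
```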

Page 14: Artificial Neurons

Random Variables

A random variable (aleatory variable or stochastic variable) is a real-valued function defined on a set of possible outcomes of a random experiment, the sample space Ω. That is, the random variable is a function that maps from its domain, the sample space Ω, to its range (the real numbers or a subset of the real numbers).

A random variable can take on a set of possible different values (similarly to other mathematical variables), each with an associated probability (in contrast to other mathematical variables).

The mathematical function describing the possible values of a random variable and their associated probabilities is known as a probability distribution.

Random variables can be discrete, taking any of a specified finite or countable list of values, and hence described by a probability mass function; continuous, taking any numerical value in an interval or collection of intervals, and described by a probability density function; or a mixture of both types.

Random variables with discontinuities in their CDFs can be treated as mixtures of discrete and continuous random variables.

Page 15: Artificial Neurons

Probability Density Function

The probability density function (pdf), or density, of a continuous random variable X, denoted $f_X$, is a function that describes the relative likelihood of this random variable taking on a given value.

A random variable X has density $f_X$, where $f_X$ is a non-negative Lebesgue-integrable function, if:

$$\Pr[a \le X \le b] = \int_a^b f_X(x)\, \mathrm{d}x$$
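As a sketch, Pr[a <= X <= b] for a standard normal X can be approximated by numerically integrating its density with a plain trapezoidal rule (the choice of distribution is just an example):

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density of a normal random variable (standard normal by default)."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def prob_between(a, b, pdf, n=10_000):
    """Approximate Pr[a <= X <= b] as the integral of the pdf over [a, b]."""
    h = (b - a) / n
    s = 0.5 * (pdf(a) + pdf(b)) + sum(pdf(a + i * h) for i in range(1, n))
    return s * h

print(prob_between(-1.0, 1.0, normal_pdf))   # ~0.6827 for the standard normal
```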

Page 16: Artificial Neurons

Cumulative Distribution Function

The cumulative distribution function (cdf) describes the probability that a real-valued random variable X with a given probability distribution $f_X$ will be found at a value less than or equal to x. In the case of a continuous distribution, it gives the area under the probability density function from minus infinity to x.

The cdf of a continuous random variable X can be expressed as the integral of its probability density function $f_X$ as follows:

$$F_X(x) = \int_{-\infty}^{x} f_X(t)\, \mathrm{d}t$$
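For the standard normal example above, the cdf has a closed form in terms of the error function introduced earlier, F(x) = (1 + erf(x / sqrt(2))) / 2; a one-function sketch:

```python
import math

def normal_cdf(x):
    """Standard normal cdf: F(x) = (1 + erf(x / sqrt(2))) / 2."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

print(normal_cdf(0.0))                      # 0.5
print(normal_cdf(1.0))                      # ~0.8413
print(normal_cdf(1.0) - normal_cdf(-1.0))   # ~0.6827, matches the pdf integral
```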

Page 17: Artificial Neurons

Probability Mass Function

A probability mass function (pmf) is a function that gives the probability that a discrete random variable is exactly equal to some value.

The probability mass function is often the primary means of defining a discrete probability distribution, and such functions exist for either scalar or multivariate random variables whose domain is discrete.

A probability mass function (pmf) differs from a probability density function (pdf) in that the latter is associated with continuous rather than discrete random variables; the values of a pdf are not probabilities as such: a pdf must be integrated over an interval to yield a probability.

Consider a random variable X defined on a sample space Ω; the probability mass function $f_X$ is then defined as follows:

$$X : \Omega \to A \subseteq \mathbb{R}, \qquad f_X : A \to [0,1], \qquad f_X(x) = \Pr(X = x) = \Pr(\{s \in \Omega : X(s) = x\})$$

(Figure: the probability mass function of a fair die.)
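The fair-die pmf from the figure transcribes directly; exact fractions keep the probabilities readable:

```python
from fractions import Fraction

def die_pmf(x):
    """pmf of a fair six-sided die: Pr(X = x) = 1/6 for x in 1..6, else 0."""
    return Fraction(1, 6) if x in {1, 2, 3, 4, 5, 6} else Fraction(0)

print(die_pmf(3))                             # 1/6
print(die_pmf(7))                             # 0
print(sum(die_pmf(x) for x in range(1, 7)))   # 1: a pmf sums to 1 over its support
```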

Page 18: Artificial Neurons

Probability Mass Distribution - Example

If the sample space Ω is the set of all possible numbers rolled on two dice, and the random variable X of interest is the sum S of the numbers on the two dice, then X is a discrete random variable whose distribution is described by the probability mass function plotted as the height of columns in the figure.

In this case, the random variable X is defined as the function that maps the pairs to their sum:

$$X : \Omega \to S, \qquad X((n_1, n_2)) = n_1 + n_2, \quad \forall (n_1, n_2) \in \{1,2,3,4,5,6\}^2$$

and has the probability mass function $f_X$ given by:

$$f_X(S) = \frac{\min(S-1,\; 13-S)}{36}, \qquad S \in \{2,3,4,5,6,7,8,9,10,11,12\}$$
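The closed-form pmf can be checked against a direct enumeration of the 36 equally likely ordered pairs; a short sketch:

```python
from fractions import Fraction
from itertools import product

def sum_pmf(s):
    """pmf of the sum of two fair dice: min(s-1, 13-s)/36 for s in 2..12."""
    return Fraction(min(s - 1, 13 - s), 36) if 2 <= s <= 12 else Fraction(0)

# Count each sum over all 36 ordered outcomes and compare with the formula
counts = {}
for n1, n2 in product(range(1, 7), repeat=2):
    counts[n1 + n2] = counts.get(n1 + n2, 0) + 1

assert all(sum_pmf(s) == Fraction(counts[s], 36) for s in range(2, 13))
print(sum_pmf(7))   # 1/6, the most likely sum
```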

Page 19: Artificial Neurons

Degenerate Distribution

A degenerate distribution is the probability distribution of a random variable which takes a single value only.

The degenerate distribution is localized at a point $k_0$ on the real axis. The probability mass function and the cumulative distribution function are given by:

$$f(k; k_0) = \begin{cases} 1 & \text{if } k = k_0 \\ 0 & \text{if } k \ne k_0 \end{cases} \qquad F(k; k_0) = \begin{cases} 1 & \text{if } k \ge k_0 \\ 0 & \text{if } k < k_0 \end{cases}$$

(Figures: the PMF and CDF for $k_0 = 0$; the horizontal axis is the index i of $k_i$.)
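Both functions transcribe into one-liners; note that the degenerate cdf is just a Heaviside step placed at k0:

```python
def degenerate_pmf(k, k0=0):
    """pmf of the degenerate distribution: all mass at the single point k0."""
    return 1 if k == k0 else 0

def degenerate_cdf(k, k0=0):
    """cdf of the degenerate distribution: a unit step at k0."""
    return 1 if k >= k0 else 0

print([degenerate_pmf(k) for k in (-1, 0, 1)])   # [0, 1, 0]
print([degenerate_cdf(k) for k in (-1, 0, 1)])   # [0, 1, 1]
```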

Page 20: Artificial Neurons

References

McCulloch, W. and Pitts, W. (1943), "A logical calculus of the ideas immanent in nervous activity". Bulletin of Mathematical Biophysics, 5: 115-133.

Mutihac, R., Modelarea si Simularea Neuronala – Elemente Fundamentale. Editura Universitatii din Bucuresti, 2000.

Werbos, P.J. (1990), "Backpropagation through time: What it does and how to do it". Proceedings of the IEEE, 78 (10): 1550-1560.

Robertson, J.S. (1997), "Gudermann and the simple pendulum". The College Mathematics Journal, 28 (4): 271-276.

Fitzhugh, R. and Izhikevich, E. (2006), "FitzHugh-Nagumo model". Scholarpedia, 1 (9): 1349.

Haykin, S., Neural Networks: A Comprehensive Foundation. 2nd ed., Prentice Hall, 1998.

Hebb, D.O., The Organization of Behavior. New York: Wiley, 1949.

Hodgkin, A.L. and Huxley, A.F. (1952), "A quantitative description of membrane current and its application to conduction and excitation in nerve". The Journal of Physiology, 117 (4): 500-544.

Hoppensteadt, F.C. and Izhikevich, E.M., Weakly Connected Neural Networks. Springer, 1997.

Page 21: Artificial Neurons

References

Abbott, L.F. (1999), "Lapique's introduction of the integrate-and-fire model neuron (1907)". Brain Research Bulletin, 50 (5/6): 303-304.

Koch, C. and Segev, I. (1999), Methods in Neuronal Modeling: From Ions to Networks. 2nd ed., Cambridge, MA: MIT Press.

