Q.J. Zhang, Carleton University
Introduction to
Neural Networks:
Structure and Training
Professor Q.J. Zhang
Department of Electronics
Carleton University, Ottawa, Canada
www.doe.carleton.ca/~qjz, [email protected]
A Quick Illustration Example:
Neural Network Model for Delay
Estimation in a High-Speed
Interconnect Network
High-Speed VLSI Interconnect Network
[Figure: an interconnect network connecting Drivers 1-3 to Receivers 1-4]
Circuit Representation of the Interconnect Network
[Figure: RLC interconnect tree -- a source (Vp, Tr) with source resistance Rs
drives four branches with parameters R1-R4, L1-L4, C1-C4 at nodes 1-4]
Massive Analysis of Signal Delay
Need for a Neural Network Model
• A PCB contains a large number of interconnect networks, each with
different interconnect lengths, terminations, and topologies, leading to
the need for massive analysis of interconnect networks
• During PCB design/optimization, the interconnect networks need to be
adjusted in terms of interconnect lengths, receiver-pin load
characteristics, etc., leading to the need for repetitive analysis of
interconnect networks
• This necessitates fast and accurate interconnect network models, and a
neural network model is a good candidate
Neural Network Model for Delay Analysis
[Figure: a neural network with inputs L1 L2 L3 L4, R1 R2 R3 R4,
C1 C2 C3 C4, Rs, Vp, Tr, e1 e2 e3, and outputs the delays at
receivers 1, 2, 3, 4]
3 Layer MLP: Feedforward Computation
[Figure: inputs x1, x2, x3; hidden neurons z1, z2, z3, z4; outputs y1, y2;
with input-to-hidden weights w_ki and hidden-to-output weights w'_jk]
Hidden neuron values: z_k = tanh( Σ_i w_ki x_i )
Outputs: y_j = Σ_k w'_jk z_k
Neural Net Training
Training data (by simulation/measurement): d = d(x)
Neural network response: y = y(x)
Objective: adjust the weights W, V such that
Σ_x ( y(x) - d(x) )² is minimized over W, V
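The training objective above can be sketched as plain gradient descent on the sum-of-squared-error. The one-hidden-layer network, the toy target d(x) = x², the learning rate, and the iteration count below are illustrative assumptions, not values from the course:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy training data d = d(x) (an assumed target, for illustration only)
x = np.linspace(-1.0, 1.0, 20).reshape(-1, 1)
d = x ** 2

# One-hidden-layer network y = y(x); W, V (and biases b, c) play the
# roles of the adjustable weights W, V in the objective above
W = rng.normal(scale=0.5, size=(1, 5))   # input -> hidden weights
b = np.zeros(5)                          # hidden biases
V = rng.normal(scale=0.5, size=(5, 1))   # hidden -> output weights
c = np.zeros(1)                          # output bias

def y(x):
    return np.tanh(x @ W + b) @ V + c

def error():
    # the objective: sum over training samples x of (y(x) - d(x))^2
    return float(np.sum((y(x) - d) ** 2))

lr = 0.01
for _ in range(5000):
    z = np.tanh(x @ W + b)        # hidden neuron values
    e = z @ V + c - d             # per-sample error y - d
    g = (e @ V.T) * (1 - z ** 2)  # error back-propagated through tanh
    V -= lr * 2 * z.T @ e
    c -= lr * 2 * e.sum(axis=0)
    W -= lr * 2 * x.T @ g
    b -= lr * 2 * g.sum(axis=0)

print(error())  # small after training
```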
Simulation Time for 20,000
Interconnect Configurations
Method CPU
Circuit Simulator (NILT) 34.43 hours
AWE 9.56 hours
Neural Network Approach 6.67 minutes
Important Features of Neural Networks
• Neural networks have the ability to model multi-dimensional nonlinear relationships
• Neural models are simple and the model computation is fast
• Neural networks can learn and generalize from available data, making model development possible even when component formulae are unavailable
• The neural network approach is generic, i.e., the same modeling technique can be re-used for passive/active devices/circuits
• It is easier to update neural models whenever device or component technology changes
Inspiration
[Figure: speech-recognition illustration -- the words "Stop", "Start", and
"Help" spoken by Mary, Lisa, and John]
A Biological Neuron
Figure from Reference [L.H. Tsoukalas and R.E. Uhrig, Fuzzy and Neural Approaches in Engineering, Wiley, 1997.]
Neural Network Structures
Neural Network Structures
• A neural network contains
• neurons (processing elements)
• connections (links between neurons)
• A neural network structure defines
• how information is processed inside a neuron
• how the neurons are connected
• Examples of neural network structures
• multi-layer perceptrons (MLP)
• radial basis function (RBF) networks
• wavelet networks
• recurrent neural networks
• knowledge based neural networks
• MLP is the basic and most frequently used structure
MLP Structure
[Figure: (Input) Layer 1 with neurons 1..N1 receiving x1, x2, x3, …, xn;
(Hidden) Layer 2 with neurons 1..N2; …; (Hidden) Layer L-1 with neurons
1..N(L-1); (Output) Layer L with neurons 1..NL producing y1, y2, …, ym]
Information Processing In a Neuron
[Figure: neuron i in layer l receives the previous-layer outputs
z0^(l-1) = 1, z1^(l-1), z2^(l-1), …, zN(l-1)^(l-1) through the weights
wi0^l, wi1^l, wi2^l, …, wiN(l-1)^l]
γi^l = Σ_{j=0}^{N(l-1)} wij^l zj^(l-1)
zi^l = σ( γi^l )
Neuron Activation Functions
• Input layer neurons simply relay the external inputs to the neural network
• Hidden layer neurons have smooth switch-type activation functions
• Output layer neurons can have simple linear activation functions
Forms of Activation Functions: z = σ(γ)
[Figure: plots of the sigmoid, arctangent, and hyperbolic tangent
activation functions]
Forms of Activation Functions: z = σ(γ)
Sigmoid function: σ(γ) = 1 / (1 + e^(-γ))
Arctangent function: σ(γ) = (2/π) arctan(γ)
Hyperbolic tangent function: σ(γ) = (e^γ - e^(-γ)) / (e^γ + e^(-γ))
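The three activation functions can be written directly in code; a minimal NumPy sketch:

```python
import numpy as np

def sigmoid(gamma):
    # sigma(gamma) = 1 / (1 + e^(-gamma)), output in (0, 1)
    return 1.0 / (1.0 + np.exp(-gamma))

def arctangent(gamma):
    # sigma(gamma) = (2/pi) * arctan(gamma), output in (-1, 1)
    return (2.0 / np.pi) * np.arctan(gamma)

def hyperbolic_tangent(gamma):
    # sigma(gamma) = (e^g - e^(-g)) / (e^g + e^(-g)) = tanh(gamma)
    return np.tanh(gamma)

# All three are smooth "switch" functions: flat tails with a
# transition around gamma = 0
print(sigmoid(0.0), arctangent(0.0), hyperbolic_tangent(0.0))  # 0.5 0.0 0.0
```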
Multilayer Perceptrons (MLP):
Structure:
[Figure: (Input) layer 1 with neurons 1..N1 receiving x1, x2, x3, …, xn;
(Hidden) layer 2 with neurons 1..N2; …; (Hidden) layer l; …;
(Output) layer L with neurons 1..NL]
where: L = total number of layers
Nl = number of neurons in layer l, l = 1, 2, 3, …, L
wij^l = link (weight) between neuron i in layer l and neuron j in layer l-1
NN inputs: x1, x2, …, xn (where n = N1)
NN outputs: y1, y2, …, ym (where m = NL)
Let the neuron output value be represented by z:
zi^l = output of neuron i in layer l
Each neuron has an activation function σ(γ)
Neural Network Feedforward:
Problem statement:
Given x = [x1 x2 … xn]^T, get y = [y1 y2 … ym]^T from the NN.
Solution: feed x to layer 1, and feed the outputs of layer l-1 to layer l.
For l = 1: zi^1 = xi, i = 1, 2, …, n (n = N1)
For l = 2, 3, …, L:
γi^l = Σ_{j=0}^{N(l-1)} wij^l zj^(l-1), zi^l = σ( γi^l )
and the solution is yi = zi^L, i = 1, 2, …, m (m = NL)
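The feedforward recursion above translates almost line for line into code. This is a sketch under the assumption that each layer's weights are stored as a matrix whose column 0 holds the bias weights wi0^l (paired with z0 = 1):

```python
import numpy as np

def mlp_feedforward(x, weights, sigma=np.tanh):
    """Feedforward through an MLP.

    weights[k] is the N_l x (N_{l-1} + 1) weight matrix for layer
    l = k + 2; column 0 multiplies the constant z_0 = 1 (bias).
    """
    z = np.asarray(x, dtype=float)        # layer 1 relays the inputs
    for k, W in enumerate(weights):
        z = np.concatenate(([1.0], z))    # prepend z_0 = 1
        gamma = W @ z                     # gamma_i = sum_j w_ij z_j
        last = (k == len(weights) - 1)
        z = gamma if last else sigma(gamma)   # linear output layer
    return z

# Example: L = 3 layers, n = 2 inputs, 3 hidden neurons, m = 1 output
rng = np.random.default_rng(1)
weights = [rng.normal(size=(3, 3)), rng.normal(size=(1, 4))]
print(mlp_feedforward([0.5, -0.2], weights))
```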
Question: How can an NN represent an arbitrary
nonlinear input-output relationship?
Summary in plain words:
Given enough hidden neurons, a 3-layer perceptron can
approximate an arbitrary continuous multidimensional function
to any required accuracy.
Theorem (Cybenko, 1989):
Let σ be any continuous sigmoid function. Then finite sums of the form
yk = f̂k(x) = Σ_{j=1}^{N2} wkj^(3) σ( Σ_{i=0}^{n} wji^(2) xi ), k = 1, 2, …, m
(with x0 = 1 supplying the bias) are dense in C(In). In other words,
given any f ∈ C(In) and ε > 0, there is a sum f̂(x) of the above form
for which
| f̂(x) - f(x) | < ε for all x ∈ In
where:
In -- the n-dimensional unit cube [0,1]^n, i.e., {x : xi ∈ [0,1], i = 1, 2, …, n}
C(In) -- the space of continuous functions on In
e.g., the original problem is y = f(x), where f ∈ C(In);
the form of f̂(x) is a 3-layer-perceptron NN.
Illustration of the Effect of Neural Network Weights
[Figure: the standard sigmoid function 1/(1 + e^(-x)) beside a hidden
neuron computing 1/(1 + e^(-(0.5x - 2.5))) with input weight 0.5,
bias -2.5, and output weight 2.0]
Suppose this neuron is zi^l. The weights wij^l (or wki^(l+1)) affect the
figure horizontally (or vertically).
Values of the w's are not unique, e.g., the signs of the 3 values in the
example above can be changed.
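The example neuron above can be checked numerically. A minimal sketch; the values 0.5, -2.5, and 2.0 are taken from the figure:

```python
import math

def sigmoid(g):
    return 1.0 / (1.0 + math.exp(-g))

def hidden_contribution(x, w_in=0.5, bias=-2.5, w_out=2.0):
    # the figure's neuron: 2.0 * sigmoid(0.5*x - 2.5)
    return w_out * sigmoid(w_in * x + bias)

# The input weight and bias set the transition point horizontally
# (w_in*x + bias = 0 at x = 5); the output weight scales the curve
# vertically to a height of 2.0.
print(hidden_contribution(5.0))   # midpoint: w_out * 0.5 = 1.0
```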
Illustration of the Effect of Neural Network Weights
- Example with 1 input
[Figure: a 1-input network whose hidden neurons have biases -20, -2.5,
and 1, an output bias of -26, and weights 0.8, -4.5, 2.0, 0.5, 2.0, 1.0,
together with a plot of the resulting output curve]
Illustration of the Effect of Neural Network Weights
- Example with 2 inputs
[Figure: a 2-input network with hidden neurons z1, z2, z3, z4 (biases 0,
-20, -0.25, -16), an output neuron y with bias 1, hidden weights 1.0,
-1.0, 1.0, 0.0, 1.0, 1.0, 1.0, 0.0, and output weights 0.7, -1.5, 2.3, 1.0]
z1 as a function of x1 and x2:
z1 = 1/(1 + exp(-x1))
[Figure: surface plot of z1 versus x1 and x2]
Illustration of the Effect of Neural Network Weights
- Example with 2 inputs
Assuming arctan activation function
[Figure: a 2-input network with hidden neurons z1, z2, z3, z4 (all biases
0), an output neuron y (bias 0), and weights 4.0, 4.0, 0.1, 0.1, 0.1,
4.0, 4.0, 0.1]
x1  x2 | z1  z2  z3  z4
-1  -1 | -1  -1  -1   0
+1  -1 | +1  -1   0   0
-1  +1 | -1  +1   0   0
+1  +1 | +1  +1  +1   0
Illustration of the Effect of Neural Network Weights
- Example with 2 inputs
Assuming arctan activation function
[Figure: a 2-input network with hidden neurons z1, z2, z3, z4 (all biases
0), an output neuron y (bias 0), and weights 4, -4, 4, 4, -4, 4, -4, -4]
x1  x2 | z1  z2  z3  z4
-1  -1 | -1  +1   0   0
+1  -1 |  0   0  +1  -1
-1  +1 |  0   0  -1  +1
+1  +1 | +1  -1   0   0
Illustration of the Effect of Neural Network Bias Parameters
- Example with 2 inputs
Assuming arctan activation function
[Figure: a 2-input network with hidden neurons z1, z2, z3, z4 (biases -4,
10, -12, -4) and weights 0.1, 4, 4, 0.1, 4, 4, 0.1, 0.1]
x1   x2 | z1  z2  z3  z4
-1   -1 | -1  +1  -1  -1
-1   +1 | -1  +1  -1  -1
+1   -1 | -1  +1  -1  -1
+1   +1 | -1  +1  -1  +1
40   40 | +1  +1  +1  +1
-70  -70 | -1  -1  -1  -1
Effect of Neural Network Weights and Inputs on Neural Network Outputs
As the values of the neural network inputs x change, different neurons
respond differently, resulting in y being a "rich" function of x. In this
way, y becomes a nonlinear function of x.
When the connection weights and biases change, y becomes a different
function of x:
y = f(x, w)
[Figure: the 2-input example network repeated -- hidden biases 0, -20,
-0.25, -16, output bias 1, and output weights 0.7, -1.5, 2.3, 1.0]
Question: How many neurons are needed?
Essence: the degree of nonlinearity in the original problem.
Highly nonlinear problems need more neurons; smoother problems need
fewer neurons.
Too many neurons -- may lead to over-learning
Too few neurons -- will not represent the problem well enough
Solutions: experience, trial/error, adaptive schemes
Question: How many layers are needed?
3 or more layers are necessary and sufficient for arbitrary
nonlinear approximation
3 or 4 layers are used more frequently
more layers allow more effective representation of
hierarchical information in the original problem
Radial Basis Function Network (RBF)
Structure:
[Figure: inputs x1, …, xn; hidden RBF neurons z1, z2, …, zN with centers
cij and width factors λij; outputs y1, …, ym through weights wij]
RBF Function (multi-quadratic)
σ(γ) = ( γ² + c² )^α, 0 < α < 1
[Figure: plots of the multi-quadratic function for α = 0.4 and α = 0.9]
RBF Function (Gaussian)
σ(γ) = exp( -(γ/λ)² )
[Figure: plots of the Gaussian function for λ = 1 and λ = 0.2]
RBF Feedforward:
yk = Σ_{i=0}^{N} wki zi, k = 1, 2, …, m
where z0 = 1, and for i = 1, 2, …, N:
zi = σ( γi ) = exp( -γi² ), γi = sqrt( Σ_{j=1}^{n} ((xj - cij)/λij)² )
cij is the center of the radial basis function, λij is the width factor.
So the parameters cij (i = 1, 2, …, N, j = 1, …, n) represent the centers
of the RBF. The parameters λij (i = 1, 2, …, N, j = 1, …, n) represent
the "standard deviations" of the RBF.
Universal Approximation Theorem (RBF) --
(Krzyzak, Linder & Lugosi, 1996): An RBF network exists such that it will
approximate an arbitrary continuous y = f(x) to any accuracy required.
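The RBF feedforward equations map directly to a few lines of NumPy. This sketch assumes the Gaussian form σ(γ) = exp(-γ²) with the widths folded into γ, and stores centers, widths, and output weights as arrays:

```python
import numpy as np

def rbf_feedforward(x, c, lam, w):
    """RBF feedforward for one input vector x.

    c[i, j] are the centers c_ij, lam[i, j] the width factors lambda_ij,
    w[k, i] the output weights w_ki (column 0 is w_k0, paired with z_0 = 1).
    """
    # gamma_i = sqrt( sum_j ((x_j - c_ij) / lambda_ij)^2 )
    gamma = np.sqrt(np.sum(((x - c) / lam) ** 2, axis=1))
    z = np.exp(-gamma ** 2)            # Gaussian: sigma(gamma) = e^(-gamma^2)
    z = np.concatenate(([1.0], z))     # z_0 = 1 for the bias weight w_k0
    return w @ z                       # y_k = sum_i w_ki z_i

# Example: n = 2 inputs, N = 2 hidden RBF neurons, m = 1 output
c = np.array([[0.0, 0.0], [1.0, 1.0]])
lam = np.ones((2, 2))
w = np.array([[0.0, 1.0, 1.0]])
print(rbf_feedforward(np.array([0.0, 0.0]), c, lam, w))  # [1 + e^(-2)]
```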
Wavelet Neural Network Structure:
[Figure: inputs x1, …, xn; hidden wavelet neurons z1, z2, …, zN with
translations tij and dilations a1, a2, …, aN; outputs y1, …, ym through
weights wij]
For hidden neuron j:
zj = φ( (x - tj) / aj )
where tj = [tj1, tj2, …, tjn]^T is the translation vector, aj is the
dilation factor, and φ is a wavelet function, e.g.
φ(γj) = (n - γj²) e^(-γj²/2)
where γj = ||(x - tj)/aj|| = sqrt( Σ_i ((xi - tji)/aj)² )
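A single hidden-neuron response can be sketched as follows, assuming the radial wavelet form φ(γ) = (n - γ²)e^(-γ²/2) as reconstructed above and a scalar dilation factor:

```python
import numpy as np

def wavelet_neuron(x, t, a):
    """Hidden-neuron response z_j = phi((x - t_j) / a_j).

    t is the translation vector t_j, a the (scalar) dilation factor a_j,
    phi the radial wavelet phi(gamma) = (n - gamma^2) e^(-gamma^2 / 2).
    """
    n = x.size
    # gamma_j = sqrt( sum_i ((x_i - t_ji) / a_j)^2 )
    gamma = np.linalg.norm((x - t) / a)
    return (n - gamma ** 2) * np.exp(-gamma ** 2 / 2.0)

# At the center x = t_j, gamma = 0 and the response peaks at n
x = np.array([0.3, -0.7])
print(wavelet_neuron(x, x, 1.0))   # 2.0
```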
Wavelet Function
[Figure: plots of the wavelet function z versus x for a1 = 5 and a1 = 1]
Wavelet Transform:
R -- space of a real variable
R^n -- n-dimensional space, i.e., the space of vectors of n real variables
A function f : R^n → R is radial if a function g : R → R exists such that
f(x) = g(||x||) for x ∈ R^n.
If φ(x) is radial, its Fourier transform φ̂(ω) is also radial.
Let φ̂(ω) = η(||ω||). Then φ(x) is a wavelet function if
Cφ = (2π)^n ∫0^∞ |η(ξ)|² ξ^(-1) dξ < ∞
The wavelet transform of f(x) is:
w(a, t) = a^(-n/2) ∫_{R^n} f(x) φ( (x - t)/a ) dx
The inverse transform is:
f(x) = (1/Cφ) ∫0^∞ [ ∫_{R^n} a^(-n/2) w(a, t) φ( (x - t)/a ) dt ] a^(-(n+1)) da
Q.J. Zhang, Carleton University
The neural network accepts the input information sent to input
neurons, and proceeds to produce the response at the output
neurons. There is no feedback from neurons at layer l back to
neurons at layer k, k l.
Examples of feedforward neural networks:
Multilayer Perceptrons (MLP)
Radial Basis Function Networks (RBF)
Wavelet Networks
Feedforward Neural Networks
Recurrent Neural Network (RNN):
Discrete Time Domain
[Figure: a feedforward neural network whose inputs are x(t), x(t-τ),
x(t-2τ), y(t-τ), y(t-2τ), y(t-3τ), and whose output is y(t)]
The neural network output is a function of its present input, and a
history of its input and output.
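The discrete-time recurrence can be sketched with a short driver loop. The three-sample histories and the stand-in summing "network" are assumptions for illustration, not part of the course material:

```python
from collections import deque

def run_rnn(feedforward, x_seq, n_x=3, n_y=3):
    """Drive a feedforward network with the present input plus histories.

    At each step t the network sees [x(t), x(t-tau), ..., y(t-tau), ...]
    and produces y(t), which is fed back into the output history.
    """
    x_hist = deque([0.0] * n_x, maxlen=n_x)   # x(t), x(t-tau), x(t-2tau)
    y_hist = deque([0.0] * n_y, maxlen=n_y)   # y(t-tau), ..., y(t-3tau)
    outputs = []
    for x_t in x_seq:
        x_hist.appendleft(x_t)                 # newest input first
        y_t = feedforward(list(x_hist) + list(y_hist))
        y_hist.appendleft(y_t)                 # feed the output back
        outputs.append(y_t)
    return outputs

# A stand-in "network" (just a sum) makes the feedback visible: an
# impulse at t = 0 keeps echoing through the output history
print(run_rnn(sum, [1.0, 0.0, 0.0]))  # [1.0, 2.0, 4.0]
```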
Dynamic Neural Networks (DNN)
(continuous time domain)
[Figure: a feedforward neural network whose inputs are x(t), x'(t),
x''(t), y'(t), y''(t), y'''(t), and whose output is y(t)]
The neural network directly represents the dynamic input-output
relationship of the problem. The input-output signals and their time
derivatives are related through a feedforward network.
Self-Organizing Maps (SOM), (Kohonen, 1984)
Clustering Problem:
Given training data xk, k = 1, 2, …, P,
find cluster centers ci, i = 1, 2, …, N.
Basic Clustering Algorithm:
For each cluster i (i = 1, 2, …, N), initialize an index set Ri = {empty},
and set the center ci to an initial guess.
For xk, k = 1, 2, …, P, find the cluster that is closest to xk,
i.e., find ci such that ||ci - xk|| ≤ ||cj - xk||, ∀ j ≠ i.
Let Ri = Ri ∪ {k}. Then the center is adjusted:
ci = ( 1 / size(Ri) ) Σ_{k∈Ri} xk
and the process continues until ci does not move any more.
Q.J. Zhang, Carleton University
Self Organizing Maps (SOM)
SOM is a one- or two-dimensional array of neurons where neighboring
neurons in the map correspond to neighboring cluster centers in the
input data space.
Principle of Topographic Map Formation: (Kohonen, 1990)
The spatial location of an output neuron in the topographic map
corresponds to a particular domain or feature of the input data.
[Figure: input connected to a two-dimensional array of neurons]
Training of SOM:
For each training sample xk, k = 1, 2, …, P, find the nearest cluster
center cij, such that ||cij - xk|| ≤ ||cpq - xk||, ∀ p, q.
Then update cij and its neighboring centers:
cpq = cpq + α(t)( xk - cpq ), for |p - i| ≤ Nc, |q - j| ≤ Nc
where:
Nc -- size of neighborhood
α(t) -- a positive value decaying as training proceeds
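The SOM update rule can be sketched as follows. The 4x4 grid, the neighborhood size Nc = 1, and the linearly decaying α(t) schedule are illustrative assumptions:

```python
import numpy as np

def train_som(x, grid=(4, 4), n_epochs=20, nc=1, seed=0):
    """Train a 2-D SOM with the update c_pq += alpha(t) * (x_k - c_pq)."""
    rng = np.random.default_rng(seed)
    c = rng.normal(scale=0.1, size=grid + (x.shape[1],))   # centers c_pq
    for epoch in range(n_epochs):
        alpha = 0.5 * (1.0 - epoch / n_epochs)   # decays as training proceeds
        for xk in x:
            # nearest center c_ij
            d = np.linalg.norm(c - xk, axis=2)
            i, j = np.unravel_index(np.argmin(d), d.shape)
            # update c_ij and neighbors with |p - i| <= Nc, |q - j| <= Nc
            for p in range(max(0, i - nc), min(grid[0], i + nc + 1)):
                for q in range(max(0, j - nc), min(grid[1], j + nc + 1)):
                    c[p, q] += alpha * (xk - c[p, q])
    return c

# Toy data on the unit square; the trained map spreads over the data
rng = np.random.default_rng(1)
data = rng.uniform(0.0, 1.0, size=(200, 2))
som = train_som(data)
print(som.shape)   # (4, 4, 2)
```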
Filter Clustering Example (Burrascano et al.)
[Figure: an SOM clusters the input x, and its control output c selects
among specific MLPs for groups 1-4, each mapping x to y; a general MLP
mapping x to y is shown alongside under the same control]
Other Advanced Structures:
Knowledge Based Neural Networks -- embedding application-specific
knowledge into networks
[Figure: pure neural networks combined with knowledge such as empirical
formulas and equivalent circuits]