Backpropagation
Multilayer Perceptron
$R - S^1 - S^2 - S^3$ Network
Example
Elementary Decision Boundaries
First Subnetwork

First Boundary: $a^1_1 = \mathrm{hardlim}\left(\begin{bmatrix} -1 & 0 \end{bmatrix}\mathbf{p} + 0.5\right)$

Second Boundary: $a^1_2 = \mathrm{hardlim}\left(\begin{bmatrix} 0 & -1 \end{bmatrix}\mathbf{p} + 0.75\right)$
Elementary Decision Boundaries
Second Subnetwork

Third Boundary: $a^1_3 = \mathrm{hardlim}\left(\begin{bmatrix} 1 & 0 \end{bmatrix}\mathbf{p} - 1.5\right)$

Fourth Boundary: $a^1_4 = \mathrm{hardlim}\left(\begin{bmatrix} 0 & 1 \end{bmatrix}\mathbf{p} - 0.25\right)$
Total Network
$$W^1 = \begin{bmatrix} -1 & 0 \\ 0 & -1 \\ 1 & 0 \\ 0 & 1 \end{bmatrix}, \qquad \mathbf{b}^1 = \begin{bmatrix} 0.5 \\ 0.75 \\ -1.5 \\ -0.25 \end{bmatrix}$$

$$W^2 = \begin{bmatrix} 1 & 1 & 0 & 0 \\ 0 & 0 & 1 & 1 \end{bmatrix}, \qquad \mathbf{b}^2 = \begin{bmatrix} -1.5 \\ -1.5 \end{bmatrix}$$

$$W^3 = \begin{bmatrix} 1 & 1 \end{bmatrix}, \qquad b^3 = -0.5$$
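A minimal NumPy sketch (mine, not part of the original slides) that evaluates this network on a test point; the second layer ANDs the boundaries of each subnetwork and the third layer ORs the two subnetworks:

```python
import numpy as np

def hardlim(n):
    """Hard-limit transfer function: 1 where n >= 0, else 0."""
    return (n >= 0).astype(float)

# Weights and biases from the Total Network slide
W1 = np.array([[-1, 0], [0, -1], [1, 0], [0, 1]], dtype=float)
b1 = np.array([0.5, 0.75, -1.5, -0.25])
W2 = np.array([[1, 1, 0, 0], [0, 0, 1, 1]], dtype=float)
b2 = np.array([-1.5, -1.5])
W3 = np.array([[1, 1]], dtype=float)
b3 = np.array([-0.5])

def network(p):
    a1 = hardlim(W1 @ p + b1)   # the four elementary boundaries
    a2 = hardlim(W2 @ a1 + b2)  # AND within each subnetwork
    a3 = hardlim(W3 @ a2 + b3)  # OR of the two subnetworks
    return a3

# A point that satisfies boundaries 1 and 2 (first subnetwork) -> [1.]
print(network(np.array([0.25, 0.5])))
```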
Function Approximation Example
$$f^1(n) = \frac{1}{1+e^{-n}}, \qquad f^2(n) = n$$

Nominal Parameter Values:

$$w^1_{1,1} = 10, \quad w^1_{2,1} = 10, \quad b^1_1 = -10, \quad b^1_2 = 10$$

$$w^2_{1,1} = 1, \quad w^2_{1,2} = 1, \quad b^2 = 0$$
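The nominal response on the next slide can be reproduced by sweeping $p$ over $[-2, 2]$. A small sketch, assuming NumPy (names are illustrative):

```python
import numpy as np

def logsig(n):
    return 1.0 / (1.0 + np.exp(-n))

# Nominal parameter values
W1 = np.array([[10.0], [10.0]]);  b1 = np.array([[-10.0], [10.0]])
W2 = np.array([[1.0, 1.0]]);      b2 = np.array([[0.0]])

P = np.linspace(-2, 2, 401).reshape(1, -1)  # inputs as a 1 x Q row
A1 = logsig(W1 @ P + b1)                    # 2 x Q hidden-layer outputs
A2 = W2 @ A1 + b2                           # 1 x Q network response a^2
```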
Nominal Response
[Figure: nominal response of the 1-2-1 network; $p$ from $-2$ to $2$, output from $-1$ to $3$]
Parameter Variations
[Figure: four panels showing how the network response changes as each parameter below is varied; $p$ from $-2$ to $2$, output from $-1$ to $3$]

$$-1 \le w^2_{1,1} \le 1, \qquad -1 \le w^2_{1,2} \le 1, \qquad 0 \le b^1_2 \le 20, \qquad -1 \le b^2 \le 1$$
Multilayer Network
$$\mathbf{a}^{m+1} = \mathbf{f}^{m+1}\left(W^{m+1}\mathbf{a}^m + \mathbf{b}^{m+1}\right), \qquad m = 0, 1, \ldots, M-1$$

$$\mathbf{a}^0 = \mathbf{p}, \qquad \mathbf{a} = \mathbf{a}^M$$
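This recurrence translates directly into a loop over layers. A minimal sketch, assuming the weights, biases, and transfer functions are stored in Python lists (names are illustrative):

```python
import numpy as np

def forward(p, weights, biases, transfer_fns):
    """Propagate input p through an M-layer network.

    weights[m], biases[m], transfer_fns[m] describe layer m+1,
    so that a^{m+1} = f^{m+1}(W^{m+1} a^m + b^{m+1}).
    """
    a = p  # a^0 = p
    for W, b, f in zip(weights, biases, transfer_fns):
        a = f(W @ a + b)
    return a  # a = a^M
```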
Performance Index
Training Set: $\{\mathbf{p}_1, \mathbf{t}_1\}, \{\mathbf{p}_2, \mathbf{t}_2\}, \ldots, \{\mathbf{p}_Q, \mathbf{t}_Q\}$

Mean Square Error: $F(\mathbf{x}) = E[e^2] = E[(t - a)^2]$

Vector Case: $F(\mathbf{x}) = E[\mathbf{e}^T\mathbf{e}] = E[(\mathbf{t} - \mathbf{a})^T(\mathbf{t} - \mathbf{a})]$

Approximate Mean Square Error (Single Sample): $\hat F(\mathbf{x}) = (\mathbf{t}(k) - \mathbf{a}(k))^T(\mathbf{t}(k) - \mathbf{a}(k)) = \mathbf{e}^T(k)\,\mathbf{e}(k)$

Approximate Steepest Descent:

$$w^m_{i,j}(k+1) = w^m_{i,j}(k) - \alpha\,\frac{\partial \hat F}{\partial w^m_{i,j}}, \qquad b^m_i(k+1) = b^m_i(k) - \alpha\,\frac{\partial \hat F}{\partial b^m_i}$$
Chain Rule
$$\frac{d f(n(w))}{dw} = \frac{d f(n)}{dn} \times \frac{d n(w)}{dw}$$

Example: $f(n) = \cos(n)$, $n = e^{2w}$, so $f(n(w)) = \cos\left(e^{2w}\right)$

$$\frac{d f(n(w))}{dw} = \frac{d f(n)}{dn}\,\frac{d n(w)}{dw} = (-\sin n)\left(2e^{2w}\right) = -2e^{2w}\sin\left(e^{2w}\right)$$
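The example is easy to confirm with a finite difference; a quick numerical check (mine, not from the slides):

```python
import numpy as np

f = lambda w: np.cos(np.exp(2 * w))                         # f(n(w))
df = lambda w: -2 * np.exp(2 * w) * np.sin(np.exp(2 * w))   # chain-rule result

w, h = 0.3, 1e-6
numeric = (f(w + h) - f(w - h)) / (2 * h)  # central difference
print(numeric, df(w))                      # the two values agree closely
```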
Application to Gradient Calculation
$$\frac{\partial \hat F}{\partial w^m_{i,j}} = \frac{\partial \hat F}{\partial n^m_i} \cdot \frac{\partial n^m_i}{\partial w^m_{i,j}}, \qquad \frac{\partial \hat F}{\partial b^m_i} = \frac{\partial \hat F}{\partial n^m_i} \cdot \frac{\partial n^m_i}{\partial b^m_i}$$
Gradient Calculation
$$n^m_i = \sum_{j=1}^{S^{m-1}} w^m_{i,j}\,a^{m-1}_j + b^m_i, \qquad \frac{\partial n^m_i}{\partial w^m_{i,j}} = a^{m-1}_j, \qquad \frac{\partial n^m_i}{\partial b^m_i} = 1$$

Sensitivity:

$$s^m_i \equiv \frac{\partial \hat F}{\partial n^m_i}$$

Gradient:

$$\frac{\partial \hat F}{\partial w^m_{i,j}} = s^m_i\,a^{m-1}_j, \qquad \frac{\partial \hat F}{\partial b^m_i} = s^m_i$$
Steepest Descent
$$w^m_{i,j}(k+1) = w^m_{i,j}(k) - \alpha\,s^m_i\,a^{m-1}_j, \qquad b^m_i(k+1) = b^m_i(k) - \alpha\,s^m_i$$

Matrix form:

$$W^m(k+1) = W^m(k) - \alpha\,\mathbf{s}^m\,(\mathbf{a}^{m-1})^T, \qquad \mathbf{b}^m(k+1) = \mathbf{b}^m(k) - \alpha\,\mathbf{s}^m$$

$$\mathbf{s}^m \equiv \frac{\partial \hat F}{\partial \mathbf{n}^m} = \begin{bmatrix} \dfrac{\partial \hat F}{\partial n^m_1} \\[1ex] \dfrac{\partial \hat F}{\partial n^m_2} \\[1ex] \vdots \\[0.5ex] \dfrac{\partial \hat F}{\partial n^m_{S^m}} \end{bmatrix}$$
Next Step: Compute the Sensitivities (Backpropagation)
Jacobian Matrix
$$\frac{\partial \mathbf{n}^{m+1}}{\partial \mathbf{n}^m} =
\begin{bmatrix}
\dfrac{\partial n^{m+1}_1}{\partial n^m_1} & \dfrac{\partial n^{m+1}_1}{\partial n^m_2} & \cdots & \dfrac{\partial n^{m+1}_1}{\partial n^m_{S^m}} \\[1.5ex]
\dfrac{\partial n^{m+1}_2}{\partial n^m_1} & \dfrac{\partial n^{m+1}_2}{\partial n^m_2} & \cdots & \dfrac{\partial n^{m+1}_2}{\partial n^m_{S^m}} \\[1ex]
\vdots & \vdots & & \vdots \\[0.5ex]
\dfrac{\partial n^{m+1}_{S^{m+1}}}{\partial n^m_1} & \dfrac{\partial n^{m+1}_{S^{m+1}}}{\partial n^m_2} & \cdots & \dfrac{\partial n^{m+1}_{S^{m+1}}}{\partial n^m_{S^m}}
\end{bmatrix}$$

$$\frac{\partial n^{m+1}_i}{\partial n^m_j} = \frac{\partial\left(\sum_{l=1}^{S^m} w^{m+1}_{i,l}\,a^m_l + b^{m+1}_i\right)}{\partial n^m_j} = w^{m+1}_{i,j}\,\frac{\partial a^m_j}{\partial n^m_j} = w^{m+1}_{i,j}\,\dot f^m(n^m_j), \qquad \dot f^m(n^m_j) = \frac{\partial f^m(n^m_j)}{\partial n^m_j}$$

$$\frac{\partial \mathbf{n}^{m+1}}{\partial \mathbf{n}^m} = W^{m+1}\,\dot F^m(\mathbf{n}^m), \qquad
\dot F^m(\mathbf{n}^m) = \begin{bmatrix}
\dot f^m(n^m_1) & 0 & \cdots & 0 \\
0 & \dot f^m(n^m_2) & \cdots & 0 \\
\vdots & \vdots & & \vdots \\
0 & 0 & \cdots & \dot f^m(n^m_{S^m})
\end{bmatrix}$$
Backpropagation (Sensitivities)
$$\mathbf{s}^m = \frac{\partial \hat F}{\partial \mathbf{n}^m} = \left(\frac{\partial \mathbf{n}^{m+1}}{\partial \mathbf{n}^m}\right)^{\!T} \frac{\partial \hat F}{\partial \mathbf{n}^{m+1}} = \dot F^m(\mathbf{n}^m)\,(W^{m+1})^T\,\frac{\partial \hat F}{\partial \mathbf{n}^{m+1}}$$

$$\mathbf{s}^m = \dot F^m(\mathbf{n}^m)\,(W^{m+1})^T\,\mathbf{s}^{m+1}$$

The sensitivities are computed by starting at the last layer and then propagating backward through the network to the first layer:

$$\mathbf{s}^M \rightarrow \mathbf{s}^{M-1} \rightarrow \cdots \rightarrow \mathbf{s}^2 \rightarrow \mathbf{s}^1$$
Initialization (Last Layer)
$$s^M_i = \frac{\partial \hat F}{\partial n^M_i} = \frac{\partial\,(\mathbf{t}-\mathbf{a})^T(\mathbf{t}-\mathbf{a})}{\partial n^M_i} = \frac{\partial \sum_{j=1}^{S^M}(t_j - a_j)^2}{\partial n^M_i} = -2(t_i - a_i)\,\frac{\partial a_i}{\partial n^M_i}$$

$$\frac{\partial a_i}{\partial n^M_i} = \frac{\partial a^M_i}{\partial n^M_i} = \frac{\partial f^M(n^M_i)}{\partial n^M_i} = \dot f^M(n^M_i)$$

$$s^M_i = -2(t_i - a_i)\,\dot f^M(n^M_i), \qquad \mathbf{s}^M = -2\,\dot F^M(\mathbf{n}^M)(\mathbf{t} - \mathbf{a})$$
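Together, the last-layer initialization and the recurrence give a short backward loop. A sketch, assuming each layer's net input and transfer-function derivative are available (list names are illustrative):

```python
import numpy as np

def sensitivities(weights, nets, dfs, t, a):
    """Return [s^1, ..., s^M] by backpropagation.

    weights[m] is W^{m+1}; nets[m] is n^{m+1}; dfs[m] is the derivative
    of f^{m+1}, applied elementwise (this realizes the diagonal F'^m).
    """
    M = len(weights)
    s = [None] * M
    # Initialization at the last layer: s^M = -2 F'^M(n^M) (t - a)
    s[M - 1] = -2 * dfs[M - 1](nets[M - 1]) * (t - a)
    # Recurrence: s^m = F'^m(n^m) (W^{m+1})^T s^{m+1}
    for m in range(M - 2, -1, -1):
        s[m] = dfs[m](nets[m]) * (weights[m + 1].T @ s[m + 1])
    return s
```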
Summary
Forward Propagation:

$$\mathbf{a}^0 = \mathbf{p}$$

$$\mathbf{a}^{m+1} = \mathbf{f}^{m+1}\left(W^{m+1}\mathbf{a}^m + \mathbf{b}^{m+1}\right), \qquad m = 0, 1, \ldots, M-1$$

$$\mathbf{a} = \mathbf{a}^M$$

Backpropagation:

$$\mathbf{s}^M = -2\,\dot F^M(\mathbf{n}^M)(\mathbf{t} - \mathbf{a})$$

$$\mathbf{s}^m = \dot F^m(\mathbf{n}^m)\,(W^{m+1})^T\,\mathbf{s}^{m+1}, \qquad m = M-1, \ldots, 2, 1$$

Weight Update:

$$W^m(k+1) = W^m(k) - \alpha\,\mathbf{s}^m(\mathbf{a}^{m-1})^T, \qquad \mathbf{b}^m(k+1) = \mathbf{b}^m(k) - \alpha\,\mathbf{s}^m$$
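The three steps combine into one steepest-descent iteration on a single sample. A minimal NumPy sketch (mine, not from the slides; names are illustrative):

```python
import numpy as np

def backprop_step(p, t, W, b, f, df, alpha):
    """One iteration of steepest-descent backpropagation.

    W[m], b[m] hold the layer-(m+1) weights and biases; f[m] and df[m]
    are layer m+1's transfer function and its elementwise derivative.
    W and b are updated in place; returns the network output a^M.
    """
    M = len(W)
    # Forward propagation: a^0 = p, a^{m+1} = f^{m+1}(W^{m+1} a^m + b^{m+1})
    a, n = [p], []
    for m in range(M):
        n.append(W[m] @ a[m] + b[m])
        a.append(f[m](n[m]))
    # Backpropagation of sensitivities
    s = [None] * M
    s[M - 1] = -2 * df[M - 1](n[M - 1]) * (t - a[M])
    for m in range(M - 2, -1, -1):
        s[m] = df[m](n[m]) * (W[m + 1].T @ s[m + 1])
    # Weight update: W^m <- W^m - alpha s^m (a^{m-1})^T
    for m in range(M):
        W[m] -= alpha * np.outer(s[m], a[m])
        b[m] -= alpha * s[m]
    return a[M]
```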
Example: Function Approximation
$$g(p) = 1 + \sin\left(\frac{\pi}{4}\,p\right)$$

1-2-1 Network

[Figure: training configuration; the input $p$ drives the network output $a$, which is subtracted from the target $t = g(p)$ to form the error $e$]
Network
1-2-1 Network

[Figure: 1-2-1 network architecture with input $p$ and output $a$]
Initial Conditions
$$W^1(0) = \begin{bmatrix} -0.27 \\ -0.41 \end{bmatrix}, \quad \mathbf{b}^1(0) = \begin{bmatrix} -0.48 \\ -0.13 \end{bmatrix}, \quad W^2(0) = \begin{bmatrix} 0.09 & -0.17 \end{bmatrix}, \quad b^2(0) = 0.48$$

[Figure: initial network response compared with the sine-wave target; $p$ from $-2$ to $2$, output from $-1$ to $3$]
Forward Propagation
$$\mathbf{a}^0 = p = 1$$

$$\mathbf{a}^1 = \mathbf{f}^1\left(W^1\mathbf{a}^0 + \mathbf{b}^1\right) = \mathrm{logsig}\left(\begin{bmatrix} -0.27 \\ -0.41 \end{bmatrix}(1) + \begin{bmatrix} -0.48 \\ -0.13 \end{bmatrix}\right) = \mathrm{logsig}\left(\begin{bmatrix} -0.75 \\ -0.54 \end{bmatrix}\right)$$

$$\mathbf{a}^1 = \begin{bmatrix} \dfrac{1}{1+e^{0.75}} \\[1.5ex] \dfrac{1}{1+e^{0.54}} \end{bmatrix} = \begin{bmatrix} 0.321 \\ 0.368 \end{bmatrix}$$

$$a^2 = f^2\left(W^2\mathbf{a}^1 + b^2\right) = \mathrm{purelin}\left(\begin{bmatrix} 0.09 & -0.17 \end{bmatrix}\begin{bmatrix} 0.321 \\ 0.368 \end{bmatrix} + 0.48\right) = 0.446$$

$$e = t - a = \left\{1 + \sin\left(\frac{\pi}{4}\,p\right)\right\} - a^2 = \left\{1 + \sin\left(\frac{\pi}{4}(1)\right)\right\} - 0.446 = 1.261$$
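These numbers can be checked in a few lines; a sketch with NumPy:

```python
import numpy as np

logsig = lambda n: 1 / (1 + np.exp(-n))

W1 = np.array([[-0.27], [-0.41]]);  b1 = np.array([-0.48, -0.13])
W2 = np.array([[0.09, -0.17]]);     b2 = np.array([0.48])

p = np.array([1.0])
a1 = logsig(W1 @ p + b1)               # -> [0.321, 0.368]
a2 = W2 @ a1 + b2                      # purelin output -> [0.446]
e = (1 + np.sin(np.pi / 4 * p)) - a2   # -> [1.261]
print(a1, a2, e)
```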
Transfer Function Derivatives
$$\dot f^1(n) = \frac{d}{dn}\left(\frac{1}{1+e^{-n}}\right) = \frac{e^{-n}}{(1+e^{-n})^2} = \left(1 - \frac{1}{1+e^{-n}}\right)\left(\frac{1}{1+e^{-n}}\right) = (1 - a^1)(a^1)$$

$$\dot f^2(n) = \frac{d}{dn}(n) = 1$$
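The identity $\dot f^1(n) = (1 - a^1)(a^1)$ lets the derivative be computed from the layer output alone, with no extra exponentials. A quick numerical check (mine, not from the slides):

```python
import numpy as np

logsig = lambda n: 1 / (1 + np.exp(-n))

n = np.linspace(-3, 3, 7)
a = logsig(n)
h = 1e-6
numeric = (logsig(n + h) - logsig(n - h)) / (2 * h)  # central difference
print(np.allclose(numeric, (1 - a) * a))             # True
```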
Backpropagation
$$\mathbf{s}^2 = -2\,\dot F^2(\mathbf{n}^2)(\mathbf{t} - \mathbf{a}) = -2\,\dot f^2(n^2)(1.261) = -2(1)(1.261) = -2.522$$

$$\mathbf{s}^1 = \dot F^1(\mathbf{n}^1)\,(W^2)^T\,\mathbf{s}^2 = \begin{bmatrix} (1 - a^1_1)(a^1_1) & 0 \\ 0 & (1 - a^1_2)(a^1_2) \end{bmatrix}\begin{bmatrix} 0.09 \\ -0.17 \end{bmatrix}(-2.522)$$

$$\mathbf{s}^1 = \begin{bmatrix} (1 - 0.321)(0.321) & 0 \\ 0 & (1 - 0.368)(0.368) \end{bmatrix}\begin{bmatrix} 0.09 \\ -0.17 \end{bmatrix}(-2.522)$$

$$\mathbf{s}^1 = \begin{bmatrix} 0.218 & 0 \\ 0 & 0.233 \end{bmatrix}\begin{bmatrix} -0.227 \\ 0.429 \end{bmatrix} = \begin{bmatrix} -0.0495 \\ 0.0997 \end{bmatrix}$$
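Continuing the worked example, both sensitivity vectors follow from $\mathbf{a}^1$, $W^2$, and $e$ computed above; a sketch:

```python
import numpy as np

a1 = np.array([0.321, 0.368])
W2 = np.array([[0.09, -0.17]])
e = 1.261

s2 = -2 * 1 * e                          # f^2 is linear, so its derivative is 1
s1 = (1 - a1) * a1 * (W2.ravel() * s2)   # F'^1(n^1) (W^2)^T s^2
print(s2, s1)                            # -> -2.522, [-0.0495, 0.0997]
```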
Weight Update
With learning rate $\alpha = 0.1$:

$$W^2(1) = W^2(0) - \alpha\,\mathbf{s}^2(\mathbf{a}^1)^T = \begin{bmatrix} 0.09 & -0.17 \end{bmatrix} - 0.1(-2.522)\begin{bmatrix} 0.321 & 0.368 \end{bmatrix} = \begin{bmatrix} 0.171 & -0.0772 \end{bmatrix}$$

$$b^2(1) = b^2(0) - \alpha\,s^2 = 0.48 - 0.1(-2.522) = 0.732$$

$$W^1(1) = W^1(0) - \alpha\,\mathbf{s}^1(\mathbf{a}^0)^T = \begin{bmatrix} -0.27 \\ -0.41 \end{bmatrix} - 0.1\begin{bmatrix} -0.0495 \\ 0.0997 \end{bmatrix}\begin{bmatrix} 1 \end{bmatrix} = \begin{bmatrix} -0.265 \\ -0.420 \end{bmatrix}$$

$$\mathbf{b}^1(1) = \mathbf{b}^1(0) - \alpha\,\mathbf{s}^1 = \begin{bmatrix} -0.48 \\ -0.13 \end{bmatrix} - 0.1\begin{bmatrix} -0.0495 \\ 0.0997 \end{bmatrix} = \begin{bmatrix} -0.475 \\ -0.140 \end{bmatrix}$$
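The same updates in code, continuing the example with $\alpha = 0.1$:

```python
import numpy as np

alpha = 0.1
a0, a1 = np.array([1.0]), np.array([0.321, 0.368])
s1, s2 = np.array([-0.0495, 0.0997]), np.array([-2.522])

W2 = np.array([[0.09, -0.17]]) - alpha * np.outer(s2, a1)     # -> [[0.171, -0.0772]]
b2 = np.array([0.48]) - alpha * s2                            # -> [0.732]
W1 = np.array([[-0.27], [-0.41]]) - alpha * np.outer(s1, a0)  # -> [[-0.265], [-0.420]]
b1 = np.array([-0.48, -0.13]) - alpha * s1                    # -> [-0.475, -0.140]
```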
Choice of Architecture
$$g(p) = 1 + \sin\left(\frac{i\pi}{4}\,p\right)$$

1-3-1 Network

[Figure: four panels showing the trained 1-3-1 network response for $i = 1$, $i = 2$, $i = 4$, and $i = 8$; $p$ from $-2$ to $2$, output from $-1$ to $3$]
Choice of Network Architecture
$$g(p) = 1 + \sin\left(\frac{6\pi}{4}\,p\right)$$

[Figure: four panels comparing trained 1-2-1, 1-3-1, 1-4-1, and 1-5-1 network responses; $p$ from $-2$ to $2$, output from $-1$ to $3$]
Convergence
$$g(p) = 1 + \sin(\pi p)$$

[Figure: two panels showing intermediate network responses, numbered 0 through 5, for two different initial conditions; $p$ from $-2$ to $2$, output from $-1$ to $3$]
Generalization
$\{\mathbf{p}_1, \mathbf{t}_1\}, \{\mathbf{p}_2, \mathbf{t}_2\}, \ldots, \{\mathbf{p}_Q, \mathbf{t}_Q\}$

$$g(p) = 1 + \sin\left(\frac{\pi}{4}\,p\right), \qquad p = -2, -1.6, -1.2, \ldots, 1.6, 2$$

[Figure: trained 1-2-1 and 1-9-1 network responses plotted against $g(p)$ and the training points; $p$ from $-2$ to $2$, output from $-1$ to $3$]
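The training set here is just the target function sampled at evenly spaced points; a one-line sketch:

```python
import numpy as np

p = np.arange(-2, 2.1, 0.4)     # -2, -1.6, ..., 1.6, 2
t = 1 + np.sin(np.pi / 4 * p)   # targets t_q = g(p_q)
```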