Post on 13-Dec-2015
Outline
- Perceptrons
- Learning Hidden Layer Representations
- Speeding Up Training
- Bias, Overfitting and Early Stopping (Example: Face Recognition)
Human Learning
Number of neurons: ~10^10
Connections per neuron: ~10^4 to 10^5
Neuron switching time: ~0.001 second
Scene recognition time: ~0.1 second
So recognition must finish in at most ~100 sequential neural steps, which doesn't seem like much.
Perceptron
[Figure: a perceptron unit. Inputs x_1, ..., x_n with weights w_1, ..., w_n, plus a bias weight w_0 on the fixed input x_0 = 1; the unit emits output o.]

net = Σ_{i=0}^{n} w_i x_i

o = 1 if net > 0, 0 otherwise
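The unit above can be sketched in a few lines of Python (the AND weights in the usage example are illustrative, not from the slide):

```python
# A perceptron as defined above: net = sum_{i=0}^{n} w_i * x_i with x_0 = 1,
# output 1 if net > 0, else 0.
def perceptron(weights, inputs):
    """weights[0] is the bias weight w_0 (its input x_0 is fixed at 1)."""
    net = weights[0] + sum(w * x for w, x in zip(weights[1:], inputs))
    return 1 if net > 0 else 0

# Example: with w_0 = -1.5 and w_1 = w_2 = 1 the unit computes Boolean AND.
for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x1, x2, perceptron([-1.5, 1, 1], [x1, x2]))
```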
Boolean XOR
input x1 | input x2 | output
0 | 0 | 0
0 | 1 | 1
1 | 0 | 1
1 | 1 | 0
[Figure: a two-layer network computing XOR. A hidden unit h1 computes AND of x1 and x2 (weights 1, 1, threshold 1.5); the output unit o combines x1 and x2 (weights 1, 1, threshold 0.5, i.e. OR) with an inhibitory connection from h1, giving "OR but not AND" = XOR.]
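XOR is not linearly separable, so no single perceptron computes it, but a two-layer network of threshold units can. A minimal sketch of one standard construction (the inhibitory weight -2 on the hidden AND unit is an assumption, one common choice rather than necessarily the slide's):

```python
# Two-layer threshold network for XOR: a hidden AND unit inhibits an OR output.
def step(net):
    return 1 if net > 0 else 0

def xor(x1, x2):
    h1 = step(x1 + x2 - 1.5)             # AND unit: weights 1, 1, threshold 1.5
    return step(x1 + x2 - 2 * h1 - 0.5)  # OR of inputs, inhibited by the AND

for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x1, x2, "->", xor(x1, x2))
```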
Perceptron Training Rule
w_i ← w_i + Δw_i (new weight = old weight + increment)

Δw_i = η (t − o) x_i

where η is the step size, t the target output, o the perceptron output, and x_i the input.
Converges, if…
… training data linearly separable
… step size sufficiently small
… no “hidden” units
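A minimal sketch of the rule, trained on the (linearly separable) OR function; the learning rate and epoch count are illustrative choices:

```python
# Perceptron training rule: w_i <- w_i + eta * (t - o) * x_i, with x_0 = 1.
def train_perceptron(examples, eta=0.1, epochs=20):
    w = [0.0, 0.0, 0.0]                   # w0 (bias), w1, w2
    for _ in range(epochs):
        for (x1, x2), t in examples:
            o = 1 if w[0] + w[1] * x1 + w[2] * x2 > 0 else 0
            for i, x in enumerate([1, x1, x2]):   # x0 = 1 is the bias input
                w[i] += eta * (t - o) * x
    return w

or_examples = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
w = train_perceptron(or_examples)
```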
Sigmoid Squashing Function
[Figure: the same unit as before, with the threshold output replaced by a smooth sigmoid]

net = Σ_{i=0}^{n} w_i x_i

o = σ(net) = 1 / (1 + e^(−net))
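The sigmoid's derivative has the convenient form σ'(x) = σ(x)(1 − σ(x)), which is what makes the gradient updates on the following slides so compact. A quick numerical sanity check of that identity (the point x = 0.7 is arbitrary):

```python
import math

def sigmoid(net):
    return 1.0 / (1.0 + math.exp(-net))

# Check sigma'(x) = sigma(x) * (1 - sigma(x)) by central finite differences.
x, eps = 0.7, 1e-6
numeric = (sigmoid(x + eps) - sigmoid(x - eps)) / (2 * eps)
analytic = sigmoid(x) * (1 - sigmoid(x))
print(abs(numeric - analytic))  # should be tiny
```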
Gradient Descent
Gradient: ∇E[w] = [∂E/∂w_0, ∂E/∂w_1, ..., ∂E/∂w_n]

Training rule: Δw = −η ∇E[w], i.e. Δw_i = −η ∂E/∂w_i

where E[w] = ½ Σ_{d∈D} (t_d − o_d)²
Gradient Descent (single layer)
∂E/∂w_i = ∂/∂w_i ½ Σ_d (t_d − o_d)²
        = ½ Σ_d 2 (t_d − o_d) ∂/∂w_i (t_d − o_d)
        = Σ_d (t_d − o_d) (−∂o_d/∂w_i)

With a sigmoid unit, o_d = σ(net_d) and net_d = Σ_i w_i x_i,d, so ∂o_d/∂w_i = o_d (1 − o_d) x_i,d, giving

∂E/∂w_i = −Σ_d (t_d − o_d) o_d (1 − o_d) x_i,d
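The final line of the derivation can be verified against finite differences; a small sketch (the unit, data, and weights here are made up for illustration):

```python
import math, random

# Check dE/dw_i = -sum_d (t_d - o_d) o_d (1 - o_d) x_{i,d} for a sigmoid unit.
def sig(z):
    return 1.0 / (1.0 + math.exp(-z))

def error(w, data):                 # E = 1/2 sum_d (t_d - o_d)^2
    return 0.5 * sum(
        (t - sig(sum(wi * xi for wi, xi in zip(w, x)))) ** 2 for x, t in data)

random.seed(0)
data = [([1.0, random.uniform(-1, 1)], float(random.randint(0, 1)))
        for _ in range(5)]
w = [0.3, -0.2]

# Analytic gradient from the derivation above
grad = [0.0, 0.0]
for x, t in data:
    o = sig(sum(wi * xi for wi, xi in zip(w, x)))
    for i in range(2):
        grad[i] -= (t - o) * o * (1 - o) * x[i]

# Central finite differences should agree closely
eps = 1e-6
diffs = []
for i in range(2):
    wp = list(w); wp[i] += eps
    wm = list(w); wm[i] -= eps
    diffs.append(abs((error(wp, data) - error(wm, data)) / (2 * eps) - grad[i]))
print(max(diffs))
```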
Batch Learning
Initialize each w_i to a small random value.
Repeat until termination:
  Δw_i ← 0
  For each training example d:
    o_d ← σ(Σ_i w_i x_i,d)
    Δw_i ← Δw_i + η (t_d − o_d) o_d (1 − o_d) x_i,d
  w_i ← w_i + Δw_i
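A minimal sketch of the batch procedure for a single sigmoid unit, trained on OR (learning rate and epoch count are illustrative assumptions):

```python
import math, random

# Batch rule: accumulate Delta w_i over all examples, then apply one summed
# update per pass through the training set.
def sig(z):
    return 1.0 / (1.0 + math.exp(-z))

def batch_epoch(w, data, eta):
    dw = [0.0] * len(w)
    for x, t in data:                       # each x includes x_0 = 1 (bias)
        o = sig(sum(wi * xi for wi, xi in zip(w, x)))
        for i in range(len(w)):
            dw[i] += eta * (t - o) * o * (1 - o) * x[i]
    return [wi + dwi for wi, dwi in zip(w, dw)]

random.seed(1)
data = [([1, 0, 0], 0), ([1, 0, 1], 1), ([1, 1, 0], 1), ([1, 1, 1], 1)]  # OR
w = [random.uniform(-0.1, 0.1) for _ in range(3)]
for _ in range(2000):
    w = batch_epoch(w, data, eta=0.5)
```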
Incremental (Online) Learning
Initialize each w_i to a small random value.
Repeat until termination:
  For each training example d:
    o_d ← σ(Σ_i w_i x_i,d)
    Δw_i ← η (t_d − o_d) o_d (1 − o_d) x_i,d
    w_i ← w_i + Δw_i
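The incremental variant differs only in applying the update immediately after each example instead of summing over the whole set; a self-contained sketch on the same OR task (hyperparameters again illustrative):

```python
import math, random

# Online (incremental) learning: per-example weight updates.
def sig(z):
    return 1.0 / (1.0 + math.exp(-z))

random.seed(2)
data = [([1, 0, 0], 0), ([1, 0, 1], 1), ([1, 1, 0], 1), ([1, 1, 1], 1)]  # OR
w = [random.uniform(-0.1, 0.1) for _ in range(3)]
eta = 0.5
for _ in range(2000):
    for x, t in data:
        o = sig(sum(wi * xi for wi, xi in zip(w, x)))
        for i in range(len(w)):
            w[i] += eta * (t - o) * o * (1 - o) * x[i]
```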
Backpropagation Algorithm
Initialize all weights to small random numbers.
For each training example do:
  Propagate the input forward:
  - For each hidden unit h: o_h = σ(Σ_i w_ih x_i)
  - For each output unit k: o_k = σ(Σ_h w_hk o_h)
  Propagate the errors backward:
  - For each output unit k: δ_k = o_k (1 − o_k) (t_k − o_k)
  - For each hidden unit h: δ_h = o_h (1 − o_h) Σ_k w_hk δ_k
  Update each network weight w_ij:
  - w_ij ← w_ij + Δw_ij, with Δw_ij = η δ_j x_ij
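A sketch of one stochastic backpropagation step for a single hidden layer (network sizes, weights, and the training example are made up; squared error and sigmoid units throughout):

```python
import math, random

def sig(z):
    return 1.0 / (1.0 + math.exp(-z))

def backprop_step(W_ih, W_ho, x, t, eta):
    # Forward pass: hidden then output activations
    o_h = [sig(sum(w * xi for w, xi in zip(row, x))) for row in W_ih]
    o_k = [sig(sum(w * oh for w, oh in zip(row, o_h))) for row in W_ho]
    # Backward pass: output deltas, then hidden deltas
    d_k = [ok * (1 - ok) * (tk - ok) for ok, tk in zip(o_k, t)]
    d_h = [oh * (1 - oh) * sum(W_ho[k][h] * d_k[k] for k in range(len(d_k)))
           for h, oh in enumerate(o_h)]
    # Weight updates: Delta w_ij = eta * delta_j * x_ij
    for k in range(len(W_ho)):
        for h in range(len(o_h)):
            W_ho[k][h] += eta * d_k[k] * o_h[h]
    for h in range(len(W_ih)):
        for i in range(len(x)):
            W_ih[h][i] += eta * d_h[h] * x[i]

random.seed(3)
x, t = [1.0, 0.5, -0.3], [1.0]
W_ih = [[random.uniform(-0.5, 0.5) for _ in x] for _ in range(2)]
W_ho = [[random.uniform(-0.5, 0.5) for _ in range(2)]]

def err():
    o_h = [sig(sum(w * xi for w, xi in zip(row, x))) for row in W_ih]
    o_k = [sig(sum(w * oh for w, oh in zip(row, o_h))) for row in W_ho]
    return 0.5 * sum((tk - ok) ** 2 for tk, ok in zip(t, o_k))

before = err()
backprop_step(W_ih, W_ho, x, t, eta=0.1)
after = err()
print(before, "->", after)  # one gradient step should reduce the error
```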
Can This Be Learned?
Input Output
10000000 10000000
01000000 01000000
00100000 00100000
00010000 00010000
00001000 00001000
00000100 00000100
00000010 00000010
00000001 00000001
Learned Hidden Layer Representation
Input Hidden values Output
10000000 .89 .04 .08 10000000
01000000 .01 .11 .88 01000000
00100000 .01 .97 .27 00100000
00010000 .99 .97 .71 00010000
00001000 .03 .05 .02 00001000
00000100 .22 .99 .99 00000100
00000010 .80 .01 .98 00000010
00000001 .60 .94 .01 00000001
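The 8-3-8 identity task above can be trained with plain backpropagation; a compact batch-mode sketch (learning rate, epoch count, and the omission of bias units are assumptions for brevity, not the original experiment's settings):

```python
import numpy as np

# Train an 8-3-8 autoencoder with backprop (sigmoid units, squared error).
rng = np.random.default_rng(0)
X = np.eye(8)                      # the eight one-hot inputs are also the targets
W1 = rng.uniform(-0.5, 0.5, (8, 3))
W2 = rng.uniform(-0.5, 0.5, (3, 8))
sig = lambda z: 1.0 / (1.0 + np.exp(-z))

def total_error():
    return 0.5 * np.sum((X - sig(sig(X @ W1) @ W2)) ** 2)

e0 = total_error()
eta = 0.5
for _ in range(5000):
    H = sig(X @ W1)                # hidden activations (8 examples x 3 units)
    O = sig(H @ W2)                # network outputs
    d_out = O * (1 - O) * (X - O)  # output-unit deltas
    d_hid = H * (1 - H) * (d_out @ W2.T)
    W2 += eta * H.T @ d_out
    W1 += eta * X.T @ d_hid
e1 = total_error()
print(e0, "->", e1)
```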
Speeding It Up: Momentum

[Figure: error E plotted against weight w_ij, showing the old w_ij and the new w_ij after one update]

Gradient descent: Δw_ij = −η ∂E/∂w_ij

GD with momentum: Δw_ij(n) = −η ∂E/∂w_ij + α Δw_ij(n−1)
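A minimal sketch of the momentum update Δw(n) = −η ∂E/∂w + α Δw(n−1); the η and α values are illustrative (α is typically around 0.9):

```python
# One momentum update: the new step blends the current gradient with the
# previous step, so step length builds up along a persistent gradient direction.
def momentum_step(grad, prev_dw, eta=0.1, alpha=0.9):
    return -eta * grad + alpha * prev_dw

# With a constant gradient of 1.0 the step grows geometrically
# toward the limit -eta / (1 - alpha) = -1.0:
dw = 0.0
for _ in range(3):
    dw = momentum_step(1.0, dw)
print(dw)  # steps: -0.1, -0.19, -0.271
```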
ANNs for Face Recognition

[Figure: typical input images; head poses: left, straight, right, up]

Head pose (1-of-4): 90% accuracy
Face recognition (1-of-20): 90% accuracy