A Statistical Mechanical Analysis of Online Learning:
Can Student be more Clever than Teacher?
Seiji MIYOSHI, Kobe City College of Technology
Background (1)
• Batch Learning
  – Examples are used repeatedly
  – Can give correct answers for all examples
  – Requires a long time
  – Requires a large memory
• Online Learning
  – Each example is used once and then discarded
  – Cannot give correct answers for all examples
  – A large memory is not necessary
  – Can handle a time-variant teacher
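The batch/online contrast can be illustrated with a linear perceptron. This is a minimal sketch, not the method of the slides: the function names are hypothetical, the batch learner is ordinary least squares over all stored examples, and the online learner is an assumed LMS-style gradient rule that uses each example once and keeps only the weight vector.

```python
import numpy as np

def batch_learning(X, y):
    """Batch learning: all stored examples are used together (least squares)."""
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return w

def online_learning(examples, N, eta=0.5):
    """Online learning: each (x, y) pair updates w once and is then discarded."""
    w = np.zeros(N)
    for x, y in examples:
        w += eta * (y - w @ x) * x            # assumed LMS-style gradient step
    return w

rng = np.random.default_rng(0)
N, M = 20, 2000
A = rng.standard_normal(N)                    # hypothetical teacher weights
X = rng.standard_normal((M, N)) / np.sqrt(N)  # inputs with variance 1/N
y = X @ A                                     # noiseless linear teacher outputs

w_batch = batch_learning(X, y)                # needs all M examples in memory
w_online = online_learning(zip(X, y), N)      # needs only O(N) memory
```

With a noiseless teacher both learners recover the teacher weights; the difference is that the batch learner must hold the whole data set, while the online learner only ever sees one example at a time.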
Background (3)
[Figure: teacher and student perceptrons, with connection weights J1, …, JN and B1, …, BN, each receiving inputs x1, …, xN]
Unlearnable case (Inoue & Nishimori, Phys. Rev. E, 1997; Inoue, Nishimori & Kabashima, TANC-97, cond-mat/9708096, 1997)
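[Figure: a perceptron with inputs x1, …, xN, connection weights J1, …, JN, and one output]
Simple Perceptron
$$\text{Output} = \operatorname{sgn}\left(\sum_{i=1}^{N} J_i x_i\right)$$
Linear Perceptron
$$\text{Output} = \sum_{i=1}^{N} J_i x_i$$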
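A minimal sketch of the two output rules, assuming the weights J and the input x are NumPy arrays (the function names are hypothetical). Note that np.sign returns 0 when the weighted sum is exactly 0; the slides do not say how sgn(0) is treated.

```python
import numpy as np

def simple_perceptron(J, x):
    """Simple perceptron: Output = sgn(sum_i J_i x_i)."""
    return np.sign(J @ x)

def linear_perceptron(J, x):
    """Linear perceptron: Output = sum_i J_i x_i."""
    return J @ x

J = np.array([1.0, -2.0, 0.5])
x = np.array([3.0, 1.0, 2.0])
print(linear_perceptron(J, x))   # 3 - 2 + 1 = 2.0
print(simple_perceptron(J, x))   # sign of 2.0 = 1.0
```

The only difference is the output nonlinearity: the simple perceptron keeps just the sign of the weighted sum, while the linear perceptron returns the sum itself.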
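$$\boldsymbol{B}^{m+1} = \boldsymbol{B}^{m} + g^{m}\boldsymbol{x}^{m}$$
$$N r_B^{m+1} = N r_B^{m} + g^{m} y^{m}$$
$$N r_B^{m+2} = N r_B^{m+1} + g^{m+1} y^{m+1}$$
$$\vdots$$
$$N r_B^{m+N\,dt} = N r_B^{m+N\,dt-1} + g^{m+N\,dt-1} y^{m+N\,dt-1}$$
Summing these N dt updates, with ⟨gy⟩ denoting the average of g y over them:
$$N r_B^{m+N\,dt} = N r_B^{m} + N\,dt\,\langle g y\rangle$$
Because t = m/N, presenting N dt further examples advances t by dt, so r_B^{m+N dt} = r_B + dr_B:
$$N(r_B + dr_B) = N r_B + N\,dt\,\langle g y\rangle$$
$$\frac{dr_B}{dt} = \langle g y\rangle$$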
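The passage from the weight update to the differential equation for r_B can be checked numerically. The sketch below rests on assumptions that are not stated on the slides: a noiseless linear teacher with weights A, inputs x_i ~ N(0, 1/N), A·A = N, y = A·x, r_B = A·B/N, and the gradient-descent choice g = η(y − B·x). Under these assumptions ⟨gy⟩ = η(1 − r_B), so dr_B/dt = ⟨gy⟩ predicts r_B(t) = 1 − e^{−ηt} from r_B(0) = 0. Here B learns directly from a noiseless teacher, which is simpler than the moving teacher of the full model.

```python
import numpy as np

def simulate_r_B(N=1000, eta=0.5, t_max=5.0, seed=0):
    """Online learning of B from an assumed noiseless linear teacher A."""
    rng = np.random.default_rng(seed)
    A = rng.standard_normal(N)
    A *= np.sqrt(N) / np.linalg.norm(A)       # normalize so that A.A = N
    B = np.zeros(N)                           # B starts at the origin, r_B(0) = 0
    max_identity_err = 0.0
    for _ in range(int(t_max * N)):           # t = m / N, so t_max * N examples
        x = rng.standard_normal(N) / np.sqrt(N)
        y = A @ x
        g = eta * (y - B @ x)                 # assumed gradient-descent choice of g
        r_old = A @ B / N
        B = B + g * x                         # B^{m+1} = B^m + g^m x^m
        r_new = A @ B / N
        # check N r_B^{m+1} = N r_B^m + g^m y^m up to rounding error
        max_identity_err = max(max_identity_err, abs(N * (r_new - r_old) - g * y))
    return A @ B / N, max_identity_err

r_sim, err = simulate_r_B()
r_ode = 1.0 - np.exp(-0.5 * 5.0)              # solution of dr_B/dt = eta (1 - r_B)
print(r_sim, r_ode, err)
```

The per-step identity holds exactly (up to floating-point rounding), while the agreement between the simulated r_B and the ODE solution improves as N grows, which is the sense in which the average ⟨gy⟩ replaces the individual g^m y^m.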
Dynamical Behaviors of Generalization Errors
[Figure: generalization errors of J and B versus t = m/N, for ηJ = 1.2 (left panel) and ηJ = 0.3 (right panel)]
Dynamical Behaviors of R and l
[Figure: RJ, RB, lJ and lB versus t = m/N, for ηJ = 1.2 (left panel) and ηJ = 0.3 (right panel)]
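To see how the order parameters R and l determine a generalization error, here is a minimal sketch for the simplest case only. It assumes a noiseless linear teacher y = A·x and a linear student v = J·x, with x_i ~ N(0, 1/N), |A| = √N, R = A·J/(|A||J|), l = |J|/√N, and the squared-error definition ε = ½⟨(y − v)²⟩. Under these assumptions ε = ½(1 − 2Rl + l²). The model on the slides includes noise, so its expressions contain additional terms; the function names here are hypothetical.

```python
import numpy as np

def eps_linear(R, l):
    """Assumed generalization error 0.5 * E[(y - v)^2] for a noiseless
    linear teacher and a linear student, written in terms of R and l."""
    return 0.5 * (1.0 - 2.0 * R * l + l * l)

def eps_monte_carlo(R, l, N=50, M=100_000, seed=0):
    """Estimate the same quantity by sampling inputs directly."""
    rng = np.random.default_rng(seed)
    a_hat = rng.standard_normal(N)
    a_hat /= np.linalg.norm(a_hat)
    u = rng.standard_normal(N)
    u -= (u @ a_hat) * a_hat                  # make u orthogonal to A
    u /= np.linalg.norm(u)
    A = np.sqrt(N) * a_hat                    # |A| = sqrt(N)
    # J has direction cosine R with A and length sqrt(N) * l
    J = np.sqrt(N) * l * (R * a_hat + np.sqrt(1.0 - R * R) * u)
    X = rng.standard_normal((M, N)) / np.sqrt(N)
    diff = X @ (A - J)                        # y - v for every sampled input
    return 0.5 * np.mean(diff ** 2)

print(eps_linear(0.8, 1.2))                   # 0.5 * (1 - 1.92 + 1.44) = 0.26
print(eps_monte_carlo(0.8, 1.2))
```

The formula makes explicit why a large l can raise the error even when R is close to 1: the error vanishes only at R = 1, l = 1.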
Steady State
[Figure: steady-state values versus ηJ: generalization errors of J and B (logarithmic scale), lJ and lB, and RJ and RB]
Conclusions
• The generalization errors of a model composed of a true teacher, a moving teacher, and a student, all of which are linear perceptrons with noise, have been obtained analytically using statistical mechanics.
• The generalization error of the student can be smaller than that of the moving teacher, even though the student uses only examples from the moving teacher.