A Statistical Mechanical Analysis of Online Learning: Can Student be more Clever than Teacher?

Seiji MIYOSHI
Kobe City College of Technology

Background (1)

• Batch Learning
  – Examples are used repeatedly
  – Correct answers for all examples
  – Long time
  – Large memory

• Online Learning
  – Examples used once are discarded
  – Cannot give correct answers for all examples
  – Large memory isn't necessary
  – Time-variant teacher

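Background (2)

[Figure: a teacher perceptron with connection weights B1 … BN and a student perceptron with connection weights J1 … JN, each receiving inputs x1 … xN]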

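Simple Perceptron

[Figure: a simple perceptron with inputs x1 … xN, connection weights J1 … JN, and a single output that takes the value +1 or -1]

$\mathrm{Output} = \mathrm{sgn}\left( \sum_{i=1}^{N} J_i x_i \right)$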

Background (2): Learnable Case

[Figure: the teacher and student perceptrons, drawn with the teacher weight vector B and the student weight vector J]

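Background (3)

[Figure: a teacher perceptron with connection weights B1 … BN and a student perceptron with connection weights J1 … JN, each receiving inputs x1 … xN]

Unlearnable Case (Inoue & Nishimori, Phys. Rev. E, 1997; Inoue, Nishimori & Kabashima, TANC-97, cond-mat/9708096, 1997)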

Background (4)

[Figure: the weight vectors B and J under Hebbian Learning and under Perceptron Learning]

Model (1)

• A: True Teacher
• B: Moving Teacher
• J: Student

Model (2)

Length of Student

Length of Moving Teacher

[Figure: true teacher A, moving teacher B, and student J]

Model (3)

[Figure: true teacher A, moving teacher B, and student J]

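[Figure: a perceptron with inputs x1 … xN, connection weights J1 … JN, and a single output]

Simple Perceptron

$\mathrm{Output} = \mathrm{sgn}\left( \sum_{i=1}^{N} J_i x_i \right)$

Linear Perceptron

$\mathrm{Output} = \sum_{i=1}^{N} J_i x_i$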
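The difference between the two output rules can be sketched in a few lines of Python. The function names and the numerical values below are illustrative assumptions, not taken from the slides, and the treatment of a weighted sum of exactly zero is a choice made here.

```python
def linear_output(weights, inputs):
    """Linear perceptron: the output is the weighted sum itself."""
    return sum(j * x for j, x in zip(weights, inputs))


def simple_output(weights, inputs):
    """Simple perceptron: the output is the sign (+1 or -1) of the weighted sum."""
    # Assumption: a sum of exactly zero is mapped to +1; the slides do not say.
    return 1 if linear_output(weights, inputs) >= 0 else -1


J = [0.5, -1.0, 2.0]   # illustrative connection weights
x = [1.0, 1.0, -0.5]   # illustrative inputs

print(linear_output(J, x))  # weighted sum: 0.5 - 1.0 - 1.0 = -1.5
print(simple_output(J, x))  # sign of the weighted sum: -1
```

The linear perceptron keeps the magnitude of the weighted sum, which is what makes squared-error gradient learning straightforward to analyze.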

Model (3): Linear Perceptrons with Noises

[Figure: true teacher A, moving teacher B, and student J]

Model (4)

Squared Errors

Gradient Method

[Figure: true teacher A, moving teacher B, and student J, with the update terms labeled f and g]

Generalization Error

[Figure: true teacher A, moving teacher B, and student J, with the annotations "Error" and "Gaussian"]

Differential equations for order parameters

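\[
B^{m+1} = B^{m} + g^{m} x^{m}
\]

\[
\begin{aligned}
N r_B^{m+1} &= N r_B^{m} + g^{m} y^{m} \\
N r_B^{m+2} &= N r_B^{m+1} + g^{m+1} y^{m+1} \\
&\;\;\vdots \\
N r_B^{m+N dt} &= N r_B^{m+N dt-1} + g^{m+N dt-1} y^{m+N dt-1}
\end{aligned}
\]

Adding these N dt equations:

\[
\begin{aligned}
N r_B^{m+N dt} &= N r_B^{m} + N dt \, \langle g y \rangle \\
N (r_B + dr_B) &= N r_B + N dt \, \langle g y \rangle \\
\frac{dr_B}{dt} &= \langle g y \rangle
\end{aligned}
\]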
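As a worked illustration of the last line, the average can be evaluated for one concrete choice of g. Everything in this choice is an assumption, since the update function is not legible in the extracted slides: the moving teacher is taken to follow the gradient of the squared error between its own noisy linear output and that of the true teacher with learning rate \eta_B; y = A \cdot x and N r_B = A \cdot B; the inputs have variance 1/N and |A|^2 = N, so that \langle y^2 \rangle = 1; and the noises n_A and n_B have zero mean and are independent of x.

```latex
\begin{align*}
g &= \eta_B \left( (y + n_A) - (B \cdot x + n_B) \right), \\
\langle g y \rangle &= \eta_B \left( \langle y^2 \rangle - \langle y \, (B \cdot x) \rangle \right)
  = \eta_B (1 - r_B), \\
\frac{d r_B}{dt} &= \eta_B (1 - r_B)
\quad\Longrightarrow\quad
r_B(t) = 1 - \left( 1 - r_B(0) \right) e^{-\eta_B t}.
\end{align*}
```

Under these assumptions the noise terms drop out of the average, and r_B relaxes exponentially toward 1 at rate \eta_B.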

Sample Averages

Analytical Solutions of Order Parameters

Dynamical Behaviors of Generalization Errors

[Figure: two panels, ηJ = 1.2 and ηJ = 0.3, each plotting the generalization errors of the student J and the moving teacher B against t = m/N for t from 0 to 20]

Dynamical Behaviors of R and l

[Figure: two panels, ηJ = 1.2 and ηJ = 0.3, each plotting RB, RJ, lB, and lJ against t = m/N for t from 0 to 20]

Steady State

[Figure: three panels plotted against ηJ from 0 to 2: the generalization errors of the student J and the moving teacher B (logarithmic vertical axis), the lengths lJ and lB, and the overlaps RJ and RB]

[Figure: configurations of the true teacher A, the moving teacher B, and the student J along an ηJ axis]

Conclusions

• The generalization errors of a model composed of a true teacher, a moving teacher, and a student, all of which are linear perceptrons with noise, have been obtained analytically using statistical mechanics.

• The generalization error of the student can be smaller than that of the moving teacher, even though the student only uses examples from the moving teacher.
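The second conclusion can be explored numerically with a small simulation. This is only a sketch under stated assumptions, not the analysis from the slides: the update rules are plain gradient steps on the squared errors, the inputs are Gaussian with variance 1/N, all three noise variances are set to 0.2, ηB is set to 1.0, and the function name `simulate` and its parameters are hypothetical. Instead of the generalization error itself, it reports the time-averaged squared distance per component from each learner to the true teacher A.

```python
import random


def simulate(eta_j, eta_b=1.0, n=100, t_max=20, noise_var=0.2, seed=0):
    """Return time-averaged squared distances (|B-A|^2/N, |J-A|^2/N)."""
    rng = random.Random(seed)
    sd = noise_var ** 0.5
    a = [rng.gauss(0.0, 1.0) for _ in range(n)]  # true teacher A (fixed)
    b = [rng.gauss(0.0, 1.0) for _ in range(n)]  # moving teacher B
    j = [rng.gauss(0.0, 1.0) for _ in range(n)]  # student J
    steps = t_max * n      # time t = m / N
    burn_in = steps // 2   # average only over the second half of the run
    sum_b = sum_j = 0.0
    for m in range(steps):
        # Assumed input distribution: independent Gaussians with variance 1/N.
        x = [rng.gauss(0.0, n ** -0.5) for _ in range(n)]
        # Noisy linear outputs of the three perceptrons.
        out_a = sum(ai * xi for ai, xi in zip(a, x)) + rng.gauss(0.0, sd)
        out_b = sum(bi * xi for bi, xi in zip(b, x)) + rng.gauss(0.0, sd)
        out_j = sum(ji * xi for ji, xi in zip(j, x)) + rng.gauss(0.0, sd)
        # Gradient steps on squared errors: B learns from A, J learns from B.
        g = eta_b * (out_a - out_b)
        f = eta_j * (out_b - out_j)
        b = [bi + g * xi for bi, xi in zip(b, x)]
        j = [ji + f * xi for ji, xi in zip(j, x)]
        if m >= burn_in:
            sum_b += sum((bi - ai) ** 2 for ai, bi in zip(a, b)) / n
            sum_j += sum((ji - ai) ** 2 for ai, ji in zip(a, j)) / n
    count = steps - burn_in
    return sum_b / count, sum_j / count


for eta_j in (0.3, 1.2):
    dist_b, dist_j = simulate(eta_j)
    print(f"eta_J={eta_j}: moving teacher {dist_b:.3f}, student {dist_j:.3f}")
```

Under these assumed settings the student should end up closer to A than the moving teacher for ηJ = 0.3 and farther away for ηJ = 1.2. Intuitively, a small ηJ makes the student average over the moving teacher's noise-driven fluctuations around A.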
