Date post: 16-Dec-2015
A Statistical Mechanical Analysis of Online Learning: Can Student be more Clever than Teacher?

Seiji MIYOSHI, Kobe City College of Technology

[email protected]

2

Background (1)

• Batch Learning
– Examples are used repeatedly
– Correct answers for all examples
– Long time
– Large memory

• Online Learning
– Examples used once are discarded
– Cannot give correct answers for all examples
– Large memory isn't necessary
– Time variant teacher

3

Background (2)

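[Figure: a teacher perceptron with connection weights B1, ..., BN and a student perceptron with connection weights J1, ..., JN, each receiving inputs x1, ..., xN]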

4

Simple Perceptron

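[Figure: inputs x1, ..., xN, connection weights J1, ..., JN, and a single output taking the value +1 or -1]

$$\text{Output} = \operatorname{sgn}\left(\sum_{i=1}^{N} J_i x_i\right)$$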
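The output rule of the simple perceptron can be sketched in a few lines of Python. This is only an illustration: the function name `simple_perceptron_output`, the example weights and inputs, and the choice to map a zero sum to +1 are assumptions, not part of the slides.

```python
import numpy as np

def simple_perceptron_output(weights, inputs):
    """Return sgn(sum_i J_i x_i) as +1 or -1."""
    activation = float(np.dot(weights, inputs))
    # Assumption: a sum of exactly zero is mapped to +1; the slide does not say.
    return 1 if activation >= 0 else -1

# Illustrative values only (not taken from the slides).
J = np.array([0.5, -1.0, 2.0])   # connection weights J_1..J_N
x = np.array([1.0, 1.0, 0.5])    # inputs x_1..x_N

output = simple_perceptron_output(J, x)          # weighted sum is 0.5
flipped = simple_perceptron_output(J, -x)        # weighted sum is -0.5
```

Negating every input negates the weighted sum, so the output flips sign whenever the sum is nonzero.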

5

Background (2)

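[Figure: teacher perceptron (weights B1, ..., BN) and student perceptron (weights J1, ..., JN) with inputs x1, ..., xN, together with diagrams of the weight vectors B and J]

Learnable Case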

6

Background (3)

[Figure: teacher perceptron (weights B1, ..., BN) and student perceptron (weights J1, ..., JN) with inputs x1, ..., xN]

Unlearnable Case (Inoue & Nishimori, Phys. Rev. E, 1997) (Inoue, Nishimori & Kabashima, TANC-97, cond-mat/9708096, 1997)

7

Background (4)

Hebbian Learning

Perceptron Learning

[Figure: diagrams of the weight vectors B and J]

8

Model (1)

A: True Teacher

B: Moving Teacher

J: Student

9

Model (2)

Length of Student

Length of Moving Teacher

[Figure: vectors A, B, and J]

10

Model (3)

[Figure: vectors A, B, and J]

11

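[Figure: inputs x1, ..., xN, connection weights J1, ..., JN, and a single output]

Simple Perceptron

$$\text{Output} = \operatorname{sgn}\left(\sum_{i=1}^{N} J_i x_i\right)$$

Linear Perceptron

$$\text{Output} = \sum_{i=1}^{N} J_i x_i$$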
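The difference between the two units is only whether the sign function is applied to the weighted sum. A minimal sketch, with hypothetical function names and randomly drawn illustrative values:

```python
import numpy as np

def linear_output(weights, inputs):
    """Linear perceptron: the weighted sum itself."""
    return float(np.dot(weights, inputs))

def simple_output(weights, inputs):
    """Simple perceptron: the sign of the same weighted sum.
    Assumption: a sum of exactly zero is mapped to +1."""
    return 1 if linear_output(weights, inputs) >= 0 else -1

rng = np.random.default_rng(0)
N = 5                              # illustrative input dimension
J = rng.normal(size=N)             # connection weights
x = rng.normal(size=N)             # inputs

u = linear_output(J, x)            # real-valued output
s = simple_output(J, x)            # +1 or -1
u_doubled = linear_output(2 * J, x)
```

Doubling the weights doubles the linear output but leaves the simple perceptron's output unchanged, which is why the length of the weight vector matters for the linear model.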

12

Model (3)

Linear Perceptrons with Noises

[Figure: vectors A, B, and J]
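As a rough illustration of "linear perceptrons with noises", the sketch below adds a noise term to each linear output. The slide's own noise model did not survive extraction, so the additive zero-mean Gaussian noise, the standard deviations, the input scaling of variance 1/N, and all names here are assumptions.

```python
import numpy as np

def noisy_linear_output(weights, inputs, noise_std, rng):
    """Linear output plus additive Gaussian noise (assumed noise model)."""
    return float(np.dot(weights, inputs)) + rng.normal(0.0, noise_std)

rng = np.random.default_rng(0)
N = 1000
A = rng.normal(size=N)                              # true teacher (illustrative)
B = rng.normal(size=N)                              # moving teacher (illustrative)
J = np.zeros(N)                                     # student (illustrative start)
x = rng.normal(0.0, 1.0 / np.sqrt(N), size=N)       # assumed input distribution

# Hypothetical noise levels for the three outputs.
out_A = noisy_linear_output(A, x, 0.1, rng)
out_B = noisy_linear_output(B, x, 0.2, rng)
out_J = noisy_linear_output(J, x, 0.3, rng)

# With zero noise the output reduces to the plain weighted sum.
clean_A = noisy_linear_output(A, x, 0.0, rng)
```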

13

Model (4)

Squared Errors

Gradient Method

[Figure: vectors A, B, and J, with update labels f and g]
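A gradient step on a squared error can be sketched as follows. The slide's formulas are not recoverable, so this assumes the error is half the squared difference between a teacher output and a student output, with a learning rate `eta`; the function name and all numerical values are hypothetical.

```python
import numpy as np

def gradient_step(student, inputs, teacher_output, eta):
    """One gradient-descent step on E = 0.5 * (teacher_output - student . x)^2.

    dE/dJ = -(teacher_output - J . x) * x, so J <- J + eta * (difference) * x.
    """
    difference = teacher_output - float(np.dot(student, inputs))
    return student + eta * difference * inputs

rng = np.random.default_rng(0)
N = 1000
B = rng.normal(size=N)                              # teacher weights (illustrative)
J = np.zeros(N)                                     # student weights (illustrative)
x = rng.normal(0.0, 1.0 / np.sqrt(N), size=N)       # assumed input, |x|^2 close to 1

teacher_output = float(np.dot(B, x))                # noise omitted for brevity
error_before = teacher_output - float(np.dot(J, x))
J_new = gradient_step(J, x, teacher_output, eta=0.3)
error_after = teacher_output - float(np.dot(J_new, x))
```

On the example just used, the difference shrinks by the factor 1 - eta |x|^2, so a moderate `eta` reduces the error on that example.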

14

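Generalization Error

[Equation residue: the labels "Error" and "Gaussian" annotating the generalization error formula]

[Figure: vectors A, B, and J]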
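A generalization error of this kind can be estimated numerically by averaging an error over many random inputs. The sketch below assumes the error is half the squared output difference and that input components are independent Gaussians with variance 1/N; noise terms are left out, and every name is hypothetical.

```python
import numpy as np

def estimate_generalization_error(teacher, student, n_samples, rng):
    """Monte Carlo average of 0.5 * (teacher . x - student . x)^2 over Gaussian x."""
    N = teacher.shape[0]
    X = rng.normal(0.0, 1.0 / np.sqrt(N), size=(n_samples, N))
    difference = X @ teacher - X @ student
    return float(np.mean(0.5 * difference ** 2))

rng = np.random.default_rng(0)
N = 50
A = rng.normal(size=N)     # teacher weights (illustrative)
J = rng.normal(size=N)     # student weights (illustrative)

estimate = estimate_generalization_error(A, J, 20_000, rng)

# Under the stated assumptions the average has the closed form |A - J|^2 / (2N).
exact = 0.5 * float(np.sum((A - J) ** 2)) / N
```

Under these assumptions the error depends on the weights only through the squared distance between them, which can be written in terms of lengths and overlaps of the vectors.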

15

Differential equations for order parameters

16

Model (4)

Squared Errors

Gradient Method

[Figure: vectors A, B, and J, with update labels f and g]

17

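$$\boldsymbol{B}^{m+1} = \boldsymbol{B}^{m} + g^{m} \boldsymbol{x}^{m}$$

$$N r_B^{m+1} = N r_B^{m} + g^{m} y^{m}$$

$$N r_B^{m+2} = N r_B^{m+1} + g^{m+1} y^{m+1}$$

$$\vdots$$

$$N r_B^{m+N dt} = N r_B^{m+N dt-1} + g^{m+N dt-1} y^{m+N dt-1}$$

Summing these $N dt$ equations:

$$N r_B^{m+N dt} = N r_B^{m} + N dt \langle g y \rangle$$

$$N (r_B + dr_B) = N r_B + N dt \langle g y \rangle$$

$$\frac{dr_B}{dt} = \langle g y \rangle$$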
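The averaging step, in which the sum of N dt increments g y is replaced by N dt times the mean of g y, can be checked numerically. The sketch below uses a stand-in that is not from the slides: g and y are drawn as identical standard Gaussians, so their mean product is 1, and the increments are treated as independent.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 100_000
dt = 0.1
steps = round(N * dt)                 # number of updates summed, N dt

# Hypothetical stand-in for the update quantities: g^m = y^m, y^m ~ N(0, 1),
# so the sample average <g y> should be close to 1.
y = rng.normal(size=steps)
g = y.copy()

N_rB_start = 0.0
N_rB_end = N_rB_start + float(np.sum(g * y))   # telescoped sum of N dt updates

# Finite-difference estimate of d r_B / dt over the interval dt.
drB_dt = (N_rB_end - N_rB_start) / (N * dt)
```

The ratio approaches the mean of g y as N grows, which is the content of the last line of the derivation.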

18

Differential equations for order parameters

19

Sample Averages

20

Differential equations for order parameters

21

Analytical Solutions of Order Parameters

22

Differential equations for order parameters

23

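Generalization Error

[Equation residue: the labels "Error" and "Gaussian" annotating the generalization error formula]

[Figure: vectors A, B, and J]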

24

Dynamical Behaviors of Generalization Errors

[Figure: two panels, ηJ = 1.2 and ηJ = 0.3, each plotting the generalization errors of the student J and the moving teacher B against t = m/N for t from 0 to 20]

25

Dynamical Behaviors of R and l

[Figure: two panels, ηJ = 1.2 and ηJ = 0.3, each plotting RJ, RB, lJ, and lB against t = m/N for t from 0 to 20]

26

Analytical Solutions of Order Parameters

27

Steady State

[Figure: three panels plotted against ηJ from 0 to 2: the generalization errors of J and B (vertical axis ticks 0.1, 1, 10), lJ and lB, and RJ and RB]

28

[Figure: diagrams of the vectors A, B, and J arranged along an ηJ axis; details lost in extraction]

29

Conclusions

• Generalization errors of a model composed of a true teacher, a moving teacher, and a student, all of which are linear perceptrons with noises, have been obtained analytically using statistical mechanics.

• The generalization error of a student can be smaller than that of a moving teacher, even if the student only uses examples from the moving teacher.
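A small simulation of the three-perceptron setup can make the conclusions concrete. It is a sketch under assumptions the slides do not confirm: gradient updates on half the squared output difference, additive Gaussian output noise, Gaussian inputs with variance 1/N, zero initial weights, and an error measure equal to the noise-free squared distance to the true teacher. All names and parameter values are hypothetical, and it does not reproduce the analytical results.

```python
import numpy as np

def simulate(N=200, time=10, eta_B=1.0, eta_J=0.3,
             sigma_A=0.0, sigma_B=0.0, sigma_J=0.0, seed=0):
    """Online learning: B learns from A, J learns only from B's outputs."""
    rng = np.random.default_rng(seed)
    A = rng.normal(size=N)      # true teacher, fixed
    B = np.zeros(N)             # moving teacher (assumed initial state)
    J = np.zeros(N)             # student (assumed initial state)
    for _ in range(time * N):   # t = m / N runs from 0 to `time`
        x = rng.normal(0.0, 1.0 / np.sqrt(N), size=N)
        out_A = A @ x + rng.normal(0.0, sigma_A)
        out_B = B @ x + rng.normal(0.0, sigma_B)
        out_J = J @ x + rng.normal(0.0, sigma_J)
        # The student sees only the moving teacher's output, never A's.
        J = J + eta_J * (out_B - out_J) * x
        B = B + eta_B * (out_A - out_B) * x
    # Noise-free error relative to A for Gaussian inputs of variance 1/N.
    err_B = 0.5 * float(np.sum((A - B) ** 2)) / N
    err_J = 0.5 * float(np.sum((A - J) ** 2)) / N
    return err_B, err_J

err_B, err_J = simulate()       # noise-free run with illustrative settings
```

Comparing `err_J` with `err_B` for nonzero noise levels and different `eta_J` values is one way to explore when the student's error falls below the moving teacher's; the outcome depends on the chosen parameters.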

