Tutorial : Echo State Networks Dan Popovici University of Montreal (UdeM) MITACS 2005
Transcript
Page 1: Tutorial : Echo State Networks Dan Popovici University of Montreal (UdeM) MITACS 2005.

Tutorial : Echo State Networks

Dan Popovici

University of Montreal (UdeM)

MITACS 2005

Page 2: Tutorial : Echo State Networks Dan Popovici University of Montreal (UdeM) MITACS 2005.

Overview

1. Recurrent neural networks: a 1-minute primer

2. Echo state networks

3. Examples, examples, examples

4. Open Issues

Page 3: Tutorial : Echo State Networks Dan Popovici University of Montreal (UdeM) MITACS 2005.

1 Recurrent neural networks

Page 4: Tutorial : Echo State Networks Dan Popovici University of Montreal (UdeM) MITACS 2005.

Feedforward- vs. recurrent NN

[Figure: a feedforward network and a recurrent network, each with input and output units]

• connections only "from left to right", no connection cycle

• activation is fed forward from input to output through "hidden layers"

• no memory

• at least one connection cycle

• activation can "reverberate", persist even with no input

• system with memory

Page 5: Tutorial : Echo State Networks Dan Popovici University of Montreal (UdeM) MITACS 2005.

recurrent NNs, main properties

• input time series → output time series

• can approximate any dynamical system (universal approximation property)

• mathematical analysis difficult

• learning algorithms computationally expensive and difficult to master

• few application-oriented publications, little research


Page 6: Tutorial : Echo State Networks Dan Popovici University of Montreal (UdeM) MITACS 2005.

Supervised training of RNNs

A. Training

Teacher: provides an input signal and the correct output signal

Model: trained to reproduce the teacher output from the input

B. Exploitation

Input: a new input signal; correct output: unknown

Model: produces an estimate of the output

Page 7: Tutorial : Echo State Networks Dan Popovici University of Montreal (UdeM) MITACS 2005.

Backpropagation through time (BPTT)

• Most widely used general-purpose supervised training algorithm

• Idea: 1. stack network copies, 2. interpret as feedforward network, 3. use backprop algorithm.

[Figure: the original RNN is unrolled into a stack of copies and trained as a feedforward network]
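To make the unrolling idea concrete, here is a small Python/NumPy sketch of BPTT for a plain tanh RNN (an assumed illustration, not the tutorial's code; all names are made up): the forward pass builds the stack of copies, and the backward pass pushes the output error back through them.

import numpy as np

def bptt(U, D, W, W_in, W_out):
    """Gradients of L = 0.5 * sum_t ||W_out x(t) - d(t)||^2 for a tanh RNN.
    U: (T, K) inputs, D: (T, L) targets; W (N,N), W_in (N,K), W_out (L,N)."""
    T, N = U.shape[0], W.shape[0]
    X = np.zeros((T + 1, N))                       # X[t+1] is the state after input U[t]
    for t in range(T):                             # forward pass through the copies
        X[t + 1] = np.tanh(W @ X[t] + W_in @ U[t])
    gW, gW_in, gW_out = np.zeros_like(W), np.zeros_like(W_in), np.zeros_like(W_out)
    dx_next = np.zeros(N)                          # gradient arriving from time t+1
    for t in reversed(range(T)):                   # backward pass through the copies
        err = W_out @ X[t + 1] - D[t]              # output error at step t
        gW_out += np.outer(err, X[t + 1])
        dx = W_out.T @ err + dx_next               # total gradient w.r.t. X[t+1]
        delta = dx * (1.0 - X[t + 1] ** 2)         # through the tanh derivative
        gW += np.outer(delta, X[t])
        gW_in += np.outer(delta, U[t])
        dx_next = W.T @ delta                      # hand back to the previous copy
    return gW, gW_in, gW_out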

Page 8: Tutorial : Echo State Networks Dan Popovici University of Montreal (UdeM) MITACS 2005.

What are ESNs?

• training method for recurrent neural networks

• black-box modelling of nonlinear dynamical systems

• supervised training, offline and online

• exploits linear methods for nonlinear modeling

[Figure: "Previously", all connections were adapted in training; with "ESN training", only the reservoir-to-output connections are adapted]

Page 9: Tutorial : Echo State Networks Dan Popovici University of Montreal (UdeM) MITACS 2005.

Introductory example: a tone generator

Goal: train a network to work as a tuneable tone generator

input: frequency setting

output: sines of the desired frequency

[Plots: frequency-setting input and sine output over 100 time steps]

Page 10: Tutorial : Echo State Networks Dan Popovici University of Montreal (UdeM) MITACS 2005.

Tone generator: sampling

• For the sampling period, drive the fixed "reservoir" network with the teacher input and output.

[Plots: teacher input and output signals]

• Observation: internal states of dynamical reservoir reflect both input and output teacher signals

[Plots: traces of four internal reservoir units]

Page 11: Tutorial : Echo State Networks Dan Popovici University of Montreal (UdeM) MITACS 2005.

Tone generator: compute weights

• Determine reservoir-to-output weights such that training output is optimally reconstituted from internal "echo" signals.

[Plots: teacher input and output, and the internal state traces from which the output weights are computed]

Page 12: Tutorial : Echo State Networks Dan Popovici University of Montreal (UdeM) MITACS 2005.

Tone generator: exploitation

• With new output weights in place, drive trained network with input.

• Observation: network continues to function as in training.

– internal states reflect input and output

– output is reconstituted from internal states

• internal states and output create each other

[Plots: input, output, and internal state traces during exploitation — the internal states "echo" the driving signals, and the output is "reconstituted" from them]
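A minimal Python/NumPy sketch of this exploitation step (assumed code, not the tutorial's; the matrices W, W_in, W_back, W_out follow the notation introduced later, on Page 16): the trained output weights reconstitute y(n) from the echo states, and that output is fed back into the reservoir in place of the teacher signal.

import numpy as np

def run_esn(U, W, W_in, W_back, W_out):
    """Drive the trained network with input U (T, K) and return its outputs."""
    T, N, L = U.shape[0], W.shape[0], W_back.shape[1]
    x, y = np.zeros(N), np.zeros(L)
    Y = np.zeros((T, L))
    for n in range(T):
        x = np.tanh(W_in @ U[n] + W @ x + W_back @ y)   # states echo input and output
        y = W_out @ x                                   # output reconstituted from states
        Y[n] = y
    return Y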

Page 13: Tutorial : Echo State Networks Dan Popovici University of Montreal (UdeM) MITACS 2005.

Tone generator: generalization

The trained generator network also works with input different from the training input.

[Plots — A: step input, B: teacher and learned output, C: some internal states]

Page 14: Tutorial : Echo State Networks Dan Popovici University of Montreal (UdeM) MITACS 2005.

Dynamical reservoir

• large recurrent network (100 - 1000 units)

• works as "dynamical reservoir", "echo chamber"

• units in DR respond differently to excitation

• output units combine different internal dynamics into desired dynamics

[Figure: input units drive the recurrent "dynamical reservoir"; output units read out from it]

Page 15: Tutorial : Echo State Networks Dan Popovici University of Montreal (UdeM) MITACS 2005.

Rich excited dynamics

[Plots: an excitation impulse and the widely differing responses of individual reservoir units]

Unit impulse responses should vary greatly.

Achieve this by, e.g.,

• inhomogeneous connectivity

• random weights

• different time constants

• ...

Page 16: Tutorial : Echo State Networks Dan Popovici University of Montreal (UdeM) MITACS 2005.

Notation and Update Rules

u(n) = (u_1(n), ..., u_K(n))'   — the K input units

x(n) = (x_1(n), ..., x_N(n))'   — the N internal (reservoir) units

y(n) = (y_1(n), ..., y_L(n))'   — the L output units

W^in = (w^in_ij), W = (w_ij), W^out = (w^out_ij), W^back = (w^back_ij) — input, internal, output, and output-to-reservoir feedback weight matrices

x(n+1) = f(W^in u(n+1) + W x(n) + W^back y(n))

y(n+1) = f^out(W^out (u(n+1), x(n+1), y(n)))
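These update rules translate directly into code; below is a minimal Python/NumPy sketch (assumed helper names with f = tanh by default, not code from the original tutorial):

import numpy as np

def esn_step(x, u_next, y, W, W_in, W_back, f=np.tanh):
    """Internal state update: x(n+1) = f(W_in u(n+1) + W x(n) + W_back y(n))."""
    return f(W_in @ u_next + W @ x + W_back @ y)

def esn_output(x_next, u_next, y, W_out, f_out=np.tanh):
    """Output update: y(n+1) = f_out(W_out (u(n+1), x(n+1), y(n)))."""
    z = np.concatenate([u_next, x_next, y])
    return f_out(W_out @ z)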

Page 17: Tutorial : Echo State Networks Dan Popovici University of Montreal (UdeM) MITACS 2005.

Learning: basic idea

Every stationary deterministic dynamical system can be defined by an equation like

d(t) = h(u(t), u(t-1), ..., d(t-1), d(t-2), ...)

where the system function h might be a monster.

Combine h from the I/O echo functions by selecting suitable DR-to-output weights w_i:

d(t) ≈ y(t) = Σ_i w_i x_i(t) = Σ_i w_i h_i(u(t), u(t-1), ..., y(t-1), ...)

Page 18: Tutorial : Echo State Networks Dan Popovici University of Montreal (UdeM) MITACS 2005.

Offline training: task definition

Recall y(t) = Σ_i w_i x_i(t).

Let d(t) be the teacher output.

Compute weights w_i such that the mean square error

E[(d(t) - y(t))²] = E[(d(t) - Σ_i w_i x_i(t))²]

is minimized.

Page 19: Tutorial : Echo State Networks Dan Popovici University of Montreal (UdeM) MITACS 2005.

Offline training: how it works

1. Let the network run with the training signal d(t) teacher-forced.

2. During this run, collect the network states x_i(t) in a matrix M.

3. Compute weights w_i such that

E[(d(t) - Σ_i w_i x_i(t))²]

is minimized.

The MSE-minimizing weight computation (step 3) is a standard operation.

Many efficient implementations are available, offline/constructive and online/adaptive, e.g.

w = M⁻¹ T   (with T collecting the teacher outputs d(t))
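Step 3 is ordinary linear regression, and a minimal Python/NumPy sketch is given below (assumed helper; the small ridge term is a common numerical-stability variant rather than something prescribed by the slide):

import numpy as np

def compute_output_weights(M, D, ridge=1e-8):
    """M: (T, N) collected states x(t); D: (T, L) teacher outputs d(t).
    Returns W_out (L, N) minimizing the mean squared error."""
    A = M.T @ M + ridge * np.eye(M.shape[1])
    return np.linalg.solve(A, M.T @ D).T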

Page 20: Tutorial : Echo State Networks Dan Popovici University of Montreal (UdeM) MITACS 2005.

Practical Considerations

W, W^in, and W^back are chosen randomly.

• Spectral radius of W < 1

• W should be sparse

• Input and feedback weights have to be scaled “appropriately”

• Adding noise in the update rule can increase generalization performance
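One way to realize these constraints is sketched below in Python/NumPy (assumed code; the density, spectral radius, and scaling values are illustrative parameters, not numbers given in the tutorial):

import numpy as np

def init_reservoir(N, K, L, density=0.05, spectral_radius=0.8,
                   input_scale=1.0, feedback_scale=1.0, seed=0):
    """Randomly create W (N x N, sparse), W_in (N x K) and W_back (N x L)."""
    rng = np.random.default_rng(seed)
    W = rng.uniform(-1, 1, size=(N, N))
    W[rng.random((N, N)) > density] = 0.0           # make W sparse
    rho = max(abs(np.linalg.eigvals(W)))            # current spectral radius
    W *= spectral_radius / rho                      # rescale to radius < 1
    W_in = input_scale * rng.uniform(-1, 1, size=(N, K))
    W_back = feedback_scale * rng.uniform(-1, 1, size=(N, L))
    return W, W_in, W_back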

Page 21: Tutorial : Echo State Networks Dan Popovici University of Montreal (UdeM) MITACS 2005.

Echo state network training, summary

• use large recurrent network as "excitable dynamical reservoir (DR)"

• DR is not modified through learning

• adapt only the DR-to-output weights

• thereby combine the desired system function from I/O history echo functions

• use any offline or online linear regression algorithm to minimize the error

E[(d(t) - y(t))²]
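Putting the pieces together, a compact teacher-forced training run might look as follows (assumed Python/NumPy glue code; the washout length is illustrative, and the readout uses only the reservoir states, as in the formula above):

import numpy as np

def train_esn(U, D, W, W_in, W_back, washout=100, ridge=1e-8):
    """U: (T, K) inputs, D: (T, L) teacher outputs. Returns W_out (L, N)."""
    T, N = U.shape[0], W.shape[0]
    x = np.zeros(N)
    y = np.zeros(D.shape[1])
    states = []
    for n in range(T):
        x = np.tanh(W_in @ U[n] + W @ x + W_back @ y)
        y = D[n]                                  # teacher forcing of the feedback
        states.append(x.copy())
    M = np.array(states[washout:])                # drop the initial transient
    A = M.T @ M + ridge * np.eye(N)
    return np.linalg.solve(A, M.T @ D[washout:]).T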

Page 22: Tutorial : Echo State Networks Dan Popovici University of Montreal (UdeM) MITACS 2005.

3 Examples, examples, examples

Page 23: Tutorial : Echo State Networks Dan Popovici University of Montreal (UdeM) MITACS 2005.

3.1 Short-term memories

Page 24: Tutorial : Echo State Networks Dan Popovici University of Montreal (UdeM) MITACS 2005.

Delay line: scheme

[Figure: delay line — input s(t); outputs s(t-d_1), ..., s(t-d_n)]
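A small Python/NumPy sketch of how the delayed teacher outputs s(t-d_1), ..., s(t-d_n) could be assembled from the input signal (assumed helper, with zero padding for the first steps):

import numpy as np

def delay_targets(s, delays):
    """Return a (T, n) matrix whose column j holds s(t - delays[j])."""
    s = np.asarray(s)
    T = len(s)
    D = np.zeros((T, len(delays)))
    for j, d in enumerate(delays):
        D[d:, j] = s[:T - d]           # shift the signal by d steps
    return D

# e.g. targets = delay_targets(signal, [1, 30, 60, 70, 80, 90, 100, 103, 106, 120])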

Page 25: Tutorial : Echo State Networks Dan Popovici University of Montreal (UdeM) MITACS 2005.

Delay line: example

• Network size 400

• Delays: 1, 30, 60, 70, 80, 90, 100, 103, 106, 120 steps

• Training sequence length N = 2000


training signal: random walk with resting states

Page 26: Tutorial : Echo State Networks Dan Popovici University of Montreal (UdeM) MITACS 2005.

results

[Plots: correct delayed signals and network outputs for delays -1, -30, -60, -90, -100, -103, -106, -120; traces of some DR internal units]

Page 27: Tutorial : Echo State Networks Dan Popovici University of Montreal (UdeM) MITACS 2005.

Delay line: test with different input

[Plots: correct delayed signals and network outputs for delays -1, -30, -60, -90, -100, -103, -106, -120; traces of some DR internal units]

Page 28: Tutorial : Echo State Networks Dan Popovici University of Montreal (UdeM) MITACS 2005.

3.2 Identification of nonlinear systems

Page 29: Tutorial : Echo State Networks Dan Popovici University of Montreal (UdeM) MITACS 2005.

Identifying higher-order nonlinear systems

A tenth-order system:

y(n+1) = 0.3 y(n) + 0.05 y(n) [y(n) + y(n-1) + ... + y(n-9)] + 1.5 u(n-9) u(n) + 0.1

Training setup

[Plots: input u(n) and teacher output y(n) used in the training setup]
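A minimal Python/NumPy sketch for generating input/output training data from this system (assumed code; drawing the input uniformly from [0, 0.5] is an assumption following the usual form of this benchmark, not a value stated on the slide):

import numpy as np

def tenth_order_system(T, seed=0):
    """Generate an input sequence u and the tenth-order system response y."""
    rng = np.random.default_rng(seed)
    u = rng.uniform(0.0, 0.5, size=T)                     # assumed input distribution
    y = np.zeros(T)
    for n in range(9, T - 1):
        y[n + 1] = (0.3 * y[n]
                    + 0.05 * y[n] * np.sum(y[n - 9:n + 1])  # y(n) + ... + y(n-9)
                    + 1.5 * u[n - 9] * u[n]
                    + 0.1)
    return u, y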

Page 30: Tutorial : Echo State Networks Dan Popovici University of Montreal (UdeM) MITACS 2005.

Results: offline learning

augmented ESN (800 parameters): NMSE_test = 0.006

previously published state of the art 1): NMSE_train = 0.24

D. Prokhorov, pers. communication 2): NMSE_test = 0.004


1) Atiya & Parlos (2000), IEEE Trans. Neural Networks 11(3), 697-708

2) EKF-RNN, 30 units, 1000 parameters.

Page 31: Tutorial : Echo State Networks Dan Popovici University of Montreal (UdeM) MITACS 2005.

The Mackey-Glass equation

• delay differential equation

• delay > 16.8: chaotic

• benchmark for time series prediction

dx(t)/dt = 0.2 x(t-τ) / (1 + x(t-τ)^10) - 0.1 x(t)

[Plots: time series and delay-embedding plots for τ = 17 and τ = 30]
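A rough Python/NumPy sketch for generating a Mackey-Glass series by Euler integration (assumed code; the step size and the constant initial history are illustrative choices, not taken from the tutorial):

import numpy as np

def mackey_glass(T, tau=17, dt=1.0, x0=1.2):
    """Generate T samples of dx/dt = 0.2 x(t-tau)/(1 + x(t-tau)^10) - 0.1 x(t)."""
    hist = int(tau / dt)               # number of delayed samples to keep
    x = np.full(T + hist, x0)          # constant initial history
    for t in range(hist, T + hist - 1):
        x_tau = x[t - hist]            # delayed value x(t - tau)
        x[t + 1] = x[t] + dt * (0.2 * x_tau / (1.0 + x_tau ** 10) - 0.1 * x[t])
    return x[hist:]

# e.g. series = mackey_glass(3000, tau=17)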

Page 32: Tutorial : Echo State Networks Dan Popovici University of Montreal (UdeM) MITACS 2005.

Learning setup

• network size 1000

• training sequence N = 3000

• sampling rate 1

[Plot: a 300-step excerpt of the training sequence]

Page 33: Tutorial : Echo State Networks Dan Popovici University of Montreal (UdeM) MITACS 2005.

Results for τ = 17

Error for 84-step prediction:

NRMSE = 1E-4.2

(averaged over 100 training runs on independently created data)

With refined training method:

NRMSE = 1E-5.1

previous best:

NRMSE = 1E-1.7

[Plots: delay-embedding plots of the original series and of the learnt model]

Page 34: Tutorial : Echo State Networks Dan Popovici University of Montreal (UdeM) MITACS 2005.

Prediction with model

visible discrepancy after about 1500 steps


Page 35: Tutorial : Echo State Networks Dan Popovici University of Montreal (UdeM) MITACS 2005.

Comparison: NRMSE for 84-step prediction

log10(NRMSE):

ESN (refined): -5.1

ESN (1+2 K): -4.2

PCR Local Model (McNames 99, 2 K): -1.7

SOM (Vesanto 97, 3 K): -1.7

DCS-LLM (Chudy & Farkas 98, 3 K): -1.7

AMB (Bersini et al 98, ? K) *: -1.3

Neural Gas (Martinez et al 93, ~4 K): -1.3

EPNet (Yao & Liu 97, 0.5 K): -1.2

BPNN (Lapedes & Farber 87, ? K) *: -1.2

*) data from survey in Gers / Eck / Schmidhuber 2000

Page 36: Tutorial : Echo State Networks Dan Popovici University of Montreal (UdeM) MITACS 2005.

3.3 Dynamic pattern recognition

Page 37: Tutorial : Echo State Networks Dan Popovici University of Montreal (UdeM) MITACS 2005.

Dynamic pattern detection1)

Training signal: the output jumps to 1 after each occurrence of the pattern instance in the input.

[Figure: input u(n) and desired output y(n)]

1) see GMD Report Nr 152 for detailed coverage

Page 38: Tutorial : Echo State Networks Dan Popovici University of Montreal (UdeM) MITACS 2005.

Single-instance patterns, training setup

1. A single-instance, 10-step pattern is randomly fixed

[Plot: the fixed 10-step pattern]

2. It is inserted into a 500-step random signal at positions 200 (for training) and 350, 400, 450, 500 (for testing)

3. A 100-unit ESN is trained on the first 300 steps (a single positive instance: "single-shot learning"), tested on the remaining 200 steps

[Plot: test data — 200 steps with 4 occurrences of the pattern on a random background; desired output: red impulses]

Page 39: Tutorial : Echo State Networks Dan Popovici University of Montreal (UdeM) MITACS 2005.

Single-instance patterns, results

[Plots: trained network responses on the test data, one panel per case below]

1. trained network response on test data

2. network response after training on 800 more pattern-free steps ("negative examples")

3. like 2., but with 5 positive examples in the training data

(discrimination ratios for cases 1-3, as labelled in the plots: DR = 12.4, 12.1, 6.4)

4. comparison: optimal linear filter — DR: 3.5

discrimination ratio DR = E[d²(n) | pattern present] / E[d²(n) | pattern absent]

Page 40: Tutorial : Echo State Networks Dan Popovici University of Montreal (UdeM) MITACS 2005.

Event detection for robots (joint work with J. Hertzberg & F. Schönherr)

Robot runs through office environment, experiences data streams (27 channels) like...

[Plots (10 sec excerpts): infrared distance sensor, left motor speed, activation of "goThruDoor", and the external teacher signal marking the event category]

Page 41: Tutorial : Echo State Networks Dan Popovici University of Montreal (UdeM) MITACS 2005.

Learning setup

[Figure: 27 (raw) data channels feed a 100-unit RNN, which drives an unlimited number of event detector channels]

• simulated robot (rich simulation)

• training run spans 15 simulated minutes

• event categories like

• pass through door

• pass by 90° corner

• pass by smooth corner

Page 42: Tutorial : Echo State Networks Dan Popovici University of Montreal (UdeM) MITACS 2005.

Results

[Plots: detector outputs for the event categories — corners w/o corridor end, corner + corridor end, corridor end, enter room from corridor, go through any door]

• easy to train event hypothesis signals

• "boolean" categories possible

• single-shot learning possible

Page 43: Tutorial : Echo State Networks Dan Popovici University of Montreal (UdeM) MITACS 2005.

Network setup in training

[Figure: 29 input channels coding the symbols _, a, ..., z; a 400-unit reservoir; 29 output channels for next-symbol hypotheses]

Page 44: Tutorial : Echo State Networks Dan Popovici University of Montreal (UdeM) MITACS 2005.

Trained network in "text" generation

[Figure: a decision mechanism, e.g. winner-take-all, picks the next symbol from the output hypotheses; the winning symbol is fed back as the next input]

Page 45: Tutorial : Echo State Networks Dan Popovici University of Montreal (UdeM) MITACS 2005.

Results

Selection by random draw according to output

yth_upsghteyshhfakeofw_io,l_yodoinglle_d_upeiuttytyr_hsymua_doey_sammusos_trll,t.krpuflvek_hwiblhooslolyoe,_wtheble_ft_a_gimllveteud_ ...

Winner-take-all selection

sdear_oh,_grandmamma,_who_will_go_and_the_wolf_said_the_wolf_said_the_wolf_said_the_wolf_said_the_wolf_said_the_wolf_said_the_wolf ...

Page 46: Tutorial : Echo State Networks Dan Popovici University of Montreal (UdeM) MITACS 2005.

4 Open Issues

Page 47: Tutorial : Echo State Networks Dan Popovici University of Montreal (UdeM) MITACS 2005.

4.2 Multiple timescales

4.3 Additive mixtures of dynamics

4.4 "Switching" memory

4.5 High-dimensional dynamics

Page 48: Tutorial : Echo State Networks Dan Popovici University of Montreal (UdeM) MITACS 2005.

Multiple time scales

This is hard to learn (Laser benchmark time series):

[Plot: laser benchmark time series, 500 steps]

Reason: 2 widely separated time scales

Approach for future research: ESNs with different time constants in their units

Page 49: Tutorial : Echo State Networks Dan Popovici University of Montreal (UdeM) MITACS 2005.

Additive dynamics

This proved impossible to learn:

Reason: requires 2 independent oscillators; but in ESN all dynamics are mutually coupled.

Approach for future research: modular ESNs and unsupervised multiple expert learning

y(n) = sin(0.2 n) + sin(0.311 n)

[Plot: the mixed two-sine target signal]

Page 50: Tutorial : Echo State Networks Dan Popovici University of Montreal (UdeM) MITACS 2005.

"Switching" memory

This FSA has long memory "switches":

Generating such sequences not possible with monotonic, area-bounded forgetting curves!

[Figure: a two-state automaton with self-loops on 'a' and switching transitions on 'b' and 'c', generating sequences like]

baaa....aaacaaa...aaabaaa...aaacaaa...aaa...

[Figure: a forgetting curve with bounded area but unbounded width]

An ESN simply is not a model for long-term memory!

Page 51: Tutorial : Echo State Networks Dan Popovici University of Montreal (UdeM) MITACS 2005.

High-dimensional dynamics

High-dimensional dynamics would require a very large ESN. Example: 6-DOF nonstationary time series, one-step prediction

200-unit ESN: RMS = 0.2; 400-unit network: RMS = 0.1; best other training technique1): RMS = 0.02

Approach for future research: task-specific optimization of ESN

[Plot and defining equation of the target: y(n) is given as a nonlinear function of the inputs u_1, u_2, ...]

1) Prokhorov et al., extended Kalman filter BPTT. Network size 40, 1400 trained links, training time 3 weeks

Page 52: Tutorial : Echo State Networks Dan Popovici University of Montreal (UdeM) MITACS 2005.

Spreading trouble...

• Signals x_i(n) of the reservoir can be interpreted as vectors in an (infinite-dimensional) signal space

• The correlation E[xy] yields an inner product <x, y> on this space

• The output signal y(n) is a linear combination of these x_i(n)

• The more orthogonal the x_i(n), the smaller the output weights:

[Figure: with nearly parallel x1, x2 the decomposition needs large weights, e.g. y = 30 x1 - 28 x2; with nearly orthogonal x1, x2 small weights suffice, e.g. y = 0.5 x1 + 0.7 x2]

Page 53: Tutorial : Echo State Networks Dan Popovici University of Montreal (UdeM) MITACS 2005.

• Eigenvectors v_k of the correlation matrix R = (E[x_i x_j]) are orthogonal signals

• Eigenvalues λ_k indicate what "mass" of the reservoir signals x_i (all together) is aligned with v_k

• The eigenvalue spread λ_max / λ_min indicates the overall "non-orthogonality" of the reservoir signals

[Figure: two state clouds with principal axes v_min, v_max — an elongated cloud with λ_max/λ_min ≈ 20 and a round cloud with λ_max/λ_min ≈ 1]
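The eigenvalue spread can be estimated directly from collected reservoir states; a small Python/NumPy sketch (assumed helper, not from the tutorial):

import numpy as np

def eigenvalue_spread(M):
    """M: (T, N) matrix of reservoir states x(n). Returns lambda_max / lambda_min."""
    R = (M.T @ M) / M.shape[0]        # empirical correlation matrix E[x_i x_j]
    eig = np.linalg.eigvalsh(R)       # R is symmetric, so eigvalsh applies
    return eig.max() / eig.min()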

Page 54: Tutorial : Echo State Networks Dan Popovici University of Montreal (UdeM) MITACS 2005.

Large eigenvalue spread → large output weights ...

• harmful for generalization, because slight changes in reservoir signals will induce large changes in output

• harmful for model accuracy, because estimation error contained in the reservoir signals is magnified (does not apply to deterministic systems)

• renders LMS online adaptive learning useless

[Figure: elongated state cloud with eigenvalue spread λ_max/λ_min ≈ 20]

Page 55: Tutorial : Echo State Networks Dan Popovici University of Montreal (UdeM) MITACS 2005.

Summary

• Basic idea: dynamical reservoir of echo states + supervised teaching of output connections.

• Seemed difficult: in nonlinear coupled systems, every variable interacts with every other. BUT seen the other way round, every variable rules and echoes every other. Exploit this for local learning and local system analysis.

• Echo states shape the tool for the solution from the task.

Page 56: Tutorial : Echo State Networks Dan Popovici University of Montreal (UdeM) MITACS 2005.

Thank you.

Page 57: Tutorial : Echo State Networks Dan Popovici University of Montreal (UdeM) MITACS 2005.

References

• H. Jaeger (2002): Tutorial on training recurrent neural networks, covering BPTT, RTRL, EKF and the "echo state network" approach. GMD Report 159, German National Research Center for Information Technology, 2002

• Slides used by Herbert Jaeger at IK2002

