Page 1:

Studies on Goal-Directed Feature Learning

Cornelius Weber, FIAS

presented at: “Machine Learning Approaches to Representational Learning and Recognition in Vision”

Workshop at the Frankfurt Institute for Advanced Studies (FIAS), November 27-28, 2008

Page 2:

for taking action, we need only the relevant features

[figure: network diagram with variables x, y, z]

Page 3:

models’ background & overview:

- unsupervised feature learning models are enslaved by bottom-up input

- reward-modulated activity leads to input selection: Nakahara, Neur Comp 14, 819-44 (2002)

- reward-modulated STDP: Izhikevich, Cereb Cortex 17, 2443-52 (2007); Florian, Neur Comp 19/6, 1468-502 (2007); Farries & Fairhall, Neurophysiol 98, 3648-65 (2007); ...

- RL models learn a partitioning of the input space: e.g. McCallum, PhD Thesis, Rochester, NY, USA (1996)

- reward-modulated Hebb: Triesch, Neur Comp 19, 885-909 (2007); Roelfsema & Ooyen, Neur Comp 17, 2176-214 (2005); Franz & Triesch, ICDL (2007) (model 3 presented here extends this to delayed reward)

- feature-pruning models learn all features but forget the irrelevant ones (models 1 & 2 presented here)

Page 4:

[figure: network receiving sensory input and reward; action as an external signal]

purely sensory data, in which one feature type is linked to reward

the action is not controlled by the network

Page 5:

model 1: obtaining the relevant features

1) build a feature detecting model

2) learn associations between features

3) register the features’ average reward

4) spread value along associative connections

5) check whether actions in-/decrease value

6) remove features where the action doesn’t matter (see the sketch below)

[figure: example features labeled irrelevant vs. relevant]
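A minimal sketch of steps 3-6 in Python, assuming an already-trained feature detector; the associative weights, reward trace, and action-effect statistic are illustrative stand-ins rather than the talk's exact formulation:

```python
import numpy as np

# Sketch of model 1, steps 3-6. Feature activities, rewards, and the
# action-effect statistic are stand-ins; only the logic is meaningful.
rng = np.random.default_rng(0)
n_feat, T = 20, 1000

activity = rng.random((T, n_feat))      # per-step feature activations
reward = rng.random(T)                  # per-step reward signal
A = np.abs(np.corrcoef(activity.T))     # stand-in associative weights

# 3) register each feature's average reward while it is active
value = activity.T @ reward / activity.sum(axis=0)

# 4) spread value along associative connections (one relaxation step)
value = 0.5 * value + 0.5 * (A @ value) / A.sum(axis=1)

# 5) check whether actions in-/decrease a feature's value; a placeholder
#    statistic stands in for that measurement here
action_effect = rng.random(n_feat)

# 6) remove features where the action doesn't matter
relevant = action_effect > action_effect.mean()
print("kept features:", np.flatnonzero(relevant))
```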

Page 6:

Földiák, Biol Cybern 64, 165-70 (1990)

→ homogeneous activity distribution

[figure: learned features, thresholds, lateral weights (decorrelation), selected features, associative weights, action effect]

Weber & Triesch, Proc ICANN, 740-9 (2008); Witkowski, Adap Behav, 15(1), 73-97 (2007); Toussaint, Proc NIPS, 929-36 (2003); Weber, Proc ICANN, 1147-52 (2001)

→ relevant features identified
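For reference, a compact sketch of the Földiák (1990) rules cited above: Hebbian feedforward learning, anti-Hebbian lateral decorrelation, and threshold adaptation toward a target rate p, which yields the homogeneous activity distribution. The toy input, settling loop, and learning rates are assumptions:

```python
import numpy as np

# Sketch of Foldiak's (1990) rules: Hebbian feedforward weights, anti-Hebbian
# lateral weights (decorrelation), thresholds adapting toward target rate p.
rng = np.random.default_rng(1)
n_in, n_out, p = 64, 16, 0.1
W = 0.1 * rng.random((n_out, n_in))   # feedforward weights
L = np.zeros((n_out, n_out))          # lateral (inhibitory) weights
t = np.full(n_out, 0.5)               # adaptive thresholds

for _ in range(2000):
    x = (rng.random(n_in) < 0.2).astype(float)   # toy binary input
    y = np.zeros(n_out)
    for _ in range(10):                          # settle under lateral inhibition
        y = (W @ x + L @ y - t > 0).astype(float)
    W += 0.02 * (np.outer(y, x) - y[:, None] * W)   # Hebbian with per-unit decay
    L -= 0.02 * (np.outer(y, y) - p ** 2)           # anti-Hebbian decorrelation
    np.fill_diagonal(L, 0.0)
    L = np.minimum(L, 0.0)                          # keep lateral weights inhibitory
    t += 0.02 * (y - p)              # thresholds -> homogeneous activity distribution
```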

Page 7:

[figure: sensory input and reward]

motor-sensory data (again, one feature type is linked to reward)

the network selects the action (to get reward)

[figure: irrelevant subspace vs. relevant subspace]

Page 8:

model 2: removing the irrelevant inputs

1) initialize the feature detecting model (but continue learning)

2) perform actor-critic RL, taking the features’ outputs as state representation

- works despite irrelevant features

- challenge: relevant features will occur at different frequencies

- nevertheless, features may remain stable

3) observe the critic: it puts negative value on irrelevant features after long training

4) modulate (multiply) learning by the critic’s value (see the sketch below)

[figure: feature value plotted against frequency of occurrence]
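A minimal sketch of step 4, assuming a winner-take-all feature code as the RL state; the environment, reward, and learning rates are placeholders:

```python
import numpy as np

# Sketch of model 2's step 4: the critic's value of the active feature
# multiplies the feature-learning step, so features the critic rates
# negatively are gradually unlearned.
rng = np.random.default_rng(2)
n_in, n_feat, gamma, alpha = 36, 12, 0.9, 0.05
F = rng.random((n_feat, n_in))   # feature weights (learning continues)
v = np.zeros(n_feat)             # critic: one value per feature/state

def state(x):
    return int(np.argmax(F @ x))     # winner-take-all feature as RL state

x = rng.random(n_in)
for _ in range(5000):
    s = state(x)
    x_next = rng.random(n_in)              # stand-in for acting in the world
    r = float(rng.random() < 0.1)          # stand-in reward
    delta = r + gamma * v[state(x_next)] - v[s]
    v[s] += alpha * delta                  # critic (TD) update
    F[s] += alpha * v[s] * (x - F[s])      # value-modulated feature learning
    x = x_next
```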

Page 9:

Lücke & Bouecke, Proc ICANN, 31-7 (2005)

[figure: learned features, critic value, and action weights]

→ relevant subspace discovered

Page 10:

model 3: learning only the relevant inputs

1) top level: reinforcement learning model (SARSA)

2) lower level: feature learning model (SOM / K-means)

3) modulate learning by δ, in both layers

[figure: architecture with input → feature weights → RL weights → action]

Page 11:

model 3: SARSA with SOM-like activation and update
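One way these pieces could fit together, sketched with ε-greedy SARSA on a best-matching-unit feature layer and the TD error δ gating the weight update in both layers; the stand-in environment, sizes, and learning rates are assumptions:

```python
import numpy as np

# Sketch of model 3: SARSA over a SOM-like feature layer, with the TD error
# delta gating the updates in BOTH layers.
rng = np.random.default_rng(3)
n_in, n_feat, n_act = 144, 16, 4
gamma, alpha, eps = 0.9, 0.1, 0.1
W = rng.random((n_feat, n_in))   # feature weights (SOM / K-means-like)
Q = np.zeros((n_act, n_feat))    # RL action weights

def som_state(x):
    return int(np.argmin(((W - x) ** 2).sum(axis=1)))  # best-matching unit

def policy(s):
    return int(rng.integers(n_act)) if rng.random() < eps else int(np.argmax(Q[:, s]))

x = rng.random(n_in); s = som_state(x); a = policy(s)
for _ in range(5000):
    x_next = rng.random(n_in)          # stand-in for an environment step
    r = float(rng.random() < 0.05)     # stand-in reward
    s_next = som_state(x_next); a_next = policy(s_next)
    delta = r + gamma * Q[a_next, s_next] - Q[a, s]   # SARSA TD error
    Q[a, s] += alpha * delta                          # top layer: SARSA update
    W[s] += alpha * delta * (x - W[s])                # lower layer: delta-gated SOM update
    x, s, a = x_next, s_next, a_next
```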

Page 12:

[figure: RL action weights over the relevant subspace; feature weights showing subspace coverage]

Page 13:

learning the ‘long bars’ data

[figure: RL action weights and feature weights; input, reward, and 2 actions (not shown); data samples]

Page 14:

learning the ‘short bars’ data

input data: bars controlled by actions ‘up’, ‘down’, ‘left’, ‘right’

[figure: RL action weights and feature weights; reward and action signals]
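To make the setup concrete, here is a hypothetical minimal ‘short bars’ environment consistent with the slide: a short bar on a 12x12 grid moved by the four actions; bar length, goal location, and reward scheme are illustrative guesses. Its 144-dimensional observation would plug directly into the SARSA-SOM sketch above:

```python
import numpy as np

# A hypothetical 'short bars' world: a short bar on a 12x12 grid moves under
# up/down/left/right; bar length, goal, and reward scheme are assumptions.
SIZE, BAR = 12, 3

class ShortBars:
    def __init__(self, rng):
        self.rng = rng
        self.reset()

    def reset(self):
        self.row = int(self.rng.integers(SIZE))
        self.col = int(self.rng.integers(SIZE - BAR + 1))

    def obs(self):
        img = np.zeros((SIZE, SIZE))
        img[self.row, self.col:self.col + BAR] = 1.0  # short horizontal bar
        return img.ravel()                            # 144-dim input vector

    def step(self, action):  # 0: up, 1: down, 2: left, 3: right
        dr, dc = [(-1, 0), (1, 0), (0, -1), (0, 1)][action]
        self.row = int(np.clip(self.row + dr, 0, SIZE - 1))
        self.col = int(np.clip(self.col + dc, 0, SIZE - BAR))
        reward = float(self.row == 0 and self.col == 0)  # assumed goal corner
        if reward:
            self.reset()
        return self.obs(), reward
```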

Page 15:

short bars in a 12x12 grid; average number of steps to goal: 11

Page 16:

biological interpretation

[figure: cortex (feature/subspace detection) → striatum (action selection) → GPi (output of the basal ganglia)]

- no direct feedback from striatum to cortex

- convergent mapping → little receptive field overlap, consistent with subspace discovery

Page 17:

Discussion

- models 1 and 2 learn all features and identify the relevant ones

- but they either require a homogeneous feature distribution (model 1)

- or can do only subspace detection, not true feature detection (model 2)

- model 3 is very simple: SARSA on a SOM with δ-feedback

- it learns only the relevant subspace or features in the first place

- a link between unsupervised and reinforcement learning

Sponsors

Bernstein Focus Neurotechnology

EU project 231722 “IM-CLeVeR”, call FP7-ICT-2007-3

Frankfurt Institute for Advanced Studies (FIAS)

Page 18:

[figure: basal ganglia unit activity during early vs. late learning]

Jog et al., Science, 286, 1158-61 (1999)

relevant features change during learning

units in the basal ganglia are active at the junction during early task acquisition but not at a later stage

T-maze decision task (rat)

Page 19:

evidence for reward/action modulated learning in the visual system

Shuler & Bear, "Reward timing in the primary visual cortex", Science, 311, 1606-9 (2006)

Schoups et al., "Practising orientation identification improves orientation coding in V1 neurons", Nature, 412, 549-53 (2001)

