Supervised and Unsupervised Learning and Applications to Neuroscience
Course CA6b-4
A Generic System

[Diagram: a system mapping inputs $x_1, x_2, \ldots, x_N$ through hidden variables $h_1, h_2, \ldots, h_K$ to outputs $y_1, y_2, \ldots, y_L$.]

Input variables: $\mathbf{x} = (x_1, x_2, \ldots, x_N)$
Hidden variables: $\mathbf{h} = (h_1, h_2, \ldots, h_K)$
Output variables: $\mathbf{y} = (y_1, y_2, \ldots, y_L)$
Training examples: $(\mathbf{x}^1, \mathbf{t}^1), (\mathbf{x}^2, \mathbf{t}^2), \ldots, (\mathbf{x}^D, \mathbf{t}^D)$, abbreviated $(\mathbf{x}^u, \mathbf{t}^u)$
Parameters: $\mathbf{w} = (w_1, w_2, \ldots, w_M)$
Different Types of Learning

• Supervised learning: 1. classification (discrete y); 2. regression (continuous y).
• Unsupervised learning (no target y): 1. clustering (h = different groups or types of data); 2. density estimation (h = parameters of a probability distribution); 3. dimensionality reduction (h = a few latent variables describing high-dimensional data).
• Reinforcement learning (y = actions).
Handwritten Digit Recognition (supervised)

x: pixelized or pre-processed image.
t: class of a pre-classified digit (training example).
y: digit class (computed by the ML algorithm).
h: contours, left/right-handedness, …
Regression (supervised)

[Figure: data points $(x, t)$ with the target output $t$ on the vertical axis and a curve fit with parameters $\mathbf{w}$.]
Linear Classifier

[Figure: points in the $(x_1, x_2)$ plane from two classes, $t = 0$ and $t = 1$; a new point "?" must be assigned to one of the two classes.]

Training examples: $(\mathbf{x}^1, t^1), (\mathbf{x}^2, t^2), \ldots, (\mathbf{x}^U, t^U)$
Linear Classifier

[Figure: the decision boundary in the $(x_1, x_2)$ plane, with the weight vector $\mathbf{w}$ orthogonal to it; points on the positive side are assigned $y = 1$, points on the negative side $y = 0$.]

Output: $y = H(\mathbf{w}^T \mathbf{x})$, where $H$ is the Heaviside function: $H(a) = 1$ for $a \ge 0$ and $H(a) = 0$ for $a < 0$.
Assumptions

[Figure: two Gaussian clouds of points in the $(x_1, x_2)$ plane, one per class.]

• The two classes are multivariate Gaussians with the same covariance.
• The two classes are equiprobable: $p(t = 0) = p(t = 1) = 0.5$.
How do we compute the output?

Under the Gaussian assumptions above, the log odds are linear in $\mathbf{x}$:

$\log \dfrac{p(t = 1 \mid \mathbf{x}, \boldsymbol{\theta})}{p(t = 0 \mid \mathbf{x}, \boldsymbol{\theta})} = \mathbf{w}^T \mathbf{x}$

Positive: class 1. Negative: class 0. The vector $\mathbf{w}$ is orthogonal to the decision boundary $\mathbf{w}^T \mathbf{x} = 0$, so the output is

$y = H(\mathbf{w}^T \mathbf{x})$
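A minimal sketch of this decision rule, assuming NumPy; the weight vector and test points are illustrative values, not from the course:

```python
import numpy as np

def classify(x, w):
    """Linear classifier output y = H(w^T x): 1 if w.x >= 0, else 0."""
    return int(w @ x >= 0)

# Hypothetical 2-D example; w is orthogonal to the boundary w.x = 0.
w = np.array([1.0, -0.5])
print(classify(np.array([2.0, 1.0]), w))   # w.x = +1.5 -> class 1
print(classify(np.array([-1.0, 2.0]), w))  # w.x = -2.0 -> class 0
```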
How do we learn the parameters?

1. Direct parameter estimation (linear discriminant analysis): estimate the class means and the shared covariance from the training data, and set

$\mathbf{w} = \boldsymbol{\Sigma}^{-1} (\boldsymbol{\mu}_1 - \boldsymbol{\mu}_0)$

which is orthogonal to the decision boundary.
How do we learn the parameters?

2. Minimize the mean-squared error

$E(\mathbf{w}) = \sum_u \big( t^u - y^u \big)^2$, with $y^u = H(\mathbf{w}^T \mathbf{x}^u)$
Gradient descent:

$w_i \leftarrow w_i - \eta \, \dfrac{\partial E(\mathbf{w})}{\partial w_i}$
Stochastic gradient descent: write the error as a sum over examples, $E(\mathbf{w}) = \sum_u e^u(\mathbf{w})$ with $e^u(\mathbf{w}) = (t^u - y^u)^2$, and update the weights after each example using $\partial e^u / \partial w_i$ in place of the full gradient.
Problem: the Heaviside function $H$ is not differentiable, so $e^u(\mathbf{w}) = (t^u - y^u)^2$ has no useful gradient.

3. Solution: replace $y$ by the expected class,

$y = p(t = 1 \mid \mathbf{x}, \mathbf{w}) = \dfrac{1}{1 + \exp(-\mathbf{w}^T \mathbf{x})}$

The output is now the expected class, given by the logistic function.
Stochastic gradient descent with the logistic output:

$w_i \leftarrow w_i - \eta \, \dfrac{\partial e^u}{\partial w_i} = w_i + \eta \, (t^u - y^u) \, y^u (1 - y^u) \, x_i^u$

The factor $y^u (1 - y^u)$ is always positive.
Learning based on the expected class (delta rule):

$\Delta w_i = \eta \, (t^u - y^u) \, x_i^u$, with $y = \dfrac{1}{1 + \exp(-\mathbf{w}^T \mathbf{x})}$

Perceptron learning rule:

$\Delta w_i = \eta \, (t^u - y^u) \, x_i^u$, with $y = H(\mathbf{w}^T \mathbf{x})$
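A sketch of the delta rule as stochastic gradient descent, assuming NumPy; the dataset, learning rate, and number of epochs are illustrative choices, not from the course:

```python
import numpy as np

rng = np.random.default_rng(0)

def logistic(a):
    """Expected class y = p(t=1 | x, w) = 1 / (1 + exp(-w.x))."""
    return 1.0 / (1.0 + np.exp(-a))

def train_delta_rule(X, t, eta=0.1, epochs=100):
    """Delta rule, one example at a time: dw_i = eta * (t - y) * x_i."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for u in rng.permutation(len(X)):
            y = logistic(w @ X[u])
            w += eta * (t[u] - y) * X[u]
    return w

# Hypothetical linearly separable data: class is the sign of x1 - x2.
X = rng.normal(size=(200, 2))
t = (X[:, 0] > X[:, 1]).astype(float)
w = train_delta_rule(X, t)
print("training accuracy:", np.mean((X @ w >= 0) == (t == 1)))
```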
Application 1: Neural Population Decoding

[Figure: a population of neurons tuned to motion direction; a readout unit combines the response vector $\mathbf{r}$ through weights $\mathbf{w}$ (written $\mathbf{a}$ below).]

How do we find $\mathbf{w}$? From the population responses to the two stimuli, $\mathbf{r}_{\text{right}}$ and $\mathbf{r}_{\text{left}}$.
Linear Discriminant Analysis (LDA)

Covariance matrix:

$\boldsymbol{\Sigma} = \begin{pmatrix} \mathrm{Var}(r_1) & \mathrm{Cov}(r_1, r_2) \\ \mathrm{Cov}(r_1, r_2) & \mathrm{Var}(r_2) \end{pmatrix}$

Mean responses: $\bar{\mathbf{r}}_{\text{right}} = (\bar{r}_1^{\text{right}}, \bar{r}_2^{\text{right}})$, the average neural responses when motion is right, and $\bar{\mathbf{r}}_{\text{left}} = (\bar{r}_1^{\text{left}}, \bar{r}_2^{\text{left}})$, the average neural responses when motion is left.

Readout weights: $\mathbf{a} = \boldsymbol{\Sigma}^{-1} \big( \bar{\mathbf{r}}_{\text{right}} - \bar{\mathbf{r}}_{\text{left}} \big)$, where $\boldsymbol{\Sigma}^{-1}$ is the inverse covariance matrix.
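A sketch of this LDA decoder in NumPy; the two response distributions (means, covariance, trial counts) are hypothetical stand-ins for recorded data:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical population responses (trials x neurons), shared covariance.
cov = [[1.0, 0.3], [0.3, 1.0]]
r_right = rng.multivariate_normal([5.0, 2.0], cov, size=500)
r_left = rng.multivariate_normal([3.0, 3.0], cov, size=500)

# LDA readout: a = Sigma^{-1} (r_bar_right - r_bar_left).
sigma = np.cov(np.vstack([r_right - r_right.mean(0),
                          r_left - r_left.mean(0)]).T)
a = np.linalg.solve(sigma, r_right.mean(0) - r_left.mean(0))

# Decode by projecting onto a and thresholding at the midpoint.
theta = a @ (r_right.mean(0) + r_left.mean(0)) / 2.0
print("correct (right):", np.mean(r_right @ a > theta))
print("correct (left): ", np.mean(r_left @ a < theta))
```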
Linear Discriminant Analysis (LDA)

Neural network interpretation: each neuron is a classifier that reads out its inputs $x_i$ through connections $w_{ij}$, which can be learned with the "delta rule".
Limitation of the 1-Layer Perceptron

[Figure: truth tables of AND and XOR plotted in the $(x_1, x_2)$ plane.]

AND is linearly separable; XOR is not linearly separable, so a single-layer perceptron cannot compute it.
Extension: Multilayer Perceptron (Towards a Universal Computer)

[Figure: a two-layer network solving XOR by combining two linear decision boundaries in the $(x_1, x_2)$ plane.]
Learning a Multilayer Neural Network with Backprop (Towards a Universal Computer)

1. Compute the initial error at the output layer.
2. Backpropagate the errors to the hidden layers.
3. Apply the delta rule at each layer $n$:

$\Delta w_{ij}^n = \eta \, x_j^{n-1} e_i^n$
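A minimal backprop sketch on XOR, assuming NumPy and logistic units throughout; the architecture (3 hidden units), learning rate, and iteration count are illustrative, and convergence can depend on the random initialization:

```python
import numpy as np

rng = np.random.default_rng(2)

def f(a):
    """Logistic activation."""
    return 1.0 / (1.0 + np.exp(-a))

# XOR: not linearly separable, so a hidden layer is required.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
t = np.array([0.0, 1.0, 1.0, 0.0])

W1 = rng.normal(0, 1, size=(2, 3))  # input -> hidden weights
b1 = np.zeros(3)
W2 = rng.normal(0, 1, size=3)       # hidden -> output weights
b2 = 0.0
eta = 0.5

for _ in range(20000):
    h = f(X @ W1 + b1)                      # forward pass
    y = f(h @ W2 + b2)
    e2 = (t - y) * y * (1 - y)              # initial error at the output
    e1 = np.outer(e2, W2) * h * (1 - h)     # backpropagated error
    W2 += eta * h.T @ e2; b2 += eta * e2.sum()   # delta rule per layer
    W1 += eta * X.T @ e1; b1 += eta * e1.sum(0)

print(np.round(f(f(X @ W1 + b1) @ W2 + b2), 2))  # should approach [0 1 1 0]
```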
Big problem: overfitting…

[Figure: a 9th-order polynomial fit that passes through every training point but generalizes poorly.]

Backprop was largely abandoned in the late eighties. Overfitting can be compensated with very large datasets, and backprop resurged with big data: deep convolutional networks, trained on billions of examples, now power image recognition and speech recognition at Google.
Single Neurons as Two-Layer Perceptrons

Poirazi and Mel, 2001, 2003
Regression (supervised)

[Figure: data points $(x, t)$ with the target output $t$ on the vertical axis and a parametric fit $y(x, \mathbf{w})$.]
Regression in General

$y(\mathbf{x}, \mathbf{w}) = \sum_i w_i \, \phi_i(\mathbf{x})$

where the $\phi_i(\mathbf{x})$ are basis functions, and the target output is assumed to equal $y$ plus Gaussian noise.
How do we learn the parameters?

Gradient descent on the squared error

$E(\mathbf{w}) = \sum_u \Big( t^u - \sum_i w_i \, \phi_i(\mathbf{x}^u) \Big)^2$

gives the stochastic update

$\Delta w_i = \eta \, \big( t^u - y(\mathbf{x}^u, \mathbf{w}) \big) \, \phi_i(\mathbf{x}^u)$

But: overfitting…
How do we learn the parameters?

Add a regularization term that penalizes large weights:

$E(\mathbf{w}) = \sum_u \Big( t^u - \sum_i w_i \, \phi_i(\mathbf{x}^u) \Big)^2 + \lambda \sum_i w_i^2$

Gradient descent then gives

$\Delta w_i = \eta \, \Big[ \big( t^u - y(\mathbf{x}^u, \mathbf{w}) \big) \, \phi_i(\mathbf{x}^u) - \lambda w_i \Big]$
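A sketch in NumPy, with Gaussian bumps standing in for the basis functions φ_i. The course derives a gradient-descent update; for brevity this sketch uses the closed-form minimizer of the same regularized error (ridge regression). The dataset, basis centers, width, and λ are illustrative:

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical 1-D dataset: noisy samples of a sine wave.
x = rng.uniform(0, 1, size=30)
t = np.sin(2 * np.pi * x) + rng.normal(0, 0.2, size=30)

# Gaussian basis functions phi_i(x), like tuning-curve bumps.
centers = np.linspace(0, 1, 9)
def phi(x, width=0.1):
    return np.exp(-(x[:, None] - centers[None, :]) ** 2 / (2 * width ** 2))

# Minimize sum_u (t^u - sum_i w_i phi_i(x^u))^2 + lam * sum_i w_i^2;
# the closed-form solution is w = (Phi^T Phi + lam I)^{-1} Phi^T t.
lam = 1e-3
Phi = phi(x)
w = np.linalg.solve(Phi.T @ Phi + lam * np.eye(len(centers)), Phi.T @ t)

x_test = np.linspace(0, 1, 5)
print(np.round(phi(x_test) @ w, 2))  # should roughly track sin(2 pi x)
```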
Application 3: Neural Coding, Function Approximation with Tuning Curves

"Classical view": multiple spatial maps.

In parietal cortex, retinotopic cells are gain-modulated by eye position, and also by head position, arm position, … (Snyder and Pouget, 2000).
Such neurons act as basis functions of retinal location $s$ and eye position $g$: $\phi_i(s)$, $\phi_j(g)$, $\phi_k(s, g)$.
Multisensory integration = multidirectional coordinate transform.

Experimental validation: partially shifting tuning curves, as predicted by the model (Pouget, Duhamel and Deneve, 2004; Avillac et al., 2005).
Unsupervised Learning: A First Example of Many
Principal Component Analysis (unsupervised learning)

[Figure: a data cloud in the $(x_1, x_2)$ plane with its orthogonal principal axes $\mathbf{w}_1$ and $\mathbf{w}_2$; the data $\mathbf{x}$ are re-expressed in the coordinates $h_1, h_2$.]

$\mathbf{x} = \mathbf{W}^T \mathbf{h}$, with components $\mathbf{h} = \mathbf{W} \mathbf{x}$

Orthogonal basis: $\sum_i w_{ki} w_{li} = 0$ for $k \neq l$

Uncorrelated components: $\langle \mathbf{h} \mathbf{h}^T \rangle = \mathbf{I}$. Note: uncorrelated is not the same as independent.
Principal Component Analysis and Dimensionality Reduction

$\mathbf{x} = \mathbf{W}^T \mathbf{h}$ + "noise", with $K \ll N$: $\mathbf{x} = (x_1, \ldots, x_N)$, $\mathbf{h} = (h_1, \ldots, h_K)$

[Figure: with $N = 2$ and $K = 1$, the data cloud is summarized by its projection $h_1$ onto the single principal axis $\mathbf{w}_1$.]
How do we "learn" the parameters?

One solution: eigenvalue decomposition of the covariance matrix, $\mathbf{C} = \mathbf{W}^T \mathbf{D} \mathbf{W}$ with $\mathbf{D}$ diagonal; keep the $K \ll N$ components with the largest eigenvalues.

Standard iterative method: find the first component as the direction of maximal variance, then find each further component as the direction of maximal variance orthogonal to the previous ones.
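A sketch of PCA by eigendecomposition in NumPy; the synthetic dataset (5 observed dimensions generated from 2 underlying factors) is illustrative:

```python
import numpy as np

rng = np.random.default_rng(4)

# Hypothetical data: N = 5 dimensions driven by K = 2 latent factors.
A = rng.normal(size=(5, 2))
X = rng.normal(size=(1000, 2)) @ A.T + 0.1 * rng.normal(size=(1000, 5))
X -= X.mean(axis=0)  # PCA assumes centered data

C = np.cov(X.T)                        # covariance matrix
eigvals, eigvecs = np.linalg.eigh(C)   # ascending eigenvalues
order = np.argsort(eigvals)[::-1]
W = eigvecs[:, order[:2]].T            # rows = top K = 2 components

h = X @ W.T    # low-dimensional representation h = W x
x_rec = h @ W  # reconstruction x ~ W^T h
print("variance explained:", eigvals[order[:2]].sum() / eigvals.sum())
```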
PCA: Gradient Descent

Minimize the reconstruction error

$E(\mathbf{W}) = \sum_{u,i} \Big( x_i^u - \sum_j w_{ji} h_j^u \Big)^2$

by alternating an "expectation" step, $\mathbf{h} = \mathbf{W} \mathbf{x}$, with a "maximization" step on the weights,

$\Delta \mathbf{W} = \eta \, \big( \mathbf{h} \mathbf{x}^T - \mathbf{h} \mathbf{h}^T \mathbf{W} \big)$

the generalized Oja rule.
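A sketch of the generalized Oja rule in NumPy, on the same kind of synthetic data as above; K, the learning rate, and the number of epochs are illustrative:

```python
import numpy as np

rng = np.random.default_rng(5)

# Hypothetical centered data: 5 dimensions from 2 latent factors.
A = rng.normal(size=(5, 2))
X = rng.normal(size=(1000, 2)) @ A.T + 0.1 * rng.normal(size=(1000, 5))
X -= X.mean(axis=0)

K, eta = 2, 0.01
W = rng.normal(0, 0.1, size=(K, X.shape[1]))
for _ in range(20):
    for x in X[rng.permutation(len(X))]:
        h = W @ x                                         # "expectation"
        W += eta * (np.outer(h, x) - np.outer(h, h) @ W)  # Oja update

# The rows of W should now form an orthonormal basis of the
# top-K principal subspace.
print(np.round(W @ W.T, 2))  # approximately the identity
```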
[Figure: weights learnt by PCA on natural images.]
Application of PCA: Analysis of Large Neural Datasets

[Figure: population activity projected onto principal components labeled "Time" and "Frequency".]

Machens, Brody and Romo, 2010