Supervised and Unsupervised Learning and Applications to Neuroscience
Course CA6b-4
A Generic System

[Diagram: a system mapping inputs $x_1, x_2, \ldots, x_N$ through hidden variables $h_1, h_2, \ldots, h_K$ to outputs $y_1, y_2, \ldots, y_L$.]

Input variables: $\mathbf{x} = (x_1, x_2, \ldots, x_N)$
Hidden variables: $\mathbf{h} = (h_1, h_2, \ldots, h_K)$
Output variables: $\mathbf{y} = (y_1, y_2, \ldots, y_L)$
Training examples: $(\mathbf{x}^1, \mathbf{t}^1), (\mathbf{x}^2, \mathbf{t}^2), \ldots, (\mathbf{x}^D, \mathbf{t}^D)$, abbreviated $(\mathbf{x}^u, \mathbf{t}^u)$
Parameters: $\mathbf{w} = (w_1, w_2, \ldots, w_M)$
Different Types of Learning

• Supervised learning: 1. classification (discrete y); 2. regression (continuous y).
• Unsupervised learning (no target y): 1. clustering (h = different groups or types of data); 2. density estimation (h = parameters of a probability distribution); 3. dimensionality reduction (h = a few latent variables describing high-dimensional data).
• Reinforcement learning (y = actions).
Handwritten Digit Recognition (supervised)

x: pixelized or pre-processed image.
t: class of a pre-classified digit (training example).
y: digit class (computed by the ML algorithm).
h: contours, left/right-handedness, …
Regression (supervised)

[Figure: data points $(x, t)$ with the target output $t$ on the vertical axis and a curve fit with parameters $\mathbf{w}$.]
Linear Classifier

[Figure: points in the $(x_1, x_2)$ plane from two classes, $t = 0$ and $t = 1$; a new point "?" must be assigned to one of the two classes.]

Training examples: $(\mathbf{x}^1, t^1), (\mathbf{x}^2, t^2), \ldots, (\mathbf{x}^U, t^U)$
Linear Classifier

[Figure: the decision boundary in the $(x_1, x_2)$ plane, with the weight vector $\mathbf{w}$ orthogonal to it; points on the positive side are assigned $y = 1$, points on the negative side $y = 0$.]

Output: $y = H(\mathbf{w}^T \mathbf{x})$, where $H$ is the Heaviside function: $H(a) = 1$ for $a \ge 0$ and $H(a) = 0$ for $a < 0$.
Assumptions

[Figure: two Gaussian clouds of points in the $(x_1, x_2)$ plane, one per class.]

• The two classes are multivariate Gaussians with the same covariance.
• The two classes are equiprobable: $p(t = 0) = p(t = 1) = 0.5$.
How do we compute the output?

Under the Gaussian assumptions above, the log odds are linear in $\mathbf{x}$:

$\log \dfrac{p(t = 1 \mid \mathbf{x}, \boldsymbol{\theta})}{p(t = 0 \mid \mathbf{x}, \boldsymbol{\theta})} = \mathbf{w}^T \mathbf{x}$

Positive: class 1. Negative: class 0. The vector $\mathbf{w}$ is orthogonal to the decision boundary $\mathbf{w}^T \mathbf{x} = 0$, so the output is

$y = H(\mathbf{w}^T \mathbf{x})$
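A minimal sketch of this decision rule, assuming NumPy; the weight vector and test points are illustrative values, not from the course:

```python
import numpy as np

def classify(x, w):
    """Linear classifier output y = H(w^T x): 1 if w.x >= 0, else 0."""
    return int(w @ x >= 0)

# Hypothetical 2-D example; w is orthogonal to the boundary w.x = 0.
w = np.array([1.0, -0.5])
print(classify(np.array([2.0, 1.0]), w))   # w.x = +1.5 -> class 1
print(classify(np.array([-1.0, 2.0]), w))  # w.x = -2.0 -> class 0
```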
How do we learn the parameters?

1. Direct parameter estimation (linear discriminant analysis): estimate the class means and the shared covariance from the training data, and set

$\mathbf{w} = \boldsymbol{\Sigma}^{-1} (\boldsymbol{\mu}_1 - \boldsymbol{\mu}_0)$

which is orthogonal to the decision boundary.
How do we learn the parameters?

2. Minimize the mean-squared error

$E(\mathbf{w}) = \sum_u \big( t^u - y^u \big)^2$, with $y^u = H(\mathbf{w}^T \mathbf{x}^u)$
Gradient descent:

$w_i \leftarrow w_i - \eta \, \dfrac{\partial E(\mathbf{w})}{\partial w_i}$
Stochastic gradient descent: write the error as a sum over examples, $E(\mathbf{w}) = \sum_u e^u(\mathbf{w})$ with $e^u(\mathbf{w}) = (t^u - y^u)^2$, and update the weights after each example using $\partial e^u / \partial w_i$ in place of the full gradient.
Problem: the Heaviside function $H$ is not differentiable, so $e^u(\mathbf{w}) = (t^u - y^u)^2$ has no useful gradient.

3. Solution: replace $y$ by the expected class,

$y = p(t = 1 \mid \mathbf{x}, \mathbf{w}) = \dfrac{1}{1 + \exp(-\mathbf{w}^T \mathbf{x})}$

The output is now the expected class, given by the logistic function.
Stochastic gradient descent with the logistic output:

$w_i \leftarrow w_i - \eta \, \dfrac{\partial e^u}{\partial w_i} = w_i + \eta \, (t^u - y^u) \, y^u (1 - y^u) \, x_i^u$

The factor $y^u (1 - y^u)$ is always positive.
Learning based on the expected class (delta rule):

$\Delta w_i = \eta \, (t^u - y^u) \, x_i^u$, with $y = \dfrac{1}{1 + \exp(-\mathbf{w}^T \mathbf{x})}$

Perceptron learning rule:

$\Delta w_i = \eta \, (t^u - y^u) \, x_i^u$, with $y = H(\mathbf{w}^T \mathbf{x})$
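A sketch of the delta rule as stochastic gradient descent, assuming NumPy; the dataset, learning rate, and number of epochs are illustrative choices, not from the course:

```python
import numpy as np

rng = np.random.default_rng(0)

def logistic(a):
    """Expected class y = p(t=1 | x, w) = 1 / (1 + exp(-w.x))."""
    return 1.0 / (1.0 + np.exp(-a))

def train_delta_rule(X, t, eta=0.1, epochs=100):
    """Delta rule, one example at a time: dw_i = eta * (t - y) * x_i."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for u in rng.permutation(len(X)):
            y = logistic(w @ X[u])
            w += eta * (t[u] - y) * X[u]
    return w

# Hypothetical linearly separable data: class is the sign of x1 - x2.
X = rng.normal(size=(200, 2))
t = (X[:, 0] > X[:, 1]).astype(float)
w = train_delta_rule(X, t)
print("training accuracy:", np.mean((X @ w >= 0) == (t == 1)))
```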
Application 1: Neural Population Decoding

[Figure: a population of neurons tuned to motion direction; a readout unit combines the response vector $\mathbf{r}$ through weights $\mathbf{w}$ (written $\mathbf{a}$ below).]

How do we find $\mathbf{w}$? From the population responses to the two stimuli, $\mathbf{r}_{\text{right}}$ and $\mathbf{r}_{\text{left}}$.
Linear Discriminant Analysis (LDA)

Covariance matrix:

$\boldsymbol{\Sigma} = \begin{pmatrix} \mathrm{Var}(r_1) & \mathrm{Cov}(r_1, r_2) \\ \mathrm{Cov}(r_1, r_2) & \mathrm{Var}(r_2) \end{pmatrix}$

Mean responses: $\bar{\mathbf{r}}_{\text{right}} = (\bar{r}_1^{\text{right}}, \bar{r}_2^{\text{right}})$, the average neural responses when motion is right, and $\bar{\mathbf{r}}_{\text{left}} = (\bar{r}_1^{\text{left}}, \bar{r}_2^{\text{left}})$, the average neural responses when motion is left.

Readout weights: $\mathbf{a} = \boldsymbol{\Sigma}^{-1} \big( \bar{\mathbf{r}}_{\text{right}} - \bar{\mathbf{r}}_{\text{left}} \big)$, where $\boldsymbol{\Sigma}^{-1}$ is the inverse covariance matrix.
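A sketch of this LDA decoder in NumPy; the two response distributions (means, covariance, trial counts) are hypothetical stand-ins for recorded data:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical population responses (trials x neurons), shared covariance.
cov = [[1.0, 0.3], [0.3, 1.0]]
r_right = rng.multivariate_normal([5.0, 2.0], cov, size=500)
r_left = rng.multivariate_normal([3.0, 3.0], cov, size=500)

# LDA readout: a = Sigma^{-1} (r_bar_right - r_bar_left).
sigma = np.cov(np.vstack([r_right - r_right.mean(0),
                          r_left - r_left.mean(0)]).T)
a = np.linalg.solve(sigma, r_right.mean(0) - r_left.mean(0))

# Decode by projecting onto a and thresholding at the midpoint.
theta = a @ (r_right.mean(0) + r_left.mean(0)) / 2.0
print("correct (right):", np.mean(r_right @ a > theta))
print("correct (left): ", np.mean(r_left @ a < theta))
```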
Linear Discriminant Analysis (LDA)

Neural network interpretation: each neuron is a classifier that reads out its inputs $x_i$ through connections $w_{ij}$, which can be learned with the "delta rule".
Limitation of the 1-Layer Perceptron

[Figure: truth tables of AND and XOR plotted in the $(x_1, x_2)$ plane.]

AND is linearly separable; XOR is not linearly separable, so a single-layer perceptron cannot compute it.
Extension: Multilayer Perceptron (Towards a Universal Computer)

[Figure: a two-layer network solving XOR by combining two linear decision boundaries in the $(x_1, x_2)$ plane.]
Learning a Multilayer Neural Network with Backprop (Towards a Universal Computer)

1. Compute the initial error at the output layer.
2. Backpropagate the errors to the hidden layers.
3. Apply the delta rule at each layer $n$:

$\Delta w_{ij}^n = \eta \, x_j^{n-1} e_i^n$
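A minimal backprop sketch on XOR, assuming NumPy and logistic units throughout; the architecture (3 hidden units), learning rate, and iteration count are illustrative, and convergence can depend on the random initialization:

```python
import numpy as np

rng = np.random.default_rng(2)

def f(a):
    """Logistic activation."""
    return 1.0 / (1.0 + np.exp(-a))

# XOR: not linearly separable, so a hidden layer is required.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
t = np.array([0.0, 1.0, 1.0, 0.0])

W1 = rng.normal(0, 1, size=(2, 3))  # input -> hidden weights
b1 = np.zeros(3)
W2 = rng.normal(0, 1, size=3)       # hidden -> output weights
b2 = 0.0
eta = 0.5

for _ in range(20000):
    h = f(X @ W1 + b1)                      # forward pass
    y = f(h @ W2 + b2)
    e2 = (t - y) * y * (1 - y)              # initial error at the output
    e1 = np.outer(e2, W2) * h * (1 - h)     # backpropagated error
    W2 += eta * h.T @ e2; b2 += eta * e2.sum()   # delta rule per layer
    W1 += eta * X.T @ e1; b1 += eta * e1.sum(0)

print(np.round(f(f(X @ W1 + b1) @ W2 + b2), 2))  # should approach [0 1 1 0]
```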
Big problem: overfitting…

[Figure: a 9th-order polynomial fit that passes through every training point but generalizes poorly.]

Backprop was largely abandoned in the late eighties. Overfitting can be compensated with very large datasets, and backprop resurged with big data: deep convolutional networks, trained on billions of examples, now power image recognition and speech recognition at Google.
Single Neurons as Two-Layer Perceptrons

Poirazi and Mel, 2001, 2003
Regression (supervised)

[Figure: data points $(x, t)$ with the target output $t$ on the vertical axis and a parametric fit $y(x, \mathbf{w})$.]
Regression in General

$y(\mathbf{x}, \mathbf{w}) = \sum_i w_i \, \phi_i(\mathbf{x})$

where the $\phi_i(\mathbf{x})$ are basis functions, and the target output is assumed to equal $y$ plus Gaussian noise.
How do we learn the parameters?

Gradient descent on the squared error

$E(\mathbf{w}) = \sum_u \Big( t^u - \sum_i w_i \, \phi_i(\mathbf{x}^u) \Big)^2$

gives the stochastic update

$\Delta w_i = \eta \, \big( t^u - y(\mathbf{x}^u, \mathbf{w}) \big) \, \phi_i(\mathbf{x}^u)$

But: overfitting…
How do we learn the parameters?

Add a regularization term that penalizes large weights:

$E(\mathbf{w}) = \sum_u \Big( t^u - \sum_i w_i \, \phi_i(\mathbf{x}^u) \Big)^2 + \lambda \sum_i w_i^2$

Gradient descent then gives

$\Delta w_i = \eta \, \Big[ \big( t^u - y(\mathbf{x}^u, \mathbf{w}) \big) \, \phi_i(\mathbf{x}^u) - \lambda w_i \Big]$
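A sketch in NumPy, with Gaussian bumps standing in for the basis functions φ_i. The course derives a gradient-descent update; for brevity this sketch uses the closed-form minimizer of the same regularized error (ridge regression). The dataset, basis centers, width, and λ are illustrative:

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical 1-D dataset: noisy samples of a sine wave.
x = rng.uniform(0, 1, size=30)
t = np.sin(2 * np.pi * x) + rng.normal(0, 0.2, size=30)

# Gaussian basis functions phi_i(x), like tuning-curve bumps.
centers = np.linspace(0, 1, 9)
def phi(x, width=0.1):
    return np.exp(-(x[:, None] - centers[None, :]) ** 2 / (2 * width ** 2))

# Minimize sum_u (t^u - sum_i w_i phi_i(x^u))^2 + lam * sum_i w_i^2;
# the closed-form solution is w = (Phi^T Phi + lam I)^{-1} Phi^T t.
lam = 1e-3
Phi = phi(x)
w = np.linalg.solve(Phi.T @ Phi + lam * np.eye(len(centers)), Phi.T @ t)

x_test = np.linspace(0, 1, 5)
print(np.round(phi(x_test) @ w, 2))  # should roughly track sin(2 pi x)
```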
Application 3: Neural Coding, Function Approximation with Tuning Curves

"Classical view": multiple spatial maps.

In parietal cortex, retinotopic cells are gain-modulated by eye position, and also by head position, arm position, … (Snyder and Pouget, 2000).
Such neurons act as basis functions of retinal location $s$ and eye position $g$: $\phi_i(s)$, $\phi_j(g)$, $\phi_k(s, g)$.
Multisensory integration = multidirectional coordinate transform.

Experimental validation: partially shifting tuning curves, as predicted by the model (Pouget, Duhamel and Deneve, 2004; Avillac et al., 2005).
Unsupervised Learning: A First Example of Many
Principal Component Analysis (unsupervised learning)

[Figure: a data cloud in the $(x_1, x_2)$ plane with its orthogonal principal axes $\mathbf{w}_1$ and $\mathbf{w}_2$; the data $\mathbf{x}$ are re-expressed in the coordinates $h_1, h_2$.]

$\mathbf{x} = \mathbf{W}^T \mathbf{h}$, with components $\mathbf{h} = \mathbf{W} \mathbf{x}$

Orthogonal basis: $\sum_i w_{ki} w_{li} = 0$ for $k \neq l$

Uncorrelated components: $\langle \mathbf{h} \mathbf{h}^T \rangle = \mathbf{I}$. Note: uncorrelated is not the same as independent.
Principal Component Analysis and Dimensionality Reduction

$\mathbf{x} = \mathbf{W}^T \mathbf{h}$ + "noise", with $K \ll N$: $\mathbf{x} = (x_1, \ldots, x_N)$, $\mathbf{h} = (h_1, \ldots, h_K)$

[Figure: with $N = 2$ and $K = 1$, the data cloud is summarized by its projection $h_1$ onto the single principal axis $\mathbf{w}_1$.]
How do we "learn" the parameters?

One solution: eigenvalue decomposition of the covariance matrix, $\mathbf{C} = \mathbf{W}^T \mathbf{D} \mathbf{W}$ with $\mathbf{D}$ diagonal; keep the $K \ll N$ components with the largest eigenvalues.

Standard iterative method: find the first component as the direction of maximal variance, then find each further component as the direction of maximal variance orthogonal to the previous ones.
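A sketch of PCA by eigendecomposition in NumPy; the synthetic dataset (5 observed dimensions generated from 2 underlying factors) is illustrative:

```python
import numpy as np

rng = np.random.default_rng(4)

# Hypothetical data: N = 5 dimensions driven by K = 2 latent factors.
A = rng.normal(size=(5, 2))
X = rng.normal(size=(1000, 2)) @ A.T + 0.1 * rng.normal(size=(1000, 5))
X -= X.mean(axis=0)  # PCA assumes centered data

C = np.cov(X.T)                        # covariance matrix
eigvals, eigvecs = np.linalg.eigh(C)   # ascending eigenvalues
order = np.argsort(eigvals)[::-1]
W = eigvecs[:, order[:2]].T            # rows = top K = 2 components

h = X @ W.T    # low-dimensional representation h = W x
x_rec = h @ W  # reconstruction x ~ W^T h
print("variance explained:", eigvals[order[:2]].sum() / eigvals.sum())
```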
PCA: Gradient Descent

Minimize the reconstruction error

$E(\mathbf{W}) = \sum_{u,i} \Big( x_i^u - \sum_j w_{ji} h_j^u \Big)^2$

by alternating an "expectation" step, $\mathbf{h} = \mathbf{W} \mathbf{x}$, with a "maximization" step on the weights,

$\Delta \mathbf{W} = \eta \, \big( \mathbf{h} \mathbf{x}^T - \mathbf{h} \mathbf{h}^T \mathbf{W} \big)$

the generalized Oja rule.
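A sketch of the generalized Oja rule in NumPy, on the same kind of synthetic data as above; K, the learning rate, and the number of epochs are illustrative:

```python
import numpy as np

rng = np.random.default_rng(5)

# Hypothetical centered data: 5 dimensions from 2 latent factors.
A = rng.normal(size=(5, 2))
X = rng.normal(size=(1000, 2)) @ A.T + 0.1 * rng.normal(size=(1000, 5))
X -= X.mean(axis=0)

K, eta = 2, 0.01
W = rng.normal(0, 0.1, size=(K, X.shape[1]))
for _ in range(20):
    for x in X[rng.permutation(len(X))]:
        h = W @ x                                         # "expectation"
        W += eta * (np.outer(h, x) - np.outer(h, h) @ W)  # Oja update

# The rows of W should now form an orthonormal basis of the
# top-K principal subspace.
print(np.round(W @ W.T, 2))  # approximately the identity
```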
[Figure: weights learnt by PCA on natural images.]
Application of PCA: Analysis of Large Neural Datasets

[Figure: population activity projected onto principal components labeled "Time" and "Frequency".]

Machens, Brody and Romo, 2010