Page 1

Supervised Learning in Neural Networks

Sumio Watanabe
Tokyo Institute of Technology

Advanced Topics in Mathematical Information Sciences II

April 24, May 1, 2015

Page 2

Quick Review

Page 3

Supervised Learning

A supervisor generates the samples (X1, Y1), (X2, Y2), …, (Xn, Yn).
The learner fits a model y = f(x, w) to these samples.

Page 4

Mathematics of Supervised Learning

Training samples: (X1, Y1), (X2, Y2), …, (Xn, Yn)
Test sample: (X, Y)

True information source: q(x, y)
Neural network model: y = f(x, w)

Page 5

One Neuron Model

Inputs: x1, x2, x3, …, xN
Synapse weights: w1, w2, w3, …, wN
Bias: θ

Weighted sum: Σ_{i=1}^{N} w_i x_i

Output: σ( Σ_{i=1}^{N} w_i x_i + θ )
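
As a concrete illustration, the following minimal NumPy sketch computes the output of this one-neuron model, assuming a logistic sigmoid σ; the function and variable names are illustrative, not from the slides.

import numpy as np

def sigmoid(a):
    # logistic activation: sigma(a) = 1 / (1 + exp(-a))
    return 1.0 / (1.0 + np.exp(-a))

def neuron_output(x, w, theta):
    # output = sigma( sum_i w_i x_i + theta )
    return sigmoid(np.dot(w, x) + theta)

x = np.array([0.5, -1.0, 2.0])    # inputs x1, ..., xN
w = np.array([0.1, 0.4, -0.3])    # synapse weights w1, ..., wN
theta = 0.2                       # bias
print(neuron_output(x, w, theta))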

Page 6

Three-Layered Neural Network

Input layer: x1, x2, …, xM
Hidden layer
Output layer: f1, f2, …, fN

Page 7

Contents

1. Deep neural network
2. Sequential learning and auto-encoder
3. Convolution learning

Page 8

Deep neural network (DNN)

Recently, neural networks with deep layers have been studied intensively.
It is reported that DNNs have better generalization performance.

(Figure: networks from 1960, 1985, and 2015, each with inputs x1, x2, …, xM and outputs f1, f2, …, fN, growing deeper over time.)

Page 9

Definition

It is easy to define a deep network.

Simple perceptron:
f_i = σ( Σ_{j=1}^{M} u_{ij} x_j + θ_i )

Three-layer neural network:
f_i = σ( Σ_{j=1}^{H} u_{ij} σ( Σ_{k=1}^{M} w_{jk} x_k + θ_j ) + φ_i )

DNN:
f_i = σ( Σ_{j=1}^{H1} u_{ij} σ( Σ_{k=1}^{H2} w_{jk} σ( Σ_{l=1}^{M} v_{kl} ( … ) + θ_k ) + θ_j ) + φ_i )
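
A minimal NumPy sketch of these three forward maps, assuming a logistic sigmoid σ; the function names and the layer sizes used in the demo are illustrative, not from the slides.

import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def perceptron(x, U, theta):
    # f_i = sigma( sum_j u_ij x_j + theta_i )
    return sigmoid(U @ x + theta)

def three_layer(x, W, theta, U, phi):
    # hidden: sigma(W x + theta); output: sigma(U hidden + phi)
    return sigmoid(U @ sigmoid(W @ x + theta) + phi)

def deep_net(x, layers):
    # layers: list of (weight matrix, bias) pairs, applied from input to output
    o = x
    for A, b in layers:
        o = sigmoid(A @ o + b)
    return o

M, H1, H2, N = 4, 6, 5, 3
rng = np.random.default_rng(0)
x = rng.normal(size=M)
layers = [(rng.normal(size=(H2, M)), np.zeros(H2)),
          (rng.normal(size=(H1, H2)), np.zeros(H1)),
          (rng.normal(size=(N, H1)), np.zeros(N))]
print(deep_net(x, layers))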

Page 10

Learning and Generalization

Training Error:
E(w) = (1/n) Σ_{i=1}^{n} (Y_i - f(X_i, w))²

Generalization Error:
G(w) = ∫∫ (y - f(x, w))² q(x, y) dx dy

The main purpose of learning is to minimize G(w), but we have only training samples. Minimizing E(w) is not equivalent to minimizing G(w).
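
The following sketch, under an assumed true source q(x, y) (standard normal input, quadratic truth plus noise), contrasts the training error E(w), an average over the n samples, with a Monte Carlo approximation of the generalization error G(w) from a large independent test set; all names and numbers are illustrative.

import numpy as np

rng = np.random.default_rng(1)

def true_source(n):
    # assumed q(x, y): x ~ N(0, 1), y = x^2 + Gaussian noise
    x = rng.normal(size=n)
    y = x**2 + 0.1 * rng.normal(size=n)
    return x, y

def f(x, w):
    # a simple parametric model: y = w0 + w1*x + w2*x^2
    return w[0] + w[1] * x + w[2] * x**2

def mean_square_error(x, y, w):
    return np.mean((y - f(x, w)) ** 2)

w = np.array([0.0, 0.0, 1.0])
Xtrain, Ytrain = true_source(50)        # training samples
Xtest, Ytest = true_source(100000)      # large test set approximates G(w)
print("E(w) ~", mean_square_error(Xtrain, Ytrain, w))
print("G(w) ~", mean_square_error(Xtest, Ytest, w))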

Page 11

Steepest Descent: Error Back-propagation

Inference:
o_j = σ( Σ_{k=1}^{M} w_{jk} o_k + θ_j )
f_i = σ( Σ_{j=1}^{H} u_{ij} o_j + φ_i )

Square error:
E(w) = (1/2) Σ_{i=1}^{N} (f_i - y_i)²

∂E/∂w_{jk} = Σ_{i=1}^{N} (f_i - y_i) ∂f_i/∂w_{jk}

∂f_i/∂w_{jk} = (∂f_i/∂o_j)(∂o_j/∂w_{jk})

All parameters can be optimized by steepest descent of E(w), computing these derivatives recursively from the output layer back to the input layer (error back-propagation).
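
A minimal NumPy sketch of one steepest-descent step with gradients obtained by back-propagation for the three-layer network above; the chain-rule terms are written out explicitly, and the names, sizes, and learning rate are illustrative.

import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def backprop_step(x, y, W, theta, U, phi, eta):
    # forward pass
    o = sigmoid(W @ x + theta)          # hidden outputs o_j
    f = sigmoid(U @ o + phi)            # network outputs f_i
    # backward pass: chain rule, dE/df_i = f_i - y_i
    delta_out = (f - y) * f * (1 - f)   # dE/d(output pre-activation)
    delta_hid = (U.T @ delta_out) * o * (1 - o)
    # steepest-descent update with learning rate eta
    U -= eta * np.outer(delta_out, o)
    phi -= eta * delta_out
    W -= eta * np.outer(delta_hid, x)
    theta -= eta * delta_hid
    return 0.5 * np.sum((f - y) ** 2)   # square error for this sample

rng = np.random.default_rng(0)
M, H, N = 3, 5, 2
W, theta = rng.normal(size=(H, M)), np.zeros(H)
U, phi = rng.normal(size=(N, H)), np.zeros(N)
x, y = rng.normal(size=M), np.array([0.2, 0.8])
for step in range(1000):
    err = backprop_step(x, y, W, theta, U, phi, eta=0.5)
print(err)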

Page 12

Regularization: Ridge and Lasso

E(w) = (1/n) Σ_{i=1}^{n} (Y_i - f(X_i, w))² + R(w)

Ridge: R(w) = λ Σ_j |w_j|²
Lasso: R(w) = λ Σ_j |w_j|

λ > 0 : hyperparameter

A DNN has many parameters to be optimized, so regularization terms are necessary.

Remark. It is still difficult to find the optimal hyperparameter.
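
A small sketch of the regularized training error; the helper name regularized_error and the example model are illustrative, and in practice λ would be tuned, for example by cross-validation.

import numpy as np

def regularized_error(w, X, Y, f, lam, kind="ridge"):
    # E(w) = (1/n) * sum_i (Y_i - f(X_i, w))^2 + R(w)
    data_term = np.mean((Y - f(X, w)) ** 2)
    if kind == "ridge":
        R = lam * np.sum(w ** 2)       # R(w) = lambda * sum_j |w_j|^2
    else:
        R = lam * np.sum(np.abs(w))    # R(w) = lambda * sum_j |w_j|
    return data_term + R

f = lambda X, w: X @ w                 # e.g. a linear model
rng = np.random.default_rng(0)
X, Y = rng.normal(size=(100, 5)), rng.normal(size=100)
w = np.zeros(5)
print(regularized_error(w, X, Y, f, lam=0.1, kind="lasso"))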

Page 13

Steepest Descent?

Minimize the error between the outputs and the supervised data by optimizing all parameters.

In a deep network the outputs are far from the inputs. Mathematically speaking, all parameters can be optimized by steepest descent, but it is difficult for a neural network to find the nonlinear relation between distant inputs and outputs.

We need a methodology for building a deep neural network.

Page 14

Contents

1. Deep neural network
2. Sequential learning and auto-encoder
3. Convolution learning

Page 15

Deep Learning Methodology

Three methods are being studied:

(1) Sequential layer learning
(2) Auto-encoder
(3) Convolution network

Page 16

(1) Sequential Layer Learning

Synapse weights in the lower layers are copied from a trained shallow network to a deeper one, which is then trained with the same supervisor.

(Figure: a sequence of networks of increasing depth, each with inputs x1, …, xM, outputs f1, …, fN, and a supervisor; the lower-layer weights are copied from one network to the next.)
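
A rough NumPy sketch of this sequential (greedy) layer learning, under the simplifying assumption that each stage trains a shallow three-layer network by back-propagation, copies and freezes its hidden-layer weights, and repeats on the resulting representation; all names, sizes, and the toy data are illustrative.

import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def train_three_layer(X, Y, h, steps=3000, eta=0.5, seed=0):
    # train a shallow (one-hidden-layer) network on (X, Y) by back-propagation
    rng = np.random.default_rng(seed)
    W = rng.normal(scale=0.5, size=(h, X.shape[1])); b = np.zeros(h)
    U = rng.normal(scale=0.5, size=(Y.shape[1], h)); c = np.zeros(Y.shape[1])
    n = len(X)
    for _ in range(steps):
        O = sigmoid(X @ W.T + b)                    # hidden outputs
        F = sigmoid(O @ U.T + c)                    # network outputs
        d_out = (F - Y) * F * (1 - F)
        d_hid = (d_out @ U) * O * (1 - O)
        U -= eta * d_out.T @ O / n;  c -= eta * d_out.mean(axis=0)
        W -= eta * d_hid.T @ X / n;  b -= eta * d_hid.mean(axis=0)
    return W, b, U, c

def sequential_layer_learning(X, Y, hidden_sizes):
    # grow the deep network one hidden layer at a time: the hidden weights of
    # each trained shallow network are copied (frozen), and a new shallow
    # network is trained on top of the resulting representation
    frozen, rep = [], X
    for i, h in enumerate(hidden_sizes):
        W, b, U, c = train_three_layer(rep, Y, h, seed=i)
        frozen.append((W, b))                       # copy lower-layer weights
        rep = sigmoid(rep @ W.T + b)                # representation for next stage
    return frozen, (U, c)                           # frozen layers + last output layer

rng = np.random.default_rng(7)
X = rng.normal(size=(200, 10))
Y = (X[:, :1] * X[:, 1:2] > 0).astype(float)        # toy supervised targets
layers, top = sequential_layer_learning(X, Y, [8, 6])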

Page 17

Parameter Space and E(w)

The parameter w lies in a high-dimensional Euclidean space, and E(w) has a complicated structure with many local minima.

E(w) is minimized at |w| = infinity, so we need an appropriate finite and local parameter.

Sequential layer learning may lead the training result to such an appropriate point.

Page 18

(2) Auto-encoder

First, a bottleneck network is trained to reproduce its own input (the supervisor is the input itself, and the hidden layer is smaller than M); then its weights are copied into the deep network.

(Figure: a bottleneck network with inputs X1, …, XM, a hidden layer smaller than M, and the same X1, …, XM as the supervised output; the trained weights are copied to the lower layers of the deep network with outputs f1, …, fN.)
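
A minimal NumPy sketch of training such a bottleneck network: the supervisor is the input itself, and the hidden layer (here 8 units for a 25-dimensional input, matching the later example) is smaller than M. The decoder is taken to be linear for simplicity; names and data are illustrative.

import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def train_autoencoder(X, h, steps=2000, eta=0.5, seed=0):
    # bottleneck network: input M -> hidden h (h < M) -> reconstruction of the input
    rng = np.random.default_rng(seed)
    M = X.shape[1]
    W = rng.normal(scale=0.5, size=(h, M)); b = np.zeros(h)    # encoder
    U = rng.normal(scale=0.5, size=(M, h)); c = np.zeros(M)    # decoder
    n = len(X)
    for _ in range(steps):
        O = sigmoid(X @ W.T + b)             # code (essential coordinates)
        R = O @ U.T + c                      # linear reconstruction of the input
        d_out = R - X                        # dE/d(reconstruction)
        d_hid = (d_out @ U) * O * (1 - O)
        U -= eta * d_out.T @ O / n;  c -= eta * d_out.mean(axis=0)
        W -= eta * d_hid.T @ X / n;  b -= eta * d_hid.mean(axis=0)
    return W, b                              # encoder weights to be copied

rng = np.random.default_rng(3)
X = rng.normal(size=(500, 25))
W, b = train_autoencoder(X, h=8)             # 8 < 25: bottleneck
code = sigmoid(X @ W.T + b)                  # extracted features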

Page 19

Bottleneck Neural Network

If the inputs lie on a K-dimensional manifold in the M-dimensional Euclidean space, then their essential coordinates can be extracted automatically by the bottleneck.

This is a nonlinear principal component analysis.

(Figure: inputs X1, …, XM on a K-dimensional manifold in M-dimensional Euclidean space, reproduced as the same X1, …, XM at the output.)

Page 20

Example

Input: 5 × 5 images of the digits 0 and 6 (25 input units)
Training samples: 2000
Test samples: 2000

Network: Input 25 → Hidden 8 → Hidden 6 → Hidden 4 → Output 2

Page 21

(0) Error back-propagation only

Training Error: mean 213.5, std 414.7
Generalization Error: mean 265.5, std 388.0

Training results strongly depend on the initial synapse weights.

Page 22

(1) Sequential Layer Learning

Training Error: mean 4.1, std 1.8
Test Error: mean 61.6, std 7.0

Page 23

(2) Auto-encoder

Training Error: mean 5.3, std 3.4
Test Error: mean 61.3, std 8.1

Page 24

Contents

1. Deep neural network
2. Sequential learning and auto-encoder
3. Convolution learning

Page 25

Data Structure

In data such as images or time series, neighboring values have local covariance.

Image: a pixel depends on its neighbors.
Time series: a future value can be predicted from the past.

A convolutional network is useful for analyzing such data:

f_i = σ( Σ_{|i-j|<3} u_{ij} σ( Σ_{|j-k|<3} w_{jk} x_k + θ_j ) + φ_i )

Synapse weights outside of the neighborhood are zero.
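
A small NumPy sketch of this locally connected layer: a full weight matrix is masked so that entries with |j - k| >= 3 are zero, as stated above. (Weight sharing across positions, used in standard convolutional networks, is a further refinement not shown here.) Names and sizes are illustrative.

import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def neighborhood_mask(rows, cols, radius=3):
    # mask[j, k] = 1 if |j - k| < radius else 0: weights outside neighbors are zero
    j = np.arange(rows)[:, None]
    k = np.arange(cols)[None, :]
    return (np.abs(j - k) < radius).astype(float)

def local_layer(x, W, theta, radius=3):
    # apply a locally connected layer: only weights with |j - k| < radius act
    return sigmoid((W * neighborhood_mask(*W.shape, radius)) @ x + theta)

rng = np.random.default_rng(0)
M = 12
x = rng.normal(size=M)
W1, th1 = rng.normal(size=(M, M)), np.zeros(M)   # hidden-layer weights w_jk
U,  phi = rng.normal(size=(M, M)), np.zeros(M)   # output-layer weights u_ij
f = local_layer(local_layer(x, W1, th1), U, phi) # f_i as in the formula above
print(f)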

Page 26

Convolutional Network

In image analysis, the network is built by nonlinear convolution processing, from local information to global information.

Page 27

Multi-resolution Analysis

Multi-resolution analysis (MRA) is a method of analyzing images by integrating data from local to global scales.

A convolutional network can be understood as a kind of MRA.

Page 28

Time Delay Neural Network

Human speech contains local abbreviations, expansions, and contractions.

A layered neural network, called the time delay neural network (TDNN), was proposed to adapt to such local nonlinear changes.

(Figure: speech sound over time mapped to a recognition result over time.)

Page 29

Example: Time Series

Time series prediction problem: find a nonlinear function f such that

x(t) = f(x(t-1), x(t-2), …, x(t-27)) + noise,

where { x(t) } is the monthly price of Hakusai (a Japanese vegetable similar to cabbage) from 1970 to 2013.

As a baseline, a linear predictor was optimized:

x(t) = a1 x(t-1) + a2 x(t-2) + … + a27 x(t-27).

Linear prediction: Training Error 1.29, Generalization Error 1.55
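
A sketch of fitting this linear predictor by least squares. The Hakusai price data are not reproduced here, so a synthetic monthly series stands in; names and numbers are illustrative.

import numpy as np

def fit_linear_predictor(series, order=27):
    # least-squares fit of x(t) = a1 x(t-1) + ... + a_order x(t-order)
    rows = [series[t - order:t][::-1] for t in range(order, len(series))]
    y = series[order:]
    a, *_ = np.linalg.lstsq(np.array(rows), y, rcond=None)
    return a

def predict(series, a):
    # one-step predictions for x(order), x(order+1), ...
    order = len(a)
    rows = [series[t - order:t][::-1] for t in range(order, len(series))]
    return np.array(rows) @ a

rng = np.random.default_rng(0)
t = np.arange(400)
series = 10 + 3 * np.sin(2 * np.pi * t / 12) + rng.normal(scale=0.5, size=len(t))  # stand-in monthly prices
a = fit_linear_predictor(series[:300], order=27)
test_error = np.mean((predict(series[300 - 27:], a) - series[300:]) ** 2)
print(test_error)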

Page 30

Example

(Figures: price versus month for the training result and the test result; true values in red, predictions in blue.)

The data from e-Stat, the official statistics portal of the Japanese Government, are used: http://www.e-stat.go.jp/SG1/estat/eStatTopPortal.do

Page 31

Comparison of DNN and Convolutional Network

Deep neural network: Training Error 1.01, Generalization Error 1.56
Convolutional network: Training Error 1.28, Generalization Error 1.35

Page 32

Deep Learning and Feature Extraction

(1) Automatic feature extraction
By using a deep neural network, the optimal feature representation may be found automatically. Discovery of unknown structure enables us to “mine data”. However, this may be difficult, and even when it is possible it requires heavy computational costs.

(2) Features prepared by a human
If an appropriate feature is designed by a human before training, the computational cost of learning can be reduced. However, unknown features are not discovered.

Page 33

Summary

(1) Supervised learning in neural networks was introduced (April 24th):
(a) Definitions of training and generalization errors
(b) Steepest descent as a learning algorithm

(2) Methodology of deep neural networks (May 1st):
(a) Sequential layer learning
(b) Auto-encoder
(c) Convolution network

