Copyright by Nguyen, Hotta and Nakagawa 1
Pattern recognition and Machine Learning Introduction to Neural Networks
Introduction to Neural Networks
CUONG TUAN NGUYEN
SEIJI HOTTA
MASAKI NAKAGAWA
Tokyo University of Agriculture and Technology
Pattern classification
Which category does an input belong to?
Example: character recognition for input images
Classifier: outputs the category of an input
[Figure: input image -> feature extraction -> feature vector (x_1, x_2, ..., x_d) -> classifier -> output category (a, b, c, ..., x, y, z)]
Supervised learning
Learning from a training dataset of <input, target> pairs
Testing on an unseen dataset measures
generalization ability
[Figure: training dataset of input character images with targets a, b, c]
Supervised learning
[Figure: learning fits the classifier to the training dataset; prediction then outputs a category (a, b, c, ..., z) for a new input]
Learning from a training dataset of <input, target> pairs
Testing on an unseen dataset measures
generalization ability
Human neuron
Neural Networks, A Simple Explanation: https://www.youtube.com/watch?v=gcK_5x2KsLA
Artificial neuron
Inputs x_1, x_2, ..., x_d arrive over weighted connections
with weights w_1, w_2, ..., w_d
net = sum_{i=1}^{d} x_i w_i
y = f(net)
where f is the activation function
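The two equations above can be sketched in a few lines of Python (a minimal illustration, not from the slides; the sigmoid choice and the numeric inputs are just examples):

```python
import math

def neuron(x, w, f):
    """Weighted sum of the inputs followed by an activation function."""
    net = sum(xi * wi for xi, wi in zip(x, w))  # net = sum_i x_i * w_i
    return f(net)                               # y = f(net)

# Example with a sigmoid activation:
sigmoid = lambda net: 1.0 / (1.0 + math.exp(-net))
y = neuron([1.0, 0.5], [0.3, -0.2], sigmoid)    # net = 0.3 - 0.1 = 0.2
```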
Activation function
Controls when the neuron is activated
Common choices: sigmoid, tanh, ReLU, Leaky ReLU, linear
Weighted connection + Activation function
A neuron is a feature detector: it is activated for
a specific feature
[Figure (generated by https://playground.tensorflow.org): a ReLU neuron with inputs x_1, x_2 and weights -0.82 and 0.49 activates on one side of the decision boundary -0.82 x_1 + 0.49 x_2 = 0]
Multi-layer perceptron (MLP)
Neurons are arranged into layers
Each neuron in a layer shares the same input from
the preceding layer
[Figure (generated by https://playground.tensorflow.org): layers of neurons; early layers detect simple features, later layers combine them into complex features]
MLP as a learnable classifier
The output corresponding to an input is constrained
by the weighted connections
These weights are learnable (adjustable)
[Figure: input layer (x_1, x_2, ..., x_d) -> hidden layer -> output layer (z_1, z_2), connected by weights W]
y = h(x, W)
where x is the input, W the weights, and y the output
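A forward pass y = h(x, W) can be sketched as below (the layer sizes, weights, and activations are made-up illustrative values, not from the slides):

```python
def layer(x, W, b, f):
    """One fully connected layer: every neuron sees the same input x."""
    return [f(sum(wi * xi for wi, xi in zip(row, x)) + bi)
            for row, bi in zip(W, b)]

def mlp(x, params):
    """y = h(x, W): feed each layer's output into the next layer."""
    for W, b, f in params:
        x = layer(x, W, b, f)
    return x

relu = lambda v: max(0.0, v)
identity = lambda v: v
params = [([[0.5, -0.3], [0.2, 0.8]], [0.0, 0.1], relu),      # hidden layer
          ([[1.0, -1.0]],             [0.0],      identity)]  # output layer
y = mlp([1.0, 2.0], params)
```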
Learning ability of neural networks
Linear vs. non-linear activation (linear, sigmoid, tanh, ReLU)
With a linear activation function, the network can only
learn linear functions
With a non-linear activation function, it can learn
non-linear functions
Learning ability of neural network
Universal approximation theorem [Hornik, 1991]:
an MLP with a single hidden layer can approximate
arbitrary functions
For complex functions, however, it may require a very
large hidden layer
Deep neural network
Contains many hidden layers and can extract complex
features
[Figure: input layer -> hidden layers -> output layer]
Learning in Neural Networks
The weighted connections are tuned using the training
data <input, target>
Objective: the network should output the correct target
for each input
[Figure: training dataset of input patterns and targets]
Learning in Neural Networks
Loss function (objective function)
Measures the difference between output and target
Learning: an optimization process
Minimize the loss (make the output match the target)
L = t - y = t - h(x, W) = f(W)
where x is the input, W the weights, y the output, and t the target;
for fixed training data, the loss is a function of the weights W
[Figure: input layer -> hidden layer -> output layer with weights W; outputs (z_1, ..., z_d) compared against targets (t_1, ..., t_d) to give the loss L]
Learning in Neural Networks
Gradient vector of L with respect to W: grad_W L = dL/dW
Weight update
Move in the reverse gradient direction:
W_updated = W_current - eta * dL/dW
eta: learning rate
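The update rule can be demonstrated on a toy one-dimensional loss (a sketch: the analytic gradient is normally computed by backpropagation; here a finite difference stands in, and the loss and eta are illustrative):

```python
def grad(loss, w, eps=1e-6):
    """Finite-difference estimate of dL/dw."""
    return (loss(w + eps) - loss(w - eps)) / (2 * eps)

def update(w, loss, eta=0.1):
    """Reverse-gradient step: w_new = w - eta * dL/dw."""
    return w - eta * grad(loss, w)

loss = lambda w: (w - 3.0) ** 2   # toy loss with its minimum at w = 3
w = 0.0
for _ in range(100):
    w = update(w, loss)           # w moves toward the minimum
```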
Loss function
Probabilistic loss functions
Binary cross-entropy (logistic regression)
Cross-entropy (multinomial)
Mean square error
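The two most common losses can be sketched directly (a minimal illustration; the eps term guarding the logarithm at 0 is a standard numerical safeguard, not from the slides):

```python
import math

def mse(y, t):
    """Mean square error between outputs y and targets t."""
    return sum((ti - yi) ** 2 for yi, ti in zip(y, t)) / len(y)

def binary_cross_entropy(y, t, eps=1e-12):
    """Binary cross-entropy for outputs in (0, 1) and targets in {0, 1}."""
    return -sum(ti * math.log(yi + eps) + (1 - ti) * math.log(1 - yi + eps)
                for yi, ti in zip(y, t)) / len(y)
```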
Learning & converge
By updating the weights along the negative gradient,
the loss is reduced and converges to a minimum
[Figure: loss curve f(w); successive updates w_0, w_1, w_2, w_3 with step delta_w descend toward the minimum]
Learning through all training samples
After updating the weights, new training samples are
fed to the network to continue learning
When all training samples have been learnt, the network
has completed one epoch. The network must run
through many epochs to converge.
Weight update strategies
Stochastic gradient descent (SGD)
Batch update
Mini-batch
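The three strategies differ only in how many samples are seen per weight update; a sketch of the epoch loop (the dataset and function names are illustrative):

```python
import random

def minibatches(dataset, batch_size):
    """Shuffle once per epoch, then yield successive mini-batches."""
    data = list(dataset)
    random.shuffle(data)
    for i in range(0, len(data), batch_size):
        yield data[i:i + batch_size]

# batch_size=1 gives SGD; batch_size=len(dataset) gives batch update.
dataset = [(x, 2 * x) for x in range(10)]   # toy <input, target> pairs
for epoch in range(3):                      # run through many epochs
    for batch in minibatches(dataset, batch_size=4):
        pass  # compute the loss on the batch, then update the weights
```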
Momentum Optimizer
Learning may get stuck in a local
minimum.
Momentum: delta_w retains the latest
optimization direction. It may help
the optimizer escape the local
minimum.
W_updated = W_current - eta * dL/dW + alpha * delta_w
eta: learning rate, alpha: momentum parameter
[Figure: loss curve f(w) with a local minimum; momentum carries the update w_0 -> w_1 past it]
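The momentum update can be sketched on a toy quadratic loss (the values of eta and alpha below are illustrative, not prescribed by the slides):

```python
def momentum_step(w, dw, grad, eta=0.1, alpha=0.9):
    """Blend the new gradient step with the previous step direction."""
    dw = -eta * grad(w) + alpha * dw   # delta_w retains the last direction
    return w + dw, dw

grad = lambda w: 2.0 * (w - 3.0)       # gradient of the toy loss (w - 3)^2
w, dw = 0.0, 0.0
for _ in range(300):
    w, dw = momentum_step(w, dw, grad)
```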
Overfitting & Generalization
While training, model complexity increases
with each epoch
Overfitting:
• The model is over-complex
• Poor generalization: good performance on the training set
but poor on the test set
[Figure: accuracy vs. epochs; training accuracy keeps rising toward 1.0 while test accuracy plateaus]
Prevent overfitting: Regularization
Weight decay
Weight noise
Early stopping
Evaluate performance on a validation set
Stop when there is no improvement on the validation set
[Figure: training loss keeps decreasing while validation loss starts to rise]
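Early stopping is usually implemented with a patience counter; a minimal sketch (the function name and patience value are illustrative):

```python
def early_stopping_epoch(val_losses, patience=3):
    """Return the epoch at which training would stop: the validation
    loss has not improved for `patience` consecutive epochs."""
    best, wait = float("inf"), 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best, wait = loss, 0   # improvement: reset the counter
        else:
            wait += 1
            if wait >= patience:
                return epoch       # no improvement for `patience` epochs
    return len(val_losses) - 1     # never triggered: ran all epochs
```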
Prevent overfitting: Regularization
Dropout
Randomly drop neurons with a predefined
probability
Good regularization: trains a large ensemble of subnetworks
Also has a Bayesian interpretation
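Inverted dropout (the common formulation, in which survivors are rescaled at training time so no scaling is needed at test time) can be sketched as:

```python
import random

def dropout(activations, p=0.5, training=True):
    """Zero each activation with probability p during training and
    rescale the survivors by 1/(1-p); act as the identity at test time."""
    if not training:
        return list(activations)
    return [0.0 if random.random() < p else a / (1 - p) for a in activations]
```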
Adaptive learning rate
Adam optimizer: adapts the learning rate per weight using
running estimates of the first and second moments of the gradient