
Copyright by Nguyen, Hotta and Nakagawa

Pattern Recognition and Machine Learning
Introduction to Neural Networks

CUONG TUAN NGUYEN
SEIJI HOTTA
MASAKI NAKAGAWA
Tokyo University of Agriculture and Technology


Pattern classification

• Which category does an input belong to?
• Example: character recognition for input images
• Classifier: outputs the category of an input

[Figure: an input image goes through feature extraction to a feature vector (x_1, x_2, ..., x_n); the classifier maps it to an output category such as a, b, c, ..., x, y, z.]


Supervised learning

• Learning from a training dataset: pairs <input, target>
• Testing on an unseen dataset
• Generalization ability: performance on data not seen during training

[Figure: a training dataset of input images paired with their targets a, b, c.]


Supervised learning

• Learning: the classifier is fitted to training pairs <input, target>
• Prediction: the trained classifier outputs a category for an unseen input

[Figure: the classifier learns from labeled examples, then predicts a category (a, b, c, ..., x, y, z) for new inputs.]


Human neuron

• Neural Networks, A Simple Explanation: https://www.youtube.com/watch?v=gcK_5x2KsLA


Artificial neuron

• Inputs x_1, x_2, ..., x_n arrive over weighted connections with weights w_1, w_2, ..., w_n
• An activation function f turns the weighted sum into the output y:

  net = \sum_{i=1}^{n} x_i w_i
  y = f(net)

[Figure: a neuron with inputs x_1, x_2, ..., x_n, weights w_1, w_2, ..., w_n, and activation function f producing output y.]
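Below is a minimal NumPy sketch of this neuron (not from the slides; the input values, weights, and the choice of ReLU as f are illustrative assumptions):

```python
import numpy as np

def relu(net):
    # One possible activation function f: max(0, net)
    return np.maximum(0.0, net)

def neuron(x, w, f=relu):
    # net = sum_{i=1}^{n} x_i * w_i, then y = f(net)
    net = np.dot(w, x)
    return f(net)

# Illustrative values (assumptions, not from the slides)
x = np.array([1.0, 0.5, -0.3])   # inputs x_1 ... x_n
w = np.array([0.2, -0.4, 0.7])   # weights w_1 ... w_n
y = neuron(x, w)
print(y)
```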


Activation function

• Controls when the neuron should be activated
• Common choices: linear, sigmoid, tanh, ReLU, Leaky ReLU (sketched below)

[Figure: plots of the sigmoid, tanh, ReLU, Leaky ReLU, and linear activation functions.]
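The five functions plotted on this slide can be written directly in NumPy; this is a sketch using the standard definitions (the 0.01 slope for Leaky ReLU is a common default, assumed here):

```python
import numpy as np

def linear(net):
    return net

def sigmoid(net):
    return 1.0 / (1.0 + np.exp(-net))

def tanh(net):
    return np.tanh(net)

def relu(net):
    return np.maximum(0.0, net)

def leaky_relu(net, alpha=0.01):
    # Small non-zero slope alpha for negative inputs
    return np.where(net > 0, net, alpha * net)
```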


Weighted connection + Activation function

• A neuron is a feature detector: it is activated for a specific feature
• Example (generated by https://playground.tensorflow.org): a two-input ReLU neuron with weights -0.82 and 0.49 is activated on one side of the decision boundary

  -0.82 x_1 + 0.49 x_2 = 0

[Figure: the neuron f(x_1, x_2) and the (x_1, x_2) plane split by the line -0.82 x_1 + 0.49 x_2 = 0.]
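A quick numerical check of this example (a sketch; the two test points are made-up assumptions): the ReLU neuron outputs a positive value only on one side of the boundary.

```python
import numpy as np

def relu(net):
    return np.maximum(0.0, net)

w = np.array([-0.82, 0.49])  # weights from the playground example above

# One point on each side of the line -0.82*x_1 + 0.49*x_2 = 0
for x in (np.array([1.0, 2.0]), np.array([1.0, -2.0])):
    print(x, relu(np.dot(w, x)))  # activated (positive) vs. not (zero)
```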


Multi-layer perceptron (MLP)

• Neurons are arranged into layers
• Each neuron in a layer shares the same input from the preceding layer
• Early layers detect simple features; later layers build complex features from them (a sketch follows)

[Figure (generated by https://playground.tensorflow.org): layers of neurons, from simple features to complex features.]
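As a sketch of how such a layered network is declared in practice (the layer sizes and activations here are arbitrary assumptions, in the spirit of the playground example):

```python
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(2,)),              # inputs x_1, x_2
    tf.keras.layers.Dense(8, activation="relu"),    # first layer: simple features
    tf.keras.layers.Dense(8, activation="relu"),    # next layer: complex features
    tf.keras.layers.Dense(1, activation="sigmoid"), # output
])
```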


MLP as a learnable classifier

• The output corresponding to an input is constrained by the weighted connections
• These weights are learnable (adjustable)

[Figure: an input layer (x_1, x_2, ..., x_n), a hidden layer, and an output layer (z_1, z_2), connected by weights W.]

Viewed as a whole, the network maps the input X to the output Z through the weights W:

  Z = h(X, W)
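A minimal NumPy sketch of Z = h(X, W) for a one-hidden-layer MLP (the sizes, random weights, ReLU hidden layer, and linear output are all illustrative assumptions):

```python
import numpy as np

def relu(net):
    return np.maximum(0.0, net)

def h(X, W):
    # Forward pass; W packs the two weight matrices (W1, W2)
    W1, W2 = W
    hidden = relu(X @ W1)  # hidden-layer activations
    return hidden @ W2     # output layer (linear here, an assumption)

rng = np.random.default_rng(0)
W = (rng.normal(size=(3, 4)), rng.normal(size=(4, 2)))  # x in R^3, z in R^2
X = rng.normal(size=(5, 3))                             # a batch of 5 inputs
Z = h(X, W)
print(Z.shape)  # (5, 2): outputs z_1, z_2 per input
```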


Learning ability of neural networks

• Linear vs. non-linear:
  • With a linear activation function, the network can only learn linear functions (see the sketch below)
  • With a non-linear activation function, it can learn non-linear functions

[Figure: decision regions learned with linear, sigmoid, tanh, and ReLU activations.]
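One way to see the linear case (a sketch, not from the slides): composing linear layers collapses into a single linear map, so without a non-linearity extra layers add no expressive power.

```python
import numpy as np

rng = np.random.default_rng(1)
W1 = rng.normal(size=(3, 4))
W2 = rng.normal(size=(4, 2))
x = rng.normal(size=(3,))

z_two_layers = (x @ W1) @ W2   # two linear layers ...
z_one_layer = x @ (W1 @ W2)    # ... equal one linear layer with W = W1 @ W2
assert np.allclose(z_two_layers, z_one_layer)
```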


Learning ability of neural networks

• Universal approximation theorem [Hornik, 1991]: an MLP with a single hidden layer can approximate arbitrary functions
• For complex functions, however, this may require a very large hidden layer
• Deep neural network: contains many hidden layers and can extract complex features

[Figure: a deep network with an input layer, several hidden layers, and an output layer.]


Learning in Neural Networks

• The weighted connections are tuned using the training data <input, target>
• Objective: the network should output the correct target for each input

[Figure: a training dataset pairing each input pattern with its target, e.g. an image of "b" paired with the label b.]


Learning in Neural Networks

• Loss function (objective function): measures the difference between output and target
• Learning is an optimization process: minimize the loss (make the output match the target)

[Figure: a network with input layer (x_1, ..., x_n), hidden layer, and output layer with weights W; outputs (z_1, ..., z_k) are compared with targets (t_1, ..., t_k) to give the loss (L).]

For a fixed training set, the loss is a function of the weights alone:

  L = T - Z = T - h(X, W) = l(W)

where X is the input, W the weights, Z the output, and T the target.


Learning in Neural Networks

• Gradient vector of l with respect to W:

  \nabla_W l = \frac{\partial l(W)}{\partial W}

• Weight update: step in the reverse (negative) gradient direction:

  W_{update} = W_{current} - \eta \frac{\partial l(W)}{\partial W}

  where \eta is the learning rate.

[Figure: the loss curve l(W) with the gradient \nabla at the current weight W.]
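A sketch of this update rule on a toy one-parameter loss (the quadratic loss l(w) = (w - 3)^2, the learning rate, and the step count are illustrative assumptions):

```python
def grad_l(w):
    # dl/dw for the toy loss l(w) = (w - 3)^2
    return 2.0 * (w - 3.0)

eta = 0.1  # learning rate
w = 0.0    # initial weight
for step in range(50):
    w = w - eta * grad_l(w)  # W_update = W_current - eta * dl/dW

print(w)  # approaches the minimum at w = 3
```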


Loss function

• Probabilistic loss functions:
  • Binary cross-entropy (logistic regression)
  • Cross-entropy (multi-class)
• Mean squared error
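NumPy sketches of these losses using their standard definitions (the clipping constant for numerical stability is an added assumption):

```python
import numpy as np

EPS = 1e-12  # avoid log(0)

def binary_cross_entropy(t, z):
    # Targets t in {0, 1}, sigmoid outputs z in (0, 1): logistic regression
    z = np.clip(z, EPS, 1.0 - EPS)
    return -np.mean(t * np.log(z) + (1 - t) * np.log(1 - z))

def cross_entropy(t, z):
    # One-hot targets t and softmax outputs z, shape (batch, classes)
    z = np.clip(z, EPS, 1.0)
    return -np.mean(np.sum(t * np.log(z), axis=1))

def mean_squared_error(t, z):
    return np.mean((t - z) ** 2)
```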


Learning & convergence

• By updating the weights using the gradient, the loss is reduced and converges to a minimum

[Figure: the loss curve l(w) with successive updates w_0, w_1, w_2, w_3, each step of size \Delta w, descending toward the minimum.]


Learning through all training samples

• After a weight update, new training samples are fed to the network to continue learning
• When all training samples have been learnt, the network has completed one epoch; the network must run through many epochs to converge
• Weight update strategies (see the sketch below):
  • Stochastic gradient descent (SGD): update after every sample
  • Batch update: update once per pass over the whole training set
  • Mini-batch: update after each small batch of samples
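A sketch of the mini-batch strategy (update_weights is a hypothetical stand-in for the gradient step from the previous slides; the batch size and epoch count are assumptions):

```python
import numpy as np

def train(X, T, update_weights, batch_size=32, epochs=10):
    n = len(X)
    for epoch in range(epochs):            # one epoch = one pass over the data
        order = np.random.permutation(n)   # reshuffle the samples each epoch
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]
            update_weights(X[idx], T[idx])  # one gradient step per mini-batch

# batch_size = 1 gives SGD; batch_size = n gives batch update
```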


Momentum Optimizer

• Learning may get stuck in a local minimum.
• Momentum: \Delta w retains the latest optimizing direction. It may help the optimizer overcome the local minimum:

  W_{update} = W_{current} - \eta \frac{\partial l(W)}{\partial W} + \alpha \Delta w

  where \eta is the learning rate and \alpha is the momentum parameter.

[Figure: a loss curve l(w) with a local minimum between w_0 and w_1; the momentum term \Delta w carries the update past it.]
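The same toy example as before, now with the momentum term (the loss, learning rate, and momentum value are illustrative assumptions):

```python
def grad_l(w):
    # dl/dw for the toy loss l(w) = (w - 3)^2
    return 2.0 * (w - 3.0)

eta, alpha = 0.1, 0.9  # learning rate and momentum parameter
w, dw = 0.0, 0.0       # weight and retained update direction
for step in range(50):
    dw = -eta * grad_l(w) + alpha * dw  # Delta w keeps the latest direction
    w = w + dw

print(w)
```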


Overfitting & Generalization

• While training, model complexity increases with each epoch
• Overfitting:
  • The model is over-complex
  • Poor generalization: good performance on the training set but poor performance on the test set

[Figure: accuracy (0 to 1.0) over epochs; training accuracy keeps rising while test accuracy levels off below it.]


Prevent overfitting: Regularization

• Weight decay
• Weight noise
• Early stopping (see the sketch below):
  • Evaluate performance on a validation set
  • Stop when there is no improvement on the validation set

[Figure: training loss keeps decreasing over epochs while validation loss turns upward; training stops at that point.]
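In Keras this is available as a built-in callback; a sketch (the patience value, validation split, and the model/data names are assumptions):

```python
import tensorflow as tf

# Stop once validation loss has not improved for 5 consecutive epochs
early_stopping = tf.keras.callbacks.EarlyStopping(
    monitor="val_loss",
    patience=5,
    restore_best_weights=True,  # roll back to the best validation weights
)

# model, x_train, t_train are assumed to be defined as in the earlier sketches:
# model.fit(x_train, t_train, validation_split=0.2,
#           epochs=100, callbacks=[early_stopping])
```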


Prevent overfitting: Regularization

• Dropout: randomly drop neurons with a predefined probability
• Good regularization: behaves like a large ensemble of networks
• Bayesian perspective
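In Keras, dropout is a layer inserted between the dense layers; a sketch (the 0.5 drop probability and layer sizes are illustrative assumptions):

```python
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(784,)),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dropout(0.5),  # drop each neuron with probability 0.5
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.Dense(10, activation="softmax"),
])
```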


Adaptive learning rate

• Adam optimizer: adapts each weight's learning rate using running estimates of the first and second moments of the gradient
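In Keras, Adam is a built-in optimizer; a sketch (the learning rate and the loss choice are assumptions, and the model is taken from the earlier sketches):

```python
import tensorflow as tf

optimizer = tf.keras.optimizers.Adam(learning_rate=1e-3)

# model is assumed to be defined as in the earlier sketches:
# model.compile(optimizer=optimizer,
#               loss="categorical_crossentropy",
#               metrics=["accuracy"])
```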


Practice

• GPU implementation
• Keras + TensorFlow (recommended)
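A minimal end-to-end Keras + TensorFlow sketch tying the pieces together (MNIST is an illustrative dataset choice; all hyperparameters are assumptions):

```python
import tensorflow as tf

# Character-image data, as in the recognition example at the start of the deck
(x_train, t_train), (x_test, t_test) = tf.keras.datasets.mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0  # scale pixels to [0, 1]

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(28, 28)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.Dense(10, activation="softmax"),
])

model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

model.fit(x_train, t_train, epochs=5, validation_split=0.2)
model.evaluate(x_test, t_test)
```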