Artificial Neural Networks
Transcript
Page 1:

Artificial Neural Networks

Page 2:

Overview

Motivation & Goals
Perceptron Learning
Gradient Algorithms & the Δ-Rule
Multi-Layer Nets
The Backpropagation Algorithm
Example Application: Recognition of Faces
More Network Architectures
Application Areas of ANNs

Page 3:

Model: The Brain

A complex learning system with simple learning units: the neurons.

A network of ~10^10 neurons, where each neuron has ~10^4 connections.

Transmission time of a neuron: ~10^-3 sec (speed versus flexibility).

Observation: face recognition time ≈ 10^-1 sec ⇒ massive parallelism.

Page 4:

Goals of ANNs

Learning instead of programming
Learning complex functions with simple learning units
Parallel computation (e.g. layer model)
The network parameters shall be found automatically by a learning algorithm
An ANN is a black box

(Figure: a black box mapping inputs to outputs.)

Page 5:

When are ANNs used?

Input instances are described as a vector of discrete or real values
The output of the target function is a single value or a vector of discrete or real-valued attributes
The input data contains noise
The target function is unknown or difficult to describe

(Figure: a black box mapping inputs to outputs.)

Page 6:

The Perceptron (as a NN Unit) (1/2)

A linear unit with threshold.

(Figure: a perceptron unit with inputs x_1, ..., x_n, weights w_1, ..., w_n and threshold Θ.)

$$ o(x_1, \dots, x_n) = \begin{cases} 1 & \text{if } \sum_{i=1}^{n} w_i x_i > \Theta \\ -1 & \text{otherwise} \end{cases} $$

Page 7:

The Perceptron (as a NN Unit) (2/2)

(Figure: the same perceptron with an additional constant input x_0 = 1 whose weight w_0 absorbs the threshold.)

$$ w_1 x_1 + \dots + w_n x_n > \Theta \;\iff\; w_0 x_0 + w_1 x_1 + \dots + w_n x_n > 0 \quad \text{with } x_0 = 1,\; w_0 = -\Theta $$

$$ o(x_0, \dots, x_n) = \begin{cases} 1 & \text{if } \vec{w} \cdot \vec{x} > 0 \\ -1 & \text{otherwise} \end{cases} $$
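As a minimal sketch (not part of the original slides), this thresholded linear unit can be written in a few lines of Python, using the x_0 = 1, w_0 = -Θ convention above:

```python
import numpy as np

def perceptron_output(w, x):
    """Thresholded linear unit: 1 if w . x > 0, else -1.
    x[0] is the constant input 1 and w[0] = -theta absorbs the threshold."""
    return 1 if np.dot(w, x) > 0 else -1

# Example with two inputs, weights 0.5 and 0.5, threshold 0.3:
w = np.array([-0.3, 0.5, 0.5])
print(perceptron_output(w, np.array([1, 1, 0])))   # 1, since 0.5 > 0.3
print(perceptron_output(w, np.array([1, 0, 0])))   # -1, since 0 < 0.3
```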

Page 8:

Geometrical Classification (Decision Surface)

A perceptron can classify only linearly separable training data. We need networks of these units.

(Figure: points in the (x1, x2) plane. Left: the OR function is linearly separable; a single separating line is drawn, annotated with the values 0.5, 0.5 and 0.3. Right: the XOR function is not linearly separable.)
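As an illustrative check (the weights 0.5, 0.5 and threshold 0.3 are taken from the OR example above; the code itself is not from the slides), a single threshold unit classifies OR correctly but necessarily fails on XOR:

```python
import numpy as np

def out(w, x):
    """Thresholded linear unit: 1 if w . x > 0, else -1 (bias input x[0] = 1)."""
    return 1 if np.dot(w, x) > 0 else -1

w = np.array([-0.3, 0.5, 0.5])                   # threshold 0.3, weights 0.5, 0.5

# OR is linearly separable: every case is classified correctly.
for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    assert out(w, [1, x1, x2]) == (1 if (x1 or x2) else -1)

# XOR is not: this unit (like any single linear unit) gets at least one case wrong.
print(out(w, [1, 1, 1]))                         # 1, but XOR(1, 1) should be -1
```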

Page 9:

The Perceptron Learning Rule (1/2)

Training a perceptron = learning the hypothesis that classifies all training examples correctly.

A hypothesis = a vector of weights.

Page 10:

The Perceptron Learning Rule (2/2)

Idea:

1. Initialise the weights with random values.

2. Apply the perceptron iteratively to each training example and modify the weights according to the learning rule

$$ \Delta w_i = \eta\,(t - o)\,x_i, \qquad w_i \leftarrow w_i + \Delta w_i $$

where t is the target output, o the actual output and η the learning rate.

3. Step 2 is repeated for all training examples until all of them are correctly classified.
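A short Python sketch of this training loop (illustrative; the slides give only the rule itself, and the learning-rate and initialisation details are assumptions):

```python
import numpy as np

def train_perceptron(examples, eta=0.1, epochs=100, rng=np.random.default_rng(0)):
    """examples: list of (x, t) with x including the bias input x[0] = 1 and t in {-1, +1}."""
    w = rng.uniform(-0.05, 0.05, size=len(examples[0][0]))   # 1. random initial weights
    for _ in range(epochs):
        errors = 0
        for x, t in examples:                                # 2. apply the rule per example
            o = 1 if np.dot(w, x) > 0 else -1
            w += eta * (t - o) * np.asarray(x, dtype=float)  # delta_w_i = eta*(t - o)*x_i
            errors += int(o != t)
        if errors == 0:                                      # 3. stop once all are correct
            break
    return w

# Example: learning the OR function
data = [([1, 0, 0], -1), ([1, 0, 1], 1), ([1, 1, 0], 1), ([1, 1, 1], 1)]
print(train_perceptron(data))
```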

Page 11:

The Perceptron-Learning Rule: Convergence

The perceptron learning rule converges if:
the training examples are linearly separable, and
η is chosen small enough (e.g. 0.1).

Intuitive explanation:
If t = o: Δw_i = 0, the weights stay unchanged (ok).
If t = 1 and o = -1: Δw_i = η(t - o)x_i, so Δw_i > 0 for x_i > 0 and Δw_i < 0 for x_i < 0; in both cases w·x increases and the output moves towards t (ok). The case t = -1, o = 1 is symmetric.

Page 12:

The Gradient Descent Algorithm & the Δ-Rule (1/5)

Better: the Δ-rule converges even if the training examples are not linearly separable.

Idea: Use the gradient descent algorithm to search for the best hypothesis in the hypothesis space. The best hypothesis is the one that minimises the squared error.

Basis of the backpropagation algorithm.

Page 13:

The Gradient Descent Algorithm & the Δ-Rule (2/5)

For reasons of continuity (differentiability) the Δ-learning rule is applied to a linear unit instead of to the thresholded perceptron.

Linear unit:

$$ o(\vec{x}) = \vec{w} \cdot \vec{x} $$

The squared error to be minimised:

$$ E(\vec{w}) = \frac{1}{2} \sum_{d \in D} (t_d - o_d)^2 $$

where D is the set of training examples, t_d the target output of example d and o_d the output computed for example d.

(Figure: a linear unit with inputs x_1, ..., x_n.)

Page 14:

The Gradient Descent Algorithm & the Δ-Rule (3/5)

● Geometric interpretation: the error function plotted over the hypothesis space (e.g. a 2-dimensional weight space).

Page 15:

The Gradient Descent Algorithm & the Δ-Rule (4/5)

Gradient:

$$ \nabla E(\vec{w}) = \left[ \frac{\partial E}{\partial w_0}, \dots, \frac{\partial E}{\partial w_n} \right] $$

Learning rule:

$$ \Delta \vec{w} = -\eta\, \nabla E(\vec{w}), \qquad \Delta w_i = -\eta\, \frac{\partial E}{\partial w_i} $$

Derivation:

$$ \frac{\partial E}{\partial w_i} = \frac{\partial}{\partial w_i} \frac{1}{2} \sum_d (t_d - o_d)^2 = \frac{1}{2} \sum_d 2\,(t_d - o_d)\, \frac{\partial}{\partial w_i}(t_d - o_d) = \sum_d (t_d - o_d)\, \frac{\partial}{\partial w_i}\bigl(t_d - \vec{w} \cdot \vec{x}_d\bigr) = -\sum_d (t_d - o_d)\, x_{i,d} $$

Page 16:

The Gradient Descent Algorithm & the Δ-Rule (5/5)

Standard (batch) method: Do until the termination criterion is satisfied:

1. Initialise each Δw_i = 0.

2. For each ⟨x, t⟩ in D: compute o; then for each w_i: Δw_i ← Δw_i + η (t - o) x_i.

3. For each w_i: w_i ← w_i + Δw_i.
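A hedged Python sketch of this batch procedure for a linear unit (the learning rate, epoch count and example data are illustrative assumptions, not from the slides):

```python
import numpy as np

def batch_gradient_descent(D, eta=0.1, epochs=200, rng=np.random.default_rng(0)):
    """Standard (batch) method for a linear unit o(x) = w . x.
    D: list of (x, t) pairs, with x including the constant bias input x[0] = 1."""
    w = rng.uniform(-0.05, 0.05, size=len(D[0][0]))
    for _ in range(epochs):                        # "until termination criterion satisfied"
        delta_w = np.zeros_like(w)                 # 1. initialise each delta_w_i = 0
        for x, t in D:                             # 2. accumulate eta*(t - o)*x_i over D
            o = np.dot(w, x)
            delta_w += eta * (t - o) * np.asarray(x, dtype=float)
        w += delta_w                               # 3. one weight update per pass over D
    return w

# Example: fitting t = 2*x - 1 from three points
D = [([1, 0.0], -1.0), ([1, 0.5], 0.0), ([1, 1.0], 1.0)]
print(batch_gradient_descent(D))                   # approaches [-1, 2]
```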

Page 17:

The Δ-Rule

Stochastic method: Do until the termination criterion is satisfied:

1. Initialise each Δw_i = 0.

2. For each d = ⟨x, t⟩ ∈ D: compute o; then for each w_i:

$$ w_i \leftarrow w_i + \eta\, (t_d - o_d)\, x_{i,d} \qquad \text{(the Δ-rule)} $$
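The corresponding stochastic (incremental) sketch differs only in updating the weights inside the inner loop; again this is an illustration, not the slide's own code:

```python
import numpy as np

def delta_rule(D, eta=0.1, epochs=200, rng=np.random.default_rng(0)):
    """Incremental version: the weights are updated after every single example."""
    w = rng.uniform(-0.05, 0.05, size=len(D[0][0]))
    for _ in range(epochs):
        for x, t in D:
            o = np.dot(w, x)
            w += eta * (t - o) * np.asarray(x, dtype=float)   # the delta rule
    return w
```

Called with the same D as in the batch sketch above, it converges to approximately the same weight vector.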

Page 18:

Remarks

Advantages of the stochastic approximation of the gradient:

quicker convergence (incremental update of the weights).

less likely to get stuck in a local minimum.

Page 19:

Remarks

Single perceptrons learn only linearly separable training data. We need multi-layer networks of several 'neurons'.

Example: the XOR problem:

(Figure: a two-layer network for the XOR problem, with unit thresholds of 0.5 and connection weights 1, 1, -1, -1, 1, 1, shown next to the XOR scatter plot in the (x1, x2) plane, which is not linearly separable.)

Page 20:

XOR-Function

(Figure: the same XOR network evaluated on the four inputs (0, 0), (0, 1), (1, 0) and (1, 1), showing the activations of the hidden and output units in each case.)
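One standard construction of XOR from threshold units, as a hedged sketch (the exact weights drawn on the slide may differ; 0/1 outputs are used here for readability):

```python
def step(s, theta=0.5):
    """Threshold unit: 1 if the weighted sum s exceeds the threshold, else 0."""
    return 1 if s > theta else 0

def xor_net(x1, x2):
    h1 = step(x1 + x2)             # OR-like hidden unit
    h2 = step(-x1 - x2, -1.5)      # NAND-like hidden unit: 0 only when both inputs are 1
    return step(h1 + h2, 1.5)      # output unit: AND of the two hidden units

for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print((x1, x2), xor_net(x1, x2))   # 0, 1, 1, 0
```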

Page 21:

Supervised Learning Backpropagation NN

Since 1985 the BP algorithm has become one of the most widespread and successful learning algorithms for NNs.

Idea: The minimum of the error function of a learning problem is searched for by descending in the direction of the gradient.

The vector of weights which minimises the error of the network is seen as the solution of the learning problem.

So the gradient of the error function must exist at every point of the weight space: o(w, x) must be differentiable.

Page 22:

Learning in Backpropagation Networks

The sigmoid unit:

$$ \sigma(x) = \frac{1}{1 + e^{-x}} $$

Properties of the sigmoid unit:

$$ \frac{d\sigma(x)}{dx} = \sigma(x)\,\bigl(1 - \sigma(x)\bigr) $$
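A small sketch of the sigmoid unit together with a numerical check of the derivative identity (illustrative, not from the slides):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_prime(x):
    s = sigmoid(x)
    return s * (1.0 - s)            # d(sigma)/dx = sigma(x) * (1 - sigma(x))

# Finite-difference check of the identity at a few points
for x in (-2.0, 0.0, 1.5):
    numeric = (sigmoid(x + 1e-6) - sigmoid(x - 1e-6)) / 2e-6
    print(x, np.isclose(numeric, sigmoid_prime(x)))   # True, True, True
```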

Page 23:

Definitions used by the BP Algorithm

(Figure: a layered network with input units, hidden units and output units; the error terms are propagated backwards from the outputs. i and j label two connected units.)

x_{j,i}: input from node i to unit j
w_{j,i}: weight of the i-th input to unit j
outputs: the set of output units
o_i: output of unit i
t_i: target output of unit i
δ_n = -∂E/∂net_n: error term of unit n

Page 24:

The Backpropagation Algorithm

Initialise all weights to small random numbers.
Until the termination criterion is satisfied, do:

For each training example, do:

1. Compute the network's output.

2. For each output unit k: δ_k = o_k (1 - o_k)(t_k - o_k)

3. For each hidden unit h: δ_h = o_h (1 - o_h) Σ_{k ∈ outputs} w_{k,h} δ_k

4. Update each network weight: w_{j,i} ← w_{j,i} + Δw_{j,i}, where Δw_{j,i} = η δ_j x_{j,i}
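A hedged Python sketch of these four steps for a single hidden layer of sigmoid units (layer sizes, learning rate and weight layout are assumptions made for the illustration):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_backprop(X, T, n_hidden=3, eta=0.3, epochs=5000, rng=np.random.default_rng(0)):
    """X: (n_examples, n_inputs) inputs, T: (n_examples, n_outputs) targets in (0, 1).
    Column 0 of each weight matrix is the bias weight (constant input 1)."""
    W_h = rng.uniform(-0.05, 0.05, size=(n_hidden, X.shape[1] + 1))
    W_o = rng.uniform(-0.05, 0.05, size=(T.shape[1], n_hidden + 1))
    for _ in range(epochs):
        for x, t in zip(X, T):
            x_b = np.concatenate(([1.0], x))                  # 1. forward pass
            h = sigmoid(W_h @ x_b)
            h_b = np.concatenate(([1.0], h))
            o = sigmoid(W_o @ h_b)
            delta_o = o * (1 - o) * (t - o)                   # 2. output error terms
            delta_h = h * (1 - h) * (W_o[:, 1:].T @ delta_o)  # 3. hidden error terms
            W_o += eta * np.outer(delta_o, h_b)               # 4. weight updates
            W_h += eta * np.outer(delta_h, x_b)               #    delta_w_ji = eta*delta_j*x_ji
    return W_h, W_o

# Example: learning XOR (after training the outputs are typically close to 0, 1, 1, 0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
T = np.array([[0], [1], [1], [0]], dtype=float)
W_h, W_o = train_backprop(X, T)
```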

Page 25:

Derivation of the BP Algorithm

For each training example d:

$$ w_{j,i} \leftarrow w_{j,i} + \Delta w_{j,i} \quad \text{with} \quad \Delta w_{j,i} = -\eta\, \frac{\partial E_d}{\partial w_{j,i}} $$

$$ \frac{\partial E_d}{\partial w_{j,i}} = \frac{\partial E_d}{\partial net_j}\, \frac{\partial net_j}{\partial w_{j,i}} = \frac{\partial E_d}{\partial net_j}\, x_{j,i} $$

where

$$ net_j = \sum_i w_{j,i}\, x_{j,i} \quad \text{(the weighted sum of the inputs of unit j)} $$

and

$$ E_d(\vec{w}) = \frac{1}{2} \sum_{k \in outputs} (t_k - o_k)^2 $$

(Figure: input units, hidden units and output units; unit i feeds unit j via the weight w_{j,i}.)

Page 26:

Derivation of the BP Algorithm

Output layer:

$$ \frac{\partial E_d}{\partial net_j} = \frac{\partial E_d}{\partial o_j}\, \frac{\partial o_j}{\partial net_j} $$

$$ \frac{\partial E_d}{\partial o_j} = \dots = -(t_j - o_j), \qquad \frac{\partial o_j}{\partial net_j} = \frac{\partial \sigma(net_j)}{\partial net_j} = o_j\,(1 - o_j) \quad \text{since } \frac{d\sigma(x)}{dx} = \sigma(x)\,(1 - \sigma(x)) $$

and therefore

$$ \Delta w_{j,i} = \eta\, (t_j - o_j)\, o_j\,(1 - o_j)\, x_{j,i} $$

Hidden layer:

Downstream(j): the set of units whose immediate inputs include the output of unit j.

$$ \frac{\partial E_d}{\partial net_j} = \sum_{k \in Downstream(j)} \frac{\partial E_d}{\partial net_k}\, \frac{\partial net_k}{\partial net_j} = \dots = -\,o_j\,(1 - o_j) \sum_{k \in Downstream(j)} \delta_k\, w_{k,j} $$

And therefore

$$ \Delta w_{j,i} = \eta\, \delta_j\, x_{j,i} $$

Page 27:

Derivation of the BP Algorithm (Explanation)

$$ \frac{\partial E_d}{\partial o_j} = \frac{\partial}{\partial o_j}\, \frac{1}{2} \sum_{k \in outputs} (t_k - o_k)^2 = \frac{\partial}{\partial o_j}\, \frac{1}{2}\, (t_j - o_j)^2 = \frac{1}{2}\, 2\, (t_j - o_j)\, \frac{\partial (t_j - o_j)}{\partial o_j} = -(t_j - o_j) $$

Why $\delta_j = o_j\,(1 - o_j) \sum_{k \in Downstream(j)} \delta_k\, w_{k,j}$, using $\delta_n = -\dfrac{\partial E}{\partial net_n}$ and $net_k = \sum_i w_{k,i}\, x_{k,i}$:

$$ \frac{\partial E_d}{\partial net_j} = \sum_{k \in Downstream(j)} \frac{\partial E_d}{\partial net_k}\, \frac{\partial net_k}{\partial net_j} = \sum_{k \in Downstream(j)} -\delta_k\, \frac{\partial net_k}{\partial o_j}\, \frac{\partial o_j}{\partial net_j} = \sum_{k \in Downstream(j)} -\delta_k\, w_{k,j}\, o_j\,(1 - o_j) $$

so

$$ \delta_j = -\frac{\partial E_d}{\partial net_j} = o_j\,(1 - o_j) \sum_{k \in Downstream(j)} \delta_k\, w_{k,j} $$

Page 28:

Convergence of the BP Algorithm

Generalisation to arbitrary acyclic directed network architectures is simple.

In practice it works well, but it sometimes gets stuck in a local (not necessarily global) minimum ⇒ introduction of a momentum term (“escape routes”):

$$ \Delta w_{j,i}(n) = \eta\, \delta_j\, x_{j,i} + \alpha\, \Delta w_{j,i}(n-1) $$

Disadvantage: global minima can be skipped over by this “jumping”!

Training can take thousands of iterations ⇒ slow (accelerated by momentum).

Over-fitting versus adaptability of the NN.
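A minimal sketch of the momentum update itself (illustrative; the names are assumptions, not from the slides):

```python
import numpy as np

def momentum_update(w, grad_step, prev_delta_w, alpha=0.9):
    """delta_w(n) = eta*delta_j*x_ji + alpha*delta_w(n-1).
    grad_step is the plain eta*delta_j*x_ji term; prev_delta_w is delta_w(n-1)."""
    delta_w = grad_step + alpha * prev_delta_w
    return w + delta_w, delta_w

w, prev = np.zeros(3), np.zeros(3)
for _ in range(3):
    grad_step = np.array([0.1, -0.2, 0.05])        # stand-in for eta*delta_j*x_ji
    w, prev = momentum_update(w, grad_step, prev)
print(w)   # a repeated identical gradient is amplified by the momentum term
```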

Page 29:

Example: Recognition of Faces

Given: 32 photos of each of 20 persons, in different positions:

Direction of view: right, left, up or straight.

With and without sunglasses.

Expression: happy, sad, neutral...

Page 30:

Example: Recognition of Faces

Goal: classification of the photos with respect to the direction of view.

Preparation of the input:

• Rastering the photos ⇒ acceleration of the learning process.

• Input vector = the grayscale values of the 30 × 32 pixels.

• Output vector = (left, straight, right, up).

Classification = max(left, straight, right, up),
e.g. o = (0.9, 0.1, 0.1, 0.1) ⇒ looking to the left.
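A small sketch of this 1-of-4 output encoding and its decoding by the maximum (the order of the output units is an assumption):

```python
import numpy as np

DIRECTIONS = ("left", "straight", "right", "up")     # order of the four output units

def decode_direction(o):
    """Pick the direction whose output unit has the largest value."""
    return DIRECTIONS[int(np.argmax(o))]

print(decode_direction([0.9, 0.1, 0.1, 0.1]))        # "left"
```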

Page 31:

Recognition of the direction of view

Page 32:

Recurrent Neural Networks

They are directed cyclic networks “with memory”:
outputs at time t = inputs at time t+1. The cycles allow results to be fed back into the network.

(+) They are more expressive than acyclic networks.
(-) Training of recurrent networks is expensive. In some cases recurrent networks can be trained using a variant of the backpropagation algorithm.

Example: forecast of the next stock market price y(t+1), based on the current indicator x(t) and the previous indicator x(t-1).
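A minimal sketch of the feedback idea (illustrative, not the slides' own network): the context computed at time t is fed back in as an extra input at time t+1, which is what the unfolded diagram on the next page depicts.

```python
import numpy as np

def recurrent_forward(xs, W_in, W_ctx, W_out):
    """Run a simple recurrent net over a sequence of inputs x(t).
    The context c(t) depends on x(t) and on the previous context c(t-1)."""
    sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
    c = np.zeros(W_ctx.shape[0])                 # c(t-1), initially zero
    ys = []
    for x in xs:                                 # x(t) for t = 0, 1, 2, ...
        c = sigmoid(W_in @ x + W_ctx @ c)        # c(t) from x(t) and c(t-1)
        ys.append(W_out @ c)                     # prediction y(t+1)
    return ys

# Tiny example: 1 input, 2 context units, 1 output, random weights
rng = np.random.default_rng(0)
W_in, W_ctx, W_out = rng.normal(size=(2, 1)), rng.normal(size=(2, 2)), rng.normal(size=(1, 2))
print(recurrent_forward([np.array([0.1]), np.array([0.3]), np.array([0.2])], W_in, W_ctx, W_out))
```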

Page 33:

Recurrent NNs

(Figure: three diagrams labelled “Feedforward network”, “Recurrent network” and “Recurrent network (unfolded in time)”, with inputs x(t), x(t-1), x(t-2), context units c(t), c(t-1), c(t-2) and output y(t+1).)

