Intelligent Systems: Discriminative Learning, Neural Networks
WS2014/2015, 28.01.2015
Carsten Rother, Dmitrij Schlesinger
Transcript
Page 1: Intelligent Systems: Discriminative Learning, Neural Networks

WS2014/2015, 28.01.2015

Intelligent Systems

Discriminative Learning, Neural Networks

Carsten Rother, Dmitrij Schlesinger

Page 2: Outline


1. Discriminative learning

2. Neurons and linear classifiers:

1) Perceptron-Algorithm

2) Non-linear decision rules

3. Feed-Forward Neural Networks:

1) Architecture

2) Modeling abilities

3) Learning — Error Back-Propagation


Page 3: Discriminant Functions


• Let a parameterized family of probability distributions be given.
• Each particular p.d. leads to a classifier (for a fixed loss).
• The final goal is the classification (applying the classifier).

Generative approach:
1. Learn the parameters of the probability distribution (e.g. ML)
2. Derive the corresponding classifier (e.g. Bayes)
3. Apply the classifier to the test data

Discriminative approach:
1. Learn the unknown parameters of the classifier directly
2. Apply the classifier to the test data

If the family of classifiers is “well parameterized”, it is not necessary to consider the underlying probability distribution at all!!!

Page 4: Example: two Gaussians


Assume we know the probability model: two Gaussians of equal variance (see the formulas below).

Assume that in the end we want to make the maximum a-posteriori decision.

Consequently (see the previous lecture), the classifier is a separating plane (a linear classifier).

During learning, let us search directly for a “good” separating plane instead of learning the unknown parameters of the underlying probability model.


$k \in \{1, 2\}, \quad x \in \mathbb{R}^n, \quad p(k, x) = p(k) \cdot p(x|k)$

$p(x|k) = \frac{1}{(\sqrt{2\pi}\sigma)^n} \exp\left( -\frac{\|x - \mu_k\|^2}{2\sigma^2} \right)$

Page 5: Neuron


Human vs. computer (two nice pictures from Wikipedia)

Page 6: Neuron (McCulloch and Pitts, 1943)


Input: $x \in \mathbb{R}^n$. Weights: $w \in \mathbb{R}^n$. Bias: $b \in \mathbb{R}$. Activation: $y' = \langle w, x \rangle$. Output: $y = f(y' - b) = f(\langle w, x \rangle - b)$.

Step-function:

$f(y') = \begin{cases} 1 & \text{if } y' > 0 \\ 0 & \text{otherwise} \end{cases}$

Sigmoid-function (differentiable!!!):

$f(y') = \frac{1}{1 + \exp(-y')}$

The decision is the comparison $\langle x, w \rangle \gtrless b$.
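To make the definitions concrete, here is a minimal sketch of such a neuron in Python with NumPy (the function names are ours, not from the slides):

```python
import numpy as np

def step(a):
    """Step activation: 1 if a > 0, else 0."""
    return float(a > 0)

def sigmoid(a):
    """Sigmoid activation (differentiable)."""
    return 1.0 / (1.0 + np.exp(-a))

def neuron(x, w, b, f=step):
    """Single neuron: y = f(<w, x> - b)."""
    return f(np.dot(w, x) - b)

x = np.array([0.5, -1.0, 2.0])
w = np.array([1.0, 0.2, 0.7])
print(neuron(x, w, b=1.0))             # 1.0, since <w, x> = 1.7 > b
print(neuron(x, w, b=1.0, f=sigmoid))  # a soft value in (0, 1)
```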

Page 7: Geometric interpretation


$\langle x, w \rangle = \|x\| \cdot \|w\| \cdot \cos\phi$

Let $w$ be normalized, i.e. $\|w\| = 1$. Then $\langle x, w \rangle = \|x\| \cdot \cos\phi$ is the length of the projection of $x$ onto $w$.

Separating plane: $\langle x, w \rangle = \mathrm{const}$

The neuron implements a linear classifier.

Page 8: A special case: boolean functions


Input: $x = (x_1, x_2)$, $x_i \in \{0, 1\}$. Output: $y = x_1 \,\&\, x_2$ (conjunction).

Find $w$ and $b$ so that $\mathrm{step}(w_1 x_1 + w_2 x_2 - b) = x_1 \,\&\, x_2$.

x1 | x2 | y
0  | 0  | 0
0  | 1  | 0
1  | 0  | 0
1  | 1  | 1

A solution: $w_1 = w_2 = 1$, $b = 1.5$ (see the check below).

The same works for disjunction and the other boolean functions, but not for XOR.
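A quick check of this solution (a tiny throwaway snippet):

```python
# Verify that step(1*x1 + 1*x2 - 1.5) reproduces the AND truth table.
for x1 in (0, 1):
    for x2 in (0, 1):
        y = 1 if x1 + x2 - 1.5 > 0 else 0
        print(x1, x2, "->", y)  # prints 1 only for x1 = x2 = 1
```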

Page 9: The (one possible) learning task


Given: training data $L = \big( (x^1, y^1), (x^2, y^2), \ldots, (x^L, y^L) \big)$, $x^l \in \mathbb{R}^n$, $y^l \in \{0, 1\}$.
Find: $w \in \mathbb{R}^n$, $b \in \mathbb{R}$ so that $f(\langle x^l, w \rangle - b) = y^l$ for all $l = 1, \ldots, L$.

For a step-neuron this is a system of linear inequalities:

$\langle x^l, w \rangle > b$ if $y^l = 1$
$\langle x^l, w \rangle < b$ if $y^l = 0$

The solution is not unique in general!!!

Page 10: “Preparation 1”


Eliminate the bias. The trick: modify the training data.

(Figure, example in 1D: left, non-separable without the bias; right, after the transformation, separable without the bias.)

$x = (x_1, x_2, \ldots, x_n) \;\Rightarrow\; \tilde{x} = (x_1, x_2, \ldots, x_n, 1)$

$w = (w_1, w_2, \ldots, w_n) \;\Rightarrow\; \tilde{w} = (w_1, w_2, \ldots, w_n, -b)$

$\langle x^l, w \rangle \gtrless b \;\Rightarrow\; \langle \tilde{x}^l, \tilde{w} \rangle \gtrless 0$

Page 11: “Preparation 2”


Remove the sign. The trick is the same: modify the training data:

$\hat{x}^l = \tilde{x}^l$ for all $l$ with $y^l = 1$
$\hat{x}^l = -\tilde{x}^l$ for all $l$ with $y^l = 0$

All in all:

$\langle x^l, w \rangle > b$ if $y^l = 1$ and $\langle x^l, w \rangle < b$ if $y^l = 0$ $\;\Rightarrow\;$ $\langle \hat{x}^l, \tilde{w} \rangle > 0$ for all $l$
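Both preparation tricks in one small sketch (our own helper, assuming the samples are the rows of a NumPy array):

```python
import numpy as np

def prepare(X, y):
    """Bias trick: append the constant feature 1 to every sample.
    Sign trick: flip the samples of class 0.
    Afterwards a solution w~ must satisfy <x^, w~> > 0 for every row x^."""
    X_tilde = np.hstack([X, np.ones((X.shape[0], 1))])  # x -> (x, 1)
    signs = np.where(y == 1, 1.0, -1.0)                 # flip class-0 samples
    return X_tilde * signs[:, None]
```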

Page 12: Perceptron Algorithm (Rosenblatt, 1958)


Solution of the system of linear inequalities $\langle \hat{x}^l, w \rangle > 0$:

1. Search for an inequality that is not satisfied, i.e. $\langle \hat{x}^l, w \rangle \le 0$.
2. If none is found: stop. Otherwise update $w^{\mathrm{new}} = w^{\mathrm{old}} + \hat{x}^l$ and go to 1.

• The algorithm terminates in a finite number of steps if a solution exists (the training data are separable).
• The solution is a convex combination of the data points.

A sketch of the algorithm follows below.
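A minimal sketch on the prepared data (assuming `prepare` from above; since termination is only guaranteed for separable data, we cap the number of passes):

```python
import numpy as np

def perceptron(X_hat, max_passes=1000):
    """Find w with <x^, w> > 0 for all rows x^ of X_hat."""
    w = np.zeros(X_hat.shape[1])
    for _ in range(max_passes):
        violated = False
        for x in X_hat:            # 1. search for a violated inequality
            if np.dot(x, w) <= 0:
                w = w + x          # 2. update
                violated = True
        if not violated:           # no violations: a solution is found
            return w
    raise RuntimeError("no separating plane found; data may be non-separable")
```

With the conventions above, the result is $\tilde{w} = (w, -b)$, so the last component recovers the bias.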

Page 13: An example problem


Consider another decision rule for a real-valued feature $x \in \mathbb{R}$:

$a_n x^n + a_{n-1} x^{n-1} + \ldots + a_1 x + a_0 = \sum_i a_i x^i \gtrless 0$

It is not a linear classifier anymore but a polynomial one.

The task is again to learn the unknown coefficients $a_i$ given the training data $\big( (x^l, y^l) \ldots \big)$, $x^l \in \mathbb{R}$, $y^l \in \{0, 1\}$.

Is it also possible to do that in a “Perceptron-like” fashion?

Page 14: An example problem (continued)


The idea: reduce the given problem to the Perceptron task. Observation: although the decision rule is not linear with respect to $x$, it is still linear with respect to the unknown coefficients $a_i$.

The same trick again, modify the data:

$w = (a_n, a_{n-1}, \ldots, a_1, a_0)$
$\tilde{x} = (x^n, x^{n-1}, \ldots, x, 1)$
$\Rightarrow \sum_i a_i x^i = \langle \tilde{x}, w \rangle$

In general, it is very often possible to learn non-linear decision rules by the Perceptron algorithm using an appropriate transformation of the input space (more examples in the seminars).

Extension: Support Vector Machines, Kernels.
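The transformation as a sketch; after mapping each scalar to its monomials, the Perceptron from above applies unchanged (the constant feature 1 is already included, so it also plays the role of the bias):

```python
import numpy as np

def poly_features(x, n):
    """Map a scalar x to (x^n, x^(n-1), ..., x, 1)."""
    return np.array([x ** i for i in range(n, -1, -1)])

# A scalar dataset becomes a matrix of monomial features:
X = np.array([poly_features(x, n=3) for x in (-2.0, -0.5, 0.5, 2.0)])
```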

Page 15: Many classes


Before: two classes, a mapping $\mathbb{R}^n \to \{0, 1\}$. Now: many classes, a mapping $\mathbb{R}^n \to \{1, 2, \ldots, K\}$.

How to generalize? How to learn? Two simple (straightforward) approaches.

The first one: “one vs. all”: there is one binary classifier per class that separates this class from all others.

The classification is ambiguous in some areas.

Page 16: Many classes


Another one: “pairwise classifiers”, i.e. there is a classifier for each pair of classes.

Less ambiguous, better separable. However: $K(K-1)/2$ binary classifiers instead of $K$ in the previous case.

The goal: no ambiguities and only $K$ parameter vectors → the Fisher Classifier.

Page 17: Fisher classifier


Idea: in the binary case, the greater the scalar product $\langle x, w \rangle$, the “more likely” the output $y$ is 1. Generalization:

$y = \operatorname{argmax}_k \, \langle x, w_k \rangle$

The input space is partitioned into a set of convex cones.

Geometric interpretation (let the $w_k$ be normalized): consider the projections of an input vector $x$ onto the vectors $w_k$; the class whose $w_k$ receives the longest projection wins.
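The decision rule itself is a one-liner (a sketch; the rows of `W` are the vectors $w_k$):

```python
import numpy as np

def fisher_classify(x, W):
    """Return the class k maximizing <x, w_k> (rows of W are the w_k)."""
    return int(np.argmax(W @ x))
```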

Page 18: Feed-Forward Neural Networks


(Network diagram: input level → first level → … → $i$-th level → … → output level.)

Each neuron $j$ in level $i$ computes

$y_{ij} = f\Big( \sum_{j'} w_{ijj'} \, y_{i-1,j'} - b_{ij} \Big)$

Special case: $m = 1$ output neuron, step-neurons, i.e. a mapping $\mathbb{R}^n \to \{0, 1\}$.
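A vectorized forward pass as a minimal sketch, one weight matrix and bias vector per level (the names are ours):

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def forward(x, weights, biases, f=sigmoid):
    """Feed-forward pass: y_i = f(W_i y_{i-1} - b_i), level by level.

    weights[i] has shape (n_i, n_{i-1}), biases[i] has shape (n_i,)."""
    y = x
    for W, b in zip(weights, biases):
        y = f(W @ y - b)
    return y
```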

Page 19: What can we do with it?


One level – single step-neuron – linear classifier

Page 20: What can we do with it?


Two levels, with an “&”-neuron as the output: an intersection of half-spaces.

If the number of neurons is not limited, all convex subspaces can be implemented with arbitrary precision.
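For instance, three step-neurons feeding an “&”-neuron carve out a triangle; a hedged sketch (the half-plane normals are made up for illustration):

```python
import numpy as np

def step(a):
    return (np.asarray(a) > 0).astype(float)

# First level: indicators of three half-planes.
W1 = np.array([[ 1.0,  0.0],   # x1 > 0
               [ 0.0,  1.0],   # x2 > 0
               [-1.0, -1.0]])  # x1 + x2 < 1
b1 = np.array([0.0, 0.0, -1.0])

# Second level: an "&"-neuron that fires only if all three agree.
w2, b2 = np.ones(3), 2.5

def in_triangle(x):
    h = step(W1 @ x - b1)            # which half-planes contain x?
    return step(np.dot(w2, h) - b2)  # intersection of all three

print(in_triangle(np.array([0.2, 0.2])))  # 1.0: inside
print(in_triangle(np.array([1.0, 1.0])))  # 0.0: outside
```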

Page 21: What can we do with it?


Three levels: all possible mappings $\mathbb{R}^n \to \{0, 1\}$ as unions of convex subspaces.

Three levels (in fact even fewer) are enough to implement all such mappings!!!

Page 22: Learning: Error Back-Propagation


Learning task:

Given: training data $\big( (x^l, k^l), \ldots \big)$, $x^l \in \mathbb{R}^n$, $k^l \in \mathbb{R}$.
Find: all weights and biases of the net.

Error Back-Propagation is a gradient descent method for Feed-Forward Networks with Sigmoid-neurons.

First, we need an objective (the error to be minimized):

$F(w, b) = \sum_l \big( k^l - y(x^l; w, b) \big)^2 \;\to\; \min_{w, b}$

Now: differentiate, build the gradient and go.

Page 23: Error Back-Propagation


We start from a single neuron and just one example $(x, k)$. Remember:

$F(w, b) = (k - y)^2, \qquad y = \frac{1}{1 + \exp(-y')}, \qquad y' = \langle x, w \rangle = \sum_j x_j w_j$

Derivation according to the chain rule (dropping the constant factor 2):

$\frac{\partial F(w, b)}{\partial w_j} = \frac{\partial F}{\partial y} \cdot \frac{\partial y}{\partial y'} \cdot \frac{\partial y'}{\partial w_j} = (y - k) \cdot \frac{\exp(-y')}{(1 + \exp(-y'))^2} \cdot x_j = \delta \cdot d(y') \cdot x_j$

with the output error $\delta = y - k$ and the sigmoid derivative $d(y') = \frac{\exp(-y')}{(1 + \exp(-y'))^2} = y(1 - y)$.
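The same derivative as code (a sketch; we use the identity $d(y') = y(1 - y)$ from above):

```python
import numpy as np

def single_neuron_grad(x, k, w):
    """Gradient of F = (k - y)^2 for one sigmoid neuron, factor 2 dropped."""
    y_prime = np.dot(x, w)
    y = 1.0 / (1.0 + np.exp(-y_prime))
    delta = y - k          # output error
    d = y * (1.0 - y)      # sigmoid derivative d(y')
    return delta * d * x   # dF/dw_j = delta * d(y') * x_j
```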

Page 24: Error Back-Propagation


In general: compute the “errors” $\delta$ at the $i$-th level from all the $\delta$-s at the $(i+1)$-th level, i.e. propagate the error backward.

The Algorithm (for just one example $(x, k)$):

1. Forward: compute all $y'$ and $y$ (apply the network), compute the output error $\delta_n = y_n - k$;

2. Backward: compute the errors in the intermediate levels:

$\delta_{ij} = \sum_{j'} \delta_{i+1,j'} \cdot d(y'_{i+1,j'}) \cdot w_{i+1,j'j}$

3. Compute the gradient and apply it:

$\frac{\partial F}{\partial w_{ijj'}} = \delta_{ij} \cdot d(y'_{ij}) \cdot y_{i-1,j'}$

For many examples: just sum the gradients up.
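All three steps in one compact sketch for a fully connected sigmoid net (biases omitted for brevity; they can be folded into the weights with the bias trick from “Preparation 1”):

```python
import numpy as np

def backprop_step(x, k, weights, lr=0.1):
    """One gradient step for one example (x, k).

    weights[i] maps level i to level i+1; factor 2 dropped as above."""
    # 1. Forward: store the output y of every level.
    ys = [x]
    for W in weights:
        ys.append(1.0 / (1.0 + np.exp(-(W @ ys[-1]))))
    delta = ys[-1] - k                           # output error
    # 2./3. Backward: propagate errors, compute gradients, update.
    for i in range(len(weights) - 1, -1, -1):
        d = ys[i + 1] * (1.0 - ys[i + 1])        # sigmoid derivative
        grad = np.outer(delta * d, ys[i])        # dF/dW_i
        delta = weights[i].T @ (delta * d)       # errors one level below
        weights[i] = weights[i] - lr * grad      # gradient descent step
    return weights
```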

Page 25: A special case: Convolutional Networks


Local features: convolutions with a set of predefined masks (in more detail in the “Computer Vision” lectures).

Page 26: Convolutional Networks


Yann LeCun, Koray Kavukcuoglu and Clément Farabet: “Convolutional Networks and Applications in Vision”.

Page 27: Neural Networks: Summary


1. Discriminative learning: learn decision strategies directly, without considering the underlying probability model at all.
2. Neurons are linear classifiers.
3. Learning by the Perceptron algorithm.
4. Many non-linear decision rules can be transformed into linear ones by a corresponding transformation of the input space.
5. Classification into more than two classes is possible either by naïve approaches (e.g. a set of “one-vs-all” simple classifiers) or, more principled, by the Fisher classifier.
6. Feed-Forward Neural Networks implement (arbitrarily complex) decision strategies.
7. Error Back-Propagation for learning (gradient descent).
8. Some special cases (e.g. Convolutional Networks) are useful in Computer Vision.

