
Interpolation RBF Regularized RBF Generalized RBF XOR problem References

Optimization Methods for Machine Learning: Radial Basis Functions

Laura Palagi
http://www.dis.uniroma1.it/~palagi

Dipartimento di Ingegneria informatica automatica e gestionale A. Ruberti
Sapienza Università di Roma

Via Ariosto 25

RBF Networks L. Palagi 1 / 29


Interpolation problem

Given P distinct points in R^n:

X = {x^i ∈ R^n, i = 1, . . . , P},

and a corresponding set of real numbers

Y = {y^i ∈ R, i = 1, . . . , P}.

The interpolation problem consists in finding a function f : R^n → R, in a given class of real functions F, which satisfies:

f(x^i) = y^i,   i = 1, . . . , P.   (1)


Interpolation properties

For n = 1 the interpolation problem can be solved explicitly using polynomials:

f(x) = ∑_{i=0}^{P−1} c_i x^i

For n > 1, the 2-layer MLP with g not polynomial satisfies

∑_{j=1}^{P} v_j g((w^j)^T x^i − b_j) = y^i,   i = 1, . . . , P

for some w^j ∈ R^n and v_j, b_j ∈ R.

The MLP possesses the universal approximation property, i.e. it can approximate a continuous function arbitrarily well (provided that an arbitrarily large number of units is available).


Interpolation properties

Being a universal approximator may not be enough from a theoretical point of view. An important property is the

existence of a best approximation

Informally: given a function f belonging to some set of functions F, and given a subset A of F, find an element of A which is closest to f. If d(f, g) is the distance between two elements f, g in F, we consider the problem

d*_A = inf_{a ∈ A} d(f, a)

If there exists a* ∈ A that attains the infimum, namely d*_A = d(f, a*), then a* is the best approximation to f from A.


Best approximation properties

MLP does not have the best approximation property [1].

Consider another approximation scheme:

f(x) = ∑_{j=1}^{P} v_j φ(‖x − x^j‖),   (2)

where ‖·‖ is the Euclidean norm and φ : R+ → R is a suitable continuous function, called a radial basis function (RBF), since the argument of φ is the radius r = ‖x − x^j‖. The data points x^j ∈ X are referred to as the centers.


Gaussian RBF

φ(r) = e^{−(r/σ)²}

with radius r > 0 and spread σ > 0 (very sensitive parameter)

It is a localized function, in the sense that it goes to zero with increasing radius (far from the centers).


Multiquadric

φ(r) = (r² + σ²)^{1/2}

with radius r > 0 and spread σ > 0 (very sensitive parameter)

It grows with the distance from the centers


Inverse Multiquadric

φ(r) = (r² + σ²)^{−1/2}

with radius r > 0 and spread σ > 0 (very sensitive parameter)

It goes to zero with increasing radius (as the Gaussian)


Other RBF

φ(r) = r   (linear spline)
φ(r) = r³   (cubic spline)
φ(r) = r² log r   (thin plate spline)
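The basis functions above are one-liners in code. A minimal Python sketch (the function names and the φ(0) = 0 convention for the thin plate spline are my choices, not from the slides):

```python
import numpy as np

# Common RBFs as functions of the radius r >= 0; sigma is the spread.
def gaussian(r, sigma=1.0):
    return np.exp(-(r / sigma) ** 2)

def multiquadric(r, sigma=1.0):
    return np.sqrt(r ** 2 + sigma ** 2)

def inverse_multiquadric(r, sigma=1.0):
    return 1.0 / np.sqrt(r ** 2 + sigma ** 2)

def linear_spline(r):
    return r

def cubic_spline(r):
    return r ** 3

def thin_plate_spline(r):
    # r^2 log r, with the usual convention phi(0) = 0
    r = np.asarray(r, dtype=float)
    return np.where(r > 0, r * r * np.log(np.where(r > 0, r, 1.0)), 0.0)
```

Note how the Gaussian and inverse multiquadric decay with r (localized), while the multiquadric and the splines grow.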


Interpolation by RBF

By imposing the interpolation conditions we get:

∑_{j=1}^{P} v_j φ(‖x^i − x^j‖) = y^i,   i = 1, . . . , P.   (3)

Define the vectors v = (v_1 · · · v_P)^T and y = (y^1 · · · y^P)^T, and the symmetric P × P matrix Φ with elements

Φ_{i,j} = φ(‖x^i − x^j‖),   1 ≤ i, j ≤ P.

Then system (3) can be written as:

Φv = y.

It is a linear system of P equations in P unknowns.
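As a sketch, the system Φv = y can be assembled and solved directly (the 1-D data and the Gaussian spread here are hypothetical, chosen only for illustration):

```python
import numpy as np

def rbf_interpolate(X, y, phi):
    """Solve Phi v = y, where Phi[i, j] = phi(||x^i - x^j||)."""
    r = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)  # P x P radii
    Phi = phi(r)
    return np.linalg.solve(Phi, y), Phi

# Hypothetical 1-D example with a Gaussian RBF, spread sigma = 0.5
X = np.linspace(0.0, 1.0, 5).reshape(-1, 1)
y = np.sin(2 * np.pi * X[:, 0])
v, Phi = rbf_interpolate(X, y, lambda r: np.exp(-(r / 0.5) ** 2))
# the interpolation conditions f(x^i) = y^i hold exactly
assert np.allclose(Phi @ v, y)
```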


Matrix Φ is nonsingular provided that P ≥ 2, the interpolation points x^j, j = 1, . . . , P, are distinct, and φ is one of the following:

Gaussian (Φ positive definite)
multiquadric
inverse multiquadric (Φ positive definite)
linear spline

Thus, the interpolation problem Φv = y admits a unique solution.

When Φ is positive definite, the solution can be computed by minimizing the strictly convex quadratic function in R^P

F(v) = (1/2) v^T Φ v − y^T v,

whose gradient is given by ∇F(v) = Φv − y.
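Since ∇F(v) = Φv − y, even plain gradient descent on F recovers the weights. A minimal sketch (the step size and iteration count are my choices; the step must stay below 2/λ_max(Φ) for convergence):

```python
import numpy as np

def minimize_F(Phi, y, lr=0.1, iters=5000):
    """Gradient descent on F(v) = 1/2 v^T Phi v - y^T v."""
    v = np.zeros_like(y, dtype=float)
    for _ in range(iters):
        v -= lr * (Phi @ v - y)  # step along -grad F(v) = -(Phi v - y)
    return v

# Small positive definite example: the minimizer solves Phi v = y
Phi = np.array([[2.0, 0.5], [0.5, 1.0]])
y = np.array([1.0, 2.0])
v = minimize_F(Phi, y)
assert np.allclose(v, np.linalg.solve(Phi, y))
```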


From Interpolation to approximation properties

Because of the remarkable properties of the RBFs, the RBF method is one of the most often applied approaches in multivariable interpolation.

This has motivated the attempt to employ RBFs also within approximation algorithms for the solution of classification and regression problems in data mining.

What does it change?


Regularized RBF neural networks

Consider the data set {(x^p, y^p), p = 1, . . . , P} obtained by random sampling (in the presence of noise) of a function belonging to some space of functions F.

We already discussed that the problem of recovering the function, or an estimate of it, from a finite set of data is ill posed, in the sense that it has an infinite number of solutions.

In order to choose one particular and stable solution we need to impose some regularization property on the function; that is, the problem becomes the

Minimization of the Regularized Empirical Risk

min_f ∑_{p=1}^{P} ℓ(f(x^p; ω) − y^p) + R(λ, f)

where the first term is the empirical error and the second is the regularization term.


Regularized RBF neural networks

The most common form of a priori knowledge consists in assuming that the function is smooth enough, in the sense that two similar inputs correspond to two similar outputs.

Smoothness is a measure of the "oscillatory" behavior of f. Within a class of differentiable functions, one function is said to be smoother than another one if it oscillates less. A smoothness functional E(f) is defined and we consider

min_f (1/2) ∑_{p=1}^{P} [y^p − f(x^p)]² + λ E(f),

where the first term enforces closeness to the data and the second smoothness, while the regularization parameter λ > 0 controls the tradeoff between these two terms.


Regularized RBF neural networks

It can be shown that, for a wide class of smoothness functionals E(f), the solutions of the minimization all have the same form:

f(x) = ∑_{i=1}^{P} v_i φ(‖x − c^i‖)

The centers coincide with the inputs,

c^i = x^i,   i = 1, . . . , P,

and the weights solve the regularized system

(Φ + λI)v = y

where Φ = {Φ_{ij}}_{i,j=1,...,P} = {φ(‖x^i − x^j‖)}_{i,j=1,...,P}.
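A minimal training/prediction sketch of this regularized network (function names and the example data are mine, for illustration):

```python
import numpy as np

def fit_regularized_rbf(X, y, phi, lam):
    """Centers = training inputs; weights solve (Phi + lam I) v = y."""
    r = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    return np.linalg.solve(phi(r) + lam * np.eye(len(y)), y)

def predict_rbf(Xnew, X, v, phi):
    """f(x) = sum_i v_i phi(||x - x^i||), centers fixed at the data."""
    r = np.linalg.norm(Xnew[:, None, :] - X[None, :, :], axis=2)
    return phi(r) @ v

# Hypothetical noisy 1-D data, Gaussian RBF with spread 0.3
rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(20, 1))
y = np.sin(2 * np.pi * X[:, 0]) + 0.1 * rng.standard_normal(20)
gauss = lambda r: np.exp(-(r / 0.3) ** 2)
v = fit_regularized_rbf(X, y, gauss, lam=1e-2)
yhat = predict_rbf(X, X, v, gauss)
```

With λ > 0 the fit no longer interpolates the noisy targets exactly, which is the point of the regularization.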


Equivalent convex quadratic optimization

Assume that the matrix Φ is positive semidefinite. Then the unique solution of the system

(Φ + λI)v = y

can be computed by minimizing the strictly convex quadratic function in R^P

F_R(v) = (1/2) v^T (Φ + λI) v − y^T v = F(v) + (λ/2)‖v‖²,

where F(v) = (1/2) v^T Φ v − y^T v is the interpolation objective and (λ/2)‖v‖² is the regularization term. The gradient is given by ∇F_R(v) = (Φ + λI)v − y.


2-layer Regularized RBF network

f(x) = ∑_{i=1}^{P} v_i φ(‖x − c^i‖)

(Figure: two-layer network. The input x feeds P hidden units computing φ(‖x − x^1‖), φ(‖x − x^2‖), . . . , φ(‖x − x^P‖); their outputs, weighted by v_1, . . . , v_P, are summed to produce the output y(x).)


2-layer Regularized RBF network

RBFs are universal approximators: any continuous function can be approximated arbitrarily well on a compact set, provided a sufficiently large number of units and an appropriate choice of the parameters.

RBFs possess the best approximation property, namely the best approximation exists and in most cases (under assumptions often satisfied) is unique (the RBF is linear in the parameters v) [1].

The value of λ can be selected by employing cross-validation techniques, and this may require that the system (Φ + λI)v = y is solved several times.

The spread σ is a hyper-parameter too
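A hold-out grid search over σ and λ can be sketched as follows: one linear solve of (Φ + λI)v = y per candidate pair, keeping the pair with the lowest validation error (the grids and the data split are my choices, for illustration):

```python
import numpy as np

def gaussian_design(Xa, Xb, sigma):
    """Matrix of phi(||a_i - b_j||) for the Gaussian RBF."""
    r = np.linalg.norm(Xa[:, None, :] - Xb[None, :, :], axis=2)
    return np.exp(-(r / sigma) ** 2)

def select_hyperparams(Xtr, ytr, Xval, yval, sigmas, lams):
    """Hold-out grid search; returns (best_error, best_sigma, best_lambda)."""
    best = None
    for sigma in sigmas:
        Phi = gaussian_design(Xtr, Xtr, sigma)
        for lam in lams:
            v = np.linalg.solve(Phi + lam * np.eye(len(ytr)), ytr)
            err = np.mean((gaussian_design(Xval, Xtr, sigma) @ v - yval) ** 2)
            if best is None or err < best[0]:
                best = (err, sigma, lam)
    return best

# Hypothetical data: noisy sine, split into train / validation
rng = np.random.default_rng(1)
X = rng.uniform(0, 1, size=(40, 1))
y = np.sin(2 * np.pi * X[:, 0]) + 0.1 * rng.standard_normal(40)
err, sigma, lam = select_hyperparams(X[:30], y[:30], X[30:], y[30:],
                                     sigmas=[0.1, 0.3, 1.0],
                                     lams=[1e-4, 1e-2, 1.0])
```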


2-layer Generalized RBF network

When P is very large, the cost of constructing a regularized RBF network can be prohibitive. Indeed, the computation of the weights v ∈ R^P requires the solution of a possibly ill-conditioned linear system, which costs O(P³).

Generalized RBF neural networks are used, in which the number N of neural units is much smaller than P.

The output of the network can be defined by

y(x) = ∑_{j=1}^{N} v_j φ(‖x − c_j‖),   (4)

where both the centers c_j ∈ R^n and the weights v_j, j = 1, . . . , N, must be selected appropriately.
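One cheap way to pick the N centers, used here purely as an illustration (the slides leave the selection open), is a random subset of the data; with the centers fixed, the weights are a linear least-squares fit:

```python
import numpy as np

def fit_grbf(X, y, N, sigma, seed=0):
    """Generalized RBF sketch: N << P fixed centers, linear fit for v."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=N, replace=False)]  # random subset
    r = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)  # P x N
    G = np.exp(-(r / sigma) ** 2)
    v, *_ = np.linalg.lstsq(G, y, rcond=None)  # least-squares weights
    return centers, v

def predict_grbf(Xnew, centers, v, sigma):
    r = np.linalg.norm(Xnew[:, None, :] - centers[None, :, :], axis=2)
    return np.exp(-(r / sigma) ** 2) @ v

# Hypothetical data: P = 50 points, N = 5 units
rng = np.random.default_rng(2)
X = rng.uniform(0, 1, size=(50, 1))
y = np.sin(2 * np.pi * X[:, 0])
centers, v = fit_grbf(X, y, N=5, sigma=0.3)
yhat = predict_grbf(X, centers, v, sigma=0.3)
```

Replacing the random subset with clustering (e.g. k-means centroids) is a common refinement.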


2-layer Generalized RBF network

(Figure: two-layer network. The input x feeds N hidden units computing φ(‖x − c_1‖), φ(‖x − c_2‖), . . . , φ(‖x − c_N‖); their outputs, weighted by v_1, . . . , v_N, are summed to produce the output y(x).)


2-layer Generalized RBF network

GRBFs are universal approximators: any continuous function can be approximated arbitrarily well on a compact set, provided a sufficiently large number of units and an appropriate choice of the parameters.

GRBFs may NOT possess the best approximation property. However, if the centers are fixed, the approximation problem becomes linear with respect to v and the existence of a best approximation is guaranteed.

In the general case, both the centers and the weights are treated as variable parameters and the approximation is nonlinear.

As N ≪ P, the GRBF inherently performs a structural stabilization, which may prevent the occurrence of overtraining.


An example: Exclusive OR

The logical function XOR

p   x1   x2   yp
1   −1   −1   −1
2   −1    1    1
3    1   −1    1
4    1    1   −1

(Figure: the four points in the (x1, x2) plane; points 2 and 3 have label +1, points 1 and 4 have label −1.)

A perceptron (linear separator) doesn't work.


Two layer MLP

(Figure: two-layer MLP. Inputs x1, x2 feed two hidden units with weights w11, w12, w21, w22, biases b1, b2 and sign(·) activations, producing a1, a2; the hidden outputs, weighted by v1, v2 with output bias b3, are summed and passed through a final sign(·) to give y(x).)


Two layer MLP

Choose w11 = w22 = 1, w12 = w21 = −1, b1 = b2 = −1, v1 = v2 = 1, b3 = 0.1 (output bias). We get

a1 = x1 − x2 − 1,   z1 = sign(a1)
a2 = −x1 + x2 − 1,  z2 = sign(a2)
y = sign(z1 + z2 + 0.1)

input p   a1   a2   z1   z2   z1 + z2 + 0.1   y
1         −1   −1   −1   −1   −1.9            −1
2         −3    1   −1    1    0.1             1
3          1   −3    1   −1    0.1             1
4         −1   −1   −1   −1   −1.9            −1
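The table above can be verified directly with a few lines of Python:

```python
import numpy as np

def mlp_xor(x1, x2):
    # Hidden layer with the weights chosen above
    z1 = np.sign(x1 - x2 - 1)
    z2 = np.sign(-x1 + x2 - 1)
    # Output node with bias b3 = 0.1
    return np.sign(z1 + z2 + 0.1)

# The four XOR patterns and their targets, row by row from the table
patterns = [((-1, -1), -1), ((-1, 1), 1), ((1, -1), 1), ((1, 1), -1)]
for (x1, x2), target in patterns:
    assert mlp_xor(x1, x2) == target
```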


Two layer MLP

This MLP network with two hidden nodes realizes a nonlinear separation (each hidden node describes one of the two lines). The output node combines the outputs of the two hidden nodes.

(Figure: the four XOR points in the (x1, x2) plane with the two parallel separating lines a1 = 0 and a2 = 0.)


RBF network

Consider an RBF network with two units (N = 2) with centers c1, c2, and assume the activation function is a Gaussian g_j = e^{−(‖x−c_j‖/σ)²}.

(Figure: inputs x1, x2 feed two hidden units computing z1 = e^{−‖x−c1‖²/σ²} and z2 = e^{−‖x−c2‖²/σ²}; the weighted sum v1 z1 + v2 z2 + b is passed through sign(·) to give y(x).)


RBF network

Choose σ = √2 and c1 = (1, 1)^T, c2 = (−1, −1)^T. We transform the problem into a linearly separable form.

p   z1 = e^{−‖x−c1‖²/σ²}   z2 = e^{−‖x−c2‖²/σ²}   yp
1   e^{−4}                 1                      −1
2   e^{−2}                 e^{−2}                  1
3   e^{−2}                 e^{−2}                  1
4   1                      e^{−4}                 −1

(Figure: in the (z1, z2) plane the four points are linearly separable.)
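The feature values in the table follow from σ² = 2; a quick check:

```python
import math

c1, c2 = (1.0, 1.0), (-1.0, -1.0)

def z(x, c, sigma2=2.0):
    # z = exp(-||x - c||^2 / sigma^2), with sigma = sqrt(2)
    d2 = (x[0] - c[0]) ** 2 + (x[1] - c[1]) ** 2
    return math.exp(-d2 / sigma2)

pts = [(-1.0, -1.0), (-1.0, 1.0), (1.0, -1.0), (1.0, 1.0)]
feats = [(z(x, c1), z(x, c2)) for x in pts]
# Row 1: (e^-4, 1); rows 2-3: (e^-2, e^-2); row 4: (1, e^-4)
```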


The output takes the form

y(x) = v1 e^{−‖x−c1‖²/σ²} + v2 e^{−‖x−c2‖²/σ²} + b

Minimizing the training error

min_{v,b} ∑_{p=1}^{4} (y(x^p) − y^p)²

we get the optimal solution (v*, b*) that gives E = 0:

(v1, v2, b) = (−2.675065656, −2.675065656, 1.72406123)

and the RBF network has been trained.
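These values can be reproduced with an ordinary least-squares solve (NumPy sketch; the design matrix collects the features z1, z2 and a constant column for b):

```python
import numpy as np

e2, e4 = np.exp(-2.0), np.exp(-4.0)
# Rows: (z1, z2, 1) for the four XOR patterns; t holds the targets yp
A = np.array([[e4, 1.0, 1.0],
              [e2, e2, 1.0],
              [e2, e2, 1.0],
              [1.0, e4, 1.0]])
t = np.array([-1.0, 1.0, 1.0, -1.0])
(v1, v2, b), *_ = np.linalg.lstsq(A, t, rcond=None)
# v1 = v2 = -2/(1 - e^-2)^2 and b = 1 + 2|v1|e^-2, matching the slide;
# the residual is zero, so the training error E = 0
```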


References

[1] Girosi, F., and Poggio, T. (1990). Networks and the best approximation property. Biological Cybernetics, 63(3), 169–176.
