Page 1:

Regularization

Instructor: Dr. Saeed Shiry

Page 2:

Hypothesis Space

The hypothesis space H is the space of functions that our algorithm is allowed to provide, i.e. the space in which the algorithm is allowed to search. It is often important to choose the hypothesis space as a function of the amount of data available.

Page 3:

Learning As Function Approximation From Samples: Regression and Classification

The basic goal of supervised learning is to use the training set S to “learn” a function f that, for a new x value, predicts the associated value of y: y ≈ f(x).

Regression: y is a real-valued random variable.

Pattern classification: y takes values from an unordered finite set. In two-class pattern classification problems, we assign one class a y value of 1 and the other class a y value of −1.

Page 4:

Loss Functions

In order to measure the goodness of our function, we need a loss function V.

In general, we let V(f, z) = V(f(x), y) denote the price we pay when we see x and guess that the associated y value is f(x) when it is actually y.

Page 5:

Common Loss Functions For Regression

The most common loss function is the square loss or L2 loss: V(f(x), y) = (f(x) − y)^2

L1 loss: V(f(x), y) = |f(x) − y|

Vapnik’s more general ε-insensitive loss: V(f(x), y) = max(|f(x) − y| − ε, 0), which ignores errors smaller than ε.
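To make these definitions concrete, here is a minimal Python sketch of the three losses (the function names and the default ε = 0.1 are illustrative choices, not from the slides):

```python
import numpy as np

def l2_loss(fx, y):
    """Square (L2) loss: V(f(x), y) = (f(x) - y)^2."""
    return (fx - y) ** 2

def l1_loss(fx, y):
    """Absolute (L1) loss: V(f(x), y) = |f(x) - y|."""
    return np.abs(fx - y)

def eps_insensitive_loss(fx, y, eps=0.1):
    """Vapnik's eps-insensitive loss: max(|f(x) - y| - eps, 0)."""
    return np.maximum(np.abs(fx - y) - eps, 0.0)

fx = np.array([1.0, 2.0, 3.0])   # predictions f(x)
y = np.array([1.05, 2.5, 3.0])   # observed values
print(l2_loss(fx, y), l1_loss(fx, y), eps_insensitive_loss(fx, y))
```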

Page 6:

Problem of risk minimization

In order to choose the best available approximation to the supervisor's response, one measures the loss or discrepancy L(y, f(x, α)) between the response y of the supervisor to a given input x and the response f(x, α) provided by the learning machine. Consider the expected value of the loss, given by the risk functional

R(α) = ∫ L(y, f(x, α)) dP(x, y)    (1.2)

The goal is to find the function f(x, α0) which minimizes the risk functional R(α) over the class of functions f(x, α), α ∈ Λ, in the situation where the joint probability distribution P(x, y) is unknown and the only available information is contained in the training set.

Page 7:

Three Main Learning Problems

1. Pattern Recognition

Let the supervisor's output y take only two values, y ∈ {0, 1}, and let f(x, α), α ∈ Λ, be a set of indicator functions (functions which take only two values: zero and one).

Consider the following loss function:

L(y, f(x, α)) = 0 if y = f(x, α), and L(y, f(x, α)) = 1 if y ≠ f(x, α).

For this loss function, the functional (1.2) determines the probability of different answers given by the supervisor and by the indicator function f(x, α). We call the case of different answers a classification error.

The problem, therefore, is to find a function that minimizes the probability of classification error when the probability measure P(x, y) is unknown, but the data are given.

Page 8:

Three Main Learning Problems

2. Regression Estimation

Let the supervisor's answer y be a real value, and let f(x, α), α ∈ Λ, be a set of real functions that contains the regression function

f(x, α0) = ∫ y dP(y|x).

It is known that the regression function is the one that minimizes the functional (1.2) with the following loss function:

L(y, f(x, α)) = (y − f(x, α))^2.

Thus the problem of regression estimation is the problem of minimizing the risk functional (1.2) with the above loss function in the situation where the probability measure P(x, y) is unknown but the data are given.

Page 9:

Three Main Learning Problems

3. Density Estimation (Fisher-Wald Setting)

Finally, consider the problem of density estimation from the set of densities p(x, α), α ∈ Λ. For this problem we consider the following loss function:

L(p(x, α)) = −log p(x, α).

It is known that the desired density minimizes the risk functional (1.2) with the above loss function.

Thus, again, to estimate the density from the data one has to minimize the risk functional under the condition that the corresponding probability measure P(x) is unknown, but i.i.d. data are given.
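As a small illustration of this setting, the sketch below assumes a Gaussian family p(x; μ, σ) (our choice): minimizing the empirical average of −log p(x, α) over this family is exactly maximum likelihood estimation, which here has a closed form.

```python
import numpy as np

# Draw i.i.d. data from an unknown density (here secretly Gaussian).
rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=1.5, size=1000)

def neg_log_gauss(x, mu, sigma):
    """Loss -log p(x, alpha) for the Gaussian family p(x; mu, sigma)."""
    return 0.5 * np.log(2 * np.pi * sigma**2) + (x - mu) ** 2 / (2 * sigma**2)

# Minimizing the empirical risk (1/l) sum_i -log p(x_i, alpha) over the
# Gaussian family is maximum likelihood, with the closed-form solution below.
mu_hat, sigma_hat = x.mean(), x.std()
print(mu_hat, sigma_hat, neg_log_gauss(x, mu_hat, sigma_hat).mean())
```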

Page 10:

Expected error, empirical error

Given a function f, a loss function V, and a probability distribution μ over Z, the expected or true error of f is

I[f] = E_z V(f, z) = ∫ V(f, z) dμ(z),

the expected loss on a new example drawn at random from μ. We would like to make I[f] small, but in general we do not know μ.

Given a function f, a loss function V, and a training set S consisting of n data points z_1, ..., z_n, the empirical error of f is

I_S[f] = (1/n) Σ_i V(f, z_i).
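A numerical sketch of the distinction: since μ is unknown in practice, the toy example below (the data-generating model is our own choice) stands in for I[f] with a very large held-out Monte Carlo sample.

```python
import numpy as np

rng = np.random.default_rng(1)

def f(x):                      # a fixed candidate function
    return 2.0 * x

def V(fx, y):                  # square loss
    return (fx - y) ** 2

def sample(n):                 # z = (x, y) drawn from mu: y = 2x + noise
    x = rng.uniform(-1, 1, n)
    return x, 2.0 * x + rng.normal(0, 0.5, n)

x_train, y_train = sample(20)
emp_err = V(f(x_train), y_train).mean()   # I_S[f], computable from S alone

x_big, y_big = sample(1_000_000)          # Monte Carlo stand-in for mu
true_err = V(f(x_big), y_big).mean()      # approximates I[f] (about 0.25 here)

print(f"empirical error I_S[f] = {emp_err:.3f}, expected error I[f] ~ {true_err:.3f}")
```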

Page 11:

A reminder: convergence in probability

Let {X_n} be a sequence of bounded random variables. We say that X_n converges in probability to X (written X_n → X in probability) if, for every ε > 0, lim_{n→∞} P(|X_n − X| ≥ ε) = 0.
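For instance, the sample mean of bounded i.i.d. variables converges in probability to the true mean (the weak law of large numbers). The toy simulation below (sample sizes and tolerance are our own choices) estimates P(|X_n − μ| ≥ ε) and watches it shrink:

```python
import numpy as np

rng = np.random.default_rng(2)
eps, mu = 0.05, 0.5            # tolerance and true mean of Uniform(0, 1)

for n in [10, 100, 1000, 10000]:
    # 500 independent realizations of the sample mean X_n
    means = rng.uniform(0, 1, size=(500, n)).mean(axis=1)
    prob = np.mean(np.abs(means - mu) >= eps)
    print(f"n={n:6d}  P(|X_n - mu| >= {eps}) ~ {prob:.3f}")
```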

Page 12:

Generalization

Page 13:

A learning algorithm should be well-posed, e.g. stable

In addition to the key property of generalization, a “good” learning algorithm should also be stable: fS should depend continuously on the training set S. In particular, changing one of the training points should affect the solution less and less as n goes to infinity.

Page 14:

General definition of Well-Posed and Ill-Posed problems

A problem is well-posed if its solution exists, is unique, and depends continuously on the data (i.e. it is stable).

A problem is ill-posed if it is not well-posed. In this context, well-posedness is mainly used to mean stability of the solution.

Page 15:

Theory of Solving Ill-Posed Problems

In the early 1900s Hadamard observed that under some (very general) circumstances the problem of solving (linear) operator equations

Af = F, f ∈ F

(finding the f ∈ F that satisfies the equality) is ill-posed: even if there exists a unique solution to this equation, a small deviation on the right-hand side (Fδ instead of F, where ||F − Fδ|| < δ is arbitrarily small) can cause large deviations in the solution (it can happen that ||fδ − f|| is large).

In this case, if the right-hand side F of the equation is not exact (e.g., it equals Fδ, where Fδ differs from F by some level δ of noise), the functions fδ that minimize the functional

R(f) = ||Af − Fδ||^2

do not guarantee a good approximation to the desired solution even if δ tends to zero.
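A tiny numerical illustration of this phenomenon (the 2×2 operator A below is our own example): a perturbation of size 2e-4 on the right-hand side changes the solution by order 1.

```python
import numpy as np

# A nearly singular operator A: the equation A f = F is ill-conditioned.
A = np.array([[1.0, 1.0],
              [1.0, 1.0001]])
F = np.array([2.0, 2.0001])

f = np.linalg.solve(A, F)              # exact solution: f = (1, 1)

# A tiny perturbation F_delta of the right-hand side...
F_delta = F + np.array([0.0, 0.0002])  # ||F - F_delta|| = 2e-4
f_delta = np.linalg.solve(A, F_delta)

print("f       =", f)        # [1. 1.]
print("f_delta =", f_delta)  # wildly different: [-1.  3.]
print("condition number of A:", np.linalg.cond(A))
```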

Page 16:

Real-life problems were found to be ill-posed

Hadamard thought that ill-posed problems were a purely mathematical phenomenon and that all real-life problems are “well-posed.”

However, in the second half of the century a number of very important real-life problems were found to be ill-posed. Notably, one of the main problems of statistics, estimating the density function from the data, is ill-posed.

Page 17:

Regularization theory

Regularization theory was one of the first signs of the existence of intelligent inference: in the middle of the 1960s it was discovered that if instead of the functional R(f) one minimizes another, so-called regularized, functional

R*(f) = ||Af − Fδ||^2 + γ(δ) Ω(f),

where Ω(f) is some functional (belonging to a special class of functionals) and γ(δ) is an appropriately chosen constant (depending on the level of noise), then one obtains a sequence of solutions that converges to the desired one as δ tends to zero.
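Continuing the toy example from page 15, here is a sketch of minimizing the regularized functional with the stabilizer Ω(f) = ||f||^2 (our choice of Ω and of γ, for illustration only):

```python
import numpy as np

A = np.array([[1.0, 1.0],
              [1.0, 1.0001]])
F_delta = np.array([2.0, 2.0003])      # the noisy right-hand side from before

gamma = 1e-4                           # gamma(delta): regularization constant
# Minimize ||A f - F_delta||^2 + gamma * ||f||^2 via the normal equations:
#   (A^T A + gamma I) f = A^T F_delta
f_reg = np.linalg.solve(A.T @ A + gamma * np.eye(2), A.T @ F_delta)

print(f_reg)   # close to the true solution (1, 1) despite the noise
```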

Page 18:

ERM

Given a training set S and a function space H, empirical risk minimization (Vapnik introduced the term) is the class of algorithms that look at S and select fS as

fS = arg min_{f ∈ H} I_S[f].

For example, linear regression is ERM when V(z) = (f(x) − y)^2 and H is the space of linear functions f = ax.
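A minimal sketch of exactly this example, ERM with square loss over H = {f(x) = ax}, which has a one-line closed-form minimizer:

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.uniform(-1, 1, 30)
y = 1.7 * x + rng.normal(0, 0.1, 30)   # data roughly on the line y = 1.7 x

# H = {f(x) = a x}; ERM picks the a minimizing the empirical square loss
#   I_S[a] = (1/n) sum_i (a x_i - y_i)^2,
# whose closed-form minimizer is a = <x, y> / <x, x>.
a_hat = np.dot(x, y) / np.dot(x, x)
print(a_hat)   # close to 1.7
```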

Page 19:

THE EMPIRICAL RISK MINIMIZATION (ERM) INDUCTIVE PRINCIPLE

In order to minimize the risk functional for an unknown probability measure P(z), the following induction principle is usually employed.

The expected risk functional R(α) is replaced by the empirical risk functional

R_emp(α) = (1/l) Σ_{i=1}^{l} Q(z_i, α)    (1.8)

constructed on the basis of the training set.

The principle is to approximate the function Q(z, α0) which minimizes the risk by the function Q(z, αl) which minimizes the empirical risk (1.8).

This principle is called the Empirical Risk Minimization induction principle (ERM principle).

Page 20:

Generalization and Well-posedness of Empirical Risk Minimization

For ERM to represent a “good” class of learning algorithms, the solution should generalize and be well-posed: it should exist, be unique and, especially, be stable.

Page 21:

ERM and generalization: given a certain number of samples...

Page 22:

...suppose this is the “true” solution...

Page 23:

... but suppose ERM gives this solution.

Page 24:

Under which conditions does the ERM solution converge to the true solution as the number of examples increases? In other words, what are the conditions for generalization of ERM?

Page 25:

ERM and stability: given 10 samples...

Page 26:

...we can find the smoothest interpolating polynomial (of degree nine: ten points determine a unique interpolating polynomial of degree at most nine).

Page 27:

But if we perturb the points slightly...

Page 28:

...the solution changes a lot!

Page 29:

If we restrict ourselves to degree two polynomials...

Page 30:

...the solution varies only a small amount under a small perturbation.
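The story told on pages 25-30 can be reproduced numerically. The sketch below (data and perturbation sizes are our own choices) fits degree-9 interpolating polynomials and degree-2 polynomials to ten points before and after a slight perturbation, and compares how much the fitted curves move:

```python
import numpy as np

rng = np.random.default_rng(4)
x = np.linspace(0, 1, 10)
y = np.sin(2 * np.pi * x)                # 10 samples of a smooth curve
y_pert = y + rng.normal(0, 0.02, 10)     # ...perturbed very slightly

for deg in (9, 2):                       # interpolating vs. restricted H
    c = np.polyfit(x, y, deg)
    c_pert = np.polyfit(x, y_pert, deg)
    grid = np.linspace(0, 1, 200)
    change = np.max(np.abs(np.polyval(c, grid) - np.polyval(c_pert, grid)))
    print(f"degree {deg}: max change in fitted curve = {change:.3f}")
# Degree 9 (interpolation) moves a lot; degree 2 barely moves.
```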

Page 31:

ERM: conditions for well-posedness (stability) and predictivity (generalization)

Since Tikhonov, it has been well known that a generally ill-posed problem such as ERM can be guaranteed to be well-posed, and therefore stable, by an appropriate choice of H. For example, compactness of H guarantees stability.

It is intriguing that the classical conditions for consistency of ERM, a quite different property, also consist of appropriately restricting H.

Page 32:

ERM: conditions for well-posedness (stability) and predictivity (generalization)

We would like to have a hypothesis space that yields generalization. Loosely speaking, this would be an H for which the solution of ERM, say fS, is such that |I_S[fS] − I[fS]| converges to zero in probability as n increases.

Note that the above requirement is NOT the law of large numbers; the requirement that, for a fixed f, |I_S[f] − I[f]| converges to zero in probability as n increases IS the law of large numbers.

Page 33:

ERM: conditions for well-posedness (stability) and predictivity (generalization)

The theorem referred to on the next page, stated informally (the slide's formal statement is not reproduced here): ERM generalizes, and is consistent, if and only if H is a uniform Glivenko-Cantelli (uGC) class, i.e. sup over f ∈ H of |I_S[f] − I[f]| converges to zero in probability as n increases.

Page 34:

The theorem says that a proper choice of the hypothesis space H ensures generalization of ERM (and consistency, since for ERM generalization and consistency are equivalent).

A separate theorem also guarantees stability (defined in a specific way) of ERM.

Thus, with the appropriate definition of stability, stability and generalization are equivalent for ERM.

Other results characterize uGC classes in terms of measures of complexity or capacity of H (such as the VC dimension).

Thus the two desirable conditions for a learning algorithm, generalization and stability, are equivalent (and they correspond to the same constraints on H).

Page 35:

Regularization

Regularization is a method of improving the stability of solutions of ill-conditioned inverse problems.

The basic idea in the treatment of ill-conditioned problems is to use some a priori knowledge about solutions to disqualify meaningless ones. Such knowledge can be:

a regularity condition on the solution, expressed as the existence of derivatives up to a certain order with bounds on the magnitudes of these derivatives;

a localization condition, such as a bound on the support of the solution or on its behavior at infinity.

Tikhonov's regularization penalizes undesired solutions by adding a term called a stabilizer.

Page 36:

Regularization

Generally speaking, any regularization method tries to analyze a related well-posed problem whose solution approximates the solution of the original ill-posed problem.

Well-posedness is achieved by implementing one or more of the following basic ideas: restriction of the data; change of the space and/or topologies; modification of the operator itself; the concept of regularization operators; and well-posed stochastic extensions of ill-posed problems.

Page 37:

Regularization

Regularized cost function = empirical cost function + regularization parameter × regularizer function
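Rendered as code, with a mean square loss as the empirical cost and Ω(w) = ||w||^2 as the regularizer (both illustrative choices, not mandated by the slide):

```python
import numpy as np

def regularized_cost(w, X, y, gamma):
    """Regularized cost = empirical cost + regularization parameter * regularizer.

    Empirical cost: mean square loss of the linear model X @ w against y.
    Regularizer:    Omega(w) = ||w||^2 (a ridge penalty, an illustrative choice).
    """
    empirical = np.mean((X @ w - y) ** 2)
    return empirical + gamma * np.sum(w ** 2)

# For this choice, the minimizer has the closed form
#   w* = (X^T X / n + gamma I)^(-1) X^T y / n
rng = np.random.default_rng(6)
X = rng.normal(size=(50, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(0, 0.1, 50)
n, gamma = len(y), 0.1
w_star = np.linalg.solve(X.T @ X / n + gamma * np.eye(3), X.T @ y / n)
print(w_star, regularized_cost(w_star, X, y, gamma))
```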

Page 38:

Image restoration – An ill-posed problem

Degradation model (in the frequency domain):

G(u, v) = H(u, v) F(u, v) + N(u, v)

H is ill-conditioned, which makes the image restoration problem an ill-posed problem: the solution is not stable. Naive inverse filtering gives

F̂(u, v) = G(u, v) / H(u, v) = F(u, v) + N(u, v) / H(u, v),

so wherever H(u, v) is small, the noise term N(u, v) / H(u, v) is enormously amplified.
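A one-dimensional sketch of this instability and its regularized fix (the signal, blur kernel, noise level, and γ are all our own toy choices; a Tikhonov/Wiener-style filter stands in for the general regularized restoration):

```python
import numpy as np

rng = np.random.default_rng(5)
n = 256
f = np.zeros(n)
f[96:160] = 1.0                                # a simple 1-D "image" F

# Degradation: Gaussian blur (transfer function H) plus additive noise N.
h = np.exp(-0.5 * ((np.arange(n) - n // 2) / 4.0) ** 2)
h /= h.sum()
H = np.fft.fft(np.fft.ifftshift(h))
G = H * np.fft.fft(f) + np.fft.fft(rng.normal(0, 1e-3, n))

# Naive inverse filter G / H: divides by tiny |H| and blows up the noise.
f_inv = np.real(np.fft.ifft(G / H))

# Tikhonov-regularized inverse: conj(H) G / (|H|^2 + gamma), stable.
gamma = 1e-3
f_reg = np.real(np.fft.ifft(np.conj(H) * G / (np.abs(H) ** 2 + gamma)))

print("max restoration error, naive inverse:", np.max(np.abs(f_inv - f)))
print("max restoration error, regularized  :", np.max(np.abs(f_reg - f)))
```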

Page 39:

Tikhonov’s Regularization

Theory proposed by Tikhonov in 1963. It proposes the use of prior knowledge to regularize mappings.

Most common application: exploit the smoothness property, i.e. for an input-output mapping to be smooth, similar inputs should produce similar outputs.

Page 40:

Ivanov and Tikhonov Regularization

Ivanov regularization: minimize the empirical risk I_S[f] subject to the constraint Ω(f) ≤ τ, i.e. ERM over a restricted hypothesis space.

Tikhonov regularization: minimize I_S[f] + γ Ω(f), with the constraint moved into the objective as a penalty term.

Page 41:

Tikhonov Regularization

As we will see in future classes, Tikhonov regularization ensures well-posedness, e.g. existence, uniqueness and, especially, stability (in a very strong form) of the solution.

Tikhonov regularization also ensures generalization. It is closely related to, but different from, Ivanov regularization, e.g. ERM on a hypothesis space H which is a ball in an RKHS.

