Transcript
Page 1: Margin-Sparsity Trade-off for the Set Covering Machine, ECML 2005. François Laviolette (Université Laval), Mario Marchand (Université Laval), Mohak Shah (Université d’Ottawa)

Margin-Sparsity Trade-off for the Set Covering Machine

ECML 2005

François Laviolette (Université Laval)

Mario Marchand (Université Laval)

Mohak Shah (Université d’Ottawa)

Page 2:

PLAN

Margin-sparsity trade-off for sample-compressed classifiers

The “classical” Set Covering Machine (Classical-SCM): definition; tight risk bound and model selection; the learning algorithm

The modified Set Covering Machine (SCM2): definition; a non-trivial margin-sparsity trade-off expressed by the risk bound; the learning algorithm; empirical results

Conclusions

Page 3:

The Sample Compression Framework

In the sample compression setting, each classifier is identified by two different sources of information:

The compression set: an (ordered) subset of the training set

A message string of additional information needed to identify a classifier

To be more precise: in the sample compression setting, there exists a “reconstruction” function R that gives a classifier h = R(σ, S_i) when given a compression set S_i and a message string σ.
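As a minimal sketch (the example, the names, and the 1-D threshold classifier are my own, not from the talk), a reconstruction function R(σ, S_i) can be illustrated like this:

```python
def reconstruct(message, compression_set):
    """Toy reconstruction function R(sigma, S_i): the classifier is a 1-D
    threshold, the compression set holds a single example (x, y), and the
    message string sigma is one bit giving the direction of the inequality."""
    (x0, _), = compression_set         # the single compressed training example
    if message == "0":                 # sigma = '0': positive on x <= x0
        return lambda x: int(x <= x0)
    return lambda x: int(x >= x0)      # sigma = '1': positive on x >= x0

# The same compression set yields different classifiers for different messages.
h = reconstruct("1", [(3.0, 1)])
```

The point of the sketch is that the classifier is fully determined by the pair (message string, compression set), which is exactly what the risk bounds below count.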

Page 4:

The Sample Compression Framework (2)

The examples are assumed i.i.d.

The risk (or generalization error) of a classifier h, denoted R(h), is the probability that h misclassifies a new example.

The empirical risk on a training set S, denoted R_S(h), is the frequency of errors of h on S.
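The empirical risk is straightforward to state in code; a brief sketch (the function name is mine, not from the talk):

```python
def empirical_risk(h, S):
    """Empirical risk R_S(h): the frequency of errors of the classifier h
    on the training sample S, given as a list of (x, y) pairs."""
    return sum(h(x) != y for x, y in S) / len(S)

# The true risk R(h) is the expectation of the same 0-1 loss under the
# (unknown) i.i.d. distribution that generates the examples.
```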

Page 5:

Examples of sample-compressed classifiers

Set Covering Machines (SCM) [Marchand and Shawe-Taylor JMLR 2002]

Decision List Machines (DLM) [Marchand and Sokolova JMLR 2005]

Support Vector Machines (SVM)

Page 6:

Margin-Sparsity trade-off

There is a widespread belief that, in the sample compression setting, learning algorithms should somehow try to find a non-trivial margin-sparsity trade-off.

SVMs are looking for margin. Some effort has been made to find sparser SVMs (Bennett (1999), and Bi et al. (2003)), but this seems to be a difficult task.

SCMs are looking for sparsity. Forcing a classifier that is a conjunction of “geometric” Boolean features to have no training example within a given distance of its decision surface seems a much easier task.

Moreover, we will see that in our setting, both sparsity and margin can be considered as different forms of data compression.

Page 7:

The “Classical” Set Covering Machine (Marchand and Shawe-Taylor 2002)

Construct the “smallest possible” conjunction of (Boolean-valued) features.

Each feature h is a ball identified by two training examples (the center (x_c, y_c) and the border point (x_b, y_b)) and defined for any input example x as: h(x) = y_c if d(x, x_c) ≤ d(x_c, x_b), and ¬y_c otherwise.

(Dually, one can construct the “smallest possible” disjunction of features, but we will only consider the conjunction case in this talk.)
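The ball feature and the SCM conjunction can be sketched as follows, under the standard Marchand and Shawe-Taylor definition (labels are taken as 0/1; the helper names are mine):

```python
import math

def ball_feature(x, center, border, center_label):
    """Boolean-valued ball feature identified by two training examples:
    a center (x_c, y_c) and a border point (x_b, y_b).  Outputs y_c when
    x lies within distance d(x_c, x_b) of the center, else the opposite."""
    radius = math.dist(center, border)
    return center_label if math.dist(x, center) <= radius else 1 - center_label

def scm_predict(x, balls):
    """'Classical' SCM: a conjunction of ball features.  `balls` is a list
    of (center, border, center_label) triples; the output is 1 only if
    every feature outputs 1."""
    return int(all(ball_feature(x, c, b, y) == 1 for c, b, y in balls))
```

Sparsity here means using as few triples in `balls` as possible, since each one consumes two training examples of the compression set.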

Page 8:

An Example of a “Classical”-SCM

[Figure: a scatter of “+” and “-” training examples in the plane, on which balls are drawn to cover the “-” examples.]

Pages 9 to 14: [The same “Classical”-SCM example figure, repeated as the ball cover is built up one step at a time.]

Page 15:

But the SCM is looking for sparsity!!

[Figure: the same “+”/“-” scatter, now covered by far fewer balls.]

Page 16: [The same figure, continued.]

Page 17:

A risk bound

Page 18:

For “classical” SCMs

If we choose the following prior:

Then Corollary 1 becomes:

which almost expresses a symmetry between k and d, because the term ln P_M(Z_i)(σ) is small for the “classical” SCMs compared to the other terms, and the same holds for ln(d+1).

Page 19:

Model selection by the bound

Empirical results showed that choosing the SCM that minimizes this risk bound is a slightly better model-selection strategy than the cross-validation approach.

The reasons are not totally clear: this bound is tight, and there is a symmetry between d and k (?).

Page 20:

A Learning Algorithm for the “Classical” SCM

Ideally, we would like to find an SCM that minimizes the risk bound.

Unfortunately, this is NP-hard (at least).

We will therefore use a greedy heuristic based on the following observation.

Page 21:

Adding one ball at a time, a classification error on a “+” example cannot be fixed by adding other balls; but for a “-” example it is possible.

[Figure: scatter of “+” and “-” examples illustrating this asymmetry of the conjunction.]

Page 22:

A Learning Algorithm for the “Classical” SCM (Marchand and Shawe-Taylor 2002)

Define a list p1, p2, …, pl of values of the learning parameter p, and for each such p DO STEP 1.

STEP 1: Suppose i balls (B_{p,0}, B_{p,1}, …, B_{p,i-1}) have already been constructed by the algorithm.

UNTIL every “-” example is correctly assigned by the SCM (B_{p,0}, B_{p,1}, …, B_{p,i-1}) DO: choose a new ball B_{p,i} that maximizes q_i − p · r_i, where

q_i is the number of “-” examples correctly assigned by B_{p,i} but not correctly assigned by the SCM (B_{p,0}, B_{p,1}, …, B_{p,i-1}), and

r_i is the number of “+” examples not correctly assigned by B_{p,i} but correctly assigned by the SCM (B_{p,0}, B_{p,1}, …, B_{p,i-1}).
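The greedy step above can be sketched as follows. This is a simplified rendering under my own data representation (each candidate ball is summarized by the set of “-” examples it assigns correctly and the set of “+” examples it misclassifies), not the authors' implementation:

```python
def greedy_scm(balls, negatives_covered, positives_misclassified, p):
    """Greedy set-cover heuristic: add one ball at a time until every '-'
    example is covered, each time picking the ball maximizing q_i - p * r_i,
    where q_i counts newly covered '-' examples and r_i counts newly
    misclassified '+' examples."""
    all_negatives = set().union(*negatives_covered)
    chosen, covered, lost_positives = [], set(), set()
    while covered != all_negatives:
        def score(i):
            q = len(negatives_covered[i] - covered)                  # new '-' covered
            r = len(positives_misclassified[i] - lost_positives)     # new '+' lost
            return q - p * r
        best = max(range(len(balls)), key=score)
        if score(best) <= 0:       # no remaining ball helps; stop early
            break
        chosen.append(balls[best])
        covered |= negatives_covered[best]
        lost_positives |= positives_misclassified[best]
    return chosen
```

The parameter p plays its role in the score: a large p makes the heuristic reluctant to accept balls that sacrifice “+” examples for coverage of “-” examples.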

Page 23:

A Learning Algorithm for the “Classical” SCM (continued)

Among the SCMs constructed above, OUTPUT the one that has the best risk bound.

Note: the algorithm can be adapted to a cross-validation approach.

Page 24:

SCM2: an SCM with radii coded by message strings

In the “classical” SCM, centers and radii are defined by examples of the training set.

An alternative: code each radius value by a message string (but still use examples of the training set to define the centers).

Objective: construct the “smallest possible” conjunction of balls, each of which has the “smallest possible” number of bits in the message string that defines its radius.

Page 25:

What kind of radius can be described with l bits?

Let us choose a scale R.

Then with l = 0 bits, we can define the radius R/2.

[Figure: a ball of scale R around a “+” center, with the radius R/2 marked.]

Page 26:

What kind of radius can be described with l bits?

Let us choose a scale R.

Then with l = 1 bit, we can define the radii R/4 and 3R/4.

[Figure: the same ball, with the radii R/4 and 3R/4 marked.]

Page 27:

What kind of radius can be described with l bits?

Let us choose a scale R.

Then with l = 2 bits, we can define the radii R/8, 3R/8, 5R/8 and 7R/8.

[Figure: the same ball, with the radii R/8, 3R/8, 5R/8 and 7R/8 marked.]

Page 28:

More precisely

Given some parameter R (which will be our scale), the radius of any ball of an SCM2 will be coded by a pair (l, s) such that 0 < 2s − 1 < 2^(l+1).

The code (l, s) means that the radius of the ball is (2s − 1)R / 2^(l+1).

Note that l is the number of bits of the radius.

Thus the possible radius values for l = 2 are R/8, 3R/8, 5R/8 and 7R/8.
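The coding scheme can be checked in a few lines (function names are mine; the decoding formula follows the slide's definition):

```python
def radius_from_code(l, s, R):
    """Decode a radius message (l, s) at scale R: the radius is the odd
    multiple (2s - 1) of R / 2**(l + 1), subject to 0 < 2s - 1 < 2**(l + 1)."""
    assert 0 < 2 * s - 1 < 2 ** (l + 1), "invalid (l, s) code"
    return (2 * s - 1) * R / 2 ** (l + 1)

def all_radii(l, R):
    """All radii expressible with exactly l bits at scale R (2**l values)."""
    return [radius_from_code(l, s, R) for s in range(1, 2 ** l + 1)]

all_radii(2, 1.0)  # [0.125, 0.375, 0.625, 0.875], i.e. R/8, 3R/8, 5R/8, 7R/8
```

Each extra bit doubles the number of admissible radii, so a ball with a large margin around its decision surface can get away with a short radius code: this is the sense in which margin is itself a form of data compression.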

Page 29:

Observe that if we have a large margin, then among all the “interesting” balls there will be one whose radius code (l, s) uses a small number of bits.

[Figure: a “+” cluster and a “-” cluster separated by a large margin, within which several candidate ball radii fit.]

Page 30:

For SCM2

If we choose the following priors:

Then Corollary 1 becomes:

which expresses a non-trivial margin-sparsity trade-off!

The learning algorithm is similar to the classical one, except that it needs two extra learning parameters: R, and the maximum number of bits allowed per message string (denoted l*).

Page 31:

Empirical results

SVMs and SCMs on UCI data sets. We observe:

For SCMs, model selection by the bound is almost always better than by cross-validation.

SCM2 is almost always better than SCM1.

SCM2 tends to produce more balls than SCM1; hence SCM2 sacrifices sparsity to obtain a larger margin.

Page 32:

Conclusion

We have proposed:

A new representation for the SCM that uses two distinct sources of information: a compression set to represent the centers of the balls, and a message string to encode the radius value of each ball.

A general data-compression risk bound that depends explicitly on these two information sources. It exhibits a non-trivial trade-off between sparsity (the inverse of the compression set size) and the margin (the inverse of the message length), and seems to be an effective guide for choosing the proper margin-sparsity trade-off of a classifier.

