
Semidefinite Programming Machines

Thore Graepel and Ralf Herbrich

Microsoft Research Cambridge


Overview

Invariant Pattern Recognition

Semidefinite Programming (SDP)

From Support Vector Machines (SVMs) to Semidefinite Programming Machines (SDPMs)

Experimental Illustration

Future Work


Typical Invariances for Images

Translation

Rotation

Shear


Toy Features for Handwritten Digits

Figure: three toy feature values computed from a handwritten digit, φ1 = 0.48, φ2 = 0.58, φ3 = 0.37.


Warning: Highly Non-Linear

Figure: trajectories of transformed patterns in the (φ1, φ2) feature plane.


Motivation: Classification Learning

Figure: labelled training examples in the (φ1(x), φ2(x)) feature plane.

Can we learn with infinitely many examples?


Motivation: Version Spaces

Figure: version spaces induced by the original patterns (left) and by the transformed patterns (right).


Semidefinite Programs (SDPs)

Linear objective function.

Positive semidefinite (psd) constraints.

A single psd constraint encodes infinitely many linear constraints: F ⪰ 0 if and only if u⊤Fu ≥ 0 for every vector u.
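The slide's formulas did not survive extraction; as a sketch, the standard primal form an SDP takes is

$$\min_{x \in \mathbb{R}^n} \; c^\top x \quad \text{s.t.} \quad F(x) := F_0 + \sum_{j=1}^{n} x_j F_j \succeq 0,$$

with fixed symmetric matrices $F_0, \dots, F_n$.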


SVM as a Quadratic Program

Given: A sample ((x1,y1),…,(xm,ym)).

SVMs find the weight vector w that maximises the margin on the sample.
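The optimisation problem itself is missing from the transcript; the standard hard-margin program it refers to is

$$\min_{\mathbf{w}} \; \tfrac{1}{2}\|\mathbf{w}\|^2 \quad \text{s.t.} \quad y_i \langle \mathbf{w}, \mathbf{x}_i \rangle \ge 1, \quad i = 1, \dots, m.$$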


SVM as a Semidefinite Program (I)

A (block)-diagonal matrix is psd if and only if all its blocks are psd.

$$A_j := \mathrm{diag}(g_{1,j}, \dots, g_{m,j}), \qquad B := I_m,$$

so that the m margin constraints become the single psd constraint $\sum_j w_j A_j - B \succeq 0$, with $g_{i,j}$ the label-weighted feature values $y_i\,\phi_j(\mathbf{x}_i)$.
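A minimal sketch of this reduction in CVXPY (the toy data and variable names are my own illustration, not from the talk):

```python
import cvxpy as cp
import numpy as np

# Toy two-feature sample; labels y are in {-1, +1}.
X = np.array([[0.2, 0.5], [0.3, 0.6], [0.5, 0.2], [0.6, 0.3]])
y = np.array([1, 1, -1, -1])
m, n = X.shape

G = y[:, None] * X  # g_{i,j} = y_i * phi_j(x_i)
w = cp.Variable(n)

# One diagonal psd constraint encodes all m margin constraints:
# diag(Gw) - I >> 0  <=>  y_i <w, x_i> >= 1 for every i.
constraints = [cp.diag(G @ w) - np.eye(m) >> 0]
prob = cp.Problem(cp.Minimize(cp.sum_squares(w)), constraints)
prob.solve()
print("w* =", w.value)
```

The objective here is still quadratic; slide (II) next shows how to replace it by a linear one.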


SVM as a Semidefinite Program (II)

Transform the quadratic objective into a linear one.

This adds a new (n+1)×(n+1) block to A_j and B.

Use Schur's complement lemma.
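The derivation is missing from the transcript; the standard construction the slide names is to minimise a new scalar $t$ subject to

$$\begin{pmatrix} t & \mathbf{w}^\top \\ \mathbf{w} & I_n \end{pmatrix} \succeq 0,$$

which by the Schur complement lemma holds exactly when $t \ge \mathbf{w}^\top \mathbf{w}$, so the linear objective $t$ stands in for the quadratic $\|\mathbf{w}\|^2$.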


Taylor Approximation of Invariance

Let T(x, θ) be an invariance transformation with parameter θ (e.g., the angle of rotation).

Taylor expansion about θ = 0 gives a polynomial approximation to the trajectory (reconstructed below).
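A sketch of the expansion, written in the notation of the following slides:

$$\mathbf{x}(\theta) := T(\mathbf{x}, \theta) \approx \sum_{k=0}^{r} \frac{\theta^k}{k!} \left.\frac{\partial^k T(\mathbf{x}, \theta)}{\partial \theta^k}\right|_{\theta = 0}.$$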


Extension to Polynomials

Consider the polynomial trajectory x(θ).

A single training example (x^(0), …, x^(r), y) then generates infinitely many constraints, one for every value of θ (see below).
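A reconstruction consistent with the notation above, where each x^(k) absorbs the 1/k! factor of the Taylor expansion:

$$\mathbf{x}(\theta) = \sum_{k=0}^{r} \theta^k \mathbf{x}^{(k)}, \qquad y\,\langle \mathbf{w}, \mathbf{x}(\theta) \rangle - 1 \ge 0 \quad \text{for all } \theta.$$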


Non-Negative Polynomials (I)

Theorem (Nesterov, 2000): Let r = 2l and write t(θ) := (1, θ, …, θ^l)⊤. Then:

1. For every psd matrix P, the polynomial p(θ) = t(θ)⊤ P t(θ) is non-negative everywhere.

2. For every non-negative polynomial p of degree r there exists a psd matrix P such that p(θ) = t(θ)⊤ P t(θ).

Example (reconstructed below):
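The slide's example is missing; a minimal instance of the theorem:

$$p(\theta) = 1 + 2\theta + \theta^2 = \begin{pmatrix} 1 & \theta \end{pmatrix} \begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix} \begin{pmatrix} 1 \\ \theta \end{pmatrix},$$

where the coefficient matrix is psd because $p(\theta) = (1+\theta)^2$ is a square.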


Non-Negative Polynomials (II)

(1) follows directly from the psd definition.

(2) follows from the sum-of-squares lemma.

Note that (2) states mere existence:

A polynomial of degree r has r + 1 parameters.

The coefficient matrix P has (r+2)(r+4)/8 parameters.

For r > 2, we therefore have to introduce another r(r−2)/8 auxiliary variables to find P.
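A sketch of that search for P in CVXPY, for a quartic (r = 4, so P is 3×3 and there is one auxiliary degree of freedom); the example polynomial is mine, not from the talk:

```python
import cvxpy as cp
import numpy as np

# p(theta) = 1 - 2t + 3t^2 - 2t^3 + t^4 = (1 - t + t^2)^2, non-negative.
c = np.array([1.0, -2.0, 3.0, -2.0, 1.0])

# P is indexed by the monomials (1, theta, theta^2):
# p(theta) = t(theta)^T P t(theta).
P = cp.Variable((3, 3), symmetric=True)
constraints = [
    P >> 0,                          # P must be psd
    P[0, 0] == c[0],                 # coefficient of 1
    2 * P[0, 1] == c[1],             # coefficient of theta
    2 * P[0, 2] + P[1, 1] == c[2],   # theta^2: the auxiliary freedom
    2 * P[1, 2] == c[3],             # coefficient of theta^3
    P[2, 2] == c[4],                 # coefficient of theta^4
]
cp.Problem(cp.Minimize(0), constraints).solve()
print("P =\n", P.value)
```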


Semidefinite Programming Machines

Extension of SVMs as (non-trivial) SDP.

$$A_j := \begin{pmatrix} G_{1,j} & & \\ & \ddots & \\ & & G_{m,j} \end{pmatrix}, \qquad B := \begin{pmatrix} E & & \\ & \ddots & \\ & & E \end{pmatrix}, \qquad E := \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}.$$

Each scalar g_{i,j} of the SVM case is replaced by a coefficient matrix G_{i,j}, and each diagonal 1 of B by the block E, so that $\sum_j w_j A_j - B \succeq 0$ forces every example's constraint polynomial to be non-negative.


Example: Second-Order SDPMs

2nd-order Taylor expansion, the resulting polynomial in θ, and the set of constraint matrices (a hedged reconstruction of all three follows):
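The formulas are missing from the transcript; for r = 2 and t(θ) = (1, θ)⊤ the construction above specialises to

$$\mathbf{x}(\theta) \approx \mathbf{x} + \theta\,\mathbf{x}' + \tfrac{\theta^2}{2}\,\mathbf{x}'', \qquad p(\theta) = y\,\langle \mathbf{w}, \mathbf{x}(\theta)\rangle - 1 = \mathbf{t}(\theta)^\top G(\mathbf{w})\,\mathbf{t}(\theta),$$

$$G(\mathbf{w}) = \begin{pmatrix} y\langle\mathbf{w},\mathbf{x}\rangle - 1 & \tfrac{y}{2}\langle\mathbf{w},\mathbf{x}'\rangle \\ \tfrac{y}{2}\langle\mathbf{w},\mathbf{x}'\rangle & \tfrac{y}{2}\langle\mathbf{w},\mathbf{x}''\rangle \end{pmatrix} \succeq 0.$$

A toy second-order SDPM along these lines in CVXPY (data, names, and the quadratic objective are my own simplifications; the talk linearises the objective via the Schur trick above):

```python
import cvxpy as cp
import numpy as np

# Toy sample: x[i] is an example; dx[i], ddx[i] are its first and second
# derivatives w.r.t. the transformation parameter theta at theta = 0.
x   = np.array([[1.0, 0.2], [0.2, 1.0]])
dx  = np.array([[0.1, 0.1], [0.1, 0.1]])
ddx = np.array([[0.2, 0.0], [0.0, 0.2]])
y   = np.array([1, -1])
m, n = x.shape

E = np.array([[1.0, 0.0], [0.0, 0.0]])  # subtracts 1 from the constant term
w = cp.Variable(n)
constraints = []
for i in range(m):
    # Constant per-feature blocks: G_i(w) = sum_j w_j * M[j] - E.
    M = [y[i] * np.array([[x[i, j],      dx[i, j] / 2],
                          [dx[i, j] / 2, ddx[i, j] / 2]]) for j in range(n)]
    S = cp.Variable((2, 2), PSD=True)   # S = G_i(w), forced psd
    constraints.append(S == sum(w[j] * M[j] for j in range(n)) - E)

# G_i(w) >> 0  <=>  y_i <w, x_i(theta)> >= 1 for EVERY theta.
prob = cp.Problem(cp.Minimize(cp.sum_squares(w)), constraints)
prob.solve()
print("w* =", w.value)
```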


Non-Negative on Segment

Given a polynomial p of degree 2l, consider the polynomial q built from it (the defining formula is reconstructed below).

Note that q is a polynomial of degree 4l. If q is positive everywhere, then p is positive everywhere in [−τ, +τ].

Figure: example polynomial f(θ).
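One construction that matches the stated degree count maps the whole real line onto the segment via θ ↦ 2τθ/(1+θ²); this is an assumption on my part, reconstructed rather than read off the slide:

$$q(\theta) := (1+\theta^2)^{2l}\; p\!\left(\frac{2\tau\theta}{1+\theta^2}\right),$$

a polynomial of degree 4l; since 2θ/(1+θ²) ranges over [−1, 1], non-negativity of q on all of ℝ implies non-negativity of p on [−τ, +τ].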


Truly Virtual Support Vectors

Dual complementarity yields an expansion of the solution (a hedged reconstruction follows).

The truly virtual support vectors are linear combinations of the derivatives x^(k).
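A sketch of the expansion, written from the notation above rather than read off the slide:

$$\mathbf{w} = \sum_{i=1}^{m} \alpha_i y_i\, \mathbf{x}_i(\theta_i^*), \qquad \mathbf{x}_i(\theta_i^*) = \sum_{k=0}^{r} (\theta_i^*)^k\, \mathbf{x}_i^{(k)},$$

i.e., each support vector sits at its optimal transformation parameter θ_i* on the trajectory and need not coincide with any pre-generated virtual example.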


Visualisation: USPS “1” vs. “9”

Figure: USPS "1" vs. "9" examples in the (φ1(x), φ2(x)) feature plane with rotation trajectories, τ = 20° (inset: close-up).


Results: Experimental Setup

All 45 USPS classification tasks (1-v-1).

20 training images; 250 test images.

Rotation is applied to all training images with τ = 10°.

All results are averaged over 50 random training sets.

Compared to SVM and virtual SVM.


Results: SDPM vs. SVM

Figure: scatter of test errors over the 45 tasks, SDPM error (y-axis) vs. SVM error (x-axis).


Results: SDPM vs. Virtual SVM

Figure: scatter of test errors over the 45 tasks, SDPM error (y-axis) vs. virtual SVM error (x-axis).


Results: Curse of Dimensionality


Figure: results with 1 transformation parameter (left) vs. 2 parameters (right).


Extensions & Future Work

Multiple parameters θ1, θ2, …, θD.

(Efficient) adaptation to kernel space.

Semidefinite Perceptrons (NIPS poster with A. Kharechko and J. Shawe-Taylor).

Sparsification by efficiently finding the example x and transformation θ with maximal information (idea of Neil Lawrence).

Expectation propagation for BPMs (idea of Tom Minka).


Conclusions & Future Work

Learning from infinitely many examples.

Truly virtual support vectors x_i(θ_i*).

Multiple parameters θ1, θ2, …, θD.

(Efficient) adaptation to kernel space.

Semidefinite Perceptrons (NIPS poster with A. Kharechko and J. Shawe-Taylor).