
Semidefinite Programming Machines

Thore Graepel and Ralf Herbrich

Microsoft Research Cambridge


Overview

Invariant Pattern Recognition

Semidefinite Programming (SDP)

From Support Vector Machines (SVMs) to Semidefinite Programming Machines (SDPMs)

Experimental Illustration

Future Work


Typical Invariances for Images

Translation

Rotation

Shear


Toy Features for Handwritten Digits

Figure: three toy feature values computed from a handwritten digit, φ1 = 0.48, φ2 = 0.58, φ3 = 0.37.


Warning: Highly Non-Linear

Figure: trajectories of transformed patterns in the (φ1, φ2) feature plane.


Motivation: Classification Learning

Figure: labelled training examples in the (φ1(x), φ2(x)) feature plane.

Can we learn with infinitely many examples?


Motivation: Version Spaces

Figure: version spaces induced by the original patterns (left) and by the transformed patterns (right).


Semidefinite Programs (SDPs)

Linear objective function.

Positive semidefinite (psd) constraints.

A single psd constraint encodes infinitely many linear constraints: F ⪰ 0 if and only if u⊤Fu ≥ 0 for every vector u.
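The slide's formulas did not survive extraction; as a sketch, the standard primal form an SDP takes is

$$\min_{x \in \mathbb{R}^n} \; c^\top x \quad \text{s.t.} \quad F(x) := F_0 + \sum_{j=1}^{n} x_j F_j \succeq 0,$$

with fixed symmetric matrices $F_0, \dots, F_n$.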


SVM as a Quadratic Program

Given: A sample ((x1,y1),…,(xm,ym)).

SVMs find the weight vector w that maximises the margin on the sample.
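The optimisation problem itself is missing from the transcript; the standard hard-margin program it refers to is

$$\min_{\mathbf{w}} \; \tfrac{1}{2}\|\mathbf{w}\|^2 \quad \text{s.t.} \quad y_i \langle \mathbf{w}, \mathbf{x}_i \rangle \ge 1, \quad i = 1, \dots, m.$$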


SVM as a Semidefinite Program (I)

A (block)-diagonal matrix is psd if and only if all its blocks are psd.

$$A_j := \mathrm{diag}(g_{1,j}, \dots, g_{m,j}), \qquad B := I_m,$$

so that the m margin constraints become the single psd constraint $\sum_j w_j A_j - B \succeq 0$, with $g_{i,j}$ the label-weighted feature values $y_i\,\phi_j(\mathbf{x}_i)$.
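A minimal sketch of this reduction in CVXPY (the toy data and variable names are my own illustration, not from the talk):

```python
import cvxpy as cp
import numpy as np

# Toy two-feature sample; labels y are in {-1, +1}.
X = np.array([[0.2, 0.5], [0.3, 0.6], [0.5, 0.2], [0.6, 0.3]])
y = np.array([1, 1, -1, -1])
m, n = X.shape

G = y[:, None] * X  # g_{i,j} = y_i * phi_j(x_i)
w = cp.Variable(n)

# One diagonal psd constraint encodes all m margin constraints:
# diag(Gw) - I >> 0  <=>  y_i <w, x_i> >= 1 for every i.
constraints = [cp.diag(G @ w) - np.eye(m) >> 0]
prob = cp.Problem(cp.Minimize(cp.sum_squares(w)), constraints)
prob.solve()
print("w* =", w.value)
```

The objective here is still quadratic; slide (II) next shows how to replace it by a linear one.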


SVM as a Semidefinite Program (II)

Transform the quadratic objective into a linear one.

This adds a new (n+1)×(n+1) block to A_j and B.

Use Schur's complement lemma.
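The derivation is missing from the transcript; the standard construction the slide names is to minimise a new scalar $t$ subject to

$$\begin{pmatrix} t & \mathbf{w}^\top \\ \mathbf{w} & I_n \end{pmatrix} \succeq 0,$$

which by the Schur complement lemma holds exactly when $t \ge \mathbf{w}^\top \mathbf{w}$, so the linear objective $t$ stands in for the quadratic $\|\mathbf{w}\|^2$.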


Taylor Approximation of Invariance

Let T(x, θ) be an invariance transformation with parameter θ (e.g., the angle of rotation).

Taylor expansion about θ = 0 gives a polynomial approximation to the trajectory (reconstructed below).
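A sketch of the expansion, written in the notation of the following slides:

$$\mathbf{x}(\theta) := T(\mathbf{x}, \theta) \approx \sum_{k=0}^{r} \frac{\theta^k}{k!} \left.\frac{\partial^k T(\mathbf{x}, \theta)}{\partial \theta^k}\right|_{\theta = 0}.$$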


Extension to Polynomials

Consider the polynomial trajectory x(θ).

A single training example (x^(0), …, x^(r), y) then generates infinitely many constraints, one for every value of θ (see below).
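A reconstruction consistent with the notation above, where each x^(k) absorbs the 1/k! factor of the Taylor expansion:

$$\mathbf{x}(\theta) = \sum_{k=0}^{r} \theta^k \mathbf{x}^{(k)}, \qquad y\,\langle \mathbf{w}, \mathbf{x}(\theta) \rangle - 1 \ge 0 \quad \text{for all } \theta.$$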


Non-Negative Polynomials (I)

Theorem (Nesterov, 2000): Let r = 2l and write t(θ) := (1, θ, …, θ^l)⊤. Then:

1. For every psd matrix P, the polynomial p(θ) = t(θ)⊤ P t(θ) is non-negative everywhere.

2. For every non-negative polynomial p of degree r there exists a psd matrix P such that p(θ) = t(θ)⊤ P t(θ).

Example (reconstructed below):
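The slide's example is missing; a minimal instance of the theorem:

$$p(\theta) = 1 + 2\theta + \theta^2 = \begin{pmatrix} 1 & \theta \end{pmatrix} \begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix} \begin{pmatrix} 1 \\ \theta \end{pmatrix},$$

where the coefficient matrix is psd because $p(\theta) = (1+\theta)^2$ is a square.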


Non-Negative Polynomials (II)

(1) follows directly from the psd definition.

(2) follows from the sum-of-squares lemma.

Note that (2) states mere existence:

A polynomial of degree r has r + 1 parameters.

The coefficient matrix P has (r+2)(r+4)/8 parameters.

For r > 2, we therefore have to introduce another r(r−2)/8 auxiliary variables to find P.
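A sketch of that search for P in CVXPY, for a quartic (r = 4, so P is 3×3 and there is one auxiliary degree of freedom); the example polynomial is mine, not from the talk:

```python
import cvxpy as cp
import numpy as np

# p(theta) = 1 - 2t + 3t^2 - 2t^3 + t^4 = (1 - t + t^2)^2, non-negative.
c = np.array([1.0, -2.0, 3.0, -2.0, 1.0])

# P is indexed by the monomials (1, theta, theta^2):
# p(theta) = t(theta)^T P t(theta).
P = cp.Variable((3, 3), symmetric=True)
constraints = [
    P >> 0,                          # P must be psd
    P[0, 0] == c[0],                 # coefficient of 1
    2 * P[0, 1] == c[1],             # coefficient of theta
    2 * P[0, 2] + P[1, 1] == c[2],   # theta^2: the auxiliary freedom
    2 * P[1, 2] == c[3],             # coefficient of theta^3
    P[2, 2] == c[4],                 # coefficient of theta^4
]
cp.Problem(cp.Minimize(0), constraints).solve()
print("P =\n", P.value)
```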


Semidefinite Programming Machines

Extension of SVMs as (non-trivial) SDP.

$$A_j := \begin{pmatrix} G_{1,j} & & \\ & \ddots & \\ & & G_{m,j} \end{pmatrix}, \qquad B := \begin{pmatrix} E & & \\ & \ddots & \\ & & E \end{pmatrix}, \qquad E := \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}.$$

Each scalar g_{i,j} of the SVM case is replaced by a coefficient matrix G_{i,j}, and each diagonal 1 of B by the block E, so that $\sum_j w_j A_j - B \succeq 0$ forces every example's constraint polynomial to be non-negative.


Example: Second-Order SDPMs

2nd-order Taylor expansion, the resulting polynomial in θ, and the set of constraint matrices (a hedged reconstruction of all three follows):
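The formulas are missing from the transcript; for r = 2 and t(θ) = (1, θ)⊤ the construction above specialises to

$$\mathbf{x}(\theta) \approx \mathbf{x} + \theta\,\mathbf{x}' + \tfrac{\theta^2}{2}\,\mathbf{x}'', \qquad p(\theta) = y\,\langle \mathbf{w}, \mathbf{x}(\theta)\rangle - 1 = \mathbf{t}(\theta)^\top G(\mathbf{w})\,\mathbf{t}(\theta),$$

$$G(\mathbf{w}) = \begin{pmatrix} y\langle\mathbf{w},\mathbf{x}\rangle - 1 & \tfrac{y}{2}\langle\mathbf{w},\mathbf{x}'\rangle \\ \tfrac{y}{2}\langle\mathbf{w},\mathbf{x}'\rangle & \tfrac{y}{2}\langle\mathbf{w},\mathbf{x}''\rangle \end{pmatrix} \succeq 0.$$

A toy second-order SDPM along these lines in CVXPY (data, names, and the quadratic objective are my own simplifications; the talk linearises the objective via the Schur trick above):

```python
import cvxpy as cp
import numpy as np

# Toy sample: x[i] is an example; dx[i], ddx[i] are its first and second
# derivatives w.r.t. the transformation parameter theta at theta = 0.
x   = np.array([[1.0, 0.2], [0.2, 1.0]])
dx  = np.array([[0.1, 0.1], [0.1, 0.1]])
ddx = np.array([[0.2, 0.0], [0.0, 0.2]])
y   = np.array([1, -1])
m, n = x.shape

E = np.array([[1.0, 0.0], [0.0, 0.0]])  # subtracts 1 from the constant term
w = cp.Variable(n)
constraints = []
for i in range(m):
    # Constant per-feature blocks: G_i(w) = sum_j w_j * M[j] - E.
    M = [y[i] * np.array([[x[i, j],      dx[i, j] / 2],
                          [dx[i, j] / 2, ddx[i, j] / 2]]) for j in range(n)]
    S = cp.Variable((2, 2), PSD=True)   # S = G_i(w), forced psd
    constraints.append(S == sum(w[j] * M[j] for j in range(n)) - E)

# G_i(w) >> 0  <=>  y_i <w, x_i(theta)> >= 1 for EVERY theta.
prob = cp.Problem(cp.Minimize(cp.sum_squares(w)), constraints)
prob.solve()
print("w* =", w.value)
```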


Non-Negative on Segment

Given a polynomial p of degree 2l, consider the polynomial q built from it (the defining formula is reconstructed below).

Note that q is a polynomial of degree 4l. If q is positive everywhere, then p is positive everywhere in [−τ, +τ].

Figure: example polynomial f(θ).
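One construction that matches the stated degree count maps the whole real line onto the segment via θ ↦ 2τθ/(1+θ²); this is an assumption on my part, reconstructed rather than read off the slide:

$$q(\theta) := (1+\theta^2)^{2l}\; p\!\left(\frac{2\tau\theta}{1+\theta^2}\right),$$

a polynomial of degree 4l; since 2θ/(1+θ²) ranges over [−1, 1], non-negativity of q on all of ℝ implies non-negativity of p on [−τ, +τ].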


Truly Virtual Support Vectors

Dual complementarity yields an expansion of the solution (a hedged reconstruction follows).

The truly virtual support vectors are linear combinations of the derivatives x^(k).
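A sketch of the expansion, written from the notation above rather than read off the slide:

$$\mathbf{w} = \sum_{i=1}^{m} \alpha_i y_i\, \mathbf{x}_i(\theta_i^*), \qquad \mathbf{x}_i(\theta_i^*) = \sum_{k=0}^{r} (\theta_i^*)^k\, \mathbf{x}_i^{(k)},$$

i.e., each support vector sits at its optimal transformation parameter θ_i* on the trajectory and need not coincide with any pre-generated virtual example.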


Visualisation: USPS “1” vs. “9”

Figure: USPS "1" vs. "9" examples in the (φ1(x), φ2(x)) feature plane with rotation trajectories, τ = 20° (inset: close-up).


Results: Experimental Setup

All 45 USPS classification tasks (1-v-1).

20 training images; 250 test images.

Rotation is applied to all training images with τ = 10°.

All results are averaged over 50 random training sets.

Compared to SVM and virtual SVM.


Results: SDPM vs. SVM

Figure: scatter of test errors over the 45 tasks, SDPM error (y-axis) vs. SVM error (x-axis).


Results: SDPM vs. Virtual SVM

Figure: scatter of test errors over the 45 tasks, SDPM error (y-axis) vs. virtual SVM error (x-axis).


Results: Curse of Dimensionality


Figure: results with 1 transformation parameter (left) vs. 2 parameters (right).


Extensions & Future Work

Multiple parameters θ1, θ2, …, θD.

(Efficient) adaptation to kernel space.

Semidefinite Perceptrons (NIPS poster with A. Kharechko and J. Shawe-Taylor).

Sparsification by efficiently finding the example x and transformation θ with maximal information (idea of Neil Lawrence).

Expectation propagation for BPMs (idea of Tom Minka).


Conclusions & Future Work

Learning from infinitely many examples.

Truly virtual support vectors x_i(θ_i*).

Multiple parameters θ1, θ2, …, θD.

(Efficient) adaptation to kernel space.

Semidefinite Perceptrons (NIPS poster with A. Kharechko and J. Shawe-Taylor).