PEGASOS: Primal Estimated sub-GrAdient Solver for SVM


1

PEGASOS: Primal Estimated sub-GrAdient Solver for SVM

Ming TIAN 04-20-2012

2

References

[1] Shalev-Shwartz, S., Singer, Y., & Srebro, N. (2007). Pegasos: Primal estimated sub-gradient solver for SVM. ICML, 807-814. Extended version: Mathematical Programming, Series B, 127(1):3-30, 2011.

[2] Wang, Z., Crammer, K., & Vucetic, S. (2010). Multi-class Pegasos on a budget. ICML.

[3] Crammer, K., & Singer, Y. (2001). On the algorithmic implementation of multiclass kernel-based vector machines. JMLR, 2, 265-292.

[4] Crammer, K., Kandola, J., & Singer, Y. (2004). Online classification on a budget. NIPS, 16, 225-232.

3

Outline

- Review of SVM optimization
- The Pegasos algorithm
- Multi-Class Pegasos on a Budget
- Further work

4

Outline

- Review of SVM optimization
- The Pegasos algorithm
- Multi-Class Pegasos on a Budget
- Further work

5

Review of SVM optimization

The SVM primal problem minimizes the sum of a regularization term and the empirical loss:

    min_w  (λ/2)‖w‖² + (1/m) Σ_{(x,y)∈S} ℓ(w; (x, y)),  where  ℓ(w; (x, y)) = max{0, 1 − y⟨w, x⟩}

The first summand is the regularization term; the second is the empirical (hinge) loss.
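For concreteness, a minimal NumPy sketch of evaluating this objective; the function name and array shapes are my own conventions, not from the slides:

```python
import numpy as np

def svm_objective(w, X, y, lam):
    """Regularized SVM objective: (lam/2)*||w||^2 + mean hinge loss.

    X: (m, n) feature matrix; y: (m,) labels in {-1, +1}.
    """
    margins = y * (X @ w)                   # y_i * <w, x_i> for every example
    hinge = np.maximum(0.0, 1.0 - margins)  # empirical hinge loss per example
    reg = 0.5 * lam * np.dot(w, w)          # regularization term
    return reg + hinge.mean()
```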

6

Review of SVM optimization

7

Review of SVM optimization

Dual-based methods:
- Interior Point methods. Memory: m², Time: m³ · log(log(1/ε))
- Decomposition methods. Memory: m, Time: super-linear in m

Online learning & stochastic gradient:
- Memory: O(1), Time: 1/ε² (linear kernel)
- Memory: 1/ε², Time: 1/ε⁴ (non-linear kernel)
- Typically, online learning algorithms do not converge to the optimal solution of the SVM
- Better rates for finite-dimensional instances (Murata, Bottou)

8

Outline

Review of SVM optimization The Pegasos algorithm Multi-Class Pegasos on a Budget Further works

9

PEGASOS

Initialize w₁ with ‖w₁‖ ≤ 1/√λ. For t = 1, 2, …, T:

- Choose A_t ⊆ S and set η_t = 1/(λt)
- Subgradient step: w_{t+½} = w_t − η_t ( λ w_t − (1/|A_t|) Σ_{(x,y)∈A_t : y⟨w_t,x⟩<1} y·x )
- Projection: w_{t+1} = min{ 1, (1/√λ) / ‖w_{t+½}‖ } · w_{t+½}

Special cases: A_t = S recovers the deterministic subgradient method; |A_t| = 1 gives stochastic gradient descent.
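A minimal NumPy sketch of the stochastic case (|A_t| = 1), following the shrink, update, and project steps above; the function name and seeding are illustrative assumptions:

```python
import numpy as np

def pegasos(X, y, lam, T, seed=0):
    """Pegasos with |A_t| = 1: one random example per iteration."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    w = np.zeros(n)
    for t in range(1, T + 1):
        i = rng.integers(m)            # draw A_t = {(x_i, y_i)} uniformly
        eta = 1.0 / (lam * t)          # step size eta_t = 1/(lambda*t)
        if y[i] * (X[i] @ w) < 1.0:    # margin violated: hinge term contributes
            w = (1.0 - eta * lam) * w + eta * y[i] * X[i]
        else:                          # only the regularization subgradient
            w = (1.0 - eta * lam) * w
        norm = np.linalg.norm(w)
        if norm > 0:                   # project onto the ball of radius 1/sqrt(lam)
            w *= min(1.0, 1.0 / (np.sqrt(lam) * norm))
    return w
```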

10

Run-Time of Pegasos

Choosing |A_t| = 1 and a linear kernel over ℝⁿ, the run-time required for Pegasos to find an ε-accurate solution with probability 1 − δ is Õ(n/(λε)).

The run-time does not depend on the number of examples; it depends only on the "difficulty" of the problem (λ and ε).

11

Formal Properties

Definition: w is ε-accurate if f(w) ≤ min_{w'} f(w') + ε.

Theorem 1: Pegasos finds an ε-accurate solution with probability ≥ 1 − δ after at most Õ(1/(δλε)) iterations.

Theorem 2: Pegasos finds log(1/δ) solutions such that, with probability ≥ 1 − δ, at least one of them is ε-accurate after Õ(log(1/δ)/(λε)) iterations.

12

Proof Sketch

A second look at the update step: each iteration is a subgradient descent step w_{t+1} = w_t − η_t ∇_t, where ∇_t is a subgradient of the instantaneous objective f(w; A_t) = (λ/2)‖w‖² + (1/|A_t|) Σ_{(x,y)∈A_t} ℓ(w; (x, y)) evaluated at w_t.

13

Proof Sketch

Denote f_t(w) = f(w; A_t). Each f_t is λ-strongly convex, so the logarithmic-regret bound for online convex programming (OCP) gives

    (1/T) Σ_t f_t(w_t) − min_w (1/T) Σ_t f_t(w) ≤ O(log(T)/(λT))

Take expectation over the random choice of the A_t: for r drawn uniformly from {1, …, T},

    E[f(w_r) − f(w*)] ≤ O(log(T)/(λT))

Since f(w_r) − f(w*) ≥ 0, Markov's inequality gives that with probability ≥ 1 − δ,

    f(w_r) − f(w*) ≤ O(log(T)/(δλT))

Amplify the confidence by running log(1/δ) independent copies and keeping the best solution.

14

Proof Sketch

15

Proof Sketch

A function f is called λ-strongly convex if f(w) − (λ/2)‖w‖² is a convex function.
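As a check (my restatement, not on the slide), the instantaneous Pegasos objective meets this definition directly, which is what licenses the logarithmic-regret bound used above:

```latex
% Subtracting (lambda/2)||w||^2 from f(w; A_t) leaves the average hinge
% loss, a maximum of affine functions of w, hence a convex function:
f(w; A_t) - \frac{\lambda}{2}\lVert w \rVert^2
    = \frac{1}{\lvert A_t \rvert} \sum_{(x,y)\in A_t}
      \max\{0,\; 1 - y\,\langle w, x\rangle\}
```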

16

Proof Sketch

17

Proof Sketch

18

Experiments

Three datasets (provided by Joachims):
- Reuters CCAT (800K examples, 47K features)
- Physics ArXiv (62K examples, 100K features)
- Covertype (581K examples, 54 features)

Four competing algorithms:
- SVM-Light (Joachims)
- SVM-Perf (Joachims '06)
- Norma (Kivinen, Smola, Williamson '02)
- Zhang '04 (stochastic gradient descent)

19

Training Time (in seconds)

Dataset         Pegasos   SVM-Perf   SVM-Light
Reuters               2         77      20,075
Covertype             6         85      25,514
Astro-Physics         2          5          80

20

Compare to Norma (on Physics)

[Figure: objective value and test error as functions of run-time.]

21

Compare to Zhang (on Physics)

[Figure: objective value as a function of run-time.]

But tuning the parameter is more expensive than learning…

22

Effect of k = |A_t| when T is fixed

[Figure: objective value as a function of k.]

23

Effect of k = |A_t| when kT is fixed

[Figure: objective value as a function of k.]

24

The bias term

Option 1 (popular approach): increase the dimension of x by appending a constant feature.
- Cons: we "pay" for b in the regularization term.

Option 2: calculate subgradients with respect to w and with respect to b.
- Cons: the convergence rate degrades to 1/ε².

Option 3: define the instantaneous loss on A_t with the bias optimized out (solve for the best b on each mini-batch).
- Cons: |A_t| needs to be large.

Option 4: search for b in an outer loop.
- Cons: each evaluation of the objective costs 1/ε².

A minimal sketch of option 1 appears below.
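In this sketch, `augment_with_bias` is a hypothetical helper and `pegasos` refers to the earlier sketch:

```python
import numpy as np

def augment_with_bias(X):
    """Option 1: append a constant feature, so the bias b is learned as the
    last coordinate of w. Caveat from the slide: the regularizer now
    penalizes b as well."""
    return np.hstack([X, np.ones((X.shape[0], 1))])

# Usage with the earlier sketch: w_aug = pegasos(augment_with_bias(X), y, lam, T)
# then b = w_aug[-1] and w = w_aug[:-1].
```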

25

Outline

- Review of SVM optimization
- The Pegasos algorithm
- Multi-Class Pegasos on a Budget
- Further work

26

multi-class SVM (Crammer & Singer, 2001)

multi-class model: keep one weight vector w⁽ⁱ⁾ per class i ∈ {1, …, c}, stacked as W = (w⁽¹⁾, …, w⁽ᶜ⁾); the prediction is

    ŷ = argmax_{i ∈ {1,…,c}} ⟨w⁽ⁱ⁾, x⟩

27

multi-class SVM (Crammer & Singer, 2001)

multi-class SVM objective function:

    P(W; S) = (λ/2)‖W‖² + (1/m) Σ_{(x,y)∈S} ℓ(W; (x, y))

where ‖W‖² = Σᵢ ‖w⁽ⁱ⁾‖², and the multi-class hinge-loss function is defined as:

    ℓ(W; (x, y)) = max{0, 1 + ⟨w⁽ʳ⁾, x⟩ − ⟨w⁽ʸ⁾, x⟩},  where r = argmax_{i≠y} ⟨w⁽ⁱ⁾, x⟩
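A direct NumPy transcription of this loss for a single example; the function name and the (c, n) weight layout are conventions of mine:

```python
import numpy as np

def multiclass_hinge(W, x, y):
    """Crammer-Singer multi-class hinge loss for a single example.

    W: (c, n) weight matrix, one row w^(i) per class; x: (n,); y: true class.
    """
    scores = W @ x                       # <w^(i), x> for every class i
    rival = np.delete(scores, y).max()   # strongest competitor i != y
    return max(0.0, 1.0 + rival - scores[y])
```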

28

multi-class Pegasos

Use the instantaneous objective function:

    P(W; A_t) = (λ/2)‖W‖² + (1/|A_t|) Σ_{(x,y)∈A_t} ℓ(W; (x, y))

Multi-class Pegasos works by iteratively executing a two-step update (a code sketch follows Step 2).

Step 1 (gradient step):

    W_{t+½} = W_t − η_t ∇_t,  where η_t = 1/(λt) and ∇_t ∈ ∂P(W_t; A_t)

For |A_t| = 1 this shrinks every weight vector by (1 − η_t λ) and, if the loss at W_t is positive, additionally adds η_t x to w⁽ʸ⁾ and subtracts η_t x from the loss-maximizing rival vector.

29

multi-class Pegasos

Step 2 (projection): project the weights W_{t+½} onto the closed convex set B = {W : ‖W‖ ≤ 1/√λ}:

    W_{t+1} = min{ 1, (1/√λ) / ‖W_{t+½}‖ } · W_{t+½}

If the loss is equal to zero, Step 1 reduces to pure shrinking, W_{t+½} = (1 − η_t λ) W_t; otherwise the gradient step also moves w⁽ʸ⁾ toward x and the rival vector away from it.
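Putting both steps together for |A_t| = 1, a minimal NumPy sketch; the names and the row-per-class layout are mine, matching the earlier loss sketch:

```python
import numpy as np

def multiclass_pegasos_step(W, x, y, lam, t):
    """One two-step multi-class Pegasos update on a single example (x, y)."""
    eta = 1.0 / (lam * t)
    scores = W @ x
    rival_scores = scores.copy()
    rival_scores[y] = -np.inf
    r = int(np.argmax(rival_scores))     # loss-maximizing rival class
    W = (1.0 - eta * lam) * W            # Step 1a: shrink (regularization part)
    if 1.0 + scores[r] - scores[y] > 0:  # Step 1b: loss is positive at W_t
        W[y] += eta * x                  # move w^(y) toward x
        W[r] -= eta * x                  # move the rival away from x
    norm = np.linalg.norm(W)             # Step 2: project onto ||W|| <= 1/sqrt(lam)
    if norm > 0:
        W *= min(1.0, 1.0 / (np.sqrt(lam) * norm))
    return W
```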

30

Budgeted Multi-Class Pegasos

31

Budget Maintenance Strategies

Budget maintenance through removal:
- the optimal removal always selects the oldest SV.

Budget maintenance through projection:
- project an SV onto all the remaining SVs; this results in smaller weight degradation than removal.

Budget maintenance through merging:
- merge two SVs into a newly created one; the total cost of finding the optimal merging for the n-th and m-th SV is O(1).

A sketch of the removal strategy appears below.
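A minimal sketch of the removal strategy only; the function and the oldest-first list ordering are illustrative assumptions:

```python
def maintain_budget(support_vectors, budget):
    """Removal strategy: drop the oldest SV whenever the budget is exceeded
    (the rule the slide calls optimal, since under the Pegasos shrinking the
    oldest SV carries the smallest coefficient)."""
    while len(support_vectors) > budget:
        support_vectors.pop(0)  # assumes index 0 holds the oldest SV
    return support_vectors
```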

32

Experiments

33

Outline

- Review of SVM optimization
- The Pegasos algorithm
- Multi-Class Pegasos on a Budget
- Further work

34

Further work

Distribution-aware Pegasos?

Online structurally regularized SVM?

35

Thanks! Q&A

36