PEGASOS: Primal Estimated sub-GrAdient SOlver for SVM
Ming Tian, 04-20-2012
References
[1] Shalev-Shwartz, S., Singer, Y., & Srebro, N. (2007). Pegasos: primal estimated sub-gradient solver for SVM. ICML, 807-814. Extended version: Mathematical Programming, Series B, 127(1):3-30, 2011.
[2] Wang, Z., Crammer, K., & Vucetic, S. (2010). Multi-class Pegasos on a budget. ICML.
[3] Crammer, K., & Singer, Y. (2001). On the algorithmic implementation of multiclass kernel-based vector machines. JMLR, 2, 265-292.
[4] Crammer, K., Kandola, J., & Singer, Y. (2004). Online classification on a budget. NIPS, 16, 225-232.
Outline
- Review of SVM optimization
- The Pegasos algorithm
- Multi-Class Pegasos on a Budget
- Further works
Review of SVM optimization

The SVM primal problem minimizes a regularization term plus the empirical loss:

$\min_w \; \frac{\lambda}{2}\|w\|^2 + \frac{1}{m}\sum_{(x,y)\in S} \ell(w;(x,y))$

where $\ell(w;(x,y)) = \max\{0,\, 1 - y\langle w, x\rangle\}$ is the hinge loss.
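The regularized hinge-loss objective can be evaluated directly; a minimal NumPy sketch (`svm_objective` is an illustrative name, not from the slides):

```python
import numpy as np

def svm_objective(w, X, y, lam):
    """Regularized SVM objective: (lam/2)||w||^2 + mean hinge loss."""
    margins = y * (X @ w)                    # y_i <w, x_i> for each example
    hinge = np.maximum(0.0, 1.0 - margins)   # hinge loss per example
    return 0.5 * lam * (w @ w) + hinge.mean()
```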
Review of SVM optimization: approaches

- Dual-based methods:
  - Interior Point methods: memory $m^2$, time $m^3 \log(\log(1/\epsilon))$
  - Decomposition methods: memory $m$, time super-linear in $m$
- Online learning & stochastic gradient: memory $O(1)$, time $1/\epsilon^2$ (linear kernel); memory $1/\epsilon^2$, time $1/\epsilon^4$ (non-linear kernel). Typically, online learning algorithms do not converge to the optimal solution of the SVM.
- Better rates for finite-dimensional instances (Murata, Bottou)
PEGASOS

Initialize $w_1$ with $\|w_1\| \le 1/\sqrt{\lambda}$. On iteration $t$, choose $A_t \subseteq S$ and set $\eta_t = 1/(\lambda t)$:

- Subgradient step: $w_{t+\frac{1}{2}} = w_t - \eta_t \nabla_t$, where $\nabla_t = \lambda w_t - \frac{1}{|A_t|}\sum_{(x,y)\in A_t^+} y\,x$ and $A_t^+ = \{(x,y)\in A_t : y\langle w_t, x\rangle < 1\}$.
- Projection: $w_{t+1} = \min\{1,\; \frac{1/\sqrt{\lambda}}{\|w_{t+\frac{1}{2}}\|}\}\, w_{t+\frac{1}{2}}$.

With $A_t = S$ this is the subgradient method; with $|A_t| = 1$ it is stochastic gradient descent.
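With $|A_t| = 1$ and a linear kernel, one pass of Pegasos fits in a few lines; a minimal NumPy sketch (function and parameter names are illustrative):

```python
import numpy as np

def pegasos(X, y, lam=0.1, T=1000, seed=0):
    """Linear Pegasos with |A_t| = 1: hinge subgradient step followed by
    projection onto the ball of radius 1/sqrt(lambda)."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    w = np.zeros(n)
    for t in range(1, T + 1):
        i = rng.integers(m)                  # draw one example uniformly
        eta = 1.0 / (lam * t)                # step size eta_t = 1/(lambda t)
        w = (1 - eta * lam) * w              # regularization shrinks w
        if y[i] * (X[i] @ w) < 1:            # margin violated: hinge term
            w = w + eta * y[i] * X[i]
        norm = np.linalg.norm(w)             # project onto ||w|| <= 1/sqrt(lam)
        if norm > 0:
            w *= min(1.0, 1.0 / (np.sqrt(lam) * norm))
    return w
```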
Run-Time of Pegasos

Choosing $|A_t| = 1$ and a linear kernel over $\mathbb{R}^n$, the run-time required for Pegasos to find an $\epsilon$-accurate solution with probability $1-\delta$ is $\tilde{O}(n/(\delta\lambda\epsilon))$.
- Run-time does not depend on the number of examples.
- Depends on the "difficulty" of the problem ($\lambda$ and $\epsilon$).
Formal Properties

Definition: $w$ is $\epsilon$-accurate if $f(w) \le \min_{w'} f(w') + \epsilon$.

Theorem 1: Pegasos finds an $\epsilon$-accurate solution w.p. $\ge 1-\delta$ after at most $\tilde{O}(1/(\delta\lambda\epsilon))$ iterations.

Theorem 2: Pegasos finds $\log(1/\delta)$ solutions s.t. w.p. $\ge 1-\delta$, at least one of them is $\epsilon$-accurate after $\tilde{O}(\log(1/\delta)/(\lambda\epsilon))$ iterations.
Proof Sketch

A second look at the update step: ignoring the projection, $w_{t+1} = w_t - \eta_t\nabla_t$ is an online (sub)gradient step on the instantaneous objective $f(w; A_t) = \frac{\lambda}{2}\|w\|^2 + \frac{1}{|A_t|}\sum_{(x,y)\in A_t}\ell(w;(x,y))$.

Denote by $w_r$ an iterate chosen uniformly at random from $w_1,\dots,w_T$.

Logarithmic regret for OCP (online convex programming): $\frac{1}{T}\sum_t f(w_t; A_t) - \frac{1}{T}\sum_t f(w^\star; A_t) \le O\!\left(\frac{\log T}{\lambda T}\right)$.

Take expectation: $\mathbb{E}[f(w_r)] - f(w^\star) \le O\!\left(\frac{\log T}{\lambda T}\right)$.

Since $f(w_r) - f(w^\star) \ge 0$, Markov gives that w.p. $1-\delta$: $f(w_r) - f(w^\star) \le O\!\left(\frac{\log T}{\delta\lambda T}\right)$.

Amplify the confidence by repeating $\log(1/\delta)$ times and keeping the best solution.
Proof Sketch

A function $f$ is called $\lambda$-strongly convex if $f(w) - \frac{\lambda}{2}\|w\|^2$ is a convex function. The SVM objective is $\lambda$-strongly convex, which is what yields the logarithmic regret for OCP.
Experiments

3 datasets (provided by Joachims):
- Reuters CCAT (800k examples, 47k features)
- Physics ArXiv (62k examples, 100k features)
- Covertype (581k examples, 54 features)

4 competing algorithms:
- SVM-Light (Joachims)
- SVM-Perf (Joachims '06)
- Norma (Kivinen, Smola, Williamson '02)
- Zhang '04 (stochastic gradient descent)
Training Time (in seconds)

Dataset       | Pegasos | SVM-Perf | SVM-Light
Reuters       |       2 |       77 |    20,075
Covertype     |       6 |       85 |    25,514
Astro-Physics |       2 |        5 |        80
Compare to Norma (on Physics)

[Plots: objective value and test error vs. run-time]
Compare to Zhang (on Physics)

[Plot: objective value vs. run-time]

But tuning the parameter is more expensive than learning…
Effect of k=|A_t| when T is fixed

[Plot: objective value vs. k]
Effect of k=|A_t| when kT is fixed

[Plot: objective value vs. k]
The bias term b

- Popular approach: increase the dimension of x by appending a constant feature.
  Con: we "pay" for b in the regularization term.
- Calculate subgradients w.r.t. w and w.r.t. b.
  Con: the convergence rate degrades to $1/\epsilon^2$.
- Define the loss on each $A_t$ with the optimal b.
  Con: $|A_t|$ needs to be large.
- Search for b in an outer loop.
  Con: evaluating the objective costs $1/\epsilon^2$.
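The first option — absorbing b into w by appending a constant feature — is essentially one line of NumPy (a minimal sketch; `add_bias_feature` is an illustrative name):

```python
import numpy as np

def add_bias_feature(X):
    """Append a constant 1-feature so the bias b becomes the last
    coordinate of w (and is then also regularized, hence the "pay" con)."""
    return np.hstack([X, np.ones((X.shape[0], 1))])
```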
Multi-class SVM (Crammer & Singer, 2001)

Multi-class model: $W = (w^{(1)},\dots,w^{(k)})$, predicting $\hat{y} = \arg\max_i \langle w^{(i)}, x\rangle$.

Multi-class SVM objective function:

$\min_W \; \frac{\lambda}{2}\|W\|^2 + \frac{1}{m}\sum_{(x,y)\in S}\ell(W;(x,y))$

where $\|W\|^2 = \sum_{i=1}^{k}\|w^{(i)}\|^2$, and the multi-class hinge-loss function is defined as:

$\ell(W;(x,y)) = \max\{0,\; 1 + \max_{i\ne y}\langle w^{(i)},x\rangle - \langle w^{(y)},x\rangle\}$.
Multi-class Pegasos

Use the instantaneous objective function:

$f(W;(x_t,y_t)) = \frac{\lambda}{2}\|W\|^2 + \ell(W;(x_t,y_t))$

Multi-class Pegasos works by iteratively executing the two-step updates, where $\eta_t = 1/(\lambda t)$ and $\hat{y}_t = \arg\max_{i\ne y_t}\langle w_t^{(i)}, x_t\rangle$:

Step 1 (subgradient step):
- If the loss is equal to zero, then: $w_{t+1}^{(i)} = (1-\eta_t\lambda)\,w_t^{(i)}$ for all $i$.
- Else: the same shrinking, plus $w_{t+1}^{(y_t)} \leftarrow w_{t+1}^{(y_t)} + \eta_t x_t$ and $w_{t+1}^{(\hat{y}_t)} \leftarrow w_{t+1}^{(\hat{y}_t)} - \eta_t x_t$.

Step 2 (projection): project the weight $W_{t+1}$ into the closed convex set $B = \{W : \|W\| \le 1/\sqrt{\lambda}\}$:

$W_{t+1} \leftarrow \min\!\left\{1,\; \frac{1/\sqrt{\lambda}}{\|W_{t+1}\|}\right\} W_{t+1}$
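One iteration of the two-step update can be sketched in NumPy (class weight vectors stored as rows of `W`; names are illustrative):

```python
import numpy as np

def multiclass_pegasos_step(W, x, y, t, lam):
    """One multi-class Pegasos iteration: shrink all class weights,
    correct the violating pair (y, y_hat) if the hinge loss is positive,
    then project W onto the ball of radius 1/sqrt(lambda)."""
    eta = 1.0 / (lam * t)
    scores = W @ x
    scores_wo_y = scores.copy()
    scores_wo_y[y] = -np.inf                 # most-confused *wrong* class
    y_hat = int(np.argmax(scores_wo_y))
    loss = max(0.0, 1.0 + scores[y_hat] - scores[y])
    W = (1 - eta * lam) * W                  # regularization shrinks all rows
    if loss > 0:                             # margin violated: move the pair
        W[y] += eta * x
        W[y_hat] -= eta * x
    norm = np.linalg.norm(W)                 # project onto ||W|| <= 1/sqrt(lam)
    if norm > 0:
        W *= min(1.0, 1.0 / (np.sqrt(lam) * norm))
    return W
```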
Budgeted Multi-Class Pegasos

Kernelized multi-class Pegasos represents each $w^{(i)}$ through a set of support vectors (SVs). The budgeted variant [2] caps the number of SVs at a fixed budget B: whenever an update would exceed the budget, a budget maintenance step reduces the SV set by one.
Budget Maintenance Strategies

- Budget maintenance through removal: the optimal removal always selects the oldest SV.
- Budget maintenance through projection: projecting an SV onto all the remaining SVs results in smaller weight degradation.
- Budget maintenance through merging: merging two SVs into a newly created one. The total cost of finding the optimal merging for the n-th and m-th SVs is O(1).
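The removal strategy — drop the oldest SV once the budget is exceeded — can be sketched with a deque (a minimal illustration; the function name and the (vector, coefficient) representation are assumptions, not from the slides):

```python
from collections import deque

import numpy as np

def budgeted_update(svs, x, coeff, budget):
    """Budget maintenance through removal: append the new SV and, if the
    budget is exceeded, drop the oldest one (O(1) with a deque)."""
    svs.append((x, coeff))
    if len(svs) > budget:
        svs.popleft()
    return svs
```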
Experiments
Further works

- Distribution-aware Pegasos?
- Online structural regularized SVM?
Thanks! Q&A