Support Vector Machine
Le Do Hoang Nam – CNTN08
Linear Programming
General form, with $x \in \mathbb{R}^n$:
A linear objective and linear constraints.
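In standard notation, a linear program has the form:

$$\min_{x \in \mathbb{R}^n} \ c^T x \quad \text{s.t.} \quad Ax \le b$$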
Linear Programming
An example: the Diet Problem
How do we find the cheapest meal that meets all nutrition requirements?
Linear Programming
Let $x_1$, $x_2$, and $x_3$ be the amounts, in kilograms, of carrot, cabbage, and cucumber in the dish.
Mathematically,
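Writing $p_j$ for the price per kilogram of vegetable $j$, $a_{ij}$ for the amount of nutrient $i$ per kilogram of vegetable $j$, and $b_i$ for the minimum requirement of nutrient $i$ (generic symbols; no specific numbers are assumed here), the problem reads:

$$\min_{x_1, x_2, x_3 \ge 0} \ \sum_{j=1}^{3} p_j x_j \quad \text{s.t.} \quad \sum_{j=1}^{3} a_{ij} x_j \ge b_i \ \text{ for every nutrient } i$$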
Linear Programming
In canonical form: $\min_x c^T x$ subject to $Ax \le b$, $x \ge 0$.
How to solve? The simplex method, Newton's method, or gradient descent.
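A minimal sketch of solving such a diet LP with SciPy's `linprog`; the prices and nutrient values below are invented for illustration, not the slide's data:

```python
import numpy as np
from scipy.optimize import linprog

p = np.array([0.75, 0.50, 0.15])      # price per kg: carrot, cabbage, cucumber
A = np.array([[35.0, 0.5, 0.5],       # nutrient content per kg (rows = nutrients)
              [60.0, 300.0, 10.0],
              [30.0, 20.0, 10.0]])
b = np.array([0.5, 15.0, 4.0])        # minimum required amount of each nutrient

# linprog minimizes p @ x subject to A_ub @ x <= b_ub, so the "at least b"
# constraints A @ x >= b are rewritten as -A @ x <= -b.
res = linprog(p, A_ub=-A, b_ub=-b, bounds=[(0, None)] * 3)
print("amounts (kg):", res.x, "cost:", res.fun)
```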
LP and Classification
Given a set of $N$ samples $(m_i, l_i)$:
$m_i$ is the feature vector.
$l_i \in \{-1, +1\}$ is the label.
If a sample is correctly classified (with margin) by the hyperplane $w^T x + c = 0$, then:
$l_i (w^T m_i + c) \ge 1$,
which is linear in $(w, c)$.
LP and Classification
$(w, c)$ is a valid classifier if it satisfies:
$l_i (w^T m_i + c) \ge 1$, $i = 1, \dots, N$,
which are linear constraints.
LP form:
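With no cost to optimize, this is a pure feasibility LP (any constant objective works, since only feasibility matters):

$$\text{find } (w, c) \quad \text{s.t.} \quad l_i (w^T m_i + c) \ge 1, \quad i = 1, \dots, N$$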
LP and Classification
Without an objective function, any feasible separating hyperplane is a solution:
[Figure: two panels showing Class 1 and Class 2 separated by different, equally feasible hyperplanes]
LP and Classification
If the data is not linearly separable:
Minimize the number of errors.
[Figure: Class 1 and Class 2 overlapping, with no perfectly separating hyperplane]
LP and Classification
Our objective becomes:
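That is, minimize the count of violated margin constraints (a reconstruction consistent with the hinge-loss discussion that follows):

$$\min_{w, c} \ \#\{\, i : l_i (w^T m_i + c) < 1 \,\}$$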
But this counting (cardinality) function is non-linear, so the problem is not an LP.
LP and Classification
Cardinality function:
[Plot: step loss $f(x)$, equal to $1$ for $x < 1$ and $0$ for $x \ge 1$]
Solution: approximate it with the hinge-loss function.
LP and Classification
Hinge-loss function: $f(x) = \max(0,\ 1 - x)$
[Plot: the hinge loss, zero for $x \ge 1$ and increasing linearly as $x$ drops below $1$]
Or:
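Equivalently, as the smallest slack $\varepsilon$ satisfying two linear inequalities:

$$f(x) = \min \{\, \varepsilon : \varepsilon \ge 1 - x,\ \varepsilon \ge 0 \,\}$$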
LP and Classification
Classification problem now becomes:
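Summing the hinge losses, with one slack variable $\varepsilon_i$ per sample:

$$\min_{w, c, \varepsilon} \ \sum_{i=1}^{N} \varepsilon_i \quad \text{s.t.} \quad l_i (w^T m_i + c) \ge 1 - \varepsilon_i, \quad \varepsilon_i \ge 0, \quad i = 1, \dots, N$$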
which can be solved as an LP.
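A sketch of this LP in code, again via SciPy's `linprog`; the four 2-D samples are invented for illustration:

```python
import numpy as np
from scipy.optimize import linprog

M = np.array([[2.0, 2.0], [1.5, 3.0], [-1.0, -1.5], [-2.0, -0.5]])  # samples m_i
l = np.array([1, 1, -1, -1])                                        # labels l_i
N, D = M.shape

# Variable layout: [w (D entries), c, eps (N entries)]; minimize sum of eps.
cost = np.concatenate([np.zeros(D + 1), np.ones(N)])

# l_i (w^T m_i + c) >= 1 - eps_i  <=>  -l_i w^T m_i - l_i c - eps_i <= -1
A_ub = np.hstack([-l[:, None] * M, -l[:, None], -np.eye(N)])
b_ub = -np.ones(N)

bounds = [(None, None)] * (D + 1) + [(0, None)] * N  # w, c free; eps >= 0
res = linprog(cost, A_ub=A_ub, b_ub=b_ub, bounds=bounds)
w, c, eps = res.x[:D], res.x[D], res.x[D + 1:]
print("w:", w, "c:", c, "total slack:", eps.sum())
```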
LP and Classification
Geometric view:
[Figure: the margin region between the hyperplanes $w^T x + c = 1$ and $w^T x + c = -1$, with the decision boundary $w^T x + c = 0$ in the middle; samples $m_i$ and $m_j$ violate the margin with slacks $\varepsilon_i$ and $\varepsilon_j$]
LP and Classification
Another problem: some samples are uncertain, so a boundary passing close to them is not robust.
[Figure: Class 1 and Class 2 with several samples lying close to the separating hyperplane]
LP and Classification
Solution: maximize the margin $d$.
[Figure: Class 1 and Class 2 separated by a margin of width $d$]
LP and Classification
All samples must lie outside the margin:
every sample's distance to the boundary is at least $d/2$. That means:
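Since the distance from a correctly classified $m_i$ to the hyperplane is $l_i (w^T m_i + c) / \|w\|$:

$$\frac{l_i (w^T m_i + c)}{\|w\|} \ge \frac{d}{2}, \quad i = 1, \dots, N$$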
LP and Classification
Because the hyperplane is homogeneous ($(tw, tc)$ defines the same boundary for any $t > 0$), we can choose the scale of $w$ such that:
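One standard choice (the usual SVM normalization) scales $w$ so the margin constraint becomes:

$$l_i (w^T m_i + c) \ge 1, \quad \text{with equality for the closest samples, so that } \|w\| = \frac{2}{d}$$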
The objective function:
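Under this normalization $d = 2 / \|w\|$, so maximizing the margin is equivalent to:

$$\max_{w, c} \ \frac{2}{\|w\|} \quad \Longleftrightarrow \quad \min_{w, c} \ \frac{1}{2}\|w\|^2$$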
LP and Classification
The problem now becomes:
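Putting the objective and the margin constraints together gives the standard hard-margin problem (note that with the Euclidean norm this is a quadratic program rather than an LP):

$$\min_{w, c} \ \frac{1}{2}\|w\|^2 \quad \text{s.t.} \quad l_i (w^T m_i + c) \ge 1, \quad i = 1, \dots, N$$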
Support Vector Machine
Together with the error minimization, we have the SVM:
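In its common form (the exact weighting on the original slide may differ, but the structure is standard):

$$\min_{w, c, \varepsilon} \ \frac{1}{2}\|w\|^2 + \lambda \sum_{i=1}^{N} \varepsilon_i \quad \text{s.t.} \quad l_i (w^T m_i + c) \ge 1 - \varepsilon_i, \quad \varepsilon_i \ge 0$$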
$\lambda$ sets the trade-off between training error and robustness (margin width).
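A brief sketch of the same trade-off in practice with scikit-learn, whose parameter `C` plays the role of $\lambda$; the data below is invented:

```python
import numpy as np
from sklearn.svm import SVC

X = np.array([[2.0, 2.0], [1.5, 3.0], [-1.0, -1.5], [-2.0, -0.5]])
y = np.array([1, 1, -1, -1])

# C weights the error term: large C tolerates few violations (narrower margin),
# small C allows more slack in exchange for a wider margin.
clf = SVC(kernel="linear", C=1.0).fit(X, y)
print("w:", clf.coef_, "c:", clf.intercept_)
```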
Kernel Method