CS 59000 Statistical Machine Learning, Lecture 18
Yuan (Alan) Qi, Purdue CS
Oct. 30, 2008
Outline
• Review of Support Vector Machines for Linearly Separable Case
• Support Vector Machines for Overlapping Class Distributions
• Support Vector Machines for Regression
Support Vector Machines
Support vector machines (SVMs) are motivated by statistical learning theory.
Maximum margin classifiers
Margin: the smallest distance between the decision boundary and any of the training samples.
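The slide's equations did not survive conversion; in the usual notation, with linear model $y(\mathbf{x}) = \mathbf{w}^\top\mathbf{x} + b$ and targets $t_n \in \{-1, +1\}$, the distance of a correctly classified point from the boundary and the maximum-margin objective read:

```latex
\frac{t_n\, y(\mathbf{x}_n)}{\lVert\mathbf{w}\rVert}
  = \frac{t_n(\mathbf{w}^\top \mathbf{x}_n + b)}{\lVert\mathbf{w}\rVert},
\qquad
\arg\max_{\mathbf{w},\,b}\;
  \frac{1}{\lVert\mathbf{w}\rVert}
  \min_n \bigl[\, t_n(\mathbf{w}^\top \mathbf{x}_n + b) \,\bigr]
```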
Maximizing Margin
Since rescaling w and b together does not change this ratio, we can set t_n(w^T x_n + b) = 1 for the point closest to the decision boundary; every data point then satisfies t_n(w^T x_n + b) ≥ 1.
In the case of data points for which the equality holds, the constraints are said to be active, whereas for the remainder they are said to be inactive.
Optimization Problem
Quadratic programming:
Subject to
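The program itself is missing from the converted slide; the canonical hard-margin primal is:

```latex
\min_{\mathbf{w},\,b}\ \frac{1}{2}\lVert\mathbf{w}\rVert^2
\quad\text{subject to}\quad
t_n(\mathbf{w}^\top \mathbf{x}_n + b) \ge 1,\qquad n = 1,\dots,N
```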
Lagrange Multiplier
Maximize
Subject to
Gradient of constraint:
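The equations on this slide were lost; the standard setup for maximizing $f(\mathbf{x})$ subject to an equality constraint $g(\mathbf{x}) = 0$ is:

```latex
L(\mathbf{x}, \lambda) = f(\mathbf{x}) + \lambda\, g(\mathbf{x}),
\qquad
\nabla f(\mathbf{x}) + \lambda\, \nabla g(\mathbf{x}) = 0,
\qquad
g(\mathbf{x}) = 0
```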
Geometrical Illustration of Lagrange Multiplier
Lagrange Multiplier with Inequality Constraints
Karush-Kuhn-Tucker (KKT) condition
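For the inequality-constrained case (maximize $f(\mathbf{x})$ subject to $g(\mathbf{x}) \ge 0$), the KKT conditions referenced here take the standard form:

```latex
g(\mathbf{x}) \ge 0,
\qquad
\lambda \ge 0,
\qquad
\lambda\, g(\mathbf{x}) = 0
```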
Lagrange Function for SVM
Quadratic programming, subject to the constraints:
Lagrange function:
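The Lagrange function did not survive extraction; for the hard-margin problem above it is:

```latex
L(\mathbf{w}, b, \mathbf{a}) = \frac{1}{2}\lVert\mathbf{w}\rVert^2
  - \sum_{n=1}^{N} a_n \bigl[\, t_n(\mathbf{w}^\top \mathbf{x}_n + b) - 1 \,\bigr],
\qquad a_n \ge 0
```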
Dual Variables
Setting derivatives over L to zero:
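The resulting stationarity conditions (missing from the slide) are:

```latex
\frac{\partial L}{\partial \mathbf{w}} = 0
  \;\Rightarrow\; \mathbf{w} = \sum_{n=1}^{N} a_n t_n \mathbf{x}_n,
\qquad
\frac{\partial L}{\partial b} = 0
  \;\Rightarrow\; \sum_{n=1}^{N} a_n t_n = 0
```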
Dual Problem
Prediction
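Substituting these conditions back into the Lagrange function gives the dual problem and the kernelized prediction rule, which in standard form are:

```latex
\max_{\mathbf{a}}\;
\tilde{L}(\mathbf{a}) = \sum_{n=1}^{N} a_n
  - \frac{1}{2}\sum_{n=1}^{N}\sum_{m=1}^{N}
    a_n a_m t_n t_m\, k(\mathbf{x}_n, \mathbf{x}_m),
\qquad
a_n \ge 0,\quad \sum_{n=1}^{N} a_n t_n = 0
```

```latex
y(\mathbf{x}) = \sum_{n=1}^{N} a_n t_n\, k(\mathbf{x}, \mathbf{x}_n) + b
```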
KKT Condition, Support Vectors, and Bias
By the KKT conditions, each data point has either a_n = 0 or t_n y(x_n) = 1; the points in the latter case are known as support vectors. We can then solve for the bias term as follows:
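The bias formula on the slide was lost; averaging over the set $\mathcal{S}$ of support vectors, the standard expression is:

```latex
b = \frac{1}{N_{\mathcal{S}}} \sum_{n \in \mathcal{S}}
  \Bigl( t_n - \sum_{m \in \mathcal{S}} a_m t_m\, k(\mathbf{x}_n, \mathbf{x}_m) \Bigr)
```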
Computational Complexity
Quadratic programming:
When the input dimensionality is smaller than the number of data points, solving the dual problem is more costly than solving the primal.
However, the dual representation allows the use of kernels.
Example: SVM Classification
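The example slide itself is image-only; as an illustration, a minimal hard-margin SVM can be trained by solving the dual QP above directly with SciPy's SLSQP solver. The toy data and variable names below are my own, not the lecture's:

```python
import numpy as np
from scipy.optimize import minimize

# Toy linearly separable data (hypothetical, not the lecture's example)
X = np.array([[1., 1.], [2., 2.], [-1., -1.], [-2., -2.]])
t = np.array([1., 1., -1., -1.])

K = X @ X.T                  # linear-kernel Gram matrix
Q = np.outer(t, t) * K       # Q_nm = t_n * t_m * k(x_n, x_m)

# Dual problem: maximize sum(a) - 0.5 a^T Q a
# subject to a_n >= 0 and sum_n a_n t_n = 0 (we minimize the negative)
res = minimize(
    fun=lambda a: 0.5 * a @ Q @ a - a.sum(),
    x0=np.zeros(len(t)),
    jac=lambda a: Q @ a - np.ones(len(t)),
    bounds=[(0.0, None)] * len(t),
    constraints=[{"type": "eq", "fun": lambda a: a @ t}],
    method="SLSQP",
)
a = res.x

w = (a * t) @ X                   # w = sum_n a_n t_n x_n
sv = a > 1e-6                     # support vectors: nonzero multipliers
b = np.mean(t[sv] - X[sv] @ w)    # bias averaged over support vectors

print(w, b)                       # approximately [0.5 0.5] and 0.0
```

On this data the support vectors are the two points nearest the boundary, and (w, b) reproduce the analytic maximum-margin separator.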
Classification for Overlapping Classes
Soft Margin:
New Cost Function
To maximize the margin while softly penalizing points that lie on the wrong side of the margin (not the decision) boundary, we minimize
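The objective was lost in conversion; with slack variables $\xi_n$, the standard soft-margin formulation is:

```latex
C \sum_{n=1}^{N} \xi_n + \frac{1}{2}\lVert\mathbf{w}\rVert^2,
\qquad
t_n\, y(\mathbf{x}_n) \ge 1 - \xi_n,\quad \xi_n \ge 0
```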
Lagrange Function
Where we have Lagrange multipliers:
KKT Condition
Gradients
Dual Lagrangian
Since a_n ≥ 0 and μ_n ≥ 0, and the stationarity condition for ξ_n gives a_n = C − μ_n, we have 0 ≤ a_n ≤ C.
Dual Lagrangian with Constraints
Maximize
Subject to
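The dual itself is missing from the slide; it has the same form as in the separable case, with the multipliers now box-constrained:

```latex
\max_{\mathbf{a}}\;
\tilde{L}(\mathbf{a}) = \sum_{n=1}^{N} a_n
  - \frac{1}{2}\sum_{n=1}^{N}\sum_{m=1}^{N}
    a_n a_m t_n t_m\, k(\mathbf{x}_n, \mathbf{x}_m),
\qquad
0 \le a_n \le C,\quad \sum_{n=1}^{N} a_n t_n = 0
```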
Support Vectors
Two cases of support vectors: points with 0 < a_n < C lie exactly on the margin, while points with a_n = C lie inside the margin and may be misclassified.
Solve Bias Term
Discussion on solving SVMs...
Interpretation from Regularization Framework
Regularized Logistic Regression
For logistic regression, we have the error function ln(1 + exp(−t·y)), which can be compared against the SVM hinge error max(0, 1 − t·y).
Visualization of Hinge Error Function
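To make the comparison concrete, here is a short sketch (helper names are mine) of the hinge error next to the logistic error, both as functions of the margin z = t·y(x):

```python
import numpy as np

def hinge(z):
    # SVM hinge error: E(z) = max(0, 1 - z)
    return np.maximum(0.0, 1.0 - z)

def logistic(z):
    # Logistic-regression error: -ln sigma(z) = ln(1 + exp(-z)),
    # computed stably with logaddexp
    return np.logaddexp(0.0, -z)

z = np.array([-2.0, 0.0, 1.0, 3.0])
print(hinge(z))     # [3. 1. 0. 0.]  -> exactly zero past the margin (sparsity)
print(logistic(z))  # strictly positive everywhere, so no sparsity
```

The hinge error being exactly zero beyond the margin is what makes the SVM solution sparse in the data points.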
SVM for Regression
Using a sum-of-squares error with a quadratic regularizer, we recover ridge regression.
However, the solution for ridge regression is not sparse.
ε-insensitive Error Function
Minimize
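The definition and objective were lost from the slide; the standard ε-insensitive error and regularized cost are:

```latex
E_\epsilon(z) =
\begin{cases}
0, & |z| < \epsilon \\
|z| - \epsilon, & \text{otherwise,}
\end{cases}
\qquad
\min\; C \sum_{n=1}^{N} E_\epsilon\bigl( y(\mathbf{x}_n) - t_n \bigr)
  + \frac{1}{2}\lVert\mathbf{w}\rVert^2
```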
Slack Variables
How many slack variables do we need? Two per data point: ξ_n ≥ 0 for targets lying above the ε-tube and ξ̂_n ≥ 0 for targets lying below it.
Minimize
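With the two slack variables per point, the standard constrained form of the regression objective is:

```latex
\min\; C \sum_{n=1}^{N} (\xi_n + \hat{\xi}_n)
  + \frac{1}{2}\lVert\mathbf{w}\rVert^2
\quad\text{subject to}\quad
t_n \le y(\mathbf{x}_n) + \epsilon + \xi_n,\quad
t_n \ge y(\mathbf{x}_n) - \epsilon - \hat{\xi}_n,\quad
\xi_n, \hat{\xi}_n \ge 0
```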
Visualization of SVM Regression
Support Vectors for Regression
Which points will be support vectors for regression? Those lying on or outside the ε-insensitive tube.
Why? Points strictly inside the tube have zero Lagrange multipliers, so they drop out of the prediction; this is the source of the sparsity.
Sparsity Revisited
Discussion: Error function or regularizer (Lasso)