Page 1: Support Vector Machine

Hung Le

University of Victoria

February 12, 2019

Page 2: Support Vector Machine

Binary Classification

You are given a set of $n$ data points $D = \{(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)\}$, where each $x_i \in \mathbb{R}^d$ and $y_i \in \{-1, 1\}$. Find a classifier $f(\cdot): \mathbb{R}^d \to \{-1, 1\}$ such that:

$$f(x_i) = \begin{cases} 1 & \text{if } y_i = 1 \\ -1 & \text{if } y_i = -1 \end{cases}$$

Figure: Spiral data (from https://www.classes.cs.uchicago.edu/archive/2013/winter/12200-1/assignments/pa4/index.html)


Page 3: Support Vector Machine

Applications

Spam email classification: each data point is $(x_i, y_i)$, where $x_i$ is a vector representation of the $i$-th email and $y_i = 1/{-1}$ indicates that the email is spam/non-spam.

Disease testing: determine whether a person has a certain disease or not.

Weather forecasting: predict whether tomorrow will be rainy or not.


Page 4: Support Vector Machine

Support Vector Machine

You are given a set of $n$ data points $D = \{(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)\}$, where each $x_i \in \mathbb{R}^d$ and $y_i \in \{-1, 1\}$. Find a separating hyperplane $w^T x + b = 0$ such that:

$$w^T x_i + b > 0 \quad \text{if } y_i = 1$$

$$w^T x_i + b < 0 \quad \text{if } y_i = -1$$

We assume that our data is linearly separable, i.e., there exists such a separating hyperplane.
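
The two sign conditions combine into the single inequality $y_i(w^T x_i + b) > 0$, which makes a separating-hyperplane check one line of code. A minimal sketch, assuming NumPy (the function name is my own):

```python
import numpy as np

def separates(w, b, X, y):
    """True iff w^T x_i + b > 0 whenever y_i = 1 and < 0 whenever y_i = -1,
    i.e. y_i * (w^T x_i + b) > 0 for every row x_i of X."""
    return bool(np.all(y * (X @ w + b) > 0))
```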



Page 6: Support Vector Machine

Support Vector Machine - An Example


Page 7: Support Vector Machine

Support Vector Machine - A Toy Example

Given four points $\left(\begin{bmatrix}1\\2\end{bmatrix}, -1\right)$, $\left(\begin{bmatrix}2\\1\end{bmatrix}, -1\right)$, $\left(\begin{bmatrix}3\\4\end{bmatrix}, 1\right)$, $\left(\begin{bmatrix}4\\3\end{bmatrix}, 1\right)$, find a separating line $w_1 \cdot x_1 + w_2 \cdot x_2 + b = 0$ for these points.

Hung Le (University of Victoria) Support Vector Machine February 12, 2019 6 / 15

Page 8: Support Vector Machine

Support Vector Machine - A Toy Example

There are several possible lines:

$$(L_1): x_1 + x_2 - 4 = 0 \qquad (L_2): x_1 + x_2 - 5 = 0$$

$$(L_3): x_1 + x_2 - 6 = 0 \qquad (L_4): x_1 + 2x_2 - 6 = 0 \tag{1}$$

Which line should we choose? In theory, any of them is acceptable: each one separates the training data correctly.


Page 9: Support Vector Machine

SVM separating principle

Choose a line that maximizes the margin of the point set to the separating line.

The margin of a separating line $(L)$ w.r.t. the point set $D$ is the minimum distance of the point set to the line:

$$\gamma(L) = \min_{(x_i, y_i) \in D} d(x_i, L) \tag{2}$$

Recall that the distance from a point $x_0 \in \mathbb{R}^d$ to the line $(L): w^T x + b = 0$ is:

$$d(x_0, L) = \frac{|w^T x_0 + b|}{\|w\|_2} \tag{3}$$

where $\|w\|_2 = \sqrt{\sum_{i=1}^{d} w[i]^2}$.
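
Equations (2) and (3) translate directly into code. A minimal sketch, assuming NumPy (function names are my own):

```python
import numpy as np

def distance(x0, w, b):
    """Distance from point x0 to the hyperplane w^T x + b = 0, Eq. (3)."""
    return abs(w @ x0 + b) / np.linalg.norm(w)

def margin(X, w, b):
    """Margin gamma(L): minimum distance from the rows of X to the hyperplane, Eq. (2)."""
    return min(distance(x, w, b) for x in X)
```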




Page 12: Support Vector Machine

Back to our Toy Example

$$(L_1): x_1 + x_2 - 4 = 0 \qquad (L_2): x_1 + x_2 - 5 = 0$$

$$(L_3): x_1 + x_2 - 6 = 0 \qquad (L_4): x_1 + 2x_2 - 6 = 0 \tag{4}$$

SVM will choose $(L_2)$, with $\gamma(L_2) = \sqrt{2}$ (see the board calculation).
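
Filling in the board calculation with Eq. (3): each line's margin is set by its nearest point (for $(L_2)$, all four points happen to be equidistant):

$$\gamma(L_1) = \frac{|2 + 1 - 4|}{\sqrt{2}} = \frac{1}{\sqrt{2}}, \quad \gamma(L_2) = \frac{|2 + 1 - 5|}{\sqrt{2}} = \sqrt{2}, \quad \gamma(L_3) = \frac{|3 + 4 - 6|}{\sqrt{2}} = \frac{1}{\sqrt{2}}, \quad \gamma(L_4) = \frac{|1 + 2 \cdot 2 - 6|}{\sqrt{5}} = \frac{1}{\sqrt{5}}$$

so $(L_2)$ has the largest margin.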


Page 13: Support Vector Machine

Support Vector Machine

You are given a set of $n$ data points $D = \{(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)\}$, where each $x_i \in \mathbb{R}^d$ and $y_i \in \{-1, 1\}$. Find a separating hyperplane $(L): w^T x + b = 0$ such that:

$$w^T x_i + b > 0 \quad \text{if } y_i = 1$$

$$w^T x_i + b < 0 \quad \text{if } y_i = -1$$

and the margin $\gamma(L)$ is maximum among all possible separating hyperplanes.

Points $x_j$ that have $d(x_j, L) = \gamma(L)$ are called support vectors.


Page 14: Support Vector Machine

Support Vector Machine

The problem is equivalent to:

Find $w, b$ that:

$$\text{maximize} \left( \min_i \frac{|w^T x_i + b|}{\|w\|_2} \right) \tag{5}$$

Observation

If $(w, b)$ defines a valid SVM hyperplane, then $(c \cdot w, c \cdot b)$ also defines a valid SVM hyperplane, for any $c > 0$.

Thus, we can assume that:

$w^T x_j + b = 1$ for all support vectors $x_j$ of the $1$-class.

$w^T x_j + b = -1$ for all support vectors $x_j$ of the $(-1)$-class.
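
To see what this normalization buys (a sketch of the board step toward problem (6) on the next page): with $|w^T x_j + b| = 1$ at the support vectors, the inner minimum in (5) is attained there, so

$$\gamma(L) = \min_i \frac{|w^T x_i + b|}{\|w\|_2} = \frac{1}{\|w\|_2}$$

and maximizing the margin is the same as minimizing $\|w\|_2$, or equivalently $\frac{1}{2}\|w\|_2^2$, while correct classification under this normalization reads $y_i(w^T x_i + b) \ge 1$ for all $i$.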




Page 17: Support Vector Machine

Support Vector Machine

The problem becomes (see the board calculation):

Find $w, b$ that:

$$\begin{aligned} \text{minimize} \quad & \frac{1}{2}\|w\|_2^2 \\ \text{subject to} \quad & y_i(w^T x_i + b) \ge 1 \quad \forall i \end{aligned} \tag{6}$$
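
Problem (6) is a quadratic program, so a generic convex solver can handle the toy example directly. A minimal sketch, assuming the cvxpy package (not part of the lecture):

```python
import cvxpy as cp
import numpy as np

# Toy example from page 7: two points per class.
X = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0]])
y = np.array([-1.0, -1.0, 1.0, 1.0])

w = cp.Variable(2)
b = cp.Variable()
objective = cp.Minimize(0.5 * cp.sum_squares(w))   # (1/2) ||w||_2^2
constraints = [cp.multiply(y, X @ w + b) >= 1]     # y_i (w^T x_i + b) >= 1
cp.Problem(objective, constraints).solve()

print(w.value, b.value)  # ~ [0.5, 0.5] and -2.5: the line x1 + x2 - 5 = 0 in margin-normalized form
```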


Page 18: Support Vector Machine

Regularization Variant of SVM

Transform the constrained optimization problem from SVM to:

Find $w, b$ that:

$$\text{minimize} \quad f(w, b) = \frac{1}{2}\|w\|_2^2 + C \left( \sum_{i=1}^{n} \max(0, 1 - y_i(w^T x_i + b)) \right) \tag{7}$$

where $C$ is a chosen positive number.

When $C$ is sufficiently big, we force the optimization algorithm to return $(w, b)$ such that $y_i(w^T x_i + b) \ge 1$ for all $i$. This is only possible when the data is linearly separable.

When $C$ is chosen appropriately, optimization problem (7) has a regularizing effect:

We accept misclassified points, but most other points are far away from the hyperplane.

The problem is well-defined even if the data is NOT linearly separable.

$C$ is called the regularization parameter.
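
As a sketch (assuming NumPy; names are my own), the objective (7) is a few lines:

```python
import numpy as np

def f(w, b, X, y, C):
    """Regularized SVM objective (7):
    (1/2) ||w||_2^2 + C * sum_i max(0, 1 - y_i (w^T x_i + b))."""
    hinge = np.maximum(0.0, 1.0 - y * (X @ w + b))
    return 0.5 * w @ w + C * hinge.sum()
```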




Page 21: Support Vector Machine

Optimization by SGD

Let

$$L_i(w, b) = \max(0, 1 - y_i(w^T x_i + b)) \tag{8}$$

$L_i(w, b)$ is called a hinge function and its value is called a hinge loss. We have:

$$\frac{\partial L_i(w, b)}{\partial w[j]} = \begin{cases} -y_i x_i[j] & \text{if } y_i(w^T x_i + b) < 1 \\ 0 & \text{otherwise} \end{cases} \tag{9}$$

and

$$\frac{\partial L_i(w, b)}{\partial b} = \begin{cases} -y_i & \text{if } y_i(w^T x_i + b) < 1 \\ 0 & \text{otherwise} \end{cases} \tag{10}$$
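
A minimal sketch of (9) and (10) in NumPy (the function name is my own); note $L_i$ is not differentiable at $y_i(w^T x_i + b) = 1$, so this is a subgradient:

```python
import numpy as np

def hinge_subgradient(w, b, xi, yi):
    """Subgradient of L_i(w, b) = max(0, 1 - y_i (w^T x_i + b)), Eqs. (9)-(10).
    Returns (dL_i/dw, dL_i/db)."""
    if yi * (w @ xi + b) < 1:
        return -yi * xi, -yi
    return np.zeros_like(xi), 0.0
```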




Page 24: Support Vector Machine

Optimization by SGD

Since:

$$f(w, b) = \frac{1}{2}\|w\|_2^2 + C \sum_{i=1}^{n} L_i(w, b) \tag{11}$$

we have:

$$\frac{\partial f(w, b)}{\partial w[j]} = w[j] + C \sum_{i=1}^{n} \frac{\partial L_i(w, b)}{\partial w[j]} \tag{12}$$

and

$$\frac{\partial f(w, b)}{\partial b} = C \sum_{i=1}^{n} \frac{\partial L_i(w, b)}{\partial b} \tag{13}$$
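
Putting (11)-(13) together, SGD replaces the sum over all $n$ points with a single sampled example per step (scaled by $n$ so the stochastic gradient is unbiased). A minimal sketch, assuming NumPy; the hyperparameter values are placeholders:

```python
import numpy as np

def train_svm_sgd(X, y, C=1.0, lr=1e-3, epochs=1000, seed=0):
    """Minimize f(w, b) from (11) by SGD, using the hinge subgradients
    (9)-(10) inside the gradients (12)-(13), one sampled example per step."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        for i in rng.permutation(n):
            active = y[i] * (X[i] @ w + b) < 1  # hinge branch of (9)-(10)
            gw = w + (C * n * (-y[i]) * X[i] if active else 0.0)  # stochastic (12)
            gb = C * n * (-y[i]) if active else 0.0               # stochastic (13)
            w -= lr * gw
            b -= lr * gb
    return w, b

# Toy data from page 7; a large C approximates the hard-margin problem (6).
X = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0]])
y = np.array([-1.0, -1.0, 1.0, 1.0])
w, b = train_svm_sgd(X, y, C=10.0)
```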


