CS 1674: Intro to Computer Vision. Support Vector Machines. Prof. Adriana Kovashka, University of Pittsburgh. October 31, 2016.
Transcript
Page 1:

CS 1674: Intro to Computer Vision

Support Vector Machines

Prof. Adriana Kovashka, University of Pittsburgh

October 31, 2016

Page 2:

Plan for today

• Support vector machines

– Separable case / non-separable case

– Linear / non-linear (kernels)

• The importance of generalization

– The bias-variance trade-off (applies to all classifiers)

Page 3:

Lines in R2

ax + cy + b = 0

Let w = (a, c), x = (x, y)

Kristen Grauman

Page 4:

Lines in R2

ax + cy + b = 0

Let w = (a, c), x = (x, y); then the line is w · x + b = 0

[Figure: the line in the plane, with w drawn as its normal vector]

Kristen Grauman

Page 5:

Lines in R2

ax + cy + b = 0

Let w = (a, c), x = (x, y); then the line is w · x + b = 0

[Figure: the line, its normal vector w, and a point (x0, y0) off the line]

Kristen Grauman

Page 6:

Lines in R2

ax + cy + b = 0

Let w = (a, c), x = (x, y); then the line is w · x + b = 0

Distance D from point (x0, y0) to the line:

D = (a·x0 + c·y0 + b) / sqrt(a² + c²) = (w · x + b) / ||w||

Kristen Grauman

Page 7:

Lines in R2

ax + cy + b = 0

Let w = (a, c), x = (x, y); then the line is w · x + b = 0

Distance D from point (x0, y0) to the line:

D = |a·x0 + c·y0 + b| / sqrt(a² + c²) = |w · x + b| / ||w||

Kristen Grauman
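
A minimal sketch of this distance computation in NumPy (my own illustration; not from the slides):

import numpy as np

def line_distance(w, b, x0):
    # Signed distance from point x0 to the line w . x + b = 0
    return (np.dot(w, x0) + b) / np.linalg.norm(w)

# Example: the line x + y - 1 = 0, i.e. w = (1, 1), b = -1
w, b = np.array([1.0, 1.0]), -1.0
print(line_distance(w, b, np.array([0.0, 0.0])))  # -1/sqrt(2), about -0.707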

Page 8:

Linear classifiers

• Find linear function to separate positive and negative examples:

positive: xi · w + b ≥ 0
negative: xi · w + b < 0

Which line is best?

C. Burges, A Tutorial on Support Vector Machines for Pattern Recognition, Data Mining and Knowledge Discovery, 1998

Page 9:

Support vector machines

• Discriminative classifier based on optimal separating line (for 2d case)

• Maximize the margin between the positive and negative training examples

C. Burges, A Tutorial on Support Vector Machines for Pattern Recognition, Data Mining and Knowledge Discovery, 1998

Page 10:

Support vector machines

• Want line that maximizes the margin.

xi positive (yi = +1): xi · w + b ≥ 1
xi negative (yi = −1): xi · w + b ≤ −1

[Figure: separating line with margin and support vectors marked]

C. Burges, A Tutorial on Support Vector Machines for Pattern Recognition, Data Mining and Knowledge Discovery, 1998

For support vectors, xi · w + b = ±1

Page 11:

Support vector machines

• Want line that maximizes the margin.

xi positive (yi = +1): xi · w + b ≥ 1
xi negative (yi = −1): xi · w + b ≤ −1

For support vectors, xi · w + b = ±1

Distance between point and line: |xi · w + b| / ||w||

For support vectors: (w · x + b) / ||w|| = ±1 / ||w||, so the margin M = 1/||w|| − (−1/||w||) = 2/||w||

[Figure: separating line with margin and support vectors marked]

C. Burges, A Tutorial on Support Vector Machines for Pattern Recognition, Data Mining and Knowledge Discovery, 1998

Page 12:

Support vector machines

• Want line that maximizes the margin.

xi positive (yi = +1): xi · w + b ≥ 1
xi negative (yi = −1): xi · w + b ≤ −1

For support vectors, xi · w + b = ±1

Distance between point and line: |xi · w + b| / ||w||

[Figure: separating line with margin and support vectors marked]

Therefore, the margin is 2 / ||w||

C. Burges, A Tutorial on Support Vector Machines for Pattern Recognition, Data Mining and Knowledge Discovery, 1998

Page 13:

Finding the maximum margin line

1. Maximize margin 2/||w||

2. Correctly classify all training data points:

Quadratic optimization problem:

Minimize (1/2) wᵀw

Subject to yi(w·xi+b) ≥ 1

xi positive (yi = +1): xi · w + b ≥ 1
xi negative (yi = −1): xi · w + b ≤ −1

One constraint for each training point. Note the sign trick: multiplying by yi ∈ {+1, −1} folds the positive and negative cases into the single constraint above.

C. Burges, A Tutorial on Support Vector Machines for Pattern Recognition, Data Mining and Knowledge Discovery, 1998
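
As a sketch of this quadratic program on a tiny toy dataset, one option is the cvxpy modeling library (the library choice, data, and names are my own, not from the slides):

import cvxpy as cp
import numpy as np

# Toy 2D data: two linearly separable classes, labels in {+1, -1}
X = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])

w = cp.Variable(2)
b = cp.Variable()

# Minimize (1/2) w'w  subject to  y_i (w . x_i + b) >= 1 for every training point
objective = cp.Minimize(0.5 * cp.sum_squares(w))
constraints = [cp.multiply(y, X @ w + b) >= 1]
cp.Problem(objective, constraints).solve()

print(w.value, b.value)  # parameters of the maximum-margin line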

Page 14:

Finding the maximum margin line

• Solution: w = Σi αi yi xi

(αi: learned weight, xi: support vector)

C. Burges, A Tutorial on Support Vector Machines for Pattern Recognition, Data Mining and Knowledge Discovery, 1998

Page 15:

Finding the maximum margin line

• Solution: w = Σi αi yi xi

b = yi – w·xi (for any support vector)

• Classification function:

f(x) = sign(w · x + b) = sign(Σi αi yi xi · x + b)

• Notice that it relies on an inner product between the test point x and the support vectors xi

• (Solving the optimization problem also involves computing the inner products xi · xj between all pairs of training points)

If f(x) < 0, classify as negative, otherwise classify as positive.

C. Burges, A Tutorial on Support Vector Machines for Pattern Recognition, Data Mining and Knowledge Discovery, 1998
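
A sketch of that classification function in NumPy (array names are my own):

import numpy as np

def svm_decision(x, support_vectors, alphas, labels, b):
    # f(x) = sign( sum_i alpha_i y_i (x_i . x) + b )
    score = np.sum(alphas * labels * (support_vectors @ x)) + b
    return np.sign(score)

Here support_vectors is an (n, d) array of the xi, alphas and labels hold the corresponding αi and yi, and the returned sign (+1 or −1) is the predicted class.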

Page 16:

Nonlinear SVMs

• Datasets that are linearly separable work out great:
[Figure: 1D data on the x axis, separable at a single threshold]

• But what if the dataset is just too hard?
[Figure: 1D data on the x axis, not separable by any single threshold]

• We can map it to a higher-dimensional space:
[Figure: the same data mapped to (x, x²), now linearly separable]

Andrew Moore

Page 17:

Nonlinear SVMs

• General idea: the original input space can always be mapped to some higher-dimensional feature space where the training set is separable:

Φ: x → φ(x)

Andrew Moore

Page 18:

Nonlinear kernel: Example

• Consider the mapping φ(x) = (x, x²)

φ(x) · φ(y) = (x, x²) · (y, y²) = xy + x²y²

K(x, y) = xy + x²y²

[Figure: the data lifted onto the parabola x² vs. x]

Svetlana Lazebnik
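
A quick numeric check of this example (my own illustration):

def phi(x):
    return (x, x**2)            # the lifting map: x -> (x, x^2)

def K(x, y):
    return x*y + x**2 * y**2    # the kernel, computed without lifting

x, y = 3.0, -2.0
lifted = phi(x)[0]*phi(y)[0] + phi(x)[1]*phi(y)[1]
print(lifted, K(x, y))          # both print 30.0: -6 + 36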

Page 19:

The “Kernel Trick”

• The linear classifier relies on dot product between vectors K(xi,xj) = xi · xj

• If every data point is mapped into high-dimensional space via some transformation Φ: xi → φ(xi ), the dot product becomes: K(xi,xj) = φ(xi ) · φ(xj)

• A kernel function is a similarity function that corresponds to an inner product in some expanded feature space

• The kernel trick: instead of explicitly computing the lifting transformation φ(x), define a kernel function K such that: K(xi,xj) = φ(xi ) · φ(xj)

Andrew Moore

Page 20:

Examples of kernel functions

Linear: K(xi, xj) = xiᵀxj

Polynomials of degree up to d: K(xi, xj) = (xiᵀxj + 1)^d

Gaussian RBF: K(xi, xj) = exp(−||xi − xj||² / (2σ²))

Histogram intersection: K(xi, xj) = Σk min(xi(k), xj(k))

Andrew Moore / Carlos Guestrin
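
Minimal NumPy versions of these kernels for single vectors xi, xj (a sketch; the default d and sigma values are arbitrary):

import numpy as np

def linear_kernel(xi, xj):
    return xi @ xj

def poly_kernel(xi, xj, d=3):
    return (xi @ xj + 1) ** d

def rbf_kernel(xi, xj, sigma=1.0):
    return np.exp(-np.sum((xi - xj) ** 2) / (2 * sigma ** 2))

def hist_intersection_kernel(xi, xj):
    return np.sum(np.minimum(xi, xj))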

Page 21:

Allowing misclassifications: Before

Maximize the margin: find the w that minimizes (1/2) wᵀw, subject to yi(w·xi+b) ≥ 1.

Page 22:

Allowing misclassifications: After

Maximize the margin and minimize misclassification: find the w that minimizes (1/2) wᵀw + C Σi ξi (the sum runs over the N data samples), subject to yi(w·xi+b) ≥ 1 − ξi, ξi ≥ 0.

ξi is a slack variable; C is the misclassification cost.
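
In most SVM packages the misclassification cost is exposed as the parameter C. A sketch with scikit-learn on toy data with one outlier (library choice and values are my own):

import numpy as np
from sklearn.svm import SVC

X = np.array([[0., 0.], [0., 1.], [1., 0.],      # class 0
              [3., 3.], [4., 3.], [0.5, 0.5]])   # class 1; the last point is an outlier
y = np.array([0, 0, 0, 1, 1, 1])

# Small C: tolerates the outlier via slack, keeps a wide margin.
# Large C: penalizes every misclassification heavily, narrower margin.
clf_soft = SVC(kernel='linear', C=0.1).fit(X, y)
clf_hard = SVC(kernel='linear', C=100.0).fit(X, y)
print(clf_soft.predict([[3.5, 3.0]]), clf_hard.predict([[3.5, 3.0]]))  # compare the two models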

Page 23:

What about multi-class SVMs?

• Unfortunately, there is no “definitive” multi-class SVM formulation

• In practice, we have to obtain a multi-class SVM by combining multiple two-class SVMs

• One vs. others
– Training: learn an SVM for each class vs. the others
– Testing: apply each SVM to the test example, and assign it to the class of the SVM that returns the highest decision value

• One vs. one
– Training: learn an SVM for each pair of classes
– Testing: each learned SVM “votes” for a class to assign to the test example

Svetlana Lazebnik

Page 24:

Multi-class problems

• One-vs-all (a.k.a. one-vs-others)
– Train K classifiers
– In each, pos = data from class i, neg = data from classes other than i
– The class with the most confident prediction wins
– Example (a code sketch follows below):
• You have 4 classes, train 4 classifiers
• 1 vs others: score 3.5
• 2 vs others: score 6.2
• 3 vs others: score 1.4
• 4 vs others: score 5.5
• Final prediction: class 2
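
The same one-vs-all decision as a one-liner (my own illustration, with the scores from the example):

scores = {1: 3.5, 2: 6.2, 3: 1.4, 4: 5.5}   # decision values of the 4 one-vs-others SVMs
print(max(scores, key=scores.get))           # 2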

Page 25:

Multi-class problems

• One-vs-one (a.k.a. all-vs-all)
– Train K(K−1)/2 binary classifiers (all pairs of classes)
– They all vote for the label
– Example (a code sketch follows below):
• You have 4 classes, so train 6 classifiers
• 1 vs 2, 1 vs 3, 1 vs 4, 2 vs 3, 2 vs 4, 3 vs 4
• Votes: 1, 1, 4, 2, 4, 4
• Final prediction is class 4
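
The same vote count in code (a sketch, using the votes from the example):

from collections import Counter

votes = [1, 1, 4, 2, 4, 4]                   # one vote per pairwise classifier
print(Counter(votes).most_common(1)[0][0])   # 4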

Page 26:

SVMs for recognition

1. Define your representation for each example.
2. Select a kernel function.
3. Compute pairwise kernel values between labeled examples.
4. Use this “kernel matrix” to solve for SVM support vectors & weights.
5. To classify a new example: compute kernel values between new input and support vectors, apply weights, check sign of output. (A code sketch of this recipe follows below.)

Kristen Grauman
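
A sketch of this recipe with scikit-learn's precomputed-kernel interface (my own illustration; the random histogram features and the histogram-intersection kernel are placeholders):

import numpy as np
from sklearn.svm import SVC

def hist_intersection(A, B):
    # Pairwise histogram-intersection kernel values between rows of A and rows of B
    return np.array([[np.sum(np.minimum(a, b)) for b in B] for a in A])

# 1-2. Represent each example (here: a 50-bin histogram) and pick a kernel.
X_train = np.random.rand(20, 50)
y_train = np.random.randint(0, 2, 20)
X_test = np.random.rand(5, 50)

# 3-4. Compute the kernel matrix over labeled examples and solve for support vectors & weights.
K_train = hist_intersection(X_train, X_train)
clf = SVC(kernel='precomputed').fit(K_train, y_train)

# 5. Classify new examples from their kernel values against the training examples.
K_test = hist_intersection(X_test, X_train)
print(clf.predict(K_test))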

Page 27:

Example: learning gender with SVMs

Moghaddam and Yang, Learning Gender with Support Faces, TPAMI 2002.

Moghaddam and Yang, Face & Gesture 2000.

Kristen Grauman

Page 28:

Learning gender with SVMs

• Training examples:
– 1044 males
– 713 females

• Experiment with various kernels, select Gaussian RBF

K(xi, xj) = exp(−||xi − xj||² / (2σ²))

Kristen Grauman

Page 29:

Support Faces

Moghaddam and Yang, Learning Gender with Support Faces, TPAMI 2002.

Page 30:

Moghaddam and Yang, Learning Gender with Support Faces, TPAMI 2002.

Page 31:

Gender perception experiment:

How well can humans do?

• Subjects:

– 30 people (22 male, 8 female)

– Ages mid-20’s to mid-40’s

• Test data:

– 254 face images (6 males, 4 females)

– Low res and high res versions

• Task:

– Classify as male or female, forced choice

– No time limit

Moghaddam and Yang, Face & Gesture 2000.

Page 32:

Moghaddam and Yang, Face & Gesture 2000.

Gender perception experiment:

How well can humans do?

[Figure: classification error rates, high-resolution and low-resolution test images]

Page 33:

Human vs. Machine

• SVMs performed better than any single human test subject, at either resolution

Kristen Grauman

Page 34:

SVMs: Pros and cons

• Pros
• Many publicly available SVM packages: http://www.csie.ntu.edu.tw/~cjlin/libsvm/ (or use the built-in Matlab version, which is slower)
• Kernel-based framework is very powerful, flexible
• Often a sparse set of support vectors – compact at test time
• Work very well in practice, even with very small training sample sizes

• Cons
• No “direct” multi-class SVM, must combine two-class SVMs
• Can be tricky to select the best kernel function for a problem
• Computation, memory
– During training, must compute the matrix of kernel values for every pair of examples
– Learning can take a very long time for large-scale problems

Adapted from Lana Lazebnik

Page 35:

Precision / Recall / F-measure

• Precision = 2 / 5 = 0.4

• Recall = 2 / 4 = 0.5

• F-measure = 2 · 0.4 · 0.5 / (0.4 + 0.5) ≈ 0.44

True positives (images that contain people)
True negatives (images that do not contain people)
Predicted positives (images predicted to contain people)
Predicted negatives (images predicted not to contain people)

Accuracy: 5 / 10 = 0.5
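
The same numbers in code (counts taken from the 10-image example above):

tp, fp, fn, tn = 2, 3, 2, 3     # true/false positives and negatives

precision = tp / (tp + fp)                                   # 2 / 5 = 0.4
recall = tp / (tp + fn)                                      # 2 / 4 = 0.5
f_measure = 2 * precision * recall / (precision + recall)    # about 0.44
accuracy = (tp + tn) / (tp + fp + fn + tn)                   # 5 / 10 = 0.5
print(precision, recall, f_measure, accuracy)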

Page 36:

Generalization

• How well does a learned model generalize from

the data it was trained on to a new test set?

Training set (labels known); test set (labels unknown)

Slide credit: L. Lazebnik

Page 37:

Generalization

• Components of generalization error
– Bias: how much the average model over all training sets differs from the true model
• Error due to inaccurate assumptions/simplifications made by the model
– Variance: how much models estimated from different training sets differ from each other

• Underfitting: model is too “simple” to represent all the relevant class characteristics
– High bias and low variance
– High training error and high test error

• Overfitting: model is too “complex” and fits irrelevant characteristics (noise) in the data
– Low bias and high variance
– Low training error and high test error

Slide credit: L. Lazebnik

Page 38:

Bias-Variance Trade-off

• Models with too few parameters are inaccurate because of a large bias (not enough flexibility).

• Models with too many parameters are inaccurate because of a large variance (too much sensitivity to the sample).

Slide credit: D. Hoiem

Page 39:

Fitting a model

Figures from Bishop

Is this a good fit?

Page 40:

With more training data

Figures from Bishop

Page 41:

Bias-variance tradeoff

[Figure: error vs. model complexity. Training error decreases as complexity grows; test error first falls, then rises. Low complexity: underfitting (high bias, low variance). High complexity: overfitting (low bias, high variance).]

Slide credit: D. Hoiem

Page 42:

Bias-variance tradeoff

[Figure: test error vs. model complexity (from high bias / low variance to low bias / high variance), with one curve for many training examples and one for few training examples]

Slide credit: D. Hoiem

Page 43:

Choosing the trade-off

• Need validation set

• Validation set is separate from the test set

[Figure: training error and validation error vs. model complexity (from high bias / low variance to low bias / high variance)]

Slide credit: D. Hoiem
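
A sketch of picking the trade-off with a validation set, here tuning the RBF kernel width via scikit-learn's gamma parameter (library, data, split sizes, and the gamma grid are my own choices):

import numpy as np
from sklearn.svm import SVC

rng = np.random.RandomState(0)
X = rng.rand(300, 10)
y = (X[:, 0] + X[:, 1] > 1).astype(int)

# Separate training, validation, and test sets; the test set is never used to choose the model.
X_tr, y_tr = X[:200], y[:200]
X_val, y_val = X[200:250], y[200:250]
X_te, y_te = X[250:], y[250:]

best_gamma, best_acc = None, -1.0
for gamma in [0.01, 0.1, 1.0, 10.0, 100.0]:     # larger gamma = more complex model
    acc = SVC(kernel='rbf', gamma=gamma).fit(X_tr, y_tr).score(X_val, y_val)
    if acc > best_acc:
        best_gamma, best_acc = gamma, acc

final = SVC(kernel='rbf', gamma=best_gamma).fit(X_tr, y_tr)
print(best_gamma, final.score(X_te, y_te))       # report test accuracy once, at the end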

Page 44:

Effect of Training Size

[Figure: for a fixed prediction model, error vs. number of training examples. Training error and testing error converge as the training set grows; the gap between them is the generalization error.]

Adapted from D. Hoiem

Page 45:

How to reduce variance?

• Choose a simpler classifier

• Use fewer features

• Get more training data

• Regularize the parameters

Slide credit: D. Hoiem

Page 46:

Regularization

Figures from Bishop

No regularization / Huge regularization

Page 47:

Characteristics of vision learning problems

• Lots of continuous features

– Spatial pyramid may have ~15,000 features

• Imbalanced classes

– Often limited positive examples, practically infinite negative examples

• Difficult prediction tasks

• Recently, massive training sets have become available

– If we have a massive training set, we want classifiers with low bias (high variance is ok) and reasonably efficient training

Adapted from D. Hoiem

Page 48:

Remember…

• No free lunch: machine learning algorithms are tools

• Three kinds of error

– Inherent: unavoidable

– Bias: due to over-simplifications

– Variance: due to inability to perfectly estimate parameters from limited data

• Try simple classifiers first

• Better to have smart features and simple classifiers than simple features and smart classifiers

• Use increasingly powerful classifiers with more training data (bias-variance tradeoff)

Adapted from D. Hoiem

