SVM classifiers

Source: cvml.ajou.ac.kr/wiki/images/1/1e/Ch3_1_SVM-1.pdf (2017. 1. 12.)
  • SVM classifiers

  • Binary classification

    Given training data (x_i, y_i) for i = 1 ... N, with x_i ∈ R^d and y_i ∈ {-1, 1},

    learn a classifier f(x) such that

    f(x_i) ≥ 0 if y_i = +1, and f(x_i) < 0 if y_i = -1

    i.e. y_i f(x_i) > 0 for a correct classification.

  • Linear separability

    (Figure: one dataset that is linearly separable and one that is not linearly separable)

  • Linear classifiers

    A linear classifier has the form

    f(x) = w^T x + b

    • In 2D the discriminant is a line

    • w is the normal to the line, and b the bias

    • w is known as the weight vector

  • Linear classifiers

    A linear classifier has the form

    𝑓 π‘₯ = 𝑀𝑇π‘₯ + b

    β€’ In 3D the discriminant is a plane, and in nD it is a hyperplain

    For a K-NN classifier it was necessary to β€˜carry’ the training data

    For a linear classifier, the training data is used to learn 𝑀 and then discarded

    Only 𝑀 is needed for classifying new data
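
To make the decision rule concrete, here is a minimal NumPy sketch; the weight vector, bias, and test point are made-up values for illustration, not taken from the slides. Once w and b have been learned, classifying a new point only requires evaluating f(x) = w^T x + b and taking its sign.

```python
import numpy as np

# Hypothetical learned parameters for a 2-D example (not from the slides).
w = np.array([2.0, -1.0])   # weight vector: normal to the decision line
b = 0.5                     # bias

def f(x):
    """Linear discriminant f(x) = w^T x + b."""
    return w @ x + b

def classify(x):
    """Predict a label in {-1, +1} from the sign of f(x)."""
    return 1 if f(x) >= 0 else -1

# A training pair (x_i, y_i) is correctly classified exactly when y_i * f(x_i) > 0.
x_new = np.array([1.0, 3.0])
print(classify(x_new), f(x_new))
```

Unlike k-NN, nothing but w and b needs to be stored at test time.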

  • Reminder: The Perceptron Classifier

    Given linearly separable data x_i labelled into two categories y_i ∈ {-1, 1}, find a weight vector w such that the discriminant function

    f(x_i) = w^T x_i + b

    separates the categories for i = 1, ..., N

    • How can we find this separating hyperplane?

    The Perceptron Algorithm

    Write the classifier as f(x_i) = w^T x_i + w_0 = w̃^T x̃_i, where w̃ = (w, w_0) and x̃_i = (x_i, 1)

    • Initialize w̃ = 0

    • Cycle through the data points {x_i, y_i}

    • If x_i is misclassified then w̃ ← w̃ + α y_i x̃_i

    • Until all the data is correctly classified

  • For example in 2D

    • Initialize w̃ = 0

    • Cycle through the data points {x_i, y_i}

    • If x_i is misclassified then w̃ ← w̃ + α y_i x̃_i

    • Until all the data is correctly classified
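
Below is a minimal Python sketch of the perceptron update loop described above, using the homogeneous coordinates w̃ = (w, w_0), x̃ = (x, 1). The function name, learning rate default, and epoch cap are choices made here for illustration; the loop only terminates with a separator if the data really is linearly separable.

```python
import numpy as np

def perceptron(X, y, alpha=1.0, max_epochs=1000):
    """Perceptron in homogeneous coordinates: w_tilde = (w, w0), x_tilde = (x, 1).

    X: (N, d) array of points, y: length-N array of labels in {-1, +1}.
    Assumes linearly separable data; otherwise it stops after max_epochs.
    """
    X_tilde = np.hstack([X, np.ones((len(X), 1))])    # append the constant-1 feature
    w = np.zeros(X_tilde.shape[1])                     # initialize w_tilde = 0
    for _ in range(max_epochs):
        mistakes = 0
        for x_i, y_i in zip(X_tilde, y):               # cycle through the data points
            if y_i * (w @ x_i) <= 0:                   # x_i is misclassified
                w += alpha * y_i * x_i                 # push the boundary toward the correct side
                mistakes += 1
        if mistakes == 0:                              # all data correctly classified
            break
    return w[:-1], w[-1]                               # return (w, w0)
```

For example, perceptron(np.array([[2., 2.], [-1., -2.]]), np.array([1, -1])) returns a separating (w, w0) for that toy pair after one pass.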

  • β€’ If the data is linearly separable, then the algorithm will converge

    β€’ Convergence can be slow …

    β€’ Separating line close to training data

    β€’ We would prefer a larger margin for generalization

  • β€’ Maximum margin solution: most stable under perturbations of the inputs

  • Support Vector Machine

  • SVM – sketch derivation

    • Since w^T x + b = 0 and c(w^T x + b) = 0 define the same plane, we have the freedom to choose the normalization of w

    • Choose the normalization such that w^T x_+ + b = +1 and w^T x_- + b = -1 for the positive and negative support vectors respectively

    • Subtracting these two equations gives w^T (x_+ - x_-) = 2, so the margin, measured along the normal direction w/||w||, is

      (w / ||w||) · (x_+ - x_-)  =  w^T (x_+ - x_-) / ||w||  =  2 / ||w||

  • Support Vector MachineLinearly separable data

  • SVM – Optimization

    • Learning the SVM can be formulated as an optimization:

      max_w  2 / ||w||   subject to   w^T x_i + b ≥ +1 if y_i = +1  and  w^T x_i + b ≤ -1 if y_i = -1,   for i = 1 ... N

    • Or equivalently

      min_w  ||w||^2   subject to   y_i (w^T x_i + b) ≥ 1   for i = 1 ... N

    • This is a quadratic optimization problem subject to linear constraints, and there is a unique minimum
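
As a sanity check of the formulation, the hard-margin problem min ||w||^2 subject to y_i (w^T x_i + b) ≥ 1 can be handed directly to a generic constrained solver. The sketch below uses SciPy's SLSQP on a made-up four-point toy set; the data and solver choice are illustrative assumptions, and in practice dedicated SVM libraries solve the (dual) QP instead.

```python
import numpy as np
from scipy.optimize import minimize

# Tiny linearly separable toy set (made up for illustration).
X = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -1.0], [-3.0, -2.0]])
y = np.array([1, 1, -1, -1])

d = X.shape[1]

def objective(params):
    w = params[:d]
    return w @ w                      # minimize ||w||^2

def margin_constraints(params):
    w, b = params[:d], params[d]
    # SLSQP expects inequality constraints of the form g(params) >= 0.
    return y * (X @ w + b) - 1.0

res = minimize(objective,
               x0=np.zeros(d + 1),
               constraints=[{"type": "ineq", "fun": margin_constraints}],
               method="SLSQP")

w, b = res.x[:d], res.x[d]
print("w =", w, "b =", b, "margin width =", 2.0 / np.linalg.norm(w))
```

The printed margin width is 2/||w||, matching the sketch derivation above.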

  • Linear separability again: What is the best w?

    β€’ The points can be linearly separated but there is a very narrow margin

    In general there is a trade-off between the margin and the number of mistakes on the training data

    β€’ But possibly the large margin solution is better, even though one constraint is violated

  • Introduce β€œslack” variables

  • β€œSoft” margin solution

    The optimization problem becomes

    minπ‘€βˆˆπ‘…π‘‘,πœ‰

    π‘–βˆˆπ‘…+

    𝑀 2 +𝑐

    𝑖

    𝑁

    πœ‰

    subject to

    𝑦𝑖(𝑀𝑇π‘₯𝑖 + 𝑏) β‰₯ 1- πœ‰π‘– for 𝑖 = 1 . . . N

    β€’ Every constraint can be satisfied if πœ‰π‘– is sufficiently large

    β€’ 𝐢 is regularization parameter:

    - small 𝐢 allows constraints to be easily ignored β†’ large margin

    - large 𝐢 makes constraints hard to ignored β†’ narrow margin

    - 𝐢 = ∞ enforces all constraints: hard margin

    β€’ This is still a quadratic optimization problem and there is a unique minimum.

    Note, there is only one parameter, 𝐢.

  • β€’ Data is linearly separable

    β€’ But only with a narrow margin

  • 𝐢 = ∞ : hard margin

  • C = 10 soft margin

  • Application: Pedestrian detection in

    Computer Vision

    β€’ Objective: detect (localize) standing humans in an image

    (c.f. face detection with a sliding window classifier)β€’

    β€’ reduces object detection to binary classification

    β€’ does an image window contain a person or not?

  • Detection problem (binary) classification

    problem

  • Each window is separately classified

  • Training data

    β€’ 64x128 images of humans cropped from a varied set of personal photos

    • Positive data - 1239 positive window examples (2478 with left-right reflections)

    • Negative data - 1218 person-free training photos (12180 patches)

  • Training

    β€’ A preliminary detector

    β€’ Trained with (2478) vs (12180) samples

    β€’ Retraining

    β€’ With augmented data set

    β€’ initial 12180 + hard examples

    β€’ Hard examples

    • The 1218 negative training photos are searched exhaustively for false positives
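
The retraining step can be sketched as a small mining loop: run the preliminary detector over the person-free photos, and every window it scores as positive is by definition a false positive, i.e. a hard negative to add to the training set. The helper names (extract_feature, windows_in) and the per-image cap are placeholders assumed here, not part of the original pipeline description.

```python
import numpy as np

def mine_hard_negatives(clf, extract_feature, windows_in, negative_images, top_k=10):
    """Collect hard negatives: windows from person-free images that the
    preliminary detector wrongly scores as positive (false positives).

    clf             - preliminary linear SVM exposing decision_function (w^T x + b)
    extract_feature - maps an image window to its feature vector (e.g. HOG)
    windows_in      - yields all candidate windows of an image (sliding window)
    """
    hard = []
    for img in negative_images:
        scored = []
        for window in windows_in(img):
            score = clf.decision_function([extract_feature(window)])[0]
            if score > 0:                            # any positive score here is a mistake
                scored.append((score, window))
        scored.sort(key=lambda s: s[0], reverse=True)
        hard.extend(extract_feature(w) for _, w in scored[:top_k])  # keep the worst offenders
    return np.array(hard)

# Retraining: refit the SVM on the initial negatives plus these hard examples
# (together with the positive examples), as in the augmented-data step above.
```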

  • Feature: histogram of oriented gradients

    (HOG)

  • Averaged examples

  • Algorithm

    • Training (learning)

      • Represent each example window by a HOG feature vector

        x_i ∈ R^d, with d = 1024

      • Train an SVM classifier

    • Testing (detection)

      • Sliding window classifier

        f(x) = w^T x + b

  • Learned model

    𝑓 π‘₯ = 𝑀𝑇π‘₯ + b

