Support Vector Machines: Hype or Hallelujah? Kristin Bennett, Math Sciences Dept., Rensselaer Polytechnic Inst. http://www.rpi.edu/~bennek (M2000, October 2-4, 2000)
Transcript
Slide 1
Slide 2
Support Vector Machines: Hype or Hallelujah?
Kristin Bennett, Math Sciences Dept., Rensselaer Polytechnic Inst.
http://www.rpi.edu/~bennek
Slide 3
Outline
- Support Vector Machines for Classification
  - Linear Discrimination
  - Nonlinear Discrimination
- Extensions
- Application in Drug Design
- Hallelujah
- Hype
Slide 4
Support Vector Machines (SVM)
Key Ideas:
- Maximize Margins
- Do the Dual
- Construct Kernels
A methodology for inference based on Vapnik's Statistical Learning Theory.
Slide 5
Best Linear Separator?
Slide 6
Best Linear Separator?
Slide 7
Best Linear Separator?
Slide 8
Best Linear Separator?
Slide 9
Best Linear Separator?
Slide 10
Find Closest Points in Convex Hulls
[figure: closest points c and d of the two convex hulls]
Slide 11
Plane Bisects Closest Points
[figure: separating plane bisecting the segment between c and d]
Slide 12
Find c and d using a quadratic program:
\[
\min_{\alpha}\ \tfrac{1}{2}\|c - d\|^2 \quad \text{s.t.}\quad c=\sum_{y_i=+1}\alpha_i x_i,\quad d=\sum_{y_i=-1}\alpha_i x_i,\quad \sum_{y_i=+1}\alpha_i=\sum_{y_i=-1}\alpha_i=1,\quad \alpha_i \ge 0.
\]
Many existing and new solvers.
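This QP is small enough to hand to a general-purpose solver. Below is a minimal sketch, assuming toy 2-D data and SciPy's generic SLSQP solver; the data, names, and solver choice are illustrative, not the speaker's implementation:

```python
# Closest points in convex hulls via a generic constrained optimizer.
import numpy as np
from scipy.optimize import minimize

A = np.array([[1.0, 1.0], [2.0, 2.5], [1.5, 0.5]])   # class +1 points (toy data)
B = np.array([[4.0, 4.0], [5.0, 3.5], [4.5, 5.0]])   # class -1 points (toy data)
na, nb = len(A), len(B)

def objective(alpha):
    # c and d are convex combinations of each class's points
    c = alpha[:na] @ A
    d = alpha[na:] @ B
    return 0.5 * np.sum((c - d) ** 2)

constraints = [
    {"type": "eq", "fun": lambda a: np.sum(a[:na]) - 1.0},  # weights on A sum to 1
    {"type": "eq", "fun": lambda a: np.sum(a[na:]) - 1.0},  # weights on B sum to 1
]
bounds = [(0.0, 1.0)] * (na + nb)  # convex combination: alpha_i >= 0
alpha0 = np.r_[np.full(na, 1.0 / na), np.full(nb, 1.0 / nb)]

res = minimize(objective, alpha0, bounds=bounds, constraints=constraints)
c, d = res.x[:na] @ A, res.x[na:] @ B
w = c - d                       # normal of the bisecting plane
b = 0.5 * (w @ c + w @ d)       # plane w.x = b passes through the midpoint of cd
print("w =", w, "b =", b)
```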
Slide 13
Best Linear Separator: Supporting Plane Method
Maximize the distance between two parallel supporting planes, \(w\cdot x = b+1\) and \(w\cdot x = b-1\):
\[ \text{Distance} = \text{Margin} = \frac{2}{\|w\|}. \]
Slide 14
Maximize the margin using a quadratic program:
\[ \min_{w,b}\ \tfrac{1}{2}\|w\|^2 \quad \text{s.t.}\quad y_i\,(w\cdot x_i - b) \ge 1 \ \text{for all } i. \]
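A minimal sketch of this supporting-plane (primal) QP, again with illustrative toy data and SciPy's SLSQP rather than a dedicated QP solver:

```python
# Primal maximum-margin QP: variables are (w1, w2, b).
import numpy as np
from scipy.optimize import minimize

X = np.array([[1.0, 1.0], [2.0, 2.5], [1.5, 0.5],
              [4.0, 4.0], [5.0, 3.5], [4.5, 5.0]])
y = np.array([1.0, 1, 1, -1, -1, -1])

def objective(v):
    w = v[:2]
    return 0.5 * w @ w          # minimizing ||w||^2 / 2 maximizes the margin 2/||w||

def margin_constraints(v):
    w, b = v[:2], v[2]
    return y * (X @ w - b) - 1.0   # y_i (w.x_i - b) >= 1, elementwise

res = minimize(objective, x0=np.zeros(3),
               constraints=[{"type": "ineq", "fun": margin_constraints}])
w, b = res.x[:2], res.x[2]
print("w =", w, "b =", b, "margin =", 2.0 / np.linalg.norm(w))
```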
Slide 15
Dual of Closest Points Method is Support Plane Method
The solution depends only on the support vectors:
\[ w = \sum_i y_i\,\alpha_i\,x_i, \qquad \alpha_i > 0 \ \text{only for support vectors.} \]
Slide 16
Statistical Learning Theory
- Misclassification error and function complexity bound the generalization error.
- Maximizing margins minimizes complexity.
- Eliminates overfitting.
- The solution depends only on the support vectors, not on the number of attributes.
Slide 17
Margins and Complexity
A skinny margin is more flexible, thus more complex.
Slide 18
Margins and Complexity
A fat margin is less complex.
Slide 19
Linearly Inseparable Case
The convex hulls intersect! The same argument won't work.
Slide 20
Reduced Convex Hulls Don't Intersect
Reduce each hull by adding an upper bound D on the multipliers:
\[ \sum_i \alpha_i = 1, \qquad 0 \le \alpha_i \le D. \]
Slide 21
Find Closest Points, Then Bisect
No change except for the bound D. D determines the number of support vectors.
Slide 22
Linearly Inseparable Case: Supporting Plane Method
Just add a non-negative error (slack) vector z:
\[ \min_{w,b,z}\ \tfrac{1}{2}\|w\|^2 + C\sum_i z_i \quad \text{s.t.}\quad y_i\,(w\cdot x_i - b) + z_i \ge 1,\quad z_i \ge 0. \]
Slide 23
Dual of Closest Points Method is Support Plane Method
The solution depends only on the support vectors:
\[ \max_{\alpha}\ \sum_i \alpha_i - \tfrac{1}{2}\sum_{i,j}\alpha_i\alpha_j\,y_i y_j\,x_i\cdot x_j \quad \text{s.t.}\quad \sum_i y_i\alpha_i = 0,\quad 0 \le \alpha_i \le C. \]
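A minimal sketch of this soft-margin dual QP, under the same assumptions as the earlier sketches (toy data, SciPy's SLSQP); note how only a few multipliers end up positive:

```python
# Soft-margin dual QP; C is the box bound on the multipliers
# (it plays the same role as the slide's reduced-hull bound D).
import numpy as np
from scipy.optimize import minimize

X = np.array([[1.0, 1.0], [2.0, 2.5], [1.5, 0.5],
              [4.0, 4.0], [5.0, 3.5], [4.5, 5.0]])
y = np.array([1.0, 1, 1, -1, -1, -1])
C = 10.0
Q = (y[:, None] * X) @ (y[:, None] * X).T    # Q_ij = y_i y_j x_i . x_j

res = minimize(lambda a: 0.5 * a @ Q @ a - a.sum(),    # negated dual objective
               x0=np.zeros(len(y)),
               bounds=[(0.0, C)] * len(y),
               constraints=[{"type": "eq", "fun": lambda a: a @ y}])
alpha = res.x
w = (alpha * y) @ X                          # w = sum_i y_i alpha_i x_i
sv = alpha > 1e-6                            # only the support vectors matter
print("support vectors:", np.where(sv)[0], "w =", w)
```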
Slide 24
Nonlinear Classification
Slide 25
Nonlinear Classification: Map to Higher Dimensional Space
IDEA: Map each point to a higher dimensional feature space via \(\Phi\) and construct a linear discriminant there. The dual SVM becomes:
\[ \max_{\alpha}\ \sum_i \alpha_i - \tfrac{1}{2}\sum_{i,j}\alpha_i\alpha_j\,y_i y_j\,\Phi(x_i)\cdot\Phi(x_j) \quad \text{s.t.}\quad \sum_i y_i\alpha_i = 0,\quad 0 \le \alpha_i \le C. \]
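A minimal worked example of the feature-map idea, assuming 2-D inputs and the standard degree-2 polynomial map (an illustrative choice, not from the slides). The inner product in feature space agrees with a simple formula in input space, which motivates the kernels on the next slide:

```python
# Explicit feature map Phi: R^2 -> R^6; a linear separator in R^6
# corresponds to a quadratic separator in R^2.
import numpy as np

def phi(x):
    return np.array([1.0, np.sqrt(2) * x[0], np.sqrt(2) * x[1],
                     x[0] ** 2, x[1] ** 2, np.sqrt(2) * x[0] * x[1]])

x, z = np.array([1.0, 2.0]), np.array([0.5, -1.0])
print(phi(x) @ phi(z))        # inner product computed in feature space
print((x @ z + 1.0) ** 2)     # equals (x.z + 1)^2 -- the kernel shortcut
```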
Slide 26
Generalized Inner Product by Hilbert-Schmidt Kernels (Courant and Hilbert 1953):
\[ \Phi(x)\cdot\Phi(y) = K(x, y) \]
for certain \(\Phi\) and K, e.g. the polynomial kernel \(K(x,y) = (x\cdot y + 1)^d\) and the Gaussian kernel \(K(x,y) = \exp(-\|x-y\|^2 / (2\sigma^2))\).
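A minimal sketch of these two standard kernels (the degree d and width sigma are illustrative defaults):

```python
import numpy as np

def polynomial_kernel(x, z, d=2):
    # K(x, z) = (x.z + 1)^d corresponds to a polynomial feature map
    return (x @ z + 1.0) ** d

def gaussian_kernel(x, z, sigma=1.0):
    # K(x, z) = exp(-||x - z||^2 / (2 sigma^2)); its feature map is
    # infinite-dimensional, yet K itself is cheap to evaluate
    return np.exp(-np.sum((x - z) ** 2) / (2.0 * sigma ** 2))

x, z = np.array([1.0, 2.0]), np.array([0.5, -1.0])
print(polynomial_kernel(x, z), gaussian_kernel(x, z))
```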
Slide 27
Final Classification via Kernels
The dual SVM becomes:
\[ \max_{\alpha}\ \sum_i \alpha_i - \tfrac{1}{2}\sum_{i,j}\alpha_i\alpha_j\,y_i y_j\,K(x_i, x_j) \quad \text{s.t.}\quad \sum_i y_i\alpha_i = 0,\quad 0 \le \alpha_i \le C. \]
Slide 28
Slide 29
Final SVM Algorithm
- Solve the dual SVM QP
- Recover the primal variable b
- Classify new x: \( f(x) = \operatorname{sign}\big(\sum_i y_i\,\alpha_i\,K(x_i, x) - b\big) \)
The solution depends only on the support vectors; see the sketch below.
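A minimal end-to-end sketch using scikit-learn's SVC, one of many off-the-shelf dual-QP solvers; the synthetic dataset and hyperparameters are illustrative:

```python
# Fit a kernel SVM, inspect its support vectors, and classify a new point.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = np.r_[rng.normal(0, 1, (50, 2)), rng.normal(3, 1, (50, 2))]
y = np.r_[np.zeros(50), np.ones(50)]

clf = SVC(kernel="rbf", C=1.0, gamma=0.5).fit(X, y)   # solves the dual QP
print("number of support vectors:", clf.support_vectors_.shape[0])
print("b (intercept):", clf.intercept_)
print("prediction for a new x:", clf.predict([[1.5, 1.5]]))
```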
Slide 30
Support Vector Machines (SVM)
- Key Formulation Ideas:
  - Maximize Margins
  - Do the Dual
  - Construct Kernels
- Generalization Error Bounds
- Practical Algorithms
Slide 32
Example in Drug Design
- Goal: predict the bio-reactivity of molecules to decrease drug development time.
- Target: predict the logarithm of the inhibition concentration for site "A" on the Cholecystokinin (CCK) molecule.
- Constructs a quantitative structure-activity relationship (QSAR) model.
Slide 33
SVM Regression: ε-insensitive loss function
\[ L_\varepsilon\big(y, f(x)\big) = \max\big(0,\ |y - f(x)| - \varepsilon\big) \]
[figure: ε-tube around the regression function, with points above (+) and below (-)]
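A minimal sketch of the ε-insensitive loss: residuals inside the ε tube cost nothing, residuals outside grow linearly.

```python
import numpy as np

def eps_insensitive_loss(y_true, y_pred, eps=0.1):
    # zero inside the eps tube, linear penalty outside it
    return np.maximum(0.0, np.abs(y_true - y_pred) - eps)

print(eps_insensitive_loss(np.array([1.0, 1.0]), np.array([1.05, 1.5])))
# -> [0.  0.4]   (the first residual falls inside the tube)
```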
Slide 34
SVM Minimizes Underestimate + Overestimate
Slide 35
LCCKA Problem
- Training data: 66 molecules.
- 323 original attributes are wavelet coefficients of TAE descriptors.
- A 39-attribute subset was selected by a linear 1-norm SVM (with no kernels).
- For details, see the DDASSL project link off of http://www.rpi.edu/~bennek.
- Testing set results reported.
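A minimal sketch of such an SVM regression experiment using scikit-learn's SVR, with synthetic data standing in for the 66-molecule LCCKA set (which is not reproduced here):

```python
# SVR on a synthetic stand-in: 66 "molecules" with 39 descriptors each.
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X = rng.normal(size=(66, 39))
y = X[:, 0] - 0.5 * X[:, 1] + rng.normal(scale=0.1, size=66)

model = SVR(kernel="rbf", C=1.0, epsilon=0.1).fit(X, y)
print("training fit R^2:", model.score(X, y))
```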
Slide 36
LCCK Prediction, Q² = 0.25
[figure: prediction plot]
Slide 37
Many Other Applications
- Speech Recognition
- Database Marketing
- Quark Flavors in High Energy Physics
- Dynamic Object Recognition
- Knock Detection in Engines
- Protein Sequence Problem
- Text Categorization
- Breast Cancer Diagnosis
- See: http://www.clopinet.com/isabelle/Projects/SVM/applist.html
Slide 38
Hallelujah!
- Generalization theory and practice meet
- General methodology for many types of problems
- Same Program + New Kernel = New Method
- No problems with local minima
- Few model parameters. Selects capacity.
- Robust optimization methods.
- Successful applications
BUT...
Slide 39
HYPE?
- Will SVMs beat my best hand-tuned method Z for X?
- Do SVMs scale to massive datasets?
- How to choose C and the kernel?
- What is the effect of attribute scaling?
- How to handle categorical variables?
- How to incorporate domain knowledge?
- How to interpret results?
Slide 40
Support Vector Machine Resources
- http://www.support-vector.net/
- http://www.kernel-machines.org/
- Links off my web page: http://www.rpi.edu/~bennek