Classification (SVMs / Kernel method)
Sp’10 Bafna/Ideker
Page 1: Classification (SVMs / Kernel method) · cseweb.ucsd.edu/classes/sp12/cse283-a/lecturenotes/... · 2012.5.4

Sp’10 Bafna/Ideker

Classification (SVMs / Kernel method)

Page 2:

LP versus Quadratic programming

LP:  min c^T x    s.t.  Ax ≥ b,  x ≥ 0

QP:  min x^T Q x + c^T x    s.t.  Ax ≥ b,  x ≥ 0

• LP: linear constraints, linear objective function.
• LP can be solved in polynomial time.
• In QP, the objective function contains a quadratic form.
• For positive semidefinite Q, the QP can also be solved in polynomial time.
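The LP case above can be sketched numerically. A minimal example, assuming SciPy is available (the problem data c, A, b here are made up for illustration), solving min c^T x s.t. Ax ≥ b, x ≥ 0:

```python
import numpy as np
from scipy.optimize import linprog

# LP: min c^T x  s.t.  Ax >= b, x >= 0.
# linprog expects A_ub @ x <= b_ub, so negate to express Ax >= b.
c = np.array([1.0, 2.0])
A = np.array([[1.0, 1.0]])   # single constraint: x1 + x2 >= 1
b = np.array([1.0])

res = linprog(c, A_ub=-A, b_ub=-b, bounds=[(0, None), (0, None)])
# optimum puts all weight on the cheaper variable: x = (1, 0), value 1
```

The QP case needs a quadratic solver (e.g. from cvxopt) and is not shown here.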

Page 3:

Margin of separation

• Suppose we find a separating hyperplane (β, β0) s.t.
  – For all +ve points x:  β^T x − β0 ≥ 1
  – For all −ve points x:  β^T x − β0 ≤ −1
• What is the margin of separation?

(Figure: the hyperplanes β^T x − β0 = 0, β^T x − β0 = 1, and β^T x − β0 = −1.)

Page 4:

Separating by a wider margin

• Solutions with a wider margin are better.

Maximize 2/‖β‖,  or equivalently minimize ‖β‖²/2.

Page 5:

Separating via misclassification

• In general, data is not linearly separable.
• What if we also wanted to minimize misclassified points?
• Recall that each sample xi in our training set has a label yi ∈ {−1, 1}.
• For each point i, yi(β^T xi − β0) should be positive.
• Define ξi ≥ max{0, 1 − yi(β^T xi − β0)}.
• If xi is correctly classified (yi(β^T xi − β0) ≥ 1), then ξi = 0.
• If xi is incorrectly classified, or close to the boundary, then ξi > 0.
• We must minimize Σi ξi.
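The slack definition above translates directly into code. A small sketch with toy numbers (not from the slides):

```python
import numpy as np

def slacks(X, y, beta, beta0):
    """xi_i = max(0, 1 - y_i (beta^T x_i - beta_0)) for each sample."""
    margins = y * (X @ beta - beta0)
    return np.maximum(0.0, 1.0 - margins)

X = np.array([[2.0, 0.0],    # well inside the +ve side
              [0.2, 0.0],    # +ve, but inside the margin
              [-2.0, 0.0]])  # well inside the -ve side
y = np.array([1.0, 1.0, -1.0])
beta, beta0 = np.array([1.0, 0.0]), 0.0

xi = slacks(X, y, beta, beta0)
# only the margin-violating point pays a penalty: xi ~ [0, 0.8, 0]
```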

Page 6:

Support Vector machines (wide margin and misclassification)

• Maximize the margin while minimizing misclassification.
• Solved using non-linear optimization techniques.
• The problem can be reformulated to use only dot-products of the variables, which allows us to employ the kernel method.
• This gives a lot of power to the method.

min  ‖β‖²/2 + C Σi ξi

Page 7:

Reformulating the optimization

min  ‖β‖²/2 + C Σi ξi
s.t.
  ξi ≥ 0
  ξi ≥ 1 − yi(β^T xi − β0)

Page 8:

Lagrangian relaxation

• Goal:
    min  ‖β‖²/2 + C Σi ξi
• S.t.
    ξi ≥ 0
    ξi ≥ 1 − yi(β^T xi − β0)
• We minimize the Lagrangian

L = ‖β‖²/2 + C Σi ξi + Σi αi (1 − yi(β^T xi − β0) − ξi) − Σi μi ξi

Page 9:

Simplifying

L = ‖β‖²/2 + C Σi ξi + Σi αi (1 − yi(β^T xi − β0) − ξi) − Σi μi ξi

  = ‖β‖²/2 − β^T Σi αi yi xi + (C Σi ξi − Σi αi ξi − Σi μi ξi) + β0 Σi αi yi + Σi αi

• For fixed α ≥ 0, μ ≥ 0, we minimize the Lagrangian:

  ∂L/∂β = 0   ⇒  β = Σi αi yi xi       (1)
  ∂L/∂β0 = 0  ⇒  Σi αi yi = 0          (2)
  ∂L/∂ξi = 0  ⇒  C − αi − μi = 0       (3)

Page 10:

Substituting

• Substituting (1) into

L = ‖β‖²/2 − β^T Σi αi yi xi + (C Σi ξi − Σi αi ξi − Σi μi ξi) + β0 Σi αi yi + Σi αi

gives

L = −(1/2) Σ_{i,j} αi αj yi yj xi^T xj + (C Σi ξi − Σi αi ξi − Σi μi ξi) + β0 Σi αi yi + Σi αi

Page 11:

• Substituting (2) and (3) into

L = −(1/2) Σ_{i,j} αi αj yi yj xi^T xj + (C Σi ξi − Σi αi ξi − Σi μi ξi) + β0 Σi αi yi + Σi αi

the ξ and β0 terms vanish, leaving the minimization problem

min  (1/2) Σ_{i,j} αi αj yi yj xi^T xj − Σi αi
s.t.
  Σi αi yi = 0
  0 ≤ αi ≤ C

Page 12:

Classification using SVMs

• Under these conditions, the problem is a quadratic programming problem and can be solved using known techniques.
• Quiz: When we have solved this QP, how do we classify a point x?

f(x) = β^T x − β0 = Σi αi yi xi^T x − β0
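As a sanity check on the dual problem, a tiny worked instance (toy data, not from the slides) can be solved by brute force. With x1 = (1,0), y1 = +1 and x2 = (−1,0), y2 = −1, the constraint Σ αi yi = 0 forces α1 = α2 = a, so the dual objective becomes a function of one variable:

```python
import numpy as np

# Toy instance: two symmetric points, one per class; C assumed large.
X = np.array([[1.0, 0.0], [-1.0, 0.0]])
y = np.array([1.0, -1.0])

def dual(a):
    """Dual objective (1/2) sum_ij ai aj yi yj xi.xj - sum_i ai at alpha = (a, a)."""
    alpha = np.array([a, a])
    K = X @ X.T
    return 0.5 * np.sum(np.outer(alpha * y, alpha * y) * K) - alpha.sum()

grid = np.linspace(0.0, 2.0, 2001)
a_star = grid[np.argmin([dual(a) for a in grid])]   # minimizer of 2a^2 - 2a
beta = ((a_star * y)[:, None] * X).sum(axis=0)      # beta = sum_i ai yi xi
# By symmetry beta0 = 0; the margin boundaries beta^T x - beta0 = +/-1
# pass exactly through the two support vectors.
```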

Page 13:

The kernel method

• The SVM formulation can be solved using QP on dot-products.
• As these are wide-margin classifiers, they provide a more robust solution.
• However, the true power of the SVM approach comes from ‘the kernel method’, which allows us to go to higher dimensional (and non-linear) spaces.

Page 14:

Kernel

• Let X be the set of objects.
  – Ex: X = the set of samples in micro-arrays.
  – Each object x ∈ X is a vector of gene expression values.
• k: X × X → R is a positive semidefinite kernel if
  – k is symmetric:           k(x, x') = k(x', x)
  – k is +ve semidefinite:    c^T K c ≥ 0  ∀ c ∈ R^p  (K the kernel matrix on any p objects)

Page 15:

Kernels as dot-product

• Quiz: Suppose the objects x are all real vectors (as in gene expression).
• Define  kL(x, x') = x^T x'.
• Is kL a kernel? It is symmetric, but is it +ve semidefinite?

Page 16:

Linear kernel is +ve semidefinite

• Recall X as a matrix, such that each column is a sample:
  – X = [x1 x2 …]
• By definition, the linear kernel matrix is KL = X^T X.
• For any c:

  c^T KL c = c^T X^T X c = ‖Xc‖² ≥ 0
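The one-line PSD certificate above is easy to verify numerically (random data, purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 8))      # 5 features, 8 samples (one per column)
K = X.T @ X                      # linear kernel matrix, K_ij = xi^T xj

# For any c:  c^T K c = ||Xc||^2 >= 0 -- the certificate from the slide.
c = rng.normal(size=8)
lhs = c @ K @ c
rhs = np.linalg.norm(X @ c) ** 2
```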

Page 17:

Generalizing kernels

• Any object can be represented by a feature vector in real space:

  Φ: X → R^p
  k(x, x') = Φ(x)^T Φ(x')

Page 18:

Generalizing

• Note that the feature mapping Φ could actually be non-linear.
• On the flip side, every kernel can be represented as a dot-product in a high dimensional space.
• Sometimes the kernel is easier to define than the mapping.

Page 19:

The kernel trick

• If an algorithm for vectorial data is expressed exclusively in the form of dot-products, it can be changed to an algorithm on an arbitrary kernel:
  – Simply replace the dot-product by the kernel.

Page 20:

Kernel trick example

• Consider a kernel k defined via a mapping Φ:
  – k(x, x') = Φ(x)^T Φ(x')
• It could be that Φ is very difficult to compute explicitly, but k is easy to compute.
• Suppose we define a distance function between two objects as

  d(x, x')² = ‖Φ(x) − Φ(x')‖²

• How do we compute this distance?

  d(x, x')² = Φ(x)^T Φ(x) + Φ(x')^T Φ(x') − 2 Φ(x)^T Φ(x')
            = k(x, x) + k(x', x') − 2 k(x, x')
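This identity can be exercised with any kernel; a sketch using the Gaussian RBF kernel, where Φ is infinite-dimensional and never materialized:

```python
import numpy as np

def rbf(x, xp, sigma=1.0):
    """Gaussian RBF kernel k(x, x') = exp(-||x - x'||^2 / (2 sigma^2))."""
    return np.exp(-np.sum((x - xp) ** 2) / (2 * sigma ** 2))

def kernel_dist_sq(k, x, xp):
    """||Phi(x) - Phi(x')||^2 from kernel evaluations alone."""
    return k(x, x) + k(xp, xp) - 2 * k(x, xp)

x, xp = np.array([0.0, 0.0]), np.array([3.0, 4.0])
d2 = kernel_dist_sq(rbf, x, xp)
# For RBF, k(x, x) = 1, so d^2 = 2 - 2*exp(-25/2) here (||x - x'||^2 = 25)
```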

Page 21:

Kernels and SVMs

• Recall that SVM based classification is described as

min  (1/2) Σ_{i,j} αi αj yi yj xi^T xj − Σi αi
s.t.
  Σi αi yi = 0
  0 ≤ αi ≤ C

Page 22:

Kernels and SVMs

• Applying the kernel trick: replace xi^T xj by k(xi, xj).
• We can try kernels that are biologically relevant.

min  (1/2) Σ_{i,j} αi αj yi yj k(xi, xj) − Σi αi
s.t.
  Σi αi yi = 0
  0 ≤ αi ≤ C

Page 23:

Examples of kernels for vectors

linear kernel:        kL(x, x') = x^T x'
poly kernel:          kp(x, x') = (x^T x' + c)^d
Gaussian RBF kernel:  kG(x, x') = exp(−‖x − x'‖² / (2σ²))
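All three kernels are one-liners; a sketch with sample parameter values (c = 1, d = 2, σ = 1 are arbitrary choices here):

```python
import numpy as np

def k_linear(x, xp):
    return x @ xp

def k_poly(x, xp, c=1.0, d=2):
    return (x @ xp + c) ** d

def k_rbf(x, xp, sigma=1.0):
    return np.exp(-np.sum((x - xp) ** 2) / (2 * sigma ** 2))

x, xp = np.array([1.0, 2.0]), np.array([3.0, 0.0])
# k_linear = 3, k_poly = (3 + 1)^2 = 16, k_rbf = exp(-8/2) = exp(-4)
```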

Page 24:

String kernel

• Consider a string s = s1 s2 …
• Define an index set I as a subset of (increasing) indices.
• s[I] is the substring limited to those indices.
• l(I) = span of I (last index − first index + 1).
• W(I) = c^{l(I)}, with c < 1:
  – Weight decreases as span increases.
• For any string u of length k, the u-th feature of s is

  Φu(s) = Σ_{I: s[I] = u} c^{l(I)}

Page 25:

String Kernel

• Map every string to a |Σ|^n dimensional space, indexed by all strings u of length up to n:

  Φ: s → (Φu(s))_u

• The mapping is expensive, but given two strings s, t, the dot-product kernel k(s, t) = Φ(s)^T Φ(t) can be computed in O(n |s| |t|) time.
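The definition can be checked directly by brute force. The sketch below enumerates all index sets explicitly, so it is exponential in the string lengths and serves only to illustrate what the O(n |s| |t|) dynamic program computes (c = 0.5 is an arbitrary decay choice):

```python
from itertools import combinations

def subseq_kernel(s, t, k, c=0.5):
    """Brute-force gap-weighted subsequence kernel:
    k(s, t) = sum_u Phi_u(s) * Phi_u(t), where
    Phi_u(s) = sum over index sets I with s[I] = u of c^(span of I)."""
    def phi(s):
        feats = {}
        for I in combinations(range(len(s)), k):
            u = ''.join(s[i] for i in I)
            span = I[-1] - I[0] + 1
            feats[u] = feats.get(u, 0.0) + c ** span
        return feats
    ps, pt = phi(s), phi(t)
    return sum(ps[u] * pt.get(u, 0.0) for u in ps)

# 'ab' has one length-2 subsequence, "ab", with span 2 and weight
# c^2 = 0.25, so subseq_kernel('ab', 'ab', 2) = 0.25^2 = 0.0625
```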

Page 26:

SVM conclusion

• SVMs are a generic scheme for classifying data with wide margins and low misclassification.
• For data that is not easily represented as vectors, the kernel trick provides a standard recipe for classification:
  – Define a meaningful kernel, and solve using SVM.
• Many standard kernels are available (linear, poly., RBF, string).

Page 27:

Classification review

• We started out by treating the classification problem as one of separating points in high dimensional space.
• Obvious for gene expression data, but applicable to any kind of data.
• Question of separability, linear separation.
• Algorithms for classification:
  – Perceptron
  – Lin. Discriminant
  – Max Likelihood
  – Linear Programming
  – SVMs
  – Kernel methods & SVM

Page 28:

Classification review

• Recall that we considered 3 problems:
  – Group together samples in an unsupervised fashion (clustering).
  – Classify based on training data (often by learning a hyperplane that separates).
  – Selection of marker genes that are diagnostic for the class. All other genes can be discarded, leading to lower dimensionality.

Page 29:

Dimensionality reduction

• Many genes have highly correlated expression profiles.
• By discarding some of the genes, we can greatly reduce the dimensionality of the problem.
• There are other, more principled ways to do such dimensionality reduction.

Page 30:

Why is high dimensionality bad?

• With a high enough dimensionality, all points can be linearly separated.
• Recall that a point xi is misclassified if
  – it is +ve, but β^T xi − β0 ≤ 0
  – it is −ve, but β^T xi − β0 > 0
• In the first case, choose δi s.t.
  – β^T xi − β0 + δi ≥ 0
• By adding a dimension for each misclassified point, we create a higher dimensional hyperplane that perfectly separates all of the points!

Page 31:

Principal Components Analysis

• We get the intrinsic dimensionality of a data-set.

Page 32:

Principal Components Analysis

• Consider the expression values of 2 genes over 6 samples.

• Clearly, the expression of the two genes is highly correlated.

• Projecting all the genes on a single line could explain most of the data.

• This is a generalization of “discarding the gene”.

Page 33:

Projecting

• Consider the mean m of all points, and a unit vector φ emanating from the mean.
• Algebraically, projection on φ means that each sample x can be represented by a single value φ^T(x − m).

(Figure: a point x, the mean m, the vector x − m, and its projection φ^T(x − m) onto the line through m in direction φ.)

Page 34:

Higher dimensions

• Consider a set of 2 (generally, k) orthonormal vectors φ1, φ2, …
• Once projected, each sample x can be represented by a 2 (k) dimensional vector:
  – (φ1^T(x − m), φ2^T(x − m), …)

(Figure: the point x, the mean m, and the projections of x − m onto φ1 and φ2.)

Page 35:

How to project

• The generic scheme allows us to project an m dimensional surface into a k dimensional one.
• How do we select the k ‘best’ dimensions?
• The strategy used by PCA is one that maximizes the variance of the projected points around the mean.

Page 36:

PCA

• Suppose all of the data were to be reduced by projecting to a single line through the mean.
• How do we select the line φ?

Page 37:

PCA cont’d

• Let each point xk map to x’k = m + ak φ. We want to minimize the error

  Σk ‖xk − x’k‖²

• Observation 1: Each point xk maps to x’k = m + φ φ^T(xk − m)
  – (i.e., ak = φ^T(xk − m))

Page 38:

Proof of Observation 1

min_{ak} ‖xk − x’k‖²
  = min_{ak} ‖(xk − m) − (x’k − m)‖²
  = min_{ak} ‖xk − m‖² + ‖x’k − m‖² − 2(x’k − m)^T(xk − m)
  = min_{ak} ‖xk − m‖² + ak² φ^T φ − 2 ak φ^T(xk − m)
  = min_{ak} ‖xk − m‖² + ak² − 2 ak φ^T(xk − m)        (φ^T φ = 1)

Differentiating w.r.t. ak:

  2 ak − 2 φ^T(xk − m) = 0   ⇒   ak = φ^T(xk − m)

Substituting ak back:

  ‖xk − x’k‖² = ‖xk − m‖² − φ^T(xk − m)(xk − m)^T φ

Page 39:

Minimizing PCA Error

• Summing over all points:

  Σk ‖xk − x’k‖² = C − φ^T S φ,   where  S = Σk (xk − m)(xk − m)^T
  (C = Σk ‖xk − m‖² does not depend on φ)

• To minimize the error, we must therefore maximize φ^T S φ.
• Setting λ = φ^T S φ, at the maximum over unit vectors we have Sφ = λφ: λ is an eigenvalue, and φ the corresponding eigenvector.
• Therefore, we must choose the eigenvector corresponding to the largest eigenvalue.

Page 40:

PCA steps

• X = starting matrix with n columns (samples) and m rows (features), xj the j-th column.

1. m = (1/n) Σ_{j=1}^n xj
2. h^T = [1 1 … 1]
3. M = X − m h^T
4. S = M M^T = Σ_{j=1}^n (xj − m)(xj − m)^T
5. Diagonalize:  B^T S B = Λ
6. Return B^T M
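The steps above can be sketched directly in NumPy (np.linalg.eigh stands in for step 5's diagonalization; the expression matrix below is a made-up example of two perfectly correlated "genes"):

```python
import numpy as np

def pca(X, k=1):
    """PCA following the slide's recipe; columns of X are samples."""
    n = X.shape[1]
    m = X.mean(axis=1, keepdims=True)         # step 1: mean sample
    M = X - m                                 # steps 2-3: subtract mean
    S = M @ M.T                               # step 4: scatter matrix
    evals, B = np.linalg.eigh(S)              # step 5: diagonalize S
    B = B[:, np.argsort(evals)[::-1][:k]]     # keep top-k eigenvectors
    return B.T @ M                            # step 6: projected data

# Two perfectly correlated genes over 6 samples: a single component
# should capture all of the scatter.
X = np.array([[1.0, 2, 3, 4, 5, 6],
              [2.0, 4, 6, 8, 10, 12]])
Y = pca(X, k=1)
```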

Page 41:

End of Lecture



Page 43:

ALL-AML classification

• The two leukemias need different therapeutic regimens.
• Usually distinguished through hematopathology.
• Can gene expression be used for a more definitive test?
  – 38 bone marrow samples
  – Total mRNA was hybridized against probes for 6817 genes
  – Q: Are these classes separable?

Page 44:

Neighborhood analysis (cont’d)

• Each gene is represented by an expression vector v(g) = (e1,e2,…,en)

• Choose an idealized expression vector as center.

• Discriminating genes will be ‘closer’ to the center (any distance measure can be used).

(Figure: expression vectors; a discriminating gene lies close to the idealized center.)

Page 45:

Neighborhood analysis

• Q: Are there genes whose expression correlates with one of the two classes?
• A: For each class, create an idealized vector c.
  – Compute the number of genes Nc whose expression ‘matches’ the idealized expression vector.
  – Is Nc significantly larger than Nc* for a random c*?

Page 46:

Neighborhood test

• Distance measure used:
  – For any binary vector c, let the one entries denote class 1, and the zero entries denote class 2.
  – Compute the mean and std. dev. [μ1(g), σ1(g)] of expression in class 1, and also [μ2(g), σ2(g)] in class 2.
  – P(g,c) = [μ1(g) − μ2(g)] / [σ1(g) + σ2(g)]
  – N1(c, r) = {g | P(g,c) = r}
  – High density for some r is indicative of correlation with the class distinction.
  – The neighborhood is significant if a random center does not produce the same density.

Page 47:

Neighborhood analysis

• #{g |P(g,c) > 0.3} > 709 (ALL) vs 173 by chance.

• Class prediction should be possible using micro-array expression values.

Page 48:

Class prediction

• Choose a fixed set of informative genes (based on their correlation with the class distinction).
  – The predictor is uniquely defined by the sample and the subset of informative genes.
• For each informative gene g, define (wg, bg):
  – wg = P(g,c)   (When is this +ve?)
  – bg = [μ1(g) + μ2(g)] / 2
• Given a new sample X:
  – xg is the normalized expression value at g.
  – Vote of gene g: wg(xg − bg)   (a +ve value is a vote for class 1, and negative for class 2)
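The weighted-voting predictor can be sketched end-to-end, with P(g,c) as defined on the Neighborhood test slide. The expression matrix below is a made-up toy, not data from the study; `ps` anticipates the prediction strength defined on the next slide:

```python
import numpy as np

def correlation(expr, labels):
    """P(g,c) = (mu1 - mu2) / (sigma1 + sigma2), one value per gene.
    expr: genes x samples; labels: 1 for class 1, 0 for class 2."""
    c1, c2 = expr[:, labels == 1], expr[:, labels == 0]
    return (c1.mean(1) - c2.mean(1)) / (c1.std(1) + c2.std(1))

def predict(expr, labels, x):
    w = correlation(expr, labels)                  # w_g = P(g,c)
    b = 0.5 * (expr[:, labels == 1].mean(1) + expr[:, labels == 0].mean(1))
    votes = w * (x - b)                            # per-gene signed votes
    v1, v2 = votes[votes > 0].sum(), -votes[votes < 0].sum()
    ps = (max(v1, v2) - min(v1, v2)) / (v1 + v2)   # prediction strength
    return (1 if v1 > v2 else 2), ps

# Toy data: 2 genes, 4 samples; both genes high in class 1, low in class 2.
expr = np.array([[5.0, 6.0, 1.0, 2.0],
                 [4.0, 5.0, 0.0, 1.0]])
labels = np.array([1, 1, 0, 0])
cls, ps = predict(expr, labels, np.array([5.5, 4.5]))
# the new sample looks like class 1, with a unanimous vote (ps = 1)
```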

Page 49:

Prediction Strength

• PS = [Vwin − Vlose] / [Vwin + Vlose]
  – Reflects the margin of victory.
• A 50 gene predictor is correct on 36/38 samples (cross-validation).
• Prediction accuracy on other samples: 100% (prediction made for 29/34 samples).
• Median PS = 0.73.
• Other predictors, between 10 and 200 genes, all worked well.

Page 50:

Performance

Page 51:

Differentially expressed genes?

• Do the predictive genes reveal any biology?
• The initial expectation is that most genes would be of a hematopoietic lineage.
• However, many genes encode:
  – Cell cycle progression genes
  – Chromatin remodelling
  – Transcription
  – Known oncogenes
  – Leukemia targets (etoposide)

Page 52:

Relationship between ML and the Golub predictor

• ML classification, when the covariance matrix is diagonal with identical variance across classes, is similar to Golub’s classifier.

  p(x | ωi) = 1 / ((2π)^{d/2} |Σ|^{1/2}) · exp( −(1/2)(x − mi)^T Σ^{-1} (x − mi) )

  gi(x) = ln p(x | ωi) + ln P(ωi);   compute argmax_i gi(x)

  With Σ = diag(σ1², …, σp²):

  gi(x) = −Σ_{j=1}^p (xj − μij)² / (2σj²) + const

  g1(x) − g2(x) = Σ_j [(μ1j − μ2j)/σj²] · [xj − (μ1j + μ2j)/2]

Page 53:

Automatic class discovery

• The classification of different cancers rests on years of hypothesis driven research.
• Suppose you were given unlabeled samples of ALL/AML. Would you be able to distinguish the two classes?

Page 54:

Self Organizing Maps

• SOMs were applied to group the 38 samples.
• Class A1 contained 24/25 ALL and 3/13 AML samples.
• How can we validate this?
• Use the labels to do supervised classification via cross-validation.
• A 20 gene predictor gave 34 accurate predictions, 1 error, and 2 of 3 uncertains.

Page 55:

Comparing various error models

Page 56:

Conclusion

