
Get Rich and Cure Cancer with Support Vector Machines

Date post: 24-Jan-2016
Upload: afi
Transcript
Page 1: Get Rich and Cure Cancer with Support Vector Machines

+

Get Rich and Cure Cancer

with Support Vector Machines

(Your Summer Projects)


+Kernel Trick

https://www.youtube.com/watch?v=3liCbRZPrZA


+This is achieved with a polynomial kernel

Feature map:

Kernel:
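The slide's feature-map and kernel formulas did not survive extraction. As an illustration only (assumed here, not necessarily the example in the video), the standard degree-2 polynomial kernel K(x, z) = (x · z)^2 on two-dimensional inputs corresponds to the feature map φ(x1, x2) = (x1^2, √2·x1·x2, x2^2). A minimal Python check that both routes give the same inner product; `phi` and `poly_kernel` are hypothetical names:

```python
import math

def phi(x):
    """Explicit degree-2 feature map from R^2 to R^3 (illustrative assumption)."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)

def dot(u, v):
    """Ordinary Euclidean inner product."""
    return sum(a * b for a, b in zip(u, v))

def poly_kernel(x, z):
    """Homogeneous polynomial kernel of degree 2: K(x, z) = (x . z)^2."""
    return dot(x, z) ** 2

x, z = (1.0, 2.0), (3.0, -1.0)
print(dot(phi(x), phi(z)))  # inner product computed in the 3-D feature space
print(poly_kernel(x, z))    # same value, computed without ever forming phi
```

The kernel route needs only a 2-D dot product, while the explicit route builds 3-D vectors first; for higher degrees and input dimensions that gap grows quickly.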


+Optimization of transformed problem: Only kernel matters

Dual Lagrangian for transformed problem:

Optimal weight vector:

Thus, optimal hyperplane:
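The three formulas on this slide are missing from the transcript. For reference, a standard form for the hard-margin SVM with feature map φ and kernel K(x, z) = φ(x) · φ(z) is sketched below; the notation (multipliers α_i, training points x_i with labels y_i ∈ {−1, +1}, m training points, offset b) is assumed rather than taken from the slide. The soft-margin version only adds the upper bound α_i ≤ C.

```latex
L_D(\alpha) = \sum_{i=1}^{m} \alpha_i
  - \frac{1}{2} \sum_{i=1}^{m} \sum_{j=1}^{m} \alpha_i \alpha_j \, y_i y_j \, K(x_i, x_j),
\qquad \alpha_i \ge 0, \quad \sum_{i=1}^{m} \alpha_i y_i = 0

w^{*} = \sum_{i=1}^{m} \alpha_i^{*} \, y_i \, \phi(x_i)

f(x) = w^{*} \cdot \phi(x) + b^{*} = \sum_{i=1}^{m} \alpha_i^{*} \, y_i \, K(x_i, x) + b^{*} = 0
```

Every occurrence of φ ends up inside an inner product that K replaces, which is the sense in which only the kernel matters.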


+Kernel Trick

We can choose the kernel without first defining a feature map.

How to get a feature map from a kernel?

Define

i.e. map vectors in the original feature space to functions.

Inner product on transformed space:
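The definitions on this slide are missing. The standard construction matching the slide's description (a reproducing-kernel Hilbert space, assumed here) sends each vector x to the function K(·, x) and defines the inner product so that it reproduces the kernel; this is a valid inner product when K is symmetric and positive semidefinite:

```latex
\phi(x) = K(\cdot, x), \qquad \text{so} \quad \phi(x)(z) = K(z, x)

\Big\langle \sum_i a_i \, K(\cdot, x_i), \; \sum_j b_j \, K(\cdot, z_j) \Big\rangle
  = \sum_i \sum_j a_i b_j \, K(x_i, z_j)

\langle \phi(x), \phi(z) \rangle = K(x, z)
```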


+Get rich off of support vectors


+Making 5-day forecasts of financial futures

Given data on the returns for 5 days

Predict the return on the next day

To achieve this, we need to figure out which 5-day stretches tend to predict good returns on the 6th day, and which predict not-so-good returns

A training data set is used for this purpose


+Making 5-day forecasts of financial futures

Day 1  Day 2  Day 3  Day 4  Day 5  |  Day 6
x11    x12    x13    x14    x15    |  y1
x21    x22    x23    x24    x25    |  y2
x31    x32    x33    x34    x35    |  y3
x41    x42    x43    x44    x45    |  y4
…      …      …      …      …      |  y5

5-dimensional feature space; the return on the 6th day supplies the class label for each data point.
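A minimal Python sketch of how the table above could be built from a single series of daily returns, assuming a 5-day sliding window and assuming that a "good" 6th-day return means a positive one (labeled +1, otherwise −1). The function name `make_windows` and the toy returns are hypothetical:

```python
def make_windows(returns, window=5):
    """Turn a daily return series into (features, labels).

    Row i holds the returns for `window` consecutive days; its label is +1
    if the following day's return is positive and -1 otherwise (an assumed
    definition of "good" vs. "not-so-good").
    """
    X, y = [], []
    for start in range(len(returns) - window):
        X.append(returns[start:start + window])
        y.append(1 if returns[start + window] > 0 else -1)
    return X, y

# Toy returns, made up for illustration.
returns = [0.01, -0.02, 0.005, 0.003, -0.01, 0.02, -0.004, 0.007]
X, y = make_windows(returns)
for row, label in zip(X, y):
    print(row, label)
print(y)  # → [1, -1, 1]
```

A series of T daily returns yields T − 5 labeled rows, since the last five days have no 6th day to label them.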

The routine learns to classify 5-day-return data points from a training data set covering 500 days. It constructs a dividing hypersurface and uses it to predict the class of the 6th-day return for new data points.
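The linked post works in R. As a dependency-free stand-in, here is a Python sketch of a linear SVM trained with a Pegasos-style stochastic subgradient method (a swapped-in technique, not the post's code, and linear rather than kernelized). It learns a dividing hyperplane from labeled 5-day windows and classifies a new window by which side it falls on. `train_linear_svm`, `predict`, the regularization parameter `lam`, and the toy windows are all assumptions for illustration:

```python
import random

def dot(u, v):
    """Euclidean inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))

def train_linear_svm(X, y, lam=0.01, epochs=50, seed=0):
    """Pegasos-style stochastic subgradient descent for a linear SVM.

    Labels must be +1 or -1. A constant 1.0 is appended to every point so the
    offset is learned as the last weight (a simplification: it is then
    regularized along with the other weights).
    """
    rng = random.Random(seed)
    data = [(list(x) + [1.0], label) for x, label in zip(X, y)]
    w = [0.0] * len(data[0][0])
    t = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, label in data:
            t += 1
            eta = 1.0 / (lam * t)                      # decreasing step size
            violated = label * dot(w, x) < 1           # inside margin or wrong side
            w = [(1.0 - eta * lam) * wj for wj in w]   # shrink: regularization step
            if violated:
                w = [wj + eta * label * xj for wj, xj in zip(w, x)]  # hinge step
    return w

def predict(w, x):
    """Classify x by which side of the hyperplane w . (x, 1) = 0 it lies on."""
    return 1 if dot(w, list(x) + [1.0]) >= 0 else -1

if __name__ == "__main__":
    # Toy 5-day return windows and labels, made up purely for illustration.
    windows = [
        [0.01, 0.02, 0.01, 0.00, 0.01],
        [-0.01, -0.02, 0.00, -0.01, -0.02],
        [0.02, 0.01, 0.01, 0.02, 0.00],
        [-0.02, -0.01, -0.01, 0.00, -0.01],
    ]
    labels = [1, -1, 1, -1]
    w = train_linear_svm(windows, labels)
    print(predict(w, [0.01, 0.01, 0.02, 0.01, 0.01]))
```

The step size 1/(λt) is tied to the regularization strength, so apart from `lam` and the number of epochs there is little to tune. A kernel SVM would be needed for a curved dividing hypersurface rather than a flat hyperplane.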


+Good results – you can try it yourself!

Complete with R code: http://www.r-bloggers.com/trading-with-support-vector-machines-svm/


+Another example: gene expression in normal and cancerous tissue

Gene = unit of heredity

Human genome contains about 21,000 genes

Public domain image from Wikipedia


+Another example: gene expression in normal and cancerous tissue

DNA is transcribed into RNA, which is translated into proteins

This is the process whereby the “genetic code” is made manifest as biological characteristics (genotype gives rise to phenotype)

Wikimedia Commons image by Madeleine Price Ball


+Big question: Which genes are responsible for which outcomes?

In various tissues (e.g. tumor versus normal), which genes are active, hyperactive, and silent?

Can use DNA microarrays to measure gene expression levels.


+DNA Microarray

https://www.youtube.com/watch?v=_6ZMEZK-alM

Source: National Human Genome Research Institute


+Using support vector machines to determine which genes are important for cancer classification


+Data

Data points: Patients

Features: Gene expression coefficients (activity level of a given gene)

Feature space will have a huge number of dimensions! Need a way to reduce.

Could examine all possible subspaces of feature space, but if the dimension N of feature space runs to thousands of genes, the number of n-dimensional subspaces (choices of n features out of N) is $\binom{N}{n} = \frac{N!}{n!\,(N-n)!}$.

This is far too large to examine each subspace in practice.
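Counting makes the blow-up concrete. This Python sketch assumes the subspaces in question are those spanned by n of the N feature axes and uses a hypothetical N = 1000 genes; `count_subspaces` is a hypothetical name. Summing over every n gives 2^N − 1 nonempty subsets of features.

```python
from math import comb

def count_subspaces(N, n):
    """Number of ways to choose n of the N feature axes, i.e. the number of
    n-dimensional coordinate subspaces of an N-dimensional feature space."""
    return comb(N, n)

N = 1000  # hypothetical number of genes
for n in (2, 5, 10, 50):
    print(f"n = {n:>2}: {count_subspaces(N, n):.3e} subspaces")

# Summing over every dimension n counts all nonempty subsets of features.
total = sum(count_subspaces(N, n) for n in range(1, N + 1))
print(total == 2**N - 1)  # → True
```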


+Generate ranking of features

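A ranking of features allows us to build a nested sequence of subspaces of feature space F, $F_1 \subset F_2 \subset \cdots \subset F$ (where $F_k$ is spanned by the $k$ top-ranked features), and then determine the optimal subspace to work with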

One possibility for ranking: work with each gene individually and compute its correlation coefficient with the class label (i.e., the correlation of the gene's expression level with the classification of the tissue as tumor vs. normal, or into two different types of cancer)

Note: ranking by correlation coefficient assumes all the features are independent of one another.
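A Python sketch of correlation-based ranking, assuming class labels coded as +1/−1 and using the Pearson coefficient (the slide does not say which correlation measure). `rank_by_correlation` and the toy expression matrix (rows are patients, columns are genes) are hypothetical:

```python
import math

def pearson(u, v):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    return cov / (su * sv) if su > 0 and sv > 0 else 0.0

def rank_by_correlation(X, y):
    """Rank feature indices by |correlation with the labels|, most important first."""
    n_features = len(X[0])
    scores = [abs(pearson([row[j] for row in X], y)) for j in range(n_features)]
    return sorted(range(n_features), key=lambda j: scores[j], reverse=True)

# Toy data: 4 patients x 3 genes, labels +1 (tumor) / -1 (normal), made up.
expression = [[5, 3, -2], [6, 1, -1], [1, 3, 1], [2, 1, 2]]
labels = [1, 1, -1, -1]
print(rank_by_correlation(expression, labels))  # → [0, 2, 1]
```

Sorting by absolute value lets a strongly negative correlation (gene 2 here) rank as high as a positive one. Each gene is scored in isolation, which is where the independence caveat comes from.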


+Generate ranking of features

Another possible way to generate a ranking of features: sensitivity analysis.

Have training data set, already classified into two classes (cancerous v. non, or cancer type 1 v. cancer type 2)

Construct a cost function to estimate error in classification

Sensitivity of cost function to removal of a feature measures the importance of that feature and allows the construction of a ranking.
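The slide leaves the cost function open. A Python sketch, assuming for brevity that the cost is the training error of a nearest-centroid classifier and that "removal" means dropping the feature entirely; both choices, the names `nearest_centroid_error` and `rank_by_sensitivity`, and the toy data are hypothetical:

```python
def nearest_centroid_error(X, y, features):
    """Fraction of training points misclassified by a nearest-centroid rule
    that looks only at the listed feature indices (hypothetical cost)."""
    def centroid(label):
        rows = [x for x, lab in zip(X, y) if lab == label]
        return [sum(r[j] for r in rows) / len(rows) for j in features]
    c_pos, c_neg = centroid(1), centroid(-1)
    errors = 0
    for x, label in zip(X, y):
        pt = [x[j] for j in features]
        d_pos = sum((a - b) ** 2 for a, b in zip(pt, c_pos))
        d_neg = sum((a - b) ** 2 for a, b in zip(pt, c_neg))
        guess = 1 if d_pos <= d_neg else -1
        errors += guess != label
    return errors / len(X)

def rank_by_sensitivity(X, y, cost=nearest_centroid_error):
    """Rank features by how much the cost rises when each one is removed."""
    all_features = list(range(len(X[0])))
    base = cost(X, y, all_features)
    sensitivity = {}
    for j in all_features:
        rest = [k for k in all_features if k != j]
        sensitivity[j] = cost(X, y, rest) - base
    order = sorted(all_features, key=lambda j: sensitivity[j], reverse=True)
    return order, sensitivity

# Toy data: feature 0 separates the classes, feature 1 does not.
X = [[2, 0], [3, 1], [-2, 1], [-3, 0]]
y = [1, 1, -1, -1]
print(rank_by_sensitivity(X, y))  # → ([0, 1], {0: 0.5, 1: 0.0})
```

The cost is recomputed once per removed feature, which is what makes this expensive when there are thousands of genes.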


+Ranking by Support Vector Machine Recursive Feature Elimination (SVM-RFE)

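Idea of how to use SVM to identify important features: consider a cartoon scenario.

[Figure: cartoon of two classes of points in the (x1, x2) plane]

The separating line runs parallel to the x1 axis, so the weight vector has no x1 component. This indicates that the x1 direction is completely superfluous for classification.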


+Ranking by Support Vector Machines

This suggests the following recursive algorithm for ranking features:

Find weight vector, using all features

Identify the least important feature to be the one with the smallest (in absolute value) component of the weight vector

List that feature as least important and eliminate it from the data

Iterate the procedure, with the least important feature thrown out.

End result: Ranked list of features!
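The recursive procedure above can be sketched in Python. To keep the example dependency-free, the linear SVM is trained with a Pegasos-style stochastic subgradient method, a swapped-in choice (in practice one would call an SVM library). `train_linear_svm`, `svm_rfe`, and the toy data are hypothetical:

```python
import random

def dot(u, v):
    """Euclidean inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))

def train_linear_svm(X, y, lam=0.01, epochs=50, seed=0):
    """Pegasos-style linear SVM (labels +1/-1); returns the feature weights.

    A constant 1.0 is appended to each point so the offset is learned as an
    extra weight, which is dropped from the returned vector.
    """
    rng = random.Random(seed)
    data = [(list(x) + [1.0], label) for x, label in zip(X, y)]
    w = [0.0] * len(data[0][0])
    t = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, label in data:
            t += 1
            eta = 1.0 / (lam * t)
            violated = label * dot(w, x) < 1
            w = [(1.0 - eta * lam) * wj for wj in w]
            if violated:
                w = [wj + eta * label * xj for wj, xj in zip(w, x)]
    return w[:-1]

def svm_rfe(X, y, **svm_kwargs):
    """Rank features from most to least important by recursive elimination."""
    remaining = list(range(len(X[0])))  # original indices still in play
    eliminated = []                     # filled least-important first
    while remaining:
        # 1. Find the weight vector using the remaining features.
        sub_X = [[row[j] for j in remaining] for row in X]
        w = train_linear_svm(sub_X, y, **svm_kwargs)
        # 2. The least important feature has the smallest |w_j|.
        worst = min(range(len(remaining)), key=lambda k: abs(w[k]))
        # 3. Record it and eliminate it from the data, then iterate.
        eliminated.append(remaining.pop(worst))
    return eliminated[::-1]

# Toy data echoing the cartoon: x2 separates the classes, x1 does not.
X = [[1, 3], [-1, 3], [1, -3], [-1, -3]]
y = [1, 1, -1, -1]
print(svm_rfe(X, y))  # → [1, 0]  (x2 ranked first, x1 last)
```

Retraining after every single elimination follows the slide; with thousands of genes one might drop a batch of the lowest-weight features per round instead, trading some ranking resolution for far fewer SVM fits.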


+Try this at home!

Data is available online!

http://www.broadinstitute.org/software/cprg/?q=node/55

Classify two types of leukemia.
