Mathematical Programming in Data Mining Author: O. L. Mangasarian Advisor: Dr. Hsu Graduate:...

Date post: 05-Jan-2016
Transcript
Page 1: Mathematical Programming in Data Mining Author: O. L. Mangasarian Advisor: Dr. Hsu Graduate: Yan-Cheng Lin.

Mathematical Programming in Data Mining

Author: O. L. Mangasarian
Advisor: Dr. Hsu
Graduate: Yan-Cheng Lin

Page 2:

Abstract

Describes the application of mathematical programming to feature selection, clustering, and robust representation

Page 3:

Outline

Motivation
Objective
Problems
Feature Selection
Clustering
Robust Representation
Conclusion

Page 4:

Motivation

Mathematical programming has been applied to a great variety of theoretical problems

Such problems can be formulated and effectively solved as mathematical programs

Page 5:

Objective

Describe three mathematical-programming-based developments relevant to data mining

Page 6:

Problems

Feature Selection
Clustering
Robust Representation

Page 7:

Problem - Feature Selection

Discriminate between two finite point sets in n-dimensional feature space while using as few of the features as possible

Formulated as a mathematical program with a parametric objective function and linear constraints
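
One way to read the "parametric objective function" is as a weighted sum of two competing terms: how badly a plane separates the two point sets, and how many features the plane uses. The sketch below only evaluates such an objective for a given plane; it does not solve the mathematical program. The weight `lam`, the margin of 1, the plain count of nonzero weights, and all function names are illustrative assumptions, not taken from the slides.

```python
def separation_error(points, w, gamma, side):
    """Average violation of the plane x.w = gamma by the given points.

    side = +1: points are expected to satisfy x.w >= gamma + 1.
    side = -1: points are expected to satisfy x.w <= gamma - 1.
    (The margin of 1 is an assumption made for this sketch.)
    """
    total = 0.0
    for x in points:
        score = sum(xi * wi for xi, wi in zip(x, w))
        total += max(0.0, 1.0 - side * (score - gamma))
    return total / len(points)


def parametric_objective(A, B, w, gamma, lam, tol=1e-9):
    """(1 - lam) * separation error + lam * number of features used."""
    error = separation_error(A, w, gamma, +1) + separation_error(B, w, gamma, -1)
    features_used = sum(1 for wi in w if abs(wi) > tol)
    return (1.0 - lam) * error + lam * features_used


# Hypothetical toy data: both planes separate A from B,
# but the second one ignores the second feature.
A = [(3.0, 0.5), (4.0, -1.0)]
B = [(0.0, 0.2), (-1.0, 1.0)]
two_features = parametric_objective(A, B, w=(1.0, 0.1), gamma=1.5, lam=0.05)
one_feature = parametric_objective(A, B, w=(1.0, 0.0), gamma=1.5, lam=0.05)
```

With equal separation error, the plane that uses fewer features gets the smaller objective value, which is the trade-off the parameter controls.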

Page 8:

Problem - Clustering

Assign m points in the n-dimensional real space R^n to k clusters

Formulated as determining k centers in R^n such that the sum of the distances of each point to its nearest center is minimized

Page 9:

Problem - Robust Representation

Model a system of relations in a manner that preserves the validity of the representation when the data on which the model is based changes

A sufficiently small error τ is purposely tolerated

Page 10:

Feature Selection

Use the simplest model to describe the essence of a phenomenon

Binary classification problem:
- discriminating between two given point sets A and B in the n-dimensional real space R^n by using as few of the n dimensions of the space as possible

Page 11:

Binary classification

[Figure: binary classification, with labels w and P]

Page 12:

Feature Selection

The following are defined:

A

B

Page 13:

Successive Linearization Algorithm

The vector w is the result

Page 14:

Experimentation

32-feature Wisconsin Prognostic Breast Cancer (WPBC)

n = 32, m = 28, k = 118, r = 0.05: 4 features selected, increasing tenfold cross-validation correctness by 35.4%

Page 15:

Clustering

Determine k cluster centers such that the sum of the 1-norm distances of each point in a given database to its nearest cluster center is minimized

Can be posed as minimizing the product of two linear functions on a set defined by linear inequalities
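
The objective above can be approached with a simple alternating scheme: assign each point to its nearest center in the 1-norm, then move each center to the coordinate-wise median of its points, which minimizes the summed 1-norm distance within a cluster. The sketch below is a minimal illustration under that assumption; the function names, the stopping rule, and the toy data are hypothetical, and the slides' own algorithm statement did not survive extraction.

```python
from statistics import median


def one_norm(p, q):
    """1-norm distance between two points."""
    return sum(abs(pi - qi) for pi, qi in zip(p, q))


def k_median(points, centers, max_iter=100):
    """Alternate assignment and median updates until the centers stop moving."""
    centers = [tuple(c) for c in centers]
    clusters = [[] for _ in centers]
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest center (1-norm).
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda j: one_norm(p, centers[j]))
            clusters[nearest].append(p)
        # Update step: coordinate-wise median of each non-empty cluster;
        # an empty cluster keeps its previous center.
        new_centers = [
            tuple(median(coord) for coord in zip(*cluster)) if cluster else center
            for cluster, center in zip(clusters, centers)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return centers, clusters


# Hypothetical toy data: two well-separated groups in R^2.
points = [(0, 0), (1, 0), (0, 1), (10, 10), (11, 10), (10, 11)]
centers, clusters = k_median(points, centers=[(1, 0), (11, 10)])
```

Like other alternating schemes, this only finds a local solution, and the result depends on the initial centers.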

Page 16:

K-Median Algorithm

Need to solve

Page 17:

Experimentation

Used as a KDD tool to mine WPBC to discover medical knowledge

Key observation: the curves are well separated

Page 18:

Experimentation

Page 19:

Robust Representation

The model remains valid under a class of data perturbations

Use a τ-tolerance zone wherein errors are disregarded

Better generalization results than the conventional zero-tolerance approach

Page 20:

Robust Representation

A is an m×n matrix, a is an m×1 vector
x is a vector to be "learned"
Find the x that minimizes the error Ax - a
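
To make the tolerance idea concrete, the sketch below measures the error of a candidate x while disregarding any residual component that falls inside the tolerance zone. Using the 1-norm of the residual, the name `tau` for the tolerance, and the function name are assumptions for illustration; the sketch evaluates the error and does not solve the minimization.

```python
def tolerant_error(A, x, a, tau):
    """Sum over rows of max(|A x - a| - tau, 0).

    Residuals no larger than tau contribute nothing; larger ones
    contribute only the part that exceeds tau.
    """
    total = 0.0
    for row, target in zip(A, a):
        residual = sum(r * xi for r, xi in zip(row, x)) - target
        total += max(0.0, abs(residual) - tau)
    return total


# Hypothetical toy system: the third relation is off by 0.5.
A = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
a = [1.0, 2.0, 3.5]
x = [1.0, 2.0]
zero_tolerance = tolerant_error(A, x, a, tau=0.0)
with_tolerance = tolerant_error(A, x, a, tau=0.5)
```

With zero tolerance the small discrepancy in the third relation counts as error; with a tolerance of 0.5 the same x is treated as an exact fit.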

Page 21:

Robust Representation

[Figure: the system Ax = a, shown with a τ-tolerance and without]

Page 22:

Conclusion

Mathematical programming codes are reliable and robust

The problems solved demonstrate that mathematical programming is a versatile and effective tool for solving important problems in data mining and knowledge discovery in databases

Page 23:

Opinion

A mathematical description can explain complex problems and convince others, but ... you must understand it first

