Post on 31-Jan-2018

Data Mining

Dr. Saed Sayad

University of Toronto

2010

saed.sayad@utoronto.ca

http://chem-eng.utoronto.ca/~datamining/

Data Mining

Data mining is about explaining the past and predicting the future by means of data analysis.

Data Mining

[Figure: Data Mining shown alongside its related fields: Statistics, AI & Machine Learning, and Database & DW]

Data Mining Applications

[Bar chart, horizontal axis from 0 to 60; bar values are not recoverable. Application areas shown:]

Gambling, Entertainment / Music, Investment / Stocks, Junk email / Anti-spam, Security / Anti-terrorism, Travel / Hospitality, Web, Biotech / Genomics, e-Commerce, Other, Government applications, Medical / Pharma, Health care / HR, Science, Manufacturing, Telecom, Insurance, Retail, Fraud Detection, Direct Marketing / Fundraising, Credit Scoring, Banking, CRM

Source: KDnuggets.com

Data mining activity in 2007 compared to 2006

much higher: 20%
somewhat higher: 30%
about the same: 41%
somewhat lower: 4%
much lower: 5%

Source: KDnuggets.com

Data Mining Steps

1. Problem Definition
2. Data Preparation
3. Data Exploration
4. Modeling
5. Evaluation
6. Deployment

CRISP-DM Process Model

CRoss-Industry Standard Process for Data Mining

Source: http://www.crisp-dm.org/Process/index.htm

1. Problem Definition

Understanding the project objectives and requirements from a business perspective and then converting this knowledge into a data mining problem definition with a preliminary plan designed to achieve the objectives.

Source: http://www.crisp-dm.org/Process/index.htm

2. Data Preparation

[Diagram: Data (Text) and Data (DSN) feed an ETL step that produces the Modeling Data]

3. Data Exploration

Univariate Analysis
    Average, StDev, Min, Max, ...
    Charts: Bar, Line, Pie, ...

Bivariate Analysis
    Correlation, Z test, ...
    Combination Charts
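The univariate statistics and the bivariate correlation named above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: the `ages` and `amounts` values are invented, and the helper names `univariate_summary` and `pearson_correlation` are hypothetical.

```python
import math
import statistics

# Hypothetical sample data, invented purely for illustration.
ages = [23, 35, 41, 52, 60]
amounts = [120.0, 200.0, 260.0, 310.0, 400.0]


def univariate_summary(values):
    """Return the basic univariate statistics: average, StDev, min, max."""
    return {
        "average": statistics.mean(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
        "min": min(values),
        "max": max(values),
    }


def pearson_correlation(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


summary = univariate_summary(ages)        # one variable at a time
r = pearson_correlation(ages, amounts)    # relationship between two variables
```

A correlation close to 1 or -1 suggests a strong linear relationship between the two variables; a value near 0 suggests little linear relationship.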

Data Exploration - Univariate

Data Exploration - Bivariate

4. Modeling

Classification: Bayesian, Decision Tree, Logistic Regression, SVM

Regression: Linear Regression, Robust Regression, Neural Network

Clustering: Hierarchical, K-Means

Association: Apriori

Data Mining: Classification & Regression

Frequency Table: OneR, Bayesian, Decision Tree, Markov Chains, HMM

Covariance Matrix: Linear Regression, LDA (Z Score), PCA/PCR, Logistic Regression, Robust Regression

Similarity Functions: KNN

Neural Networks: Perceptron, Back Propagation, RBF

Others: SVM, GA, Scalable Methods

Modeling - Classification

Age → f → Responder (e.g., Y or N)
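As a rough illustration of a classifier f that maps Age to a Y/N label, the sketch below learns a single age cut-off from labeled examples (a one-rule, decision-stump style model). The training pairs and the helper names `fit_threshold` and `make_classifier` are assumptions made for this example; the slides do not specify which classifier is used.

```python
# Hypothetical training data: (age, responder) pairs invented for illustration.
training = [
    (22, "N"), (25, "N"), (31, "N"),
    (38, "Y"), (45, "Y"), (52, "Y"), (60, "Y"),
]


def fit_threshold(samples):
    """Pick the age cut-off that misclassifies the fewest training samples.

    The learned rule is: predict "Y" when age >= cut-off, otherwise "N".
    """
    best_cut, best_errors = None, len(samples) + 1
    for cut in sorted({age for age, _ in samples}):
        errors = sum(
            1 for age, label in samples
            if ("Y" if age >= cut else "N") != label
        )
        if errors < best_errors:
            best_cut, best_errors = cut, errors
    return best_cut


def make_classifier(cut):
    """Return f(age) -> "Y" or "N" for the given cut-off."""
    return lambda age: "Y" if age >= cut else "N"


cut = fit_threshold(training)
f = make_classifier(cut)
```

With this toy data the rule separates the two classes perfectly; real data rarely does, which is why the evaluation step matters.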

Modeling - Regression

Age → f → Amount Purchased (e.g., $350)
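For regression, f returns a number rather than a class. A minimal sketch, assuming simple linear regression fitted by ordinary least squares, is shown below together with the mean square error of the fit. The data points and the names `fit_line`, `predict`, and `mean_square_error` are hypothetical.

```python
# Hypothetical data: customer age and amount purchased (in dollars).
ages = [20, 30, 40, 50]
amounts = [160.0, 240.0, 360.0, 440.0]


def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(age, slope, intercept):
    """Apply the fitted function f(age) to get a predicted amount."""
    return slope * age + intercept


def mean_square_error(xs, ys, slope, intercept):
    """Average squared difference between actual and predicted values."""
    residuals = [y - predict(x, slope, intercept) for x, y in zip(xs, ys)]
    return sum(r ** 2 for r in residuals) / len(residuals)


slope, intercept = fit_line(ages, amounts)
mse = mean_square_error(ages, amounts, slope, intercept)
```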

Modeling - Clustering

[Figure: scatter plot with axes Age and Income]
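One way to group customers by Age and Income is K-Means, listed earlier under Clustering. The sketch below is a plain-Python version of Lloyd's algorithm with fixed starting centroids; the customer points, the starting centroids, and the helper names are assumptions for illustration only. Income is expressed in thousands of dollars so that both axes have a comparable scale; in practice the features would usually be standardized first.

```python
import math

# Hypothetical customers as (age, income in $1000s), invented for illustration.
points = [(25, 30), (28, 35), (22, 28), (50, 90), (55, 95), (48, 85)]


def assign(points, centroids):
    """Assign every point to the cluster of its nearest centroid."""
    clusters = [[] for _ in centroids]
    for p in points:
        nearest = min(
            range(len(centroids)),
            key=lambda i: math.dist(p, centroids[i]),
        )
        clusters[nearest].append(p)
    return clusters


def mean_point(members):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(coord) / len(members) for coord in zip(*members))


def kmeans(points, centroids, iterations=10):
    """Alternate assignment and centroid update until nothing changes."""
    centroids = list(centroids)
    for _ in range(iterations):
        clusters = assign(points, centroids)
        # Keep the old centroid if a cluster ends up empty.
        updated = [mean_point(m) if m else c for c, m in zip(centroids, clusters)]
        if updated == centroids:
            break
        centroids = updated
    return centroids, assign(points, centroids)


# Starting centroids are chosen by hand here to keep the example deterministic.
centroids, clusters = kmeans(points, [(25, 30), (50, 90)])
```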

Association Rules

Market Basket Analysis
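Market basket analysis scores a rule such as "bread → milk" with support, confidence, and lift. The sketch below computes these three standard measures on a handful of invented transactions; the baskets and function names are assumptions for illustration, and a full algorithm such as Apriori would additionally search for the frequent itemsets.

```python
# Hypothetical shopping baskets, invented for illustration.
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
    {"bread", "milk", "beer"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in the itemset."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Estimated probability of the consequent given the antecedent."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)


def lift(antecedent, consequent, transactions):
    """Confidence relative to the consequent's baseline frequency."""
    return confidence(antecedent, consequent, transactions) / support(
        consequent, transactions
    )


rule_support = support({"bread", "milk"}, transactions)
rule_confidence = confidence({"bread"}, {"milk"}, transactions)
rule_lift = lift({"bread"}, {"milk"}, transactions)
```

A lift below 1, as in this toy data, means the two items appear together slightly less often than they would if they were independent.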

5. Evaluation

Charts and Stats

Variables Contribution

Mean Square Error

Confusion Matrix

K-S Chart

Lift Chart

Gain Chart

Evaluation - Confusion Matrix

Confusion Matrix (CM)

                      Positive Cases    Negative Cases
Predicted Positive    True Positive     False Positive
Predicted Negative    False Negative    True Negative
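The four cells of the confusion matrix can be counted directly from actual and predicted labels. The following is a minimal sketch with invented labels; the function names and the choice of "Y" as the positive class are assumptions for this example.

```python
# Hypothetical actual outcomes and model predictions, invented for illustration.
actual = ["Y", "Y", "Y", "N", "N", "N", "N", "Y"]
predicted = ["Y", "N", "Y", "N", "Y", "N", "N", "Y"]


def confusion_matrix(actual, predicted, positive="Y"):
    """Count true/false positives and negatives for a binary classifier."""
    counts = {"TP": 0, "FP": 0, "FN": 0, "TN": 0}
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            counts["TP"] += 1   # predicted positive, actually positive
        elif p == positive:
            counts["FP"] += 1   # predicted positive, actually negative
        elif a == positive:
            counts["FN"] += 1   # predicted negative, actually positive
        else:
            counts["TN"] += 1   # predicted negative, actually negative
    return counts


def accuracy(cm):
    """Share of all cases that were classified correctly."""
    return (cm["TP"] + cm["TN"]) / sum(cm.values())


cm = confusion_matrix(actual, predicted)
acc = accuracy(cm)
```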

Evaluation - Gain Chart

[Gain chart: axes Population% and Responder%; tick labels 10%, 50%, 100% and 10%, 45%, 100%]
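A gain chart plots, for the top-scoring share of the population, the share of all responders captured so far. The sketch below computes those points from model scores; the scores, the response flags, and the function name `cumulative_gains` are invented for illustration and assume that at least one customer responded.

```python
# Hypothetical model scores and actual responses (1 = responded), invented
# for illustration.
scores = [0.10, 0.95, 0.40, 0.85, 0.70, 0.20, 0.55, 0.30, 0.80, 0.05]
responded = [0, 1, 1, 1, 1, 0, 0, 0, 0, 0]


def cumulative_gains(scores, responded, steps=10):
    """Return (population fraction, responder fraction) points of a gain chart."""
    # Rank customers from highest to lowest score, keeping only the outcome.
    ranked = [
        r for _, r in sorted(
            zip(scores, responded), key=lambda pair: pair[0], reverse=True
        )
    ]
    total_responders = sum(ranked)
    if total_responders == 0:
        raise ValueError("at least one responder is required")
    points = []
    for step in range(1, steps + 1):
        cutoff = round(len(ranked) * step / steps)
        captured = sum(ranked[:cutoff])
        points.append((step / steps, captured / total_responders))
    return points


gains = cumulative_gains(scores, responded, steps=5)
```

A model with no predictive power would follow the diagonal, capturing 20% of responders in the top 20% of the population; the further the curve rises above that diagonal, the better the ranking.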

6. Deployment

SQL, VB, Java, HTML

Data Mining Team

Modeler

Analyst

DBA

Domain Expert

Data Mining Software Vendors

SAS

KXEN

KNIME

Angoss

SPSS

Case Study...
