
Chapter 5 Data mining : A Closer Look. Data Warehouse and Data Mining Chapter 5 2 Chapter Objectives...

Date post: 23-Dec-2015
Upload: robert-stephens
Chapter 5 Data Mining: A Closer Look
Transcript
Page 1: Chapter 5 Data mining : A Closer Look. Data Warehouse and Data Mining Chapter 5 2 Chapter Objectives  Determine an appropriate data mining strategy for.

Chapter 5

Data Mining: A Closer Look

Page 2: Chapter Objectives

Determine an appropriate data mining strategy for a specific problem.

Know about several data mining techniques and how each technique builds a generalized model to represent data.

Understand how a confusion matrix is used to help evaluate supervised learner models.

Page 3: Chapter Objectives

Understand basic techniques for evaluating supervised learner models with numeric output.

Know how measuring lift can be used to compare the performance of several competing supervised learner models.

Understand basic techniques for evaluating unsupervised learner models.

Page 4: Data Mining Strategies

Classification is probably the best understood of all data mining strategies.

Classification tasks have three common characteristics:

• Learning is supervised.

• The dependent variable is categorical.

• The emphasis is on building models able to assign new instances to one of a set of well-defined classes.

Page 5: Data Mining Strategies

Some example classification tasks include the following:

• Determine those characteristics that differentiate individuals who have suffered a heart attack from those who have not.

• Develop a profile of a “successful” person.

• Determine if a credit card purchase is fraudulent.

• Classify a car loan applicant as a good or a poor credit risk.

• Develop a profile to differentiate female and male stroke victims.

Page 6: Data Mining Strategies

Page 7: Data Mining Strategies

Page 8: Data Mining Strategies

Page 9: Data Mining Strategies

Page 10: Data Mining Strategies

34% are healthy within this maximum heart rate range.

Page 11: Supervised Data Mining Techniques

Page 12: Supervised Data Mining Techniques

Page 13: Supervised Data Mining Techniques

Page 14: Supervised Data Mining Techniques

Page 15: Supervised Data Mining Techniques

Page 16: Association Rules

Page 17: Clustering Techniques

Page 18: Clustering Techniques

Page 19: Evaluating Performance

Page 20: Evaluating Performance

Page 21: Evaluating Performance

Page 22: Evaluating Performance

Page 23: Evaluating Performance

Page 24: Chapter Summary

Data mining strategies include classification, estimation, prediction, unsupervised clustering, and market basket analysis.

Classification and estimation strategies are similar in that each strategy is employed to build models able to generalize current outcome.

However, the output of a classification strategy is categorical, whereas the output of an estimation strategy is numeric.

Page 25: Chapter Summary

A predictive strategy differs from a classification or estimation strategy in that it is used to design models for predicting future outcome rather than current behavior.

Unsupervised clustering strategies are employed to discover hidden concept structures in data as well as to locate atypical data instances.

The purpose of market basket analysis is to find interesting relationships among retail products.

Discovered relationships can be used to design promotions, arrange shelf or catalog items, or develop cross-marketing strategies.
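As a rough illustration of finding relationships among retail products, the sketch below counts how often pairs of items appear in the same basket. The baskets, the item names, and the helper names `pair_counts` and `conditional_frequency` are all hypothetical. The conditional frequency it computes is a common way to score a candidate relationship, but the slides do not name a specific measure, so treat it as an assumption.

```python
from collections import Counter
from itertools import combinations

def pair_counts(baskets):
    """Count how many baskets contain each unordered pair of items."""
    counts = Counter()
    for basket in baskets:
        # Sort so each pair is always stored in the same order.
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    return counts

def conditional_frequency(baskets, given, then):
    """Fraction of baskets containing `given` that also contain `then`."""
    with_given = [b for b in baskets if given in b]
    if not with_given:
        return 0.0
    return sum(1 for b in with_given if then in b) / len(with_given)

# Hypothetical purchase data, for illustration only.
baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"beer", "chips"},
    {"bread", "butter"},
]

counts = pair_counts(baskets)
bread_then_milk = conditional_frequency(baskets, "bread", "milk")
```

Here bread and milk share two of the four baskets, and two of the three baskets that contain bread also contain milk. A pattern like this might suggest a cross-promotion.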

Page 26: Chapter Summary

A data mining technique applies a data mining strategy to a set of data.

Data mining techniques are defined by an algorithm and a knowledge structure.

Common features that distinguish the various techniques are whether learning is supervised or unsupervised and whether their output is categorical or numeric.

Page 27: Chapter Summary

Familiar supervised data mining techniques include decision tree methods, production rule generators, neural networks, and statistical methods.

Association rules are a favorite technique for marketing applications.

Clustering techniques employ some measure of similarity to group instances into disjoint partitions.

Clustering methods are frequently used to help determine a best set of input attributes for building supervised learner models.
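The idea of grouping instances into disjoint partitions by a similarity measure can be sketched as follows. This is a minimal illustration, not a particular clustering algorithm from the chapter: it assumes Euclidean distance as the similarity measure and assumes the cluster centers are already given. The data and the function names are hypothetical.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two numeric instances of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def assign_to_clusters(instances, centers):
    """Place each instance in the cluster whose center is nearest.

    Every instance lands in exactly one cluster, so the result is a
    set of disjoint partitions.
    """
    clusters = [[] for _ in centers]
    for inst in instances:
        nearest = min(range(len(centers)),
                      key=lambda i: euclidean(inst, centers[i]))
        clusters[nearest].append(inst)
    return clusters

# Hypothetical two-attribute instances and assumed cluster centers.
instances = [(1.0, 1.0), (1.5, 2.0), (8.0, 8.0), (9.0, 9.5), (1.0, 0.5)]
centers = [(1.0, 1.0), (9.0, 9.0)]

clusters = assign_to_clusters(instances, centers)
```

A full clustering method would also have to choose and refine the centers. This sketch covers only the assignment step.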

Page 28: Chapter Summary

Performance evaluation is probably the most critical of all the steps in the data mining process.

Supervised model evaluation is often performed using a training/test set scenario.

Supervised models with numeric output can be evaluated by computing average absolute or average squared error differences between computed and desired outcome.
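The error measures for numeric output can be sketched in a few lines. The function names and the sample values are illustrative assumptions. The root mean squared error is included because the chapter's key terms define it as the square root of the mean squared error.

```python
import math

def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted output."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def mean_squared_error(actual, predicted):
    """Average squared difference between actual and predicted output."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

def root_mean_squared_error(actual, predicted):
    """Square root of the mean squared error."""
    return math.sqrt(mean_squared_error(actual, predicted))

# Hypothetical test set outcomes (desired) and model outputs (computed).
actual = [3.0, 5.0, 2.0, 7.0]
predicted = [2.0, 5.0, 4.0, 8.0]

mae = mean_absolute_error(actual, predicted)
mse = mean_squared_error(actual, predicted)
rmse = root_mean_squared_error(actual, predicted)
```

With these values the individual errors are 1, 0, -2, and -1, so the mean absolute error is 1.0 and the mean squared error is 1.5. Squaring penalizes the single large miss more heavily than the absolute measure does.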

Page 29: Chapter Summary

Marketing applications that focus on mass mailings are interested in developing models for increasing response rates to promotions.

A marketing application measures the goodness of a model by its ability to lift response rate thresholds to levels well above those achieved by naïve (mass) mailing strategies.

Unsupervised models support some measure of cluster quality that can be used for evaluative purposes.

Supervised learning can also be employed to evaluate the quality of the clusters formed by an unsupervised model.
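The lift idea from the mass-mailing discussion can be made concrete with a small calculation. It uses the chapter's definition of lift: the probability of a class within a selected sample divided by its probability in the whole population. The counts below are invented for illustration, and the function name `lift` is an assumption.

```python
def lift(sample_hits, sample_size, population_hits, population_size):
    """Ratio of the class rate in a selected sample to the rate in the population."""
    p_sample = sample_hits / sample_size
    p_population = population_hits / population_size
    return p_sample / p_population

# Hypothetical campaign: 200 of 10,000 customers respond to a mass mailing (2%).
# A model selects 1,000 customers, of whom 80 respond (8%).
mailing_lift = lift(sample_hits=80, sample_size=1000,
                    population_hits=200, population_size=10000)
```

In this made-up case the model-selected sample responds at about four times the rate of the naïve mass mailing. A lift near 1 would mean the model offers no advantage.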

Page 30: Key Terms

Classification. A supervised learning strategy where the output attribute is categorical. The emphasis is on building models able to assign new instances to one of a set of well-defined classes.

Association rule. A production rule whose consequent may contain multiple conditions and attribute relationships. An output attribute in one association rule can be an input attribute in another rule.

Confusion matrix. A matrix used to summarize the results of a supervised classification. Entries along the main diagonal represent the total number of correct classifications. Entries other than those on the main diagonal represent classification errors.
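A confusion matrix as defined above can be built directly from paired actual and predicted labels. The sketch below is one possible layout (rows for actual classes, columns for predicted classes), which is an assumption since the slides do not fix an orientation. The labels and function names are hypothetical.

```python
from collections import Counter

def confusion_matrix(actual, predicted, classes):
    """Rows are actual classes, columns are predicted classes."""
    counts = Counter(zip(actual, predicted))
    return [[counts[(a, p)] for p in classes] for a in classes]

def accuracy(matrix):
    """Correct classifications lie on the main diagonal."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total

# Hypothetical test set results for a credit-risk classifier.
actual    = ["good", "good", "poor", "poor", "good", "poor"]
predicted = ["good", "poor", "poor", "poor", "good", "good"]
classes = ["good", "poor"]

matrix = confusion_matrix(actual, predicted, classes)
model_accuracy = accuracy(matrix)
```

For this data the matrix is `[[2, 1], [1, 2]]`: four instances fall on the main diagonal and two off it, so four of the six classifications are correct.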

Page 31: Key Terms

Data mining strategy. An outline of an approach for problem solution.

Data mining technique. One or more algorithms together with an associated knowledge structure.

Dependent variable. A variable whose value is determined by a combination of one or more independent variables.

Estimation. A supervised learning strategy where the output attribute is numeric. Emphasis is on determining current rather than future outcome.

Page 32: Key Terms

Independent variable. An input attribute used for building supervised or unsupervised learner models.

Lift. The probability of class C_i given a sample taken from population P divided by the probability of C_i given the entire population P.

Lift chart. A graph that displays the performance of a data mining model as a function of sample size.

Linear regression. A supervised learning technique that generalizes numeric data as a linear equation. The equation defines the value of an output attribute as a linear sum of weighted input attribute values.
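To illustrate the linear regression definition, here is a minimal sketch for a single input attribute. It assumes the weights are chosen by ordinary least squares, which the slides do not state explicitly. The data and the function names are hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = intercept + slope * x for one input attribute."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def predict(slope, intercept, x):
    """Output attribute as a weighted input value plus a constant term."""
    return intercept + slope * x

# Hypothetical training data lying exactly on y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

slope, intercept = fit_line(xs, ys)
estimate = predict(slope, intercept, 5.0)
```

With several input attributes, the same idea extends to one weight per attribute, giving the linear sum of weighted input values described in the definition.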

Page 33: Key Terms

Market basket analysis. A data mining strategy that attempts to find interesting relationships among retail products.

Mean absolute error. For a set of training or test set instances, the mean absolute error is the average absolute difference between classifier-predicted output and actual output.

Mean squared error. For a set of training or test set instances, the mean squared error is the average of the squared differences between classifier-predicted output and actual output.

Neural network. A set of interconnected nodes designed to imitate the functioning of the human brain.

Page 34: Key Terms

Outliers. Atypical data instances.

Prediction. A supervised learning strategy designed to determine future outcome.

Root mean squared error. The square root of the mean squared error.

Rule Maker. A supervised learner model for generating production rules from data.

Statistical regression. A supervised learning technique that generalizes numerical data as a mathematical equation. The equation defines the value of an output attribute as a sum of weighted input attribute values.
