Data Mining Concepts

Date post:24-Feb-2016


Introduction to Directed Data Mining: Decision Trees

Prepared by David Douglas, University of Arkansas
Hosted by the University of Arkansas
Microsoft Enterprise Consortium
J Kreie, New Mexico State University

Decision Trees

A decision tree is a structure that can be used to divide a large collection of records into successively smaller sets of records by applying a sequence of simple decision rules. (Berry and Linoff)

It consists of a set of rules for dividing a large heterogeneous population into smaller and smaller homogeneous groups based on a target variable.

A decision tree is a tree-structured plan of a set of attributes to test in order to predict the output. (Andrew Moore)

The target variable is usually categorical.

Uses of Decision Trees

Decision trees are popular for both classification and prediction (supervised/directed).

They are attractive largely because decision trees represent rules, which can be expressed in both English and SQL.
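As an illustration of rules expressed in both English and SQL, the sketch below renders one root-to-leaf path both ways. The field names, thresholds, outcome text, and table name are hypothetical and are not taken from the slides.

```python
# One root-to-leaf path, written as (field, operator, value) tests.
# Field names, thresholds, and the table name are illustrative assumptions.
rule = [("age", ">=", 35), ("num_purchases", ">", 2)]


def to_english(conditions, outcome):
    """Render the path as an IF ... THEN rule in plain English."""
    parts = [f"{field} {op} {value}" for field, op, value in conditions]
    return "IF " + " AND ".join(parts) + f" THEN {outcome}"


def to_sql(conditions, table):
    """Render the same path as a SQL query selecting the leaf's records."""
    where = " AND ".join(f"{field} {op} {value}" for field, op, value in conditions)
    return f"SELECT * FROM {table} WHERE {where}"


english = to_english(rule, "likely to respond")
sql = to_sql(rule, "customers")
print(english)
print(sql)
```

Each leaf of a tree corresponds to one such rule, so a whole tree can be read as a list of rules or run as a set of queries.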

They can also be used for data exploration, and are thus a powerful first step in model building.

Example Decision Tree

Note that this is a binary tree: each record is classified as likely to respond or not. Leaf nodes with 1 are likely to respond. There are rules for getting from the root node to a leaf node.

(Figure adapted from Berry and Linoff)

Scoring

Binary classifications throw away useful information.

Thus, the use of scores and probabilities is essential.

Decision Tree with Proportions
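A minimal sketch of why proportions carry more information than a binary label: two leaves can both be classified 1 while having very different shares of responders. The leaf names and counts below are made up for illustration, and the 0.5 cutoff is an assumption.

```python
# Hypothetical leaf counts: (responders, non_responders) among training records.
leaves = {"leaf_a": (90, 10), "leaf_b": (55, 45), "leaf_c": (5, 95)}


def leaf_score(responders, non_responders):
    """Proportion of responders among the records that reach the leaf."""
    return responders / (responders + non_responders)


# Score: the proportion itself. Label: 1 if the proportion is at least 0.5.
scores = {name: leaf_score(*counts) for name, counts in leaves.items()}
labels = {name: int(score >= 0.5) for name, score in scores.items()}

for name in leaves:
    print(name, labels[name], round(scores[name], 2))
```

Here leaf_a and leaf_b receive the same label, but the scores show that records in leaf_a are far more likely to respond, which matters when ranking records.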

(Figure adapted from Berry and Linoff)

Some DM tools produce trees with more than 2 splits

(Figure adapted from Berry and Linoff)

Estimation

Although decision trees can be used to estimate continuous values, there are better ways to do it. So, there are currently no plans to use decision trees for estimation in our discussions.

Multiple linear regression and neural networks will be used for estimation.

Finding the Splits

A decision tree is built by splitting records at each node based on a single input field; thus there has to be a way to identify the input field that makes the best split in terms of the target variable.

The measure used to evaluate a split is purity: Gini, entropy, information gain, and chi-square for categorical target variables; variance reduction and the F test for continuous target variables.
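Two of these measures can be sketched for a categorical target as follows. This uses the common impurity form of Gini (1 minus the sum of squared class proportions, so lower means purer), which may differ from the convention in a given tool; the function names and the toy labels are illustrative.

```python
from collections import Counter
from math import log2


def gini_impurity(labels):
    """1 - sum of squared class proportions; 0.0 means a pure node."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def entropy(labels):
    """Shannon entropy in bits; 0.0 means a pure node."""
    n = len(labels)
    return -sum((count / n) * log2(count / n) for count in Counter(labels).values())


def information_gain(parent, children):
    """Parent entropy minus the size-weighted entropy of the children."""
    n = len(parent)
    weighted = sum(len(child) / n * entropy(child) for child in children)
    return entropy(parent) - weighted


parent = [1, 1, 0, 0]          # evenly mixed node
children = [[1, 1], [0, 0]]    # a split that separates the classes perfectly

print(gini_impurity(parent))
print(entropy(parent))
print(information_gain(parent, children))
```

A split that yields purer children gives a larger information gain, which is how candidate splits are compared.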

Tree-building algorithms are exhaustive: they try each variable to determine the best one on which to split (the one giving the largest increase in purity). They are also recursive, because the process repeats itself on the children.

Splitting on a Numeric Variable

A binary split on a numeric input considers each value of the input variable.

Takes the form of X
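The exhaustive search over numeric inputs can be sketched as follows. The comparison used here, value < threshold, is an assumption, as are the field names, the toy records, and the choice of weighted Gini impurity as the purity measure.

```python
from collections import Counter


def gini_impurity(labels):
    """1 - sum of squared class proportions; 0.0 means a pure node."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(rows, targets):
    """Try every field and every observed value of that field as a threshold.

    rows is a list of dicts of numeric inputs. The test `value < threshold`
    is an assumed split form. Returns (field, threshold, weighted_impurity),
    or None if no candidate separates the records into two non-empty groups.
    """
    best = None
    n = len(rows)
    for field in rows[0]:
        for threshold in sorted({row[field] for row in rows}):
            left = [t for row, t in zip(rows, targets) if row[field] < threshold]
            right = [t for row, t in zip(rows, targets) if row[field] >= threshold]
            if not left or not right:
                continue  # this threshold does not split the records
            score = (len(left) * gini_impurity(left)
                     + len(right) * gini_impurity(right)) / n
            if best is None or score < best[2]:
                best = (field, threshold, score)
    return best


# Hypothetical records and a binary target.
rows = [
    {"age": 25, "income": 40}, {"age": 32, "income": 90},
    {"age": 47, "income": 45}, {"age": 51, "income": 85},
]
targets = [0, 0, 1, 1]

result = best_split(rows, targets)
print(result)
```

A full tree builder would apply the same search again to the records on each side of the chosen split, stopping when a node is pure or too small to split.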
