Date posted: 17-Jul-2015
Category: Data & Analytics
Uploaded by: pragya-pandey
Data mining (knowledge discovery from data)
Extraction of interesting (non-trivial, implicit, previously unknown
and potentially useful) patterns or knowledge from huge amounts of
data.
Alternative names
Knowledge discovery (mining) in databases (KDD), knowledge
extraction, data/pattern analysis, data archeology, data dredging,
information harvesting, business intelligence, etc.
DATA MINING: ELEMENTS | TECHNIQUES | APPLICATIONS
KDD
Data Mining – Core of Knowledge Discovery Process (KDD)
Data Relationships
Sequential Patterns
Clusters
Data Mining Techniques
Decision Trees
Neural Networks
Regression
Association Rules
Nearest Neighbor Method
Genetic Algorithm
Artificial Intelligence
Sequential Patterns
Finding statistically relevant patterns between data
examples where the values are delivered in a sequence.
Problems addressed
Building efficient databases and indexes for sequence
information, extracting frequently occurring patterns,
comparing sequences, recovering missing sequence
members.
Application (retail environment):
Anticipating customer behavior to predict future
customer purchasing habits.
Increasing profit and decreasing cost through proper management of
shelf-space allocation and product display.
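As a rough illustration of "extracting frequently occurring patterns", the sketch below counts how many purchase sequences contain item a somewhere before item b. The function name, the sample purchases and the support threshold are all hypothetical, and this pair counting is a deliberately simplified stand-in for a full sequential pattern mining algorithm.

```python
from collections import Counter
from itertools import combinations


def frequent_ordered_pairs(sequences, min_support):
    """Return ordered pairs (a, b), with a occurring before b,
    that appear in at least min_support sequences."""
    counts = Counter()
    for seq in sequences:
        # combinations() preserves input order, so (a, b) means a came first.
        pairs = {(a, b) for a, b in combinations(seq, 2) if a != b}
        # A set is used so each sequence counts a pair at most once.
        counts.update(pairs)
    return {pair: n for pair, n in counts.items() if n >= min_support}


# Hypothetical purchase histories, one list per customer, in time order.
purchases = [
    ["phone", "case", "charger"],
    ["phone", "charger"],
    ["case", "phone", "charger"],
    ["laptop", "mouse"],
]

result = frequent_ordered_pairs(purchases, min_support=2)
print(result)
```

In a retail setting, a pair such as ("phone", "charger") with high support suggests that customers who buy the first item tend to buy the second one later.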
Sequential Patterns: example
Clusters
Placing data elements into related groups without advance
knowledge of the group definitions.
Popular clustering techniques: K-means, Expectation Maximization (EM)
Problems addressed
Find natural groupings
Preprocess data to identify homogeneous groups on which to build
supervised models
Anomaly detection
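K-means, named above, can be sketched in a few lines of plain Python. This is a minimal version under stated assumptions: the sample points are made up, the initial centroids are simply the first k points (real implementations usually choose them more carefully), and the loop runs for a fixed number of iterations rather than testing for convergence.

```python
import math


def k_means(points, k, iterations=10):
    """Group points into k clusters; returns (centroids, clusters)."""
    # Naive initialisation (an assumption of this sketch): first k points.
    centroids = [points[i] for i in range(k)]
    clusters = []
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster))
            if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters


# Hypothetical 2-D data with two visibly separate groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.0),
          (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]

centroids, clusters = k_means(points, k=2)
print(centroids)
print(clusters)
```

No group definitions are supplied in advance; the two groups emerge only from the distances between points.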
Clusters
Application:
Plant and animal ecology
Make spatial and temporal comparisons of
communities of organisms in heterogeneous environments.
Medical imaging
Differentiate between different types
of tissue and blood in a three-dimensional image.
Business and marketing
Partition the general population of consumers for use
in market segmentation, product positioning, new product
development and selecting test markets.
Decision Trees
In the decision tree technique, the root of the tree is a simple
question or condition that has multiple answers.
Each answer then leads to a further set of questions or conditions that
narrow down the data, so that a final decision can be made.
For example, the following decision tree determines
whether or not to play tennis:
Starting at the root node, if the outlook
is overcast then we should
definitely play tennis.
If it is rainy, we should play
tennis only if the wind is weak.
If it is sunny, we should play
tennis only if the humidity is
normal.
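The tennis rules above translate directly into nested conditions. The sketch below hard-codes the tree as described; the function name and the string labels ("overcast", "rainy", "sunny", "weak", "normal") are assumptions chosen for illustration. A data mining tool would learn such a tree from training data rather than have it written by hand.

```python
def play_tennis(outlook, humidity, wind):
    """Walk the decision tree from the root (outlook) down to a leaf."""
    if outlook == "overcast":
        return True                   # overcast: always play
    if outlook == "rainy":
        return wind == "weak"         # rainy: play only in weak wind
    if outlook == "sunny":
        return humidity == "normal"   # sunny: play only in normal humidity
    raise ValueError(f"unknown outlook: {outlook!r}")


print(play_tennis("overcast", humidity="high", wind="strong"))
print(play_tennis("rainy", humidity="normal", wind="strong"))
print(play_tennis("sunny", humidity="normal", wind="weak"))
```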
Neural Networks
A set of connected input/output units in which each connection has an
associated weight. During the learning phase, the network learns by
adjusting the weights so as to predict the correct class labels
of the input tuples.
Well suited for continuous-valued inputs and outputs.
Used to extract patterns and detect trends that are too complex to be
noticed by humans.
Neural networks are best at identifying patterns or trends in data and
are well suited for prediction or forecasting needs.
Example:
Handwritten character recognition, training a computer to
pronounce English text, and many real-world business problems. Neural
networks have already been applied successfully in many industries.
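The idea of learning by adjusting weights can be shown with a single unit. The sketch below is a one-neuron perceptron, not a full multi-layer network, trained on a toy logical-AND data set; the function names, learning rate and epoch count are assumptions for illustration.

```python
def train_perceptron(samples, epochs=20, learning_rate=1.0):
    """Learn weights and a bias for one unit using the perceptron rule."""
    weights = [0.0, 0.0]
    bias = 0.0
    for _ in range(epochs):
        for inputs, target in samples:
            activation = sum(w * x for w, x in zip(weights, inputs)) + bias
            predicted = 1 if activation > 0 else 0
            error = target - predicted  # 0 when the prediction is correct
            # Adjust each weight in proportion to its input and the error.
            weights = [w + learning_rate * error * x
                       for w, x in zip(weights, inputs)]
            bias += learning_rate * error
    return weights, bias


def predict(weights, bias, inputs):
    """Return the class label (0 or 1) for one input tuple."""
    return 1 if sum(w * x for w, x in zip(weights, inputs)) + bias > 0 else 0


# Toy training set: the class label is 1 only when both inputs are 1.
and_gate = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]

weights, bias = train_perceptron(and_gate)
predictions = [predict(weights, bias, inputs) for inputs, _ in and_gate]
print(weights, bias, predictions)
```

Practical networks stack many such units in layers and use gradient-based training, but the principle is the same: the weights are changed until the predicted labels match the known ones.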
Regression
The regression technique can be adapted for prediction.
Regression analysis can be used to model the relationship between
one or more independent variables and a dependent variable. In data
mining, the independent variables are attributes already known, and the
dependent (response) variable is what we want to predict.
However, simple regression is often inadequate where the response depends
on complex interactions among many variables, as with sales volumes,
stock prices and product failure rates.
Types of regression methods
Linear Regression
Multivariate Linear Regression
Nonlinear Regression
Multivariate Nonlinear Regression
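For the simplest case in the list, linear regression with one independent variable, the line can be fitted by ordinary least squares. The sketch below uses made-up data that lies exactly on y = 2x + 1; the function name and the sample values are assumptions for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance of x and y, and variance of x (both unnormalised).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: xs is the known attribute, ys the response.
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]

slope, intercept = fit_line(xs, ys)
prediction = slope * 6 + intercept  # predicted response for a new x = 6
print(slope, intercept, prediction)
```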
Association Rules
In association, a pattern is discovered based on a relationship
between items in the same transaction.
E.g. Rule form: “Body → Head [support, confidence]”
Application
Retailers use the association technique to research
customers’ buying habits. Based on historical sales data, a retailer might
find that customers often buy crisps when they buy beer, and
can therefore place beer and crisps next to each other to save customers
time and increase sales.
Types
Multilevel association rule
Multidimensional association rule
Quantitative association rule
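The support and confidence in the rule form can be computed by counting transactions. The sketch below assumes the usual definitions: support is the fraction of all transactions that contain both body and head, and confidence is the fraction of transactions containing the body that also contain the head. The basket data and the function name are hypothetical.

```python
def rule_metrics(transactions, body, head):
    """Return (support, confidence) for the rule body -> head."""
    body, head = set(body), set(head)
    n_body = sum(1 for t in transactions if body <= set(t))
    n_both = sum(1 for t in transactions if (body | head) <= set(t))
    support = n_both / len(transactions)
    # Confidence is undefined when the body never occurs; report 0.0 here.
    confidence = n_both / n_body if n_body else 0.0
    return support, confidence


# Hypothetical market baskets, one set of items per transaction.
baskets = [
    {"beer", "crisps"},
    {"beer", "crisps", "bread"},
    {"beer", "milk"},
    {"bread", "milk"},
    {"beer", "crisps", "milk"},
]

support, confidence = rule_metrics(baskets, body={"beer"}, head={"crisps"})
print(f"beer -> crisps [support={support:.2f}, confidence={confidence:.2f}]")
```

With these baskets, three of the five transactions contain both items and three of the four beer transactions contain crisps, so the rule has support 0.60 and confidence 0.75.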
Study of frequent flyer data from an Indian airline
Data selected, prepared: 3 most common sectors flown & points
redeemed for.
(Note: Incomplete/inaccurate data supplied by airlines)
Data mining results:
Patterns about customers flying between metropolitan cities
Customers who flew Mumbai-Delhi also flew other sectors
such as Mumbai-Chennai, Mumbai-Kolkata & Mumbai-Bangalore.
Customers flying Bangalore-Hyderabad also flew Delhi-Bangalore.
Those who flew Bagdogra-Guwahati did not fly back; instead they flew to
Delhi.
Banking information systems contain huge volumes of data, both
operational and historical.
Data mining can assist critical decision-making processes in a bank.
Areas of application:
Marketing
Risk management and default detection
Fraud detection
Customer relationship management
Money laundering detection