CCB-681: Data Mining
Unit 1
Basics of data mining, Knowledge Discovery in
Databases (KDD), the KDD process, data mining task primitives,
integration of data mining systems with a database or data
warehouse system, major issues in data mining, data
preprocessing: data cleaning, data integration and
transformation, data reduction, etc.
Data Mining is defined as extracting information from huge
sets of data. In other words, we can say that data mining is the
procedure of mining knowledge from data. Mined knowledge
can be used for any of the following applications −
Market Analysis
Fraud Detection
Customer Retention
Production Control
Science Exploration
Why Data Mining
Credit ratings/targeted marketing:
Given a database of 100,000 names, which persons
are the least likely to default on their credit cards?
Identify likely responders to sales promotions.
Fraud detection:
Which types of transactions are likely to be fraudulent,
given the demographics and transactional history of a
particular customer?
Customer relationship management:
Which of my customers are likely to be the most
loyal, and which are most likely to leave for a
competitor?
Data mining helps to extract such information.
Today’s Scenario: The Explosive Growth of Data
The solution is data mining: the automated analysis of
massive data sets.
What Is Data Mining?
Data mining (knowledge discovery from data)
Extraction of interesting patterns or knowledge from huge
amounts of data, where "interesting" means non-trivial
(significant), implicit (hidden), previously unknown, and
potentially useful.
Alternative name
Data mining is the analysis step of the "knowledge
discovery in databases" process or KDD.
Data mining
Process of semi-automatically analyzing large databases
to find patterns that are:
valid: they hold on new data with some degree of certainty.
novel: they are non-obvious to the system (new).
useful: it should be possible to act on them.
understandable: humans should be able to interpret
the pattern.
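As a rough illustration of the "valid" criterion, the Python sketch below checks whether a rule found in one set of transactions still holds on new transactions. The rule (bread implies butter), the transactions, and the 0.6 threshold are made-up assumptions for illustration only; this is one simple way to test validity, not a prescribed method.

```python
def confidence(transactions, antecedent, consequent):
    """Share of transactions containing the antecedent that also contain the consequent."""
    matching = [t for t in transactions if antecedent <= t]
    if not matching:
        return 0.0
    return sum(1 for t in matching if consequent <= t) / len(matching)


# Hypothetical market-basket data: the rule was found on `old_data`
# and is re-checked on `new_data`.
old_data = [{"bread", "butter"}, {"bread", "butter", "milk"}, {"bread"}, {"milk"}]
new_data = [{"bread", "butter"}, {"bread", "milk"}, {"bread", "butter", "jam"}, {"jam"}]

antecedent, consequent = {"bread"}, {"butter"}
MIN_CONFIDENCE = 0.6  # assumed threshold, chosen only for this example

old_conf = confidence(old_data, antecedent, consequent)
new_conf = confidence(new_data, antecedent, consequent)

# The pattern counts as valid here if it still meets the threshold on new data.
is_valid = new_conf >= MIN_CONFIDENCE
print(f"old: {old_conf:.2f}, new: {new_conf:.2f}, valid: {is_valid}")
```

A pattern that scores well on the old data but falls below the threshold on the new data would fail the validity test, even if it looked interesting at first.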
The actual data mining task is the semi-automatic or
automatic analysis of large quantities of data to extract
previously unknown, interesting patterns such as
groups of data records (cluster analysis), unusual
records (anomaly detection), and dependencies
(association rule mining, sequential pattern mining).
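To make one of these tasks concrete, the sketch below shows a very simple form of anomaly detection: flagging values that lie far from the mean, measured in standard deviations (a z-score). The transaction amounts and the threshold of 2.0 are assumptions chosen for the example; practical systems usually rely on more robust methods.

```python
import statistics


def find_unusual(values, threshold=2.0):
    """Return the values whose z-score exceeds the threshold."""
    mean = statistics.mean(values)
    spread = statistics.pstdev(values)  # population standard deviation
    if spread == 0:
        return []  # all values are identical, so nothing stands out
    return [v for v in values if abs(v - mean) / spread > threshold]


# Hypothetical transaction amounts with one unusual record.
amounts = [10, 12, 11, 13, 12, 11, 95]
unusual = find_unusual(amounts)
print(unusual)
```

Cluster analysis and association rule mining follow the same idea of searching the data automatically, but look for groups of similar records and for items that occur together, respectively.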
Is everything “data mining”?
Data mining is the process of discovering patterns
in large data sets involving methods at the
intersection of machine learning, statistics, and
database systems. ... Data mining is the analysis
step of the "knowledge discovery in databases"
process or KDD.
Knowledge discovery is an iterative process
Data Mining: A KDD Process
Data mining: the core of the
knowledge discovery process
Figure: stages of the KDD process. Databases → Data Cleaning and
Data Integration → Data Warehouse → Selection → Task-relevant Data →
Data Mining → Pattern Evaluation.
The KDD process
The main objective of the KDD process is to extract information
from data in the context of large databases.
Knowledge Discovery in Databases is regarded as an automated,
exploratory analysis and modeling of vast data repositories.
KDD is the organized process of identifying valid, useful, and
understandable patterns in huge and complex data sets.
Data mining is the core of the KDD process. It comprises the
algorithms that explore the data, develop the model,
and discover previously unknown patterns.
The model is used to extract knowledge from the data,
analyze the data, and make predictions.
The knowledge discovery process is iterative and
interactive, and comprises nine steps. It is iterative
at each stage, meaning that moving back to earlier
steps might be required.
The process begins with determining the KDD objectives
and ends with the implementation of the discovered
knowledge. At that point, the loop is closed.
List of steps involved in the knowledge
discovery process −
Data Cleaning − In this step, noise and inconsistent
data are removed.
Data Integration − In this step, multiple data sources are
combined.
Data Selection − In this step, data relevant to the analysis
task are retrieved from the database.
Data Transformation − In this step, data is transformed or
consolidated into forms appropriate for mining by
performing summary or aggregation operations.
Data Mining − In this step, intelligent methods are
applied in order to extract data patterns.
Pattern Evaluation − In this step, data patterns are
evaluated to identify those that are truly interesting.
Knowledge Presentation − In this step, the discovered
knowledge is presented to the user.
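The steps listed above can be sketched as a small Python pipeline, with one function per step. Everything here is a hypothetical illustration: the two store data sets, the field names, the pair-counting "mining" method, and the 0.5 support threshold are assumptions, and a real KDD project would use far richer techniques at each stage.

```python
from collections import Counter
from itertools import combinations

# Hypothetical raw data from two sources (all values are invented).
store_a = [
    {"customer": "c1", "items": "bread,butter"},
    {"customer": "c2", "items": "bread,butter,milk"},
    {"customer": "c3", "items": None},            # noisy: items missing
]
store_b = [
    {"customer": "c4", "items": "bread,milk"},
    {"customer": "c5", "items": "bread,butter"},
    {"customer": "c5", "items": "bread,butter"},  # duplicated record
]


def clean(records):
    """Data cleaning: drop records with missing items and exact duplicates."""
    seen, kept = set(), []
    for r in records:
        key = (r["customer"], r["items"])
        if r["items"] and key not in seen:
            seen.add(key)
            kept.append(r)
    return kept


def integrate(*sources):
    """Data integration: combine several sources into one collection."""
    return [r for source in sources for r in source]


def select(records):
    """Data selection: keep only the field relevant to the analysis task."""
    return [r["items"] for r in records]


def transform(item_strings):
    """Data transformation: turn comma-separated strings into item sets."""
    return [frozenset(s.split(",")) for s in item_strings]


def mine(transactions):
    """Data mining: count how often each pair of items occurs together."""
    counts = Counter()
    for t in transactions:
        counts.update(combinations(sorted(t), 2))
    return counts


def evaluate(pair_counts, n_transactions, min_support=0.5):
    """Pattern evaluation: keep pairs whose support meets the threshold."""
    supports = {p: c / n_transactions for p, c in pair_counts.items()}
    return {p: s for p, s in supports.items() if s >= min_support}


def present(patterns):
    """Knowledge presentation: format the surviving patterns for a reader."""
    return [f"{a} + {b}: support {s:.2f}" for (a, b), s in sorted(patterns.items())]


records = integrate(clean(store_a), clean(store_b))
transactions = transform(select(records))
report = present(evaluate(mine(transactions), len(transactions)))
for line in report:
    print(line)
```

Because each step is a separate function, it is easy to go back and repeat an earlier step (for example, cleaning with stricter rules) when the evaluation results are unsatisfactory.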