Date post: | 29-May-2018 |
Upload: | amitvaghela |
8/9/2019 amit vaghela
• Data mining is the process of sorting through large amounts of data and picking out relevant information.
• In other words, data mining is the extraction of interesting patterns or knowledge from huge amounts of data.
• The knowledge discovery process (KDP), also called knowledge discovery in databases, seeks new knowledge in some application domain.
• It is defined as the process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data.
1. Developing an understanding of the application domain:
This step includes learning the relevant prior knowledge and the goals of the end user of the discovered knowledge.
2. Creating a target data set:
This step usually includes querying the existing data to select the desired subset.
3. Data cleaning and preprocessing:
This step consists of dealing with noise and missing values in the data, and accounting for time sequence information and known changes.
4. Data reduction and projection:
This step consists of finding useful attributes by applying dimension reduction and transformation methods.
5. Choosing the data mining task:
Here the data miner matches the goals defined in Step 1 with a particular data mining (DM) method.
6. Choosing the data mining algorithm:
The data miner selects methods to search for patterns in the data and decides which models and parameters of those methods may be appropriate.
7. Data mining:
This step generates patterns in a particular representational form, such as classification rules, decision trees, etc.
8. Interpreting mined patterns:
Here the analyst performs visualization of the extracted patterns and models.
9. Consolidating discovered knowledge:
The final step consists of incorporating the discovered knowledge into the performance system, and documenting and reporting it.
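Steps 2 to 4 above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: the records, the field names (`region`, `age`, `spend`, `channel`), mean imputation for missing values, and dropping constant attributes as a stand-in for dimension reduction are all hypothetical choices, not methods named in the slides.

```python
from statistics import mean, pvariance

# Hypothetical raw records; field names and values are invented for illustration.
records = [
    {"region": "north", "age": 34, "spend": 120.0, "channel": 1},
    {"region": "north", "age": None, "spend": 80.0, "channel": 1},
    {"region": "south", "age": 51, "spend": 200.0, "channel": 1},
    {"region": "north", "age": 46, "spend": None, "channel": 1},
]

def select_target(rows, region):
    """Step 2: query the existing data for the desired subset (copies each row)."""
    return [dict(r) for r in rows if r["region"] == region]

def impute_missing(rows, fields):
    """Step 3: replace missing values with the mean of the observed values."""
    for f in fields:
        fill = mean(r[f] for r in rows if r[f] is not None)
        for r in rows:
            if r[f] is None:
                r[f] = fill
    return rows

def drop_constant(rows, fields):
    """Step 4: a very simple reduction that keeps only attributes that vary."""
    return [f for f in fields if pvariance([r[f] for r in rows]) > 0]

target = select_target(records, "north")
clean = impute_missing(target, ["age", "spend"])
kept = drop_constant(clean, ["age", "spend", "channel"])
print(kept)
```

In this toy data the `channel` attribute takes a single value, so it carries no information and is dropped; real projects would typically use richer cleaning and reduction methods.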
Three Different Facets of the Data Mining Community:
• Client (3):
Mfg & Supplier of Hospital in North America (Innovator).
• Developer (3):
A software firm. Creator of award-winning data mining software.
• Consulting Firm (3):
This firm has earned a position as a niche data mining consulting firm.
Task Domain Identification is fundamental to the effectiveness of all later phases.
• Client:
The time invested in any KD process is indicative of both the direct and opportunity costs of a knowledge-seeking firm, so the starting conditions are important.
• Client representatives prefer to spend additional time early on defining and specifying the scope of each task and its related data requirements.
• Developer:
The data sets initially used or provided by clients may be incomplete or inappropriate.
• Consulting firm:
The consulting firm reviewed the initial data through the use of baseline summary statistics and checked for redundancy. If necessary, cleaning and aggregation techniques were applied.
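A baseline review of this kind might look like the following minimal sketch. The rows, the choice of statistics, and the treatment of "redundancy" as exact duplicate rows are assumptions made for illustration; the slides do not say which statistics or checks the consulting firm actually used.

```python
from statistics import mean, stdev

# Hypothetical client data as (id, value) rows; one row is an exact duplicate.
rows = [
    (1, 10.0),
    (2, 12.0),
    (2, 12.0),
    (3, 14.0),
]

def drop_duplicates(rows):
    """Redundancy check: keep the first occurrence of each row, preserving order."""
    seen = set()
    unique = []
    for row in rows:
        if row not in seen:
            seen.add(row)
            unique.append(row)
    return unique

def summarize(values):
    """Baseline summary statistics for one numeric attribute."""
    return {
        "n": len(values),
        "min": min(values),
        "max": max(values),
        "mean": mean(values),
        "stdev": stdev(values),  # sample standard deviation
    }

unique_rows = drop_duplicates(rows)
summary = summarize([value for _, value in unique_rows])
print(len(rows) - len(unique_rows), "duplicate row(s) removed")
print(summary)
```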
• Once the task domain has been partially structured, the specification and application of effective KD strategies can be considered.
• Data mining techniques can be described as either directed or undirected searches.
• Fully directed techniques require the a priori specification of inputs, outputs, and models.
• Less directed techniques often utilize step-wise and self-organizing approaches.
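The contrast between directed and less directed searches can be illustrated with a small sketch. Both techniques here are swapped in as examples and are not named in the slides: a least-squares line stands for a fully directed technique (inputs, outputs, and model are fixed in advance), and a one-dimensional two-cluster k-means stands for a less directed one (no target output is given). The data values are invented.

```python
from statistics import mean

def fit_line(xs, ys):
    """Directed: inputs (xs), outputs (ys), and the model (a line) are fixed up front."""
    mx, my = mean(xs), mean(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx

def two_means(values, iterations=10):
    """Less directed: 1-D k-means with k=2 groups values without any target output."""
    centers = [min(values), max(values)]  # deterministic starting centers
    groups = [[], []]
    for _ in range(iterations):
        groups = [[], []]
        for v in values:
            nearest = 0 if abs(v - centers[0]) <= abs(v - centers[1]) else 1
            groups[nearest].append(v)
        # Keep the old center if a group ends up empty.
        centers = [mean(g) if g else c for g, c in zip(groups, centers)]
    return groups

slope, intercept = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
low, high = two_means([1.0, 1.2, 0.8, 9.0, 9.5, 10.1])
print(slope, intercept)
print(low, high)
```

In the directed case the analyst decides beforehand what is predicted from what; in the less directed case the grouping emerges from the data alone.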
• Client:
The Client firm suggested that time constraints and a desire to provide manageable results tend to force a streamlining of the analysis whenever possible.
• Consulting firm:
Suggested arriving at an entirely structured reduction approach through an adequate level of understanding of the problem and the available data, because of the time requirements of complex analyses.
• Developer:
The Development firm proposed alternate approaches. The consideration of simple metrics, such as correlations, was combined with the consideration of more complex techniques. They also agreed on the need for frequent data reduction.
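As an example of a simple metric supporting data reduction, the sketch below computes the Pearson correlation between pairs of attributes and flags a nearly redundant one. The attribute names, the values, and the 0.95 threshold are hypothetical; the slides only mention correlations in general terms.

```python
from math import sqrt
from statistics import mean

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical attributes: the same height in two units, plus an unrelated weight.
height_cm = [150.0, 160.0, 170.0, 180.0]
height_in = [59.1, 63.0, 66.9, 70.9]
weight_kg = [70.0, 52.0, 81.0, 60.0]

THRESHOLD = 0.95  # assumed cut-off for calling two attributes redundant

r_redundant = pearson(height_cm, height_in)
r_other = pearson(height_cm, weight_kg)

for name, r in [("height_in", r_redundant), ("weight_kg", r_other)]:
    verdict = "redundant with height_cm" if abs(r) > THRESHOLD else "keep"
    print(f"{name}: r = {r:.3f} ({verdict})")
```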
• The ultimate phase of the knowledge discovery process involved the interpretation of the results provided by analyst-specified algorithmic search.
• Client:
A representative from the Client firm claimed that, as subsequent evaluations and iterations occur, the result is based upon the total available knowledge.
• Development firm:
Development firm representatives insisted that the discovery process was a cycle of trial and error.
• Consulting firm:
As evident from the formality of the alternation schemes proposed by the Consulting and Development firms, the relevance of such issues seems apparent. As such, the timing of such alternations may have a profound impact on the efficacy of the process.