Date post: | 06-Apr-2018 |
Category: |
Documents |
Upload: | karthik-jalla |
8/3/2019 Ppt(15 Slides)
http://slidepdf.com/reader/full/ppt15-slides 1/18
Optimized Distributed Data Mining
Introduction
With the explosive growth of information sources available on the World Wide Web, it has become increasingly necessary for users to utilize automated tools to find the desired information resources and to track and analyze their usage patterns.
Applications of Data Mining:
Data mining tools predict future trends and behaviors, allowing businesses to make proactive, knowledge-driven decisions.
Data mining tools can answer business questions that traditionally were too time-consuming to resolve.
Data mining techniques can be implemented rapidly on existing software and hardware platforms to enhance the value of existing information resources, and can be integrated with new products and systems as they are brought online.
What do you mean by data mining?
The process of extracting valid, previously unknown, comprehensible, and actionable information from large databases.
The extraction of hidden predictive information from large databases.
Brief description:
ODAM is a distributed algorithm for geographically distributed data sets that reduces communication costs.
Distributed Association Rule Mining (D-ARM) algorithms have been developed to mine patterns across distributed databases.
Existing D-ARM algorithms cannot discover rules based on higher-order associations between items in distributed textual documents.
ARM (Association Rule Mining)
Association rule mining is an active data mining research area.
Most ARM algorithms focus on a sequential or centralized environment.
Association rule mining finds interesting associations and/or correlation relationships among large sets of data items.
Association rules provide information of this type in the form of "if-then" statements.
In addition to the antecedent (the "if" part) and the consequent (the "then" part), an association rule has two numbers that express the degree of uncertainty about the rule:
-Support
-Confidence
Example:
bread => milk | 80%
Association rules can be between more than 2 items.
bread, milk => jam | 60%
Working Of Association Rule
• Itemset X = {x1, x2, x3, …, xn}
• Find all the rules X => Y with the minimum support and confidence.
• Support, s: the probability that a transaction contains X ∪ Y.
• Confidence, c: the conditional probability that a transaction containing X also contains Y.
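The two measures can be made concrete with a small sketch. The Java class below computes support and confidence over the five-transaction market basket table (transaction ids 10 to 50) that appears in this deck. The class and method names are illustrative assumptions, not part of the original system, and the code assumes Java 8 or later rather than the J2SDK 1.4 listed in the requirements.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Illustrative sketch: support and confidence of a rule X => Y. */
final class RuleMeasures {

    /** Number of transactions that contain every item of the given itemset. */
    static int supportCount(List<Set<String>> transactions, Set<String> itemset) {
        int hits = 0;
        for (Set<String> transaction : transactions) {
            if (transaction.containsAll(itemset)) {
                hits++;
            }
        }
        return hits;
    }

    /** Support: fraction of all transactions that contain the itemset. */
    static double support(List<Set<String>> transactions, Set<String> itemset) {
        return (double) supportCount(transactions, itemset) / transactions.size();
    }

    /** Confidence: of the transactions containing x, the fraction that also contain y. */
    static double confidence(List<Set<String>> transactions, Set<String> x, Set<String> y) {
        Set<String> union = new HashSet<>(x);
        union.addAll(y);
        return (double) supportCount(transactions, union) / supportCount(transactions, x);
    }

    static Set<String> items(String... names) {
        return new HashSet<>(Arrays.asList(names));
    }

    /** The market basket table from this deck (transaction ids 10 to 50). */
    static List<Set<String>> sampleTransactions() {
        return Arrays.asList(
            items("A", "B", "D"),
            items("A", "C", "D"),
            items("A", "D", "E"),
            items("B", "E", "F"),
            items("B", "C", "D", "E", "F"));
    }

    public static void main(String[] args) {
        List<Set<String>> db = sampleTransactions();
        System.out.println("support({A, D}) = " + support(db, items("A", "D")));
        System.out.println("confidence(A => D) = " + confidence(db, items("A"), items("D")));
        System.out.println("confidence(D => A) = " + confidence(db, items("D"), items("A")));
    }
}
```

For this table the sketch reports a support of 0.6 for {A, D}, a confidence of 1.0 for A => D, and a confidence of 0.75 for D => A, which shows that confidence depends on the direction of the rule while support does not.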
Aim of ARM:
To reduce the communication cost and synchronization in the data mining system.
We introduce this new system mainly to address two major issues:
Communication
Synchronization
Reducing the communication cost is one of the major advantages of this new system.
Existing system contains
Data mining mainly includes the following methods:
a. Association algorithm:
An association rule implies a certain association relationship among a set of objects in a database.
b. Classification:
The process of dividing a dataset into mutually exclusive groups such that the members of each group are as close as possible to one another, and different groups are as far as possible from one another, where distance is measured with respect to a specific variable.
c. Clustering algorithm:
The process is the same as above, but the distance is measured with respect to all variables.
Operation occurrence
An example database in tabular form (market basket analysis):
Transaction-id | Items bought
10 | A, B, D
20 | A, C, D
30 | A, D, E
40 | B, E, F
50 | B, C, D, E, F
The occurrence of association rule mining is shown in the following Venn diagram:
[Venn diagram: customer buys milk; customer buys bread; customer buys both]
The Apriori Algorithm—An Example
Supmin = 2

Database TDB
Tid | Items
10 | A, C, D
20 | B, C, E
30 | A, B, C, E
40 | B, E

1st scan: C1
Itemset | sup
{A} | 2
{B} | 3
{C} | 3
{D} | 1
{E} | 3

L1
Itemset | sup
{A} | 2
{B} | 3
{C} | 3
{E} | 3

C2
Itemset
{A, B}
{A, C}
{A, E}
{B, C}
{B, E}
{C, E}

2nd scan: C2
Itemset | sup
{A, B} | 1
{A, C} | 2
{A, E} | 1
{B, C} | 2
{B, E} | 3
{C, E} | 2

L2
Itemset | sup
{A, C} | 2
{B, C} | 2
{B, E} | 3
{C, E} | 2

C3
Itemset
{B, C, E}

3rd scan: L3
Itemset | sup
{B, C, E} | 2
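The passes in this example can be reproduced with a short program. What follows is a minimal sketch of level-wise Apriori in Java, run on the TDB database with a minimum support count of 2. The class and method names are assumptions made for illustration, the code targets Java 8 or later, and it favors readability over the hash-tree optimizations a production implementation would use.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Illustrative level-wise Apriori sketch; names are assumptions, not the original code. */
final class AprioriSketch {

    /** Returns every frequent itemset together with its support count. */
    static Map<Set<String>, Integer> frequentItemsets(List<Set<String>> db, int minSup) {
        Map<Set<String>, Integer> result = new HashMap<>();
        // C1: every distinct item forms a candidate 1-itemset.
        Set<Set<String>> candidates = new HashSet<>();
        for (Set<String> transaction : db) {
            for (String item : transaction) {
                candidates.add(items(item));
            }
        }
        while (!candidates.isEmpty()) {
            // One database scan: count each candidate and keep the frequent ones (Lk).
            Map<Set<String>, Integer> frequent = new HashMap<>();
            for (Set<String> candidate : candidates) {
                int count = 0;
                for (Set<String> transaction : db) {
                    if (transaction.containsAll(candidate)) {
                        count++;
                    }
                }
                if (count >= minSup) {
                    frequent.put(candidate, count);
                }
            }
            result.putAll(frequent);
            candidates = nextCandidates(frequent.keySet());
        }
        return result;
    }

    /** Joins Lk with itself and prunes candidates that have an infrequent k-subset. */
    static Set<Set<String>> nextCandidates(Set<Set<String>> frequent) {
        Set<Set<String>> next = new HashSet<>();
        for (Set<String> a : frequent) {
            for (Set<String> b : frequent) {
                Set<String> union = new TreeSet<>(a);
                union.addAll(b);
                if (union.size() == a.size() + 1 && allSubsetsFrequent(union, frequent)) {
                    next.add(union);
                }
            }
        }
        return next;
    }

    /** Apriori property: every k-subset of a frequent (k+1)-itemset must be frequent. */
    static boolean allSubsetsFrequent(Set<String> candidate, Set<Set<String>> frequent) {
        for (String item : candidate) {
            Set<String> subset = new TreeSet<>(candidate);
            subset.remove(item);
            if (!frequent.contains(subset)) {
                return false;
            }
        }
        return true;
    }

    static Set<String> items(String... names) {
        return new TreeSet<>(Arrays.asList(names));
    }

    /** The TDB database from the example (Tid 10 to 40). */
    static List<Set<String>> tdb() {
        return Arrays.asList(
            items("A", "C", "D"),
            items("B", "C", "E"),
            items("A", "B", "C", "E"),
            items("B", "E"));
    }

    public static void main(String[] args) {
        frequentItemsets(tdb(), 2)
            .forEach((itemset, sup) -> System.out.println(itemset + " sup=" + sup));
    }
}
```

On TDB the sketch finds nine frequent itemsets in total, and the only frequent 3-itemset is {B, C, E} with a support count of 2, matching L3 in the example. Candidates such as {A, B, C} never reach the third scan because their subset {A, B} is already infrequent.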
Proposed System:
To reduce the communication cost in the new system, we highlight several message optimization techniques:
-Direct support count exchange
-Indirect support count exchange
Reducing communication is one of the most important D-ARM objectives.
All sites share a common set of globally frequent itemsets with identical support counts.
Algorithm design:
ODAM follows the same procedure as association rule mining, but it broadcasts the support counts of candidate itemsets after every pass.
ODAM first computes the support counts of 1-itemsets at each site in the same manner.
It then broadcasts those itemsets to the other sites and discovers the globally frequent 1-itemsets.
Subsequently, each site generates candidate 2-itemsets and computes their support counts.
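The first pass described here can be sketched as follows. Each site counts 1-itemsets locally, the counts are exchanged, and every site sums them to obtain the same global counts. The split of the TDB example across two sites is a hypothetical partitioning chosen only for illustration, the class and method names are assumptions, and the broadcast itself is simulated by passing the local maps to a method rather than sending them over the network. The code assumes Java 8 or later.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Illustrative sketch of one count-exchange pass; the site split is an assumption. */
final class GlobalCountSketch {

    /** Local pass: a site counts the 1-itemsets in its own partition. */
    static Map<String, Integer> localCounts(List<List<String>> partition) {
        Map<String, Integer> counts = new TreeMap<>();
        for (List<String> transaction : partition) {
            for (String item : transaction) {
                counts.merge(item, 1, Integer::sum);
            }
        }
        return counts;
    }

    /** After the broadcast, every site sums the counts received from all sites. */
    static Map<String, Integer> globalCounts(List<Map<String, Integer>> received) {
        Map<String, Integer> total = new TreeMap<>();
        for (Map<String, Integer> siteCounts : received) {
            siteCounts.forEach((item, count) -> total.merge(item, count, Integer::sum));
        }
        return total;
    }

    /** Keeps only the items whose global support count reaches the minimum support. */
    static Map<String, Integer> globallyFrequent(Map<String, Integer> global, int minSup) {
        Map<String, Integer> frequent = new TreeMap<>();
        global.forEach((item, count) -> {
            if (count >= minSup) {
                frequent.put(item, count);
            }
        });
        return frequent;
    }

    /** Hypothetical site 1: transactions 10 and 20 of the TDB example. */
    static List<List<String>> site1() {
        return Arrays.asList(Arrays.asList("A", "C", "D"), Arrays.asList("B", "C", "E"));
    }

    /** Hypothetical site 2: transactions 30 and 40 of the TDB example. */
    static List<List<String>> site2() {
        return Arrays.asList(Arrays.asList("A", "B", "C", "E"), Arrays.asList("B", "E"));
    }

    public static void main(String[] args) {
        Map<String, Integer> global =
            globalCounts(Arrays.asList(localCounts(site1()), localCounts(site2())));
        System.out.println(globallyFrequent(global, 2)); // prints {A=2, B=3, C=3, E=3}
    }
}
```

Because every site sums the same set of local counts, all sites arrive at identical global counts without a central coordinator. Item D is frequent at neither site and is dropped globally with a count of 1.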
ODAM also eliminates all globally infrequent 1-itemsets from every transaction and inserts the reduced transaction into main memory.
After generating the support counts of candidate 2-itemsets at each site:
-ODAM generates the globally frequent 2-itemsets, then iterates through the transactions in main memory.
-It then generates the support counts of candidate itemsets of the respective length.
Hence, it reduces the transaction size (the number of items) and finds more identical transactions.
Finally, it writes all main-memory entries for this partition into a temp file.
-Each local site then generates support counts and broadcasts them to all other sites, so that every site can calculate the globally frequent itemsets for that pass.
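The transaction reduction step can be illustrated with a small sketch. It removes globally infrequent items from each transaction and merges transactions that become identical, keeping a counter per distinct transaction so that later passes scan fewer entries. The sample partition is hypothetical, the set of frequent items {A, B, C, E} is taken from the Apriori example, and the class and method names are assumptions. Dropping transactions that end up empty is a choice of this sketch, and writing the entries to a temp file is omitted. The code assumes Java 8 or later.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Illustrative sketch of ODAM-style transaction reduction; names are assumptions. */
final class TransactionReducer {

    /**
     * Drops globally infrequent items from every transaction and merges
     * transactions that become identical, keeping a multiplicity counter.
     */
    static Map<Set<String>, Integer> reduce(List<Set<String>> partition, Set<String> frequentItems) {
        Map<Set<String>, Integer> inMemory = new LinkedHashMap<>();
        for (Set<String> transaction : partition) {
            Set<String> reduced = new TreeSet<>(transaction);
            reduced.retainAll(frequentItems);
            // An empty transaction cannot support any candidate, so it is skipped.
            if (!reduced.isEmpty()) {
                inMemory.merge(reduced, 1, Integer::sum);
            }
        }
        return inMemory;
    }

    /** Support count of a candidate over the reduced, merged transactions. */
    static int supportCount(Map<Set<String>, Integer> inMemory, Set<String> candidate) {
        int count = 0;
        for (Map.Entry<Set<String>, Integer> entry : inMemory.entrySet()) {
            if (entry.getKey().containsAll(candidate)) {
                count += entry.getValue(); // one entry stands for several transactions
            }
        }
        return count;
    }

    static Set<String> items(String... names) {
        return new TreeSet<>(Arrays.asList(names));
    }

    /** Hypothetical partition held by one site. */
    static List<Set<String>> samplePartition() {
        return Arrays.asList(
            items("A", "C", "D"),
            items("A", "C"),
            items("B", "E", "F"),
            items("B", "E"),
            items("A", "B", "C", "E"));
    }

    public static void main(String[] args) {
        Map<Set<String>, Integer> inMemory = reduce(samplePartition(), items("A", "B", "C", "E"));
        System.out.println(inMemory); // prints {[A, C]=2, [B, E]=2, [A, B, C, E]=1}
        System.out.println(supportCount(inMemory, items("B", "E"))); // prints 3
    }
}
```

In this hypothetical partition, five transactions shrink to three in-memory entries once D and F are removed, so counting a candidate takes three containment checks instead of five.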
Implementation
The new system includes all the activities present in the existing system, plus some new features.
It is implemented in Java.
We established a socket-based, client-server distributed environment to evaluate ODAM’s message reduction techniques.
Each site has a receiving unit and a sending unit, and assigns a specific port to send and receive candidate support counts.
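A sending and a receiving unit of this kind might look like the sketch below. It is not the original implementation. The wire format (an entry count followed by label and count pairs), the class and method names, and the use of the loopback address with an OS-chosen port are all assumptions made so the example can run on a single machine. In a real deployment each site would listen on its assigned port. The code assumes Java 8 or later.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Illustrative sending and receiving units; the wire format is an assumption. */
final class CountExchangeSketch {

    /** Sending unit: writes the number of entries, then each itemset label and count. */
    static void send(InetAddress host, int port, Map<String, Integer> counts) throws IOException {
        try (Socket socket = new Socket(host, port);
             DataOutputStream out = new DataOutputStream(socket.getOutputStream())) {
            out.writeInt(counts.size());
            for (Map.Entry<String, Integer> entry : counts.entrySet()) {
                out.writeUTF(entry.getKey());
                out.writeInt(entry.getValue());
            }
        }
    }

    /** Receiving unit: accepts one connection and reads the counts sent by a peer site. */
    static Map<String, Integer> receive(ServerSocket server) throws IOException {
        try (Socket peer = server.accept();
             DataInputStream in = new DataInputStream(peer.getInputStream())) {
            Map<String, Integer> counts = new TreeMap<>();
            int entries = in.readInt();
            for (int i = 0; i < entries; i++) {
                String itemset = in.readUTF();
                counts.put(itemset, in.readInt());
            }
            return counts;
        }
    }

    /** Runs both units on the local machine and returns what the receiver read. */
    static Map<String, Integer> roundTrip(Map<String, Integer> counts) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        // Port 0 lets the operating system pick a free port for this demonstration.
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress())) {
            Future<Map<String, Integer>> received = pool.submit(() -> receive(server));
            send(server.getInetAddress(), server.getLocalPort(), counts);
            return received.get();
        } catch (Exception e) {
            throw new IllegalStateException("count exchange failed", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        Map<String, Integer> local = new TreeMap<>();
        local.put("{A, C}", 2);
        local.put("{B, E}", 3);
        System.out.println(roundTrip(local));
    }
}
```

Sending only itemset labels and integer counts keeps each message small, which is the quantity the message reduction techniques aim to minimize.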
Requirements
Hardware requirements:
Processor : Intel Pentium IV
RAM : 128 MB
Hard disk : 20 GB
Monitor : 15" color
Keyboard : 108-key Mercury keyboard
Mouse : Logitech mouse
Software requirements:
Operating system : Windows XP/2000
Language used : J2SDK 1.4.0, JCreator