
Data mining concepts and work

Date post: 20-Jan-2017
Upload: amr-abd-el-latief
Data Mining Concepts Presented to : Dr. Rabie By : Amr Abd EL Latief Abd El Al
Transcript

Data Mining Concepts
Presented to: Dr. Rabie
By: Amr Abd EL Latief Abd El Al

Data Mining Definition
Definition: Data mining is the extraction of interesting patterns or knowledge from huge amounts of data. It is also known by other names: knowledge discovery (mining) in databases (KDD), knowledge extraction, data/pattern analysis, data archeology, data dredging, information harvesting, business intelligence, and others. [1]

What is Data Mining
Data mining enables data exploration, data analysis, and data visualization of huge databases at a high level of abstraction, without a specific hypothesis in mind. Data mining works by a method called modeling: a model is built from the data and then used to make predictions.

Data mining technologies include:
artificial neural networks
decision trees
genetic algorithms
machine learning
evolutionary computing
multi-objective evolutionary computing (MOEA)

Data Mining System Arch.

Data Mining Procedure

The Process of Data Mining

Data Types (Application S.V.)
Business transactions
Scientific data
Medical and personal data
Surveillance video and pictures
Satellite sensing
Text reports and memos (e-mail messages)
Most of the communications
The World Wide Web repositories

Types of Data (Data Structure S.V.)
Flat files
Relational databases
Data warehouses
Transaction databases
Multimedia databases
Spatial databases
World Wide Web

FUNCTIONALITIES AND CLASSIFICATIONS OF DATA MINING
Characterization
Discrimination
Association analysis
Classification: uses given class labels to order the objects in the data collection. Classification approaches normally use a training set where all objects are already associated with known class labels. The classification algorithm learns from the training set and builds a model. The model is used to classify new objects.
Prediction

Classification according to the type of data source mined
This classification categorizes data mining systems according to the type of data handled: spatial data, multimedia data, time-series data, text data, World Wide Web.

Classification according to the data model drawn on
This classification categorizes data mining systems based on the data model involved: relational database, object-oriented database, data warehouse, transactional, others.

Classification according to the kind of knowledge discovered
This classification categorizes data mining systems based on the kind of knowledge discovered or data mining functionalities: characterization, discrimination, association, classification, clustering, others.

Classification according to mining techniques used
This classification categorizes data mining systems according to the data analysis approach used: machine learning, neural networks, genetic algorithms, statistics, visualization, database-oriented, data warehouse-oriented, others.

The classification can also take into account the degree of user interaction involved in the data mining process: query-driven systems, interactive exploratory systems, autonomous systems.
Note: A comprehensive system would provide a wide variety of data mining techniques to fit different situations and options, and offer different degrees of user interaction.

[2]

Papers

Data Mining Goals
The two main goals of DM are description and prediction.

Standard tasks in the field of DM are: description, clustering, association discovery, sequential pattern analysis, classification and regression.
Description: can be obtained by characterization or by discrimination.
Characterization: a summarization of the general features.
Discrimination: does not differ much from characterization. It consists of characterizing a class by comparison with another one.

Data Mining Goals
Clustering: differs from classification since it analyses data objects without knowing their class.
Association discovery: results in a set of association rules, which represent attribute-value conditions frequently occurring in a given set of data.

Sequential pattern analysis: consists in searching for frequently occurring patterns related to time.
Regression: uses existing values of some variables to forecast the values of another continuous variable.
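As an illustration of the regression task, here is a minimal sketch of fitting a straight line by ordinary least squares, which is one common form of regression. The function names fit_line and predict and the sample numbers are assumptions made for this example; they do not come from the slides.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-like and variance-like sums around the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Forecast the continuous target for a new value x."""
    slope, intercept = model
    return slope * x + intercept


# Hypothetical data: existing values of one variable (xs) and of the
# continuous variable to be forecast (ys).
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]
model = fit_line(xs, ys)
forecast = predict(model, 5)
```

The fitted model is just a pair of numbers, so forecasting a new value is a single multiplication and addition.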

Machine Learning
An ML system uses a finite set of objects, examples that represent observations of the environment; the learning algorithm learns a model from this set, which is called the training set.
In DM, ML draws on: databases, data warehouses, flat files.

Classification in DM
Classification: a form of data analysis that can be used to extract models describing important classes or to predict future trends.

It represents a learning paradigm that consists in segmenting data by assigning it to groups, or classes, that are already defined.

The assumption is a small database size, but in data mining the technique must be scalable.

Classification in DM
Classes are represented by the values of a particular attribute, called the goal attribute; the remaining attributes are called predicting attributes.
The resulting model is usually represented as a set of IF-THEN prediction rules, where each rule predicts a class from the predicting attributes.
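One way to picture such a model is as an ordered list of IF-THEN rules. The sketch below is only an illustration: the attribute names (age, student, income), the goal attribute buys_computer and the rules themselves are hypothetical and do not come from the slides.

```python
from typing import Callable, Dict, List, Tuple

Record = Dict[str, object]
# Each rule pairs an IF-part over the predicting attributes with a
# THEN-part, which is a value of the goal attribute.
Rule = Tuple[Callable[[Record], bool], str]

# Hypothetical rules for a hypothetical goal attribute "buys_computer".
RULES: List[Rule] = [
    (lambda r: r["age"] < 30 and r["student"] == "yes", "yes"),
    (lambda r: r["age"] >= 30 and r["income"] == "high", "yes"),
]
DEFAULT_CLASS = "no"  # predicted when no rule fires


def predict(record: Record) -> str:
    """Return the class of the first rule whose IF-part matches."""
    for condition, predicted_class in RULES:
        if condition(record):
            return predicted_class
    return DEFAULT_CLASS


young_student = {"age": 25, "student": "yes", "income": "low"}
older_low_income = {"age": 40, "student": "no", "income": "low"}
```

Because each rule is a readable condition plus a class, a model in this form can be inspected directly by a person.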

ML in Classification
Procedure:
Algorithms are first applied to the so-called training set, which contains training examples with a known class, to discover rules. The model is then used for classification on a set of examples called the test set. The predictive accuracy of the model is evaluated on the test set.
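The procedure above can be sketched with a deliberately simple learner. The learner here is an assumption chosen for brevity: for each value of a single predicting attribute, it predicts the class seen most often in the training set. The attribute names (outlook, play) and the data are hypothetical.

```python
from collections import Counter, defaultdict


def learn_rules(training_set, attribute, goal):
    """Discover one rule per attribute value from examples with a known class."""
    counts = defaultdict(Counter)
    for example in training_set:
        counts[example[attribute]][example[goal]] += 1
    rules = {value: c.most_common(1)[0][0] for value, c in counts.items()}
    # Fall back to the overall majority class for unseen attribute values.
    default = Counter(e[goal] for e in training_set).most_common(1)[0][0]
    return rules, default


def classify(model, example, attribute):
    rules, default = model
    return rules.get(example[attribute], default)


def accuracy(model, test_set, attribute, goal):
    """Fraction of test examples whose predicted class matches the known class."""
    correct = sum(classify(model, e, attribute) == e[goal] for e in test_set)
    return correct / len(test_set)


training_set = [
    {"outlook": "sunny", "play": "no"},
    {"outlook": "sunny", "play": "no"},
    {"outlook": "sunny", "play": "yes"},
    {"outlook": "rainy", "play": "yes"},
    {"outlook": "rainy", "play": "yes"},
    {"outlook": "overcast", "play": "yes"},
]
test_set = [
    {"outlook": "sunny", "play": "no"},
    {"outlook": "rainy", "play": "yes"},
    {"outlook": "overcast", "play": "no"},
    {"outlook": "foggy", "play": "yes"},
]

model = learn_rules(training_set, "outlook", "play")
test_accuracy = accuracy(model, test_set, "outlook", "play")
```

The test set is kept separate from the training set so that the accuracy reflects how the rules behave on examples the learner has not seen.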

Classification Methods
Main classification methods are:
decision tree induction (scalability problem)
Bayesian classification
neural network learning
Drawbacks: time-consuming; difficulty for humans to interpret their results.

ASSOCIATION ANALYSIS
Association rules show relationships between attributes. Their typical application domain is market basket and transaction data analysis.
Association Rules: An association rule is generally defined as an expression X => Y, where X and Y are sets of attribute-value terms.

ASSOCIATION ANALYSIS
Rules do not have to be strictly correct to be useful. It is generally required to find rules that are true to some degree only.
X implies Y
X tends to imply Y
Support and confidence
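Support and confidence are the usual measures of how far a rule X => Y holds. The sketch below uses the standard definitions: support is the fraction of transactions containing every item of an itemset, and confidence is the fraction of transactions containing X that also contain Y. The function names and the market basket data are assumptions made for this example.

```python
def count(transactions, itemset):
    """Number of transactions that contain every item of itemset."""
    items = frozenset(itemset)
    return sum(1 for t in transactions if items <= t)


def support(transactions, itemset):
    """Fraction of all transactions that contain itemset."""
    return count(transactions, itemset) / len(transactions)


def confidence(transactions, x, y):
    """Of the transactions containing X, the fraction that also contain Y."""
    return count(transactions, set(x) | set(y)) / count(transactions, x)


# Hypothetical market basket data.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk"},
]

rule_support = support(transactions, {"bread", "milk"})         # 3 of 5 baskets
rule_confidence = confidence(transactions, {"bread"}, {"milk"})  # 3 of 4 bread baskets
```

A confidence below 1 is what "X tends to imply Y" means in practice: the rule holds for most, but not all, of the transactions that contain X.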

Apriori Algorithm
Depends on frequent occurrence.
Drawbacks:
Large number of database scans
Large size of generated intermediate sets
Apriori mines only Boolean and single-dimensional association rules. These rules are adapted to market basket analysis and can


GA Advantages in Data Mining
A DM problem needs robustness of solutions and scalability.
GA advantages:
There is a high ability to find patterns in very large spaces.
Parallel implementation.
It performs a kind of global search rather than local hill-climbing.
The patterns produced are directly understandable.
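To make the idea of a population-based global search concrete, here is a minimal genetic algorithm sketch. All design choices in it are assumptions made for illustration and are not taken from the slides: a bit-string encoding, tournament selection, one-point crossover, bit-flip mutation, and elitism. The fitness used in the example (the number of ones) is only a stand-in for a real rule-quality measure.

```python
import random


def genetic_search(fitness, n_bits, pop_size=30, generations=40,
                   mutation_rate=0.02, seed=0):
    """Evolve bit strings of length n_bits (n_bits >= 2) to maximize fitness."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(n_bits)]
                  for _ in range(pop_size)]

    def tournament():
        # Pick two distinct individuals at random and keep the fitter one.
        a, b = rng.sample(population, 2)
        return a if fitness(a) >= fitness(b) else b

    for _ in range(generations):
        best = max(population, key=fitness)
        next_population = [best[:]]  # elitism: the best individual survives
        while len(next_population) < pop_size:
            p1, p2 = tournament(), tournament()
            cut = rng.randint(1, n_bits - 1)  # one-point crossover
            child = p1[:cut] + p2[cut:]
            # Bit-flip mutation keeps exploring new regions of the space.
            child = [1 - g if rng.random() < mutation_rate else g
                     for g in child]
            next_population.append(child)
        population = next_population
    return max(population, key=fitness)


# Stand-in fitness: count the ones in the bit string.
best_individual = genetic_search(sum, n_bits=12)
```

Many candidates are evaluated in each generation and their fitness values are independent of each other, which is why this kind of search lends itself to parallel implementation.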

Search Challenges
Scalability problems are an important research challenge too.
MULTI-OBJECTIVE RULE EXTRACTION
MOEA Issues

Apriori Ex.
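The original example slide did not survive in this transcript. As a stand-in, here is a minimal sketch of level-wise frequent itemset mining in the style of Apriori. It assumes min_support is an absolute transaction count, and the basket data is hypothetical.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: count} for every itemset found in at least
    min_support transactions."""
    transactions = [frozenset(t) for t in transactions]

    # Level 1: count single items.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(frequent)

    k = 2
    while frequent:
        # Join step: combine frequent (k-1)-itemsets into k-itemsets.
        candidates = {a | b for a, b in combinations(list(frequent), 2)
                      if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in frequent
                             for s in combinations(c, k - 1))}
        # One scan of the database per level to count the candidates.
        counts = {c: sum(1 for t in transactions if c <= t)
                  for c in candidates}
        frequent = {s: c for s, c in counts.items() if c >= min_support}
        result.update(frequent)
        k += 1
    return result


# Hypothetical market basket data.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk"},
]
frequent_itemsets = apriori(transactions, min_support=2)
```

With this data the candidate {bread, milk, butter} is discarded in the prune step without being counted, because its subset {milk, butter} is not frequent. The loop also shows where the cost comes from: each level needs its own pass over the transactions and its own set of candidates.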

