Date post: 23-Dec-2015
Upload: winifred-sherman
28.11.2001 Data mining - Applications, future, and summary

Course on Data Mining (581550-4)

[Course timeline figure. Topics: Intro/Ass. Rules, Episodes, Text Mining, Clustering, KDD Process, Appl./Summary, Home Exam. Dates: 24./26.10., 30.10., 7.11., 14.11., 21.11., 28.11.]

Today 28.11.2001

• Today's subject:

o Data mining applications, future, and summary

• The program at the end of this week:

o Exercise: KDD Process

o Seminar: KDD Process

Applications, future and summary

• Data mining applications

• How to choose a data mining system?

• Data mining system products and research prototypes

• Additional themes on data mining

• Social impact of data mining

• Trends in data mining

• Summary

Data mining applications

• Data mining is a young discipline with wide and diverse applications

o general principles of data mining versus domain-specific, effective data mining tools for particular applications

• Application domains, e.g.,

o biomedical and DNA data analysis

o financial data analysis

o retail industry

o telecommunication industry

Biomedical data mining and DNA analysis

• DNA sequences consist of 4 basic building blocks (nucleotides): adenine (A), cytosine (C), guanine (G), and thymine (T).

• Gene: a sequence of hundreds of individual nucleotides arranged in a particular order

• Semantic integration of heterogeneous, distributed genome databases

o data cleaning and data integration methods developed in data mining will help

DNA analysis – Examples (1)

• Similarity search and comparison among DNA sequences

o compare the frequently occurring patterns of each class

o identify gene sequence patterns that play roles in various diseases

• Association analysis: identification of co-occurring gene sequences

o most diseases are triggered by a combination of genes acting together

o may help determine the kinds of genes that are likely to co-occur in target samples
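
The association analysis idea above can be sketched as a simple co-occurrence count. This is only an illustration: the gene labels (`G1`, `G2`, ...), the sample sets, and the function name `co_occurring_pairs` are hypothetical, and real gene expression data would need far more careful preprocessing.

```python
from collections import Counter
from itertools import combinations


def co_occurring_pairs(samples, min_support):
    """Return gene pairs whose support (fraction of samples that
    contain both genes) is at least min_support."""
    pair_counts = Counter()
    for genes in samples:
        # sorted() makes (a, b) and (b, a) count as the same pair
        for pair in combinations(sorted(set(genes)), 2):
            pair_counts[pair] += 1
    n = len(samples)
    return {
        pair: count / n
        for pair, count in pair_counts.items()
        if count / n >= min_support
    }


# Hypothetical data: each set lists the genes found active in one sample.
samples = [
    {"G1", "G2", "G3"},
    {"G1", "G2"},
    {"G2", "G3"},
    {"G1", "G2", "G4"},
]
frequent = co_occurring_pairs(samples, min_support=0.5)
print(frequent)
```

With these made-up samples, only the pairs (G1, G2) and (G2, G3) reach the 50% support threshold.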

DNA analysis – Examples (2)

• Path analysis: linking genes to different disease development stages

o different genes may become active at different stages of the disease

o develop pharmaceutical interventions that target the different stages separately

• Visualization tools and genetic data analysis

Data mining for financial data analysis (1)

• Collected data is often relatively complete, reliable, and of high quality

• Design and construction of data warehouses for multidimensional data analysis and data mining

o view the debt and revenue changes, e.g., by month

o access statistical information, e.g., trend

• Loan payment prediction/consumer credit policy analysis

o loan payment performance

o consumer credit rating

Data mining for financial data analysis (2)

• Classification and clustering of customers for targeted marketing

o multidimensional segmentation to identify customer groups or associate a new customer to an appropriate customer group

• Detection of money laundering and other financial crimes

o integration of multiple DBs

o tools: data visualization, linkage analysis, classification, clustering tools, outlier analysis, and sequential pattern analysis tools
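
As an illustration of the outlier analysis tool mentioned above, the sketch below flags unusually large transactions. The slides do not name a specific method. The modified z-score based on the median absolute deviation (with the commonly used constant 0.6745 and threshold 3.5) is one possible choice, and the transaction amounts are invented.

```python
from statistics import median


def flag_outliers(amounts, threshold=3.5):
    """Flag values whose modified z-score, computed from the median
    absolute deviation (MAD), exceeds the threshold."""
    med = median(amounts)
    mad = median([abs(x - med) for x in amounts])
    if mad == 0:
        # All deviations are (almost) identical; nothing can be ranked.
        return []
    return [x for x in amounts if 0.6745 * abs(x - med) / mad > threshold]


# Hypothetical transaction amounts for one account.
amounts = [120, 95, 130, 110, 105, 9800, 115]
suspicious = flag_outliers(amounts)
print(suspicious)
```

A median-based rule is used here because a single extreme value distorts the mean and standard deviation, which could hide the very transaction one is looking for.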

Data mining for retail industry (1)

• Retail industry: huge amounts of data on sales, customer shopping history, etc.

• Applications of retail data mining:

o identify customer buying behaviors

o discover customer shopping patterns and trends

o improve the quality of customer service

o achieve better customer retention and satisfaction

o enhance goods consumption ratios

o design more effective goods transportation and distribution policies

Data mining in retail industry (2)

• Design and construction of data warehouses based on the benefits of data mining (multidimensional analysis of sales, customers, products, time, and region)

• Analysis of the effectiveness of sales campaigns

• Analysis of customer loyalty

o use customer loyalty card information to register sequences of purchases of particular customers

o use sequential pattern mining to investigate changes in customer consumption or loyalty

o suggest adjustments on the pricing and variety of goods

• Purchase recommendation and cross-reference of items
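
The loyalty analysis above relies on sequential patterns in purchase histories. A minimal sketch of the core operation, computing the support of one ordered pattern, might look as follows. The card identifiers, products, and the function name `sequence_support` are hypothetical, and a real sequential pattern miner would also generate and prune candidate patterns.

```python
def sequence_support(sequences, pattern):
    """Fraction of customer sequences that contain `pattern` as an
    ordered (not necessarily contiguous) subsequence."""
    def contains(seq, pat):
        it = iter(seq)
        # `item in it` consumes the iterator, so matches must occur in order
        return all(item in it for item in pat)

    return sum(contains(s, pattern) for s in sequences) / len(sequences)


# Hypothetical purchase sequences registered per loyalty card.
purchases = {
    "card-001": ["printer", "paper", "ink"],
    "card-002": ["printer", "ink"],
    "card-003": ["ink", "printer"],
    "card-004": ["paper", "printer", "cable", "ink"],
}
support = sequence_support(list(purchases.values()), ["printer", "ink"])
print(support)
```

Order matters here: card-003 bought the same two items, but in the reverse order, so it does not support the pattern.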

Data mining for telecommunication industry (1)

• A rapidly expanding and highly competitive industry and a great demand for data mining

o understand the business involved

o identify telecommunication patterns

o catch fraudulent activities

o make better use of resources

o improve the quality of service

• Multidimensional analysis of telecommunication data

o e.g., calling-time, duration of call, location of caller, type of call, etc.
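
Multidimensional analysis of call data can be illustrated with a small roll-up: the same call records are aggregated along different combinations of dimensions. The record fields (`hour`, `location`, `type`, `duration_min`) and the sample values are assumptions made for this sketch. A production system would typically use a data warehouse or OLAP engine instead.

```python
from collections import defaultdict


def roll_up(calls, dimensions, measure="duration_min"):
    """Sum `measure` over all calls, grouped by the chosen dimensions."""
    totals = defaultdict(float)
    for call in calls:
        key = tuple(call[d] for d in dimensions)
        totals[key] += call[measure]
    return dict(totals)


# Hypothetical call detail records.
calls = [
    {"hour": 9, "location": "Helsinki", "type": "local", "duration_min": 3.0},
    {"hour": 9, "location": "Helsinki", "type": "international", "duration_min": 12.0},
    {"hour": 21, "location": "Espoo", "type": "local", "duration_min": 7.5},
    {"hour": 21, "location": "Helsinki", "type": "local", "duration_min": 1.5},
]
by_type = roll_up(calls, ["type"])
by_hour_location = roll_up(calls, ["hour", "location"])
print(by_type)
print(by_hour_location)
```

Changing the list of dimensions gives a different view of the same data, which is the essence of the multidimensional analysis described above.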

Data mining for telecommunication industry (2)

• Fraudulent pattern analysis and the identification of unusual patterns

o identify potentially fraudulent users and their atypical usage patterns

o detect attempts to gain fraudulent entry to customer accounts

o discover unusual patterns which may need special attention

Data mining for telecommunication industry (3)

• Multidimensional association and sequential pattern analysis

o find usage patterns for a set of communication services by customer group, by month, etc.

o promote the sales of specific services

o improve the availability of particular services in a region

• Use of visualization tools in telecommunication data analysis

How to choose a data mining system? (1)

• Commercial data mining systems have little in common

o different data mining functionality or methodology

o may even work with completely different kinds of data sets

• To select a system, we need a multidimensional view of existing systems

How to choose a data mining system? (2)

• Data types: relational, transactional, text, time sequence, spatial?

• System issues

o running on only one or on several operating systems?

o a client/server architecture?

o provide Web-based interfaces and allow XML data as input and/or output?

• Data sources

o ASCII text files, multiple relational data sources

o support ODBC connections (OLE DB, JDBC)?

How to choose a data mining system? (3)

• Data mining functions and methodologies

o one vs. multiple data mining functions

o one vs. variety of methods per function

• Coupling with DB and/or data warehouse systems

o four forms of coupling: no coupling, loose coupling, semitight coupling, and tight coupling

• Visualization tools: data visualization, mining result visualization, mining process visualization, and visual data mining

How to choose a data mining system? (4)

• Scalability

o row (or database size) scalability

o column (or dimension) scalability

o curse of dimensionality: it is much more challenging to make a system column scalable than row scalable

• Data mining query language and graphical user interface

o easy-to-use and high-quality graphical user interface

o essential for user-guided, highly interactive data mining

Data mining systems (1)

• IBM Intelligent Miner

o a wide range of data mining algorithms

o scalable mining algorithms

o toolkits: neural network algorithms, statistical methods, data preparation, and data visualization tools

o tight integration with IBM's DB2 relational database system

• SAS Enterprise Miner

o a variety of statistical analysis tools

o data warehouse tools and multiple data mining algorithms

Data mining systems (2)

• SGI MineSet

o multiple data mining algorithms and advanced statistics

o advanced visualization tools

• Clementine (SPSS)

o an integrated data mining development environment for end-users and developers

o multiple data mining algorithms and visualization tools

Data mining systems (3)

• DBMiner (DBMiner Technology Inc.)

o multiple data mining modules: discovery-driven OLAP analysis, association, classification, and clustering

o efficient association and sequential-pattern mining functions, and a visual classification tool

o mining both relational databases and data warehouses

• Microsoft SQL Server 2000

o integrate DB and OLAP with mining

o support the OLE DB for DM standard

Additional themes on data mining

• Web mining

• Visual data mining

• Audio data mining

• Theoretical foundations of data mining

• Data mining and intelligent query answering

Web mining (1)

• The WWW is a huge, widely distributed, global information service center for

o information services: news, advertisements, consumer information, education, government, e-commerce, etc.

o hyper-link information

o access and usage information

Web mining (2)

• Web search engines:

o index-based: search the Web, index Web pages, and build and store huge keyword-based indices

o help locate sets of Web pages containing certain keywords

• Deficiencies of the web search engines:

o a topic of any breadth may easily contain hundreds of thousands of documents

o many documents that are highly relevant to a topic may not contain keywords defining them
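
Both the index-based approach and its deficiencies can be seen in a toy keyword index. The sketch below is not how any particular search engine works. The page URLs and texts are invented, and `build_index` and `search` are hypothetical helper names.

```python
from collections import defaultdict


def build_index(pages):
    """Map each keyword to the set of pages that contain it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index


def search(index, query):
    """Return the pages that contain every keyword of the query."""
    words = query.lower().split()
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


# Hypothetical pages.
pages = {
    "a.example/mining": "data mining finds patterns in large data sets",
    "b.example/kdd": "knowledge discovery in databases finds patterns",
    "c.example/gold": "gold mining in the north",
}
index = build_index(pages)
hits = search(index, "data mining")
print(hits)
print(search(index, "mining"))
```

The page `b.example/kdd` is about the same topic but is never returned for "data mining" because it lacks those keywords, while the single keyword "mining" also returns the unrelated gold mining page.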

Web mining (3)

• WWW provides rich sources for data mining

• Challenges:

o too huge for effective data warehousing and data mining

o too complex and heterogeneous: no standards and structure

Web mining (4)

• Web mining is a more challenging task than constructing and using web search engines

• Web mining searches for

o web access patterns

o web structures

o regularity and dynamics of web contents
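
Web access patterns, the first item in the list above, can be approximated by counting which page-to-page transitions occur most often in user sessions. The sessions below are invented and are assumed to have already been reconstructed from a server log, which in practice is a substantial preprocessing step of its own.

```python
from collections import Counter


def transition_counts(sessions):
    """Count how often each consecutive page-to-page transition occurs."""
    counts = Counter()
    for pages in sessions:
        counts.update(zip(pages, pages[1:]))
    return counts


# Hypothetical sessions: the ordered pages visited by one user in one visit.
sessions = [
    ["/home", "/courses", "/courses/data-mining"],
    ["/home", "/courses", "/staff"],
    ["/courses", "/courses/data-mining"],
    ["/home", "/courses"],
]
counts = transition_counts(sessions)
top = counts.most_common(1)
print(top)
```

Frequent transitions like these could, for example, suggest which links deserve a more prominent place on a page.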

Web mining (5)

• Web mining taxonomy:

Web Mining

o Web Structure Mining

o Web Content Mining: Web Page Content Mining, Search Result Mining

o Web Usage Mining: General Access Pattern Tracking, Customized Usage Tracking

Visual data mining (1)

• Visualization: use of computer graphics to create visual images which aid in the understanding of complex, often massive representations of data

• Visual data mining: the process of discovering implicit but useful knowledge from large data sets using visualization techniques

Visual data mining (2)

• Purpose of visualization

o gain insight into an information space by mapping data onto graphical primitives

o provide qualitative overview of large data sets

o search for patterns, trends, structure, irregularities, relationships among data

o help find interesting regions and suitable parameters for further quantitative analysis

o provide a visual proof of computer representations derived

Visual data mining (3)

• Integration of visualization and data mining

o data visualization

o data mining result visualization

o data mining process visualization

o interactive visual data mining

Data visualization

• Data in a database or data warehouse can be viewed

o at different levels of granularity or abstraction

o as different combinations of attributes or dimensions

• Data can be presented in various visual forms

Box-plots in Statsoft

Data mining result visualization

• Presentation of the results or knowledge obtained from data mining in visual forms

• Examples

o scatter plots and box-plots

o association rules

o clusters

o outliers

o generalized rules

Scatter plots in SAS Enterprise Miner

Association rules in MineSet 3.0

A decision tree in MineSet 3.0

Cluster groupings in IBM Intelligent Miner

Data mining process visualization

• Presentation of the various processes of data mining in visual forms so that users can see

o how the data are extracted

o from which database or data warehouse they are extracted

o how the selected data are cleaned, integrated, preprocessed, and mined

o which method is selected for data mining

o where the results are stored

o how they may be viewed

Data mining processes in Clementine

Interactive visual data mining

• Using visualization tools in the data mining process to help users make smart data mining decisions

• Example

o display the data distribution in a set of attributes using colored sectors or columns

o use the display to decide which sector should first be selected for classification and where a good split point for this sector may be

Interactive visual mining by perception-based classification

Audio data mining

• Audio signals (sounds, music) are used to indicate the patterns of data, or the features of data mining results

• An interesting alternative to visual mining

• An inverse task of mining audio (such as music) databases, which is to find patterns from audio data

• Visual data mining may disclose interesting patterns using graphical displays, but requires users to concentrate on watching patterns

• In audio data mining, the user listens to pitches, rhythms, tune, and melody in order to identify anything interesting or unusual

Theoretical foundations of data mining (1)

• Data reduction

o the basis of data mining is to reduce the data representation (use, e.g., histograms or clustering)

o trades accuracy for speed

• Data compression

o the basis of data mining is to compress the given data by encoding in terms of bits, association rules, decision trees, clusters, etc.
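
The data reduction view can be illustrated with the histogram example mentioned above: the raw values are replaced by a few bucket counts, so later queries are faster but only approximate. The equal-width bucketing, the function name `equal_width_histogram`, and the sample values are assumptions made for this sketch.

```python
def equal_width_histogram(values, num_buckets):
    """Reduce `values` to (lowest value, bucket width, counts per bucket)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are identical: everything falls into the first bucket.
        return lo, 0.0, [len(values)] + [0] * (num_buckets - 1)
    width = (hi - lo) / num_buckets
    counts = [0] * num_buckets
    for v in values:
        # The maximum value is placed in the last bucket.
        i = min(int((v - lo) / width), num_buckets - 1)
        counts[i] += 1
    return lo, width, counts


# Hypothetical attribute values.
values = [2, 4, 5, 7, 11, 13, 14, 18, 19, 22]
lo, width, counts = equal_width_histogram(values, num_buckets=4)
print(lo, width, counts)
```

Ten values shrink to four counts. The reduced form can still answer "roughly how many values lie below 12?", but it can no longer say exactly which values were present, which is the accuracy-for-speed trade-off noted above.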

Theoretical foundations of data mining (2)

• Pattern discovery

o the basis of data mining is to discover patterns occurring in the database, e.g., associations, classification models and sequential patterns

• Probability theory

o the basis of data mining is to discover joint probability distributions of random variables
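
Under the probability theory view, a very small example is estimating the joint distribution of two attributes from relative frequencies. The records and attribute names (`plan`, `churned`) are hypothetical, and relative frequency is only the simplest possible estimator.

```python
from collections import Counter
from fractions import Fraction


def joint_distribution(records, var_a, var_b):
    """Estimate P(var_a, var_b) by relative frequency over the records."""
    counts = Counter((r[var_a], r[var_b]) for r in records)
    total = len(records)
    return {pair: Fraction(c, total) for pair, c in counts.items()}


# Hypothetical customer records.
records = [
    {"plan": "prepaid", "churned": True},
    {"plan": "prepaid", "churned": False},
    {"plan": "contract", "churned": False},
    {"plan": "contract", "churned": False},
]
joint = joint_distribution(records, "plan", "churned")

# A conditional probability follows from the joint distribution:
# P(churned | prepaid) = P(prepaid, churned) / P(prepaid)
p_prepaid = sum(p for (plan, _), p in joint.items() if plan == "prepaid")
p_churn_given_prepaid = joint[("prepaid", True)] / p_prepaid
print(joint, p_churn_given_prepaid)
```

Exact fractions are used so that the estimated probabilities visibly sum to one.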

Theoretical foundations of data mining (3)

• Microeconomic view

o a view of utility

o the task of data mining is finding patterns that are interesting only to the extent that they can be used in the decision-making process of some enterprise

Theoretical foundations of data mining (4)

• Inductive databases

o data mining is the problem of performing inductive logic on databases

o the task is to query the data and the theory (i.e., patterns) of the database

o popular among many researchers in database systems

Data mining and intelligent query answering (1)

• Query answering

o direct query answering: returns exactly what is being asked

o intelligent (or cooperative) query answering: analyzes the intent of the query and provides generalized, neighborhood or associated information relevant to the query

Data mining and intelligent query answering (2)

• Some users may not have a clear idea of exactly what to mine or what is contained in the database

• Intelligent query answering analyzes the user's intent and answers queries in an intelligent way

Data mining and intelligent query answering (3)

• A general framework for the integration of data mining and intelligent query answering

o data query: finds concrete data stored in a database

o knowledge query: finds rules, patterns, and other kinds of knowledge in a database

Data mining and Data mining and intelligent query answering (4)intelligent query answering (4)

• For example, three ways to improve For example, three ways to improve on-line shopping serviceon-line shopping service

o informative query answering by providing summary information

o suggestion of additional items based on association analysis

o product promotion by sequential pattern mining
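
As a rough illustration of the second idea, the sketch below suggests additional items from co-occurrence counts in past baskets. It is a simplified stand-in for full association analysis; the function name `suggest`, the `min_conf` threshold, and the sample baskets are hypothetical.

```python
from collections import defaultdict
from itertools import permutations


def suggest(baskets, item, min_conf=0.5):
    """Suggest items b such that confidence(item -> b) >= min_conf."""
    item_count = defaultdict(int)
    pair_count = defaultdict(int)
    for basket in baskets:
        items = set(basket)
        for a in items:
            item_count[a] += 1
        # Count ordered pairs so that (a, b) means "b bought together with a".
        for a, b in permutations(items, 2):
            pair_count[(a, b)] += 1
    total = item_count.get(item, 0)
    if total == 0:
        return []
    scored = [(n / total, b) for (a, b), n in pair_count.items() if a == item]
    # Highest confidence first; ties broken alphabetically.
    scored.sort(key=lambda t: (-t[0], t[1]))
    return [b for conf, b in scored if conf >= min_conf]


if __name__ == "__main__":
    history = [
        ["camera", "sd card"],
        ["camera", "sd card", "tripod"],
        ["camera", "bag"],
        ["bag"],
    ]
    print(suggest(history, "camera"))
```

A real shop would typically also require a minimum support so that suggestions are not based on a handful of baskets.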

Page 51: Social impact of data mining

• Is data mining hype?

• Data mining: merely managers' business or everyone's?

• Privacy and data security

Page 52: Is data mining hype, or will it be persistent?

• Data mining is a technology

• Technological life cycle:

o innovators

o early adopters

o chasm

o early majority

o late majority

o laggards

Page 53: Life Cycle of Technology Adoption

• Data mining is at the chasm!?

o existing data mining systems are too generic

o need business-specific data mining solutions and smooth integration of business logic with data mining functions

Page 54: Whose business is it?

• Data mining will surely be an important tool for managers' decision making

• The amount of available data is increasing, and data mining systems will be more affordable

• Multiple personal uses

o mine your family's medical history to identify genetically-related medical conditions

o mine the records of the companies you deal with

o mine data on stocks and company performance, etc.

• Invisible data mining: build data mining functions into many intelligent tools

Page 55: Threat to privacy and data security?

• “Big Brother” is carefully watching you

• Profiling information is collected constantly

o you use your credit card, supermarket loyalty card, or frequent flyer card, or apply for any of the above

o you surf the Web, reply to an Internet newsgroup, subscribe to a magazine, rent a video, or fill out a contest entry form

• Collection of personal data may be beneficial for companies and consumers, but there is also potential for misuse

Page 56: Protect privacy and data security

• Fair information practices

o international guidelines for data privacy protection

o cover aspects relating to data collection, purpose, use, quality, openness, individual participation, and accountability

o purpose specification and use limitation

o openness: individuals have the right to know what information is collected about them, who has access to the data, and how the data are being used

• Develop and use data security-enhancing techniques, e.g., blind signatures, biometric encryption, and anonymous databases

Page 57: Trends in data mining (1)

• Application exploration

o development of application-specific data mining systems

o invisible data mining (mining as a built-in function)

• Scalable data mining methods

o constraint-based mining: use of constraints to guide data mining systems in their search for interesting patterns
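
One way to picture constraint-based mining is a levelwise frequent-itemset search in which a user-supplied constraint prunes candidates early. The sketch below is illustrative only: the function name `frequent_itemsets` and the price budget are assumptions, and the pruning is only safe when the constraint is anti-monotone (once an itemset violates it, every superset does too).

```python
from itertools import combinations


def frequent_itemsets(baskets, min_support, constraint=lambda s: True):
    """Levelwise search for frequent itemsets that satisfy `constraint`.

    Assumes `constraint` is anti-monotone, so violating itemsets can be
    dropped before any of their supersets are generated.
    """
    baskets = [frozenset(b) for b in baskets]

    def support(itemset):
        return sum(itemset <= b for b in baskets)

    items = sorted({i for b in baskets for i in b})
    level = [frozenset([i]) for i in items]
    level = [s for s in level if constraint(s) and support(s) >= min_support]
    result = {}
    while level:
        for s in level:
            result[s] = support(s)
        # Join pairs of k-itemsets that differ in exactly one item.
        candidates = {a | b for a, b in combinations(level, 2)
                      if len(a | b) == len(a) + 1}
        level = [s for s in candidates
                 if constraint(s) and support(s) >= min_support]
    return result


if __name__ == "__main__":
    prices = {"a": 1, "b": 2, "c": 5}  # hypothetical item prices
    baskets = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["b", "c"]]
    cheap = lambda s: sum(prices[i] for i in s) <= 3
    print(frequent_itemsets(baskets, 2, cheap))
```

With the budget constraint, item `c` is discarded at the first level, so no itemset containing it is ever counted; this is the sense in which the constraint guides the search.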

Page 58: Trends in data mining (2)

• Integration of data mining with database systems, data warehouse systems, and web database systems

• Standardization of data mining language

o a standard will facilitate systematic development, improve interoperability, and promote the education and use of data mining systems in industry and society

• Visual data mining

Page 59: Trends in data mining (3)

• New methods for mining complex types of data

o more research is required towards the integration of data mining methods with existing data analysis techniques for the complex types of data

• Web mining

• Privacy protection and information security in data mining

Page 60: Summary (1)

• Data mining: semi-automatic discovery of interesting patterns from large data sets

• Knowledge discovery is a process:

o preprocessing

o data mining

o postprocessing

• Application areas: retail, telecommunication, Web mining, log analysis, …
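
The three process steps listed above can be sketched as a tiny pipeline. This is a toy example under stated assumptions: the function names `preprocess`, `mine`, and `postprocess`, the cleaning rules, and the threshold are all hypothetical, and the "mining" step is just a frequency count.

```python
from collections import Counter


def preprocess(raw_rows):
    """Preprocessing: drop empty values and rows, normalise case and spacing."""
    cleaned = []
    for row in raw_rows:
        values = [v.strip().lower() for v in row if v.strip()]
        if values:
            cleaned.append(values)
    return cleaned


def mine(rows):
    """Data mining (toy version): count in how many rows each value occurs."""
    return Counter(v for row in rows for v in set(row))


def postprocess(counts, min_count):
    """Postprocessing: keep only sufficiently frequent values, sorted."""
    return sorted((v, n) for v, n in counts.items() if n >= min_count)


if __name__ == "__main__":
    raw = [["Milk ", "bread"], ["", " "], ["milk", "Eggs"], ["BREAD", "milk"]]
    print(postprocess(mine(preprocess(raw)), min_count=2))
```

Keeping the steps as separate functions mirrors the point that knowledge discovery is a process, with the mining algorithm as only one stage.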

Page 61: Summary (2)

• Knowledge can be mined from different kinds of databases (relational, object-oriented, spatial, WWW, …)

• We can mine different kinds of knowledge (characterization, clustering, association, …)

• Data mining also uses techniques from other areas of computer science (machine learning, statistics, visualization, …)

Page 62: Summary (3)

• Some useful data mining techniques:

o association rules

o episodes

o text mining

o classification

o clustering

• There are also many other data mining methods/techniques developed, but not covered in this course

Page 63: Summary (4)

• It is important to

o study theoretical foundations of data mining

o watch privacy and security issues in data mining

• The future of data mining seems promising, even without hype

Page 67: Data mining conferences

• 1989 IJCAI Workshop

• 1991-1994 KDD Workshops

• 1995-1998 KDD Conferences

• 1998 ACM SIGKDD

• 1999-> SIGKDD Conferences

• And many smaller/new DM conferences, e.g.,

o PAKDD, PKDD

o SIAM-Data Mining, (IEEE) ICDM

Page 68: Useful References on Data Mining

• DM:

o Conferences: KDD, PKDD, PAKDD, ...

o Journals: Data Mining and Knowledge Discovery, CACM

• DM/DB:

o Conferences: ACM-SIGMOD/PODS, VLDB, ...

o Journals: ACM-TODS, J. ACM, IEEE-TKDE, JIIS, ...

• AI/ML:

o Conferences: Machine Learning, AAAI, IJCAI, ...

o Journals: Machine Learning, Artificial Intelligence, ...

Page 69: Reminder: Course Organization

Course Evaluation

• Passing the course: min 30 points

o home exam: min 13 points (max 30 points)

o exercises/experiments: min 8 points (max 20 points); at least 3 returned and reported experiments

o group presentation: min 4 points (max 10 points)

• Remember also the other requirements:

o attending the lectures (5/7)

o attending the seminars (4/5)

o attending the exercises (4/5)

Page 70: Data mining applications, future, and summary

Thanks to Jiawei Han from Simon Fraser University for his slides, which greatly helped in preparing this lecture!

