Date post: | 05-Jan-2016 |
Upload: | ayan-chakravorty |
Data Mining
Data Mining References
Jiawei Han and Micheline Kamber, “Data Mining: Concepts and Techniques”, Morgan Kaufmann Publishers, Elsevier, 3rd Edition, 2012.
Margaret H. Dunham, “Data Mining: Introduction and Advanced Topics”, Pearson Education, 2006.
Pang-Ning Tan, Michael Steinbach and Vipin Kumar, “Introduction to Data Mining”, Pearson Education, 2006.
Richard O. Duda, Peter E. Hart and David G. Stork, “Pattern Classification”, Wiley Publication, 2nd Edition, 2000.
Ian H. Witten, Eibe Frank and Mark A. Hall, “Data Mining: Practical Machine Learning Tools and Techniques”, Morgan Kaufmann Publishers, Elsevier, 3rd Edition, 2011.
IEEE Transactions on Knowledge and Data Engineering
ACM Transactions on Information Systems, Database Systems, and Internet Technology
Data Mining Objectives
Data Mining or Knowledge Discovery from Data
Objectives:
  Understanding basic data mining concepts and techniques: uncovering interesting data patterns hidden in large data sets
  Development of data mining tools that are scalable and efficient
Evolution of Sciences
Before 1600: empirical science
1600-1950s: theoretical science
  Each discipline has grown a theoretical component. Theoretical models often motivate experiments and generalize our understanding.
1950s-1990s: computational science
  Over the last 50 years, most disciplines have grown a third, computational branch (e.g. empirical, theoretical, and computational ecology, or physics, or linguistics).
  Computational science traditionally meant simulation. It grew out of our inability to find closed-form solutions for complex mathematical models.
1990-now: data science
  The flood of data from new scientific instruments and simulations
  The ability to economically store and manage petabytes of data online
  The Internet and computing Grid that make all these archives universally accessible
  Scientific information management, acquisition, organization, query, and visualization tasks scale almost linearly with data volumes. Data mining is a major new challenge!
Evolution of Database Technology
1960s: data creation and collection
  Electronic mode
  IMS (hierarchical database system by IBM)
  Network DBMS
1970s: relational data model
  Relational DBMS implementation
1980s: RDBMS
  Advanced data models (extended-relational, OO, deductive, etc.)
  Application-oriented DBMS (spatial, scientific, engineering, etc.)
1990s:
  Data mining
  Data warehousing
  Multimedia databases
  Web databases
2000s:
  Stream data management and mining
  Data mining and its applications
  Web technology: XML, data integration, social networks, global information systems
DM Evolution
Data Mining Importance
The Explosive Growth of Data:
  Terabytes (2^40 bytes)
  Petabytes
  Exabytes
  Zettabytes
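To make the scale of these units concrete, here is a small sketch in Python. It assumes the binary convention (a terabyte as 2^40 bytes), under which each unit in the list is 2^10 = 1024 times the previous one. The names `UNITS` and `to_bytes` are made up for this illustration.

```python
# Binary-prefix sizes, assuming a terabyte is 2**40 bytes.
# Each unit in the list is 2**10 = 1024 times the previous one.
UNITS = {
    "terabyte": 2**40,
    "petabyte": 2**50,
    "exabyte": 2**60,
    "zettabyte": 2**70,
}


def to_bytes(amount, unit):
    """Convert an amount expressed in the given unit to bytes."""
    return amount * UNITS[unit]


for name, size in UNITS.items():
    # bit_length() - 1 recovers the exponent of an exact power of two.
    print(f"1 {name} = 2**{size.bit_length() - 1} bytes = {size:,} bytes")
```

Python integers are arbitrary precision, so the larger units are computed exactly.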
Drowning in data, but starving for knowledge!
From data tombs to golden nuggets
“Necessity is the mother of invention” (Plato, Greek philosopher and mathematician)
Data mining: automated analysis of massive data sets
Data Mining Definition
Data mining definition:
  Extraction or mining of interesting (non-trivial, implicit, previously unknown and potentially useful) patterns or knowledge from large amounts of data stored in databases, data warehouses, or other information repositories
Alternative names:
  Knowledge discovery (mining) in databases (KDD)
  Knowledge extraction
  Data/pattern analysis
  Data archeology
  Data dredging
  Information harvesting
  Business intelligence, etc.
Knowledge Discovery (KDD) Process
Data mining is the core of the knowledge discovery process.
Flow, from raw data to knowledge:
  Databases
  Data Cleaning (remove noise and inconsistent data)
  Data Integration (combine multiple data sources)
  Data Warehouse
  Selection (retrieve relevant data)
  Transformation (summary, aggregation, etc.)
  Task-relevant Data
  Data Mining (intelligent methods applied to extract patterns)
  Patterns
  Pattern Evaluation (identify truly interesting patterns representing knowledge)
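The KDD steps named on this slide can be sketched as a chain of small functions. This is only an illustrative toy, not a prescribed implementation: the purchase records are made up, all function names are hypothetical, and the "mining" step is assumed to be simple co-occurrence counting of item pairs with a minimum-support filter as the evaluation step.

```python
from collections import Counter
from itertools import combinations

# Two hypothetical source "databases" of purchase records (made-up data).
store_a = [
    {"id": 1, "items": "bread, milk"},
    {"id": 2, "items": "bread, butter, milk"},
    {"id": 3, "items": None},            # noisy record: missing items
]
store_b = [
    {"id": 4, "items": "Bread, Milk"},   # inconsistent capitalisation
    {"id": 5, "items": "butter"},
    {"id": 6, "items": "bread, milk, jam"},
]


def clean(records):
    """Data cleaning: drop records with missing items, normalise case."""
    return [
        {"id": r["id"], "items": r["items"].lower()}
        for r in records
        if r["items"]
    ]


def integrate(*sources):
    """Data integration: combine multiple sources into one collection."""
    return [r for source in sources for r in source]


def select(records):
    """Selection: keep only the task-relevant field (the items)."""
    return [r["items"] for r in records]


def transform(item_strings):
    """Transformation: turn each comma-separated string into a set."""
    return [
        frozenset(part.strip() for part in text.split(","))
        for text in item_strings
    ]


def mine(baskets):
    """Data mining: count how often each pair of items occurs together."""
    counts = Counter()
    for basket in baskets:
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    return counts


def evaluate(patterns, n_baskets, min_support=0.5):
    """Pattern evaluation: keep pairs whose support meets the threshold."""
    return {
        pair: count / n_baskets
        for pair, count in patterns.items()
        if count / n_baskets >= min_support
    }


warehouse = integrate(clean(store_a), clean(store_b))
baskets = transform(select(warehouse))
knowledge = evaluate(mine(baskets), len(baskets))
print(knowledge)  # {('bread', 'milk'): 0.8}
```

Of the five cleaned baskets, four contain both bread and milk, so that pair is the only one that survives the 0.5 support threshold.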
Data Mining Tools
Explore:
  R
  Python
  Weka
  SPSS
  Orange
  Clementine
  And many more
References: “DM Papers”
Data Mining and Business Intelligence
Layers, listed from top to bottom (potential to support business decisions increases toward the top):
  Decision Making
  Data Presentation: Visualization Techniques
  Data Mining: Information Discovery
  Data Exploration: Statistical Summary, Querying, and Reporting
  Data Preprocessing/Integration, Data Warehouses
  Data Sources: Paper, Files, Web documents, Scientific experiments, Database Systems
Roles, from the top of the pyramid down: End User, Business Analyst, Data Analyst, DBA
Data Mining: Confluence of Multiple Disciplines
Data mining draws on:
  Database Technology
  Statistics
  Machine Learning
  Pattern Recognition
  Algorithm
  Visualization
  Other Disciplines
Why Not Traditional Data Analysis?
Tremendous amount of data
  Algorithms must be highly scalable to handle data volumes such as terabytes
High dimensionality of data
  Micro-array data may have tens of thousands of dimensions
High complexity of data
  Data streams and sensor data
  Time-series data, temporal data, sequence data
  Structured data, graphs, social networks and multi-linked data
  Heterogeneous databases and legacy databases
  Spatial, spatiotemporal, multimedia, text and Web data
  Software programs, scientific simulations
New and sophisticated applications
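One way to see what "scalable" means for stream and sensor data is a statistic computed in a single pass with constant memory. The sketch below uses Welford's online algorithm for mean and variance; it is chosen here purely as an illustration and is not named in the slides. The class name `RunningStats` and the sensor readings are made up.

```python
class RunningStats:
    """One-pass mean and population variance (Welford's online algorithm).

    Only three numbers are kept in memory, no matter how many readings
    arrive, so the stream itself never needs to be stored.
    """

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)

    @property
    def variance(self):
        return self._m2 / self.n if self.n else 0.0


def sensor_stream():
    """Stand-in for an unbounded sensor feed (made-up readings)."""
    for reading in [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]:
        yield reading


stats = RunningStats()
for value in sensor_stream():
    stats.update(value)

print(stats.n, stats.mean, stats.variance)
```

A batch computation would need every reading in memory at once; the streaming version gives the same mean and variance while touching each reading exactly once.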
Data Mining: Classification Schemes
General functionality
Descriptive data mining
Predictive data mining
Different views lead to different classifications
Data view: Kinds of data to be mined
Knowledge view: Kinds of knowledge to be discovered
Method view: Kinds of techniques utilized
Application view: Kinds of applications adapted
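The split between descriptive and predictive data mining on this slide can be illustrated with a few lines of Python. This is a minimal sketch on made-up numbers: the descriptive part summarises the data already collected, and the predictive part is assumed to be an ordinary least-squares line, used to infer a value for an unseen input. All variable names are hypothetical.

```python
from statistics import mean

# Made-up (hours_studied, exam_score) pairs, used only for illustration.
data = [(1, 52), (2, 58), (3, 61), (4, 70), (5, 74)]
hours = [h for h, _ in data]
scores = [s for _, s in data]

# Descriptive: summarise properties of the data already collected.
summary = {"mean_score": mean(scores), "min": min(scores), "max": max(scores)}

# Predictive: fit a least-squares line, then infer a score for unseen input.
h_bar, s_bar = mean(hours), mean(scores)
slope = (
    sum((h - h_bar) * (s - s_bar) for h, s in data)
    / sum((h - h_bar) ** 2 for h in hours)
)
intercept = s_bar - slope * h_bar


def predict(h):
    """Predicted score for h hours of study under the fitted line."""
    return intercept + slope * h


print(summary)
print(round(predict(6), 1))
```

The summary says something about the five observed students; `predict(6)` makes a claim about a case that is not in the data, which is the essential difference between the two kinds of task.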
Multi-Dimensional View of Data Mining
Data to be mined
  Relational, data warehouse, transactional, stream, object-oriented/relational, active, spatial, time-series, text, multi-media, heterogeneous, legacy, WWW
Knowledge to be mined
  Characterization, discrimination, association, classification, clustering, trend/deviation, outlier analysis, etc.
  Multiple/integrated functions and mining at multiple levels
Techniques utilized
  Database-oriented, data warehouse (OLAP), machine learning, statistics, visualization, etc.
Applications adapted
  Retail, telecommunication, banking, fraud analysis, bio-data mining, stock market analysis, text mining, Web mining, etc.
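As one concrete instance of the kinds of knowledge listed above, here is a small outlier-analysis sketch. It assumes a simple z-score rule (flag values more than two standard deviations from the mean), which is only one of many possible definitions of an outlier; the transaction amounts, the threshold, and the function name are made up for illustration.

```python
from statistics import mean, pstdev

# Made-up daily transaction amounts; one value is far from the rest.
amounts = [10, 12, 11, 13, 12, 11, 50]


def z_score_outliers(values, threshold=2.0):
    """Return the values lying more than `threshold` standard deviations
    from the mean (population standard deviation)."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return []  # all values identical, nothing can be an outlier
    return [v for v in values if abs(v - mu) / sigma > threshold]


outliers = z_score_outliers(amounts)
print(outliers)  # [50]
```

In an application such as fraud analysis, values flagged this way would be candidates for closer inspection rather than confirmed anomalies.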
Data Warehousing
Consolidation of data from several databases, each maintained by an individual business unit, together with historical and summary information.
Roll-up
Multi-Tiered Architecture
Data Sources: Operational DBs, other sources
Extract, Transform, Load, Refresh (Monitor & Integrator, Metadata)
Data Storage: Data Warehouse, Data Marts (serve the OLAP Server)
OLAP Engine: OLAP Server
Front-End Tools: Analysis, Query, Reports, Data mining
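The tiers named in this architecture (data sources, extract/transform/load, warehouse storage, OLAP-style analysis) can be sketched end to end in a few lines. This is a toy under stated assumptions, not a real warehouse: both sources, the `REGION` lookup, and all function names are made up, and the roll-up is assumed to mean summing a measure up a city-to-region hierarchy.

```python
from collections import defaultdict

# Hypothetical operational sources with different schemas (made-up data).
sales_db = [
    {"city": "Delhi", "month": "2015-01", "amount": 120},
    {"city": "Delhi", "month": "2015-02", "amount": 80},
    {"city": "Jaipur", "month": "2015-01", "amount": 40},
    {"city": "Mumbai", "month": "2015-01", "amount": 300},
]
legacy_file = [("chennai", "2015-01", "150"), ("chennai", "2015-02", "50")]

# Assumed dimension hierarchy: each city belongs to one region.
REGION = {"Delhi": "North", "Jaipur": "North", "Mumbai": "West", "Chennai": "South"}


def extract_transform():
    """Extract rows from both sources and convert them to one schema."""
    for row in sales_db:
        yield (row["city"], row["month"], row["amount"])
    for city, month, amount in legacy_file:
        # The legacy source uses lowercase names and string amounts.
        yield (city.title(), month, int(amount))


def load(rows):
    """Load: store (region, city, month, amount) facts in the warehouse."""
    return [(REGION[city], city, month, amount) for city, month, amount in rows]


def roll_up(facts, level):
    """Aggregate amounts to a coarser level: 'city' or 'region'."""
    index = {"region": 0, "city": 1}[level]
    totals = defaultdict(int)
    for fact in facts:
        totals[fact[index]] += fact[3]
    return dict(totals)


warehouse = load(extract_transform())
by_city = roll_up(warehouse, "city")      # months rolled up into cities
by_region = roll_up(warehouse, "region")  # cities rolled up into regions
print(by_city)
print(by_region)
```

Rolling up from month to city and then from city to region reduces detail at each step while the grand total stays the same, which is what lets front-end tools move between summary levels.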
Data Mining Research Publications
Tayal, D. K., Jain, A., Arora, S., Agarwal, S., Gupta, T. and Tyagi, N., “Crime Detection and Criminal Identification in India Using Data Mining Techniques”, Artificial Intelligence & Society (AIS), Springer, vol. 30, no. 1, pp. 117-127, Feb 2015. [Indexed: Scopus, Google Scholar, EBSCO, ACM Digital Library, DBLP]
Jain, A., Yadav, D. and Tayal, D. K., “NER for Hindi Language Using Association Rules”, International Conference on Data Mining and Intelligent Computing (ICDMIC 2014), IGDTUW Delhi, India, IEEE, 5th-6th Sept 2014. [Indexed: Scopus]