Data Mining: Concepts and Techniques

Date post:01-Jan-2016
Description:
Data Mining: Concepts and Techniques, Chapter 1: Introduction
Transcript:
  • Data Mining: Concepts and Techniques
      Chapter 1: Introduction
      Jiawei Han and Micheline Kamber
      Department of Computer Science, University of Illinois at Urbana-Champaign
      www.cs.uiuc.edu/~hanj
      © 2006 Jiawei Han and Micheline Kamber. All rights reserved.

    Data Mining: Concepts and Techniques

  • Data and Information Systems (DAIS) Course Structures at CS/UIUC
      - Coverage: database, data mining, text information systems, and bioinformatics
      - Data mining
          - Intro. to data warehousing and mining (CS412: Han, Fall)
          - Data mining: Principles and algorithms (CS512: Han, Spring)
          - Seminar: Advanced Topics in Data Mining (CS591: Han, Fall and Spring; 1 credit unit)
          - Independent study: only if you seriously plan to do your Ph.D. on data mining and try to demonstrate your ability
      - Database systems
          - Database mgmt systems (CS411: Kevin Chang, Fall and Spring)
          - Advanced database systems (CS511: Kevin Chang, Fall)
      - Text information systems
          - Text information system (CS410: ChengXiang Zhai)
      - Bioinformatics
          - Introduction to BioInformatics (Saurabh Sinha)
          - CS591 Seminar on Bioinformatics (Sinha, Zhai, Han, Schatz, Zhong)

  • CS412 Coverage (Chapters 1-7 of This Book)
      - The book will be covered in two courses at CS, UIUC
          - CS412: Introduction to data warehousing and data mining (Fall)
          - CS512: Data mining: Principles and algorithms (Spring)
      - CS412 coverage
          - Introduction
          - Data Preprocessing
          - Data Warehouse and OLAP Technology: An Introduction
          - Advanced Data Cube Technology and Data Generalization
          - Mining Frequent Patterns, Association and Correlations
          - Classification and Prediction
          - Cluster Analysis

  • CS512 Coverage (Chapters 8-11 of This Book)
      - Mining data streams, time-series, and sequence data
      - Mining graphs, social networks and multi-relational data
      - Mining object, spatial, multimedia, text and Web data
          - Mining complex data objects
          - Spatial and spatiotemporal data mining
          - Multimedia data mining
          - Text mining
          - Web mining
      - Applications and trends of data mining
          - Mining business & biological data
          - Visual data mining
          - Data mining and society: Privacy-preserving data mining
      - Additional (often current) themes could be added to the course

  • Chapter 1. Introduction
      - Motivation: Why data mining?
      - What is data mining?
      - Data mining: On what kind of data?
      - Data mining functionality
      - Classification of data mining systems
      - Top-10 most popular data mining algorithms
      - Major issues in data mining
      - Overview of the course

  • Why Data Mining?
      - The explosive growth of data: from terabytes to petabytes
          - Data collection and data availability: automated data collection tools, database systems, Web, computerized society
          - Major sources of abundant data
              - Business: Web, e-commerce, transactions, stocks
              - Science: remote sensing, bioinformatics, scientific simulation
              - Society and everyone: news, digital cameras, YouTube
      - We are drowning in data, but starving for knowledge!
      - "Necessity is the mother of invention": data mining, the automated analysis of massive data sets

  • Evolution of Sciences
      - Before 1600: empirical science
      - 1600-1950s: theoretical science
          - Each discipline has grown a theoretical component. Theoretical models often motivate experiments and generalize our understanding.
      - 1950s-1990s: computational science
          - Over the last 50 years, most disciplines have grown a third, computational branch (e.g., empirical, theoretical, and computational ecology, or physics, or linguistics).
          - Computational science traditionally meant simulation. It grew out of our inability to find closed-form solutions for complex mathematical models.
      - 1990-now: data science
          - The flood of data from new scientific instruments and simulations
          - The ability to economically store and manage petabytes of data online
          - The Internet and computing Grid that make all these archives universally accessible
          - Scientific info. management, acquisition, organization, query, and visualization tasks scale almost linearly with data volumes. Data mining is a major new challenge!
      - Source: Jim Gray and Alex Szalay, The World Wide Telescope: An Archetype for Online Science, Comm. ACM, 45(11): 50-54, Nov. 2002

  • Evolution of Database Technology
      - 1960s: Data collection, database creation, IMS and network DBMS
      - 1970s: Relational data model, relational DBMS implementation
      - 1980s:
          - RDBMS, advanced data models (extended-relational, OO, deductive, etc.)
          - Application-oriented DBMS (spatial, scientific, engineering, etc.)
      - 1990s: Data mining, data warehousing, multimedia databases, and Web databases
      - 2000s:
          - Stream data management and mining
          - Data mining and its applications
          - Web technology (XML, data integration) and global information systems

  • What Is Data Mining?
      - Data mining (knowledge discovery from data)
          - Extraction of interesting (non-trivial, implicit, previously unknown and potentially useful) patterns or knowledge from huge amounts of data
          - Data mining: a misnomer?
      - Alternative names
          - Knowledge discovery (mining) in databases (KDD), knowledge extraction, data/pattern analysis, data archeology, data dredging, information harvesting, business intelligence, etc.
      - Watch out: Is everything "data mining"?
          - Simple search and query processing
          - (Deductive) expert systems

  • Knowledge Discovery (KDD) Process
      - Data mining: core of the knowledge discovery process
      - [Figure: KDD process diagram. Labels: Data Cleaning, Data Integration, Databases, Data Warehouse, Task-relevant Data, Selection, Data Mining, Pattern Evaluation]
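
The stages named on the KDD slide can be sketched as a chain of small functions. This is only an illustration under stated assumptions, not the book's method: the record layout, the toy data, and the function names (`clean`, `integrate`, `select`, `mine`, `evaluate`) are all invented for this example, and the "mining" step is just a frequency count.

```python
from collections import Counter

# Hypothetical toy sources; the field names are assumptions for illustration.
store_a = [{"customer": "c1", "item": "milk"}, {"customer": "c2", "item": None}]
store_b = [{"customer": "c3", "item": "milk"}, {"customer": "c4", "item": "bread"}]


def clean(records):
    """Data cleaning: drop records that contain a missing value."""
    return [r for r in records if all(v is not None for v in r.values())]


def integrate(*sources):
    """Data integration: combine several sources into one collection."""
    return [r for source in sources for r in source]


def select(records, field):
    """Selection: keep only the task-relevant attribute."""
    return [r[field] for r in records]


def mine(values):
    """Data mining (toy version): count how often each value occurs."""
    return Counter(values)


def evaluate(patterns, min_count):
    """Pattern evaluation: keep patterns that meet a frequency threshold."""
    return {p: n for p, n in patterns.items() if n >= min_count}


warehouse = integrate(clean(store_a), clean(store_b))
patterns = evaluate(mine(select(warehouse, "item")), min_count=2)
print(patterns)  # {'milk': 2}
```

Each function stands in for one stage, so swapping the toy `mine` step for a real algorithm would leave the rest of the chain unchanged.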

  • Data Mining and Business Intelligence
      - [Figure: layered diagram; the arrow is labeled "Increasing potential to support business decisions"]
      - Layers, as listed on the slide:
          - Decision Making
          - Data Presentation: Visualization Techniques
          - Data Mining: Information Discovery
          - Data Exploration: Statistical Summary, Querying, and Reporting
          - Data Preprocessing/Integration, Data Warehouses
          - Data Sources: Paper, Files, Web documents, Scientific experiments, Database Systems
      - Roles shown beside the layers: End User, Business Analyst, Data Analyst, DBA

  • Data Mining: Confluence of Multiple Disciplines

  • Why Not Traditional Data Analysis?
      - Tremendous amount of data
          - Algorithms must be highly scalable to handle data volumes such as terabytes
      - High dimensionality of data
          - Micro-array data may have tens of thousands of dimensions
      - High complexity of data
          - Data streams and sensor data
          - Time-series data, temporal data, sequence data
          - Structure data, graphs, social networks and multi-linked data
          - Heterogeneous databases and legacy databases
          - Spatial, spatiotemporal, multimedia, text and Web data
          - Software programs, scientific simulations
      - New and sophisticated applications

  • Multi-Dimensional View of Data Mining
      - Data to be mined
          - Relational, data warehouse, transactional, stream, object-oriented/relational, active, spatial, time-series, text, multi-media, heterogeneous, legacy, WWW
      - Knowledge to be mined
          - Characterization, discrimination, association, classification, clustering, trend/deviation, outlier analysis, etc.
          - Multiple/integrated functions and mining at multiple levels
      - Techniques utilized
          - Database-oriented, data warehouse (OLAP), machine learning, statistics, visualization, etc.
      - Applications adapted
          - Retail, telecommunication, banking, fraud analysis, bio-data mining, stock market analysis, text mining, Web mining, etc.

  • Data Mining: Classification Schemes
      - General functionality
          - Descriptive data mining
          - Predictive data mining
      - Different views lead to different classifications
          - Data view: Kinds of data to be mined
          - Knowledge view: Kinds of knowledge to be discovered
          - Method view: Kinds of techniques utilized
          - Application view: Kinds of applications adapted

  • Data Mining: On What Kinds of Data?
      - Database-oriented data sets and applications
          - Relational database, data warehouse, transactional database
      - Advanced data sets and advanced applications
          - Data streams and sensor data
          - Time-series data, temporal data, sequence data (incl. bio-sequences)
          - Structure data, graphs, social networks and multi-linked data
          - Object-relational databases
          - Heterogeneous databases and legacy databases
          - Spatial data and spatiotemporal data
          - Multimedia database
          - Text databases
          - The World-Wide Web

  • Data Mining Functionalities
      - Multidimensional concept description: characterization and discrimination
          - Generalize, summarize, and contrast data characteristics, e.g., dry vs. wet regions
      - Frequent patterns, association, correlation vs. causality
          - Diaper → Beer [0.5%, 75%], where the bracket gives the rule's support and confidence (Correlation or causality?)
      - Classification and prediction
          - Construct models (functions) that describe and distinguish classes or concepts for future prediction
          - E.g., classify countries based on climate, or classify cars based on gas mileage
          - Predict some unknown or missing numerical values
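
The two numbers attached to a rule such as Diaper → Beer are its support (the fraction of all transactions containing both items) and its confidence (the fraction of diaper transactions that also contain beer). A minimal sketch of how they might be computed follows; the baskets are invented for this example and the helper names `support` and `confidence` are not from the slides.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimate P(consequent | antecedent) from the transactions.

    Assumes the antecedent occurs at least once; otherwise this divides by zero.
    """
    both = support(transactions, set(antecedent) | set(consequent))
    return both / support(transactions, antecedent)


# Hypothetical market baskets, invented for this example.
baskets = [
    {"diaper", "beer", "milk"},
    {"diaper", "beer"},
    {"diaper", "beer", "bread"},
    {"diaper", "milk"},
    {"bread", "milk"},
    {"beer"},
    {"milk"},
    {"bread"},
]

rule_support = support(baskets, {"diaper", "beer"})
rule_confidence = confidence(baskets, {"diaper"}, {"beer"})
print(f"Diaper -> Beer [{rule_support:.1%}, {rule_confidence:.0%}]")
# Diaper -> Beer [37.5%, 75%]
```

A high confidence shows only that the items co-occur; as the slide asks, it does not by itself establish causality.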

  • Data Mining Functionalities (2)
      - Cluster analysis
          - Class label is unknown: group data to form new classes, e.g., cluster houses to find distribution patterns
          - Maximize intra-class similarity and minimize inter-class similarity
      - Outlier analysis
          - Outlier: a data object that does not comply with the general behavior of the data
          - Noise or exception? Useful in fraud detection and rare-event analysis
      - Trend and evolution analysis
          - Trend and deviation: e.g., regression analysis
          - Sequential pattern mining: e.g., digital camera → large SD memory
          - Periodicity analysis
          - Similarity-based analysis
      - Other pattern-directed or statistical analyses
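
To make the clustering goal concrete (high similarity inside a cluster, low similarity between clusters), here is a minimal sketch using k-means on one-dimensional data. The slide does not name an algorithm; k-means is chosen here only as a familiar example, and the house prices and starting centroids are invented.

```python
def kmeans_1d(values, centroids, iterations=10):
    """Plain k-means on 1-D data, starting from the given centroids."""
    centroids = list(centroids)
    clusters = []
    for _ in range(iterations):
        # Assignment step: each value joins its nearest centroid.
        clusters = [[] for _ in centroids]
        for v in values:
            nearest = min(range(len(centroids)), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # Update step: move each centroid to the mean of its cluster;
        # an empty cluster keeps its previous centroid.
        new_centroids = [
            sum(c) / len(c) if c else centroids[i] for i, c in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break  # converged: the assignments can no longer change
        centroids = new_centroids
    return centroids, clusters


# Hypothetical house prices (in thousands) and starting centroids.
prices = [100, 110, 120, 300, 310, 320]
centroids, clusters = kmeans_1d(prices, centroids=[100, 320])
print(centroids)  # [110.0, 310.0]
print(clusters)   # [[100, 110, 120], [300, 310, 320]]
```

No class labels are supplied: the two groups emerge from the distances alone, which is what distinguishes clustering from classification.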

  • Top-10 Most Popular DM Algorithms: 18 Identified Candidates (I)
      - Classification
