Date post: | 27-Jan-2015 |
Category: |
Technology |
Upload: | bahria-university- |
Data Mining and Data Warehousing Techniques
Presented to : Muhammad Faisal
Presented by:
Faizan Saleem
Pireh Pirzada
Ahmed Hassan
Muhammad Usman
BSE-4 | DATABASE MANAGEMENT SYSTEM
Topics
Why do we need Data warehouses and Data mining?
What are Data warehouses and Data mining?
History of Data warehouses and Data mining
Techniques of Data warehouses and Data mining
Why we need Data Mining and Ware-housing
Problem Scenario
Solution
Needs of Data warehouses and Data Mining
Why Data Warehouse?
Necessity is the mother of invention
Information
Problem Scenario 1
ABC Pvt Ltd is a company with branches at Karachi, Lahore, Peshawar and Islamabad.
The Sales Manager wants a quarterly sales report.
Each branch has a separate operational system.
[Diagram: ABC Pvt Ltd. branches (Karachi, Lahore, Peshawar, Islamabad) and the Sales Manager, who wants sales per item type per branch for the first quarter.]
Solution for ABC Pvt Ltd.
Extract sales information from each branch database and store it in a common repository at a single site.
Solution ABC Pvt Ltd.
[Diagram: Karachi, Lahore, Peshawar and Islamabad feed the Data Warehouse; the Sales Manager uses query and analysis tools on it to produce reports.]
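The extract-and-store step above can be sketched in a few lines. This is only an illustration, assuming each branch system is a small SQLite database; the table layout, the sample rows and the function names (make_branch_db, build_warehouse, first_quarter_report) are hypothetical and not part of the original slides.

```python
import sqlite3

# Hypothetical sample rows per branch: (item_type, sale_date, amount).
BRANCH_SALES = {
    "Karachi":   [("TV", "2015-01-10", 500.0), ("Radio", "2015-02-03", 80.0)],
    "Lahore":    [("TV", "2015-03-21", 450.0), ("TV", "2015-04-02", 470.0)],
    "Peshawar":  [("Radio", "2015-01-15", 75.0)],
    "Islamabad": [("TV", "2015-02-27", 520.0), ("Radio", "2015-03-30", 90.0)],
}


def make_branch_db(rows):
    """Stand-in for one branch's separate operational system."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE sales (item_type TEXT, sale_date TEXT, amount REAL)")
    db.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    return db


def build_warehouse(branch_dbs):
    """Extract every branch's sales and load them into one common repository."""
    wh = sqlite3.connect(":memory:")
    wh.execute(
        "CREATE TABLE sales (branch TEXT, item_type TEXT, sale_date TEXT, amount REAL)"
    )
    for branch, db in branch_dbs.items():
        rows = db.execute("SELECT item_type, sale_date, amount FROM sales").fetchall()
        wh.executemany(
            "INSERT INTO sales VALUES (?, ?, ?, ?)",
            [(branch, *row) for row in rows],
        )
    return wh


def first_quarter_report(wh, year):
    """Sales per item type per branch for January through March of the given year."""
    return wh.execute(
        """
        SELECT branch, item_type, SUM(amount)
        FROM sales
        WHERE sale_date >= ? AND sale_date < ?
        GROUP BY branch, item_type
        ORDER BY branch, item_type
        """,
        (f"{year}-01-01", f"{year}-04-01"),
    ).fetchall()


branch_dbs = {name: make_branch_db(rows) for name, rows in BRANCH_SALES.items()}
warehouse = build_warehouse(branch_dbs)
report = first_quarter_report(warehouse, 2015)
for row in report:
    print(row)
```

Once the data sits in one place, the Sales Manager's report is a single query instead of four separate ones. The Lahore sale dated in April falls outside the first quarter and is left out of the report.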
Problem Scenario 2
A shopping supermarket has a huge operational database. Whenever executives want a report, the OLTP system becomes slow and the data entry operators have to wait.
Problem
[Diagram: Operational Database, two Data Entry Operators, Management; labels: Wait, Report.]
Solutions for Shopping Mart
Extract the data needed for analysis from the operational database and store it in a warehouse.
Refresh the warehouse at regular intervals so that it contains up-to-date information for analysis.
The warehouse will contain data with a historical perspective.
Solution
[Diagram: Data Entry Operators send transactions to the Operational Database; data is extracted into the Data Warehouse, from which the Manager gets reports.]
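The regular refresh described above could look roughly like the following. It is a minimal sketch, assuming that every operational row carries an increasing integer id so the warehouse can tell which rows it has not copied yet; the table names and the refresh function are hypothetical.

```python
import sqlite3

# Hypothetical operational (OLTP) database used by the data entry operators.
oltp = sqlite3.connect(":memory:")
oltp.execute("CREATE TABLE sales (id INTEGER PRIMARY KEY, item TEXT, amount REAL)")

# Separate warehouse database used only for reporting.
warehouse = sqlite3.connect(":memory:")
warehouse.execute(
    "CREATE TABLE sales_history (id INTEGER PRIMARY KEY, item TEXT, amount REAL)"
)


def refresh(oltp, warehouse):
    """Copy rows the warehouse has not seen yet; return how many were copied."""
    last_id = warehouse.execute(
        "SELECT COALESCE(MAX(id), 0) FROM sales_history"
    ).fetchone()[0]
    new_rows = oltp.execute(
        "SELECT id, item, amount FROM sales WHERE id > ? ORDER BY id", (last_id,)
    ).fetchall()
    warehouse.executemany("INSERT INTO sales_history VALUES (?, ?, ?)", new_rows)
    warehouse.commit()
    return len(new_rows)


oltp.executemany(
    "INSERT INTO sales (item, amount) VALUES (?, ?)",
    [("milk", 2.5), ("bread", 1.2)],
)
first = refresh(oltp, warehouse)   # copies both existing rows

oltp.execute("INSERT INTO sales (item, amount) VALUES (?, ?)", ("eggs", 3.0))
second = refresh(oltp, warehouse)  # copies only the row added since the last refresh

# Reports run against the warehouse, so data entry is not slowed down.
total = warehouse.execute("SELECT SUM(amount) FROM sales_history").fetchone()[0]
```

Each refresh touches the operational database with one short read, and all reporting queries go to the warehouse copy.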
Need for Data Warehousing
Industry has a huge amount of operational data.
Knowledge workers want to turn this data into useful information.
They use this information to support strategic decision making.
It is a platform for consolidated historical data for analysis.
It stores data of good quality so that knowledge workers can make correct decisions.
From a business perspective:
It is the latest marketing weapon.
It helps to keep customers by learning more about their needs.
It is a valuable tool in today’s competitive, fast-evolving world.
Why Mine Data? Commercial Viewpoint
Lots of data is being collected and warehoused:
• Web data, e-commerce
• Purchases at department/grocery stores
• Bank/credit card transactions
Computers have become cheaper and more powerful.
Competitive pressure is strong:
• Provide better, customized services for an edge (e.g. in Customer Relationship Management)
Why Mine Data? Scientific Viewpoint
Data is collected and stored at enormous speeds (GB/hour):
• Remote sensors on a satellite
• Telescopes scanning the skies
• Microarrays generating gene expression data
• Scientific simulations generating terabytes of data
What is Data Mining and Ware-housing?
Definition Data Warehouse
Data Warehouse Uses
Definition Data Mining
Data Mining Uses
Data Warehousing Versus Data Mining
Examples
What is Data Ware-Housing?
Data warehousing is the process of centralizing or aggregating data from multiple sources into one common repository.
It transforms data into information and makes it available to users in time to make a difference.
Data → Information
Data Ware-Housing Uses
Reporting and Data Analysis.
Data warehouses store current as well as historical data and are used to create trend reports for senior management, such as annual and quarterly comparisons.
What is Data Mining?
Data mining is the process of discovering new information, in the form of patterns or rules, from vast amounts of data. It involves methods at the intersection of artificial intelligence, machine learning, statistics, and database systems.
It extracts information and transforms it into an understandable structure.
It uses past data to analyze the outcome of a particular problem or situation.
Data Mining Uses
Companies use data mining to decide on marketing strategies for their products.
They can use the data to compare and contrast themselves with competitors.
Data mining turns data into real-time analysis that can be used to:
• increase sales,
• promote a new product,
• drop a product that adds no value to the company.
Data Mining works with Warehouse Data
Data Warehousing provides the Enterprise with a memory
Data Mining provides the Enterprise with intelligence
Data Warehousing VS Data Mining
Data warehousing occurs before any data mining process.
Data warehousing is the process of compiling and organizing data into one common database.
Data mining relies on warehoused data to detect meaningful patterns.
Data mining is the process of extracting meaningful data from that database.
Example of data mining: credit card fraud detection.
Another example of data mining: collecting data on shoppers to find patterns in their shopping habits.
A great example of data warehousing that everyone can relate to is what Facebook does.
History of Data Mining and Ware-housing
Data Warehouse History
Data Mining History
History of Data warehouse
1960s — General Mills and Dartmouth College, in a joint research project, develop the terms dimensions and facts.
1970s — ACNielsen and IRI provide dimensional data marts for retail sales.
1970s — Bill Inmon begins to define and discuss the term "Data Warehouse".
1975 — Sperry Univac introduces MAPPER (MAintain, Prepare, and Produce Executive Reports), a database management and reporting system that includes the world's first 4GL.
1983 — Teradata introduces a database management system specifically designed for decision support.
1983 — At Sperry Corporation, Martyn Richard Jones defines the Sperry Information Center approach, which, while not a true DW in the Inmon sense, did contain many of the characteristics of DW structures.
1984 — Metaphor Computer Systems releases Data Interpretation System (DIS). DIS was a hardware/software package and GUI for business users to create a database management and analytic system.
1988 — Barry Devlin and Paul Murphy publish an article in IBM Systems Journal in which they introduce the term "business data warehouse".
1990 — Red Brick Systems, founded by Ralph Kimball, introduces Red Brick Warehouse, a database management system specifically for data warehousing.
1991 — Prism Solutions, founded by Bill Inmon, introduces Prism Warehouse Manager, software for developing a data warehouse.
1992 — Bill Inmon publishes the book Building the Data Warehouse.
1995 — The Data Warehousing Institute, a for-profit organization that promotes data warehousing, is founded.
1996 — Ralph Kimball publishes the book The Data Warehouse Toolkit.
2000 — Daniel Linstedt releases the Data Vault, enabling real-time, auditable data warehouses.
Brief History Of Data Mining
The term "Data mining" was introduced in the 1990s.
The roots of data mining can be traced through classical statistics, artificial intelligence, and machine learning.
Statistics is the foundation of most technologies on which data mining is built. Statistical techniques are used to study data and data relationships.
Artificial intelligence, or AI, which is built upon heuristics as opposed to statistics, attempts to apply human-thought-like processing to statistical problems. AI concepts were adopted for the query processors of RDBMSs.
Machine learning is the union of statistics and AI. It could be considered an evolution of AI, because it blends AI heuristics with advanced statistical analysis.
Data Mining Techniques
Tasks of data mining
Applications of data mining
Processes Used in Data Mining
Data mining uses two kinds of methods:
• Prediction Methods
• Description Methods
How it works
Data mining involves six common tasks
o Classification [Predictive]
o Clustering [Descriptive]
o Association Rule Discovery [Descriptive]
o Sequential Pattern Discovery [Descriptive]
o Regression [Predictive]
o Deviation Detection [Predictive]
Anomaly detection
What is Anomaly Detection ?
Types of Anomaly Detection:
• Unsupervised anomaly detection
• Supervised anomaly detection
• Semi-supervised anomaly detection
Association rule learning
What is Association rule learning?
Examples:
• In a supermarket
• Inventory Management
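For the supermarket example, an association rule such as "customers who buy bread also buy milk" is usually judged by its support and confidence. The sketch below computes both in plain Python; the baskets are made-up sample data and the function names are only illustrative.

```python
# Hypothetical supermarket baskets; each set is one transaction.
baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "eggs"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Among transactions containing antecedent, the fraction that also contain consequent."""
    both = support(antecedent | consequent, transactions)
    return both / support(antecedent, transactions)


# Rule under test: {bread} -> {milk}
bread_milk_support = support({"bread", "milk"}, baskets)
bread_to_milk = confidence({"bread"}, {"milk"}, baskets)
print(bread_milk_support, bread_to_milk)
```

Here bread and milk appear together in 2 of 4 baskets (support 0.5), and 2 of the 3 baskets with bread also contain milk (confidence about 0.67). A real rule miner would search over many itemsets rather than test one rule by hand.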
Classification
What is it ?
Given a collection of records (training set )
Find a model for class attribute as a function of the values of other attributes
Goal: previously unseen records should be assigned a class as accurately as possible.
Example:
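One way to make the idea concrete is a 1-nearest-neighbour classifier, chosen here only because it is short; the slides do not name a specific algorithm. The training records (age, income in thousands) and their "yes"/"no" labels are invented for illustration.

```python
import math

# Hypothetical training set: (age, income in thousands) -> class label.
training_set = [
    ((25, 30), "no"),
    ((30, 40), "no"),
    ((45, 90), "yes"),
    ((50, 110), "yes"),
]


def classify(record, training):
    """Assign the class of the closest training record (1-nearest neighbour)."""
    nearest = min(training, key=lambda pair: math.dist(pair[0], record))
    return nearest[1]


# Previously unseen records get a class from the model.
label_a = classify((28, 35), training_set)
label_b = classify((48, 100), training_set)
print(label_a, label_b)
```

The class attribute is predicted from the other attributes alone, which is the goal stated above. In practice the accuracy would be checked on a held-out test set.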
Clustering
What is it ?
Example:
Sequential Pattern Discovery
What is it?
Example:
In point-of-sale transaction sequences,
Computer Bookstore:
(Intro_To_Visual_C) (C++_Primer) --> (Perl_for_dummies,Tcl_Tk)
Athletic Apparel Store:
(Shoes) (Racket, Racketball) --> (Sports_Jacket)
(A B) (C) (D E)
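Checking whether a customer's purchase history contains a sequential pattern can be sketched as follows. This is a simplified illustration with no timing constraints between transactions; the customer histories are hypothetical, and the pattern reuses the bookstore items from the example above.

```python
def contains(sequence, pattern):
    """True if each itemset of pattern appears, in order, inside sequence.

    sequence and pattern are lists of sets; a pattern itemset matches a
    transaction when it is a subset of that transaction.
    """
    position = 0
    for itemset in pattern:
        # Skip transactions until one contains the current pattern itemset.
        while position < len(sequence) and not itemset <= sequence[position]:
            position += 1
        if position == len(sequence):
            return False
        position += 1  # the next itemset must match a later transaction
    return True


# Hypothetical point-of-sale histories, one list of transactions per customer.
customers = [
    [{"Intro_To_Visual_C"}, {"C++_Primer"}, {"Perl_for_dummies", "Tcl_Tk"}],
    [{"Intro_To_Visual_C"}, {"Perl_for_dummies"}],
    [{"C++_Primer"}, {"Intro_To_Visual_C"}, {"Tcl_Tk"}],
]
pattern = [{"Intro_To_Visual_C"}, {"C++_Primer"}, {"Perl_for_dummies", "Tcl_Tk"}]

matches = [contains(history, pattern) for history in customers]
pattern_support = sum(matches) / len(customers)
print(matches, pattern_support)
```

Only the first customer bought the items in the required order, so the pattern's support in this sample is one third.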
Regression
What is it ?
Example:
PageRank as used by Google
• Page structure implicitly holds importance of a page
• Important pages are linked to by important pages
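The idea that important pages are linked to by important pages can be sketched with a short power iteration. This is a textbook-style simplification, not Google's production algorithm; the damping factor of 0.85, the iteration count and the four-page link graph are assumptions made for the example.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Approximate PageRank; links maps each page to the pages it links to.

    Every link target must also appear as a key of links.
    """
    pages = list(links)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page, outgoing in links.items():
            if outgoing:
                # A page shares its rank equally among the pages it links to.
                share = damping * rank[page] / len(outgoing)
                for target in outgoing:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank over all pages.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank


# Hypothetical link graph: A, B and D all link to C; C links back to A.
web = {"A": ["C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(web)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 3))
```

C ranks highest because three pages link to it, and A ranks above B and D because its single incoming link comes from the important page C.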
Applications Of Data Mining
Data Mining Applications in Sales/Marketing
Data Mining Applications in Banking / Finance
Data Mining Applications in Health Care and Insurance
Data Mining Applications in Transportation
Data Mining Applications in Medicine
Data Mining Applications in Sales/Marketing
Enables businesses to understand the hidden patterns inside historical purchasing transaction data
Market basket analysis
Identify customer behavior
Data Mining Applications in Banking / Finance
Credit card fraud detection
Identify customer loyalty
Identify stock trading rules
Identify users by method of payment/transaction
Data Mining Applications in Health Care and Insurance
Claims analysis
Forecasts of customers
Detect risky customers
Fraudulent behavior
Data Mining Applications in Transportation
Determine the distribution schedules
Data Mining Applications in Medicine
Characterize patient activities
Identify the patterns
Data Ware-housing Techniques
Star Schema
Elements
Example
Star Schema VS Snowflake Schema
Star Schema
Star schema is the simplest form of a dimensional model, in which data is organized into facts and dimensions.
A star schema is diagrammed by surrounding each fact with its associated dimensions.
The resulting diagram resembles a star.
Star schemas are optimized for querying large data sets and are used in data warehouses and data marts to support OLAP cubes, business intelligence and analytic applications, and queries.
Elements of star schema
Dimension tables
A dimension contains reference information about the fact, such as date, product, or customer.
Denormalized, decoded and cleaned set of descriptive data elements
Geography dimension tables describe location data, such as country, state, or city
Employee dimension tables describe employees, such as salespeople
Fact Tables
A fact is an event that is counted or measured, such as a sale or login.
Contains foreign keys referencing dimension records
Contain either additive or semi-additive measures for analysis
Example
Each dimension table has a primary key on its Id column, relating to one of the columns (viewed as rows in the example schema) of the Fact_Sales table's three-column (compound) primary key (Date_Id, Store_Id, Product_Id).
The non-primary key Units_Sold column of the fact table in this example represents a measure or metric that can be used in calculations and analysis.
The non-primary key columns of the dimension tables represent additional attributes of the dimensions (such as the Year of the Dim_Date dimension).
For example, the following query answers how many TV sets have been sold, for each brand and country, in 1997:
SELECT P.Brand, S.Country, SUM(F.Units_Sold)
FROM Fact_Sales F
INNER JOIN Dim_Date D ON F.Date_Id = D.Id
INNER JOIN Dim_Store S ON F.Store_Id = S.Id
INNER JOIN Dim_Product P ON F.Product_Id = P.Id
WHERE D.YEAR = 1997
AND P.Product_Category = 'tv'
GROUP BY P.Brand, S.Country
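To try the query, the example schema can be built in an in-memory SQLite database. The sketch below follows the table and column names used above; the sample rows (brands "Acme" and "Zenit", the two countries, the unit counts) are invented, and an ORDER BY is added only so the output order is predictable.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE Dim_Date    (Id INTEGER PRIMARY KEY, Year INTEGER);
CREATE TABLE Dim_Store   (Id INTEGER PRIMARY KEY, Country TEXT);
CREATE TABLE Dim_Product (Id INTEGER PRIMARY KEY, Brand TEXT, Product_Category TEXT);

-- The fact table's compound primary key is made of the three foreign keys.
CREATE TABLE Fact_Sales (
    Date_Id    INTEGER REFERENCES Dim_Date(Id),
    Store_Id   INTEGER REFERENCES Dim_Store(Id),
    Product_Id INTEGER REFERENCES Dim_Product(Id),
    Units_Sold INTEGER,
    PRIMARY KEY (Date_Id, Store_Id, Product_Id)
);

-- Hypothetical sample data.
INSERT INTO Dim_Date VALUES (1, 1997), (2, 1998);
INSERT INTO Dim_Store VALUES (1, 'Pakistan'), (2, 'Germany');
INSERT INTO Dim_Product VALUES (1, 'Acme', 'tv'), (2, 'Acme', 'radio'), (3, 'Zenit', 'tv');
INSERT INTO Fact_Sales VALUES
    (1, 1, 1, 10), (1, 2, 1, 4), (1, 1, 2, 7), (1, 1, 3, 3), (2, 1, 1, 99);
""")

rows = db.execute("""
    SELECT P.Brand, S.Country, SUM(F.Units_Sold)
    FROM Fact_Sales F
    INNER JOIN Dim_Date D ON F.Date_Id = D.Id
    INNER JOIN Dim_Store S ON F.Store_Id = S.Id
    INNER JOIN Dim_Product P ON F.Product_Id = P.Id
    WHERE D.Year = 1997
      AND P.Product_Category = 'tv'
    GROUP BY P.Brand, S.Country
    ORDER BY P.Brand, S.Country
""").fetchall()

for brand, country, units in rows:
    print(brand, country, units)
```

The radio sale and the 1998 sale are filtered out by the WHERE clause, so only the 1997 TV sales are summed per brand and country.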
Criterion | Snowflake Schema | Star Schema
Ease of maintenance/change | No redundancy, hence easier to maintain and change | Redundant data, hence harder to maintain and change
Ease of use | More complex queries, hence harder to understand | Less complex queries that are easy to understand
Query performance | More foreign keys, hence longer query execution time | Fewer foreign keys, hence shorter query execution time
Normalization | Normalized tables | Denormalized tables
Type of data warehouse | Good for the data warehouse core, to simplify complex relationships (many:many) | Good for data marts with simple relationships (1:1 or 1:many)
Joins | Higher number of joins | Fewer joins
Dimension tables | May have more than one dimension table for each dimension | A single dimension table for each dimension
When to use | When a dimension table is relatively big, snowflaking is better because it reduces space | When the dimension tables contain fewer rows, a star schema is suitable
Thank you For Your Attention
Any Questions
Presented by
Engr. Faizan Saleem, Software Engineer, Bahria University Karachi Campus
www.facebook.com/faiz.saleem