Data warehousing and Data mining

Date post: 27-Jan-2015
Upload: bahria-university-
Description:
This presentation explains why we need data warehousing and data mining, and describes their applications.
Data Mining and Data Warehousing Techniques Presented to : Muhammad Faisal Presented by: Faizan Saleem Pireh Pirzada Ahmed Hassan Muhammad Usman BSE-4 | DATABASE MANAGEMENT SYSTEM
Transcript
Page 1: Data warehousing and Data mining

Data Mining and Data Warehousing Techniques

Presented to : Muhammad Faisal

Presented by:

Faizan Saleem

Pireh Pirzada

Ahmed Hassan

Muhammad Usman

BSE-4 | DATABASE MANAGEMENT SYSTEM

Page 2: Data warehousing and Data mining

Topics

Why do we need Data warehouses and Data mining?

What are Data warehouses and Data mining?

History of Data warehouses and Data mining

Techniques of Data warehouses and Data mining

Page 3: Data warehousing and Data mining

Why we need Data Mining and Ware-housing

Problem Scenario

Solution

Needs of Data warehouses and Data Mining

Page 4: Data warehousing and Data mining

Why Data Warehouse?

Necessity is the mother of invention

Page 5: Data warehousing and Data mining

Information

Page 6: Data warehousing and Data mining

Problem Scenario 1

ABC Pvt Ltd is a company with branches at Karachi, Lahore, Peshawar and Islamabad.

The Sales Manager wants a quarterly sales report.

Each branch has a separate operational system.

Page 7: Data warehousing and Data mining

ABC Pvt Ltd.

Karachi

Lahore

Peshawar

Islamabad

Sales Manager

Sales per item type per branch for the first quarter.

Page 8: Data warehousing and Data mining

Solution for ABC Pvt Ltd.

Extract sales information from each branch database and store it in a common repository at a single site.
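A minimal sketch of this consolidation step, assuming each branch keeps its sales in a table named sales with the columns item_type, quarter and amount. The table layout, the sample figures and the helper names make_branch_db and load_warehouse are hypothetical; in-memory SQLite databases stand in for the four branch systems.

```python
import sqlite3

def make_branch_db(rows):
    """Create a stand-in operational database holding (item_type, quarter, amount) rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (item_type TEXT, quarter INTEGER, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    return conn

def load_warehouse(branch_dbs):
    """Extract sales from every branch and store them in one common repository."""
    wh = sqlite3.connect(":memory:")
    wh.execute("CREATE TABLE sales (branch TEXT, item_type TEXT, quarter INTEGER, amount REAL)")
    for branch, conn in branch_dbs.items():
        rows = conn.execute("SELECT item_type, quarter, amount FROM sales").fetchall()
        # Tag every extracted row with the branch it came from.
        wh.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", [(branch, *r) for r in rows])
    return wh

# Made-up sample data for the four branches.
branch_dbs = {
    "Karachi": make_branch_db([("TV", 1, 500.0), ("Radio", 1, 120.0), ("TV", 2, 300.0)]),
    "Lahore": make_branch_db([("TV", 1, 200.0)]),
    "Peshawar": make_branch_db([("Radio", 1, 80.0)]),
    "Islamabad": make_branch_db([("TV", 1, 150.0), ("TV", 1, 50.0)]),
}
warehouse = load_warehouse(branch_dbs)

# The Sales Manager's report: sales per item type per branch for the first quarter.
q1_report = warehouse.execute(
    "SELECT branch, item_type, SUM(amount) FROM sales "
    "WHERE quarter = 1 GROUP BY branch, item_type ORDER BY branch, item_type"
).fetchall()
```

Once the rows sit in one repository, the quarterly report is a single query rather than four separate extracts.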

Page 9: Data warehousing and Data mining

Solution for ABC Pvt Ltd.

Karachi

Lahore

Peshawar

Islamabad

Data Warehouse

Sales Manager

Query & Analysis tools

Reports

Page 10: Data warehousing and Data mining

Problem Scenario 2

A shopping supermarket has a huge operational database. Whenever executives want a report, the OLTP system becomes slow and data entry operators have to wait.

Page 11: Data warehousing and Data mining

Problem

Operational Database

Data Entry Operator

Data Entry Operator

Management

Wait

Report

Page 12: Data warehousing and Data mining

Solutions for Shopping Mart

Extract the data needed for analysis from the operational database and store it in the warehouse.

Refresh the warehouse at regular intervals so that it contains up-to-date information for analysis.

Warehouse will contain data with historical perspective.
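The refresh step can be sketched as an incremental load. This is one possible approach, not necessarily the one the presenters had in mind: it assumes every transaction carries an increasing txn_id, and the table and column names are hypothetical.

```python
import sqlite3

def refresh(operational, warehouse):
    """Copy transactions added since the last refresh; return how many were copied."""
    # The highest txn_id already in the warehouse acts as a watermark.
    last_id = warehouse.execute(
        "SELECT COALESCE(MAX(txn_id), 0) FROM sales_history"
    ).fetchone()[0]
    new_rows = operational.execute(
        "SELECT txn_id, sold_on, amount FROM transactions WHERE txn_id > ? ORDER BY txn_id",
        (last_id,),
    ).fetchall()
    warehouse.executemany("INSERT INTO sales_history VALUES (?, ?, ?)", new_rows)
    warehouse.commit()
    return len(new_rows)

operational = sqlite3.connect(":memory:")
operational.execute(
    "CREATE TABLE transactions (txn_id INTEGER PRIMARY KEY, sold_on TEXT, amount REAL)"
)
warehouse = sqlite3.connect(":memory:")
warehouse.execute(
    "CREATE TABLE sales_history (txn_id INTEGER PRIMARY KEY, sold_on TEXT, amount REAL)"
)

operational.executemany(
    "INSERT INTO transactions VALUES (?, ?, ?)",
    [(1, "2015-01-05", 40.0), (2, "2015-01-06", 25.0)],
)
first = refresh(operational, warehouse)    # copies transactions 1 and 2

operational.execute("INSERT INTO transactions VALUES (3, '2015-01-07', 60.0)")
operational.execute("DELETE FROM transactions WHERE txn_id = 1")  # operational purge
second = refresh(operational, warehouse)   # copies only transaction 3

history_count = warehouse.execute("SELECT COUNT(*) FROM sales_history").fetchone()[0]
```

The warehouse still holds transaction 1 after the operational system has purged it, which is the historical perspective mentioned above. Reports run against the warehouse copy, so data entry on the operational database is not slowed down.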

Page 13: Data warehousing and Data mining

Solution

Operational database

Data Warehouse

Extract data

Data Entry Operator

Data Entry Operator

Manager

Report

Transaction

Page 14: Data warehousing and Data mining

Need for Data Warehousing

Industry has huge amounts of operational data.

Knowledge workers want to turn this data into useful information.

They use this information to support strategic decision making.

Page 15: Data warehousing and Data mining

Need for Data Warehousing

It is a platform for consolidated historical data for analysis.

It stores good-quality data so that knowledge workers can make correct decisions.

Page 16: Data warehousing and Data mining

Need for Data Warehousing

From business perspective

It is the latest marketing weapon.

It helps to keep customers by learning more about their needs.

It is a valuable tool in today’s competitive, fast-evolving world.

Page 17: Data warehousing and Data mining

Why Mine Data? Commercial Viewpoint

Lots of data is being collected and warehoused:

• Web data, e-commerce

• Purchases at department/grocery stores

• Bank/credit card transactions

Computers have become cheaper and more powerful

Competitive pressure is strong:

• Provide better, customized services for an edge (e.g. in Customer Relationship Management)

Page 18: Data warehousing and Data mining

Why Mine Data? Scientific Viewpoint

Data is collected and stored at enormous speeds (GB/hour):

• Remote sensors on a satellite

• Telescopes scanning the skies

• Microarrays generating gene expression data

• Scientific simulations generating terabytes of data

Page 19: Data warehousing and Data mining

What are Data Mining and Warehousing?

Definition of Data Warehouse

Data Warehouse Uses

Definition of Data Mining

Data Mining Uses

Data Warehousing Versus Data Mining

Examples

Page 20: Data warehousing and Data mining

What is Data Ware-Housing?

Data warehousing is the process of centralizing or aggregating data from multiple sources into one common repository.

It is also a process of transforming data into information and making it available to users in a timely enough manner to make a difference.

Data --> Information

Page 21: Data warehousing and Data mining

Data Ware-Housing Uses

Reporting and Data Analysis.

Data warehouses store current as well as historical data and are used to create trend reports for senior management, such as annual and quarterly comparisons.

Page 22: Data warehousing and Data mining
Page 23: Data warehousing and Data mining

What is Data Mining?

Data mining is the process of discovering new information, in the form of patterns or rules, from vast amounts of data, using methods at the intersection of artificial intelligence, machine learning, statistics, and database systems.

Page 24: Data warehousing and Data mining

What is Data Mining?

Extracts information and transforms it into an understandable structure.

Uses past data to analyze the outcome of a particular problem or situation.

Page 25: Data warehousing and Data mining

Data Mining Uses

Companies use it to decide on marketing strategies for their products.

They can use data to compare themselves with competitors.

Data mining turns data into real-time analysis that can be used to:

• increase sales,

• promote new products,

• or drop products that add no value to the company.

Page 26: Data warehousing and Data mining

Data Mining works with Warehouse Data


Data Warehousing provides the Enterprise with a memory

Data Mining provides the Enterprise with intelligence

Page 27: Data warehousing and Data mining

Data Warehousing VS Data Mining

Data Warehousing

• Occurs before any data mining process.

• Data warehousing is the process of compiling and organizing data into one common database.

Data Mining

• Relies on data warehousing data to detect meaningful patterns.

• Data mining is the process of extracting meaningful data from that database.

Page 28: Data warehousing and Data mining

Examples of data mining: credit card fraud detection.

Collecting data on shoppers to find patterns in their shopping habits.

A great example of data warehousing that everyone can relate to is what Facebook does.

Page 29: Data warehousing and Data mining

History of Data Mining and Warehousing

Data Warehouse History

Data Mining History

Page 30: Data warehousing and Data mining

History of Data warehouse

1960s — General Mills and Dartmouth College, in a joint research project, develop the terms dimensions and facts.

1970s — ACNielsen and IRI provide dimensional data marts for retail sales.

1970s — Bill Inmon begins to define and discuss the term: Data Warehouse

Page 31: Data warehousing and Data mining

History of Data warehouse

1975 — Sperry Univac introduces MAPPER (MAintain, Prepare, and Produce Executive Reports), a database management and reporting system that includes the world's first 4GL.

Page 32: Data warehousing and Data mining

History of Data warehouse

1983 — Teradata introduces a database management system specifically designed for decision support.

1983 — Martyn Richard Jones of Sperry Corporation defines the Sperry Information Center approach, which, while not a true DW in the Inmon sense, contained many of the characteristics of DW structures.

Page 33: Data warehousing and Data mining

History of Data warehouse

1984 — Metaphor Computer Systems releases Data Interpretation System (DIS). DIS was a hardware/software package and GUI for business users to create a database management and analytic system.

Page 34: Data warehousing and Data mining

History of Data warehouse

1988 — Barry Devlin and Paul Murphy publish an article in the IBM Systems Journal in which they introduce the term "business data warehouse".

1990 — Red Brick Systems, founded by Ralph Kimball, introduces Red Brick Warehouse, a database management system specifically for data warehousing.

1991 — Prism Solutions, founded by Bill Inmon, introduces Prism Warehouse Manager, software for developing a data warehouse.

Page 35: Data warehousing and Data mining

History of Data warehouse

1992 — Bill Inmon publishes the book Building the Data Warehouse.

1995 — The Data Warehousing Institute, a for-profit organization that promotes data warehousing, is founded.

Page 36: Data warehousing and Data mining

History of Data warehouse

1996 — Ralph Kimball publishes the book The Data Warehouse Toolkit.

2000 — Daniel Linstedt releases the Data Vault, enabling real-time, auditable data warehouses.

Page 37: Data warehousing and Data mining

Brief History Of Data Mining

The term "Data mining" was introduced in the 1990s.

The roots of data mining can be traced to classical statistics, artificial intelligence, and machine learning.

Statistics are the foundation of most technologies on which data mining is built. All of these are used to study data and data relationships.

Page 38: Data warehousing and Data mining

Brief History Of Data Mining

Artificial intelligence, or AI, which is built upon heuristics as opposed to statistics, attempts to apply human-thought-like processing to statistical problems. AI concepts were adopted in RDBMS query processors.

Page 39: Data warehousing and Data mining

Brief History Of Data Mining

Machine learning is the union of statistics and AI. It could be considered an evolution of AI, because it blends AI heuristics with advanced statistical analysis.

Page 40: Data warehousing and Data mining

Data Mining Techniques

Tasks of data mining

Applications of data mining

Page 41: Data warehousing and Data mining

Processes Used in Data Mining

Data mining uses two kinds of methods:

• Prediction Methods

• Description Methods

Page 42: Data warehousing and Data mining

How it works

Data mining involves six common tasks

o Classification [Predictive]

o Clustering [Descriptive]

o Association Rule Discovery [Descriptive]

o Sequential Pattern Discovery [Descriptive]

o Regression [Predictive]

o Deviation Detection [Predictive]

Page 43: Data warehousing and Data mining

Anomaly detection

What is Anomaly Detection ?

Types of Anomaly Detection:

• Unsupervised anomaly detection

• Supervised anomaly detection

• Semi-supervised anomaly detection

Page 44: Data warehousing and Data mining

Association rule learning

What is Association rule learning

The examples:

• In a supermarket

• Inventory Management
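For the supermarket case, an association rule such as "customers who buy diapers also buy beer" is usually scored by its support and confidence. The sketch below computes both measures on a handful of made-up baskets. The basket contents and the function names are illustrative assumptions, and a real miner (Apriori, for example) would also have to search for candidate rules instead of checking one given rule.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item of `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Estimated probability of `consequent` given `antecedent`."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)

# Made-up market baskets, one set of items per checkout.
baskets = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]

rule_support = support(baskets, {"diapers", "beer"})          # 3 of 5 baskets
rule_confidence = confidence(baskets, {"diapers"}, {"beer"})  # 3 of the 4 diaper baskets
```

A rule is normally reported only when both numbers exceed thresholds chosen by the analyst.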

Page 45: Data warehousing and Data mining

Classification

What is it ?

Given a collection of records (training set)

Find a model for class attribute as a function of the values of other attributes

Goal: previously unseen records should be assigned a class as accurately as possible.

Example:
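A minimal sketch of this idea, using a k-nearest-neighbour classifier as one possible model (the slide does not say which model was used). The training records are made-up (age, income) pairs and the "yes"/"no" class attribute is hypothetical.

```python
from collections import Counter
import math

def knn_predict(training_set, record, k=3):
    """Assign the majority class among the k training records closest to `record`."""
    nearest = sorted(training_set, key=lambda pair: math.dist(pair[0], record))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Training set: (attribute values, class). All values are invented for illustration.
training_set = [
    ((25, 30), "no"), ((30, 35), "no"), ((28, 32), "no"),
    ((55, 90), "yes"), ((60, 95), "yes"), ((58, 85), "yes"),
]

# Previously unseen records are assigned a class from the model.
prediction_a = knn_predict(training_set, (57, 88))
prediction_b = knn_predict(training_set, (27, 31))
```

In practice part of the labelled data is held back as a test set so that the accuracy on unseen records can be measured.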

Page 46: Data warehousing and Data mining

Clustering

What is it?

Example:

Page 47: Data warehousing and Data mining

Sequential Pattern Discovery

What is it?

Example:

In point-of-sale transaction sequences,

Computer Bookstore:

(Intro_To_Visual_C) (C++_Primer) --> (Perl_for_dummies,Tcl_Tk)

Athletic Apparel Store:

(Shoes) (Racket, Racketball) --> (Sports_Jacket)

(A B) (C) (D E)
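To make the notation concrete: a customer's history is a sequence of itemsets, and a pattern is supported by a customer when its itemsets appear in the same order within that history. The sketch below checks this containment and counts support for the bookstore pattern above. The four customer histories are invented, and the code only scores a given pattern; it is not a full mining algorithm such as GSP.

```python
def contains(sequence, pattern):
    """True if the itemsets of `pattern` occur, in order, within `sequence`."""
    pos = 0
    for itemset in pattern:
        # Advance to the next transaction that includes every item of this itemset.
        while pos < len(sequence) and not set(itemset) <= set(sequence[pos]):
            pos += 1
        if pos == len(sequence):
            return False
        pos += 1
    return True

def sequence_support(sequences, pattern):
    """Fraction of customer sequences that contain `pattern`."""
    return sum(contains(s, pattern) for s in sequences) / len(sequences)

# Invented purchase histories, one list of transactions per customer.
customers = [
    [{"Intro_To_Visual_C"}, {"C++_Primer"}, {"Perl_for_dummies", "Tcl_Tk"}],
    [{"Intro_To_Visual_C"}, {"Perl_for_dummies"}],
    [{"Intro_To_Visual_C"}, {"C++_Primer", "Tcl_Tk"}, {"Perl_for_dummies", "Tcl_Tk"}],
    [{"C++_Primer"}, {"Intro_To_Visual_C"}],
]
pattern = [{"Intro_To_Visual_C"}, {"C++_Primer"}, {"Perl_for_dummies", "Tcl_Tk"}]

pattern_support = sequence_support(customers, pattern)
```

Two of the four invented customers follow the pattern, so its support here is 0.5.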

Page 48: Data warehousing and Data mining

Regression

What is it ?

Example:

PageRank as used by Google

• Page structure implicitly holds the importance of a page

• Important pages are linked to by important pages
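The two bullets can be illustrated with a small power-iteration sketch. PageRank is usually described as a link-analysis algorithm rather than a regression model, so this only illustrates the bullets and is not an example of regression. The four-page link graph, the damping factor of 0.85 and the iteration count are assumptions for the sketch.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Estimate page importance from link structure by repeated redistribution of rank."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page receives a small base share regardless of its links.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page, outgoing in links.items():
            if outgoing:
                # A page passes its rank on, split evenly across its outgoing links.
                share = damping * rank[page] / len(outgoing)
                for target in outgoing:
                    new_rank[target] += share
            else:
                # A page without outgoing links spreads its rank over all pages.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank

# Hypothetical web graph: page -> pages it links to.
links = {"A": ["C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(links)
most_important = max(ranks, key=ranks.get)
```

Page C is linked to by three pages and ends up with the highest rank; A is linked to only by C, yet it outranks B and D because its single inbound link comes from an important page.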

Page 49: Data warehousing and Data mining

Applications Of Data Mining

Data Mining Applications in Sales/Marketing

Data Mining Applications in Banking / Finance

Data Mining Applications in Health Care and Insurance

Data Mining Applications in Transportation

Data Mining Applications in Medicine

Page 50: Data warehousing and Data mining

Data Mining Applications in Sales/Marketing

Enables businesses to understand the hidden patterns in historical purchasing transactions

Market basket analysis

Identify customers’ behavior

Page 51: Data warehousing and Data mining

Data Mining Applications in Banking / Finance

Credit card fraud detection

Identify customer loyalty

Identify stock trading rules

Identify users by method of payment/transaction

Page 52: Data warehousing and Data mining

Data Mining Applications in Health Care and Insurance

Claims analysis

Forecasts of customers

Detect risky customers

Fraudulent behavior

Page 53: Data warehousing and Data mining

Data Mining Applications in Transportation

Determine the distribution schedules

Page 54: Data warehousing and Data mining

Data Mining Applications in Medicine

 Characterize patient activities

Identify the patterns

Page 55: Data warehousing and Data mining

Data Ware-housing Techniques

Star Schema

Elements

Example

Star Schema VS Snowflake Schema

Page 56: Data warehousing and Data mining

Star Schema

Star schema is the simplest form of a dimensional model, in which data is organized into facts and dimensions. 

A star schema is diagrammed by surrounding each fact with its associated dimensions.

The resulting diagram resembles a star. 

Star schemas are optimized for querying large data sets and are used in data warehouses and data marts to support OLAP cubes, business intelligence and analytic applications, and queries.

Page 57: Data warehousing and Data mining

Elements of star schema

Dimension tables

A dimension contains reference information about the fact, such as date, product, or customer.

Denormalized, decoded and cleaned set of descriptive data elements

Geography dimension tables describe location data, such as country, state, or city

Employee dimension tables describe employees, such as salespeople

Page 58: Data warehousing and Data mining

Fact Tables

A fact is an event that is counted or measured, such as a sale or login. 

Contains foreign keys referencing dimension records

Contains either additive or semi-additive measures for analysis

Page 59: Data warehousing and Data mining
Page 60: Data warehousing and Data mining

Example

Each dimension table has a primary key on its Id column, relating to one of the columns (viewed as rows in the example schema) of the Fact_Sales table's three-column (compound) primary key (Date_Id, Store_Id, Product_Id).

The non-primary key Units_Sold column of the fact table in this example represents a measure or metric that can be used in calculations and analysis.

The non-primary key columns of the dimension tables represent additional attributes of the dimensions (such as the Year of the Dim_Date dimension).

For example, the following query answers how many TV sets have been sold, for each brand and country, in 1997:

SELECT P.Brand, S.Country, SUM(F.Units_Sold)
FROM Fact_Sales F
INNER JOIN Dim_Date D ON F.Date_Id = D.Id
INNER JOIN Dim_Store S ON F.Store_Id = S.Id
INNER JOIN Dim_Product P ON F.Product_Id = P.Id
WHERE D.Year = 1997
AND P.Product_Category = 'tv'
GROUP BY P.Brand, S.Country
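To try the query, the schema it describes can be built in an in-memory SQLite database. The sketch below follows the table and key names from the slide; the extra Date column, the sample rows, the brand and country values and the added ORDER BY clause (used only to make the output order predictable) are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables: one primary key each, plus descriptive attributes.
CREATE TABLE Dim_Date    (Id INTEGER PRIMARY KEY, Date TEXT, Year INTEGER);
CREATE TABLE Dim_Store   (Id INTEGER PRIMARY KEY, Country TEXT);
CREATE TABLE Dim_Product (Id INTEGER PRIMARY KEY, Brand TEXT, Product_Category TEXT);

-- Fact table: compound primary key of foreign keys, plus the Units_Sold measure.
CREATE TABLE Fact_Sales (
    Date_Id    INTEGER REFERENCES Dim_Date(Id),
    Store_Id   INTEGER REFERENCES Dim_Store(Id),
    Product_Id INTEGER REFERENCES Dim_Product(Id),
    Units_Sold INTEGER,
    PRIMARY KEY (Date_Id, Store_Id, Product_Id)
);

-- Made-up sample rows.
INSERT INTO Dim_Date VALUES (1, '1997-03-01', 1997), (2, '1998-03-01', 1998);
INSERT INTO Dim_Store VALUES (1, 'Pakistan'), (2, 'Turkey');
INSERT INTO Dim_Product VALUES (1, 'BrandA', 'tv'), (2, 'BrandA', 'radio'), (3, 'BrandB', 'tv');
INSERT INTO Fact_Sales VALUES
    (1, 1, 1, 10), (1, 2, 1, 4), (1, 1, 2, 7), (1, 1, 3, 3), (2, 1, 1, 99);
""")

tv_sales_1997 = conn.execute("""
    SELECT P.Brand, S.Country, SUM(F.Units_Sold)
    FROM Fact_Sales F
    INNER JOIN Dim_Date D ON F.Date_Id = D.Id
    INNER JOIN Dim_Store S ON F.Store_Id = S.Id
    INNER JOIN Dim_Product P ON F.Product_Id = P.Id
    WHERE D.Year = 1997
    AND P.Product_Category = 'tv'
    GROUP BY P.Brand, S.Country
    ORDER BY P.Brand, S.Country
""").fetchall()
```

The radio sale and the 1998 sale are filtered out by the dimension attributes, while the fact table itself only stores keys and the measure.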

Page 61: Data warehousing and Data mining

Star Schema VS Snowflake Schema

Ease of maintenance/change:

• Snowflake schema: No redundancy, hence easier to maintain and change

• Star schema: Has redundant data, hence harder to maintain and change

Ease of use:

• Snowflake schema: More complex queries, hence harder to understand

• Star schema: Less complex queries, easy to understand

Query performance:

• Snowflake schema: More foreign keys, hence longer query execution time

• Star schema: Fewer foreign keys, hence shorter query execution time

Normalization:

• Snowflake schema: Has normalized tables

• Star schema: Has de-normalized tables

Page 62: Data warehousing and Data mining

Type of data warehouse:

• Snowflake schema: Good for the data warehouse core, to simplify complex relationships (many:many)

• Star schema: Good for data marts with simple relationships (1:1 or 1:many)

Joins:

• Snowflake schema: Higher number of joins

• Star schema: Fewer joins

Dimension table:

• Snowflake schema: May have more than one dimension table for each dimension

• Star schema: Contains only a single dimension table for each dimension

When to use:

• Snowflake schema: When a dimension table is relatively big, snowflaking is better as it reduces space.

• Star schema: When a dimension table contains a small number of rows, we can go for a star schema.
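The trade-off in the comparison above can be seen on a single dimension. In this sketch the product dimension is stored once as a denormalized star table and once as two normalized snowflake tables; the table names, the Brand_Id column and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Star: one denormalized dimension table; the brand name repeats on every product row.
CREATE TABLE Star_Dim_Product (Id INTEGER PRIMARY KEY, Product TEXT, Brand TEXT);

-- Snowflake: the same dimension split into two normalized tables.
CREATE TABLE Snow_Dim_Brand   (Id INTEGER PRIMARY KEY, Brand TEXT);
CREATE TABLE Snow_Dim_Product (Id INTEGER PRIMARY KEY, Product TEXT,
                               Brand_Id INTEGER REFERENCES Snow_Dim_Brand(Id));

CREATE TABLE Fact_Sales (Product_Id INTEGER, Units_Sold INTEGER);

INSERT INTO Star_Dim_Product VALUES
    (1, 'TV-32', 'BrandA'), (2, 'TV-40', 'BrandA'), (3, 'TV-50', 'BrandB');
INSERT INTO Snow_Dim_Brand   VALUES (1, 'BrandA'), (2, 'BrandB');
INSERT INTO Snow_Dim_Product VALUES (1, 'TV-32', 1), (2, 'TV-40', 1), (3, 'TV-50', 2);
INSERT INTO Fact_Sales VALUES (1, 5), (2, 7), (3, 2);
""")

# Star schema: a single join reaches the brand.
star_rows = conn.execute("""
    SELECT P.Brand, SUM(F.Units_Sold) FROM Fact_Sales F
    JOIN Star_Dim_Product P ON F.Product_Id = P.Id
    GROUP BY P.Brand ORDER BY P.Brand
""").fetchall()

# Snowflake schema: the same question needs one more join.
snow_rows = conn.execute("""
    SELECT B.Brand, SUM(F.Units_Sold) FROM Fact_Sales F
    JOIN Snow_Dim_Product P ON F.Product_Id = P.Id
    JOIN Snow_Dim_Brand B ON P.Brand_Id = B.Id
    GROUP BY B.Brand ORDER BY B.Brand
""").fetchall()
```

Both queries return the same totals. The snowflake version stores each brand name once, at the cost of an extra join in every query that needs it.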

Page 64: Data warehousing and Data mining

Thank you For Your Attention

Any Questions

Page 65: Data warehousing and Data mining

Presented by

Engr. Faizan Saleem, Software Engineer, Bahria University Karachi Campus


www.facebook.com/faiz.saleem

