Data warehousing

Date post: 04-Dec-2014
Upload: shifali-goyal
Transcript
Page 1: Data warehousing

PRESENTATION ON

DATA WAREHOUSING AND DATA MINING

SUBMITTED TO:
MRS. MANISHA BHATNAGAR (HOD OF COMP. SCI DEPT)
MRS. HARKAWNALJEET KAUR (ASST. PROF OF COMP. SCIENCE DEPT)

SUBMITTED BY:
MCA-III, ROLL NO: 9

Page 2: Data warehousing

CONTENTS

•DATA WAREHOUSE
•CHARACTERISTICS OF DATA WAREHOUSE
•ARCHITECTURE OF DATA WAREHOUSE
•DATA STORING IN DATA WAREHOUSE
•DATA WAREHOUSE DIMENSIONAL MODELLING
•INSTALLING THE SERVICE MANAGER DATA WAREHOUSE SERVER
•EXAMPLE OF DATA WAREHOUSE
•ADVANTAGES OF DATA WAREHOUSE
•DISADVANTAGES OF DATA WAREHOUSE

Page 3: Data warehousing

CONTENTS

•DATA MINING
•ELEMENTS OF DATA MINING
•DATA MINING PROCESS
•ARCHITECTURE OF DATA MINING
•ADDING THE OPTION OF DATA MINING TO A DATABASE
•ADVANTAGES OF DATA MINING
•DISADVANTAGES OF DATA MINING

Page 4: Data warehousing

DATA WAREHOUSE

A data warehouse is a relational database that is designed for query and analysis rather than for transaction processing. It usually contains historical data derived from transaction data, but can include data from other sources.

DEFINITION OF DATA WAREHOUSE

“A data warehouse is simply a single, complete, and consistent store of data obtained from a variety of different sources and made available to end users in a way they can understand and use it in a business context.”

BARRY DEVLIN, IBM CONSULTANT

Page 5: Data warehousing

CHARACTERISTICS OF DATA WAREHOUSING

Subject Oriented: Data are organized according to subject instead of application. Data warehouses are designed to help you analyze data.

Integrated: Integration is closely related to subject orientation. Data warehouses must put data from disparate sources into a consistent format.

Nonvolatile: Nonvolatile means that, once entered into the data warehouse, data should not change.

Time Variant: A data warehouse maintains historical data, which is used to analyze business or market trends and to support future predictions.

  

Page 6: Data warehousing

Data Warehouse Architectures 

Data warehouses and their architectures vary depending upon the specifics of an organization's situation. Three common architectures are:

■ Data Warehouse Architecture (Basic)

■ Data Warehouse Architecture (with a Staging Area)

■ Data Warehouse Architecture (with a Staging Area and Data Marts) 

Page 7: Data warehousing

Data Warehouse Architecture (Basic)

The figure shows a simple architecture for a data warehouse. End users directly access data derived from several source systems through the data warehouse.

Page 8: Data warehousing

Data Warehouse Architecture (with a Staging Area)

You need to clean and process your operational data before putting it into the warehouse. You can do this programmatically, although most data warehouses use a staging area instead. A staging area simplifies building summaries and general warehouse management. The figure illustrates this typical architecture.
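
The cleaning step that a staging area hosts can be sketched in Python. This is only a minimal illustration under stated assumptions: the table names (staging_sales, warehouse_sales), the columns and the cleaning rules (trim and upper-case store codes, drop rows with missing or non-numeric amounts) are hypothetical and do not come from the slides.

```python
import sqlite3

def load_via_staging(raw_rows):
    """Land raw rows in a staging table, clean them, then load the warehouse table."""
    con = sqlite3.connect(":memory:")
    # Staging table accepts anything; the warehouse table enforces NOT NULL.
    con.execute("CREATE TABLE staging_sales (store TEXT, amount TEXT)")
    con.execute("CREATE TABLE warehouse_sales (store TEXT NOT NULL, amount REAL NOT NULL)")
    con.executemany("INSERT INTO staging_sales VALUES (?, ?)", raw_rows)

    staged = con.execute("SELECT store, amount FROM staging_sales").fetchall()
    cleaned = []
    for store, amount in staged:
        if store is None or amount is None:
            continue  # hypothetical rule: reject incomplete rows
        try:
            value = float(amount)
        except ValueError:
            continue  # hypothetical rule: reject non-numeric amounts
        cleaned.append((store.strip().upper(), value))

    con.executemany("INSERT INTO warehouse_sales VALUES (?, ?)", cleaned)
    con.commit()
    return con.execute(
        "SELECT store, amount FROM warehouse_sales ORDER BY store, amount"
    ).fetchall()

raw = [(" ny01 ", "19.99"), ("la02", "n/a"), (None, "5.00"), ("NY01", "5")]
loaded = load_via_staging(raw)
```

Only the two well-formed rows reach warehouse_sales; the rejected rows remain in the staging table, where they could be inspected or corrected.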

Page 9: Data warehousing

Data Warehouse Architecture (with a Staging Area)

Page 10: Data warehousing

Data Warehouse Architecture (with a Staging Area and Data Marts)

Although the staging-area architecture is quite common, you may want to customize your warehouse's architecture for different groups within your organization. You can do this by adding data marts, which are systems designed for a particular line of business. The figure illustrates an example where purchasing, sales, and inventories are separated. In this example, a financial analyst might want to analyze historical data for purchases and sales.
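
As a rough sketch of the idea, each data mart can be imitated with a view over a shared warehouse table. This is a simplification: real data marts are usually separate systems, and the table name warehouse_facts, its columns and the sample rows are assumptions made for illustration.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE warehouse_facts (area TEXT, item TEXT, amount REAL)")
con.executemany(
    "INSERT INTO warehouse_facts VALUES (?, ?, ?)",
    [
        ("purchasing", "widget", 40.0),
        ("sales", "widget", 65.0),
        ("inventory", "widget", 12.0),
        ("sales", "gadget", 30.0),
    ],
)

# One view per line of business. The names come from a fixed tuple, so building
# the DDL with an f-string is safe here (views cannot take bound parameters).
for area in ("purchasing", "sales", "inventory"):
    con.execute(
        f"CREATE VIEW mart_{area} AS "
        f"SELECT item, amount FROM warehouse_facts WHERE area = '{area}'"
    )

# The financial analyst compares sales with purchases for items present in both marts.
analyst_view = con.execute(
    """
    SELECT s.item, s.total_sales, p.total_purchases
    FROM (SELECT item, SUM(amount) AS total_sales
          FROM mart_sales GROUP BY item) AS s
    JOIN (SELECT item, SUM(amount) AS total_purchases
          FROM mart_purchasing GROUP BY item) AS p
      ON p.item = s.item
    ORDER BY s.item
    """
).fetchall()
```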

Page 11: Data warehousing

Data Warehouse Architecture (with a Staging Area and Data Marts)

Page 12: Data warehousing

DATA STORING IN DATA WAREHOUSE

FACT TABLE: The central table that contains the fact data. Fact tables hold the data, usually numeric, that are analyzed and examined.

DIMENSION TABLE: Dimension tables store the information you normally use to constrain queries.
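
The division of labour between the two kinds of table can be shown with a small sqlite3 sketch. The table names (fact_sales, dim_product), columns and rows are hypothetical; the point is only that the dimension attribute constrains the query while the fact measures are aggregated.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript(
    """
    CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, category TEXT);
    CREATE TABLE fact_sales (
        product_id INTEGER REFERENCES dim_product(product_id),
        quantity   INTEGER,   -- numeric measure
        revenue    REAL       -- numeric measure
    );
    INSERT INTO dim_product VALUES (1, 'books'), (2, 'toys');
    INSERT INTO fact_sales VALUES (1, 3, 30.0), (1, 1, 10.0), (2, 2, 50.0);
    """
)

# The WHERE clause uses a dimension attribute; SUM runs over fact measures.
total = con.execute(
    """
    SELECT SUM(f.quantity), SUM(f.revenue)
    FROM fact_sales AS f
    JOIN dim_product AS d ON d.product_id = f.product_id
    WHERE d.category = ?
    """,
    ("books",),
).fetchone()
```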

Page 13: Data warehousing

Data Warehouse Dimensional Modelling (Types of Schemas)

Three types of schema are commonly used in a data warehouse:

•STAR SCHEMA

•SNOWFLAKE SCHEMA

•GALAXY SCHEMA (also called FACT CONSTELLATION SCHEMA)

Page 14: Data warehousing

Star Schema

A star schema is one in which a central fact table is surrounded by denormalized dimension tables.

Page 15: Data warehousing

Snowflake schema

A snowflake schema extends the star schema by normalizing its dimension tables into additional, related dimension tables.

Page 16: Data warehousing

Snowflake Schema

Sale fact table: Number, Prod_id, store_id, quantity

Store: id, Name, Geography_id

product: Prod_id, Brand_id, cost

Brand: Id, Brand

geography: Id, State, country
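
The tables on this slide can be built and queried with sqlite3 as follows. The table and column names follow the slide; the column types and all sample rows are assumptions added for illustration. The query has to walk two levels of dimension tables (store to geography, product to brand), which is the cost of the normalized layout.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript(
    """
    CREATE TABLE geography (id INTEGER PRIMARY KEY, state TEXT, country TEXT);
    CREATE TABLE store (id INTEGER PRIMARY KEY, name TEXT,
                        geography_id INTEGER REFERENCES geography(id));
    CREATE TABLE brand (id INTEGER PRIMARY KEY, brand TEXT);
    CREATE TABLE product (prod_id INTEGER PRIMARY KEY,
                          brand_id INTEGER REFERENCES brand(id), cost REAL);
    CREATE TABLE sale (number INTEGER PRIMARY KEY,
                       prod_id INTEGER REFERENCES product(prod_id),
                       store_id INTEGER REFERENCES store(id),
                       quantity INTEGER);

    -- Sample rows are invented for this sketch.
    INSERT INTO geography VALUES (1, 'Punjab', 'India'), (2, 'Texas', 'USA');
    INSERT INTO store VALUES (10, 'Store A', 1), (11, 'Store B', 2);
    INSERT INTO brand VALUES (100, 'Acme'), (101, 'Zenith');
    INSERT INTO product VALUES (1000, 100, 5.0), (1001, 101, 8.0);
    INSERT INTO sale VALUES (1, 1000, 10, 3), (2, 1001, 10, 2), (3, 1000, 11, 4);
    """
)

# Quantity sold per country and brand, joined through the sub-dimensions.
by_country_brand = con.execute(
    """
    SELECT g.country, b.brand, SUM(s.quantity)
    FROM sale AS s
    JOIN store     AS st ON st.id = s.store_id
    JOIN geography AS g  ON g.id = st.geography_id
    JOIN product   AS p  ON p.prod_id = s.prod_id
    JOIN brand     AS b  ON b.id = p.brand_id
    GROUP BY g.country, b.brand
    ORDER BY g.country, b.brand
    """
).fetchall()
```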

Page 17: Data warehousing

Galaxy Schema

A galaxy schema contains several fact tables that share some common dimensions (conformed dimensions). It is also known as a fact constellation schema.

Page 18: Data warehousing

Galaxy Schema

Sale fact table: Number, Prod_id, Retail_id, quantity

Purchase fact table: Number, Prod_id, supplier_id, quantity

Retailer: Retail_id, Name, city

product: Prod_id, Type, cost

supplier: Supplier_id, Name, country
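
A sketch of how the shared product dimension lets the two fact tables be compared. Table and column names follow the slide (with the fact tables named sale_fact and purchase_fact); types and sample rows are assumptions. Each fact table is aggregated on its own before joining, so that rows from one fact table do not multiply rows from the other.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript(
    """
    CREATE TABLE product  (prod_id INTEGER PRIMARY KEY, type TEXT, cost REAL);
    CREATE TABLE retailer (retail_id INTEGER PRIMARY KEY, name TEXT, city TEXT);
    CREATE TABLE supplier (supplier_id INTEGER PRIMARY KEY, name TEXT, country TEXT);
    CREATE TABLE sale_fact (number INTEGER PRIMARY KEY, prod_id INTEGER,
                            retail_id INTEGER, quantity INTEGER);
    CREATE TABLE purchase_fact (number INTEGER PRIMARY KEY, prod_id INTEGER,
                                supplier_id INTEGER, quantity INTEGER);

    -- Sample rows are invented for this sketch.
    INSERT INTO product VALUES (1, 'pen', 2.0), (2, 'ink', 5.0);
    INSERT INTO retailer VALUES (1, 'R1', 'Delhi');
    INSERT INTO supplier VALUES (1, 'S1', 'India');
    INSERT INTO sale_fact VALUES (1, 1, 1, 10), (2, 1, 1, 5), (3, 2, 1, 1);
    INSERT INTO purchase_fact VALUES (1, 1, 1, 20), (2, 2, 1, 4);
    """
)

# Purchased versus sold quantity per product type, via the conformed dimension.
purchased_vs_sold = con.execute(
    """
    SELECT p.type, COALESCE(pu.qty, 0), COALESCE(sa.qty, 0)
    FROM product AS p
    LEFT JOIN (SELECT prod_id, SUM(quantity) AS qty
               FROM purchase_fact GROUP BY prod_id) AS pu ON pu.prod_id = p.prod_id
    LEFT JOIN (SELECT prod_id, SUM(quantity) AS qty
               FROM sale_fact GROUP BY prod_id) AS sa ON sa.prod_id = p.prod_id
    ORDER BY p.type
    """
).fetchall()
```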

Page 19: Data warehousing

Installing the Service Manager Data Warehouse Server

The data warehouse server is installed by using Service Manager Setup.

Page 20: Data warehousing

EXAMPLE OF DATA WAREHOUSE

McMaster’s Data Warehouse design

Page 21: Data warehousing

EXAMPLE OF DATA WAREHOUSE

ARCHITECTURE (PRODUCTION ENVIRONMENT)

We are considering implementing the following three-tier platform, which will allow us to scale horizontally in the future. Our development environment consists of a server with 2 x Intel Xeon 2.8GHz processors and 2GB of RAM, running Windows 2000 Service Pack 4. We are considering the following for the scaled roll-out of our production environment.

A. Hardware

1. Server 1 - SAS® Data Server
- 4 way 64 bit 1.5GHz Itanium2 server
- 16 GB RAM
- 2 x 73 GB drives (RAID 1) for the OS
- 1 10/100/1Gb Cu Ethernet card

Page 22: Data warehousing

EXAMPLE OF DATA WAREHOUSE

ARCHITECTURE (PRODUCTION ENVIRONMENT)

- 1 Windows 2003 Enterprise Edition for Itanium

2. Mid-Tier (Web) Server
- 2 way 32 bit 3GHz Xeon server
- 4 GB RAM
- 1 10/100/1Gb Cu Ethernet card
- 1 Windows 2003 Enterprise Edition for x86

3. SAN Drive Array (modular and can grow with the warehouse)
- 6 x 72 GB drives (RAID 5), total 360 GB, for SAS® and data

Page 23: Data warehousing

EXAMPLE OF DATA WAREHOUSE

ARCHITECTURE (PRODUCTION ENVIRONMENT)

B. Software

1. Server 1 - SAS® Data Server
- SAS® 9.1.3
- SAS® Metadata Server
- SAS® WorkSpace Server
- SAS® Stored Process Server
- Platform JobScheduler

2. Mid-Tier Server
- SAS® Web Report Studio
- SAS® Information Delivery Portal
- BEA WebLogic for future SAS® SPM Platform
- Xythos Web File System (WFS)

Page 24: Data warehousing

EXAMPLE OF DATA WAREHOUSE

ARCHITECTURE (PRODUCTION ENVIRONMENT) 

3. Client-Tier Server
- SAS® Enterprise Guide
- SAS® Add-In for Microsoft Office

Page 25: Data warehousing

BENEFITS OF DATA WAREHOUSE

1. A Data Warehouse Delivers Enhanced Business Intelligence

Insights are gained through improved information access. Managers and executives are freed from making decisions based on limited data and their own “gut feelings”. Decisions that affect the strategy and operations of the organization are based on credible facts and backed up with evidence and actual organizational data.

2. A Data Warehouse Saves Time

Since business users can quickly access critical data from a number of sources, all in one place, they can rapidly make informed decisions on key initiatives. They won’t waste precious time retrieving data from multiple sources.

Page 26: Data warehousing

BENEFITS OF DATA WAREHOUSE

3. A Data Warehouse Enhances Data Quality and Consistency

A data warehouse implementation includes the conversion of data from numerous source systems into a common format, so you can have more confidence in the accuracy of your data. Accurate data is the basis for strong business decisions.

4. A Data Warehouse Provides Historical Intelligence

A data warehouse stores large amounts of historical data so you can analyze different time periods and trends in order to make future predictions. Such data typically cannot be stored in a transactional database or used to generate reports from a transactional system.

Page 27: Data warehousing

BENEFITS OF DATA WAREHOUSE

5. A Data Warehouse Generates a High ROI

Finally, the pièce de résistance: return on investment. Companies that have implemented data warehouses and complementary BI systems have generated more revenue and saved more money than companies that haven’t invested in BI systems and data warehouses.

 

Page 28: Data warehousing

DISADVANTAGES OF DATA WAREHOUSE

•Long initial implementation time and associated high cost

•Adding new data sources takes time and is expensive

•Limited flexibility of use and types of users: multiple separate data marts are required for multiple uses and types of users

•Typically, data is static and dated

•Difficult to accommodate changes in data types and ranges, or in the schema of a data source

Page 29: Data warehousing

 

DATA MINING

Data mining is the process of discovering hidden, previously unknown and usable information in a large amount of data. It is often defined as finding hidden information in a database.

DEFINITION OF DATA MINING: -

“The efficient discovery of valuable non-obvious information from a large collection of data.” [BIGUS 96]

Page 30: Data warehousing

Elements of Data mining 

•Extract, transform, and load transaction data onto the data warehouse system.

•Store and manage the data in a multidimensional database system.

•Provide data access to business analysts and information technology professionals.

•Analyze the data with application software.

•Present the data in a useful format, such as a graph or table.   

Page 31: Data warehousing

DATA MINING PROCESS

Slide 1 and slide 2

Page 32: Data warehousing

DATA MINING PROCESS

SELECTION: Selecting the data according to some criteria.

PREPROCESSING: This is the data cleansing stage, where information that is unnecessary and may slow down queries is removed.

TRANSFORMATION: The data is not merely transferred across but transformed, in that overlays may be added, such as the demographic overlays commonly used in market research.
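
The three stages on this slide can be sketched as three small Python functions. Everything concrete here is an assumption made for illustration: the record fields, the spend threshold used as the selection criterion, the field dropped during preprocessing, and the postcode-to-region table standing in for a demographic overlay.

```python
records = [
    {"customer": "a", "postcode": "110001", "spend": 120.0, "notes": "called twice"},
    {"customer": "b", "postcode": "560001", "spend": 15.0, "notes": ""},
    {"customer": "c", "postcode": "999999", "spend": 300.0, "notes": "new"},
]

# Hypothetical demographic overlay: postcode -> region.
REGION_BY_POSTCODE = {"110001": "north", "560001": "south"}

def select(rows, min_spend):
    """Selection: keep only rows that meet the criterion."""
    return [r for r in rows if r["spend"] >= min_spend]

def preprocess(rows, keep):
    """Preprocessing: drop fields that are not needed for mining."""
    return [{k: r[k] for k in keep} for r in rows]

def transform(rows, overlay):
    """Transformation: enrich each row with an overlay attribute."""
    return [{**r, "region": overlay.get(r["postcode"], "unknown")} for r in rows]

prepared = transform(
    preprocess(select(records, min_spend=100.0), keep=("customer", "postcode", "spend")),
    REGION_BY_POSTCODE,
)
```

After these stages, prepared holds only the high-spend customers, without the free-text notes field and with a region attribute attached, ready for the mining stage.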

Page 33: Data warehousing

DATA MINING PROCESS

DATA MINING: This stage is concerned with the extraction of patterns from the data. A pattern can be defined as follows: given a set of facts (data) F, a language L, and some measure of certainty C, a pattern is a statement S in L that describes relationships among a subset F(s) of F with a certainty C.

INTERPRETATION AND EVALUATION: The patterns identified by the system are interpreted into knowledge, which can then be used to support human decision making.
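
One way to make the pattern definition concrete is an "if A then B" rule over shopping transactions. In this sketch, which is an illustration and not a method taken from the slides, F is a list of item sets, the statement S is the rule, F(s) is taken to be the facts containing A, and the certainty C is the fraction of those facts that also contain B (usually called the rule's confidence). The sample transactions are invented.

```python
def rule_certainty(facts, antecedent, consequent):
    """Return (C, F_s) for the statement 'antecedent -> consequent'.

    F_s is the subset of facts the statement talks about (those containing
    the antecedent); C is the share of F_s that also contains the consequent.
    """
    f_s = [f for f in facts if antecedent <= f]
    if not f_s:
        return 0.0, f_s
    holds = sum(1 for f in f_s if consequent <= f)
    return holds / len(f_s), f_s

F = [
    {"bread", "butter"},
    {"bread", "butter", "milk"},
    {"bread"},
    {"milk"},
]
certainty, described = rule_certainty(F, {"bread"}, {"butter"})
```

Here the rule "bread implies butter" describes the three transactions that contain bread and holds in two of them, so its certainty is 2/3. Interpreting whether that is strong enough to act on belongs to the evaluation stage.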

Page 34: Data warehousing

ARCHITECTURE OF DATA MINING

There are three tiers in the tight-coupling data mining architecture:

Data layer: The data layer can be a database and/or data warehouse system. This layer is the interface to all data sources. Data mining results are stored in the data layer so that they can be presented to the end user as reports or other kinds of visualization.

Application layer: The data mining application layer retrieves data from the database. Transformation routines can be run here to convert the data into the desired format.

Front-end layer: The front-end layer provides an intuitive, friendly user interface through which the end user interacts with the data mining system. Data mining results are presented to the user in visual form in this layer.

Page 35: Data warehousing

ARCHITECTURE OF DATA MINING

Page 36: Data warehousing

Adding the Data Mining Option to a Database

Once you have installed the Oracle Database software, you can build databases as needed. You might build a database without the Data Mining option but later decide to add it.

Page 37: Data warehousing

Advantages of Data Mining

Marketing / Retail

Data mining helps marketing companies build models based on historical data to predict who will respond to new marketing campaigns, such as direct mail or online campaigns. Data mining benefits retail companies in the same way. Through market basket analysis, a store can arrange its products so that customers can conveniently buy items that are frequently purchased together.
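
A very small form of market basket analysis is to count how often each pair of products appears in the same basket. The sketch below does only that; the baskets and the 50% support threshold are invented for illustration, and production tools use more scalable algorithms than this exhaustive pair count.

```python
from collections import Counter
from itertools import combinations

def frequent_pairs(baskets, min_support):
    """Return {pair: support} for item pairs bought together often enough.

    Support is the fraction of all baskets that contain both items.
    """
    if not baskets:
        return {}
    counts = Counter()
    for basket in baskets:
        # sorted(set(...)) removes duplicates and gives each pair one canonical order
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    total = len(baskets)
    return {pair: n / total for pair, n in counts.items() if n / total >= min_support}

baskets = [
    ["bread", "butter"],
    ["bread", "butter", "milk"],
    ["bread", "milk"],
    ["tea"],
]
pairs = frequent_pairs(baskets, min_support=0.5)
```

With these baskets, bread and butter, and bread and milk, each appear together in half of the baskets, which would suggest shelving those products near each other.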

Page 38: Data warehousing

Advantages of Data Mining

Finance / Banking

Data mining gives financial institutions information about loans and credit reporting. By building a model from data on previous customers with common characteristics, banks and other financial institutions can estimate which loans are good or bad and their level of risk. In addition, data mining can help banks detect fraudulent credit card transactions and so help card owners avoid losses.

Page 39: Data warehousing

Advantages of Data Mining

Manufacturing

By applying data mining to operational engineering data, manufacturers can detect faulty equipment and determine optimal control parameters.

Governments

Data mining helps government agencies by digging through and analyzing records of financial transactions to build patterns that can detect money laundering or criminal activity.

Page 40: Data warehousing

Disadvantages of data mining

Privacy Issues

Concerns about personal privacy have increased enormously in recent years, especially as the internet booms with social networks.

Security Issues

Security is a big issue. Businesses own information about their employees and customers, including social security numbers, birthdays, and payroll details. However, how properly this information is protected is still in question. There have been many cases in which hackers accessed and stole large amounts of customer data from big corporations such as Ford Motor Credit Company and Sony. With so much personal and financial information available, credit card theft and identity theft have become a big problem.

Page 41: Data warehousing

Disadvantages of data mining

Misuse of Information / Inaccurate Information

Information collected through data mining for marketing or other ethical purposes can be misused. It may be exploited by unethical people or businesses to take advantage of vulnerable people or to discriminate against a group of people.

Page 42: Data warehousing

QUERIES??

Page 43: Data warehousing

THANK YOU

