DATA WAREHOUSING

Date post: 14-Jul-2015
Upload: rishikese-mr
Page 1: DATA WAREHOUSING

welcome

Page 2: DATA WAREHOUSING

Presented by: Neenu C. Paul (12120051), CS B, S7, SOE, CUSAT

Guided by: Dr. Sudheep Elayidom, Division of Computer Science, SOE, CUSAT

Page 3: DATA WAREHOUSING

CONTENTS

• What is a data warehouse?

• What is data warehousing?

• Database vs Data warehouse

• OLTP & OLAP

• Data warehouse architecture

• Multidimensional data model

• Data Mart

• ETL

• Advantages of data warehouse

• Disadvantages of data warehouse

• S/W Solutions of data warehouse

• Conclusion

• References

Page 4: DATA WAREHOUSING

A producer wants to know...

• Which are our lowest/highest margin customers?

• Who are my customers and what products are they buying?

• What is the most effective distribution channel?

• What product promotions have the biggest impact on revenue?

• What impact will new products/services have on revenue and margins?

• Which customers are most likely to go to the competition?

Page 5: DATA WAREHOUSING

What is a Data Warehouse?

• A data warehouse is a system for storing, analyzing, and reporting on data.

• Central database that includes information from several different sources.

• Keeps current as well as historical data.

• Used to produce reports to assist in decision-making and management.

Page 6: DATA WAREHOUSING

“Data Warehouse is a subject oriented, integrated, time-variant and non-volatile collection of data in support of management’s decision making process.” (W. H. Inmon)

Characteristics of a data warehouse: subject oriented, integrated, time variant, non-volatile.

Page 7: DATA WAREHOUSING

What is Data Warehousing?

A process of transforming data into information and making it available to users in a timely enough manner to make a difference.

Data → Information

Page 8: DATA WAREHOUSING

Database vs Data Warehouse

Database

• Transaction Oriented

• For saving online transaction data

• E-R modeling techniques are used for designing

• Capture data

• Constitute real time information

Data Warehouse

• Subject oriented

• For saving historical data

• Data modeling techniques are used for designing.

• Analyze data

• Constitute entire information base for all time.

Page 9: DATA WAREHOUSING

Data Processing Technologies

• OLTP (On-Line Transaction Processing)

- The major task is to perform on-line transaction and query processing. It covers most of the day-to-day operations of an organization.

• OLAP (On-Line Analytical Processing)

- Serves knowledge workers (users) in the role of data analysis and decision making.

- Organizes and presents data in various formats to accommodate the diverse needs of different users.
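The contrast between the two workloads can be sketched with Python's built-in sqlite3 module. This is only an illustration: the `orders` table, its columns, and the sample values are hypothetical and do not come from the presentation.

```python
import sqlite3

# Hypothetical table and values, used only to illustrate the two workloads.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders ("
    "id INTEGER PRIMARY KEY, customer TEXT, region TEXT, amount REAL)"
)


def place_order(customer, region, amount):
    """OLTP-style unit of work: a short transaction that writes one row."""
    with conn:  # commits on success, rolls back if an exception is raised
        cur = conn.execute(
            "INSERT INTO orders (customer, region, amount) VALUES (?, ?, ?)",
            (customer, region, amount),
        )
    return cur.lastrowid


def revenue_by_region():
    """OLAP-style query: reads every row and summarizes by one dimension."""
    rows = conn.execute(
        "SELECT region, SUM(amount) FROM orders GROUP BY region"
    )
    return dict(rows.fetchall())


place_order("alice", "north", 120.0)
place_order("bob", "south", 80.0)
place_order("carol", "north", 30.0)
summary = revenue_by_region()
print(summary)
```

In practice the analytical query would usually run against a separate warehouse rather than the operational database; a single in-memory database is used here only to keep the sketch short.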


Page 10: DATA WAREHOUSING

OLTP vs OLAP

Feature | OLTP | OLAP

users | clerk, IT professional | knowledge worker

function | day-to-day operations | decision support

DB design | application-oriented | subject-oriented

data | current, up-to-date; detailed, flat relational; isolated | historical; summarized, multidimensional; integrated, consolidated

usage | repetitive | ad-hoc

access | read/write, dozens of records | read, millions of records

unit of work | short, simple transaction | complex query

# records accessed | tens | millions

# users | thousands | hundreds

DB size | 100 MB to GB | 100 GB to TB

Page 11: DATA WAREHOUSING

October 31, 2014

To summarize...

• OLTP systems are used to “run” a business.

• The data warehouse helps to “optimize” the business.

Page 12: DATA WAREHOUSING

Typical DW Architecture

[Figure: layers of a typical data warehouse architecture, from left to right]

• Data Sources: System A, System B, System C, System D

• ETL: Extract, Transform, Load

• Data Store: The Data Warehouse

• Data Access: Business Model

• Presentation: Self Serve, Prompted Views, Dashboards, Scorecards, Ad-Hoc Reporting

Page 13: DATA WAREHOUSING

Multidimensional data model

• Developed for implementing data warehouse and data marts.

• Provides both a mechanism to store data and a way for business analysis.

• An alternative to entity-relationship (E/R) model

TYPES OF MULTIDIMENSIONAL DATA MODEL

• Data cube model

• Star schema model

• Snowflake schema model

• Fact constellations
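As a rough illustration of the star schema model, the sketch below builds one fact table surrounded by two dimension tables using Python's sqlite3 module. The table names (`fact_sales`, `dim_product`, `dim_store`), the columns, and the rows are all assumptions made for this example.

```python
import sqlite3

# Hypothetical star schema: one central fact table referencing two dimensions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT, category TEXT);
CREATE TABLE dim_store   (store_id INTEGER PRIMARY KEY, city TEXT);
CREATE TABLE fact_sales (
    product_id INTEGER REFERENCES dim_product(product_id),
    store_id   INTEGER REFERENCES dim_store(store_id),
    units      INTEGER,
    revenue    REAL
);
INSERT INTO dim_product VALUES (1, 'pen', 'stationery'), (2, 'lamp', 'home');
INSERT INTO dim_store VALUES (10, 'Kochi'), (20, 'Chennai');
INSERT INTO fact_sales VALUES (1, 10, 5, 50.0), (1, 20, 2, 20.0), (2, 10, 1, 40.0);
""")

# Only these dimension attributes may be used for grouping.
GROUPABLE = {"category": "p.category", "city": "s.city"}


def revenue_by(dimension):
    """Join the fact table to its dimensions and total revenue per attribute."""
    column = GROUPABLE[dimension]  # raises KeyError for unknown attributes
    sql = f"""
        SELECT {column}, SUM(f.revenue)
        FROM fact_sales AS f
        JOIN dim_product AS p ON p.product_id = f.product_id
        JOIN dim_store   AS s ON s.store_id = f.store_id
        GROUP BY {column}
    """
    return dict(conn.execute(sql).fetchall())


by_category = revenue_by("category")
by_city = revenue_by("city")
print(by_category, by_city)
```

A snowflake schema would further split a dimension such as `dim_product` into separate normalized tables, for example one for categories.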

Page 14: DATA WAREHOUSING

Data cubes

• A data warehouse is based on a multidimensional data model which views data in the form of a data cube.

• Three important concepts are associated with data cubes:

- Slicing

- Dicing

- Rotating

• In the cube given below, we have the results of the 1991 Canadian Census, with ethnic origin, age group and geography as the dimensions of the cube and 174 as one value of the measure. A dimension is a category of data, and each dimension includes different levels of categories. The measures are the actual data values that occupy the cells defined by the selected dimensions.
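One simple way to picture dimensions and measures is a mapping from a tuple of dimension values to a measure. The sketch below assumes that representation; the records are made-up numbers, not real census figures, and the function names are hypothetical.

```python
from collections import defaultdict

# Made-up records (ethnic origin, age group, geography, count).
# These are NOT real census figures.
records = [
    ("Chinese", "0-14", "Ontario", 3),
    ("Chinese", "0-14", "Ontario", 2),
    ("Chinese", "15-64", "Quebec", 4),
    ("Italian", "15-64", "Ontario", 6),
]


def build_cube(rows):
    """Each cell is addressed by its three dimension values and holds the measure."""
    cube = defaultdict(int)
    for origin, age_group, geography, count in rows:
        cube[(origin, age_group, geography)] += count
    return dict(cube)


def aggregate(cube, keep):
    """Sum the measure over every dimension whose position is not in `keep`."""
    totals = defaultdict(int)
    for key, value in cube.items():
        totals[tuple(key[i] for i in keep)] += value
    return dict(totals)


cube = build_cube(records)
by_origin = aggregate(cube, keep=[0])
print(cube[("Chinese", "0-14", "Ontario")], by_origin)
```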

Page 15: DATA WAREHOUSING

[Figure: data cube of the 1991 Canadian Census]

Page 16: DATA WAREHOUSING

Slicing the Data Cube

• Figure 2 illustrates slicing on the ethnic origin "Chinese". When the cube is sliced as in this example, the result is the data for Chinese origin across the geography and age group dimensions.

• The data contained in the cube has effectively been filtered to display only the measures associated with the Chinese ethnic origin.

• From an end-user perspective, the term slice most often refers to a two-dimensional page selected from the cube.

Page 17: DATA WAREHOUSING

Dicing and Rotating

• Dicing is an operation related to slicing in which a sub-cube of the original space is defined.

• Dicing provides the user with the smallest available slice of data, making it possible to examine each sub-cube in greater detail.

• Rotating, which is sometimes called pivoting, changes the dimensional orientation of the report or page display from the cube data. Rotating may consist of swapping the rows and columns, or moving one of the row dimensions into the column dimension.
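The three cube operations can be sketched on a small in-memory cube. The sketch assumes the cube is a dictionary keyed by (origin, age group, geography); the cell values and the function names are hypothetical and chosen only for illustration.

```python
# Hypothetical cube: keys are (origin, age_group, geography), values are counts.
DIMS = ("origin", "age_group", "geography")
cube = {
    ("Chinese", "0-14", "Ontario"): 5,
    ("Chinese", "15-64", "Ontario"): 7,
    ("Chinese", "15-64", "Quebec"): 4,
    ("Italian", "15-64", "Ontario"): 6,
}


def slice_cube(cube, dim, value):
    """Fix one dimension to a single value and drop it, leaving a 2-D page."""
    i = DIMS.index(dim)
    return {k[:i] + k[i + 1:]: v for k, v in cube.items() if k[i] == value}


def dice_cube(cube, **allowed):
    """Keep the sub-cube whose coordinates lie in the allowed sets per dimension."""
    def inside(key):
        return all(key[DIMS.index(d)] in values for d, values in allowed.items())
    return {k: v for k, v in cube.items() if inside(k)}


def rotate(page):
    """Swap the two axes of a 2-D page (rows become columns)."""
    return {(col, row): v for (row, col), v in page.items()}


chinese_page = slice_cube(cube, "origin", "Chinese")
sub_cube = dice_cube(cube, origin={"Chinese", "Italian"}, geography={"Ontario"})
rotated = rotate(chinese_page)
print(chinese_page, sub_cube, rotated, sep="\n")
```

Note that slicing reduces the number of dimensions by one, while dicing keeps all three dimensions and only narrows the range of values along them.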


Page 18: DATA WAREHOUSING

Data Mart

• Contains a subset of the data stored in the data warehouse that is of interest to a specific business community, department, or set of users.

• E.g.: marketing promotions, finance, or account collections.

• Data marts are small slices of the data warehouse.

• Data marts improve end-user response time by allowing users to have access to the specific type of data they need to view.

• A data mart is basically a condensed and more focused version of a data warehouse.
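The idea of a data mart as a focused subset of the warehouse can be sketched with a SQL view over a warehouse table. This is a simplification: the table `warehouse_facts`, the view `finance_mart`, and the rows are hypothetical, and real data marts are often separate physical stores rather than views.

```python
import sqlite3

# Hypothetical warehouse table holding several subject areas.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE warehouse_facts (subject TEXT, period TEXT, metric TEXT, value REAL);
INSERT INTO warehouse_facts VALUES
    ('finance',   '2014-Q1', 'revenue',    100.0),
    ('finance',   '2014-Q2', 'revenue',    120.0),
    ('marketing', '2014-Q1', 'promo_cost',  15.0);

-- The "data mart": only the finance subject area, exposed to finance users.
CREATE VIEW finance_mart AS
    SELECT period, metric, value
    FROM warehouse_facts
    WHERE subject = 'finance';
""")

finance_rows = conn.execute(
    "SELECT period, value FROM finance_mart ORDER BY period"
).fetchall()
print(finance_rows)
```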

Page 19: DATA WAREHOUSING

Data warehouse vs Data mart

DATA WAREHOUSE

• Holds multiple subject areas

• Holds very detailed information

• Works to integrate all data sources

• Does not necessarily use a dimensional model but feeds dimensional models

DATA MART

• Often holds only one subject area, for example Finance or Sales

• May hold more summarized data (although many hold full detail)

• Concentrates on integrating information from a given subject area or set of source systems

• Is built focused on a dimensional model using a star schema

Page 20: DATA WAREHOUSING

Reasons for creating a data mart

• Easy access to frequently needed data

• Creates a collective view for a group of users

• Improves end-user response time

• Ease of creation

• Lower cost than implementing a full data warehouse

• Potential users are more clearly defined than in a full data warehouse

• Contains only business essential data and is less cluttered.

Page 21: DATA WAREHOUSING

Advantages & Disadvantages of data warehousing

Advantages

• Enhances end-user access to a wide variety of data.

• Increases data consistency.

• Increases productivity and decreases computing costs.

• Is able to combine data from different sources in one place.

• Provides an infrastructure that could support changes to data and replication of the changed data back into the operational systems.

Disadvantages

• Extracting, cleaning and loading data could be time consuming.

• Compatibility problems with systems already in place, e.g. a transaction processing system.

• Training is provided to end users, who may end up not using the data warehouse.

• Security could develop into a serious issue, especially if the data warehouse is web accessible.

Page 22: DATA WAREHOUSING

Applications of data warehousing

Industry | Application

Finance | Credit card analysis

Insurance | Claims, fraud analysis

Telecommunication | Call record analysis

Transport | Logistics management

Consumer goods | Promotion analysis

Page 23: DATA WAREHOUSING

ETL

• Extract-Transform-Load

• Responsible for the operations taking place backstage in the data warehouse architecture.

• Extract: Get the data from the source system as efficiently as possible

• Transform: Perform calculations on the data

• Load: Load the data into the target storage
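The three steps above can be sketched as a tiny pipeline using only Python's standard library. The CSV source, the column names, the cleaning rules, and the target `orders` table are all assumptions made for this example, not part of any particular ETL tool.

```python
import csv
import io
import sqlite3

# Hypothetical source data standing in for an operational system export.
SOURCE_CSV = """order_id,customer,amount_cents
1, Alice ,1250
2,bob,300
3,,999
"""


def extract(text):
    """Extract: read raw rows from the source as dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: clean names, convert cents to units, drop incomplete rows."""
    clean = []
    for row in rows:
        name = row["customer"].strip()
        if not name:
            continue  # assumed rule: skip rows with no customer
        clean.append(
            (int(row["order_id"]), name.title(), int(row["amount_cents"]) / 100)
        )
    return clean


def load(conn, rows):
    """Load: write the cleaned rows into the target table in one transaction."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders ("
        "order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
    )
    with conn:
        conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)


conn = sqlite3.connect(":memory:")
load(conn, transform(extract(SOURCE_CSV)))
loaded = conn.execute("SELECT * FROM orders ORDER BY order_id").fetchall()
print(loaded)
```

Keeping each step as a separate function makes it possible to test the transformation rules without touching either the source or the target.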

ADVANTAGES OF ETL TOOLS

• Simple, faster and cheaper

• Deliver good performance even for very large data sets

• Allow reuse of existing complex programs

Page 24: DATA WAREHOUSING

Popular ETL tools

Tool | Company

Informix | IBM

Oracle Warehouse Builder | Oracle

Microsoft SQL Server Integration Services | Microsoft

Page 25: DATA WAREHOUSING

IBM Informix

• Informix is one of the world’s most widely used database servers

• High levels of performance and availability, distinctive capabilities in data replication and scalability, and minimal administrative overhead.

HIGHLIGHTS

• Real-time analytics: Informix is a single platform that can power OLTP and OLAP workloads and successfully meet service-level agreements (SLAs) for each.

• Fast, always-on transactions: Provides one of the industry’s widest sets of options for keeping data available at all times, including zero downtime for maintenance.

• Sensor data management: Solves the big data challenge of sensor data with unmatched performance and scalability for managing time series data.

• Easy to use: Informix runs virtually unattended with self-configuring, self-managing and self-healing capabilities.

• Best-of-breed embeddability: Provides a proven embedded data management platform for ISVs and OEMs to deliver integrated, world-class solutions, enabling platform independence.

• NoSQL capability: IBM Informix unleashes new capabilities, giving you a way to combine unstructured and structured data in a smart way, bringing NoSQL to your SQL database.

Page 26: DATA WAREHOUSING

Conclusion

Data warehousing is not a new phenomenon. All large organizations already have data warehouses, but they are just not managing them. Over the next few years, the growth of data warehousing is going to be enormous, with new products and technologies coming out frequently. In order to get the most out of this period, it is going to be important that data warehouse planners and developers have a clear idea of what they are looking for and then choose strategies and methods that will provide them with performance today and flexibility for tomorrow.


Page 28: DATA WAREHOUSING

Questions ????

Page 29: DATA WAREHOUSING

THANK YOU!!!!!

