+ All Categories
Home > Data & Analytics > Data Warehouse Basic Guide

Data Warehouse Basic Guide

Date post: 16-Jul-2015
Category:
Upload: thomasmary607
View: 71 times
Download: 1 times
Share this document with a friend
Popular Tags:
25
Data Warehouse Definition Definition Importance of Data Warehouse Importance of Data Warehouse Its Components Its Components Two Data Warehousing Strategies Two Data Warehousing Strategies ETL Processes ETL Processes For a Successful Warehouse For a Successful Warehouse Data Warehouse Pitfalls Data Warehouse Pitfalls
Transcript

Data Warehouse

DefinitionDefinitionImportance of Data WarehouseImportance of Data WarehouseIts Components Its Components Two Data Warehousing StrategiesTwo Data Warehousing StrategiesETL Processes ETL Processes For a Successful Warehouse For a Successful Warehouse Data Warehouse PitfallsData Warehouse Pitfalls

Data Warehouse

A subject oriented, integrated, time-variant, non-volatile collection of data in support of management decisions (Bill Inmon) Subject oriented -- data are organized around sales,

products, etc. Integrated -- data are integrated to provide a

comprehensive view Time variant -- historical data are maintained Nonvolatile -- data are not updated by users

Limitations of Traditional Databases

lack of on-line historical data

residing in different operational systems

extremely poor query performance

operational database designs not suited for decision support

The Importance of Data Warehousing

More cost – effective decision making

Increase quality and flexibility of enterprise analysis as data warehouse contain accurate and reliable data

Ability to maintain better customer relationships

Unlimited analyses of enterprise information

Components of Data warehouse

Summarized data Basically of two type: 1) Lightly (departmental information)

2) Highly (enterprise wide decision)

Current detail Comes directly from operational system But stored by subject area and represent entire organization not a department

System of record Maintaining the source of record

Integration and transformation Programs Programs that convert an application – specific data to enterprise data

Cont..

Performs many function like Reformatting, recalculating Adding time element Identifying the default value Summarizing and merging the data Filling up the blank fields

Archives Contain old data which hold some amount of significance to the organization Used for trend analysis

Metadata Control access and analysis of the data warehouse contents To manage and control data warehouse creation and maintenance

Two Data Warehousing Strategies

Enterprise-wide warehouse, top down, the Inmon methodology

Data mart, bottom up, the Kimball methodology

When properly executed, both result in an enterprise-wide data warehouse

The Data Mart Strategy

The most common approach Begins with a single mart and are added over

time for more subject areas Relatively inexpensive and easy to implement Can be used as a proof of concept for data

warehousing Requires an overall integration plan

The Enterprise-wide Strategy

A comprehensive warehouse is built initially An initial dependent data mart is built using a

subset of the data in the warehouse Additional data marts are built using subsets of the

data in the warehouse Like all complex projects, it is expensive, time

consuming, and prone to failure When successful, it results in an integrated, scalable

warehouse

ETL Processes

Extraction, Transformation, and Loading Process

The “plumbing” work of data warehousing Data are moved from source to target data

bases A very costly, time consuming part of data

warehousing

Sample ETL Tools

Teradata Warehouse Builder from Teradata DataStage from Ascential Software SAS System from SAS Institute Power Mart/Power Center from Informatica Sagent Solution from Sagent Software

Reasons for “Dirty” Data

• Dummy Values• Absence of Data• Multipurpose Fields• Inappropriate Use of Address Lines• Violation of Business Rules• Non-Unique Identifiers• Data Integration Problems

I. Data Cleansing and Extracting

Source systems contain “dirty data” that must be cleansed ETL software contains rudimentary data cleansing capabilities Specialized data cleansing software is often used. Important

for performing name and address correction and householding functions

Leading data cleansing vendors include Vality (Integrity), Harte-Hanks (Trillium), and Firstlogic (i.d.Centric)

Steps in Data Cleansing

Parsing Correcting Standardizing Matching Consolidating

Parsing

Parsing locates and identifies individual data elements in the source files and then isolates these data elements in the target files.

Examples include parsing the first, middle, and last name; street number and street name; and city and state.

Correcting

Corrects parsed individual data components using sophisticated data algorithms and secondary data sources.

Example include replacing a vanity address and adding a zip code.

Standardizing

Standardizing applies conversion routines to transform data into its preferred (and consistent) format using both standard and custom business rules.

Examples include adding a pre name, replacing a nickname, and using a preferred street name

Matching

Searching and matching records within and across the parsed, corrected and standardized data based on predefined business rules to eliminate duplications.

Examples include identifying similar names and addresses.

Consolidating

• Analyzing and identifying relationships between matched records and consolidating/merging them into ONE representation.

II. Data Transformation

Transforms the data in accordance with the business rules and standards that have been established

Example include: format changes, deduplication, splitting up fields, replacement of codes, derived values, and aggregates

III. Data Loading

Data are physically moved to the data warehouse

The loading takes place within a “load window”

The trend is to near real time updates of the data warehouse as the warehouse is increasingly used for operational applications

For a Successful Warehouse

From day one establish that warehousing is a joint user/builder project

Establish that maintaining data quality will be an ONGOING joint user/builder responsibility

Train the users one step at a time Consider doing a high level corporate data model in no

more than three weeks Look closely at the data extracting, cleaning, and loading

tools

Cont..

Determine a plan to test the integrity of the data in the warehouse

From the start get warehouse users in the habit of 'testing' complex queries

Coordinate system roll-out with network administration personnel

Implement a user accessible automated directory to information stored in the warehouse

Data Warehouse Pitfalls

Many warehouse end users will be trained and never or seldom apply their training

Large scale data warehousing can become an exercise in data homogenizing

Loading information only because it is available Providing no maintenance to the data warehouse

Contact Us

Business Name: Skyline Business School

Address: Hauz Khas Enclave, New Delhi  110 016, India.

Phone: 911126864848,:911126866968

Email: [email protected]

Resource: www.skylinecollege.com/ourprogrammes/pgpdatawarehousingbusinessintelligence


Recommended