Date post: | 16-Jul-2015 |
Category: |
Data & Analytics |
Upload: | thomasmary607 |
View: | 71 times |
Download: | 1 times |
Data Warehouse
DefinitionDefinitionImportance of Data WarehouseImportance of Data WarehouseIts Components Its Components Two Data Warehousing StrategiesTwo Data Warehousing StrategiesETL Processes ETL Processes For a Successful Warehouse For a Successful Warehouse Data Warehouse PitfallsData Warehouse Pitfalls
Data Warehouse
A subject oriented, integrated, time-variant, non-volatile collection of data in support of management decisions (Bill Inmon) Subject oriented -- data are organized around sales,
products, etc. Integrated -- data are integrated to provide a
comprehensive view Time variant -- historical data are maintained Nonvolatile -- data are not updated by users
Limitations of Traditional Databases
lack of on-line historical data
residing in different operational systems
extremely poor query performance
operational database designs not suited for decision support
The Importance of Data Warehousing
More cost – effective decision making
Increase quality and flexibility of enterprise analysis as data warehouse contain accurate and reliable data
Ability to maintain better customer relationships
Unlimited analyses of enterprise information
Components of Data warehouse
Summarized data Basically of two type: 1) Lightly (departmental information)
2) Highly (enterprise wide decision)
Current detail Comes directly from operational system But stored by subject area and represent entire organization not a department
System of record Maintaining the source of record
Integration and transformation Programs Programs that convert an application – specific data to enterprise data
Cont..
Performs many function like Reformatting, recalculating Adding time element Identifying the default value Summarizing and merging the data Filling up the blank fields
Archives Contain old data which hold some amount of significance to the organization Used for trend analysis
Metadata Control access and analysis of the data warehouse contents To manage and control data warehouse creation and maintenance
Two Data Warehousing Strategies
Enterprise-wide warehouse, top down, the Inmon methodology
Data mart, bottom up, the Kimball methodology
When properly executed, both result in an enterprise-wide data warehouse
The Data Mart Strategy
The most common approach Begins with a single mart and are added over
time for more subject areas Relatively inexpensive and easy to implement Can be used as a proof of concept for data
warehousing Requires an overall integration plan
The Enterprise-wide Strategy
A comprehensive warehouse is built initially An initial dependent data mart is built using a
subset of the data in the warehouse Additional data marts are built using subsets of the
data in the warehouse Like all complex projects, it is expensive, time
consuming, and prone to failure When successful, it results in an integrated, scalable
warehouse
ETL Processes
Extraction, Transformation, and Loading Process
The “plumbing” work of data warehousing Data are moved from source to target data
bases A very costly, time consuming part of data
warehousing
Sample ETL Tools
Teradata Warehouse Builder from Teradata DataStage from Ascential Software SAS System from SAS Institute Power Mart/Power Center from Informatica Sagent Solution from Sagent Software
Reasons for “Dirty” Data
• Dummy Values• Absence of Data• Multipurpose Fields• Inappropriate Use of Address Lines• Violation of Business Rules• Non-Unique Identifiers• Data Integration Problems
I. Data Cleansing and Extracting
Source systems contain “dirty data” that must be cleansed ETL software contains rudimentary data cleansing capabilities Specialized data cleansing software is often used. Important
for performing name and address correction and householding functions
Leading data cleansing vendors include Vality (Integrity), Harte-Hanks (Trillium), and Firstlogic (i.d.Centric)
Parsing
Parsing locates and identifies individual data elements in the source files and then isolates these data elements in the target files.
Examples include parsing the first, middle, and last name; street number and street name; and city and state.
Correcting
Corrects parsed individual data components using sophisticated data algorithms and secondary data sources.
Example include replacing a vanity address and adding a zip code.
Standardizing
Standardizing applies conversion routines to transform data into its preferred (and consistent) format using both standard and custom business rules.
Examples include adding a pre name, replacing a nickname, and using a preferred street name
Matching
Searching and matching records within and across the parsed, corrected and standardized data based on predefined business rules to eliminate duplications.
Examples include identifying similar names and addresses.
Consolidating
• Analyzing and identifying relationships between matched records and consolidating/merging them into ONE representation.
II. Data Transformation
Transforms the data in accordance with the business rules and standards that have been established
Example include: format changes, deduplication, splitting up fields, replacement of codes, derived values, and aggregates
III. Data Loading
Data are physically moved to the data warehouse
The loading takes place within a “load window”
The trend is to near real time updates of the data warehouse as the warehouse is increasingly used for operational applications
For a Successful Warehouse
From day one establish that warehousing is a joint user/builder project
Establish that maintaining data quality will be an ONGOING joint user/builder responsibility
Train the users one step at a time Consider doing a high level corporate data model in no
more than three weeks Look closely at the data extracting, cleaning, and loading
tools
Cont..
Determine a plan to test the integrity of the data in the warehouse
From the start get warehouse users in the habit of 'testing' complex queries
Coordinate system roll-out with network administration personnel
Implement a user accessible automated directory to information stored in the warehouse
Data Warehouse Pitfalls
Many warehouse end users will be trained and never or seldom apply their training
Large scale data warehousing can become an exercise in data homogenizing
Loading information only because it is available Providing no maintenance to the data warehouse
Contact Us
Business Name: Skyline Business School
Address: Hauz Khas Enclave, New Delhi 110 016, India.
Phone: 911126864848,:911126866968
Email: [email protected]
Resource: www.skylinecollege.com/ourprogrammes/pgpdatawarehousingbusinessintelligence