Data Warehousing Concepts and Design
Introduction & Ground Rules
Objectives
Data Warehousing Concepts
• What is Business Intelligence (BI)?
• Evolution of BI
• Characteristics of an OLTP system
• Why is OLTP not suitable for complex analysis?
• Characteristics of a Data Warehouse
• Define DWH and its properties: Subject Oriented, Integrated, Time Variant, Non-Volatile
• Define Grain/Granularity
• Differentiate between OLTP and Data Warehouse
• User expectations and user community
• Enterprise Data Warehouse
• Data Warehouse versus Data Marts
• Dependent Data Marts
• Independent Data Marts
• Data Warehouse components: Source systems, Staging area, Presentation area, Access tools
Objectives
Data Warehousing Concepts
• Goals of a Data Warehouse
• Data Warehouse development approaches: Top-down, Bottom-up, Hybrid, Federated
• Incremental approach to warehouse development
• Dimensional Modeling
• Star Schema: Fact and Dimension tables
• Dimensions and measure objects
• Snowflake Schema
• Types of Fact tables
• Factless Fact table
• OLAP storage modes: MOLAP, ROLAP, HOLAP, DOLAP
• Slowly and Rapidly Changing Dimensions: Type I, II, III
• Degenerate Dimension
• Junk Dimension
• Case studies
What is Business Intelligence (BI)?
“Business Intelligence (BI) is the process of transforming data into information, information into knowledge and through iterative discoveries turning knowledge into Intelligence.”
— Gartner Group
Objective of Business Intelligence
Value
Volume
Intelligence
Knowledge
Information
Data
BI can be defined as making decisions based on data. The objective of BI is to transform large volumes of data into useful information.
Evolution of BI
– Executive Information Systems (EIS)
– Management Information Systems (MIS)
– Decision Support Systems (DSS)
– Business Intelligence (BI)
EIS
MIS
DSS
BI
Information
Information in an organization can exist in two different types of systems:
– Online Transaction Processing (OLTP) systems (Operational Systems)
– Data Warehouse (DWH) systems
OLTP and DWH systems serve different purposes, business needs, and users.
Features of OLTP Systems
OLTP systems handle day-to-day transactions and operations of the business. They are high performance, high throughput systems. They run mission critical applications.
OLTP systems store, update and retrieve Operational Data. Operational Data is the data that runs the business.
Some of the Operational systems that we interact with are Net Banking system, Tax Accounting system, Payroll package, Order-processing system, SAP, Airline reservation system etc.
Why are OLTP systems not suitable for analysis?
OLTP | Analytical Reporting
Supports day-to-day operations | Requires historical information for analysis
Data stored at transaction level | Data required at summary level
Islands of operational systems | Data needs to be integrated
Database design: normalized | Database design: dimensional
OLTP Versus Data Warehouse
Property | OLTP | Data Warehouse
Response Time | Sub-seconds to seconds | Seconds to hours
Operations | DML; data goes in | Primarily read-only; data goes out
Age of Data | Current (30-60 days to 1-2 years) | Historical (snapshots over time: quarter, month, etc.)
Data Organization | Application | Subject, time
Size | Small to large (a few MB to GB) | Large to very large (a few GB to TB)
OLTP Versus Data Warehouse
Property | OLTP | Data Warehouse
Data Sources | Operational, internal | Operational, internal, external
Activities | Processes | Analysis
No. of Records | One record at a time | Thousands to millions of records
Grain | Atomic (detail), transaction level, highest granularity | Atomic and/or summarized (aggregate), lower granularity
Database Design | Normalized | De-normalized, star schema
Data Extract Processing
A logical progression towards a data warehouse – Data Extracts
– End-user computing offloaded from the operational environment
– Users' own data
Operational systems → Extracts → Decision makers
Issues with Data Extract Programs
Operational systems → Extracts → Decision makers
Extract Explosion
Data Quality Issues with Extract Processing
– No common time basis
– Different calculation algorithms
– Different levels of extraction
– Different levels of granularity
– Different data field names
– Different data field meanings
– Missing information
– No data correction rules
– No metadata
– No drill-down capability
Data Warehousing and Business Intelligence
Advances Enabling Data Warehousing
Technology
– Hardware
– Operating system
– Database
– BI tools & applications
Business
– Competition
Definition of a Data Warehouse
“A data warehouse is a subject oriented, integrated, non-volatile,
and time-variant collection of data to support management decisions.”
— Bill Inmon
Data Warehouse Properties
Integrated
Time-variant
Nonvolatile
Subject-oriented
Data Warehouse
Subject-Oriented
• Data is categorized and stored by business subject rather than by application.
OLTP Applications
Equity Plans
Shares
Insurance
Loans
Savings
Data Warehouse
Subject
Customer financial information
Integrated
• Data on a given subject is collected from various sources and stored once.
OLTP Applications
Customer
Savings
Current Accounts
Loans
Data Warehouse
Time-Variant
• Data is stored as a series of snapshots, each representing a period of time.
Non-volatile
• Typically data in the data warehouse is not updated or deleted.
Warehouse
Read
Load
Operational
Insert, Update, Delete, or Read
Changing Warehouse Data
Operational Databases Warehouse Database
First time load
Refresh
Refresh
Refresh
Purge or Archive
Goals of a Data Warehouse
• The Data Warehouse must assist in decision making process
• The Data Warehouse must meet the requirements of the business community
• The Data Warehouse must provide easy access to information
• The Data Warehouse must present information consistently and accurately
• The Data Warehouse must be adaptive and resilient to change
• The Data Warehouse must provide a secured access to information
Usage Curves
– Operational system usage is predictable
– Data warehouse usage is variable and random
User Expectations
– Control expectations
– Set achievable targets for query response
– Set SLAs
– Educate business and end users
– Growth and use are exponential
Enterprisewide Data Warehouse
– Large-scale implementation
– Scopes the entire business
– Data from all subject areas
– Developed incrementally
– Single source of enterprise-wide data
– Synchronized enterprise-wide data
– Single distribution point to dependent data marts
Data Warehouse Vocabulary
– Grain of Data - Granularity
Grain is defined as the level of detail of the data captured in the data warehouse. The more detail, the higher the granularity, and vice versa.
– Fact table
It is similar to a transaction table in an OLTP system. It stores the facts, or measures, of the business, e.g. SALES, ORDERS.
– Dimension table
It is similar to a master table in an OLTP system. It stores the textual descriptors of the business, e.g. CUSTOMER, PRODUCT.
Data Marts
• A data mart is a subset of the data warehouse.
• A data mart is designed for a single line of business (LOB) or functional area such as sales, finance, or marketing.
Data Warehouses Versus Data Marts
Property | Data Warehouse | Data Mart
Scope | Enterprise | Department
Subjects | Multiple | Single subject, LOB
Data Sources | Many | Few
Implementation Time | Months to years | Months
Size | 100 GB to > 1 TB | < 100 GB
Initial Effort, Cost, Risk | Higher | Lower
Next Level of Migration | Data Mart | Data Warehouse
Approach | Top-down | Bottom-up
Dependent Data Mart
Data Warehouse
Data Marts
Flat Files
Marketing
Sales
Finance
Marketing
Sales
Finance
HR
Operational Systems
External Data
Operations Data
Legacy Data
External Data
Independent Data Mart
Sales or Marketing
Flat Files
Operational Systems
External Data
Operations Data
Legacy Data
External Data
Warehouse Development Approaches
• Top-down approach(Big-Bang)
• Bottom-up approach
• Hybrid approach(Combination)
• Federated approach
Top-Down Approach
Build the Data Warehouse
Build the Data Marts
Top-Down Approach
Data Warehouse
Data Marts
Flat Files
Marketing
Sales
Finance
Marketing
Sales
Finance
HR
Operational Systems
External Data
Operations Data
Legacy Data
External Data
Bottom-Up Approach
Build Data Marts
Build the Data Warehouse
Bottom-Up Approach
Data Warehouse
Data Marts
Marketing
Sales
Finance
Operational Systems
External Data
Operations Data
Legacy Data
Hybrid Approach
The hybrid approach tries to blend the best of the "top-down" and "bottom-up" approaches.
– Start by designing the DW and DM models synchronously
– Build out the first 2-3 DMs that are mutually exclusive and critical
– Backfill a DW behind the DMs
– Build the enterprise model and move atomic data to the DW
Federated Approach
This approach is referred to as "an architecture of architectures".
It emphasizes the need to integrate new and existing heterogeneous BI environments.
Data Warehouse Components
Source Systems
Staging Area
Presentation Area
Access Tools
ODS
Operational
External
Legacy
Metadata Repository
Data Marts
Data Warehouse
Examining Data Sources
– Production– Archive– Internal– External
Production Data
– Operating system platforms– File systems– Database systems – Vertical applications
IMS
DB2
Oracle
Sybase
Informix
VSAM
SAP
Dun and Bradstreet Financials
Oracle Financials
Baan
PeopleSoft
Archive Data
– Historical data
– Useful for analysis over long periods of time
– Useful for the first-time load
Operation databases
Warehouse database
Internal Data
– Planning, sales, and marketing organization data
– Maintained in the form of:
  • Spreadsheets (structured)
  • Documents (unstructured)
– Treated like any other source data
Warehouse database
Planning
Accounting
Marketing
External Data
– Information from outside the organization
– Issues of frequency, format, and predictability
– Described and tracked using metadata
A.C. Nielsen, IRI, IMRB, ORG-MARG
Barron's
Dun and Bradstreet
Purchased databases
Wall Street Journal
Economic forecasts
Competitive information
Warehousingdatabases
Extraction, Transformation and Loading (ETL)
Extraction, Transformation and Loading (ETL)
• “Effective data extract, transform and load (ETL) processes represent the number one success factor for your data warehouse project and can absorb up to 70 percent of the time spent on a typical data warehousing project.”
— DM Review, March 2001
Source → Staging Area → Target
Staging Models
• Remote staging model
• Onsite staging model
Remote Staging Model
– Data staging area within the warehouse environment
– Data staging area in its own independent environment

In both variants the flow is: Operational system → Extract → Transform (staging area) → Load → Warehouse
On-site Staging Model
• Data staging area within the operational environment, possibly affecting the operational system
Operational system → Extract → Transform (staging area) → Load → Warehouse
Extraction Methods
– Logical extraction methods:
  • Full extraction
  • Incremental extraction
Extraction Methods
– Physical extraction methods:
  • Online extraction
  • Offline extraction
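As a minimal sketch of the full-versus-incremental distinction, incremental extraction pulls only rows changed since the last run, typically by filtering on a last-modified timestamp (the table and column names below are illustrative):

```python
from datetime import datetime

# Illustrative source rows carrying a last-modified timestamp.
orders = [
    {"order_id": 1, "amount": 250.0, "modified": datetime(2024, 1, 10)},
    {"order_id": 2, "amount": 990.0, "modified": datetime(2024, 2, 3)},
    {"order_id": 3, "amount": 120.0, "modified": datetime(2024, 2, 20)},
]

def full_extract(rows):
    """Full extraction: take every row on every run."""
    return list(rows)

def incremental_extract(rows, last_run):
    """Incremental extraction: only rows modified after the last run."""
    return [r for r in rows if r["modified"] > last_run]

print(len(full_extract(orders)))                               # all 3 rows
print(len(incremental_extract(orders, datetime(2024, 2, 1))))  # 2 changed rows
```

In practice the `last_run` watermark is persisted between loads; a real change-data-capture setup would use database logs rather than timestamps.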
ETL Techniques
– Programs: C, C++, COBOL, PL/SQL, Java
– Gateways: Transparent Database Access
– Tools:
  • In-house developed tools
  • Vendor ETL tools (the ideal technique)
Mapping Data
• Mapping data defines:
  – Which operational attributes to use
  – How to transform the attributes for the warehouse
  – Where the attributes exist in the warehouse
Metadata
File A                  Staging File One
F1  123                 Number  USA123
F2  Bloggs              Name    Mr. Bloggs
F3  10/12/56            DOB     10-Dec-56
Transformation Routines
– Cleaning data
– Eliminating inconsistencies
– Adding elements
– Merging data
– Integrating data
– Transforming data before load
Transforming Data: Problems and Solutions
– Data anomalies
– Multipart keys
– Multiple local standards
– Multiple files
– Missing values
– Duplicate values
– Element names
– Element meanings
– Input formats
– Referential integrity constraints
– Name and address
Data Anomalies
– No unique key
– Data naming and coding anomalies
– Data meaning anomalies between groups
– Spelling and text inconsistencies
CUSNUM NAME ADDRESS
90233479 Oracle Limited 100 N.E. 1st St.
90233489 Oracle Computing 15 Main Road, Ft. Lauderdale
90234889 Oracle Corp. UK 15 Main Road, Ft. Lauderdale, FLA
90345672 Oracle Corp UK Ltd 181 North Street, Key West, FLA
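A minimal cleaning pass for such anomalies might normalize the name field before matching. The rules below are purely illustrative (a hard-coded suffix list), not a complete matching algorithm:

```python
import re

def normalize_name(name):
    """Crude name standardization: uppercase, strip punctuation,
    and drop common corporate suffixes so variants compare equal."""
    n = name.upper()
    n = re.sub(r"[.,]", "", n)
    # Illustrative suffix list; a real rule set would be data-driven.
    for suffix in (" LIMITED", " COMPUTING", " CORP UK LTD",
                   " CORP UK", " LTD", " CORP", " INC"):
        if n.endswith(suffix):
            n = n[: -len(suffix)]
    return n.strip()

names = ["Oracle Limited", "Oracle Computing",
         "Oracle Corp. UK", "Oracle Corp UK Ltd"]
print({normalize_name(n) for n in names})  # all four collapse to {'ORACLE'}
```

Production-grade deduplication would add fuzzy matching and address comparison on top of this kind of standardization.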
Multipart Keys Problem
• Multipart keys
Country code
Sales territory
Productnumber
Salesperson code
Product code = 12 M 654313 45
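During transformation such a composite code is split into its named parts. A sketch in Python, with fixed field widths assumed from the example code `12 M 654313 45`:

```python
def split_product_code(code):
    """Split a multipart key into named components (fixed widths assumed)."""
    return {
        "country_code": code[0:2],       # e.g. '12'
        "sales_territory": code[2:3],    # e.g. 'M'
        "product_number": code[3:9],     # e.g. '654313'
        "salesperson_code": code[9:11],  # e.g. '45'
    }

parts = split_product_code("12M65431345")
print(parts)
```

Each component can then feed its own dimension attribute instead of remaining buried inside one opaque key.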
Multiple Local Standards Problem
– Multiple local standards
– Tools or filters to preprocess
Units: cm, inches
Currencies: USD 600; 1,000 GBP; FF 9,990
Date formats: DD/MM/YY, MM/DD/YY, DD-Mon-YY
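A preprocessing filter for the date-format problem can try each known local format and emit one warehouse standard. A sketch, assuming ISO dates as the target and the three formats listed above as the sources:

```python
from datetime import datetime

# Formats seen in the sources; the warehouse standard here is ISO (assumed).
SOURCE_FORMATS = ["%d/%m/%y", "%m/%d/%Y", "%d-%b-%y"]

def to_iso_date(raw):
    """Try each known local format and emit one standard representation."""
    for fmt in SOURCE_FORMATS:
        try:
            return datetime.strptime(raw, fmt).date().isoformat()
        except ValueError:
            continue
    raise ValueError(f"unrecognized date format: {raw!r}")

print(to_iso_date("25/12/23"))    # DD/MM/YY
print(to_iso_date("12/25/2023"))  # MM/DD/YYYY
print(to_iso_date("25-Dec-23"))   # DD-Mon-YY
```

Note that ambiguous values such as 04/05/23 cannot be resolved by format trial alone; the source system's convention must be recorded in metadata. Unit and currency standardization follows the same pattern with conversion tables.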
Multiple Source Files Problem
– Added complexity of multiple source files
Multiple source files
Logic to detect the correct source
Transformed data
Missing Values Problem
• Solutions:
  – Ignore
  – Wait
  – Mark rows
  – Extract when time-stamped

Example rule: if NULL then field = 'A'
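The "mark rows" option can be sketched as filling an agreed default into NULL fields during transformation (the field name and default value below are illustrative):

```python
DEFAULT_STATUS = "A"  # assumed business default for a missing status

def fill_missing(rows, field, default):
    """Mark rows whose field is NULL (None) with an agreed default value."""
    for row in rows:
        if row.get(field) is None:
            row[field] = default
    return rows

rows = [{"id": 1, "status": None}, {"id": 2, "status": "B"}]
fill_missing(rows, "status", DEFAULT_STATUS)
print(rows)  # row 1 now carries the default status 'A'
```

The chosen default should itself be documented in metadata so analysts can distinguish real values from marked ones.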
Duplicate Values Problem
• Solutions:
  – SQL self-join techniques
  – RDBMS constraint utilities
ACME Inc
ACME Inc
ACME Inc
SELECT ...
FROM table_a, table_b
WHERE table_a.key (+) = table_b.key
UNION
SELECT ...
FROM table_a, table_b
WHERE table_a.key = table_b.key (+);
Element Names Problem
• Solution:
  – Common naming conventions
Customer
Customer
Client
Contact
Name
Element Meaning Problem
– Avoid misinterpretation
– Complex solution
– Document meaning in metadata
Product number
p_no
Purchase order number Policy number
Input Format Problem
ASCII vs. EBCDIC
12373 vs. "123-73"
ACME Co.
ברוכים הבאים Beer (Pack of 8)
• Different character sets or data-types
Referential Integrity Problem
• Solutions:
  – SQL anti-join (outer join)
  – Server constraints
  – Dedicated tools
Department table: 10, 20, 30, 40

Emp | Name | Department
1099 | Smith | 10
1289 | Jones | 20
1234 | Doe | 50
6786 | Harris | 60
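The anti-join idea can be sketched in Python: find fact rows whose foreign key has no matching parent in the dimension (the employee/department data mirrors the example above):

```python
departments = {10, 20, 30, 40}  # valid dimension keys

employees = [
    {"emp": 1099, "name": "Smith", "dept": 10},
    {"emp": 1289, "name": "Jones", "dept": 20},
    {"emp": 1234, "name": "Doe", "dept": 50},
    {"emp": 6786, "name": "Harris", "dept": 60},
]

def orphans(rows, valid_keys):
    """Anti-join: rows whose foreign key has no parent in the dimension."""
    return [r for r in rows if r["dept"] not in valid_keys]

print([r["emp"] for r in orphans(employees, departments)])  # [1234, 6786]
```

Orphans like these (departments 50 and 60) must be rejected, routed to an error table, or mapped to a placeholder dimension row before loading.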
Name and Address Problem
– Single-field format
– Multiple-field format
Mr. J. Smith,100 Main St., Bigtown, County Luth, 23565
Database 1:
NAME | LOCATION
DIANNE ZIEFELD | N100
HARRY H. ENFIELD | M300

Database 2:
NAME | LOCATION
ZIEFELD, DIANNE | 100
ENFIELD, HARRY H | 300
Name Mr. J. Smith
Street 100 Main St.
Town Bigtown
County County Luth
Code 23565
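Converting the single-field format into the multiple-field format can be sketched as a comma split with assumed field order, as in the example above; real data needs dedicated name-and-address software:

```python
def split_address(single_field):
    """Split a comma-delimited single-field address into named parts.
    Assumes the fixed part order of the example; real addresses vary."""
    parts = [p.strip() for p in single_field.split(",")]
    keys = ["name", "street", "town", "county", "code"]
    return dict(zip(keys, parts))

addr = split_address("Mr. J. Smith, 100 Main St., Bigtown, County Luth, 23565")
print(addr["town"])  # Bigtown
print(addr["code"])  # 23565
```

Addresses with missing or extra components break the fixed-order assumption, which is exactly why name-and-address cleansing is usually bought rather than built.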
Transformation Timing and Location
– Transformation is performed:
  • Before load
  • In parallel while loading
– Can be initiated at different points:
  • On the operational platform
  • In a separate staging area
Adding a Date Stamp: Fact Tables and Dimensions
Sales Fact Table: Item_id, Store_id, Time_key, Sales_dollars, Sales_units
Item Table: Item_id, Dept_id, Time_key
Store Table: Store_id, District_id, Time_key
Time Table: Time_key, Week_id, Period_id, Year_id
Product Table: Product_id, Time_key, Product_desc
Summarizing Data
1. During extraction, in the staging area
2. After loading to the warehouse server
Operational databases → Staging area → Warehouse database
Loading Data into the Warehouse
– Loading moves the data into the warehouse
– Loading can be time-consuming:
  • Consider the load window
  • Schedule and automate the loading
– The initial load moves large volumes of data
– Subsequent refreshes move smaller volumes of data
Operational databases → Extract → Transform (staging area) → Transport, Load → Warehouse database
Load Window Requirements
– Time available for the entire ETL process
– Plan
– Test
– Prove
– Monitor
(Timeline: over a 24-hour day, load windows are scheduled outside the user access period)
Planning the Load Window
– Plan and build processes according to a strategy.
– Consider volumes of data.
– Identify technical infrastructure.
– Ensure currency of data.
– Consider user access requirements first.
– High-availability requirements may mean a small load window.
Initial Load and Refresh
• Initial load:
  – Single event that populates the database with historical data
  – Involves large volumes of data
  – Employs distinct ETL tasks
  – Involves large amounts of processing after load
• Refresh:
  – Performed according to a business cycle
  – Less data to load than the first-time load
  – Complex ETL tasks
  – Smaller amounts of post-load processing
Data Refresh Models
Extract processing environment:
– After each time interval, build a new snapshot of the database.
– Purge old snapshots.
T1 T2 T3
Operationaldatabases
Data Refresh Models
Warehouse environment:
– Build a new database the first time.
– After each time interval, add delta changes to the database.
– Archive or purge the oldest data.
T1 T2 T3
Operationaldatabases
Post-Processing of Loaded Data
Post-processing of loaded data:
– Create indexes
– Generate keys
– Summarize
– Filter

Flow: Extract → Transform (staging area) → Load → Warehouse
Unique Indexes
– Disable constraints before load.
– Enable constraints after load.
– Re-create indexes if necessary.
Load data
Disable constraints → Load data → Enable constraints → Create index
Catch errors → Reprocess
Creating Derived Keys
• The use of a derived key (sometimes referred to as a generalized, artificial, synthetic, surrogate, or warehouse key) is recommended to maintain the uniqueness of a row.
• Methods:
  – Concatenate keys
  – Assign a number sequentially from a list
109908 → 01109908 (concatenated)
109908 → 100 (assigned from a sequence)
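The sequential method can be sketched as a generator that remembers its mapping, so the same natural key always receives the same surrogate. The starting value of 100 below simply mirrors the example and is otherwise arbitrary:

```python
class SurrogateKeyGenerator:
    """Assigns warehouse (surrogate) keys from a sequence, remembering
    the mapping so the same natural key always gets the same surrogate."""

    def __init__(self, start=100):
        self.mapping = {}
        self.next_key = start

    def key_for(self, natural_key):
        if natural_key not in self.mapping:
            self.mapping[natural_key] = self.next_key
            self.next_key += 1
        return self.mapping[natural_key]

gen = SurrogateKeyGenerator()
print(gen.key_for("109908"))  # 100
print(gen.key_for("109908"))  # 100 again: the mapping is stable
print(gen.key_for("109909"))  # 101
```

In a real warehouse this mapping lives in a key-lookup table or a database sequence, not in process memory.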
Metadata repository
Metadata Users
End users
Developers
IT Professionals
Metadata Documentation Approaches
– Automated:
  • Data modeling tools
  • ETL tools
– Manual
Data Warehouse Design
Dimensional Modeling
I. Identify the ‘Business Process’
II. Determine the ‘Grain’
III. Identify the ‘Facts’
IV. Identify the ‘Dimensions’
Existing Metadata
Production ERD Model
Business Requirements
Research
Business Requirements Drive the Design Process
– Primary input
– Secondary input
Perform Strategic Analysis
– Identify crucial business processes
– Understand business processes
– Prioritize and select the business processes to implement
(Matrix: business processes plotted by Business Benefit vs. Feasibility, each from Low to High)
Using a Business Process Matrix
DW Bus Architecture
Business Dimensions
Business Processes: Sales, Returns, Inventory
Customer
Date
Product
Channel
Promotion
Conformed Dimensions
• Dimensions are conformed when they are exactly the same including the keys or one is a perfect subset of the other.
• DW bus architecture provides a standard set of conformed dimensions
Determine the Grain
YEAR?
QUARTER?
MONTH?
WEEK?
DAY?
Documenting the Granularity
• Is an important design consideration
• Determines the level of detail
• Is determined by business needs
Low-level grain (Transaction-level data)
High-level grain (Summary data)
Defining Time Granularity
Fiscal Time Hierarchy
Current dimension grain
Fiscal Year
Fiscal Quarter
Fiscal Month
Fiscal Week
Day
Future dimension grain
Identify the Facts and Dimensions
• An attribute perceived as constant or discrete becomes a Dimension:
  – Product
  – Location
  – Time
  – Size
• An attribute that varies continuously becomes a Fact (Measure):
  – Balance
  – Units Sold
  – Cost
  – Sales
Data Warehouse Environment Data Structures
The data structures that are commonly found in a data warehouse environment:
– Third normal form (3NF)– Star schema– Snowflake schema
Star Schema
Customer Location
Sales
Supplier Product
Star Schema Model
Product Table: Product_id, Product_desc, ...
Time Table: Day_id, Month_id, Year_id, ...
Sales Fact Table: Product_id, Store_id, Item_id, Day_id, Sales_amount, Sales_units, ...
Item Table: Item_id, Item_desc, ...
Store Table: Store_id, District_id, ...

Central fact table with denormalized dimensions
Fact Table Characteristics
– Contain numerical metrics of the business
– Can hold large volumes of data
– Can grow quickly
– Can contain base, derived, and summarized data
– Are typically additive
– Are joined to dimension tables through foreign keys that reference primary keys in the dimension tables

Sales Fact Table: Product_id, Store_id, Item_id, Day_id, Sales_amount, Sales_units, ...
Dimension Table Characteristics
– Contain descriptors of the business: textual information that represents the attributes of the business
– Contain relatively static data
– Are usually smaller than fact tables
– Are joined to a fact table through a foreign key reference

Item Table: Item_id, Item_desc, ...
Advantages of Using a Star Dimensional Model
– Design improves performance by reducing table joins.
– The model is easy for users to understand.
– Supports multidimensional analysis.
– Provides an extensible design.
– Primary keys represent a dimension.
– Non-foreign-key columns are values.
– Facts are usually highly normalized.
– Dimensions are completely de-normalized.
– End users can express complex queries.
Base and Derived Data
Payroll table
Emp_FK | Month_FK | Salary (base) | Comm (base) | Comp (derived)
101 | 05 | 1,000 | 0 | 1,000
102 | 05 | 1,500 | 100 | 1,600
103 | 05 | 1,000 | 200 | 1,200
104 | 05 | 1,500 | 1,000 | 2,500
Translating Business Measures into a Fact Table
Business measures
Facts
Business Measure | Fact | Type
Number of Items | Number of Items | Base
Amount | Item Amount | Base
Cost | Item Cost | Base
Profit | Profit | Derived
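Computing a derived measure at load time can be sketched as below; the field names and Profit = Amount − Cost formula follow the example, and the row values are illustrative:

```python
# Base measures come from the source; Profit is derived during the load.
order_lines = [
    {"items": 3, "amount": 300.0, "cost": 210.0},
    {"items": 1, "amount": 120.0, "cost": 100.0},
]

for line in order_lines:
    line["profit"] = line["amount"] - line["cost"]  # derived measure

print([line["profit"] for line in order_lines])  # [90.0, 20.0]
```

Storing the derived value trades space for query simplicity and consistency: every user sees the same Profit rather than recomputing it ad hoc.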
Snowflake Schema Model
Sales Fact Table: Item_id, Store_id, Product_id, Week_id, Sales_amount, Sales_units
Time Table: Week_id, Period_id, Year_id
Product Table: Product_id, Product_desc
Item Table: Item_id, Item_desc, Dept_id
Dept Table: Dept_id, Dept_desc, Mgr_id
Mgr Table: Dept_id, Mgr_id, Mgr_name
Store Table: Store_id, Store_desc, District_id
District Table: District_id, District_desc
Snowflake Model
Order History fact table: History_PK, Customer_FK, Product_FK, Channel_FK, Item_nbr, Item_desc, Quantity, Discnt_price, Unit_price, Order_amt, ...
Customer dimension: Customer_PK, ...
Product dimension: Product_PK, ...
Channel dimension: Channel_PK, Channel_desc, Web_PK
Web outrigger: Web_PK, Web_url
Snowflake Schema Model
– Provides for speedier data loading
– Can become large and unmanageable
– Degrades query performance
– More complex metadata
– Facts are usually highly normalized
– Dimensions are also normalized
Country → State → County → City
Constellation Configuration
Atomic fact
Fact Table Measures
– Nonadditive: cannot be added along any dimension
– Semiadditive: can be added along some dimensions
– Additive: can be added across all dimensions
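The distinction can be sketched with toy fact rows: a unit count sums cleanly across every dimension, while a price-like measure must not simply be summed (the data below is illustrative):

```python
# Toy fact rows: sales_units is additive; unit_price is non-additive
# (summing or plainly averaging a price column is meaningless).
facts = [
    {"store": "S1", "day": "Mon", "sales_units": 10, "unit_price": 2.0},
    {"store": "S1", "day": "Tue", "sales_units": 5,  "unit_price": 4.0},
    {"store": "S2", "day": "Mon", "sales_units": 20, "unit_price": 2.0},
]

# Additive: summing units across all dimensions is meaningful.
total_units = sum(f["sales_units"] for f in facts)

# Non-additive: a sensible "average price" weights by units sold.
weighted_price = sum(f["sales_units"] * f["unit_price"] for f in facts) / total_units

print(total_units)     # 35
print(weighted_price)  # (10*2 + 5*4 + 20*2) / 35 ≈ 2.29
```

A classic semiadditive example is an account balance: it sums across accounts but not across time.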
More on Factless Fact Tables
Factless fact table: Emp_FK, Sal_FK, Age_FK, Ed_FK, Grade_FK
Employee dimension: Emp_PK
Salary dimension: Sal_PK
Age dimension: Age_PK
Education dimension: Ed_PK
Grade dimension: Grade_PK
(PK = primary key, FK = foreign key)
Factless Fact Tables
– Event tracking
– Coverage
Bracketed Dimensions
– Enhance performance and analytical capabilities
– Create groups of values for attributes with many unique values, such as income ranges and age brackets
– Minimize the need for full table scans by pre-aggregating data
Bracketing Dimensions
Customer dimension: Customer_PK, Bracket_FK
Bracket dimension: Bracket_PK
Income fact: Customer_PK, Bracket_FK
Bracket_PK Income (10Ks) Marital Status Gender Age
1 60-90 Single Male <21
2 60-90 Single Male 21-35
3 60-90 Single Male 35-55
4 60-90 Single Male >55
5 60-90 Single Female <21
6 60-90 Single Female 21-35
Identifying Analytical Hierarchies
Store dimension
Store ID, Store Desc, Location, Size, Type, District ID, District Desc, Region ID, Region Desc
Business hierarchies describe organizational structure and logical parent-child relationships within the data.
Region
District
Store
Organization hierarchy
Multiple Hierarchies
Store ID, Store Desc, Location, Size, Type, District ID, District Desc, Region ID, Region Desc, City ID, City Desc, County ID, County Desc, State ID, State Desc
Region
District
Store
Organization hierarchy
Store dimension
Region
District
Store
Geography hierarchy
Multiple Time Hierarchies
Fiscal year
Fiscal quarter
Fiscal month
Fiscal time hierarchy
Fiscal week
Calendar year
Calendar quarter
Calendar month
Calendar time hierarchy
Calendar week
Drilling Up and Drilling Down

Market hierarchy: Group → Region → District → Store
(Example: Region 1 contains Districts 1 and 2; Region 2 contains Districts 3 and 4; the districts contain Stores 1-6.)
Region
District
Drilling Across
Market hierarchy: Group → Region → District → Store
City hierarchy: City → Store
(Example: drilling across hierarchies to find stores > 20,000 sq. ft.)
Using Time in the Data Warehouse
– Defining standards for time is critical.
– Aggregation based on time is complex.
– Time is critical to the data warehouse. A consistent representation of time is required for extensibility.
Where should the element of time be stored?
Time dimension
Sales fact
Date Dimension
– Should Date Dimension be modeled?
Applying the Changes to Data
• You have a choice of techniques:
  – Overwrite a record
  – Add a record
  – Add a field
  – Maintain history
  – Add version numbers
OLAP Models
– Relational (ROLAP)
– Multidimensional (MOLAP)
– Hybrid (HOLAP)
– Desktop (DOLAP)
Slowly Changing Dimensions (SCDs)
What is a SCD?
It is a dimension whose attribute data changes, and therefore needs updating, slowly over time.
There are three standard ways outlined by Kimball (and others) to handle this situation:
– Type I
– Type II
– Type III
Type I - Overwriting a Record
– Easy to implement– Loses all history– Not recommended
42135 John Doe Single
42135 John Doe Married
Type II - Adding a New Record
– History is preserved; dimensions grow.
– A generalized key is created.
42135 John Doe Single
42135_01 John Doe Married
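A Type II update can be sketched in Python (the field names and the `_01` key suffix scheme follow the example above and are otherwise illustrative):

```python
def scd_type2_update(dimension, natural_key, new_attrs):
    """Type II: never overwrite; append a new row with a versioned
    generalized key so earlier rows preserve history."""
    versions = [r for r in dimension if r["natural_key"] == natural_key]
    new_row = {
        "key": f"{natural_key}_{len(versions):02d}",  # 42135 -> 42135_01
        "natural_key": natural_key,
        **new_attrs,
    }
    dimension.append(new_row)
    return new_row

dim = [{"key": "42135", "natural_key": "42135",
        "name": "John Doe", "status": "Single"}]
scd_type2_update(dim, "42135", {"name": "John Doe", "status": "Married"})
print([r["key"] for r in dim])  # ['42135', '42135_01']
```

Real implementations usually also stamp each version with effective-from/to dates and a current-row flag so facts join to the right version.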
Type III - Adding a Current Field
– Maintains some history
– Loses intermediate values
– Is enhanced by adding an Effective Date field
42135 John Doe Single
42135 John Doe Single Married 1-Jan-01
Maintain History
History tables:
– One-to-many relationships
– One current record and many history records
Product
Time
Sales
HIST_CUST
CUSTOMER
Versioning
– Avoid double counting
– Facts hold the version number
Time
Product
Customer
Customer.CustId Version Customer Name
1234 1 Comer
1234 2 Comer
Sales.CustId Version Sales Facts
1234 1 $11,000
1234 2 $12,000
Sales
Rapidly Changing Dimensions (RCDs)
It is a dimension whose attribute data changes, and therefore needs updating, rapidly over time.
Also referred to as a Rapidly Changing Monster Dimension.
Solution: split the rapidly changing attributes into a separate dimension, referred to as a mini dimension.
Demographics Key | Age | Children | Income
1 | 20-24 | 0 | <20,000
2 | 20-24 | 1-2 | 20,000-30,000
3 | 20-24 | >2 | >30,000
4 | 25-30 | 0 | <20,000
5 | 25-30 | 1-2 | 20,000-30,000
... | ... | ... | ...
Mini Dimension
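Assigning a customer to a mini-dimension row can be sketched as bucketing the raw values and looking up the pre-built key. The brackets below are illustrative and, for brevity, the Children attribute from the table is omitted:

```python
# Bracket boundaries are illustrative; a real mini dimension is a
# pre-built table covering every bracket combination.
def age_bracket(age):
    return "20-24" if age < 25 else "25-30"

def income_bracket(income):
    if income < 20000:
        return "<20000"
    if income <= 30000:
        return "20000-30000"
    return ">30000"

MINI_DIM = {  # (age bracket, income bracket) -> demographics key
    ("20-24", "<20000"): 1,
    ("20-24", "20000-30000"): 2,
    ("25-30", "<20000"): 4,
}

def demographics_key(age, income):
    """Map a customer's raw values to the mini-dimension surrogate key."""
    return MINI_DIM[(age_bracket(age), income_bracket(income))]

print(demographics_key(22, 15000))  # 1
print(demographics_key(27, 18000))  # 4
```

Because only the small demographics key changes as a customer's values drift, the monster customer dimension itself stays stable.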
Junk Dimension
A junk dimension is an abstract dimension holding the decodes for a group of low-cardinality flags and indicators, thereby removing them from the fact table.
Junk Key Payment Type Order type Order Mode
1 Cash Normal Web
2 Cash Urgent Web
3 Credit Normal Fax
4 Credit Urgent Fax
... ... ... ...
Junk Dimension
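Building such a dimension can be sketched as enumerating every combination of the flag values once and keying them; the flag values follow the table above, while the key numbering below is generated and illustrative:

```python
from itertools import product

# Enumerate every combination of the low-cardinality flags once;
# the fact table then stores only the small junk key.
payment_types = ["Cash", "Credit"]
order_types = ["Normal", "Urgent"]
order_modes = ["Web", "Fax"]

junk_dimension = {}
for key, combo in enumerate(product(payment_types, order_types, order_modes),
                            start=1):
    junk_dimension[combo] = key

def junk_key(payment, order_type, order_mode):
    """Replace three flag columns on the fact with one small key."""
    return junk_dimension[(payment, order_type, order_mode)]

print(junk_key("Cash", "Normal", "Web"))  # 1
print(len(junk_dimension))                # 2 * 2 * 2 = 8 rows
```

Even with several flags the full cross product stays tiny, which is what makes pre-building the whole dimension practical.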
Secret of Success
Think big, start small!
References
Useful web sites:
http://www.dmreview.com
http://www.rkimball.com
http://www.billinmon.com
http://www.dmforum.org
http://www.freedatawarehouse.com
Thank you