+ All Categories
Home > Documents > The Basic Structure Extract Source Data Extract Storage: flat files (fastest); RDBMS; other...

The Basic Structure Extract Source Data Extract Storage: flat files (fastest); RDBMS; other...

Date post: 18-Jan-2018
Category:
Upload: alannah-maxwell
View: 222 times
Download: 0 times
Share this document with a friend
Description:
The Business Dimensional Lifecycle Project Planning Early Critical Tasks –definition –scope –readiness assessment –business justification Remaining Tasks –Resource requirements and identification –Schedule construction and integrations
104
The Basic Structure Extract Source Data Extract Storage: flat files (fastest); RDBMS; other Processing: clean; prune; combine; remove duplication standardize conform dimensions store awaiting replication export to data marts No user query services Data Staging Area Data Mart #1 OLAP (ROLAP, MOLAP,HOLAP) dimensional access subject oriented user group driven refresh frequency conforms to the Bus Data Mart #2 Data Mart #3 Populate, replicate, recover DW Bus DW Bus Corporate View
Transcript
Page 1: The Basic Structure Extract Source Data Extract Storage: flat files (fastest); RDBMS; other Processing: clean; prune; combine; remove duplication standardize.

The Basic Structure

Extract

Source Data

Extract

Storage: flat files (fastest);RDBMS;other

Processing:clean;prune;combine;remove duplicationstandardizeconform dimensionsstore awaiting replicationexport to data marts

No user query services

Data Staging AreaData Mart #1OLAP (ROLAP,MOLAP,HOLAP)

dimensional accesssubject orienteduser group drivenrefresh frequencyconforms to the Bus

Data Mart #2

Data Mart #3

Populate,replicate,recover

DW Bus

DW Bus

Corporate View

Page 2: The Basic Structure Extract Source Data Extract Storage: flat files (fastest); RDBMS; other Processing: clean; prune; combine; remove duplication standardize.

The Basic Structure

Data Mart #1OLAP (ROLAP,MOLAP,HOLAP)

dimensional accesssubject orienteduser group drivenrefresh frequencyconforms to the Bus

Data Mart #2

Data Mart #3DW Bus

DW Bus

Corporate Staging Area User Access

Ad Hoc Query Tools

Reporting Tools and Writers

Customized Applications

Models:forecasting;scoring;allocating;data mining;scenario analysis;etc.

Data Feed

Data Feed

Data Feed

Page 3: The Basic Structure Extract Source Data Extract Storage: flat files (fastest); RDBMS; other Processing: clean; prune; combine; remove duplication standardize.

The Business Dimensional Lifecycle

• Project Planning Early Critical Tasks– definition– scope– readiness assessment– business justification

• Remaining Tasks– Resource requirements and identification– Schedule construction and integrations

Page 4: The Basic Structure Extract Source Data Extract Storage: flat files (fastest); RDBMS; other Processing: clean; prune; combine; remove duplication standardize.

The Business Dimensional Lifecycle

• Business Requirements Definition– Critical to success– Designers must understand the business needs– A plan to extract users needs and to understand

them must be developed.

Page 5: The Basic Structure Extract Source Data Extract Storage: flat files (fastest); RDBMS; other Processing: clean; prune; combine; remove duplication standardize.

The Business Dimensional Lifecycle

• Three project tracks follow the business requirements definition process:– Data track– Technology track– Application track

Page 6: The Basic Structure Extract Source Data Extract Storage: flat files (fastest); RDBMS; other Processing: clean; prune; combine; remove duplication standardize.

The Business Dimensional Lifecycle

• Data Track– Dimensional modeling– Physical design– Data staging design and development

Page 7: The Basic Structure Extract Source Data Extract Storage: flat files (fastest); RDBMS; other Processing: clean; prune; combine; remove duplication standardize.

The Business Dimensional Lifecycle

• Technology Track– Technical architecture design– Things to consider:

• business requirements• current technical environment• planned strategic technical directions

Page 8: The Basic Structure Extract Source Data Extract Storage: flat files (fastest); RDBMS; other Processing: clean; prune; combine; remove duplication standardize.

The Business Dimensional Lifecycle

• Application Track– Product identification, selection, and

installation– End user application development

• Configuring the metadata repository access• Building specialized applications

Page 9: The Basic Structure Extract Source Data Extract Storage: flat files (fastest); RDBMS; other Processing: clean; prune; combine; remove duplication standardize.

The Business Dimensional Lifecycle

• Deployment– The integration of all the pieces of the puzzle.– The best warehouse will fail if deployment is

not properly planned– Plan required prior to deployment are:

• education• user support• feedback• enhancement/maintenance

Page 10: The Basic Structure Extract Source Data Extract Storage: flat files (fastest); RDBMS; other Processing: clean; prune; combine; remove duplication standardize.

The Business Dimensional Lifecycle

• Maintenance and Growth– Work never stops!!!– Critical to support and stay connected to the

users to ensure the warehouse meets their needs.

– Watch performance and plan ahead (the backroom)

– Collect and analyze metrics regarding use and operation

Page 11: The Basic Structure Extract Source Data Extract Storage: flat files (fastest); RDBMS; other Processing: clean; prune; combine; remove duplication standardize.

The Business Dimensional Lifecycle

• Maintenance and Growth (cont)– If you are successful, change is inevitable. Plan

and prioritize future initiatives with user buy-in.– Always plan for expansion and growth with

each new increment or change.

Page 12: The Basic Structure Extract Source Data Extract Storage: flat files (fastest); RDBMS; other Processing: clean; prune; combine; remove duplication standardize.

The Business Dimensional Lifecycle

• Project Management– Monitor project status– Track issues– Control change– Project communication– Project marketing– Project politician– Project visionary

Page 13: The Basic Structure Extract Source Data Extract Storage: flat files (fastest); RDBMS; other Processing: clean; prune; combine; remove duplication standardize.

The Business Dimensional Lifecycle

TechnicalArchitecture

Design

ProductSelection &Installation

End-UserApplication

Specification

End-UserApplication

Development

ProjectPlanning

Business

Requirement

Definition

DeploymentMaintenance

andGrowth

Project Management

DimensionalModeling

PhysicalDesign

Data StagingDesign &

Development

Page 14: The Basic Structure Extract Source Data Extract Storage: flat files (fastest); RDBMS; other Processing: clean; prune; combine; remove duplication standardize.

Project Planning & Management

• Who Wants the Warehouse?– A single visionary user

• desirable because the focus remains manageable• requires political leverage to make it work• the need must have broad and definable impacts to show worth

– Multiple demands• Many organizations want a data mart or warehouse• Focus is spread, therefore politics and planning play a vital

role

Page 15: The Basic Structure Extract Source Data Extract Storage: flat files (fastest); RDBMS; other Processing: clean; prune; combine; remove duplication standardize.

Project Planning & Management

• Who Wants the Warehouse? (cont)– No identified need

• Organization wanting to get in the “warehouse” game

• More effort on the warehouse team to identify the need

• It is highly likely there will be one.

Page 16: The Basic Structure Extract Source Data Extract Storage: flat files (fastest); RDBMS; other Processing: clean; prune; combine; remove duplication standardize.

Project Planning & Management• Determine Warehouse Readiness

– Do you have a strong business sponsor?• Vision• Politically savvy• Connected• Influential• History of success• Respected• Realistic• Understands the need and the process and can communicate it

Page 17: The Basic Structure Extract Source Data Extract Storage: flat files (fastest); RDBMS; other Processing: clean; prune; combine; remove duplication standardize.

Project Planning & Management

• Determine Warehouse Readiness (cont)– Without this person you will fail– Try to recruit multiple sponsors.– Is there a real and identifiable business need?– Does a strong partnership exist between IT and

the business groups?– What is the current analytical environment?

• How are things done now?• What culture shock will be created?

Page 18: The Basic Structure Extract Source Data Extract Storage: flat files (fastest); RDBMS; other Processing: clean; prune; combine; remove duplication standardize.

Project Planning & Management

• Determine Warehouse Readiness (cont)– What is the feasibility?

• Is the data “dirty” beyond recovery?• Is the target sources to dispersed and dynamic to

achieve early and significant results?

Page 19: The Basic Structure Extract Source Data Extract Storage: flat files (fastest); RDBMS; other Processing: clean; prune; combine; remove duplication standardize.

Project Planning & Management• Take the Readiness “Litmus Test”

– The test looks at:• Sponsor• Business Needs• IT/Business Partnership• Current Analytical Environment• Feasibility

– A strong sponsor is the most important to get a high rating from the test

– Business needs and IT/Business Partnerships are secondary in importance

Page 20: The Basic Structure Extract Source Data Extract Storage: flat files (fastest); RDBMS; other Processing: clean; prune; combine; remove duplication standardize.

Project Planning & Management

• Addressing Readiness Issues– High-level business requirements analysis

• Identify the strategic initiatives• Identify the business metrics• Identify the high impact and ROI areas

– Business Requirements Prioritization• Look for high impact, ROI, and feasibility

– Proof of Concept

Page 21: The Basic Structure Extract Source Data Extract Storage: flat files (fastest); RDBMS; other Processing: clean; prune; combine; remove duplication standardize.

Project Planning & Management

• Develop the Initial Scope– Keep the scope narrow and short to retain clarity– The bigger the scope the more difficult it

becomes to retain focus– Always define the scope based on business

requirements. Try to avoid deadlines or budget cycles from driving the scope.

Page 22: The Basic Structure Extract Source Data Extract Storage: flat files (fastest); RDBMS; other Processing: clean; prune; combine; remove duplication standardize.

Project Planning & Management• Develop the Initial Scope (cont)

– Scope definition involves both IT and business representatives

– Make the scope have significance but ensure it is achievable and timely

– Start with a single or few data sources and a single business process

– Limit your initial user base (typically 25 - 35 people).– Determine what management expects so success can be

identified

Page 23: The Basic Structure Extract Source Data Extract Storage: flat files (fastest); RDBMS; other Processing: clean; prune; combine; remove duplication standardize.

Project Planning & Management

• Develop the Initial Scope (cont)– Document the scope definition and success

indicators– Acknowledge that the scope will likely change– Develop a plan to manage the change

Page 24: The Basic Structure Extract Source Data Extract Storage: flat files (fastest); RDBMS; other Processing: clean; prune; combine; remove duplication standardize.

Project Planning & Management

• Build the Business Justification– Determine the costs

• Identify hardware and software costs (start-up and ongoing)

• Identify maintenance costs• Internal staff needs • External resources (consultants, etc.)• Operational support• Support of growth pains

Page 25: The Basic Structure Extract Source Data Extract Storage: flat files (fastest); RDBMS; other Processing: clean; prune; combine; remove duplication standardize.

Project Planning & Management

• Build the Business Justification (cont)– Determine the benefits (financial and other)

• Increased revenue• Increased profit• Increased customer satisfaction• Expansion of a market or capability• Increased employee productivity• Reduction of capital investments (storage requirements, etc.)• Protection against fraud and attack

Page 26: The Basic Structure Extract Source Data Extract Storage: flat files (fastest); RDBMS; other Processing: clean; prune; combine; remove duplication standardize.

Project Planning & Management

• Build the Business Justification (cont)– It is important to monitor and track the business

to identify and market impacts the warehouse has made

– Look for the tangibles and intangibles

Page 27: The Basic Structure Extract Source Data Extract Storage: flat files (fastest); RDBMS; other Processing: clean; prune; combine; remove duplication standardize.

Project Planning & Management

• Plan the Project– Establish project identity

• Create a name• Create documentation describing your project• Make T-shirts, mugs, etc• Market, market, market!!!

Page 28: The Basic Structure Extract Source Data Extract Storage: flat files (fastest); RDBMS; other Processing: clean; prune; combine; remove duplication standardize.

Project Planning & Management

• Plan the Project (cont)– Staff up

•Project Manager•Business Lead•Business Analyst•Data Modeler•DW DBA•Data Staging System Designer•End User Application Developer

•DW Educator•Technical/Security Architect•Technical Support Specialists•Data Staging Programmers•Data Steward•DW QA Analyst

Page 29: The Basic Structure Extract Source Data Extract Storage: flat files (fastest); RDBMS; other Processing: clean; prune; combine; remove duplication standardize.

Project Planning & Management

• Develop the Project Plan– Key (frequently update your plan)– The nature of a DW project in cyclic and

resembles a spiral approach– Identify key milestones– Develop a high-level and detailed plan

Page 30: The Basic Structure Extract Source Data Extract Storage: flat files (fastest); RDBMS; other Processing: clean; prune; combine; remove duplication standardize.

Project Planning & Management

• Manage the Project– Matrix management is often used because of

the numerous interlaced roles– Data issues may lay waste to the best devised

plans (plan for the unexpected)– The project will likely increase in visibility

(manage expectations)– Iterative/sliding window development requires

multiple teams work in sync (communication)

Page 31: The Basic Structure Extract Source Data Extract Storage: flat files (fastest); RDBMS; other Processing: clean; prune; combine; remove duplication standardize.

Project Planning & Management

• Manage the Project (cont)– Conduct a project kickoff meeting

• Identify the team, roles, and responsibilities• Identify the scope• Identify goals• Identify the schedule• Review the preliminary PMP• Conduct preliminary education

Page 32: The Basic Structure Extract Source Data Extract Storage: flat files (fastest); RDBMS; other Processing: clean; prune; combine; remove duplication standardize.

Project Planning & Management

• Monitor the Project Status– Frequent communication– Project status meetings– Team meetings– Project status reports– Customer reporting

Page 33: The Basic Structure Extract Source Data Extract Storage: flat files (fastest); RDBMS; other Processing: clean; prune; combine; remove duplication standardize.

Collecting the Requirements

• The old theory was not to include the users in the early stages.

• Build it and they will come.• This proved to be the demise of many early

warehouse initiatives.• A formal requirement (but flexible) is

needed to document the users needs of the warehouse.

Page 34: The Basic Structure Extract Source Data Extract Storage: flat files (fastest); RDBMS; other Processing: clean; prune; combine; remove duplication standardize.

Collecting the Requirements

• This is a difficult process for many reasons.– Key people may feel threatened and are not

willing to cooperate.– The informal decision process is typically not

well documented and is dispersed.– People have a difficult time thinking “out of the

box”– Terminology associated to warehousing often

creates confusion and/or misinformation

Page 35: The Basic Structure Extract Source Data Extract Storage: flat files (fastest); RDBMS; other Processing: clean; prune; combine; remove duplication standardize.

Collecting the Requirements

• Talk with the business users first – Strive to understand how they do business– Identify how decisions are made today– Determine how they would like to make

decisions today and tomorrow– Do not just ask “what data do you need?”

Page 36: The Basic Structure Extract Source Data Extract Storage: flat files (fastest); RDBMS; other Processing: clean; prune; combine; remove duplication standardize.

Collecting the Requirements

• Talk with the IT community second– Wait until some common sources and themes

are identified by the business users before approaching IT

– Look for feasibility issues– Start identifying technical issues such as

platforms, formats, access, and politics– Talk DBAs, DAs, application developers, an

designers

Page 37: The Basic Structure Extract Source Data Extract Storage: flat files (fastest); RDBMS; other Processing: clean; prune; combine; remove duplication standardize.

Collecting the Requirements

• Getting the requirements (Interview VS Facilitation)– Interviews tend to stay focused and work well

with small groups– Facilitated sessions work with larger groups

and encourage “brainstorming” and cross pollination of ideas.

Page 38: The Basic Structure Extract Source Data Extract Storage: flat files (fastest); RDBMS; other Processing: clean; prune; combine; remove duplication standardize.

Collecting the Requirements

• Roles of the requirements team– Lead interviewer– Secondary interviewers– Scribe– Observers– Facilitator

Page 39: The Basic Structure Extract Source Data Extract Storage: flat files (fastest); RDBMS; other Processing: clean; prune; combine; remove duplication standardize.

Collecting the Requirements

• Preparation for the interview– Look at strategic plans that relate to the

company or group you will talk with– Look at the annual report. Important goals and

initiatives will be identified and taken seriously by the company.

– Review marketing material– Search the Internet for information– Identify past attempts at similar projects

Page 40: The Basic Structure Extract Source Data Extract Storage: flat files (fastest); RDBMS; other Processing: clean; prune; combine; remove duplication standardize.

Collecting the Requirements

• Identify who will be interviewed– Business

• Look horizontally across the organization to see the big picture

• Get as much detail as possible in the current area of focus (vertical)

• Request that your sponsor identify who should be interviewed.

Page 41: The Basic Structure Extract Source Data Extract Storage: flat files (fastest); RDBMS; other Processing: clean; prune; combine; remove duplication standardize.

Collecting the Requirements

• Identify who will be interviewed– Technology

• The data gurus (these people have been around a long time and know the details)

• Application programmers• Pseudo technical people within a business area• DBAs• Data modelers• System administrators• IT management to identify the future

Page 42: The Basic Structure Extract Source Data Extract Storage: flat files (fastest); RDBMS; other Processing: clean; prune; combine; remove duplication standardize.

Collecting the Requirements• Develop an interview questionnaire• Build an agenda for the interview sessions• Prepare the interviewees

– Hold a single meeting with all interviewees to discuss the project, intentions, etc

– Set the tone for all interviews– Encourage questions– Enables you to identify good and bad candidates early

(now you can plan for each person)

Page 43: The Basic Structure Extract Source Data Extract Storage: flat files (fastest); RDBMS; other Processing: clean; prune; combine; remove duplication standardize.

Collecting the Requirements• Conduct the interview

– Remain within the roles established for the interview team

– Validate what you have collected with the user as soon as possible

– Define terms with the users (profit, revenue, sales)– Try to talk on their level and avoid using confusing

technology terms (use their business lingo when possible)

Page 44: The Basic Structure Extract Source Data Extract Storage: flat files (fastest); RDBMS; other Processing: clean; prune; combine; remove duplication standardize.

Collecting the Requirements• Conduct the interview

– Try to remain flexible during the interview process• Meet with unexpected people• Run past the allotted time• Discuss topics somewhat out of the focus of the interview.

– Schedule breaks and limit the number of interview session per day to about five

– Continue to manage expectations

Page 45: The Basic Structure Extract Source Data Extract Storage: flat files (fastest); RDBMS; other Processing: clean; prune; combine; remove duplication standardize.

Collecting the Requirements• Potential interview questions for an executive:

– What are the objectives of your organization? What are you trying to accomplish?

– How do you measure success? How do you know you are doing well? How often do you measure yourself?

– What are the key business issues you face today? What could prevent you from meeting these objectives? What would be the impact?

Page 46: The Basic Structure Extract Source Data Extract Storage: flat files (fastest); RDBMS; other Processing: clean; prune; combine; remove duplication standardize.

Collecting the Requirements

• Potential interview questions for an analyst:– What are your groups objectives? How do you

accomplish them? How do you achieve it?– What are your success metrics? How do you

know you are doing well? How often do you measure?

– What issues do you currently face?– Describe your products, vendors, etc? Is there a

natural hierarchy?

Page 47: The Basic Structure Extract Source Data Extract Storage: flat files (fastest); RDBMS; other Processing: clean; prune; combine; remove duplication standardize.

Collecting the Requirements• Potential interview questions for an analyst:

– What type of analysis do you perform? What data is used? How do you get it? What do you do with it?

– What analysis would you like to perform? – What dynamic analysis needs do you have? Who drives

these needs? How long does it take to perform? Are you able to conduct deeper levels of analysis?

– What analytical capabilities would you like?

Page 48: The Basic Structure Extract Source Data Extract Storage: flat files (fastest); RDBMS; other Processing: clean; prune; combine; remove duplication standardize.

Collecting the Requirements• Potential interview questions for an analyst:

– Where are the bottlenecks in obtaining information?– How much historical information is needed?– How will improved information access impact you and

your organization? What is the financial impact?– What reports do you currently use? Which data

elements on the reports are important? How is this information used? Is it combined with anything else?

Page 49: The Basic Structure Extract Source Data Extract Storage: flat files (fastest); RDBMS; other Processing: clean; prune; combine; remove duplication standardize.

Collecting the Requirements

• What to discuss with IT:– Request an overview of the operational systems– What are the current tools and technologies used to

share information?– What types of analyses are performed?– How are detailed analyses supported and conducted?– What are the data quality issues?– Where do bottlenecks exist?

Page 50: The Basic Structure Extract Source Data Extract Storage: flat files (fastest); RDBMS; other Processing: clean; prune; combine; remove duplication standardize.

Collecting the Requirements

• What to discuss with IT:– What concerns do you have about data

warehousing in the organization? What roadblocks do you see?

– What expectations do you have of the warehouse?

– How do you expect the warehouse to impact you?

Page 51: The Basic Structure Extract Source Data Extract Storage: flat files (fastest); RDBMS; other Processing: clean; prune; combine; remove duplication standardize.

Collecting the Requirements

• Types of users you will interview– Abused User

• Involved in earlier attempts• Unwilling to cooperate

– Overbooked User (To busy to meet)– Comatose User– Overzealous User– Nonexistent User (Use technology to drive the

needs)

Page 52: The Basic Structure Extract Source Data Extract Storage: flat files (fastest); RDBMS; other Processing: clean; prune; combine; remove duplication standardize.

Collecting the Requirements

• Wrap Up• Review the interview results with the team• Prepare and publish the results• Establish what will be done next

Page 53: The Basic Structure Extract Source Data Extract Storage: flat files (fastest); RDBMS; other Processing: clean; prune; combine; remove duplication standardize.

Dimensional Modeling

Jeffrey T. Edgell

Page 54: The Basic Structure Extract Source Data Extract Storage: flat files (fastest); RDBMS; other Processing: clean; prune; combine; remove duplication standardize.

The Dimensional Model

• More intuitive structure for presentation and reporting

• Likely predates the E/R approach– General Mills & Dartmouth University

developed a fact and dimension structure– Nielsen Marketing Research used this on

grocery and drug store auditing and scanner data in the 70s and 80s.

Page 55: The Basic Structure Extract Source Data Extract Storage: flat files (fastest); RDBMS; other Processing: clean; prune; combine; remove duplication standardize.

The Dimensional Model

• Dimensions are descriptive• Facts are likely numeric and are

measurement based• Additive facts are vital to allow aggregation

of many records during a retrieval• Page 145 (A typical dimensional model)

Page 56: The Basic Structure Extract Source Data Extract Storage: flat files (fastest); RDBMS; other Processing: clean; prune; combine; remove duplication standardize.

The Argument for the Dimensional Model

• Tools can utilize a standardized framework• Query tools can leverage against this for

performance optimization• High performance entry browsing is

possible• All queries can be initially constrained thus

significantly increasing performance

Page 57: The Basic Structure Extract Source Data Extract Storage: flat files (fastest); RDBMS; other Processing: clean; prune; combine; remove duplication standardize.

The Argument for the Dimensional Model

• Easily adapts to unpredictable queries• Extends to allow the addition of new tables

or data elements– will not require rebuilding the database from

scratch– data does not need to be reloaded– existing reports and query tools do not need to

be redesigned or implemented

Page 58: The Basic Structure Extract Source Data Extract Storage: flat files (fastest); RDBMS; other Processing: clean; prune; combine; remove duplication standardize.

The Argument for the Dimensional Model

• The model can be altered as follows without interruption:– The addition of new facts (consistent with the

defined grain)– The addition of new dimensions– The widening of a dimension table– Changing the detail of a dimension to a lower

level

Page 59: The Basic Structure Extract Source Data Extract Storage: flat files (fastest); RDBMS; other Processing: clean; prune; combine; remove duplication standardize.

The Argument for the Dimensional Model

• The dimensional model exhibits a predefined set of approaches used to deal with common issues.– Slowly changing dimensions– Heterogeneous products (track different lines of

business i.e. checking & savings)– Pay-in-advance data bases (look at individual

components as well as the total)– Event handling (no facts)

Page 60: The Basic Structure Extract Source Data Extract Storage: flat files (fastest); RDBMS; other Processing: clean; prune; combine; remove duplication standardize.

The Argument for the Dimensional Model

• Aggregation in a warehouse allows for query performance normally delegated to hardware to solve (greatly increasing $)

• A standard set of schemas for different business types and applications exist

Page 61: The Basic Structure Extract Source Data Extract Storage: flat files (fastest); RDBMS; other Processing: clean; prune; combine; remove duplication standardize.

The Bus

• Supports the incremental approach• The data mart approach has often lead to

development of warehouse absent of a corporate framework

• Stovepipe decision structures result• Produces a uniform global structure

eliminating the pocket or stovepipe data marts

Page 62: The Basic Structure Extract Source Data Extract Storage: flat files (fastest); RDBMS; other Processing: clean; prune; combine; remove duplication standardize.

The Bus

• Look at the entire enterprise as you design and build the data marts

• A high level architecture must be defined that explains the entire structure

• A detailed architecture must be developed to support each data mart as they are confronted

Page 63: The Basic Structure Extract Source Data Extract Storage: flat files (fastest); RDBMS; other Processing: clean; prune; combine; remove duplication standardize.

Conformed Dimensions

• Dimensions used to represent concepts across the enterprise must be standardized and agreed upon– customer– product– time– potentially not region (sales & management)

Page 64: The Basic Structure Extract Source Data Extract Storage: flat files (fastest); RDBMS; other Processing: clean; prune; combine; remove duplication standardize.

Conformed Dimensions

• Conformed dimensions must be carefully managed, maintained, and published to ensure consistency

• The conformed dimension represents the central source description of which everyone agrees

• If the conformed dimension approach is not observed, the bus will not properly function

Page 65: The Basic Structure Extract Source Data Extract Storage: flat files (fastest); RDBMS; other Processing: clean; prune; combine; remove duplication standardize.

Conformed Dimensions

• With conformed dimensions– One dimension table relates to multiple facts– Browsers are consistent with the dimension

providing a unified view– Rollups and meanings remain consistent across

facts

Page 66: The Basic Structure Extract Source Data Extract Storage: flat files (fastest); RDBMS; other Processing: clean; prune; combine; remove duplication standardize.

Conformed Dimensions

• Design– Lowest level of granularity possible (based on

the lowest level defined)– Use the sequential numeric key (surrogate key)

Page 67: The Basic Structure Extract Source Data Extract Storage: flat files (fastest); RDBMS; other Processing: clean; prune; combine; remove duplication standardize.

Conformed Facts

• Occurs during the definition of conformed dimensions

• Relates common measurements accurately– Cost – Profit– Unit price

• If facts are different use different names (marketing profit & sales profit)

• As much political as technical

Page 68: The Basic Structure Extract Source Data Extract Storage: flat files (fastest); RDBMS; other Processing: clean; prune; combine; remove duplication standardize.

When the Bus is not Required

• The business you are dealing with is intentionally segmented– Components operated autonomously with no

unified corporate view required– Products or business areas are disjoint– For example a company sells music and repairs

train engines (no business or product synergy except at the very top)

Page 69: The Basic Structure Extract Source Data Extract Storage: flat files (fastest); RDBMS; other Processing: clean; prune; combine; remove duplication standardize.

The Components of the Dimensional Model

• Facts• Dimensions• Attributes• The Bus (optional but highly suggested)

Page 70: The Basic Structure Extract Source Data Extract Storage: flat files (fastest); RDBMS; other Processing: clean; prune; combine; remove duplication standardize.

Operations

• Drill down and rollup – Example on page 168

Page 71: The Basic Structure Extract Source Data Extract Storage: flat files (fastest); RDBMS; other Processing: clean; prune; combine; remove duplication standardize.

Snowflakes• What is it?

The removal of low cardinality fields from a dimension placed in a new table and linked back with keys

• Complicates design detail• Decreases performance• Saves some space but normally not a significant

amount• Bit map indexes can not be effectively utilized

Page 72: The Basic Structure Extract Source Data Extract Storage: flat files (fastest); RDBMS; other Processing: clean; prune; combine; remove duplication standardize.

When a Snowflake is OK

• When used as a subdimesnion– The data in the subd is related to the dimension

are at different levels of granularity– The data load times for the data are different– Examples:

• County and state• District and region• Ship and battle group

Page 73: The Basic Structure Extract Source Data Extract Storage: flat files (fastest); RDBMS; other Processing: clean; prune; combine; remove duplication standardize.

Good Descriptive Dimensions

• Large dimension tables• Highly descriptive• Without good descriptive dimensions, the

warehouse is not useful• Use:

– full words, no missing values (null), QA, metadata

Page 74: The Basic Structure Extract Source Data Extract Storage: flat files (fastest); RDBMS; other Processing: clean; prune; combine; remove duplication standardize.

Common Dimension Techniques

• Time– example figure 5.7 page 176

• Address– example page 178

• Commercial address– example page 179

Page 75: The Basic Structure Extract Source Data Extract Storage: flat files (fastest); RDBMS; other Processing: clean; prune; combine; remove duplication standardize.

Slowly Changing Dimensions

• What to do:– Type 0: Ignore the change– Type 1: Overwrite the changed attribute– Type 2: Add a new dimension record with new

value of the surrogate key– Type 3: Add an “old value” field

Page 76: The Basic Structure Extract Source Data Extract Storage: flat files (fastest); RDBMS; other Processing: clean; prune; combine; remove duplication standardize.

Slowly Changing Dimensions

• Ignore the change– Not typically a good solution to the problem, but

is done.• Overwrite the changed attribute

– Valid when correcting a value from the source• Add a new dimension record with a

generalized key– Retains history of a changed product

Page 77: The Basic Structure Extract Source Data Extract Storage: flat files (fastest); RDBMS; other Processing: clean; prune; combine; remove duplication standardize.

Slowly Changing Dimensions

• Add an “old value” field– Valid when on the previous change is needed

for decision making

Page 78: The Basic Structure Extract Source Data Extract Storage: flat files (fastest); RDBMS; other Processing: clean; prune; combine; remove duplication standardize.

Slowly Changing Dimensions

• Type 2 example: Change in product (bottle changes from

platic to glass)

Key 001 002Type Plastic GlassSKU 1234 1234

Page 79: The Basic Structure Extract Source Data Extract Storage: flat files (fastest); RDBMS; other Processing: clean; prune; combine; remove duplication standardize.

Slowly Changing Dimensions

• Type 3 example: Regional divisions of a company changes

(only one historical change is supported)

Region Gold Silver Platinum BronzeOld Region North South East West

Page 80: The Basic Structure Extract Source Data Extract Storage: flat files (fastest); RDBMS; other Processing: clean; prune; combine; remove duplication standardize.

The Monster Dimension

• It is a compromise• Avoids creating copies of dimension

records in a significantly large dimension• Done to manage space and changes

efficiently

Page 81: The Basic Structure Extract Source Data Extract Storage: flat files (fastest); RDBMS; other Processing: clean; prune; combine; remove duplication standardize.

The Monster Dimension

Customer_Keynameaddresscity, statebirth_datedate_first_purchaseincomenumber_childreneducationtotal_purchasescredit_score

Customer_Keynameaddresscity,statebirth_datedate_first_purchase

Demographics_Keyincome_bandnumber_childreneducation_leveltotal_purchases_bandcredit_group

Basically constant

May change with each purchase

Bands used tominimize possibilities

Example 1

Page 82: The Basic Structure Extract Source Data Extract Storage: flat files (fastest); RDBMS; other Processing: clean; prune; combine; remove duplication standardize.

The Monster Dimension

• Case 1 (Rapid change)– Large dimensions can be dynamic because of

the amount of information contained– Certain aspects must be maintained in the

dimension, over time, to understand impacts• demographics• customer data• product lines (for companies in acquisition)

Page 83: The Basic Structure Extract Source Data Extract Storage: flat files (fastest); RDBMS; other Processing: clean; prune; combine; remove duplication standardize.

The Monster Dimension

• The solution to very dynamic large dimensions– identify the dynamic areas of the dimension– segment the hot areas into there own

independent dimensions– The relative static information remains in the

original dimension

Page 84: The Basic Structure Extract Source Data Extract Storage: flat files (fastest); RDBMS; other Processing: clean; prune; combine; remove duplication standardize.

The Monster Dimension

• The trade off (plus)– the warehouse can accurately retain significant

changes in a dimension over time– to slow the rate of change down extremely

dynamic attributes should be banded to slow the rate of change

– All possible combinations in the dimension become finite (discrete) and are thus manageable

Page 85: The Basic Structure Extract Source Data Extract Storage: flat files (fastest); RDBMS; other Processing: clean; prune; combine; remove duplication standardize.

The Monster Dimension

• The trade off (minus)– Loss of detail in the bands (no longer exact)– Once bands are defined they must be enforced from

that point on– Slower browse performance required when

combining the segmented table with the original table

– Impossible to combine the data without a single instance of a fact (nothing to relate the dimensions)

Page 86: The Basic Structure Extract Source Data Extract Storage: flat files (fastest); RDBMS; other Processing: clean; prune; combine; remove duplication standardize.

The Monster DimensionEmployee Tablename addressdate_of_birthsocial_security_num…titleyears_with_companyincomedivisionpurchase_level

Example 2

Employee Tablename addressdate_of_birthsocial_security_num

Corporate Demographicsposition_gradeincome_banddivisionservice_years_band

Page 87: The Basic Structure Extract Source Data Extract Storage: flat files (fastest); RDBMS; other Processing: clean; prune; combine; remove duplication standardize.

Degenerate Dimensions/Key

Definition - Critical data provided in the legacy environment that normally remains independent. Typically the old key from the current fact information you are using with no supporting data

Page 88: The Basic Structure Extract Source Data Extract Storage: flat files (fastest); RDBMS; other Processing: clean; prune; combine; remove duplication standardize.

Degenerate Dimensions/Key

• Likely found in the header of a file• The other items have been absorbed in other

dimensions– customer, date, vendor, item

• The remaining item has no supporting attributes but is important– CLIN, Requisition #, Order #

• Useful information and should be absorbed in the fact table

Page 89: The Basic Structure Extract Source Data Extract Storage: flat files (fastest); RDBMS; other Processing: clean; prune; combine; remove duplication standardize.

Degenerate Dimensions/Key

• Useful information and should be absorbed in the fact table

• If there is other supporting attributes, it becomes a typical dimension

Page 90: The Basic Structure Extract Source Data Extract Storage: flat files (fastest); RDBMS; other Processing: clean; prune; combine; remove duplication standardize.

Junk Dimensions

• Resident flags, status codes, and miscellaneous information persists after the dimensional design is near complete

• Alternatives:– Place the flags in the fact tables– Make each attribute a dimension– Remove the attributes completely

Page 91: The Basic Structure Extract Source Data Extract Storage: flat files (fastest); RDBMS; other Processing: clean; prune; combine; remove duplication standardize.

Junk Dimensions

• Leave the flags in the fact tables– likely sparse data– no real browse entry capability– can significantly increase the size of the fact

table• Remove the attributes from the design

– potentially critical information will be lost– if they provide no relevance, remove them

Page 92: The Basic Structure Extract Source Data Extract Storage: flat files (fastest); RDBMS; other Processing: clean; prune; combine; remove duplication standardize.

Junk Dimensions

• Make a flag into it’s own dimension– may greatly increase the number of

dimensions, increasing the size of the fact table– can clutter and confuse the design

Combine all relevant flags, etc. into a single dimension– the number of possibilities remain finite– information is retained

Page 93: The Basic Structure Extract Source Data Extract Storage: flat files (fastest); RDBMS; other Processing: clean; prune; combine; remove duplication standardize.

Keys, Keys, Keys

• Surrogate keys (always use) – 4 byte integer (232 or two billion + integers)

• Date keys should use surrogates as well– dates are typically 8 bytes -- saves 4 bytes per fact

• Do not use smart keys with embedded meanings

• Do not use legacy or production keys

Page 94: The Basic Structure Extract Source Data Extract Storage: flat files (fastest); RDBMS; other Processing: clean; prune; combine; remove duplication standardize.

Just the Facts• Attempt to make all facts additive

– simplifies calculations across dimensions– all numbers are not additive facts

• Semi-additive facts can be used but understand they are there– averages, max, min

• Non-additive facts often are avoided but may have value– weather conditions (non-discrete), non-discrete discriptions

Page 95: The Basic Structure Extract Source Data Extract Storage: flat files (fastest); RDBMS; other Processing: clean; prune; combine; remove duplication standardize.

Steps to Designing a Fact Table

• Time to choose:– data mart (functional business area)– grain of the fact table (what level of detail)– dimensions associated to the data mart– the facts relative to the data mart

Page 96: The Basic Structure Extract Source Data Extract Storage: flat files (fastest); RDBMS; other Processing: clean; prune; combine; remove duplication standardize.

Data Mart• Single operational source data marts provide the

least amount of risk• Multiple operational source data marts typical

provide more cross functional value• Examples: (remember, processes you measure)

– Marketing– Sales– Inventory– Productivity

Page 97: The Basic Structure Extract Source Data Extract Storage: flat files (fastest); RDBMS; other Processing: clean; prune; combine; remove duplication standardize.

Fact Table Grain

• Without this, dimensions can not be accurately defined

• Select as low of a grain as possible– handles unexpected queries– adapts readily to additional facts and dimensions– delivers the most comprehensive solution– Consumes more space– Performance can be an issue

Page 98: The Basic Structure Extract Source Data Extract Storage: flat files (fastest); RDBMS; other Processing: clean; prune; combine; remove duplication standardize.

Fact Loads

• By record– account for every transaction or activity recorded

(ATM)• Snapshot

– A picture of the related facts at a specific point in time (monthly reporting)

• Line item– track and reflect the status of line item activity (PO)

Page 99: The Basic Structure Extract Source Data Extract Storage: flat files (fastest); RDBMS; other Processing: clean; prune; combine; remove duplication standardize.

Dimensions

• Once the grain is defined, basic dimensions will be evident from the grain (customer, time, etc.)

• Addition of other dimensions and junk dimensions

• All dimensions can not be at a lower level of granularity than the lowest fact table grain

Page 100: The Basic Structure Extract Source Data Extract Storage: flat files (fastest); RDBMS; other Processing: clean; prune; combine; remove duplication standardize.

Identifying Facts

• The grain of the fact table dictates the facts• All facts must be at the same level• Individual transaction tables typically have 1

fact (the numeric value of the transaction)• Snap shot and line item fact tables will likely

contain multiple facts in that multiple additive facts are captured

• Keep all three types separated

Page 101: The Basic Structure Extract Source Data Extract Storage: flat files (fastest); RDBMS; other Processing: clean; prune; combine; remove duplication standardize.

Fact Table Families

• Process chain (supply chain, linear)– fact table represents each step in the process– RFI-RFP-RFQ-Contact-Delivery– supply chain process example: page 200– each fact is connected on the bus

• Value Circle (parallel measurement)– health care (example page 202)– retail

Page 102: The Basic Structure Extract Source Data Extract Storage: flat files (fastest); RDBMS; other Processing: clean; prune; combine; remove duplication standardize.

Fact Table Families

• Heterogeneous Product Schemas– Service offered by the business are distinct and

separate– banking (checking, savings, loans, etc.)– Insurance (life, home, auto, etc.)

• Transaction an Snapshot Schemas– Snapshot (periodic picture) example page 210– Transaction (activity detail) example page 207

Page 103: The Basic Structure Extract Source Data Extract Storage: flat files (fastest); RDBMS; other Processing: clean; prune; combine; remove duplication standardize.

Aggregate Families

• Used to improve query performance• Typically roll ups of facts along a

dimension for anticipated reporting and querying

• Aggregate tables can also be used to combine details from two fact tables of varying granularity

Page 104: The Basic Structure Extract Source Data Extract Storage: flat files (fastest); RDBMS; other Processing: clean; prune; combine; remove duplication standardize.

Factless Fact Tables

• Used for two reasons:– record an activity (student attendance page 213)

• answers what the most popular classes were• what days are frequently missed

– Coverage (account for activity that may not have happened) (example page 215)

• An entry is placed in the fact table for all item of interest• answers questions regarding what did and did not have

activity


Recommended