
Data Warehouse and Business Intelligence

Dr. Minder Chen

[email protected]

Fall 2008

Data Warehouse - 2 © Minder Chen, 2004-2008

Online Resources

• Additional resources:

– Teradata Student Network

» The Premier Learning Resource for Data Warehousing, DSS/BI, and Database.  The URL is http://www.teradatastudentnetwork.com

» PSW: smartdecisions

BI

Business Intelligence (BI) is the process of gathering meaningful information to answer questions and identify significant trends or patterns, giving key stakeholders the ability to make better business decisions.

“The key in business is to know something that nobody else knows.”

-- Aristotle Onassis

“To understand is to perceive patterns.”

— Sir Isaiah Berlin

"The manager asks how and when, the leader asks what and why."

— “On Becoming a Leader” by Warren Bennis

BI Questions

• What happened?

– What were our total sales this month?

• What’s happening?

– Are our sales going up or down? (trend analysis)

• Why?

– Why have sales gone down?

• What will happen?

– Forecasting & “What If” Analysis

• What do I want to happen?

– Planning & Targets

Source: Bill Baker, Microsoft

Business Intelligence

[Figure: pyramid of layers; the potential to support business decisions (MIS) increases toward the top]

Layers, top to bottom:

• Making Decisions

• Data Presentation: Visualization Techniques

• Data Mining: Information Discovery

• Data Exploration: OLAP, MDA, Statistical Analysis, Querying and Reporting

• Data Warehouses / Data Marts

• Data Sources (Paper, Files, Information Providers, Database Systems, OLTP)

Roles, top to bottom: End User, Business Analyst, Data Analyst, DBA

Where is Business Intelligence applied?

Operational Efficiency

• ERP Reporting

• KPI Tracking

• Product Profitability

• Risk Management

• Balanced Scorecard

• Activity Based Costing

• Global Sourcing

• Logistics

Customer Interaction

• Sales Analysis

• Sales Forecasting

• Segmentation

• Cross-selling

• CRM Analytics

• Campaign Planning

• Customer Profitability

Inmon's Definition of Data Warehouse – Data View

• A warehouse is a

– subject-oriented,

– integrated,

– time-variant and

– non-volatile

collection of data in support of management's decision making process.

– Bill Inmon in 1990

Source: http://www.intranetjournal.com/features/datawarehousing.html

Inmon's Definition Explained

• Subject-oriented: Data warehouses are organized around major subjects such as customer, supplier, product, and sales. They focus on modeling and analysis to support planning and management decisions rather than on operations and transaction processing.

• Integrated: Data warehouses involve an integration of sources such as relational databases, flat files, and on-line transaction records. Processes such as data cleansing and data scrubbing achieve data consistency in naming conventions, encoding structures, and attribute measures.

• Time-variant: Data contained in the warehouse provide information from a historical perspective.

• Nonvolatile: Data contained in the warehouse are physically separate from data present in the operational environment. Once loaded, the data are read for analysis rather than updated in place.

Kimball's Definition – Process View

• A data warehouse is a system that extracts, cleans, conforms, and delivers source data into a dimensional data store and then supports and implements querying and analysis for the purpose of decision making.

» Ralph Kimball

The Data Warehouse Process

[Figure: data flows from Source Systems to the Data Warehouse, then to Data Marts and cubes, then to Clients (Query Tools, Reporting, Analysis, Data Mining)]

1. Design the Data Warehouse

2. Populate the Data Warehouse

3. Create OLAP Cubes

4. Query Data

Key Concepts in BI Development Lifecycle

Business Valuation Models for BI

Performance Dashboards for Information Delivery

Scorecards for Information Delivery

OLTP Normalized Design

[Figure: normalized entity-relationship diagram. Entities shown: Ordering Process, Warehouse, POS Process, Chain Retailer, Retailer Returns, Retailer Payments, Store, Product, Brand, GL Account, Clerk, Retail Cust, Cash Register, Retail Promo]

OLTP Versus Business Intelligence: Who asks what?

OLTP Questions

• When did that order ship?

• How many units are in inventory?

• Does this customer have unpaid bills?

• Are any of customer X’s line items on backorder?

Analysis Questions

• What factors affect order processing time?

• How did each product line (or product) contribute to profit last quarter?

• Which products have the lowest Gross Margin?

• What is the value of items on backorder, and is it trending up or down over time?
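The contrast between the two kinds of question can be sketched in a few lines. This is a minimal illustration using Python's built-in sqlite3 module; the `order_lines` table, its columns, and all values are made up for the example and are not from the course material.

```python
import sqlite3

# Hypothetical order-line table; names and values are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE order_lines (
           order_id INTEGER, product_line TEXT, ship_date TEXT,
           revenue REAL, cost REAL)"""
)
conn.executemany(
    "INSERT INTO order_lines VALUES (?, ?, ?, ?, ?)",
    [
        (1001, "Widgets", "2008-07-03", 500.0, 350.0),
        (1002, "Gadgets", "2008-07-09", 300.0, 240.0),
        (1003, "Widgets", "2008-08-15", 200.0, 150.0),
        (1004, "Gadgets", "2008-09-21", 400.0, 310.0),
    ],
)

# OLTP-style question: "When did that order ship?"
# One row, looked up by its key.
ship_date = conn.execute(
    "SELECT ship_date FROM order_lines WHERE order_id = ?", (1003,)
).fetchone()[0]

# Analysis-style question: "How did each product line contribute to
# profit last quarter?" Many rows are scanned and aggregated.
profit_by_line = dict(
    conn.execute(
        """SELECT product_line, SUM(revenue - cost)
           FROM order_lines
           WHERE ship_date BETWEEN '2008-07-01' AND '2008-09-30'
           GROUP BY product_line"""
    ).fetchall()
)

print(ship_date)
print(profit_by_line)
```

The first query touches a single record, which is what a normalized OLTP design is tuned for. The second reads every row in a date range and summarizes it, which is the workload a data warehouse is designed around.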

OLTP vs. OLAP

Source: http://www.rainmakerworks.com/pdfdocs/OLTP_vs_OLAP.pdf#search=%22OLTP%20vs.%20OLAP%22

Dimensional Design Process

• Select the business process to model

• Declare the grain of the business process/data in the fact table

• Choose the dimensions that apply to each fact table row

• Identify the numeric facts that will populate each fact table row

Inputs to the process: Business Requirements, Data Realities

Select a business process to model

• Not business departments or business functions

• Cross-functional business processes

• Business events

• Examples:

– Raw materials purchasing

– Order fulfillment process

– Shipments

– Invoicing

– Inventory

– General ledger

Requirements

Identifying Measures and Dimensions

Measures (performance measures for KPI): the attribute varies continuously.

• Balance

• Units Sold

• Cost

• Sales

Dimensions (performance drivers): the attribute is perceived as a constant or discrete value.

• Description

• Location

• Color

• Size

A Dimensional Model for a Grocery Store Sales

Product Dimension

• SKU: Stock Keeping Unit

• Hierarchy:

– Department → Category → Subcategory → Brand → Product

Creating Dimensional Model

• Identify fact tables

• Translate business measures into fact tables

• Analyze source system information for additional measures

• Identify base and derived measures

• Document additivity of measures

• Identify dimension tables

• Link fact tables to the dimension tables

• Create views for users

Transaction Level Order Item Fact Table

Inside a Dimension Table

• Dimension table key: Uniquely identifies each row. Use a surrogate key (integer).

• Table is wide: A table may have many attributes (columns).

• Textual attributes. Descriptive attributes in string format. No numerical values for calculation.

• Attributes not directly related: E.g., product color and product package size. No transitive dependency.

• Not normalized (star schema).

• Drilling down and rolling up along a dimension.

• One or more hierarchies within a dimension.

• Fewer records than a fact table.

Fact Tables

Fact tables have the following characteristics:

• Contain numeric measures (metrics) of the business

• May contain summarized (aggregated) data

• May contain date-stamped data

• Are typically additive

• Have a key value that is typically a concatenated key composed of the primary keys of the dimensions

• Joined to dimension tables through foreign keys that reference primary keys in the dimension tables

Facts Table

Dimensions (keys):

• DateID

• ProductID

• CustomerID

Measures:

• Units

• Dollars

The fact table contains keys and units of measure: measurements of business events.
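A fact table with these keys and measures can be sketched with Python's built-in sqlite3 module. The key and measure names (DateID, ProductID, CustomerID, Units, Dollars) come from the slide; the table names, the dimension attributes (Month, Brand, Segment), and all row values are assumptions made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Hypothetical dimension tables, each with an integer key.
    CREATE TABLE DimDate     (DateID INTEGER PRIMARY KEY, Month TEXT);
    CREATE TABLE DimProduct  (ProductID INTEGER PRIMARY KEY, Brand TEXT);
    CREATE TABLE DimCustomer (CustomerID INTEGER PRIMARY KEY, Segment TEXT);

    -- Fact table: foreign keys to the dimensions plus additive measures.
    CREATE TABLE FactSales (
        DateID     INTEGER REFERENCES DimDate(DateID),
        ProductID  INTEGER REFERENCES DimProduct(ProductID),
        CustomerID INTEGER REFERENCES DimCustomer(CustomerID),
        Units      INTEGER,
        Dollars    REAL,
        PRIMARY KEY (DateID, ProductID, CustomerID)  -- concatenated key
    );

    INSERT INTO DimDate VALUES (1, '2008-09'), (2, '2008-09'), (3, '2008-10');
    INSERT INTO DimProduct VALUES (10, 'Acme'), (11, 'Zenith');
    INSERT INTO DimCustomer VALUES (100, 'Retail'), (101, 'Wholesale');
    INSERT INTO FactSales VALUES
        (1, 10, 100, 5, 50.0),
        (1, 11, 100, 2, 30.0),
        (2, 10, 101, 3, 30.0),
        (3, 11, 101, 4, 60.0);
    """
)

# Join the fact table to two dimensions and sum the additive measures.
sales_by_month_brand = conn.execute(
    """SELECT d.Month, p.Brand, SUM(f.Units), SUM(f.Dollars)
       FROM FactSales f
       JOIN DimDate d    ON d.DateID = f.DateID
       JOIN DimProduct p ON p.ProductID = f.ProductID
       GROUP BY d.Month, p.Brand
       ORDER BY d.Month, p.Brand"""
).fetchall()

for row in sales_by_month_brand:
    print(row)
```

Because Units and Dollars are additive, they can be summed across any combination of dimension attributes without going back to the source systems.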

Snowflake Schema

[Figure: snowflake schema diagram. Tables shown: Sales, Customers, Dates, Products, Channels, Promotions, Brands]

Hierarchy

OLAP Solutions

• Data Warehouse/Data Mart

• Dimensions

• Measures

• Cubes

• Cells

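[Figure: cube with three dimensions: product (Gadgets, Gizmos, Thingies, Widgets), quarter (Q1, Q2, Q3, Q4), and region (US, Europe, Asia)]

Cell values on the visible face (columns Q1 to Q4, one row per product):

130 135 140 142

205 390 350 475

175 230 190 250

310 340 410 450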

Operations in Multidimensional Data Model

• Aggregation (roll-up)

– dimension reduction: e.g., total sales by city

– summarization over aggregate hierarchy: e.g., total sales by city and year → total sales by region and by year

• Selection (slice) defines a subcube

– e.g., sales where city = Palo Alto and date = 1/15/96

• Navigation to detailed data (drill-down)

– e.g., (sales - expense) by city, top 3% of cities by average income

• Visualization Operations (e.g., Pivot)
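The roll-up, slice, and drill-down operations listed above can be sketched in plain Python. The fact rows and the city-to-region assignments below are hypothetical, and the drill-down shown is a simplified one that returns a region's city-level detail.

```python
from collections import defaultdict

# Hypothetical fact rows: (region, city, year, sales).
facts = [
    ("West", "Palo Alto", 1995, 100),
    ("West", "Palo Alto", 1996, 120),
    ("West", "San Jose", 1996, 80),
    ("East", "Boston", 1995, 90),
    ("East", "Boston", 1996, 110),
]


def roll_up(rows, key):
    """Sum sales over all rows that share the same key(row)."""
    totals = defaultdict(int)
    for row in rows:
        totals[key(row)] += row[3]
    return dict(totals)


# Roll-up by dimension reduction: drop the year, keep the city.
sales_by_city = roll_up(facts, lambda r: r[1])

# Roll-up along the hierarchy: city -> region, still broken out by year.
sales_by_region_year = roll_up(facts, lambda r: (r[0], r[2]))

# Slice: fix values on some dimensions to select a subcube.
palo_alto_1996 = [r for r in facts if r[1] == "Palo Alto" and r[2] == 1996]

# Drill-down: from a region total back to its city-and-year detail.
west_detail = roll_up(
    [r for r in facts if r[0] == "West"], lambda r: (r[1], r[2])
)

print(sales_by_city)
print(sales_by_region_year)
print(palo_alto_1996)
print(west_detail)
```

An OLAP engine typically precomputes or indexes many of these aggregates; the sketch only shows what each operation returns.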

A Visual Operation: Pivot (Rotate)

[Figure: a cube being rotated. Axes: Product (Juice, Cola, Milk, Cream), Region (NY, LA, SF), Date (3/1, 3/2, 3/3, 3/4). Sample cell values: 10, 47, 30, 12. Other labels shown: Month, Region.]
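A pivot swaps which dimension runs along the rows and which along the columns, without changing any cell value. A minimal sketch in plain Python follows; the product and region labels echo the figure, but the assignment of values to cells is made up for illustration.

```python
# Hypothetical two-dimensional view: rows are products, columns are regions.
by_product = {
    "Juice": {"NY": 10, "LA": 7},
    "Cola": {"NY": 47, "LA": 20},
}


def pivot(table):
    """Swap the row and column axes of a nested-dict table."""
    rotated = {}
    for row_label, cells in table.items():
        for col_label, value in cells.items():
            rotated.setdefault(col_label, {})[row_label] = value
    return rotated


# After the pivot, rows are regions and columns are products.
by_region = pivot(by_product)
print(by_region)
```

Pivoting twice returns the original view, which is one way to check that the operation only rearranges the presentation.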

Date Dimension of the Retail Sales Model

Store Dimension

• It is not uncommon to represent multiple hierarchies in a dimension table. Ideally, the attribute names and values should be unique across the multiple hierarchies.

Multidimensional Query Techniques

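• Slicing

• Dicing

• Drilling down

[Figure: a cube with Product, Time, and Geography dimensions; the analysis starts from "What?" and moves through successive "Why?" questions]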

ETL

ETL = Extract, Transform, Load

• Moving data from production systems to DW

• Checking data integrity

• Assigning surrogate key values

• Collecting data from disparate systems

• Reorganizing data
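The ETL tasks listed above can be sketched as three small functions. This is an illustrative outline only: the two source layouts, the field names, and the rule for rejecting rows are assumptions, not part of the course material.

```python
# Hypothetical extracts from two production systems with different layouts.
orders_system = [{"cust_no": "A-17", "name": "acme corp ", "amount": "120.50"}]
billing_system = [
    {"customer": "A-17", "customer_name": "Acme Corp", "total": "80.00"},
    {"customer": "B-02", "customer_name": "Bolt Ltd", "total": None},
]


def extract():
    """Collect rows from disparate systems into one common layout."""
    for r in orders_system:
        yield {"natural_key": r["cust_no"], "name": r["name"],
               "amount": r["amount"]}
    for r in billing_system:
        yield {"natural_key": r["customer"], "name": r["customer_name"],
               "amount": r["total"]}


def transform(rows):
    """Check integrity and reorganize: reject rows with no amount, tidy names."""
    clean, rejected = [], []
    for r in rows:
        if r["amount"] is None:
            rejected.append(r)
            continue
        clean.append({**r, "name": r["name"].strip().title(),
                      "amount": float(r["amount"])})
    return clean, rejected


def load(rows):
    """Assign integer surrogate keys and build the warehouse rows."""
    customer_dim, sales_fact = {}, []
    for r in rows:
        # Reuse the surrogate key if this natural key was seen before.
        key = customer_dim.setdefault(r["natural_key"], len(customer_dim) + 1)
        sales_fact.append((key, r["amount"]))
    return customer_dim, sales_fact


clean_rows, rejected_rows = transform(extract())
customer_dim, sales_fact = load(clean_rows)
print(customer_dim, sales_fact, rejected_rows)
```

The surrogate key is an integer generated by the warehouse, so the fact rows do not depend on how each source system happens to number its customers.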

Pivot Table in Excel

Data Quality Issues

• No common time basis

• Different calculation algorithms

• Different levels of extraction

• Different levels of granularity

• Different data field names

• Different data field meanings

• Missing information

• No data correction rules

• No drill-down capability

Building the Warehouse: Transforming Data

The Anomalies Nightmare

[Figure: a customer table with columns CUST #, NAME, ADDRESS, TYPE. Column values as shown on the slide:]

CUST #: 90238475, 90233479, 90233489, 90234889, 90345672, 90328574, 90328575

NAME: Digital Equipment, Digital, Digital Corp, Digital Consulting, Digital Info Service, Digital Integration, DEC

ADDRESS:

– 187 N. PARK St. Salem NH 01458

– 187 N. Pk. St. Salem NH 01458

– 187 N. Park St Salem NH 01458

– 187 N. Park Ave. Salem NH 01458

– 15 Main Street Andover MA 02341

– PO Box 9 Boston MA 02210

– Park Blvd. Boston MA 04106

TYPE: OEM, OEM, $#%, Comp, Consult, Mail List, SYS INT

Problems called out on the slide:

• No Unique Key

• Noise in Blank Fields

• Spelling

• No Standardization

• Anomalies

How does one correctly identify and consolidate anomalies from millions of records?
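One common first step toward consolidating such records is to normalize each field before comparing. The sketch below groups some of the address strings from the slide by a normalized form. The abbreviation table is a small hypothetical one, and rule-based normalization like this is only one possible approach, not necessarily the one the course has in mind.

```python
import re
from collections import defaultdict

# Address strings as they appear on the slide.
addresses = [
    "187 N. PARK St. Salem NH 01458",
    "187 N. Pk. St. Salem NH 01458",
    "187 N. Park St Salem NH 01458",
    "187 N. Park Ave. Salem NH 01458",
    "15 Main Street Andover MA 02341",
]

# Hypothetical abbreviation table; a real one would be far larger.
ABBREVIATIONS = {"pk": "park", "st": "street", "ave": "avenue", "n": "north"}


def normalize(address):
    """Lower-case, drop punctuation, and expand known abbreviations."""
    words = re.sub(r"[^\w\s]", "", address.lower()).split()
    return " ".join(ABBREVIATIONS.get(w, w) for w in words)


# Addresses that normalize to the same string are candidate duplicates.
groups = defaultdict(list)
for a in addresses:
    groups[normalize(a)].append(a)

for normalized, originals in groups.items():
    print(normalized, "<-", originals)
```

The three "Park St" variants collapse into one group, while "Park Ave." stays separate. Whether that record is a typo or a genuinely different address cannot be decided by string rules alone, which is part of why the problem is hard at the scale of millions of records.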

OLAP and Data Mining Address Different Types of Questions

While reporting and OLAP are informative about past facts, only data mining can help you predict the future of your business.

• OLAP: What was the response rate to our mailing?

– Data Mining: What is the profile of people who are likely to respond to future mailings?

• OLAP: How many units of our new product did we sell to our existing customers?

– Data Mining: Which existing customers are likely to buy our next new product?

• OLAP: Who were my 10 best customers last year?

– Data Mining: Which 10 customers offer me the greatest profit potential?

• OLAP: Which customers didn't renew their policies last month?

– Data Mining: Which customers are likely to switch to the competition in the next six months?

• OLAP: Which customers defaulted on their loans?

– Data Mining: Is this customer likely to be a good credit risk?

• OLAP: What were sales by region last quarter?

– Data Mining: What are expected sales by region next year?

• OLAP: What percentage of the parts we produced yesterday are defective?

– Data Mining: What can I do to improve throughput and reduce scrap?

Source: http://www.dmreview.com/editorial/dmreview/print_action.cfm?articleId=2367

Use of Data Mining

• Customer profiling

• Market segmentation

• Buying pattern affinities

• Database marketing

• Credit scoring and risk analysis

Associations

Which items are purchased in a retail store at the same time?
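A simple way to approach this question is to count how often each pair of items appears in the same basket. The sketch below uses made-up baskets and computes pair support (the share of baskets containing both items); full association-rule mining adds more measures and pruning that are not shown here.

```python
from collections import Counter
from itertools import combinations

# Hypothetical market baskets, one set of items per checkout.
baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"beer", "chips"},
    {"bread", "butter"},
]

# Count every unordered pair of items that shares a basket.
pair_counts = Counter()
for basket in baskets:
    for pair in combinations(sorted(basket), 2):
        pair_counts[pair] += 1

# Support: the share of all baskets that contain both items of the pair.
support = {pair: n / len(baskets) for pair, n in pair_counts.items()}

for pair, value in sorted(support.items()):
    print(pair, value)
```

Sorting each basket before forming pairs keeps ("bread", "milk") and ("milk", "bread") from being counted as different pairs.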

Sequential Patterns

What is the likelihood that a customer will buy a product next month, if he buys a related item today?
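With purchase histories ordered by month, that likelihood can be estimated as a simple conditional frequency. The sketch below is a minimal illustration with made-up histories and a hypothetical helper name; real sequential-pattern mining handles longer sequences and time gaps.

```python
# Hypothetical purchase histories: customer -> items bought per month, in order.
histories = {
    "c1": [{"printer"}, {"ink"}],
    "c2": [{"printer"}, {"paper"}],
    "c3": [{"printer", "paper"}, {"ink"}],
    "c4": [{"laptop"}, {"ink"}],
}


def next_month_likelihood(histories, first, then):
    """Estimate P(buys `then` next month | bought `first` this month)."""
    bought_first = 0
    followed = 0
    for months in histories.values():
        # Pair each month with the month that follows it.
        for current, following in zip(months, months[1:]):
            if first in current:
                bought_first += 1
                if then in following:
                    followed += 1
    return followed / bought_first if bought_first else 0.0


p_ink_after_printer = next_month_likelihood(histories, "printer", "ink")
print(p_ink_after_printer)
```

Here three customers bought a printer and two of them bought ink the following month, so the estimate is two out of three.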

Classifications

Determine customers’ buying patterns and then find other customers with similar attributes that may be targeted for a marketing campaign.

Modeling

Use factors, such as location, number of bedrooms, and square footage, to determine the market value of a property.
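As a deliberately simplified sketch of such a model, the code below fits a straight line to price against square footage alone, using ordinary least squares. The data points are made up, and a realistic model would include the other factors (location, number of bedrooms) as additional inputs.

```python
# Hypothetical training data: (square footage, sale price).
sales = [(1000, 200000), (1500, 275000), (2000, 350000), (2500, 425000)]


def fit_line(points):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


intercept, slope = fit_line(sales)

# Estimate the market value of a property that is not in the training data.
estimated_value = intercept + slope * 1800
print(intercept, slope, estimated_value)
```

The fitted slope reads as dollars per additional square foot, and the intercept as the baseline value the line assigns before size is taken into account.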

