
Copyright © Addison Wesley Longman, Inc. & Dr. Chen, Business Database Systems Chapter 9: Data...

Date post: 24-Dec-2015
Upload: curtis-nicholson
Copyright © Addison Wesley Longman, Inc. & Dr. Chen, Business Database Systems Chapter 9: Data Warehousing Jason C. H. Chen, Ph.D. Professor of MIS School of Business Administration Gonzaga University Spokane, WA 99258 [email protected]
Transcript

Copyright © Addison Wesley Longman, Inc. & Dr. Chen, Business Database Systems

Chapter 9: Data Warehousing

Jason C. H. Chen, Ph.D.
Professor of MIS
School of Business Administration
Gonzaga University
Spokane, WA 99258

[email protected]

TM 9-2Copyright © Addison Wesley Longman, Inc. & Dr. Chen, Business Database Systems

Objectives
• Definition of terms
• Reasons for information gap between information needs and availability
• Reasons for need of data warehousing
• Describe three levels of data warehouse architectures (ETL)
• Describe two components of star schema
• Estimate fact table size
• Design a data mart
• Develop requirements for a data mart
• OLAP, data mining and its applications

A Solution to the Information Gap

• One solution for bridging the information gap is the ______ _________, which consolidates and integrates information from many different sources and arranges it in a meaningful format for making accurate business decisions.

Answer: data warehouse

Two Issues to Know About Data Warehousing
• 1. A major factor driving the need for data warehousing
– Businesses need an integrated view of company information.
• 2. Which of the following organizational trends does not encourage the need for data warehousing?
– a) Multiple, nonsynchronized systems
– b) Focus on customer relationship management
– c) Downsizing
– d) Focus on supplier relationship management
– Answer: c) Downsizing

Need for Data Warehousing
• Integrated, company-wide view of high-quality information (from disparate databases)
• Separation of operational and informational systems and data (for improved performance)

Table 9-1 – Comparison of Operational and Informational Systems


DATA WAREHOUSE FUNDAMENTALS

• Data warehouse – a logical collection of information – gathered from many different operational databases – that supports business analysis activities and decision-making tasks

• The primary purpose of a data warehouse is to aggregate information throughout an organization into a single repository for decision-making purposes

Definition
• Data Warehouse: A subject-oriented, integrated, time-variant, non-updatable collection of data used in support of management decision-making processes
– Subject-oriented: e.g. customers, patients, students, products
  • DW is organized around key high-level entities of the enterprise
– Integrated: Consistent naming conventions, formats, encoding structures; from multiple data sources
– Time-variant: Can study trends and changes
  • Data in the warehouse contain a time dimension so that they may be used to study trends and changes.
– Non-updatable: Read-only, periodically refreshed
• Data Mart:
– A data warehouse that is limited in scope
– Contains a subset of data warehouse information


History Leading to Data Warehousing

• Improvement in database technologies, especially relational DBMSs

• Advances in computer hardware, including mass storage and parallel architectures

• Emergence of end-user computing with powerful interfaces and tools

• Advances in middleware, enabling heterogeneous database connectivity

• Recognition of difference between operational and informational systems


Issues with Company-Wide View
• Inconsistent key structures
• Synonyms
• Free-form vs. structured fields
• Inconsistent data values
• Missing data

See Figure 9-1 for example


Figure 9-1 Examples of heterogeneous data

Database vs. Data Warehouse

[Diagram: DBMS is paired with Database; Data Mining is paired with Data Warehouse.]

Data Marts and the Data Warehouse

[Diagram labels: Legacy Systems; Operational Data Store (four); ETL; Organizational Data Warehouse; ETL; Finance, Accounting, Marketing, and Sales Data Marts]

• Legacy systems feed data to the warehouse.
• The warehouse feeds specialized information to departments (data marts).

The Data Mart is More Specialized

[Diagram labels: Organizational Data Warehouse; ETL; Finance, Accounting, Marketing, and Sales Data Marts]

Data Marts
· Departmentalized
· Summarized, aggregated data
· Star join design
· Limited historical data
· Limited data volume
· Requirements driven data
· Focused on departmental needs
· Multi-dimensional DBMS technologies

Organizational Data Warehouse
· Corporate
· Highly granular data
· Normalized design
· Robust historical data
· Large data volume
· Data Model driven data
· Versatile
· General purpose DBMS technologies

The data mart serves the needs of one business unit, not the organization.

Organizational Trends Motivating Data Warehouses
• No single system of records
• Multiple systems not synchronized
• Organizational need to analyze activities in a balanced way
• Customer relationship management
• Supplier relationship management


Separating Operational and Informational Systems

• Operational system – a system that is used to run a business in real time, based on current data; also called a system of record

• Informational system – a system designed to support decision making based on historical point-in-time and prediction data for complex queries or data-mining applications


Position of the Data Warehouse Within the Organization


DATA WAREHOUSE FUNDAMENTALS (cont.)

• Extraction, transformation, and loading (ETL) – a process that extracts information from internal and external databases, transforms the information using a common set of enterprise definitions, and loads the information into a data warehouse

Data Warehouse Architectures
• Independent Data Mart
• Dependent Data Mart and Operational Data Store
• Logical Data Mart and Real-Time Data Warehouse
• Three-Layer architecture

All involve some form of extraction, transformation and loading (ETL)

Figure 9-2 Independent data mart data warehousing architecture

Data marts: Mini-warehouses, limited in scope

Separate ETL for each independent data mart

Data access complexity due to multiple data marts

Figure 9-3 Dependent data mart with operational data store: a three-level architecture

Single ETL for enterprise data warehouse (EDW)

Simpler data access

ODS provides option for obtaining current data

Dependent data marts loaded from EDW

Near real-time ETL for Data Warehouse

ODS and data warehouse are one and the same

Data marts are NOT separate databases, but logical views of the data warehouse

Easier to create new data marts

Figure 9-4 Logical data mart and real time warehouse architecture
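
The idea that a logical data mart is a view rather than a separate database can be illustrated with a minimal sketch. It uses Python's built-in sqlite3 module; the table name warehouse_sales, the view name marketing_mart, and the sample rows are hypothetical and do not come from Figure 9-4.

```python
import sqlite3

def build_warehouse() -> sqlite3.Connection:
    """Create a tiny in-memory warehouse with one logical data mart (a view)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE warehouse_sales (
            sale_id    INTEGER PRIMARY KEY,
            department TEXT,
            region     TEXT,
            amount     REAL
        );
        INSERT INTO warehouse_sales (department, region, amount) VALUES
            ('Marketing', 'West', 120.0),
            ('Finance',   'East',  75.5),
            ('Marketing', 'East',  30.0);
        -- The "data mart" is only a view: no rows are copied out of the warehouse.
        CREATE VIEW marketing_mart AS
            SELECT sale_id, region, amount
            FROM warehouse_sales
            WHERE department = 'Marketing';
    """)
    return conn

conn = build_warehouse()
rows = conn.execute(
    "SELECT region, amount FROM marketing_mart ORDER BY sale_id"
).fetchall()
```

Because the mart is a view, a row loaded into the warehouse is visible through the mart immediately, and creating another mart only requires another view definition.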

The ETL Process – another perspective and example
• Capture/Extract - E
• Scrub or data cleansing
• Transform - T
• Load and Index - L

ETL = Extract, transform, and load

Capture/Extract: obtaining a snapshot of a chosen subset of the source data for loading into the data warehouse

Static extract = capturing a snapshot of the source data at a point in time

Incremental extract = capturing changes that have occurred since the last static extract
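
The difference between a static and an incremental extract can be sketched in a few lines. This is an illustration only: the source rows, the "modified" timestamp field, and both function names are assumptions, since the slide does not say how changes are detected.

```python
from datetime import datetime

# Hypothetical source rows; "modified" records the last change time.
SOURCE_ROWS = [
    {"id": 1, "name": "Ann", "modified": datetime(2015, 1, 5)},
    {"id": 2, "name": "Bob", "modified": datetime(2015, 2, 10)},
    {"id": 3, "name": "Cy", "modified": datetime(2015, 3, 1)},
]

def static_extract(rows):
    """Snapshot of every source row at this point in time."""
    return [dict(r) for r in rows]

def incremental_extract(rows, last_extract_time):
    """Only the rows that changed after the previous extract."""
    return [dict(r) for r in rows if r["modified"] > last_extract_time]
```

A timestamp comparison is only one way to find changes; reading the DBMS log is another common option.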


Scrub/Cleanse…uses pattern recognition and AI techniques to upgrade data quality

Fixing errors: misspellings, erroneous dates, incorrect field usage, mismatched addresses, missing data, duplicate data, inconsistencies

Also: decoding, reformatting, time stamping, conversion, key generation, merging, error detection/logging, locating missing data
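
A few of the fixes listed above (inconsistent formatting, duplicate data, missing data) can be sketched with simple rules. This is a deliberately plain, rule-based illustration and not the pattern-recognition or AI techniques the slide mentions; the function names and the record layout are hypothetical.

```python
def clean_name(raw: str) -> str:
    """Collapse extra whitespace and normalize capitalization of a name."""
    return " ".join(part.capitalize() for part in raw.split())

def scrub(records):
    """Return (clean_rows, rejected_rows) after standardizing and de-duplicating."""
    seen, clean, rejected = set(), [], []
    for rec in records:
        name = clean_name(rec.get("name") or "")
        if not name:
            rejected.append(rec)  # missing data: log it rather than load it
            continue
        key = name.lower()
        if key in seen:  # duplicate once the name is standardized
            continue
        seen.add(key)
        clean.append({**rec, "name": name})
    return clean, rejected
```

Keeping rejected rows in a separate list corresponds to the error detection/logging step named above.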

Transform = convert data from format of operational system to format of data warehouse

Record-level:
– Selection – data partitioning
– Joining – data combining
– Aggregation – data summarization

Field-level:
– Single-field – from one field to one field
– Multi-field – from many fields to one, or one field to many
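
The record-level and field-level transformations can be shown on a tiny in-memory data set. All names and values here (orders, customers, full_name, amount) are hypothetical and chosen only to make each operation visible.

```python
from collections import defaultdict

orders = [
    {"cust_id": 1, "region": "West", "qty": 2, "unit_price": 5.0},
    {"cust_id": 2, "region": "East", "qty": 1, "unit_price": 8.0},
    {"cust_id": 1, "region": "West", "qty": 3, "unit_price": 5.0},
]
customers = {
    1: {"first": "Ann", "last": "Lee"},
    2: {"first": "Bob", "last": "Ray"},
}

# Selection: partition the data by a criterion.
west = [o for o in orders if o["region"] == "West"]

# Joining: combine each order row with its customer attributes.
joined = [{**o, **customers[o["cust_id"]]} for o in orders]

# Multi-field (many fields to one): derive one field from several source fields.
for row in joined:
    row["full_name"] = f'{row["first"]} {row["last"]}'
    row["amount"] = row["qty"] * row["unit_price"]

# Aggregation: summarize the amounts per customer.
totals = defaultdict(float)
for row in joined:
    totals[row["full_name"]] += row["amount"]
```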

Load/Index = place transformed data into the warehouse and create indexes

Refresh mode: bulk rewriting of target data at periodic intervals

Update mode: only changes in source data are written to data warehouse
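
The two load modes can be contrasted with a small sketch in which a dictionary keyed by row id stands in for the warehouse table. The function names and the row layout are assumptions made for illustration.

```python
def refresh_load(target: dict, source_rows: list) -> None:
    """Refresh mode: bulk-rewrite the whole target from the source."""
    target.clear()
    target.update({row["id"]: row for row in source_rows})

def update_load(target: dict, changed_rows: list) -> int:
    """Update mode: write only rows that differ from what is already stored."""
    written = 0
    for row in changed_rows:
        if target.get(row["id"]) != row:
            target[row["id"]] = row
            written += 1
    return written
```

Refresh mode is simpler but rewrites everything at each interval; update mode writes less data but depends on knowing which source rows changed.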

Information Cleansing or Scrubbing
• An organization must maintain high-quality data in the data warehouse
• Information cleansing or scrubbing – a process that weeds out and fixes or discards inconsistent, incorrect, or incomplete information
• Standardizing Customer name from Operational Systems
• Accurate and complete information

Representation of Data in DW
• Dimensional Modeling – a retrieval-based system that supports high-volume query access
– Not only accommodates but also boosts the processing of complex multidimensional queries.
• Two means
– 1. ______ schema – the most commonly used and the simplest style of dimensional modeling
  • Contains a fact table surrounded by and connected to several dimension tables
  • The fact table contains the quantitative measures (numerical values) needed to perform decision analysis and query reporting, and foreign keys are used to link to the dimension tables.
  • Dimension tables contain classification and aggregation information about the values in the fact table (i.e., attributes describing the data contained within the fact table).
– 2. ___________ schema – an extension of star schema where the diagram resembles a snowflake in shape

Answers: 1. Star; 2. Snowflake

Fact Table vs. Dimensional Table

[Diagram labels: Many to Many Relationship (M:N); Dimensional Table (pk); Fact Table (fk, fk, cpk); Dimensional Table (pk)]


Figure 9-5 Three-layer data architecture for a data warehouse

Data Characteristics: Status vs. Event Data

Event = a database action (create/update/delete) that results from a transaction

Figure 9-6 Example of DBMS log entry [figure labels: Status, Status]

Data Characteristics: Transient vs. Periodic Data

With transient data, changes to existing records are written over previous records, thus destroying the previous data content

Figure 9-7 Transient operational data

Periodic data are never physically altered or deleted once they have been added to the store

Figure 9-8 Periodic warehouse data

Other Data Warehouse Changes
• New descriptive attributes
• New business activity attributes
• New classes of descriptive attributes
• Descriptive attributes become more refined
• Descriptive data are related to one another
• New source of data

Data Reconciliation
• Typical operational data is:
– Transient – not historical
– Not normalized (perhaps due to denormalization for performance)
– Restricted in scope – not comprehensive
– Sometimes poor quality – inconsistencies and errors
• After ETL, data should be:
– Detailed – not summarized yet
– Historical – periodic
– Normalized – 3rd normal form or higher
– Comprehensive – enterprise-wide perspective
– Timely – data should be current enough to assist decision-making
– Quality controlled – accurate with full integrity

Derived Data
• Objectives
– Ease of use for decision support applications
– Fast response to predefined user queries
– Customized data for particular target audiences
– Ad-hoc query support
– Data mining capabilities
• Characteristics
– Detailed (mostly periodic) data
– Aggregate (for summary)
– Distributed (to departmental servers)

Most common data model = star schema (also called “dimensional model”)

Figure 9-9 Components of a star schema

Fact tables contain factual or quantitative data (numerical values)

Dimension tables contain descriptions about the subjects of the business (they describe the values in the fact table)

1:N relationship between dimension tables and fact tables

Excellent for ad-hoc queries, but bad for online transaction processing

Dimension tables are denormalized to maximize performance


Figure 9-10 Star schema example

Fact table provides statistics for sales broken down by product, period and store dimensions
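
A star schema of this shape can be sketched with Python's built-in sqlite3 module. The table and column names and the sample rows below are assumptions for illustration and are not copied from Figure 9-10; only the product, period, and store dimensions come from the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Dimension tables: one row per product, period, and store.
    CREATE TABLE product (product_key INTEGER PRIMARY KEY, description TEXT);
    CREATE TABLE period  (period_key  INTEGER PRIMARY KEY, month TEXT);
    CREATE TABLE store   (store_key   INTEGER PRIMARY KEY, city TEXT);
    -- Fact table: foreign keys to each dimension plus numeric measures.
    CREATE TABLE sales (
        product_key  INTEGER REFERENCES product(product_key),
        period_key   INTEGER REFERENCES period(period_key),
        store_key    INTEGER REFERENCES store(store_key),
        units_sold   INTEGER,
        dollars_sold REAL,
        PRIMARY KEY (product_key, period_key, store_key)
    );
    INSERT INTO product VALUES (1, 'Widget'), (2, 'Gadget');
    INSERT INTO period  VALUES (1, '2015-01'), (2, '2015-02');
    INSERT INTO store   VALUES (1, 'Spokane'), (2, 'Seattle');
    INSERT INTO sales VALUES
        (1, 1, 1, 10, 100.0),
        (1, 2, 1,  5,  50.0),
        (2, 1, 2,  3,  90.0);
""")

def dollars_by_store(conn):
    """Total dollars sold per store city: one join from fact to dimension."""
    return conn.execute("""
        SELECT s.city, SUM(f.dollars_sold)
        FROM sales AS f
        JOIN store AS s ON s.store_key = f.store_key
        GROUP BY s.city
        ORDER BY s.city
    """).fetchall()
```

Each analytical question needs at most one join per dimension involved, which is why the layout suits ad-hoc queries.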


Figure 9-11 Star schema with sample data

Surrogate Dimension Keys

• Dimension table keys should be surrogate (non-intelligent and non-business related), because:
– Business keys may change over time
– Helps keep track of nonkey attribute values for a given production key
– Surrogate keys are simpler and shorter
– Surrogate keys can be same length and format for all keys

Grain of the Fact Table
• Granularity of Fact Table – what level of detail do you want?
– Transactional grain – finest level
– Aggregated grain – more summarized
– Finer grains give better market basket analysis capability
– Finer grain means more dimension tables and more rows in the fact table
– In Web-based commerce, finest granularity is a click


Duration of the Database

– Natural duration–13 months or 5 quarters

– Financial institutions may need longer duration

– Older data is more difficult to source and cleanse

Size of Fact Table
• Depends on the number of dimensions and the grain of the fact table
• Number of rows = product of number of possible values for each dimension associated with the fact table
• Example: assume the following for Figure 9-11:
• Total rows calculated as follows (assuming only half the products record sales for a given month):
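
The row-count rule can be turned into a small calculator. The numbers used here (1,000 stores, 10,000 products, 24 monthly periods, 6 fields of 4 bytes each) are illustrative assumptions, not values recovered from the slide, whose figures are not present in this transcript; the function names are hypothetical as well.

```python
from math import prod

def fact_table_rows(dimension_sizes, active_fraction=1.0):
    """Rows = product of the dimension cardinalities, scaled by the
    fraction of combinations that actually record a fact."""
    return int(prod(dimension_sizes) * active_fraction)

def fact_table_bytes(rows, fields_per_row, bytes_per_field):
    """Rough storage estimate, ignoring indexes and overhead."""
    return rows * fields_per_row * bytes_per_field

# Assumed inputs: stores, products, months; half the products sell each month.
rows = fact_table_rows([1_000, 10_000, 24], active_fraction=0.5)
size = fact_table_bytes(rows, fields_per_row=6, bytes_per_field=4)
```

With these assumed inputs the estimate is 120,000,000 rows and 2,880,000,000 bytes; different cardinalities or a different grain change the result multiplicatively.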

Break! (Ch. 9)
Exercise #5 – a, b, c (p. 422)
With the following assumptions:
1. The length of a fiscal period is one month
2. The data mart will contain five years of historical data
3. Approximately 5 percent of the policies experience some type of change each month
4. There are 8 fields in each record (row)
ALL computations for b & c should be shown to get credit.

HW #3 (p. 422) – a, b, c
Assume one professor per course section

Figure 9-12 Modeling dates

Fact tables contain time-period data, so date dimensions are important

Variations of the Star Schema
• Multiple Fact Tables
– Can improve performance
– Often used to store facts for different combinations of dimensions
– Conformed dimensions
• Factless Fact Tables
– No nonkey data, but foreign keys for associated dimensions
– Used for:
  • Tracking events
  • Inventory coverage

Figure 9-13 Conformed dimensions

Conformed dimension: associated with multiple fact tables

Two fact tables form two (connected) star schemas.


Figure 9-14a Factless fact table showing occurrence of an event

No data in fact table, just keys associating dimension records

Fact table forms an n-ary relationship between dimensions

Normalizing Dimension Tables
• Multivalued Dimensions
– Facts qualified by a set of values for the same business subject
– Normalization involves creating a table for an associative entity between dimensions
• Hierarchies
– Sometimes a dimension forms a natural, fixed depth hierarchy
– Design options
  • Include all information for each level in a single denormalized table
  • Normalize the dimension into a nested set of 1:M table relationships

Figure 9-15 Multivalued dimension

Helper table is an associative entity that implements an M:N relationship between dimension and fact.


Figure 9-16 Fixed product hierarchy

Dimension hierarchies help to provide levels of aggregation for users wanting summary information in a data warehouse.

Slowly Changing Dimensions (SCD)

• Need to maintain knowledge of the past
• One option: for each changing attribute, create a current value field and many old-valued fields (multivalued)
• Better option: create a new dimension table row each time the dimension object changes, with all dimension characteristics at the time of change


Figure 9-18 Example of Type 2 SCD Customer dimension table

The dimension table contains several records for the same customer. The specific customer record to use depends on the key and the date of the fact, which should be between start and end dates of the SCD customer record.
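
The date-range lookup described above can be sketched as follows. The column names (cust_key, cust_id, city, start, end), the open-ended end date, and the sample rows are assumptions for illustration and are not taken from Figure 9-18.

```python
from datetime import date, timedelta

OPEN_END = date(9999, 12, 31)  # assumed marker for the current row

customer_dim = [
    {"cust_key": 101, "cust_id": "C1", "city": "Spokane",
     "start": date(2010, 1, 1), "end": date(2013, 6, 30)},
    {"cust_key": 102, "cust_id": "C1", "city": "Seattle",
     "start": date(2013, 7, 1), "end": OPEN_END},
]

def lookup_customer(dim, cust_id, fact_date):
    """Return the dimension row whose date range covers the fact date."""
    for row in dim:
        if row["cust_id"] == cust_id and row["start"] <= fact_date <= row["end"]:
            return row
    return None

def apply_change(dim, cust_id, new_city, change_date, new_key):
    """Type 2 change: close the current row and append a new one."""
    current = lookup_customer(dim, cust_id, change_date)
    if current is not None:
        current["end"] = change_date - timedelta(days=1)
    dim.append({"cust_key": new_key, "cust_id": cust_id, "city": new_city,
                "start": change_date, "end": OPEN_END})
```

Each version of the customer gets its own surrogate key (cust_key), so older facts keep pointing at the attribute values that were valid when they occurred.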


Figure 9-19 Dimension segmentation

For rapidly changing attributes (hot attributes), Type 2 SCD approach creates too many rows and too much redundant data. Use segmentation instead.

10 Essential Rules for Dimensional Modeling
• Use atomic facts
• Create single-process fact tables
• Include a date dimension for each fact table
• Enforce consistent grain
• Disallow null keys in fact tables
• Honor hierarchies
• Decode dimension tables
• Use surrogate keys
• Conform dimensions
• Balance requirements with actual data

Other Data Warehouse Advances
• Columnar databases
– Issue of Big Data (huge volume, often unstructured)
– Columnar databases optimize storage for summary data of few columns (different need than OLTP)
– Data compression
– Sybase, Vertica, Infobright
• NoSQL
– “Not only SQL”
– Deals with unstructured data
– MongoDB, CouchDB, Apache Cassandra

The User Interface: Metadata (data catalog)
• Identify subjects of the data mart
• Identify dimensions and facts
• Indicate how data is derived from enterprise data warehouses, including derivation rules
• Indicate how data is derived from operational data store, including derivation rules
• Identify available reports and predefined queries
• Identify data analysis techniques (e.g. drill-down)
• Identify responsible people

Online Analytical Processing (OLAP) Tools
• The use of a set of graphical tools that provides users with multidimensional views of their data and allows them to analyze the data using simple windowing techniques
• Relational OLAP (ROLAP)
– Traditional relational representation
• Multidimensional OLAP (MOLAP)
– Cube structure
• OLAP Operations
– Cube slicing – come up with 2-D view of data
– Drill-down – going from summary to more detailed views
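
The two OLAP operations can be illustrated on a toy cube held in a Python dictionary. The dimensions (product, region, month), the values, and the function names are hypothetical; real OLAP tools use dedicated storage and query engines rather than a dictionary.

```python
from collections import defaultdict

# Each cube cell: (product, region, month) -> units sold.
cube = {
    ("Shoes", "West", "Jan"): 10,
    ("Shoes", "East", "Jan"): 4,
    ("Shoes", "West", "Feb"): 7,
    ("Hats", "West", "Jan"): 3,
}

def slice_cube(cube, product):
    """Slicing: fix one dimension value, leaving a 2-D (region, month) view."""
    return {(r, m): v for (p, r, m), v in cube.items() if p == product}

def summary_by_product(cube):
    """Summary report: total units per product."""
    totals = defaultdict(int)
    for (p, _, _), v in cube.items():
        totals[p] += v
    return dict(totals)

def drill_down(cube, product):
    """Drill-down: break one summary figure into per-region detail."""
    detail = defaultdict(int)
    for (p, r, _), v in cube.items():
        if p == product:
            detail[r] += v
    return dict(detail)
```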

Multidimensional Analysis

• Databases contain information in a series of two-dimensional tables
• In a data warehouse and data mart, information is multidimensional: it contains layers of columns and rows
– Dimension – a particular attribute of information

Figure 9-21 Slicing a data cube [figure labels: CUSTOMER, REGION]

Multidimensional Analysis
• Cube – common term for the representation of multidimensional information

Figure 9-22: Example of drill-down

Summary report

Drill-down with color added

Starting with summary data, users can obtain details for particular cells

Business Performance Mgmt (BPM)

Figure 9-25 Sample Dashboard

BPM systems allow managers to measure, monitor, and manage key activities and processes to achieve organizational goals. Dashboards are often used to provide an information system in support of BPM.

Charts like these are examples of data visualization, the representation of data in graphical and multimedia formats for human analysis.

OLAP and its Applications

• What software and function enable you to create OLAP applications?
• Answer: Excel with PivotTable

Multidimensional Analysis
• Data mining – the process of analyzing data to extract information not offered by the raw data alone
• To perform data mining, users need data-mining tools
– Data-mining tool – uses a variety of techniques to find patterns and relationships in large volumes of information and infers rules that predict future behavior and guide decision making
• An example
– Grocery store in the UK (see next slide)

CRM and Data Mining (BI) Example
• A grocery store in the U.K. found the following “patterns”:
• Every Thursday afternoon
• Young fathers (why?) shopping at the store
• The following two items are always included in their shopping list
– Diapers
– Beer
• What other decisions should be made as a store manager (in terms of store layout)?
• Short term vs. long term
– This is an example of cross-selling
– Other types of promotion: up-sell, bundled-sell
• IT (e.g., BI) helps to find valuable information; decision makers then make timely and correct decisions that improve or create competitive advantages.


More on OLTP vs. OLAP

Fig. Extra-a: A simple database with a relation between two tables.

• The figure depicts a relational database environment with two tables.

• The first table contains information about pet owners; the second, information about pets. The tables are related by the single column they have in common: Owner_ID.

• By relating tables to one another, we can reduce ____________ of data and improve database performance.

• The process of breaking tables apart and thereby reducing data redundancy is called _______________.

Answers: redundancy; normalization

OLTP vs. OLAP (cont.)

• Most relational databases which are designed to handle a high number of reads and writes (updates and retrievals of information) are referred to as ________ (OnLine Transaction Processing) systems.

• OLTP systems are very efficient for high volume activities such as cashiering, where many items are being recorded via bar code scanners in a very short period of time.

• However, using OLTP databases for analysis is generally not very efficient, because in order to retrieve data from multiple tables at the same time, a query containing ________ must be used.

Answers: OLTP; joins

OLTP vs. OLAP (cont.)

• In order to keep our transactional databases running quickly and smoothly, we may wish to create a data warehouse. A data warehouse is a type of large database (including both current and historical data) that has been _____________ and archived.

• Denormalization is the process of intentionally combining some tables into a single table in spite of the fact that this may introduce duplicate data in some columns.

• The figure depicts what our simple example data might look like if it were in a data warehouse. When we design databases in this way, we reduce the number of joins necessary to query related data, thereby speeding up the process of analyzing our data.

• Databases designed in this manner are called __________ (OnLine Analytical Processing) systems.

Fig. Extra-b: A combination of the tables into a single dataset.

Answers: denormalized; OLAP
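
Denormalizing the pet-owner example amounts to joining the two tables once, ahead of time, into a single dataset. In the sketch below only the shared column Owner_ID comes from the slides; the other column names and all values are hypothetical.

```python
owners = [
    {"Owner_ID": 1, "Owner_Name": "Pat"},
    {"Owner_ID": 2, "Owner_Name": "Sam"},
]
pets = [
    {"Pet_ID": 10, "Pet_Name": "Rex", "Owner_ID": 1},
    {"Pet_ID": 11, "Pet_Name": "Tom", "Owner_ID": 1},
    {"Pet_ID": 12, "Pet_Name": "Ivy", "Owner_ID": 2},
]

def denormalize(owners, pets):
    """Join pets to owners once, producing one wide row per pet."""
    by_id = {o["Owner_ID"]: o for o in owners}
    return [{**pet, **by_id[pet["Owner_ID"]]} for pet in pets]

flat = denormalize(owners, pets)
```

The owner name "Pat" now appears on two rows: that is the duplicate data the slide mentions, accepted in exchange for analysis queries that need no join.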

OLTP vs. OLAP (cont.)

• Transactional systems and analytical systems have conflicting purposes when it comes to database speed and performance. For this reason, it is difficult to design a single system which will serve both purposes. This is why data warehouses generally contain archived data. Archived data are data that have been copied out of a transactional database.

• Denormalization typically takes place at the time data are copied out of the transactional system. It is important to keep in mind that if a copy of the data is made in the data warehouse, the data may become out-of-______. This happens when a copy is made in the data warehouse and a change to the original record is later made in the source database.

• Data mining activities performed on out-of-synch records may be useless, or worse, misleading.

• An alternative archiving method would be to move the data out of the transactional system. This ensures that data won’t get out-of-synch; however, it also makes the data unavailable should a user of the transactional system need to view or update it.

Answer: synch

Data Mining

• Knowledge discovery using a blend of statistical, AI, and computer graphics techniques
• Goals:
– Explain observed events or conditions
– Confirm hypotheses
– Explore data for new or unexpected relationships


DATA MINING

• Data-mining software includes many forms of AI such as neural networks and expert systems

Data Mining Examples
• A telephone company used a data mining tool to analyze its customer data warehouse. The data mining tool found about 10,000 supposedly residential customers that were spending over $1,000 monthly on phone bills.
• After further study, the phone company discovered that they were really small business owners trying to avoid paying business rates.


Data Mining Examples (cont.)

• 65% of customers who did not use the credit card in the last six months are 88% likely to cancel their accounts.

• If age < 30 and income <= $25,000 and credit rating < 3 and credit amount > $25,000 then the minimum loan term is 10 years.

• 82% of customers who bought a new TV 27" or larger are 90% likely to buy an entertainment center within the next 4 weeks.
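
Statements of the form "customers who bought X are N% likely to buy Y" resemble association rules, which are commonly scored by support and confidence. The sketch below shows how such percentages could be computed from transaction data; the baskets and function names are hypothetical, and the slide does not say which technique produced its figures.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    return sum(items <= set(t) for t in transactions) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Of the transactions containing the antecedent, the fraction that
    also contain the consequent."""
    base = support(transactions, antecedent)
    if base == 0:
        return 0.0
    return support(transactions, set(antecedent) | set(consequent)) / base

# Hypothetical baskets for illustration only.
baskets = [
    {"tv", "entertainment center"},
    {"tv", "entertainment center", "cables"},
    {"tv"},
    {"radio"},
]
```

Here three of four baskets contain a TV, and two of those three also contain an entertainment center, so the rule "TV implies entertainment center" has a confidence of two thirds.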

Sustainable Competitive Advantages

• Any sustainable competitive advantages?
• How can an organization sustain its competitive advantage?
• Firms may create/improve their competitive advantages only if they:
– have the capacity to learn,
– employ a revenue management approach,
– keep learning to learn and learning to change (life-long learning environment)

BUSINESS INTELLIGENCE

• Business intelligence – information that people use to support their decision-making efforts
• Principal BI enablers include:
– Technology
– People
– Culture

Working Smarter, Not Harder
• Overlapping Human/Organizational (Culture, Process)/Technological factors in BI/KM:

[Diagram labels: PEOPLE, TECHNOLOGY, ORGANIZATIONAL PROCESSES, Knowledge]

Essential Value Propositions for a Successful Company

• Business Model
• Core Competency
– Set corporate goals and get executive sponsorship for the initiative
• Execution

Relationship between the Organizational Knowledge and Core Competency

[Diagram labels: Organizational knowledge; Core competency; IT, People, Culture; Best Practices; A specific business context; Can be transferred and reused efficiently and effectively across functional areas (sharing and collaboration)]

BI: Big Data And Data Warehousing

• Two paradigms in BI:
– _____ __________ and ___ _____.
– Both compete with each other in turning data into actionable information.
• However, in recent years, the variety and complexity of data have made the data warehouse incapable of keeping up with changing needs.
• Big Data
– A new paradigm that the world of IT was forced to develop, not because of the _______ of the structured data but because of the ______ and the _______.

Answers: Data Warehouse, Big Data; volume; variety, velocity

Introduction to Big Data Analytics
• Big Data?
– Not just big!
– V______
– V______
– V______
– structured, unstructured, or in a stream
• Two aspects for studying “Big Data”
– _______ and __________/analyzing “Big Data”
• Push ____________ to the data instead of pushing data to a computing node.

Answers: Volume, Velocity, Variety; storing, processing; computation
