
M05740010120124040M0574-Pert 15-16 Data warehouse

Date post: 04-Apr-2018
Upload: iemamz

Transcript
  • 7/29/2019 M05740010120124040M0574-Pert 15-16 Data warehouse

    1/38

    Business Intelligence and Decision Support Systems

    Session 15-16:

    Data Warehousing

    Course: M0574 Decision Support Systems

    Year: September 2012


    Copyright 2011 Pearson Education, Inc. Publishing as Prentice Hall8-2

    Learning Objectives

    Understand the basic definitions and concepts of data warehouses

    Learn different types of data warehousing architectures; their comparative advantages and disadvantages

    Describe the processes used in developing and managing data warehouses

    Explain data warehousing operations

    Explain the role of data warehouses in decision support


    Learning Objectives

    Explain data integration and the extraction, transformation, and load (ETL) processes

    Describe real-time (a.k.a. right-time and/or active) data warehousing

    Understand data warehouse administration and security issues


    Opening Vignette:

    DirecTV Thrives with Active Data Warehousing

    Company background

    Problem description

    Proposed solution

    Results

    Answer & discuss the case questions.


    Main Data Warehousing (DW) Topics

    DW definitions

    Characteristics of DW

    Data Marts

    ODS, EDW, Metadata

    DW Framework

    DW Architecture & ETL Process

    DW Development

    DW Issues


    Data Warehouse Defined

    A physical repository where relational data are specially organized to provide enterprise-wide, cleansed data in a standardized format

    The data warehouse is a collection of integrated, subject-oriented databases designed to support DSS functions, where each unit of data is non-volatile and relevant to some moment in time


    Characteristics of DW

    Subject oriented

    Integrated

    Time-variant (time series)

    Nonvolatile

    Summarized

    Not normalized

    Metadata

    Web based, relational/multi-dimensional

    Client/server

    Real-time and/or right-time (active)


    Data Mart

    A departmental data warehouse that stores only relevant data

    Dependent data mart

    A subset that is created directly from a data warehouse

    Independent data mart

    A small data warehouse designed for a strategic business unit or a department
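
As a rough illustration of the dependent data mart idea, the sketch below builds a tiny warehouse table in an in-memory SQLite database and derives a marketing data mart directly from it. The table names, columns, and rows are hypothetical and do not come from the slides.

```python
import sqlite3

# Hypothetical warehouse table; names and rows are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE warehouse_sales (dept TEXT, region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO warehouse_sales VALUES (?, ?, ?)",
    [
        ("marketing", "east", 120.0),
        ("marketing", "west", 80.0),
        ("finance", "east", 300.0),
    ],
)

# Dependent data mart: a subset created directly from the warehouse,
# holding only the rows relevant to one department.
conn.execute(
    "CREATE TABLE marketing_mart AS "
    "SELECT region, amount FROM warehouse_sales WHERE dept = 'marketing'"
)

mart_rows = conn.execute(
    "SELECT region, amount FROM marketing_mart ORDER BY region"
).fetchall()
print(mart_rows)
```

An independent data mart, by contrast, would be loaded from the source systems by its own process, with no warehouse table behind it.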


    Data Warehousing Definitions

    Operational data stores (ODS)

    A type of database often used as an interim area for a data warehouse

    Oper marts

    An operational data mart.

    Enterprise data warehouse (EDW)

    A data warehouse for the enterprise.

    Metadata

    Data about data. In a data warehouse, metadata describe the contents of a data warehouse and the manner of its acquisition and use


    A Conceptual Framework for DW

    [Figure: conceptual framework for DW. Data sources (ERP, legacy, POS, other OLTP/Web, external data) feed the ETL process (select, extract, transform, integrate, load), which loads the enterprise data warehouse and its metadata. Replication populates the data marts (engineering, marketing, finance, ...); a "no data marts" option is also shown. Access goes through API/middleware to the applications (visualization): routine business reporting, data/text mining, OLAP, dashboard, Web, and custom-built applications.]


    Generic DW Architectures

    Three-tier architecture

    1. Data acquisition software (back-end)

    2. The data warehouse that contains the data & software

    3. Client (front-end) software that allows users to access and analyze data from the warehouse

    Two-tier architecture

    The first two tiers of the three-tier architecture are combined into one

    Sometimes there is only one tier?


    Generic DW Architectures

    Three-tier architecture:

    Tier 1: Client workstation

    Tier 2: Application server

    Tier 3: Database server

    Two-tier architecture:

    Tier 1: Client workstation

    Tier 2: Application & database server


    DW Architecture Considerations

    Issues to consider when deciding which architecture to use:

    Which database management system (DBMS) should be used?

    Will parallel processing and/or partitioning be used?

    Will data migration tools be used to load the data warehouse?

    What tools will be used to support data retrieval and analysis?


    A Web-based DW Architecture

    [Figure: Web-based DW architecture. Components shown: client (Web browser), Internet/intranet/extranet, Web server, Web pages, application server, data warehouse.]


    Alternative DW Architectures

    (a) Independent Data Marts Architecture: source systems -> staging area (ETL) -> independent data marts (atomic/summarized data) -> end-user access and applications

    (b) Data Mart Bus Architecture with Linked Dimensional Data Marts: source systems -> staging area (ETL) -> dimensionalized data marts linked by conformed dimensions (atomic/summarized data) -> end-user access and applications

    (c) Hub-and-Spoke Architecture (Corporate Information Factory): source systems -> staging area (ETL) -> normalized relational warehouse (atomic data) -> dependent data marts (summarized/some atomic data) -> end-user access and applications


    Alternative DW Architectures

    (d) Centralized Data Warehouse Architecture: source systems -> staging area (ETL) -> normalized relational warehouse (atomic/some summarized data) -> end-user access and applications

    (e) Federated Architecture: existing data warehouses, data marts, and legacy systems -> data mapping/metadata -> logical/physical integration of common data elements -> end-user access and applications


    Which Architecture is the Best?

    Bill Inmon versus Ralph Kimball

    Enterprise DW versus Data Marts approach

    Empirical study by Ariyachandra and Watson (2006)


    Data Warehousing Architectures

    Ten factors that potentially affect the architecture selection decision:

    1. Information interdependence between organizational units

    2. Upper management's information needs

    3. Urgency of need for a data warehouse

    4. Nature of end-user tasks

    5. Constraints on resources

    6. Strategic view of the data warehouse prior to implementation

    7. Compatibility with existing systems

    8. Perceived ability of the in-house IT staff

    9. Technical issues

    10. Social/political factors


    Enterprise Data Warehouse (by Teradata Corporation)


    Data Integration and the Extraction, Transformation, and Load (ETL) Process

    Data integration

    Integration that comprises three major processes: data access, data federation, and change capture.

    Enterprise application integration (EAI)

    A technology that provides a vehicle for pushing data from source systems into a data warehouse

    Enterprise information integration (EII)

    An evolving tool space that promises real-time data integration from a variety of sources

    Service-oriented architecture (SOA)

    A new way of integrating information systems


    Data Integration and the Extraction, Transformation, and Load (ETL) Process

    Extraction, transformation, and load (ETL) process

    [Figure: ETL process. Sources (packaged application, legacy system, other internal applications, transient data source) pass through the extract, transform, cleanse, and load steps into the data warehouse and data mart.]
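
The extract, transform, cleanse, and load steps named on this slide can be sketched in a few lines of Python. This is only a minimal sketch under stated assumptions: the source records, the `dw_sales` table, and the four function names are hypothetical, and an in-memory SQLite database stands in for the data warehouse.

```python
import sqlite3

def extract():
    # Stand-in for reading from a legacy system or packaged application.
    return [
        {"customer": " Alice ", "amount": "100.50"},
        {"customer": "BOB", "amount": "n/a"},  # amount cannot be parsed
        {"customer": "carol", "amount": "42"},
    ]

def transform(rows):
    # Standardize formats: trim and title-case names, parse amounts.
    out = []
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            amount = None
        out.append((row["customer"].strip().title(), amount))
    return out

def cleanse(rows):
    # Drop records whose amount could not be parsed.
    return [r for r in rows if r[1] is not None]

def load(conn, rows):
    # Write the cleansed rows into the (hypothetical) warehouse table.
    conn.execute("CREATE TABLE IF NOT EXISTS dw_sales (customer TEXT, amount REAL)")
    conn.executemany("INSERT INTO dw_sales VALUES (?, ?)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")
load(conn, cleanse(transform(extract())))
loaded = conn.execute(
    "SELECT customer, amount FROM dw_sales ORDER BY customer"
).fetchall()
print(loaded)
```

Keeping each step as its own function mirrors the boxes in the figure; a real ETL tool would add logging, metadata capture, and error handling around each one.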


    ETL

    Issues affecting the purchase of an ETL tool

    Data transformation tools are expensive

    Data transformation tools may have a long learning curve

    Important criteria in selecting an ETL tool

    Ability to read from and write to an unlimited number of data sources/architectures

    Automatic capturing and delivery of metadata

    A history of conforming to open standards

    An easy-to-use interface for the developer and the functional user


    Benefits of DW

    Direct benefits of a data warehouse

    Allows end users to perform extensive analysis

    Allows a consolidated view of corporate data

    Better and more timely information

    Enhanced system performance

    Simplification of data access

    Indirect benefits of a data warehouse

    Enhance business knowledge

    Present competitive advantage

    Enhance customer service and satisfaction

    Facilitate decision making

    Help in reforming business processes


    Data Warehouse Development

    Data warehouse development approaches

    Inmon Model: EDW approach (top-down)

    Kimball Model: Data mart approach (bottom-up)

    Which model is best? There is no one-size-fits-all strategy to DW

    One alternative is the hosted warehouse

    Data warehouse structure: The Star Schema vs. Relational

    Real-time data warehousing?


    DW Development Approaches

    See Table 8.3 for details

    (Inmon Approach) (Kimball Approach)


    DW Structure: Star Schema (a.k.a. Dimensional Modeling)

    [Figure: Star Schema Example for an Automobile Insurance Data Warehouse. A central Claim Information fact table is linked to the Driver, Automotive, Time, and Location dimension tables.]

    Dimensions: How data will be sliced/diced (e.g., by location, time period, type of automobile or driver)

    Facts: Central table that contains (usually summarized) information; also contains foreign keys to access each dimension table.
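
A star schema along these lines can be sketched with Python's built-in `sqlite3` module. The table names, columns, and rows below are assumptions made for illustration; only the location and time dimensions are modeled, and the driver and automotive dimensions are left out for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables: how the facts can be sliced and diced.
CREATE TABLE dim_location (location_id INTEGER PRIMARY KEY, city TEXT);
CREATE TABLE dim_time (time_id INTEGER PRIMARY KEY, year INTEGER);

-- Fact table: measures plus a foreign key to each dimension.
CREATE TABLE fact_claim (
    claim_id INTEGER PRIMARY KEY,
    location_id INTEGER REFERENCES dim_location(location_id),
    time_id INTEGER REFERENCES dim_time(time_id),
    claim_amount REAL
);

INSERT INTO dim_location VALUES (1, 'Tulsa'), (2, 'Austin');
INSERT INTO dim_time VALUES (1, 2011), (2, 2012);
INSERT INTO fact_claim VALUES
    (1, 1, 1, 500.0), (2, 1, 2, 250.0), (3, 2, 2, 1000.0);
""")

# Total claim amount by city and year: join the fact table to its dimensions.
totals = conn.execute("""
    SELECT l.city, t.year, SUM(f.claim_amount)
    FROM fact_claim AS f
    JOIN dim_location AS l ON l.location_id = f.location_id
    JOIN dim_time AS t ON t.time_id = f.time_id
    GROUP BY l.city, t.year
    ORDER BY l.city, t.year
""").fetchall()
print(totals)
```

Every analytical query follows the same shape: start from the fact table, join the dimensions of interest, and group by their attributes.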


    Dimensional Modeling

    Data cube

    A two-dimensional, three-dimensional, or higher-dimensional object in which each dimension of the data represents a measure of interest

    - Grain

    - Drill-down

    - Slicing
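
The three terms above can be illustrated on a small in-memory cube. This is a sketch with made-up records, not data from the course: each record is a hypothetical (location, year, quarter, claim amount) tuple, and `aggregate` is a helper invented for this example.

```python
from collections import defaultdict

# Hypothetical fact records: (location, year, quarter, claim_amount).
facts = [
    ("east", 2012, "Q1", 100.0),
    ("east", 2012, "Q2", 150.0),
    ("west", 2012, "Q1", 200.0),
    ("west", 2011, "Q4", 50.0),
]

def aggregate(records, key):
    """Sum claim_amount grouped by key(record)."""
    totals = defaultdict(float)
    for rec in records:
        totals[key(rec)] += rec[3]
    return dict(totals)

# Coarse grain: one cell per (location, year).
by_year = aggregate(facts, lambda r: (r[0], r[1]))

# Drill-down: move to a finer grain by adding the quarter.
by_quarter = aggregate(facts, lambda r: (r[0], r[1], r[2]))

# Slicing: fix one dimension (year == 2012) and keep the rest.
slice_2012 = aggregate([r for r in facts if r[1] == 2012], lambda r: r[0])

print(by_year)
print(by_quarter)
print(slice_2012)
```

The grain is simply the level of detail of one cell; drill-down refines it, and a slice restricts the cube to a single value of one dimension.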


    Best Practices for Implementing DW

    The project must fit with corporate strategy

    There must be complete buy-in to the project

    It is important to manage user expectations

    The data warehouse must be built incrementally

    Adaptability must be built in from the start

    The project must be managed by both IT and business professionals (a business-supplier relationship must be developed)

    Only load data that have been cleansed/high quality

    Do not overlook training requirements

    Be politically aware.


    Risks in Implementing DW

    No mission or objective

    Quality of source data unknown

    Skills not in place

    Inadequate budget

    Lack of supporting software

    Source data not understood

    Weak sponsor

    Users not computer literate

    Political problems or turf wars

    Unrealistic user expectations

    (Continued)


    Risks in Implementing DW Cont.

    Architectural and design risks

    Scope creep and changing requirements

    Vendors out of control

    Multiple platforms

    Key people leaving the project

    Loss of the sponsor

    Too much new technology

    Having to fix an operational system

    Geographically distributed environment

    Team geography and language culture


    Things to Avoid for Successful Implementation of DW

    Starting with the wrong sponsorship chain

    Setting expectations that you cannot meet

    Engaging in politically naive behavior

    Loading the warehouse with information just because it is available

    Believing that data warehousing database design is the same as transactional DB design

    Choosing a data warehouse manager who is technology oriented rather than user oriented

    (see more on page 356)


    Real-time DW (a.k.a. Active Data Warehousing)

    Enabling real-time data updates for real-time analysis and real-time decision making is growing rapidly

    Push vs. Pull (of data)

    Concerns about real-time BI

    Not all data should be updated continuously

    Mismatch of reports generated minutes apart

    May be cost prohibitive

    May also be infeasible

    Evolution of DSS & DW


    Evolution of DSS & DW



    Active Data Warehousing (by Teradata Corporation)


    Comparing Traditional and Active DW


    Data Warehouse Administration

    Due to its huge size and its intrinsic nature, a DW requires especially strong monitoring in order to sustain its efficiency, productivity and security.

    The successful administration and management of a data warehouse entails skills and proficiency that go past what is required of a traditional database administrator.

    Requires expertise in high-performance software, hardware, and networking technologies


    DW Scalability and Security

    Scalability

    The main issues pertaining to scalability:

    The amount of data in the warehouse

    How quickly the warehouse is expected to grow

    The number of concurrent users

    The complexity of user queries

    Good scalability means that queries and other data-access functions will grow linearly with the size of the warehouse

    Security

    Emphasis on security and privacy


    BI / OLAP Portal for Learning

    MicroStrategy, and much more

    www.TeradataStudentNetwork.com

    Password: [**Keyword**]

    http://www.teradatastudentnetwork.com/
