
134294817 Data Warehousing

Date post: 03-Apr-2018
Upload: pattabhikv

Transcript
  • 7/28/2019 134294817 Data Warehousing

    1/50


    17 July 2013 2

    Contents

    Data Warehouse Concepts

    Data Warehouse Architectures

    Data Modeling Approaches

    Data Modeling Development Cycle


    Data Warehouse Concepts


    Data Warehouse Concepts Agenda

    A. What is a Data Warehouse (DW)?

    B. What are the components of a DW?

    C. What are the various architectures/formats of a DW?

    D. Examples of Data Warehousing tools in use


    Need for Data Warehousing: Business View

    Customer Centricity

        Single view of each customer and his/her activities

        Integrated information from heterogeneous sources

    Adaptability to rapidly changing business needs

        Multiple ways to view business performance

        Low cycle time, faster analytics

    Increased Global Competition

        Crunch more and more data, faster and faster

    Mergers and Acquisitions

        With each acquisition comes another set of disparate IT systems affecting consistency and performance


    OLTP vs. DSS: A Comparison

    OLTP Environment (get data IN)               | DSS Environment (get information OUT)
    ---------------------------------------------|-----------------------------------------
    large volumes of simple transaction queries  | small number of diverse queries
    continuous data changes                      | periodic updates only
    low processing time                          | high processing time
    mode of processing                           | mode of discovery
    transaction details                          | subject oriented - summaries
    data inconsistency                           | data consistency
    mostly current data                          | historical data is relevant
    high concurrent usage                        | low concurrent usage
    highly normalized data structure             | fewer tables, but more columns per table
    static applications                          | dynamic (ad-hoc) applications
    automates routines                           | facilitates creativity
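The two query styles can be contrasted with a small sketch using Python's built-in `sqlite3` module. The `orders` table, its columns, and the sample rows are hypothetical and chosen only for illustration; a real OLTP system and a real DSS would normally run on separate databases.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, region TEXT, "
    "order_date TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO orders (region, order_date, amount) VALUES (?, ?, ?)",
    [
        ("East", "1997-01-15", 120.0),
        ("East", "1997-02-03", 80.0),
        ("West", "1997-01-20", 200.0),
    ],
)

# OLTP style: a short transaction that touches one current record.
conn.execute("UPDATE orders SET amount = 130.0 WHERE order_id = 1")
one_order = conn.execute(
    "SELECT amount FROM orders WHERE order_id = 1"
).fetchone()

# DSS style: a read-only query that summarizes many records by subject.
sales_by_region = conn.execute(
    "SELECT region, SUM(amount) FROM orders GROUP BY region ORDER BY region"
).fetchall()
```

The first query reads or changes a single row by key, while the second scans the whole table to produce a summary, which is the kind of workload a warehouse is designed for.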


    Data Warehouse Defined

    Data Warehouse is a

    Subject-Oriented

    Integrated

    Time-Variant

    Non-volatile

    collection of data enabling management decision making


    Subject Orientation

    [Figure: Transactional storage is process oriented (an Entry record holding Sales Rep, Quantity Sold, Part Number, Date, Customer Name, Product Description, Unit Price, Mail Address); data warehouse storage is subject oriented (Sales, Customers, Products).]


    Data Volatility

    [Figure: Transactional storage is volatile, with record-by-record data manipulation (Insert, Change, Delete, Access); data warehouse storage is non-volatile, with mass load and access of data (Load, Access).]


    Time Variance

    [Figure: Transactional storage holds current data; data warehouse storage holds historical data. Example chart: Sales (in lakhs) by region (East, West, North) for January, February, and March, Year 97, 1st quarter.]
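As a rough illustration of time variance and non-volatility, the sketch below keeps an operational table that is overwritten in place next to a warehouse table that only accumulates dated rows. The table names, columns, and figures are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Operational table: one row per region, overwritten in place (current data only).
conn.execute("CREATE TABLE current_sales (region TEXT PRIMARY KEY, sales REAL)")
# Warehouse table: one row per region per month, appended and never updated.
conn.execute(
    "CREATE TABLE sales_history (region TEXT, month TEXT, sales REAL, "
    "PRIMARY KEY (region, month))"
)

monthly_feed = [
    ("1997-01", "East", 10.0),
    ("1997-02", "East", 15.0),
    ("1997-03", "East", 12.0),
]
for month, region, sales in monthly_feed:
    conn.execute(
        "INSERT OR REPLACE INTO current_sales (region, sales) VALUES (?, ?)",
        (region, sales),
    )
    conn.execute(
        "INSERT INTO sales_history (region, month, sales) VALUES (?, ?, ?)",
        (region, month, sales),
    )

current_rows = conn.execute("SELECT region, sales FROM current_sales").fetchall()
history_rows = conn.execute(
    "SELECT month, sales FROM sales_history WHERE region = 'East' ORDER BY month"
).fetchall()
```

After three monthly feeds the operational table still holds one row per region, while the history table holds one row per region per month and can answer trend questions.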


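    Data Warehouse Components

    [Figure: Legacy systems and source files (FS1, FS2, ..., FSn) feed Extraction and Transmission over the network into a Staging Area (Cleansing, Transformation); data then moves to the ODS and DW (Aggregation, Summarization) and, through Data Mart Population, to data marts (DM1, DM2, ..., DMn) that support OLAP Analysis and Knowledge Discovery. A Metadata Layer spans the components.]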


    Data Warehouse Build Lifecycle

    Data extraction

    Data Cleansing and Transformation

    Data Load and refresh

    Build derived data and views

    Service queries

    Administer the warehouse
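The lifecycle stages above can be sketched as a tiny pipeline. This is only a minimal illustration under stated assumptions: the source rows, the `sales` table, the `sales_by_customer` view, and all function names are hypothetical, and an in-memory SQLite database stands in for the warehouse.

```python
import sqlite3

# Hypothetical raw records as they might arrive from a source system.
source_rows = [
    {"customer": " alice ", "amount": "100.50", "date": "2013-07-01"},
    {"customer": "BOB", "amount": "n/a", "date": "2013-07-01"},
    {"customer": "Alice", "amount": "20", "date": "2013-07-02"},
]

def extract():
    """Data extraction: read rows from the source."""
    return list(source_rows)

def cleanse_and_transform(rows):
    """Cleansing and transformation: normalize names, drop unparseable amounts."""
    clean = []
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # reject rows whose amount is not numeric
        clean.append((row["customer"].strip().title(), amount, row["date"]))
    return clean

def load(conn, rows):
    """Load and refresh: append cleansed rows to the warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales (customer TEXT, amount REAL, sale_date TEXT)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)

def build_derived(conn):
    """Derived data and views: a per-customer summary view."""
    conn.execute(
        "CREATE VIEW IF NOT EXISTS sales_by_customer AS "
        "SELECT customer, SUM(amount) AS total FROM sales GROUP BY customer"
    )

conn = sqlite3.connect(":memory:")
load(conn, cleanse_and_transform(extract()))
build_derived(conn)
# Service queries: users read from the derived view.
totals = conn.execute("SELECT customer, total FROM sales_by_customer").fetchall()
```

Administration (scheduling, monitoring, backups) is outside the scope of this sketch.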


    Data Warehouse Architectures

    Virtual Data Warehouse

    Enterprise Data Warehouse

    Distributed Data Marts

    Multi-tiered warehouse


    Virtual Data Warehouse

    [Figure: Users access operational systems data (Legacy, Client/Server, OLTP Application, External) directly through a reporting tool.]


    Enterprise Data Warehouse

    [Figure: Operational systems data (Legacy, Client/Server, OLTP, External) passes through Data Preparation (Select, Extract, Transform, Integrate, Maintain) into the Data Warehouse, which has a Metadata Repository; users reach it through an API and a reporting tool.]


    Distributed Data Marts

    [Figure: Operational systems data (Legacy, Client/Server, OLTP, External) passes through Data Preparation (Select, Extract, Transform, Integrate, Maintain) into several separate Data Marts; users reach them through an API and a reporting tool.]


    Multi-tiered Data Warehouse: Option 1

    [Figure: Operational systems (Legacy, Client/Server, OLTP, External) feed an enterprise-wide Data Warehouse (Select, Extract, Transform, Integrate, Maintain) with a Metadata Repository; Data Marts are shown alongside the warehouse, and users reach the data through an API and a reporting tool.]


    Multi-tiered Data Warehouse: Option 2

    [Figure: Components shown: Operational systems data (Legacy, Client/Server, OLTP, External), Data Preparation (Select, Extract, Transform, Integrate, Maintain), three Data Marts, the Data Warehouse with its Metadata Repository, an API and a reporting tool, and users.]


    Relative Data Sizes in a Data Warehouse

    Highly Summarized Data

    Lightly Summarized Data

    Current Detail Data

    Older Detail Data

    Metadata


    Data Warehouse - Example

    Monthly sales by product for 1991-94

    Monthly sales by region for 1991-94

    Weekly sales by product/sub-product for 1991-94

    Weekly sales by region for 1991-94

    Sales detail for 1991-94

    Sales detail for 1985-90

    Metadata
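One way to picture the summary levels in this example is to derive weekly and monthly totals from the same detail rows. The sketch below is a plain-Python illustration; the detail records and product names are hypothetical.

```python
from collections import defaultdict
from datetime import date

# Hypothetical detail records: (sale date, product, amount).
sales_detail = [
    (date(1991, 1, 7), "Widget", 100.0),
    (date(1991, 1, 9), "Widget", 50.0),
    (date(1991, 1, 21), "Widget", 25.0),
    (date(1991, 2, 4), "Gadget", 70.0),
]

weekly = defaultdict(float)   # weekly sales by product
monthly = defaultdict(float)  # monthly sales by product
for sale_date, product, amount in sales_detail:
    iso_year, iso_week, _ = sale_date.isocalendar()
    weekly[(iso_year, iso_week, product)] += amount
    monthly[(sale_date.year, sale_date.month, product)] += amount
```

Each level holds fewer rows than the one beneath it, which is why summary data occupies a small share of the warehouse compared with detail data.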


    Building a Data Warehouse - Steps

    Identify key business drivers, sponsorship, risks, ROI

    Survey information needs, identify desired functionality, and define functional requirements for the initial subject area

    Architect long-term, data warehousing architecture

    Evaluate and Finalize DW tool & technology

    Conduct Proof-of-Concept


    Design target data base schema

    Build data mapping, extract, transformation, cleansing and aggregation/summarization rules

    Build initial data mart, using exact subset of enterprise data warehousing architecture, and expand to enterprise architecture over subsequent phases

    Maintain and administer data warehouse


    Representative DW Tools

    Tool Category          | Products
    -----------------------|---------
    ETL Tools              | ETI Extract, Informatica, IBM Visual Warehouse, Oracle Warehouse Builder
    OLAP Server            | Oracle Express Server, Hyperion Essbase, IBM DB2 OLAP Server, Microsoft SQL Server OLAP Services, Seagate HOLOS, SAS/MDDB
    OLAP Tools             | Oracle Express Suite, Business Objects, WebIntelligence, SAS, Cognos PowerPlay/Impromptu, KALIDO, MicroStrategy, Brio Query, MetaCube
    Data Warehouse         | Oracle, Informix, Teradata, DB2/UDB, Sybase, Microsoft SQL Server, Red Brick
    Data Mining & Analysis | SAS Enterprise Miner, IBM Intelligent Miner, SPSS/Clementine, TCS Tools


    Top-Down Approach

    Using the top-down approach, you can discover and draft a description of the business process. That description supplies you with concepts that will be used as a starting place. This is a functional or process-driven analysis. You need to, as the name implies, start at the top and drill downward, increasing the level of detail in an iterative fashion. This typically needs more time for development.

    Without it you may miss the following:

    Assumptions everyone expects you to know

    Future developments that could change your direction

    Opportunities to increase the quality, usability, and accessibility of enterprise data


    With it you gain the following:

    An understanding of the way things fit together, from high to low levels of detail

    A sense of the political environment that may surround the data

    An enhancement of your understanding of data importance

    The guide to the level of detail you need for different audiences


    In top-down analysis, people are the best source of your information.

    The top-down implementation can also imply more of a need for an enterprise-wide or corporate-wide data warehouse with a higher degree of cross-workgroup, department, or line-of-business access to the data.

    A top-down implementation can result in more consistent data definitions and the enforcement of business rules across the organization, from the beginning. However, the cost of the initial planning and design can be significant. It is a time-consuming process and can delay actual implementation, benefits, and return on investment.


    Bottom-Up Approach

    The bottom-up approach focuses instead on the inventory of things in a process. It implies an in-depth understanding of as much of the process as can be known at this point. Using this approach you discover and draft a list of potential elements without regard to how they're used. The list usually consists of a mixed set of very low-level, detailed notions and high-level concepts. The trick is to aggregate them to the same level of detail. This is a data-driven analysis. You concentrate on what things are. You concentrate on the parts rather than the process. You need to, as the name implies, start at the bottom and aggregate up while increasing your level of aggregation, again in an iterative fashion.

    Without it you may miss a real-world understanding of the data and how it fits together, as well as the following:

    Data areas that everyone expects you to know


    Relationships

    Fuzzy, currently undefined areas that need extra work to bring them to the same level of understanding

    With it you gain the following:

    An understanding of the things involved

    A sense of the quality levels that may be inherent to the data

    An enhancement of your understanding of data definitions


    In bottom-up analysis, the current environment is the best source of your information. The bottom-up implementation approach has become the choice of many organizations, especially business management, because of the faster payback. It enables faster results because data marts have a less complex design than a global data warehouse. In addition, the initial implementation is usually less expensive in terms of hardware and other resources than deploying the global data warehouse. Typically, the bottom-up approach is confined to a limited set of requirements and focuses on a short-term solution that delivers the reporting needs quickly.

    Data Modeling Development Cycle


    Conceptual Data Modeling

    This data model includes all major things that need to be tracked, along with constraints. Usually specified in terms of business requirements, forms, reports, etc.

    Logical Data Modeling

    This is the actual implementation of a conceptual model in a logical data model. Usually expressed in terms of entities, attributes, relationships, and keys.

    Physical Data Modeling

    This is a complete model that includes all required tables, columns, relationships, database properties, and referential integrity constraints for the physical implementation.

    Database Creation

    DBAs instruct the data modeling tool to create SQL code from the physical data model. The SQL code is then executed on the server to create databases.


    Development Cycle - Conceptual Data Modeling

    CDM is the first step in constructing a data model in the top-down approach and is a clear and accurate visual representation of the business of an organization. In many ways, it represents the user's view of the business.

    A Conceptual Data Model (CDM) visualizes the user's view of the business and provides high-level information about the subject areas of an organization.

    CDM discussion starts with the main subject area of an organization. It relies on specs, reports, forms, views, requirements, application demos, and user interactions to form a conceptual view of the business.


    Development Cycle - Physical Data Modeling

    Physical Data Models are used to design the internal schema of a database, depicting the data tables (derived from the logical data entities), the data columns of those tables (derived from the entity attributes), and the relationships between the tables (derived from the entity relationships).

    Database performance, indexing strategy, physical storage, and denormalization are important parameters of a physical model.

    The transformations from logical model to physical model include imposing database rules, implementing referential integrity, and resolving super types and sub types.

    Once the physical data model is completed, it is forwarded to the technical teams (developer, group lead, DBA) for review.


    Development Cycle - CDM, LDM, PDM Comparisons

    Conceptual Data Model | Logical Data Model | Physical Data Model
    ----------------------|--------------------|--------------------
    Provides high-level information about the subject areas and user's view of an organization. | Represents business information and defines business rules. | Represents the physical implementation of the model in a database.
    Subject Areas | Entity | Table
    Things to track | Attribute | Column
    No Keys identified | Primary Key | Primary Key Constraint
    No Keys identified | Alternate Key | Unique Constraint or Unique Index
    No Rules or constraints | Rule, Functional Dependencies | Check Constraint, Default Value, User Defined constraints, referential constraints
    Relationship | Relationship | Foreign Key
    No Definition or comment | Definition | Comment
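The logical-to-physical mapping in the comparison can be illustrated with a short sketch that declares each physical construct and checks that the database enforces it. The `department` and `employee` entities, their columns, and the `violates` helper are hypothetical; SQLite (via Python's `sqlite3`) is used only because it needs no setup.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript(
    """
    CREATE TABLE department (                      -- entity -> table
        department_id INTEGER PRIMARY KEY,         -- primary key -> primary key constraint
        name TEXT NOT NULL UNIQUE                  -- alternate key -> unique constraint
    );
    CREATE TABLE employee (
        employee_id INTEGER PRIMARY KEY,
        email TEXT NOT NULL UNIQUE,
        salary REAL NOT NULL DEFAULT 0 CHECK (salary >= 0),  -- rule -> check and default
        department_id INTEGER NOT NULL
            REFERENCES department (department_id)  -- relationship -> foreign key
    );
    """
)
conn.execute("INSERT INTO department (department_id, name) VALUES (1, 'Finance')")
conn.execute("INSERT INTO employee (email, department_id) VALUES ('a@example.com', 1)")

def violates(sql):
    """Return True if the statement is rejected by a constraint."""
    try:
        conn.execute(sql)
    except sqlite3.IntegrityError:
        return True
    return False

negative_salary_rejected = violates(
    "INSERT INTO employee (email, salary, department_id) VALUES ('b@example.com', -1, 1)"
)
unknown_department_rejected = violates(
    "INSERT INTO employee (email, department_id) VALUES ('c@example.com', 99)"
)
default_salary = conn.execute(
    "SELECT salary FROM employee WHERE email = 'a@example.com'"
).fetchone()[0]
```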


    Development Cycle - Database Creation/ Development

    A physical database definition (say, DDL for DB2, or a schema for Sybase or Oracle) can be generated by entering the gathered information into a physical design tool.

    This must be reviewed carefully and in all likelihood modified to some degree, since no physical design tool generates 100 percent perfect database definitions.

    The script can then be run against the database management system to define the physical environment.
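A toy version of this generate, review, and run sequence might look like the following. It is not how any particular design tool works: the `physical_model` dictionary, the `generate_ddl` function, and the table definitions are all hypothetical, and SQLite stands in for the target database management system.

```python
import sqlite3

# Hypothetical physical model description: table -> list of (column, type, constraint).
physical_model = {
    "customer": [
        ("customer_id", "INTEGER", "PRIMARY KEY"),
        ("customer_name", "TEXT", "NOT NULL"),
    ],
    "sale": [
        ("sale_id", "INTEGER", "PRIMARY KEY"),
        ("customer_id", "INTEGER", "REFERENCES customer (customer_id)"),
        ("amount", "REAL", "NOT NULL"),
    ],
}

def generate_ddl(model):
    """Render one CREATE TABLE statement per table in the model."""
    statements = []
    for table, columns in model.items():
        column_sql = ",\n    ".join(" ".join(col) for col in columns)
        statements.append(f"CREATE TABLE {table} (\n    {column_sql}\n);")
    return "\n".join(statements)

ddl_script = generate_ddl(physical_model)
# In practice the generated script would be reviewed and edited before this step.
conn = sqlite3.connect(":memory:")
conn.executescript(ddl_script)
created_tables = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
]
```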


    Data Modeling for a Data Warehouse

    The following data modeling techniques are commonly used:

    Dimensional Modeling:

        a) Star Schema (denormalized data)

        b) Snow Flake Schema (partially normalized data)

    ER Modeling or Relational Modeling (normalized data: 1NF, 2NF, 3NF)

    Pros and Cons of each Technique.

    Star Schema


    Fact Table

    This table is the core of the Star Schema structure and contains the Facts or Measures available through the Data Warehouse.

    These Facts answer the questions of What, How Much, or How Many.

    Some Examples:

    Sales Dollars, Units Sold, Gross Profit, Expense Amount, Net Income, Unit Cost, Number of Employees, Turnover, Salary, Tenure, etc.


    Dimension Tables

    These tables describe the Facts or Measures. These tables contain the Attributes and may also be Hierarchical.

    These Dimensions answer the questions of Who, What, When, or Where.

    Some Examples:

    Day, Week, Month, Quarter, Year

    Sales Person, Sales Manager, VP of Sales

    Product, Product Category, Product Line

    Cost Center, Unit, Segment, Business, Company


    [Figure: star schema with a central fact table and five dimension tables]

    Sales_Fact: TimeKey, EmployeeKey, ProductKey, CustomerKey, ShipperKey, Required Data (Business Metrics or Measures), ...

    Time_Dim: TimeKey, TheDate, ...

    Employee_Dim: EmployeeKey, EmployeeID, ...

    Product_Dim: ProductKey, ProductID, ...

    Customer_Dim: CustomerKey, CustomerID, ...

    Shipper_Dim: ShipperKey, ShipperID, ...


    Particular form of a dimensional model

    Central fact table containing Measures

    Surrounded by one perimeter of descriptors - Dimensions
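A minimal star schema can be sketched with Python's `sqlite3` module. The table names follow the Sales_Fact and dimension naming used in the slides, but only two dimensions are included, and the `month` and `category` attributes, the measures, and the sample rows are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE time_dim (time_key INTEGER PRIMARY KEY, the_date TEXT, month TEXT);
    CREATE TABLE product_dim (product_key INTEGER PRIMARY KEY, product_id TEXT, category TEXT);
    CREATE TABLE sales_fact (
        time_key INTEGER REFERENCES time_dim (time_key),
        product_key INTEGER REFERENCES product_dim (product_key),
        units_sold INTEGER,
        sales_dollars REAL
    );
    INSERT INTO time_dim VALUES (1, '2013-07-01', '2013-07'), (2, '2013-08-01', '2013-08');
    INSERT INTO product_dim VALUES (10, 'P-10', 'Tools'), (11, 'P-11', 'Toys');
    INSERT INTO sales_fact VALUES (1, 10, 2, 40.0), (1, 11, 1, 15.0), (2, 10, 3, 60.0);
    """
)

# Measures come from the fact table; the dimensions supply the grouping attributes.
sales_by_month_and_category = conn.execute(
    """
    SELECT t.month, p.category, SUM(f.sales_dollars)
    FROM sales_fact AS f
    JOIN time_dim AS t ON t.time_key = f.time_key
    JOIN product_dim AS p ON p.product_key = f.product_key
    GROUP BY t.month, p.category
    ORDER BY t.month, p.category
    """
).fetchall()
```

Every dimension is exactly one join away from the fact table, which is what gives the schema its star shape.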


    Snow Flake Schema

    Complex dimensions are re-normalized

    Different levels or hierarchies of a dimension are kept separate

    A given dimension has relationships to other levels of the same dimension
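To make the idea concrete, the sketch below splits a product dimension into one table per hierarchy level (product, category, line). All table names, columns, and rows are hypothetical; the hierarchy levels echo the Product, Product Category, Product Line example given for dimensions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Each hierarchy level of the product dimension gets its own table.
    CREATE TABLE product_line_dim (line_key INTEGER PRIMARY KEY, line_name TEXT);
    CREATE TABLE product_category_dim (
        category_key INTEGER PRIMARY KEY,
        category_name TEXT,
        line_key INTEGER REFERENCES product_line_dim (line_key)
    );
    CREATE TABLE product_dim (
        product_key INTEGER PRIMARY KEY,
        product_name TEXT,
        category_key INTEGER REFERENCES product_category_dim (category_key)
    );
    CREATE TABLE sales_fact (
        product_key INTEGER REFERENCES product_dim (product_key),
        units_sold INTEGER
    );

    INSERT INTO product_line_dim VALUES (1, 'Hardware');
    INSERT INTO product_category_dim VALUES (1, 'Hand Tools', 1), (2, 'Power Tools', 1);
    INSERT INTO product_dim VALUES (1, 'Hammer', 1), (2, 'Drill', 2);
    INSERT INTO sales_fact VALUES (1, 5), (2, 3), (1, 2);
    """
)

# Rolling up to the product line now takes a chain of joins through the levels.
units_by_line = conn.execute(
    """
    SELECT l.line_name, SUM(f.units_sold)
    FROM sales_fact AS f
    JOIN product_dim AS p ON p.product_key = f.product_key
    JOIN product_category_dim AS c ON c.category_key = p.category_key
    JOIN product_line_dim AS l ON l.line_key = c.line_key
    GROUP BY l.line_name
    """
).fetchall()
```

Compared with a star schema, the dimension stores each category and line name once, at the cost of extra joins at query time.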


    Modeling - ER Model

    In ER modeling, naming entities is important for easy and clear understanding and communication. Usually, the entity name is expressed grammatically in the form of a noun rather than a verb. The criterion for selecting an entity name is how well the name represents the characteristics and scope of the entity. In the detailed ER model, defining a unique identifier of an entity is the most critical task. These unique identifiers are called candidate keys. From them we can select the key that is most commonly used to identify the entity. It is called the primary key.

    Another important concept in ER modeling is normalization. Normalization is a process for assigning attributes to entities in a way that reduces data redundancy, avoids data anomalies, provides a solid architecture for updating data, and reinforces the long-term integrity of the data model. The third normal form is usually adequate. A process for resolving many-to-many relationships is an example of normalization.
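Resolving a many-to-many relationship usually means introducing an associative entity whose primary key combines the keys of the two related entities. The sketch below assumes hypothetical `student`, `course`, and `enrollment` entities, none of which appear in the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE student (student_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE course (course_id INTEGER PRIMARY KEY, title TEXT NOT NULL);
    -- Associative entity: one row per (student, course) pair.
    CREATE TABLE enrollment (
        student_id INTEGER NOT NULL REFERENCES student (student_id),
        course_id INTEGER NOT NULL REFERENCES course (course_id),
        PRIMARY KEY (student_id, course_id)
    );
    INSERT INTO student VALUES (1, 'Asha'), (2, 'Ravi');
    INSERT INTO course VALUES (10, 'Databases'), (20, 'Statistics');
    INSERT INTO enrollment VALUES (1, 10), (1, 20), (2, 10);
    """
)

courses_for_asha = [
    row[0]
    for row in conn.execute(
        """
        SELECT c.title
        FROM enrollment AS e
        JOIN course AS c ON c.course_id = e.course_id
        WHERE e.student_id = 1
        ORDER BY c.title
        """
    )
]
```

The many-to-many relationship becomes two one-to-many relationships, and the composite primary key prevents the same pairing from being stored twice.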

    Modeling - Example of ER Model

    [Figure: example of an ER model]