
20091029Session DW

Date post: 05-Apr-2018
Upload: santosh-kumar-macharla

Transcript
  • 7/31/2019 20091029Session DW

    1/33

    Introduction to Data Warehousing

    2009 IBM Corporation

    Robert [email protected]

    DB2 for i Center of Excellence

  • 7/31/2019 20091029Session DW

    2/33

    STG Technical Conferences 2009

    The Agenda

    Background
    Turning DATA into INFORMATION
    Architectures/Strategies to get you there
    DB2 for i Enablers

    Introduction to Data Warehousing 2009 IBM Corporation

  • 7/31/2019 20091029Session DW

    3/33

    Today's Reporting Requirements

    Remove Dependency on IT
      Ease IT backlog of reporting requests
      Reduce Report Maintenance
      Empower End Users
    Client Independence
      Web Based
      Reduced Software Maintenance
    Multiple Viewing Options
      Dashboards/Scorecards
      Spreadsheet Integration
      Board Room Quality PDF
    Automated Report Distribution
      E-mail Distribution
    Application Integration
      Reporting as a function of Line of Business apps
      Portal interfaces

  • 7/31/2019 20091029Session DW

    4/33

    What is Business Intelligence?

    [Figure: the Business Intelligence spectrum]
    Stages: REPORTING (What happened?), MONITOR (What just happened?), ANALYSIS (Why did it happen?), PREDICT (What will happen?)
    Technologies: Query/Reporting, Dashboards/Scorecards, Trending/OLAP (OnLine Analytics), Data Mining (Predictive Analytics)
    Spanning layer: Business Performance Management
    Data: Historical Data (Data Warehouses/Marts), Real-Time Data (OS/EAI), DBMS

    OS/EAI: Operation Systems/Enterprise Application Integrations

    Source: The Data Warehousing Institute, Smart Companies in the 21st Century, July 2003

  • 7/31/2019 20091029Session DW

    5/33

    Normalized OLTP Data Base

    DB2
      Customer info      -> C file
      Order header file  -> O file
      Order details      -> D file
      Item descriptions  -> I file
      Salesman info      -> S file

    Very good design: change information only in one place

  • 7/31/2019 20091029Session DW

    6/33

    Follow a transaction

    Update customer information
    Take an order
    Record a payment

    OLTP usually works with small pieces of the DB

    [Diagram: DB2 database with the C, O, D, I and S files]

  • 7/31/2019 20091029Session DW

    7/33

    But Ask A Simple Question

    Who are my best customers?
      Must go through the entire customer file

  • 7/31/2019 20091029Session DW

    8/33

    Another Question

    Who are my best Salesmen?
    Who are they selling to?
    What are they selling?
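
The questions above cut across every file in the normalized design. As a rough illustration only, the sketch below uses Python's sqlite3 module as a stand-in for DB2; the table names, column names and sample rows are all hypothetical and do not come from the slides. It shows that one "simple" question needs a five-way join plus an aggregate over all the order details.

```python
import sqlite3

# Hypothetical stand-ins for the C, O, D, I and S files; not the real layouts.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (custno INTEGER PRIMARY KEY, custname TEXT);
CREATE TABLE salesman (slsno INTEGER PRIMARY KEY, slsname TEXT);
CREATE TABLE item (itemno INTEGER PRIMARY KEY, descr TEXT, price REAL);
CREATE TABLE order_header (orderno INTEGER PRIMARY KEY, custno INTEGER, slsno INTEGER);
CREATE TABLE order_detail (orderno INTEGER, itemno INTEGER, qty INTEGER);
INSERT INTO customer VALUES (1, 'Acme'), (2, 'Globex');
INSERT INTO salesman VALUES (10, 'Lee'), (11, 'Kim');
INSERT INTO item VALUES (100, 'Umbrella', 12.0), (101, 'Hat', 8.0);
INSERT INTO order_header VALUES (500, 1, 10), (501, 2, 11), (502, 1, 11);
INSERT INTO order_detail VALUES (500, 100, 2), (501, 101, 1), (502, 100, 5), (502, 101, 3);
""")

# Who sells what to whom, ranked by revenue: every order detail row is read.
best_salesmen = conn.execute("""
SELECT s.slsname, c.custname, i.descr, SUM(d.qty * i.price) AS revenue
FROM order_detail d
JOIN order_header o ON o.orderno = d.orderno
JOIN customer c ON c.custno = o.custno
JOIN salesman s ON s.slsno = o.slsno
JOIN item i ON i.itemno = d.itemno
GROUP BY s.slsname, c.custname, i.descr
ORDER BY revenue DESC, s.slsname
""").fetchall()

for row in best_salesmen:
    print(row)
```

An OLTP transaction touches a handful of rows; this query touches all of them, which is the contrast the slides are drawing.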

  • 7/31/2019 20091029Session DW

    9/33

    Are you in Spreadsheet or I/T Purgatory?

    [Diagram: Source Systems (ERP, POS, Spreadsheets, Other Sources) feed Reports, Excel and Access files through data that is Downloaded, Rekeyed, Cut & Pasted and Uploaded. The copies disagree: 1 + 1 = 2, 1 + 1 = 3, 1 + 3 = 7, 2 + 1 = 1.5.]

  • 7/31/2019 20091029Session DW

    10/33

    The most widespread technical problem reported by practitioners was slow query performance.

    Survey of over 2000 companies that have implemented Business Intelligence applications

    The BI Survey 8, Nigel Pendse

  • 7/31/2019 20091029Session DW

    11/33

    Managing the Querying of Production Data

    Shield report authors and end users from complexities of the database
      Leverage a META DATA oriented Query Tool (ex: DB2 Web Query)
      Define data relationships, standardize/simplify data meanings
    Optimize the environment
      Ensure a PROACTIVE or REACTIVE indexing strategy is in place
        Proactive
          Read Indexing and Statistics White Paper at: http://www-03.ibm.com/servers/enable/site/bi/strategy/index.html
        Reactive
      Get to (at a minimum) V5R4
    Minimize Impact on Production Systems
      Isolate query workloads through dedicated subsystems/pools for Query jobs
      Be wary of autotuner impact on queries
      Leverage Query Governor (QQRYTIMLMT) with time or disk space (V5R4) governing
    Get Some Assistance
      IBM Lab Services SQL/Query Performance Assessment service
      ibm.com/systems/i/editions/services.htm

  • 7/31/2019 20091029Session DW

    12/33

    Isolating Production Systems with Logical Replication

    [Diagram labels: Production; H/A Solution; H/A Backup; DB2 Mirrored Image; ODS; Data Warehouse; Queries against Production Databases; Queries against Data Warehouse/Marts]

    I/T Optimization through Combined H/A and BI Server
      Leverage H/A software to create Operational Data Store (ODS) in near real time
      Utilize ODS as the source for ETL processes into the Data Warehouse
      Combine with target side remote journaling for ETL efficiencies
      No impact to Production Databases
      Utilize mostly idle capacity of H/A Server for Data Warehouse Workloads
      Optionally mirror Data Warehouse

  • 7/31/2019 20091029Session DW

    13/33

    Common data Challenges

    Data errors
      failed joins
      invalid dates
      missing values
    Hidden meanings and conditional rules
      2nd character of column X means ..
      if column Y = S, value Z must be multiplied by -1
      If record type is 1, there must be a matching record in table B.
      If type is 2, there may be a record.
      If type is 3 there should not be a record.
      For data older than 2/11/2003, column X will be blank but it must be a valid value from then on.
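
Rules like these usually live only in application code, so a warehouse load has to restate them explicitly. The following is a minimal sketch of how the conditional rules listed above might be encoded. The field names (`key`, `rec_type`, `x`, `y`, `z`, `entered`) are hypothetical, the reading of 2/11/2003 as 11 February 2003 is an assumption, and "valid value" is reduced to "not blank" because the slide does not define the valid set.

```python
from datetime import date

# Assumption: the slide's 2/11/2003 is read as 11 February 2003.
CUTOFF = date(2003, 2, 11)

def check_row(row, table_b_keys):
    """Return (cleaned_row, problems) for one source record."""
    problems = []
    cleaned = dict(row)
    # Hidden meaning: Y == 'S' means Z must be multiplied by -1.
    if row["y"] == "S":
        cleaned["z"] = -row["z"]
    # Conditional rule: the record type decides whether table B must match.
    has_match = row["key"] in table_b_keys
    if row["rec_type"] == 1 and not has_match:
        problems.append("type 1 needs a matching record in table B")
    elif row["rec_type"] == 3 and has_match:
        problems.append("type 3 should not have a record in table B")
    # Type 2 may or may not have a record, so nothing is checked for it.
    # Date-dependent rule: X may be blank only before the cutoff.
    if row["entered"] >= CUTOFF and not row["x"].strip():
        problems.append("column X is blank on or after the cutoff date")
    return cleaned, problems

row = {"key": 7, "rec_type": 1, "y": "S", "z": 40.0, "x": " ",
       "entered": date(2004, 1, 5)}
cleaned, problems = check_row(row, table_b_keys={1, 2})
print(cleaned["z"], problems)
```

Returning the problems alongside the cleaned row, rather than raising on the first one, keeps the decision about what to do with bad data separate from detecting it.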

  • 7/31/2019 20091029Session DW

    14/33

  • 7/31/2019 20091029Session DW

    15/33

    Common data Challenges

    Unlimited formats, structures & attributes

    Personal Name        Address Information

    Source 1
    Bob Christiansan     416 Columbus Ave #2, Boston, Massachusetts 02116
    Kate A. Roberts      4 New York Plaza Floor 23, Manhattan NY, 10036
    James Trenton        125-A Washington, Los Angeles, CA 90066

    Source 2
    Robert Christiansen  Four sixteen Columbus Avenue APT2, Boston, Mass 02116
    Katherine Roberts    Four NY Plaza, FL-23, New York New York, 10036
    Trenton, James       125 Washington Unit A, LA, California, 90066

    Source 3
    R.J. Christensen     416 Columbus Suite #2, Suffolk County 02116
    Mrs. K. Roberts      4 NY Plaza, LVL23, NYC 10036
    Mr & Mrs J.Trenton   One-twenty-five Washington #A, Los Angeles Cnty 90066

  • 7/31/2019 20091029Session DW

    16/33

    The Enterprise Data Warehouse Architecture

    [Diagram components: Operational System(s); Data Propagation; Data Staging Area; Extraction, Transformation and Loading; Cleansed, Transformed Data; Data Marts (Sales, Finance, Mfg); OLAP Applications; Web Visualization Products; PC or Browser. Side label: Tactical operation decision support.]

  • 7/31/2019 20091029Session DW

    17/33

    Reasons you may choose a data warehouse

    Manage larger (Terabyte?) volumes of data
    Add data from sources other than production systems
      Ex: purchased demographic data
      Non IBM i databases
    Clean/Transform the data
      An ODS does not solve a lot of data issues
    Tuning Aspects
      Separate server/partition allows for different tuning knobs to be turned
      May be a different allocation of resources to manage this very different workload
    Separation of Powers
      Data Warehouse Team versus Operational Systems Team
      Separate Decisions
        OS or resource upgrades
    Single Version of the Truth

  • 7/31/2019 20091029Session DW

    18/33

    E.T.L.

    Extract data from somewhere (may be MANY sources)
    Transform it somehow
    Load it somewhere else (and load it FAST)

  • 7/31/2019 20091029Session DW

    19/33

    Transformation Example: Surrogate Keys

    Customer File - US
    CUSTNO  CUSTNAME
    1001    John Smith
    1002    Mary Jones
    1003    Chris Anderson
    1004    David Perry

    Customer File - Canada
    CUSTNO  CUSTNAME
    1001    Harry Potter
    1002    Jeremy Carr
    1003    Penny Hayes
    1004    Debbie Thornton

    Surrogate key is a sequential number with no correlation to replaced value(s)

    Customer File - Data Warehouse
    CUSTNUMBER  CUSTNAME         REGION  OLDNUM
    1           John Smith       US      1001
    2           Mary Jones       US      1002
    3           Chris Anderson   US      1003
    4           David Perry      US      1004
    5           Harry Potter     CANADA  1001
    6           Jeremy Carr      CANADA  1002
    7           Penny Hayes      CANADA  1003
    8           Debbie Thornton  CANADA  1004

    PK: CUSTNUMBER    Secondary Index: REGION, OLDNUM
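
The merge shown in the tables above can be sketched in a few lines. This is an illustrative version only: it uses Python's sqlite3 as a stand-in for DB2, and the table names `cust_us`, `cust_canada` and `dw_customer` are hypothetical. The customer rows are the ones from the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE cust_us (custno INTEGER, custname TEXT);
CREATE TABLE cust_canada (custno INTEGER, custname TEXT);
INSERT INTO cust_us VALUES (1001, 'John Smith'), (1002, 'Mary Jones'),
                           (1003, 'Chris Anderson'), (1004, 'David Perry');
INSERT INTO cust_canada VALUES (1001, 'Harry Potter'), (1002, 'Jeremy Carr'),
                               (1003, 'Penny Hayes'), (1004, 'Debbie Thornton');
-- The surrogate key is generated by the warehouse, not taken from a source.
CREATE TABLE dw_customer (
    custnumber INTEGER PRIMARY KEY AUTOINCREMENT,
    custname TEXT, region TEXT, oldnum INTEGER);
-- Secondary index so loads can translate (region, old key) to the surrogate.
CREATE INDEX dw_customer_old ON dw_customer (region, oldnum);
""")

# Table names come from this fixed tuple, never from user input.
for region, source in (("US", "cust_us"), ("CANADA", "cust_canada")):
    conn.execute(
        "INSERT INTO dw_customer (custname, region, oldnum) "
        f"SELECT custname, ?, custno FROM {source} ORDER BY custno",
        (region,),
    )

rows = conn.execute("SELECT * FROM dw_customer ORDER BY custnumber").fetchall()
penny = conn.execute(
    "SELECT custnumber FROM dw_customer WHERE region = ? AND oldnum = ?",
    ("CANADA", 1003),
).fetchone()[0]
```

The old customer number is kept only as an attribute, so the colliding source keys (1001 to 1004 in both countries) no longer matter to anything that joins on `custnumber`.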

  • 7/31/2019 20091029Session DW

    20/33

    Transformation Example: Star Schema

    Show me the date, weather, and quantity/revenue from sales of umbrellas, raingear, and hats in our Florida stores in November, and order by store, item, date, then weather

    [Diagram: a FACT table (Itemkey, Storekey, Datekey, Sales, Quantity) surrounded by the DIMENSIONS Item_Dim, Store_Dim and Date_Dim; each dimension produces a keylist that selects fact rows]

    Select store, item, date, weather, sum(sales), sum(quantity)
    from item_dim, store_dim, date_dim, fact_table
    where item_dim.itemkey in (...keylist...)
      and store_dim.storekey in (...keylist...)
      and date_dim.datekey in (...keylist...)
      and item_dim.itemkey = fact_table.itemkey
      and store_dim.storekey = fact_table.storekey
      and date_dim.datekey = fact_table.datekey
    group by store, item, date, weather
    order by store, item, date, weather
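
To make the star join above concrete, here is a small runnable sketch, again with sqlite3 standing in for DB2. The table names follow the slide, but the column layouts, the sample rows, and the choice to store `weather` on the fact row are assumptions made for illustration; the slide does not say where weather lives. The dimension filters play the role of the keylists.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE item_dim (itemkey INTEGER PRIMARY KEY, item TEXT);
CREATE TABLE store_dim (storekey INTEGER PRIMARY KEY, store TEXT, state TEXT);
CREATE TABLE date_dim (datekey INTEGER PRIMARY KEY, sale_date TEXT, month INTEGER);
-- Assumption: weather is kept on the fact row.
CREATE TABLE fact_table (itemkey INTEGER, storekey INTEGER, datekey INTEGER,
                         weather TEXT, sales REAL, quantity INTEGER);
INSERT INTO item_dim VALUES (1, 'umbrella'), (2, 'hat'), (3, 'sunscreen');
INSERT INTO store_dim VALUES (1, 'Miami', 'FL'), (2, 'Boston', 'MA');
INSERT INTO date_dim VALUES (1, '2009-11-02', 11), (2, '2009-12-01', 12);
INSERT INTO fact_table VALUES
    (1, 1, 1, 'rain', 60.0, 5), (1, 1, 1, 'rain', 24.0, 2),
    (2, 1, 1, 'rain', 16.0, 2), (3, 1, 1, 'rain', 9.0, 1),
    (1, 2, 1, 'snow', 12.0, 1), (1, 1, 2, 'sun', 12.0, 1);
""")

# Each WHERE predicate narrows one dimension; the joins pick matching facts.
rows = conn.execute("""
SELECT s.store, i.item, d.sale_date, f.weather, SUM(f.sales), SUM(f.quantity)
FROM fact_table f
JOIN item_dim i ON i.itemkey = f.itemkey
JOIN store_dim s ON s.storekey = f.storekey
JOIN date_dim d ON d.datekey = f.datekey
WHERE i.item IN ('umbrella', 'raingear', 'hat')
  AND s.state = 'FL'
  AND d.month = 11
GROUP BY s.store, i.item, d.sale_date, f.weather
ORDER BY s.store, i.item, d.sale_date, f.weather
""").fetchall()

for row in rows:
    print(row)
```

The dimension tables are small and descriptive while the fact table is long and numeric, which is why filtering the dimensions first and then probing the facts by key tends to be the efficient plan.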

  • 7/31/2019 20091029Session DW

    21/33

    E.T.L.

    But.. There are two VITAL additional requirements
      Validate: bad data in is bad data out
      Manage: what do you do with bad data? how do you administer ETL jobs?

    [Diagram labels: Validate, Transform, Manage]

  • 7/31/2019 20091029Session DW

    22/33

    ETL Alternatives

    Do it yourself
      You almost always end up looking at tools later
      If you do, consider use of SQL!
    ETL lite: IBM i based
      Information Builders Data Migrator
        www.ibi.com
      Coglin Mill's Rodin DB2 Web Query Edition
        www.coglinmill.com
      Talend Open Source
        www.talend.com
    High End (AIX Partition on Power Systems)
      IBM InfoSphere Information Server

  • 7/31/2019 20091029Session DW

    23/33

    DB2 for i DW Near Real Time Architecture

    [Diagram labels: ERP; IBM i LPAR; 4 CPUs; DB2 for i, 3.75 CPUs; DB2 for i, .25 CPUs; Remote Journaling; Shipped Logs; DataMirror; Staged Data or ODS; DW Staging Area; ETL Tool; DB2 DW]

    Remote Journaling during normal business processing hours
      Trickle Feed Staging Area/ODS
      Eliminate EXTRACTION impact on production systems
      No Charge Feature of IBM i
      Requires Program (e.g., DataMirror) to read data from journal receivers
      Can add SQL logic to remove unwanted fields, change datatypes,
    Virtualization Engine Technologies
      Optimize resources for supporting production and daytime data warehouse queries
      High speed data transfers over Virtual Ethernet
      Common Backup and other Shared I/O
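
The trickle-feed idea above amounts to replaying a stream of change records against the staging copy instead of re-extracting whole tables. The sketch below shows only that replay logic. The `(operation, key, row)` entry shape and the `internal_flag` field are hypothetical; real journal receivers have their own entry layout and are read by a replication product, neither of which is modelled here.

```python
def apply_journal(ods, entries):
    """Apply change entries to an in-memory ODS table keyed by primary key.

    Each entry is a hypothetical (operation, key, row) tuple, invented for
    this sketch; it is not the format of an IBM i journal entry.
    """
    for op, key, row in entries:
        if op in ("INSERT", "UPDATE"):
            # Drop an unwanted field on the way in, as the slide suggests.
            ods[key] = {k: v for k, v in row.items() if k != "internal_flag"}
        elif op == "DELETE":
            ods.pop(key, None)
        else:
            raise ValueError(f"unknown journal operation: {op}")
    return ods

ods = apply_journal({}, [
    ("INSERT", 1001, {"custname": "John Smith", "internal_flag": "X"}),
    ("INSERT", 1002, {"custname": "Mary Jones", "internal_flag": ""}),
    ("UPDATE", 1001, {"custname": "John A. Smith", "internal_flag": "X"}),
    ("DELETE", 1002, None),
])
print(ods)
```

Because only changed rows cross to the staging area, the production database is never scanned for extraction, which is the benefit the slide calls out.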

  • 7/31/2019 20091029Session DW

    24/33

    On Line Analytical Processing (OLAP)

    OLAP is INTERACTIVE and ITERATIVE
      Query is usually batch, list oriented result sets
    Accessing business data with numerous dimensions
      'anything' by 'anything' by 'anything' analysis
      data can be easily analyzed from many different viewpoints
      data is modeled to the business
      data is viewed across, down and through the various dimensions
    Helps answer business questions
      How are my different departments performing?
      Is this pattern the same every year?
      Can we look at the information another way?
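
The 'anything' by 'anything' idea can be illustrated with one small function that totals a measure over whichever dimensions the analyst picks. This is a toy sketch with made-up data and hypothetical field names, not how an OLAP engine is implemented; it only shows why the same rows can answer each of the business questions above.

```python
from collections import defaultdict

# Made-up sample data; the dimension names are hypothetical.
SALES = [
    {"dept": "Toys", "region": "East", "year": 2008, "revenue": 100},
    {"dept": "Toys", "region": "West", "year": 2008, "revenue": 150},
    {"dept": "Toys", "region": "East", "year": 2009, "revenue": 120},
    {"dept": "Garden", "region": "East", "year": 2009, "revenue": 80},
]

def rollup(rows, dimensions, measure="revenue"):
    """Total `measure` for every combination of the chosen dimensions."""
    totals = defaultdict(int)
    for row in rows:
        totals[tuple(row[d] for d in dimensions)] += row[measure]
    return dict(totals)

by_dept = rollup(SALES, ["dept"])                  # how are departments doing?
by_dept_year = rollup(SALES, ["dept", "year"])     # same pattern every year?
by_region_year = rollup(SALES, ["region", "year"]) # look at it another way
print(by_dept, by_dept_year, by_region_year, sep="\n")
```

Changing the question means changing only the list of dimensions, which is what makes the analysis interactive and iterative rather than a fixed batch report.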

  • 7/31/2019 20091029Session DW

    25/33

    OLAP is uniquely suited to handle applications such as:
      Budgeting
      Planning
      Forecasting
      Business Modeling
      Financial Consolidation
      Sales & Performance Analysis
      Customer & Product Profitability

  • 7/31/2019 20091029Session DW

    26/33

    What is the right OLAP Technology?

    [Diagram labels: BI Tool; Application; Data Load; Data; SQL 1, SQL 2, SQL 3; Relational Data]

                    MOLAP                                 ROLAP
    # of users      Many                                  Few
    engine          Cubing Engine                         Query Optimization
    architecture    Depends                               DBMS Backend
    via             complex loading                       complex SQL
    metadata        in engine                             Meta Data Layer
    Examples        ESSBASE, InfoManager                  DB2 Web Query (Olap option)
    speed           speed of thought                      Will vary
    data strategy   Summary with drillthrough to detail   Summary or Detail

  • 7/31/2019 20091029Session DW

    27/33

    DB2 for i Enablers for Data Warehousing

    POWER6 Processors
    SQL Query Engine (SQE)
      Self Learning, Self Adapting
    Database Parallelism*
    Real time statistics
    Materialized Query Tables
    Star Join Query Rewrite
    Encoded Vector Indexing
    Remote Journaling (Trickle Feed)
    Single Level Storage
    Autonomic Indexes
    Index Advisor

    [Chart: benchmark results on a 0 to 140,000 scale for 2w i520, 2w 520, 4w i570, 4w 570 and 8w i570 systems; series POWER5+, POWER6 (V6R1) and POWER6 (V5R4); callouts "57% Improvement" and "79% Improvement"]

    *See detailed certified benchmark results at http://www.sap.com/solutions/benchmark/bid_results.htm

  • 7/31/2019 20091029Session DW

    28/33

    BI Acceleration with Encoded Vector Indexing

    Indexing technology that can significantly improve performance, especially for star schema
      10% to 30% faster index builds
      1/3 to 1/16 the size
      1/2 the time for index scans
      1/3 the time for bit map generation

    Symbol Table
    Key Value  Code  First Row  Last Row  Count
    Arizona    1     1          80005     5000
    Arkansas   2     5          99760     7300
    ......
    Virginia   37    1222       30111     340
    Wyoming    38    7          83000     2760

    Vector (one code per row: Row 1, Row 2, ....)
    1 13 12 28 2 17 38 2 26 33

    EVIs now part of Index Advice!!!
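
The symbol table and vector above can be mimicked with two small Python structures. This is a conceptual sketch of the idea only, not DB2's implementation: in particular, codes here are assigned in order of first appearance, which is an assumption made to keep the example short, and the sample column is invented.

```python
def build_evi(column):
    """Build a symbol table and a code vector for one column of values."""
    symbols = {}   # key value -> {"code", "first", "last", "count"}
    vector = []    # one small integer code per row, in row order
    for row_number, value in enumerate(column, start=1):
        entry = symbols.get(value)
        if entry is None:
            entry = {"code": len(symbols) + 1, "first": row_number,
                     "last": row_number, "count": 0}
            symbols[value] = entry
        entry["last"] = row_number
        entry["count"] += 1
        vector.append(entry["code"])
    return symbols, vector

def rows_matching(symbols, vector, wanted):
    """Return the row numbers whose value is in `wanted`, scanning codes only."""
    codes = {symbols[v]["code"] for v in wanted if v in symbols}
    return [n for n, code in enumerate(vector, start=1) if code in codes]

symbols, vector = build_evi(
    ["Arizona", "Virginia", "Arizona", "Wyoming", "Arkansas", "Arizona"])
print(symbols["Arizona"])
print(vector)
print(rows_matching(symbols, vector, {"Arizona", "Wyoming"}))
```

Two properties from the slide show up even in this toy: counts per key can be read from the symbol table without touching any rows, and selecting rows means comparing compact codes rather than full key values.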

  • 7/31/2019 20091029Session DW

    29/33

    IBM DB2 Web Query for System i
    Powered By Information Builders

    Base Program Product Includes:
      IBM i Web Reporting Server
      Several Web Based authoring tools
        RA, GA, Power Painter
      Web Enable Query/400 Reports
    Query/400 (5722-QU1)
      BASE PRODUCT OFFERED AS NO CHARGE UPGRADE FROM QU1
      Does not include Software Maintenance

    Additional Features
      Run Time User Enablement
      Active Reports (Disconnected Analysis)
      On Line Analytical Processing
        Requires Meta Data provided with Developer Workbench
      Developer Workbench
        IT Tool for meta data
      DB2 Web Query Report Broker
        Automated Report Execution and Distribution
      DB2 Web Query SDK
        Web Services to integrate reporting functions into applications/portals

    http://www.ibm.com/systems/i/db2/webquery

  • 7/31/2019 20091029Session DW

    30/33

    DB2 Web Query Report Broker 5733-QU3

    Automated Delivery Of Information
      On Scheduled Basis
        Through Admin GUI
        Daily, Weekly, Specific Days, exclude rules
      On Event Basis
        Some customization required
    Intelligent bursting
      Ex: Regional Sales Report
    Additional output formats for batch reporting
      (HTML, PDF, Excel, Active HTML)
    Delivery Destinations
      E-mail
      Printer
      Save the reports for later viewing
    Notify Function
      Send notification when report is complete or fails
    Requires DB2 Web Query BASE Product to be installed
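
Bursting, as in the Regional Sales Report example above, generally means running a report once and splitting the result so that each recipient receives only their own section. The sketch below illustrates that splitting step in plain Python. It does not use or imitate Report Broker's interfaces; the function name, the field names and the example.com addresses are all hypothetical.

```python
from collections import defaultdict

def burst(report_rows, burst_field, recipients):
    """Split one report into per-value sections and pair each with a recipient."""
    sections = defaultdict(list)
    for row in report_rows:
        sections[row[burst_field]].append(row)
    # Sections with no known recipient are skipped rather than guessed at.
    return {recipients[value]: rows
            for value, rows in sections.items() if value in recipients}

report = [
    {"region": "East", "rep": "Lee", "sales": 1200},
    {"region": "West", "rep": "Kim", "sales": 900},
    {"region": "East", "rep": "Ortiz", "sales": 700},
]
deliveries = burst(report, "region", {
    "East": "east.manager@example.com",
    "West": "west.manager@example.com",
})
for address, rows in deliveries.items():
    print(address, rows)
```

The report query runs once regardless of how many regions exist, so adding a recipient adds a delivery, not another pass over the data.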

  • 7/31/2019 20091029Session DW

    31/33

    New in 2009: Microsoft Integration

    Spreadsheet Client
      Improve the experience for Excel Users
      Excel Plug In
      Embed queries in Excel templates
    SQL Server Adapter
      Extend the reach of DB2 Web Query
      databases with a single adapter

  • 7/31/2019 20091029Session DW

    32/33


  • 7/31/2019 20091029Session DW

    33/33

