DWDM Single Ppt Notes

Date post: 11-Oct-2015
Upload: rahul-kale
Description:
The notes of DWDM

  • DATA WAREHOUSING AND DATA MINING

    S. Sudarshan, Krithi Ramamritham

    IIT Bombay

    [email protected]@cse.iitb.ernet.in

    www.jntuworld.com

    www.jntuworld.com

    www.jwjobs.net

  • Course Overview

    The course: what and how

    - 0. Introduction
    - I. Data Warehousing
    - II. Decision Support and OLAP
    - III. Data Mining
    - IV. Looking Ahead
    - Demos and Labs

  • 0. Introduction

    - Data Warehousing, OLAP and data mining: what and why (now)?
    - Relation to OLTP
    - A case study
    - Demos, labs

  • A producer wants to know...

    - Which are our lowest/highest margin customers?
    - Who are my customers and what products are they buying?
    - Which customers are most likely to go to the competition?
    - What impact will new products/services have on revenue and margins?
    - What product promotions have the biggest impact on revenue?
    - What is the most effective distribution channel?

  • Data, Data everywhere, yet ...

    - I can't find the data I need
        - data is scattered over the network
        - many versions, subtle differences
    - I can't get the data I need
        - need an expert to get the data
    - I can't understand the data I found
        - available data poorly documented
    - I can't use the data I found
        - results are unexpected
        - data needs to be transformed from one form to other

  • What is a Data Warehouse?

    A single, complete and consistent store of data obtained from a variety of different sources made available to end users in a way they can understand and use in a business context.

    [Barry Devlin]

  • What are the users saying...

    - Data should be integrated across the enterprise
    - Summary data has a real value to the organization
    - Historical data holds the key to understanding data over time
    - What-if capabilities are required

  • What is Data Warehousing?

    A process of transforming data into information and making it available to users in a timely enough manner to make a difference

    [Forrester Research, April 1996]

    [Figure: Data -> Information]

  • Evolution

    - 60s: Batch reports
        - hard to find and analyze information
        - inflexible and expensive, reprogram every new request
    - 70s: Terminal-based DSS and EIS (executive information systems)
        - still inflexible, not integrated with desktop tools
    - 80s: Desktop data access and analysis tools
        - query tools, spreadsheets, GUIs
        - easier to use, but only access operational databases
    - 90s: Data warehousing with integrated OLAP engines and tools

  • Warehouses are Very Large Databases

    [Figure: bar chart of the percentage of respondents (0% to 35%) by warehouse size (5GB, 5-9GB, 10-19GB, 20-49GB, 50-99GB, 100-249GB, 250-499GB, 500GB-1TB), comparing Initial size with size Projected for 2Q96. Source: META Group, Inc.]

  • Very Large Data Bases

    - Terabytes -- 10^12 bytes: Walmart -- 24 Terabytes
    - Petabytes -- 10^15 bytes: Geographic Information Systems
    - Exabytes -- 10^18 bytes: National Medical Records
    - Zettabytes -- 10^21 bytes: Weather images
    - Yottabytes -- 10^24 bytes: Intelligence Agency Videos

  • Data Warehousing -- It is a process

    - Technique for assembling and managing data from various sources for the purpose of answering business questions, thus making decisions that were not previously possible
    - A decision support database maintained separately from the organization's operational database

  • Data Warehouse

    - A data warehouse is a
        - subject-oriented
        - integrated
        - time-varying
        - non-volatile
      collection of data that is used primarily in organizational decision making.

    -- Bill Inmon, Building the Data Warehouse 1996

  • Explorers, Farmers and Tourists

    Explorers: Seek out the unknown and previously unsuspected rewards hiding in the detailed data

    Farmers: Harvest information from known access paths

    Tourists: Browse information harvested by farmers

  • Data Warehouse Architecture

    [Figure: architecture diagram. Sources: Relational Databases, Legacy Data, Purchased Data, ERP Systems. Components: Extraction/Cleansing, Optimized Loader, Data Warehouse Engine, Metadata Repository, Analyze/Query.]

  • Data Warehouse for Decision Support & OLAP

    - Putting information technology to help the knowledge worker make faster and better decisions
        - Which of my customers are most likely to go to the competition?
        - What product promotions have the biggest impact on revenue?
        - How did the share price of software companies correlate with profits over last 10 years?

  • Decision Support

    - Used to manage and control business
    - Data is historical or point-in-time
    - Optimized for inquiry rather than update
    - Use of the system is loosely defined and can be ad-hoc
    - Used by managers and end-users to understand the business and make judgements

  • Data Mining works with Warehouse Data

    - Data Warehousing provides the Enterprise with a memory
    - Data Mining provides the Enterprise with intelligence

  • We want to know ...

    - Given a database of 100,000 names, which persons are the least likely to default on their credit cards?
    - Which types of transactions are likely to be fraudulent given the demographics and transactional history of a particular customer?
    - If I raise the price of my product by Rs. 2, what is the effect on my ROI?
    - If I offer only 2,500 airline miles as an incentive to purchase rather than 5,000, how many lost responses will result?
    - If I emphasize ease-of-use of the product as opposed to its technical capabilities, what will be the net effect on my revenues?
    - Which of my customers are likely to be the most loyal?

    Data Mining helps extract such information

  • Application Areas

    Industry                 Application
    Finance                  Credit Card Analysis
    Insurance                Claims, Fraud Analysis
    Telecommunication        Call record analysis
    Transport                Logistics management
    Consumer goods           Promotion analysis
    Data Service providers   Value added data
    Utilities                Power usage analysis

  • Data Mining in Use

    - The US Government uses Data Mining to track fraud
    - A Supermarket becomes an information broker
    - Basketball teams use it to track game strategy
    - Cross Selling
    - Warranty Claims Routing
    - Holding on to Good Customers
    - Weeding out Bad Customers

  • What makes data mining possible?

    - Advances in the following areas are making data mining deployable:
        - data warehousing
        - better and more data (i.e., operational, behavioral, and demographic)
        - the emergence of easily deployed data mining tools and
        - the advent of new data mining techniques.

    -- Gartner Group

  • Why Separate Data Warehouse?

    - Performance
        - Operational databases are designed and tuned for known transactions and workloads.
        - Complex OLAP queries would degrade performance for operational transactions.
        - Special data organization, access and implementation methods are needed for multidimensional views and queries.
    - Function
        - Missing data: Decision support requires historical data, which operational databases do not typically maintain.
        - Data consolidation: Decision support requires consolidation (aggregation, summarization) of data from many heterogeneous sources: operational databases, external sources.
        - Data quality: Different sources typically use inconsistent data representations, codes, and formats which have to be reconciled.

  • What are Operational Systems?

    - They are OLTP systems
    - Run mission critical applications
    - Need to work with stringent performance requirements for routine tasks
    - Used to run a business!

  • RDBMS used for OLTP

    - Database Systems have been used traditionally for OLTP
        - clerical data processing tasks
        - detailed, up to date data
        - structured repetitive tasks
        - read/update a few records
        - isolation, recovery and integrity are critical

  • Operational Systems

    - Run the business in real time
    - Based on up-to-the-second data
    - Optimized to handle large numbers of simple read/write transactions
    - Optimized for fast response to predefined transactions
    - Used by people who deal with customers, products -- clerks, salespeople etc.
    - They are increasingly used by customers

  • Examples of Operational Data

    Data | Industry | Usage | Technology | Volumes
    ---- | -------- | ----- | ---------- | -------
    Customer File | All | Track Customer Details | Legacy application, flat files, mainframes | Small-medium
    Account Balance | Finance | Control account activities | Legacy applications, hierarchical databases, mainframe | Large
    Point-of-Sale data | Retail | Generate bills, manage stock | ERP, Client/Server, relational databases | Very Large
    Call Record | Telecommunications | Billing | Legacy application, hierarchical database, mainframe | Very Large
    Production Record | Manufacturing | Control Production | ERP, relational databases, AS/400 | Medium

  • So, what's different?

  • Application-Orientation vs. Subject-Orientation

    - Application-Orientation (Operational Database): Loans, Credit Card, Trust, Savings
    - Subject-Orientation (Data Warehouse): Customer, Vendor, Product, Activity

  • OLTP vs. Data Warehouse

    - OLTP systems are tuned for known transactions and workloads while workload is not known a priori in a data warehouse
    - Special data organization, access methods and implementation methods are needed to support data warehouse queries (typically multidimensional queries)
        - e.g., average amount spent on phone calls between 9AM-5PM in Pune during the month of December
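
The example query on this slide (average amount spent on phone calls between 9AM-5PM in Pune during December) can be sketched as a filter over three dimensions: place, month, and time of day. The table name `calls`, its columns, and the sample rows are assumptions made for illustration; they are not part of the course material.

```python
import sqlite3

# Hypothetical call-detail table; schema and rows are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE calls (city TEXT, call_start TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO calls VALUES (?, ?, ?)",
    [
        ("Pune", "1998-12-03 10:15:00", 12.0),    # matches all three filters
        ("Pune", "1998-12-14 16:40:00", 8.0),     # matches all three filters
        ("Pune", "1998-12-14 21:05:00", 30.0),    # outside 9AM-5PM
        ("Pune", "1998-11-20 11:00:00", 50.0),    # wrong month
        ("Mumbai", "1998-12-05 12:30:00", 99.0),  # wrong city
    ],
)

# Restrict on city, month and time of day, then aggregate.
QUERY = """
    SELECT AVG(amount)
    FROM calls
    WHERE city = 'Pune'
      AND strftime('%m', call_start) = '12'
      AND time(call_start) >= '09:00:00'
      AND time(call_start) <  '17:00:00'
"""
avg_amount = conn.execute(QUERY).fetchone()[0]
print(avg_amount)
```

On a real warehouse the same question would scan a very large number of detail rows, which is why the slide calls for special data organization and access methods.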

  • OLTP vs Data Warehouse

    - OLTP
        - Application Oriented
        - Used to run business
        - Detailed data
        - Current up to date
        - Isolated Data
        - Repetitive access
        - Clerical User
    - Warehouse (DSS)
        - Subject Oriented
        - Used to analyze business
        - Summarized and refined
        - Snapshot data
        - Integrated Data
        - Ad-hoc access
        - Knowledge User (Manager)

  • OLTP vs Data Warehouse

    - OLTP
        - Performance Sensitive
        - Few Records accessed at a time (tens)
        - Read/Update Access
        - No data redundancy
        - Database Size 100 MB - 100 GB
    - Data Warehouse
        - Performance relaxed
        - Large volumes accessed at a time (millions)
        - Mostly Read (Batch Update)
        - Redundancy present
        - Database Size 100 GB - few terabytes

  • OLTP vs Data Warehouse

    - OLTP
        - Transaction throughput is the performance metric
        - Thousands of users
        - Managed in entirety
    - Data Warehouse
        - Query throughput is the performance metric
        - Hundreds of users
        - Managed by subsets

  • To summarize ...

    - OLTP Systems are used to run a business
    - The Data Warehouse helps to optimize the business

  • Why Now?

    - Data is being produced
    - ERP provides clean data
    - The computing power is available
    - The computing power is affordable
    - The competitive pressures are strong
    - Commercial products are available

  • Myths surrounding OLAP Servers and Data Marts

    - Data marts and OLAP servers are departmental solutions supporting a handful of users
    - Million dollar massively parallel hardware is needed to deliver fast time for complex queries
    - OLAP servers require massive and unwieldy indices
    - Complex OLAP queries clog the network with data
    - Data warehouses must be at least 100 GB to be effective

    Source -- Arbor Software Home Page

  • Wal*Mart Case Study

    - Founded by Sam Walton
    - One of the largest Super Market Chains in the US
    - Wal*Mart: 2000+ Retail Stores
    - SAM's Clubs: 100+ Wholesalers Stores
        - This case study is from Felipe Carino's (NCR Teradata) presentation made at Stanford Database Seminar

  • Old Retail Paradigm

    - Wal*Mart
        - Inventory Management
        - Merchandise Accounts Payable
        - Purchasing
        - Supplier Promotions: National, Region, Store Level
    - Suppliers
        - Accept Orders
        - Promote Products
        - Provide special Incentives
        - Monitor and Track The Incentives
        - Bill and Collect Receivables
        - Estimate Retailer Demands

  • New (Just-In-Time) Retail Paradigm

    - No more deals
    - Shelf-Pass Through (POS Application)
        - One Unit Price
            - Suppliers paid once a week on ACTUAL items sold
        - Wal*Mart Manager
            - Daily Inventory Restock
            - Suppliers (sometimes SameDay) ship to Wal*Mart
    - Warehouse-Pass Through
        - Stock some Large Items
            - Delivery may come from supplier
        - Distribution Center
            - Supplier's merchandise unloaded directly onto Wal*Mart Trucks

  • Wal*Mart System

    - NCR 5100M 96 Nodes; 24 TB Raw Disk; 700 - 1000 Pentium CPUs
    - Number of Rows: > 5 Billion
    - Historical Data: 65 weeks (5 Quarters)
    - New Daily Volume: Current Apps: 75 Million; New Apps: 100 Million +
    - Number of Users: Thousands
    - Number of Queries: 60,000 per week

  • Course Overview

    - 0. Introduction
    - I. Data Warehousing
    - II. Decision Support and OLAP
    - III. Data Mining
    - IV. Looking Ahead
    - Demos and Labs

  • I. Data Warehouses: Architecture, Design & Construction

    - DW Architecture
    - Loading, refreshing
    - Structuring/Modeling
    - DWs and Data Marts
    - Query Processing
    - Demos, labs

  • Data Warehouse Architecture

    [Figure: architecture diagram. Sources: Relational Databases, Legacy Data, Purchased Data, ERP Systems. Components: Extraction/Cleansing, Optimized Loader, Data Warehouse Engine, Metadata Repository, Analyze/Query.]

  • Components of the Warehouse

    - Data Extraction and Loading
    - The Warehouse
    - Analyze and Query -- OLAP Tools
    - Metadata
    - Data Mining tools

  • Loading the Warehouse

    Cleaning the data before it is loaded

  • Source Data

    - Typically host based, legacy applications
        - Customized applications, COBOL, 3GL, 4GL
    - Point of Contact Devices
        - POS, ATM, Call switches
    - External Sources
        - Nielsen's, Acxiom, CMIE, Vendors, Partners

    [Figure: Operational/Source Data: Sequential, Legacy, Relational, External]

  • Data Quality - The Reality

    - It is tempting to think that creating a data warehouse is simply extracting operational data and entering it into a data warehouse
    - Nothing could be farther from the truth
    - Warehouse data comes from disparate, questionable sources

  • Data Quality - The Reality

    - Legacy systems no longer documented
    - Outside sources with questionable quality procedures
    - Production systems with no built in integrity checks and no integration
        - Operational systems are usually designed to solve a specific business problem and are rarely developed to a corporate plan
            - "And get it done quickly, we do not have time to worry about corporate standards..."

  • Data Integration Across Sources

    [Figure: four source systems (Savings, Loans, Trust, Credit card) annotated with: same data, different name; different data, same name; data found here, nowhere else; different keys, same data]

  • Data Transformation Example

    - field
        - appl A - balance
        - appl B - bal
        - appl C - currbal
        - appl D - balcurr
    - unit
        - appl A - pipeline - cm
        - appl B - pipeline - in
        - appl C - pipeline - feet
        - appl D - pipeline - yds
    - encoding
        - appl A - m,f
        - appl B - 1,0
        - appl C - x,y
        - appl D - male, female

    All four applications feed the Data Warehouse.
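
A minimal sketch of the three reconciliations on this slide (field names, units, encodings), assuming the warehouse standardises on a field called `balance`, pipeline lengths in centimetres, and `M`/`F` codes. Those target conventions, the record layout, and the assignment of `1`/`x` to male are assumptions made for illustration; the slide does not say which code means which.

```python
# Per-application rules. Keys A-D follow the slide; everything on the
# right-hand side is an assumed target convention.
FIELD_NAMES = {"A": "balance", "B": "bal", "C": "currbal", "D": "balcurr"}
TO_CM = {"A": 1.0, "B": 2.54, "C": 30.48, "D": 91.44}  # cm, in, feet, yds
GENDER = {
    "A": {"m": "M", "f": "F"},
    "B": {"1": "M", "0": "F"},        # assumed: 1 = male
    "C": {"x": "M", "y": "F"},        # assumed: x = male
    "D": {"male": "M", "female": "F"},
}


def transform(app, record):
    """Map one source record to the assumed warehouse conventions."""
    return {
        "balance": record[FIELD_NAMES[app]],
        "pipeline_cm": round(record["pipeline"] * TO_CM[app], 2),
        "gender": GENDER[app][str(record["gender"]).lower()],
    }


row_b = transform("B", {"bal": 100.0, "pipeline": 10, "gender": 1})
row_d = transform("D", {"balcurr": 7.5, "pipeline": 2, "gender": "Female"})
print(row_b)
print(row_d)
```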

  • Data Integrity Problems

    - Same person, different spellings
        - Agarwal, Agrawal, Aggarwal etc...
    - Multiple ways to denote company name
        - Persistent Systems, PSPL, Persistent Pvt. LTD.
    - Use of different names
        - mumbai, bombay
    - Different account numbers generated by different applications for the same customer
    - Required fields left blank
    - Invalid product codes collected at point of sale
        - manual entry leads to mistakes
        - in case of a problem use 9999999

  • Data Transformation Terms

    - Extracting
    - Conditioning
    - Scrubbing
    - Merging
    - Householding
    - Enrichment
    - Scoring
    - Loading
    - Validating
    - Delta Updating

  • Data Transformation Terms

    - Extracting
        - Capture of data from operational source in "as is" status
        - Sources for data generally in legacy mainframes in VSAM, IMS, IDMS, DB2; more data today in relational databases on Unix
    - Conditioning
        - The conversion of data types from the source to the target data store (warehouse) -- always a relational database

  • Data Transformation Terms

    - Householding
        - Identifying all members of a household (living at the same address)
        - Ensures only one mail is sent to a household
        - Can result in substantial savings: 1 lakh catalogues at Rs. 50 each costs Rs. 50 lakhs. A 2% savings would save Rs. 1 lakh.
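
Householding can be sketched as grouping customers by a normalised address and mailing each group once. The normalisation rule here (lowercase, drop commas and periods, collapse whitespace) and the sample customers are assumptions for illustration; real householding tools use far more careful address matching. The last lines reproduce the slide's arithmetic.

```python
from collections import defaultdict


def household_key(address):
    """Naive address normalisation (an assumed rule, not a standard)."""
    cleaned = address.lower().replace(",", " ").replace(".", " ")
    return " ".join(cleaned.split())


def households(customers):
    """Group customer names by normalised address."""
    groups = defaultdict(list)
    for name, address in customers:
        groups[household_key(address)].append(name)
    return dict(groups)


# Hypothetical customers: the first two share an address.
customers = [
    ("A. Rao", "12, MG Road, Pune"),
    ("S. Rao", "12 MG Road  Pune"),
    ("K. Iyer", "7 Hill View, Mumbai"),
]
groups = households(customers)
mailings_saved = len(customers) - len(groups)

# The slide's figures: 1 lakh catalogues at Rs. 50 each, 2% saved.
catalogues = 100_000
cost_each = 50
total_cost = catalogues * cost_each      # Rs. 50 lakhs
saving = total_cost * 2 // 100           # Rs. 1 lakh
print(mailings_saved, total_cost, saving)
```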

  • Data Transformation Terms

    - Enrichment
        - Bring data from external sources to augment/enrich operational data. Data sources include Dun and Bradstreet, A. C. Nielsen, CMIE, IMRA etc...
    - Scoring
        - Computation of a probability of an event, e.g., chance that a customer will defect to AT&T from MCI, chance that a customer is likely to buy a new product

  • Loads

    - After extracting, scrubbing, cleaning, validating etc. need to load the data into the warehouse
    - Issues
        - huge volumes of data to be loaded
        - small time window available when warehouse can be taken off line (usually nights)
        - when to build index and summary tables
        - allow system administrators to monitor, cancel, resume, change load rates
        - Recover gracefully -- restart after failure from where you were and without loss of data integrity

  • Load Techniques

    - Use SQL to append or insert new data
        - record at a time interface
        - will lead to random disk I/Os
    - Use batch load utility
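
The contrast between the two techniques can be sketched with SQLite. Here `executemany` inside a single transaction stands in for a vendor batch load utility, which is an assumption: real warehouse loaders bypass the SQL layer in ways this sketch does not show. The `sales` table and its rows are hypothetical.

```python
import sqlite3

rows = [(i, f"item-{i}", i * 1.5) for i in range(1000)]


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (id INTEGER, item TEXT, amount REAL)")
    return conn


def load_row_at_a_time(conn, rows):
    """Record-at-a-time interface: one statement and one commit per row."""
    for row in rows:
        conn.execute("INSERT INTO sales VALUES (?, ?, ?)", row)
        conn.commit()


def load_batch(conn, rows):
    """Batch-style load: all rows in a single transaction."""
    with conn:
        conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)


slow, fast = make_db(), make_db()
load_row_at_a_time(slow, rows)
load_batch(fast, rows)
count_slow = slow.execute("SELECT COUNT(*) FROM sales").fetchone()[0]
count_fast = fast.execute("SELECT COUNT(*) FROM sales").fetchone()[0]
print(count_slow, count_fast)
```

Both functions load the same rows; the difference is in how many statements and commits the database has to process.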

  • Load Taxonomy

    - Incremental versus Full loads
    - Online versus Offline loads

  • Refresh

    - Propagate updates on source data to the warehouse
    - Issues:
        - when to refresh
        - how to refresh -- refresh techniques

  • When to Refresh?

    - periodically (e.g., every night, every week) or after significant events
    - on every update: not warranted unless warehouse data require current data (up to the minute stock quotes)
    - refresh policy set by administrator based on user needs and traffic
    - possibly different policies for different sources

  • Refresh Techniques

    - Full Extract from base tables
        - read entire source table: too expensive
        - maybe the only choice for legacy systems

  • How To Detect Changes

    - Create a snapshot log table to record ids of updated rows of source data and timestamp
    - Detect changes by:
        - Defining after row triggers to update snapshot log when source table changes
        - Using regular transaction log to detect changes to source data
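
The trigger-based option can be sketched in SQLite as follows. The table names `account` and `snapshot_log` and their columns are assumptions for illustration, and only updates are captured; inserts and deletes would need triggers of their own.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE account (id INTEGER PRIMARY KEY, balance REAL);

    -- Snapshot log: id of each updated row plus a timestamp.
    CREATE TABLE snapshot_log (
        row_id INTEGER,
        changed_at TEXT DEFAULT CURRENT_TIMESTAMP
    );

    -- After-row trigger: fires once per updated row of the source table.
    CREATE TRIGGER account_after_update
    AFTER UPDATE ON account
    FOR EACH ROW
    BEGIN
        INSERT INTO snapshot_log (row_id) VALUES (NEW.id);
    END;

    INSERT INTO account VALUES (1, 100.0), (2, 250.0), (3, 75.0);
""")

conn.execute("UPDATE account SET balance = balance + 10 WHERE id IN (1, 3)")

# The refresh job reads the log instead of scanning the whole source table.
changed_ids = [
    row[0]
    for row in conn.execute(
        "SELECT DISTINCT row_id FROM snapshot_log ORDER BY row_id"
    )
]
print(changed_ids)
```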

  • Data Extraction and Cleansing

    - Extract data from existing operational and legacy data
    - Issues:
        - Sources of data for the warehouse
        - Data quality at the sources
        - Merging different data sources
        - Data Transformation
        - How to propagate updates (on the sources) to the warehouse
        - Terabytes of data to be loaded

  • Scrubbing Data

    - Sophisticated transformation tools.
    - Used for improving the quality of data
    - Clean data is vital for the success of the warehouse
    - Example
        - Seshadri, Sheshadri, Sesadri, Seshadri S., Srinivasan Seshadri, etc. are the same person
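
A rough sketch of how the spelling variants in this example might be detected, using approximate string matching from Python's standard `difflib`. This is not how the commercial scrubbing tools work; the canonical spelling, the 0.85 threshold, and the token rule are assumptions, and a surname match alone would wrongly merge two different people who share a surname.

```python
import re
from difflib import SequenceMatcher


def is_variant(name, canonical="seshadri", threshold=0.85):
    """True if any word in `name` is close to the canonical spelling."""
    tokens = re.findall(r"[a-z]+", name.lower())
    return any(
        SequenceMatcher(None, token, canonical).ratio() >= threshold
        for token in tokens
    )


candidates = [
    "Seshadri",
    "Sheshadri",
    "Sesadri",
    "Seshadri S.",
    "Srinivasan Seshadri",
    "Sudarshan",  # a different name, included as a negative case
]
matches = [name for name in candidates if is_variant(name)]
print(matches)
```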

  • Scrubbing Tools

    - Apertus -- Enterprise/Integrator
    - Vality -- IPE
    - Postal Soft

  • Structuring/Modeling Issues

  • Data -- Heart of the Data Warehouse

    - Heart of the data warehouse is the data itself!
    - Single version of the truth
    - Corporate memory
    - Data is organized in a way that represents business -- subject orientation

  • Data Warehouse Structure

    - Subject Orientation -- customer, product, policy, account etc... A subject may be implemented as a set of related tables. E.g., customer may be five tables

  • Data Warehouse Structure

    - base customer (1985-87)
        - custid, from date, to date, name, phone, dob
    - base customer (1988-90)
        - custid, from date, to date, name, credit rating, employer
    - customer activity (1986-89) -- monthly summary
    - customer activity detail (1987-89)
        - custid, activity date, amount, clerk id, order no
    - customer activity detail (1990-91)
        - custid, activity date, amount, line item no, order no

    Time is part of key of each table

  • Data Granularity in Warehouse

    - Summarized data stored
        - reduce storage costs
        - reduce cpu usage
        - increases performance since smaller number of records to be processed
        - design around traditional high level reporting needs
        - tradeoff with volume of data to be stored and detailed usage of data

  • Granularity in Warehouse

    - Cannot answer some questions with summarized data
        - Did Anand call Seshadri last month? Not possible to answer if only the total duration of calls by Anand over a month is maintained and individual call details are not.
    - Detailed data too voluminous
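
The Anand/Seshadri question can be made concrete with a small sketch. The call records, the third person, and the month labels are hypothetical; the point is only that the monthly total drops the callee, so the question can be answered from the detail rows but not from the summary.

```python
from collections import defaultdict

# Hypothetical call detail records: (caller, callee, month, minutes).
call_details = [
    ("Anand", "Seshadri", "1998-11", 4),
    ("Anand", "Krithi", "1998-11", 11),
    ("Anand", "Krithi", "1998-12", 6),
]

# Summary level: total minutes per caller per month. The callee is gone.
monthly_total = defaultdict(int)
for caller, _callee, month, minutes in call_details:
    monthly_total[(caller, month)] += minutes


def called(caller, callee, month):
    """Answerable only from the detail rows."""
    return any(
        c == caller and d == callee and m == month
        for c, d, m, _minutes in call_details
    )


print(monthly_total[("Anand", "1998-11")])
print(called("Anand", "Seshadri", "1998-11"))
print(called("Anand", "Seshadri", "1998-12"))
```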

  • Granularity in Warehouse

    - Tradeoff is to have dual level of granularity
        - Store summary data on disks
            - 95% of DSS processing done against this data
        - Store detail on tapes
            - 5% of DSS processing against this data

  • Vertical Partitioning

    [Figure: a table with columns Acct. No, Name, Balance, Date Opened, Interest Rate, Address is split into a frequently accessed table (Acct. No, Balance) and a rarely accessed table (Acct. No, Name, Date Opened, Interest Rate, Address). The frequently accessed table is smaller, and so needs less I/O.]
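
The split in this figure can be sketched in SQLite. The table names `account_hot` and `account_cold` and the sample rows are assumptions for illustration; the columns follow the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Frequently accessed columns: a narrow table, so less I/O per lookup.
    CREATE TABLE account_hot (acct_no INTEGER PRIMARY KEY, balance REAL);

    -- Rarely accessed columns, keyed by the same account number.
    CREATE TABLE account_cold (
        acct_no INTEGER PRIMARY KEY REFERENCES account_hot(acct_no),
        name TEXT,
        date_opened TEXT,
        interest_rate REAL,
        address TEXT
    );

    INSERT INTO account_hot VALUES (101, 5000.0), (102, 120.5);
    INSERT INTO account_cold VALUES
        (101, 'A. Rao', '1995-04-01', 4.5, 'Pune'),
        (102, 'K. Iyer', '1997-09-15', 5.0, 'Mumbai');
""")

# Common case: a balance lookup touches only the narrow table.
balance = conn.execute(
    "SELECT balance FROM account_hot WHERE acct_no = ?", (101,)
).fetchone()[0]

# Rare case: the full row is rebuilt with a join on the shared key.
full_row = conn.execute(
    """
    SELECT h.acct_no, c.name, h.balance, c.date_opened,
           c.interest_rate, c.address
    FROM account_hot AS h
    JOIN account_cold AS c ON c.acct_no = h.acct_no
    WHERE h.acct_no = ?
    """,
    (102,),
).fetchone()
print(balance, full_row)
```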

  • Derived Data

    - Introduction of derived (calculated) data may often help
    - Have seen this in the context of dual levels of granularity
    - Can keep auxiliary views and indexes to speed up query processing

  • Schema Design

    - Database organization
        - must look like business
        - must be recognizable by business user
        - approachable by business user
        - Must be simple
    - Schema Types
        - Star Schema
        - Fact Constellation Schema
        - Snowflake schema

  • Dimension Tables

    - Dimension tables
        - Define business in terms already familiar to users
        - Wide rows with lots of descriptive text
        - Small tables (about a million rows)
        - Joined to fact table by a foreign key
        - heavily indexed
        - typical dimensions
            - time periods, geographic region (markets, cities), products, customers, salesperson, etc.

  • Fact Table

    - Central table
        - mostly raw numeric items
        - narrow rows, a few columns at most
        - large number of rows (millions to a billion)
        - Access via dimensions

  • Star Schema

    - A single fact table and for each dimension one dimension table
    - Does not capture hierarchies directly

    [Figure: a central fact table (date, custno, prodno, cityname, ...) linked to the dimension tables Time, prod, cust and city]
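
A minimal star schema along the lines of this figure, sketched in SQLite. The fact-table keys (`date`, `custno`, `prodno`, `cityname`) come from the slide; the `sales` measure, the descriptive dimension columns, the name `time_dim`, and all rows are assumptions for illustration. Note that `region` sits inside the `city` dimension as a plain column, which is what "does not capture hierarchies directly" looks like in practice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- One table per dimension.
    CREATE TABLE time_dim (date TEXT PRIMARY KEY, month TEXT, year INTEGER);
    CREATE TABLE prod (prodno INTEGER PRIMARY KEY, prodname TEXT, category TEXT);
    CREATE TABLE cust (custno INTEGER PRIMARY KEY, custname TEXT);
    CREATE TABLE city (cityname TEXT PRIMARY KEY, state TEXT, region TEXT);

    -- Single fact table: foreign keys to each dimension plus a measure.
    CREATE TABLE fact (
        date TEXT REFERENCES time_dim(date),
        custno INTEGER REFERENCES cust(custno),
        prodno INTEGER REFERENCES prod(prodno),
        cityname TEXT REFERENCES city(cityname),
        sales REAL
    );

    INSERT INTO time_dim VALUES
        ('1998-12-01', '1998-12', 1998), ('1998-12-02', '1998-12', 1998);
    INSERT INTO prod VALUES (1, 'Soap', 'Toiletries'), (2, 'Rice', 'Grocery');
    INSERT INTO cust VALUES (10, 'A. Rao'), (11, 'K. Iyer');
    INSERT INTO city VALUES
        ('Pune', 'Maharashtra', 'West'), ('Chennai', 'Tamil Nadu', 'South');
    INSERT INTO fact VALUES
        ('1998-12-01', 10, 1, 'Pune', 20.0),
        ('1998-12-01', 11, 2, 'Pune', 50.0),
        ('1998-12-02', 10, 1, 'Chennai', 30.0),
        ('1998-12-02', 11, 1, 'Pune', 5.0);
""")

# Typical star query: join the fact table to dimensions, then aggregate.
sales_by_region_category = conn.execute("""
    SELECT c.region, p.category, SUM(f.sales)
    FROM fact AS f
    JOIN city AS c ON c.cityname = f.cityname
    JOIN prod AS p ON p.prodno = f.prodno
    GROUP BY c.region, p.category
    ORDER BY c.region, p.category
""").fetchall()
print(sales_by_region_category)
```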

  • Snowflake schema

    - Represent dimensional hierarchy directly by normalizing tables.
    - Easy to maintain and saves storage

    [Figure: a central fact table (date, custno, prodno, cityname, ...) linked to the dimension tables Time, prod, cust and city, with an additional region table for the city hierarchy]

  • Fact Constellation

    - Fact Constellation
        - Multiple fact tables that share many dimension tables
        - Booking and Checkout may share many dimension tables in the hotel industry

    [Figure: fact tables Booking and Checkout sharing the dimension tables Hotels, Travel Agents, Promotion, Room Type and Customer]

  • De-normalization

    - Normalization in a data warehouse may lead to lots of small tables
    - Can lead to excessive I/Os since many tables have to be accessed
    - De-normalization is the answer especially since updates are rare

  • Creating Arrays

    - Many times each occurrence of a sequence of data is in a different physical location
    - Beneficial to collect all occurrences together and store as an array in a single row
    - Makes sense only if there are a stable number of occurrences which are accessed together
    - In a data warehouse, such situations arise naturally due to time based orientation
        - can create an array by month
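
The "array by month" idea can be sketched as a pivot from one row per (account, month) to one row per account holding twelve slots. The account ids and amounts are hypothetical, and the fixed length of twelve reflects the slide's condition that the number of occurrences be stable.

```python
from collections import defaultdict

# Hypothetical normalized rows: (account, month number 1-12, amount).
monthly_rows = [
    ("ACC-1", 1, 100.0),
    ("ACC-1", 2, 120.0),
    ("ACC-1", 12, 90.0),
    ("ACC-2", 3, 40.0),
]


def to_month_arrays(rows):
    """Collect each account's monthly values into one 12-slot array."""
    arrays = defaultdict(lambda: [0.0] * 12)
    for account, month, amount in rows:
        arrays[account][month - 1] += amount
    return dict(arrays)


by_account = to_month_arrays(monthly_rows)
# A whole year for one account is now a single lookup.
print(by_account["ACC-1"])
```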

  • Selective Redundancy

    - Description of an item can be stored redundantly with order table -- most often item description is also accessed with order table
    - Updates have to be done carefully, so that the redundant copies stay consistent

  • Partitioning

    - Breaking data into several physical units that can be handled separately
    - Not a question of whether to do it in data warehouses but how to do it
    - Granularity and partitioning are key to effective implementation of a warehouse

  • Why Partition?

    - Flexibility in managing data
    - Smaller physical units allow
        - easy restructuring
        - free indexing
        - sequential scans if needed
        - easy reorganization
        - easy recovery
        - easy monitoring

  • Criterion for Partitioning

    - Typically partitioned by
        - date
        - line of business
        - geography
        - organizational unit
        - any combination of above

  • Where to Partition?

    - Application level or DBMS level
    - Makes sense to partition at application level
        - Allows different definition for each year
            - Important since warehouse spans many years and as business evolves definition changes
        - Allows data to be moved between processing complexes easily

  • Data Warehouse vs. Data Marts

    What comes first

  • From the Data Warehouse to Data Marts

    [Figure: Data Warehouse (organizationally structured), then departmentally structured, then individually structured data marts. Axis labels: Less/More (History, Normalized, Detailed); Data/Information.]

  • Data Warehouse and Data Marts

    - OLAP Data Mart: lightly summarized, departmentally structured
    - Data Warehouse Data: organizationally structured, atomic, detailed

  • 91

    Characteristics of the Departmental Data Mart

    z OLAP

    z Small

    z Flexible

    z Customized by Department

    z Source is departmentally structured data warehouse


  • 92

    Techniques for Creating Departmental Data Mart

    z OLAP

    z Subset

    z Summarized

    z Superset

    z Indexed

    z Arrayed

    [Figure: departmental data marts for Sales, Mktg. and Finance]


  • 93

    Data Mart Centric

    [Figure: architecture diagram with components labeled Data Sources, Data Marts and Data Warehouse]


  • 94

    Problems with Data Mart Centric Solution

    If you end up creating multiple warehouses, integrating them is a problem


  • 95

    True Warehouse

    [Figure: architecture diagram with components labeled Data Sources, Data Warehouse and Data Marts]


  • 96

    Query Processing

    z Indexing

    z Pre-computed views/aggregates

    z SQL extensions


  • 97

    Indexing Techniques

    z Exploiting indexes to reduce scanning of data is of crucial importance

    z Bitmap Indexes

    z Join Indexes

    z Other Issues

    y Text indexing

    y Parallelizing and sequencing of index builds and incremental updates


  • 98

    Indexing Techniques

    z Bitmap index:

    y A collection of bitmaps -- one for each distinct value of the column

    y Each bitmap has N bits where N is the number of rows in the table

    y A bit corresponding to a value v for a row r is set if and only if r has the value v for the indexed attribute
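The definition above can be sketched in a few lines. This is an illustration only: bitmaps are plain Python lists of 0/1, the sample `region` column is made up, and `build_bitmap_index` is a hypothetical helper name.

```python
def build_bitmap_index(column):
    """Return {value: bitmap}, one N-bit list per distinct value of the column."""
    n = len(column)
    index = {}
    for row, value in enumerate(column):
        # Create the all-zero bitmap the first time a value is seen.
        bitmap = index.setdefault(value, [0] * n)
        # The bit for (value, row) is set exactly when the row holds that value.
        bitmap[row] = 1
    return index


region = ["N", "S", "W", "W", "S", "W", "N"]  # assumed sample column
region_index = build_bitmap_index(region)
```

Every row has exactly one value, so exactly one bitmap has its bit set for any given row.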


  • 99

    BitMap Indexes

    z An alternative representation of RID-list

    z Especially advantageous for low-cardinality domains

    z Represent each row of a table by a bit and the table as a bit vector

    z There is a distinct bit vector Bv for each value v of the domain

    z Example: the attribute sex has values M and F. A table of 100 million people needs 2 lists of 100 million bits
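The space needed in this example can be worked out directly. The 4-byte row identifier used for the RID-list comparison is an assumption, not a figure from the slides.

```python
rows = 100_000_000
distinct_values = 2  # M and F

# One bit per row for each distinct value.
bitmap_bytes = distinct_values * rows // 8

# Assumed: a RID-list stores one 4-byte row identifier per row.
rid_list_bytes = rows * 4

ratio = rid_list_bytes // bitmap_bytes
```

Under these assumptions the two bitmaps take 25 million bytes against 400 million bytes for the RID-lists. The advantage shrinks as the number of distinct values grows, since every additional value adds another N-bit vector.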


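  • 100

    Customer

    Query: select * from customer where gender = 'F' and vote = 'Y'

    [Figure: Bitmap Index. Sample customer rows with gender values M, F, F, F, F, M and vote values Y, Y, Y, N, N, N, shown next to the bit vectors of the bitmap index used to answer the query.]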
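A sketch of how such a query might be answered from bitmaps, using the six sample gender and vote values from the slide. Packing each bitmap into a Python integer is a choice made for this illustration; `bitmap` and `matching_rows` are hypothetical helper names.

```python
def bitmap(column, value):
    """Pack the rows where column == value into an integer, bit i for row i."""
    bits = 0
    for row, v in enumerate(column):
        if v == value:
            bits |= 1 << row
    return bits


def matching_rows(bits):
    """List the row numbers whose bit is set."""
    rows = []
    row = 0
    while bits:
        if bits & 1:
            rows.append(row)
        bits >>= 1
        row += 1
    return rows


gender = ["M", "F", "F", "F", "F", "M"]
vote = ["Y", "Y", "Y", "N", "N", "N"]

# gender = 'F' and vote = 'Y' becomes a single bitwise AND of two bitmaps.
result = matching_rows(bitmap(gender, "F") & bitmap(vote, "Y"))
```

Only the rows whose bits survive the AND need to be fetched from the table; here those are rows 1 and 2, counting from zero.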


  • 101

    Bit Map Index

    Base Table

    Cust  Region  Rating
    C1    N       H
    C2    S       M
    C3    W       L
    C4    W       H
    C5    S       L
    C6    W       L
    C7    N       H

    Region Index

    Row ID  N  S  E  W
    1       1  0  0  0
    2       0  1  0  0
    3       0  0  0  1
    4       0  0  0  1
    5       0  1  0  0
    6       0  0  0  1
    7       1  0  0  0

    Rating Index

    Row ID  H  M  L
    1       1  0  0
    2       0  1  0
    3       0  0  1
    4       1  0  0
    5       0  0  1
    6       0  0  1
    7       1  0  0

    Query: Customers where Region = W And Rating = M


  • 102

    BitMap Indexes

    z Comparison, join and aggregation operations are reduced to bit arithmetic with dramatic improvement in processing time

    z Significant reduction in space and I/O (30:1)

    z Adapted for higher cardinality domains as well.

    z Compression (e.g., run-length encoding) exploited

    z Products that support bitmaps: Model 204, TargetIndex (Redbrick), IQ (Sybase), Oracle 7.3


  • 103

    Join Indexes

    z Pre-computed joins

    z A join index between a fact table and a dimension table correlates a dimension tuple with the fact tuples that have the same value on the common dimensional attribute

    y e.g., a join index on city dimension of calls fact table

    y correlates for each city the calls (in the calls table) from that city


  • 104

    Join Indexes

    z Join indexes can also span multiple dimension tables

    y e.g., a join index on city and time dimension of calls fact table
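A join index can be pictured as a precomputed map from dimension values to the positions of the matching fact rows. The sketch below assumes a tiny, made-up calls fact table; `build_join_index` is a hypothetical helper, not a product API.

```python
from collections import defaultdict

# Hypothetical calls fact table: (call_id, city, month, minutes).
calls = [
    (0, "Mumbai", "Jan", 12),
    (1, "Pune", "Jan", 5),
    (2, "Mumbai", "Feb", 7),
    (3, "Mumbai", "Jan", 3),
]


def build_join_index(facts, key):
    """Map each dimension key to the positions of the fact rows that carry it."""
    index = defaultdict(list)
    for pos, fact in enumerate(facts):
        index[key(fact)].append(pos)
    return dict(index)


# Join index on the city dimension alone.
city_index = build_join_index(calls, key=lambda f: f[1])

# Join index spanning two dimensions: city and time.
city_time_index = build_join_index(calls, key=lambda f: (f[1], f[2]))

# A query on (Mumbai, Jan) touches only the indexed fact rows.
mumbai_jan_minutes = sum(calls[pos][3] for pos in city_time_index[("Mumbai", "Jan")])
```

The join work is paid once when the index is built; queries then go straight from a dimension value to its fact rows.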


  • 105

    Star Join Processing

    z Use join indexes to join dimension and fact table

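    [Figure: the Calls fact table (C) is joined with the Time (T), Location (L) and Plan (P) dimension tables one at a time, giving the intermediate results C+T, C+T+L and C+T+L+P.]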


  • 106

    Optimized Star Join Processing

    [Figure: the Time, Location and Plan dimension tables, with selections applied, form a virtual cross product of T, L and P that is joined with the Calls fact table.]


  • 107

    Bitmapped Join Processing

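    [Figure: each of the Time, Location and Plan dimensions yields a bitmap over the Calls fact table (101, 001 and 110 in the example); the bitmaps are combined with AND.]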


  • 108

    Intelligent Scan

    z Piggyback multiple scans of a relation (Redbrick)

    y piggybacking also done if second scan starts a little while after the first scan


  • 109

    Parallel Query Processing

    z Three forms of parallelism

    y Independent

    y Pipelined

    y Partitioned, and partition and replicate

    z Deterrents to parallelism

    y startup

    y communication


  • 110

    Parallel Query Processing

    z Partitioned Data

    y Parallel scans

    y Yields I/O parallelism

    z Parallel algorithms for relational operators

    y Joins, Aggregates, Sort

    z Parallel Utilities

    y Load, Archive, Update, Parse, Checkpoint, Recovery

    z Parallel Query Optimization
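The partitioned form of parallelism can be sketched as follows: each partition is scanned and aggregated by its own worker, and the partial results are then combined. The partitions and the `partial_sum` helper are made up for illustration. Threads in CPython mostly help with I/O-bound scans, so this shows the structure rather than a speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def partial_sum(partition):
    """Scan one partition and return its local aggregate."""
    return sum(partition)


# Hypothetical sales amounts, already split into three partitions.
partitions = [[1, 2, 3], [4, 5], [6]]

with ThreadPoolExecutor(max_workers=3) as pool:
    # One scan per partition; map() returns results in partition order.
    partials = list(pool.map(partial_sum, partitions))

# The final step merges the per-partition aggregates.
total = sum(partials)
```

Starting the workers and collecting their results is the startup and communication overhead that the previous slide lists as deterrents.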


  • 111

    Pre-computed Aggregates

    z Keep aggregated data for efficiency (pre-computed queries)

    z Questions

    y Which aggregates to compute?

    y How to update aggregates?

    y How to use pre-computed aggregates in queries?


  • 112

    Pre-computed Aggregates

    z Aggregated table can be maintained by the

    y warehouse server

    y middle tier

    y client applications

    z Pre-computed aggregates -- special case of materialized views -- same questions and issues remain
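One way to use a pre-computed aggregate in a query is to answer a coarser group-by from a finer one instead of from the detail rows. The sketch below assumes made-up sales rows and a hypothetical `aggregate` helper.

```python
from collections import defaultdict

# Hypothetical detail rows: (month, city, sales).
detail = [
    ("Jan", "Pune", 10),
    ("Jan", "Mumbai", 20),
    ("Feb", "Pune", 5),
    ("Jan", "Pune", 7),
]


def aggregate(rows, key_positions, value_position):
    """Sum the value column grouped by the columns at key_positions."""
    totals = defaultdict(int)
    for row in rows:
        key = tuple(row[i] for i in key_positions)
        totals[key] += row[value_position]
    return dict(totals)


# Pre-computed once, for example at load time: sales by (month, city).
by_month_city = aggregate(detail, (0, 1), 2)

# A later query for sales by month reads the aggregate, not the detail.
summary_rows = [key + (total,) for key, total in by_month_city.items()]
by_month = aggregate(summary_rows, (0,), 2)
```

This works because sums can be re-aggregated. The same shortcut does not hold for every aggregate function: a median, for instance, cannot be derived from per-group medians.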


  • 113

    SQL Extensions

    z Extended family of aggregate functions

    y rank (top 10 customers)

    y percentile (top 30% of customers)

    y median, mode

    y Object Relational Systems allow addition of new aggregate functions
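What these aggregates compute can be illustrated outside SQL. The sketch uses made-up per-customer totals; `top_n` and `top_percent` are hypothetical helpers, and the 40% cut-off is chosen only so that the example divides evenly.

```python
import statistics

# Hypothetical total sales per customer.
sales = {"C1": 500, "C2": 120, "C3": 900, "C4": 120, "C5": 300}


def top_n(totals, n):
    """Rank customers by total, highest first, and keep the first n."""
    return sorted(totals, key=totals.get, reverse=True)[:n]


def top_percent(totals, percent):
    """Keep the highest-ranked percent of customers (rounded down)."""
    return top_n(totals, len(totals) * percent // 100)


top_2 = top_n(sales, 2)
top_40_percent = top_percent(sales, 40)
median_sale = statistics.median(sales.values())
mode_sale = statistics.mode(sales.values())
```

All four need the values in sorted or counted form, which is why they are awkward to express with the basic sum, count, min and max aggregates.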


  • 114

    SQL Extensions

    z Reporting features

    y running total, cumulative totals

    z Cube operator

    y group by on all subsets of a set of attributes (month,city)

    y redundant scan and sorting of data can be avoided
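The effect of the cube operator on (month, city) can be sketched as one group-by per subset of the attributes. The rows and the `cube` function are illustrative assumptions, not a vendor implementation.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical rows: (month, city, sales).
rows = [
    ("Jan", "Pune", 10),
    ("Jan", "Mumbai", 20),
    ("Feb", "Pune", 5),
]
dimensions = ("month", "city")


def cube(rows, dimensions):
    """Return {grouping attributes: {group key: total}} for every subset of dimensions."""
    result = {}
    for size in range(len(dimensions) + 1):
        for subset in combinations(range(len(dimensions)), size):
            totals = defaultdict(int)
            for row in rows:
                # The last column is the measure being summed.
                totals[tuple(row[i] for i in subset)] += row[-1]
            result[tuple(dimensions[i] for i in subset)] = dict(totals)
    return result


sales_cube = cube(rows, dimensions)
```

This naive version rescans the rows for each of the four group-bys. A built-in cube operator can avoid that redundant work, for example by deriving the coarser group-bys from the finest one.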


  • 115

    Red Brick has Extended set of Aggregates

    z select month, dollars, cume(dollars) as run_dollars, weight, cume(weight) as run_weights
      from sales, market, product, period t
      where year = 1993
      and product like 'Columbian%'
      and city like 'San Fr%'
      order by t.perkey


  • 116

    RISQL (Red Brick Systems) Extensions

    z Aggregates

    y CUME

    y MOVINGAVG

    y MOVINGSUM

    y RANK

    y TERTILE

    y RATIOTOREPORT

    z Calculating Row Subtotals

    y BREAK BY

    z Sophisticated Date Time Support

    y DATEDIFF

    z Using SubQueries in calculations
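The general idea behind three of these aggregates can be sketched as follows. The functions are plain Python stand-ins written for this illustration; they are assumptions about the usual meaning of a cumulative total, a moving average and a ratio to the report total, not Red Brick's exact definitions.

```python
from itertools import accumulate


def cume(values):
    """Running (cumulative) total, in the spirit of CUME."""
    return list(accumulate(values))


def moving_avg(values, window):
    """Average of each value and up to window - 1 preceding values."""
    out = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        out.append(sum(chunk) / len(chunk))
    return out


def ratio_to_total(values):
    """Each value as a percentage of the overall total."""
    total = sum(values)
    return [100 * v / total for v in values]


dollars = [10, 20, 30, 40]  # assumed monthly sales, already ordered by period
run_dollars = cume(dollars)
smoothed = moving_avg(dollars, 2)
shares = ratio_to_total(dollars)
```

All three depend on the order or the total of the result rows, which is why they are applied to an ordered result rather than row by row.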


  • 117

    Using SubQueries in Calculations

    select product, dollars as jun97_sales,
      (select sum(s1.dollars)
       from market m1, product p1, period t1, sales s1
       where p1.product = product.product
       and t1.year = period.year
       and m1.city = market.city) as total97_sales,
      100 * dollars /
      (select sum(s1.dollars)
       from market m1, product p1, period t1, sales s1
       where p1.product = product.product
       and t1.year = period.year
       and m1.city = market.city) as percent_of_yr
    from market, product, period, sales
    where year = 1997
    and month = 'June'
    and city like 'Ahmed%'
    order by product;


  • 118

    Course Overview

    z The course: what and how

    z 0. Introduction

    z I. Data Warehousing

    z II. Decision Support and OLAP

    z III. Data Mining

    z IV. Looking Ahead

    z Demos and Labs


  • II. On-Line Analytical Processing (OLAP)

    Making Decision Support Possible


  • 120

    Limitations of SQL

    A Freshman in Business needs a Ph.D. in SQL

    -- Ralph Kimball


  • 121

    Typical OLAP Queries

    z Write a multi-table join to compare sales for each product line YTD this year vs. last year.

    z Repeat the above process to find the top 5 product contributors to margin.

    z Repeat the above process to find the sales of a product line to new vs. existing customers.

    z Repeat the above process to find the customers that have had negative sales growth.


  • 122

    What Is OLAP?

    z Online Analytical Processing - coined by E.F. Codd in a 1994 paper contracted by Arbor Software*

    z Generally synonymous with earlier terms such as Decision Support, Business Intelligence, Executive Information System

    z OLAP = Multidimensional Database

    z MOLAP: Multidimensional OLAP (Arbor Essbase, Oracle Express)

    z ROLAP: Relational OLAP (Informix MetaCube, Microstrategy DSS Agent)

    * Reference: http://www.arborsoft.com/essbase/wht_ppr/coddTOC.html


  • 123

    The OLAP Market

    z Rapid growth in the enterprise market

    y 1995: $700 Million

    y 1997: $2.1 Billion

    z Significant consolidation activity among major DBMS vendors

    y 10/94: Sybase acquires ExpressWay

    y 7/95: Oracle acquires Express

    y 11/95: Informix acquires Metacube

    y 1/97: Arbor partners up with IBM

    y 10/96: Microsoft acquires Panorama

    z Result: OLAP shifted from small vertical niche to mainstream DBMS category


  • 124

    Strengths of OLAP

    z It is a powerful visualization paradigm

    z It provides fast, interactive response times

    z It is good for analyzing time series

    z It can be useful to find some clusters and outliers

    z Many vendors offer OLAP tools


  • 125

    OLAP Is FASMI

    z Fast

    z Analysis

    z Shared

    z Multidimensional

    z Information

    Nigel Pendse, Richard Creath - The OLAP Report


  • 126

    Multi-dimensional Data

    z "Hey... I sold $100M worth of goods"

    [Figure: data cube with axes Product (Toothpaste, Juice, Cola, Milk, Cream, Soap), Region (W, S, N) and Month (1 to 7).]

    Dimensions: Product, Region, Time

    Hierarchical summarization paths:

    y Product: Industry, Category, Product

    y Region: Country, Region, City, Office

    y Time: Year, Quarter, Month or Week, Day


  • 127

    Data Cube Lattice

    z Cube lattice

    y ABC

    y AB AC BC

    y A B C

    y none

    z Can materialize some groupbys, compute others on demand

    z Question: which groupbys to materialize?

    z Question: what indices to create?

    z Question: how to organize data (chunks, etc.)?
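A sketch of the lattice and of computing a group-by on demand from a materialized one. It assumes that any group-by can be derived from a materialized group-by containing all of its attributes, and that the smallest such source is the cheapest. The row counts and helper names are hypothetical.

```python
from itertools import combinations


def lattice(attributes):
    """All group-bys of the cube, from the full attribute set down to the empty one."""
    nodes = []
    for size in range(len(attributes), -1, -1):
        nodes.extend(frozenset(c) for c in combinations(attributes, size))
    return nodes


def cheapest_source(query, materialized):
    """Pick the smallest materialized group-by that contains the query's attributes.

    materialized maps a group-by (frozenset of attributes) to its row count.
    """
    candidates = [g for g in materialized if query <= g]
    return min(candidates, key=materialized.get) if candidates else None


nodes = lattice("ABC")

# Hypothetical row counts for the group-bys chosen for materialization.
materialized = {frozenset("ABC"): 1000, frozenset("AB"): 200, frozenset("C"): 10}

source_for_a = cheapest_source(frozenset("A"), materialized)
source_for_bc = cheapest_source(frozenset("BC"), materialized)
```

Group-by A is served from AB (200 rows) rather than ABC (1000 rows), while BC has to fall back to ABC. Deciding which nodes to materialize trades storage and refresh cost against such query savings.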


  • 128

    Visualizing Neighbors is simpler

    [Figure: the same sales data shown two ways. As a grid, with stores 1 to 8 across and months Apr to Mar down; and as a flat Month / Store / Sales listing (Apr 1, Apr 2, ..., Apr 8, May 1, ..., Jun 2, ...).]


  • 129

    A Visual Operation: Pivot (Rotate)

    [Figure: data cube with axes Product (Juice, Cola, Milk, Cream), Region (NY, LA, SF) and Date (3/1, 3/2, 3/3, 3/4, within a Month), with sample cell values 10, 47, 30 and 12.]
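Pivoting can be sketched on a two-dimensional cross-tab: the rows and columns swap roles while the cell values stay the same. The cell values reuse the numbers visible in the figure, but their placement in the table is an assumption, and `pivot` is a hypothetical helper.

```python
# Hypothetical cross-tab: rows are products, columns are cities.
table = {
    "Juice": {"NY": 10, "LA": 47},
    "Cola": {"NY": 30, "LA": 12},
}


def pivot(crosstab):
    """Swap the row and column dimensions of a nested-dict cross-tab."""
    rotated = {}
    for row_key, cells in crosstab.items():
        for col_key, value in cells.items():
            rotated.setdefault(col_key, {})[row_key] = value
    return rotated


by_city = pivot(table)
```

Pivoting twice returns the original view, since no data is added or removed by the rotation.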


  • 130

    Slicing and Dicing

    [Figure: data cube with axes Product (Household, Telecomm, Video, Audio), Sales Channel (Retail, Direct, Special) and Regions (India, Far East, Europe), with the Telecomm slice marked.]
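Slicing and dicing can be sketched as filters on the cube's coordinates. The axis names follow the figure, while the cell values and the `dice` and `slice_` helpers are made up for this example.

```python
# Hypothetical cube cells keyed by (product, channel, region).
cube = {
    ("Telecomm", "Retail", "India"): 5,
    ("Telecomm", "Direct", "Europe"): 8,
    ("Video", "Retail", "India"): 3,
    ("Audio", "Special", "Far East"): 4,
}
AXES = ("product", "channel", "region")


def dice(cells, **allowed):
    """Keep the cells whose coordinate on each named axis is in the allowed set."""
    keep = {}
    for coords, value in cells.items():
        if all(coords[AXES.index(axis)] in values for axis, values in allowed.items()):
            keep[coords] = value
    return keep


def slice_(cells, axis, value):
    """A slice fixes one axis to a single value."""
    return dice(cells, **{axis: {value}})


telecomm = slice_(cube, "product", "Telecomm")
retail_subcube = dice(cube, channel={"Retail"}, region={"India", "Europe"})
```

A slice is the special case of a dice with a single value on one axis, which is how the Telecomm slice is obtained here.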


  • 131

    Roll-up and Drill Down

    z Sales Channel

    z Region

    z Country

    z State

    z Location Address

    z Sales Representative

    [Figure: Roll Up leads to a higher level of aggregation; Drill-Down leads to low-level details.]
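Roll-up and drill-down can be sketched as moving one level up or down a hierarchy. The state, country and region names and the helper functions are assumptions made for this example.

```python
from collections import defaultdict

# Hypothetical hierarchy fragments: each state belongs to one country,
# each country to one region.
STATE_TO_COUNTRY = {"Maharashtra": "India", "Gujarat": "India", "Bavaria": "Germany"}
COUNTRY_TO_REGION = {"India": "Asia", "Germany": "Europe"}


def roll_up(totals, parent_of):
    """Aggregate totals one level up the hierarchy."""
    rolled = defaultdict(int)
    for member, value in totals.items():
        rolled[parent_of[member]] += value
    return dict(rolled)


def drill_down(totals, parent_of, parent):
    """Return the lower-level members behind one aggregated value."""
    return {m: v for m, v in totals.items() if parent_of[m] == parent}


sales_by_state = {"Maharashtra": 40, "Gujarat": 25, "Bavaria": 10}
sales_by_country = roll_up(sales_by_state, STATE_TO_COUNTRY)
sales_by_region = roll_up(sales_by_country, COUNTRY_TO_REGION)
india_detail = drill_down(sales_by_state, STATE_TO_COUNTRY, "India")
```

Drilling down needs the lower-level data to still be available, which is one reason a warehouse keeps detailed data alongside summaries.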


  • 132

    Nature of OLAP Analysis

    z Aggregation -- (total sales, percent-to-total)

    z Comparison -- Budget vs. Expenses

    z Ranking -- Top 10, quartile analysis

    z Access to detailed and aggregate data

    z Complex criteria specification

    z Visualization


  • 133

    Organizationally Structured Data

    z Different Departments look at the same detailed data in different ways. Without the detailed, organizationally structured data as a foundation, there is no reconcilability of data

    [Figure: the marketing, manufacturing, sales and finance departments around the same detailed data]


  • 134

    Multidimensional Spreadsheets

    z Analysts need spreadsheets that support

    y pivot tables (cross-tabs)

    y drill-down and roll-up

    y slice and dice

    y sort

    y selections

    y derived attributes

    z Popular in retail domain


  • 135

    OLAP - Data Cube

    z Idea: analysts need to group data in many different ways

    y e.g. Sales(region, product, prodtype, prodstyle, date, saleamount)

    y saleamount is a measure attribute, rest are dimension attributes

    y groupby every subset of the other attributes

    x materialize (precompute and store) groupbys to give online response

    y Also: hierarchies on attributes: date -> weekday, date -> month -> quarter -> year


  • 136

    SQL Extensions

    z Front-end tools require

    y Extended Family of Aggregate Functions

    x rank, median, mode

    y Reporting Features

    x running totals, cumulative totals

    y Results of multiple group by

    x total sales by month and total sales by product

    y Data Cube


  • 137

    Relational OLAP: 3 Tier DSS

    z Data Warehouse (Database Layer)

    y Store atomic data in industry standard RDBMS.

    z ROLAP Engine (Application Logic Layer)

    y Generate SQL execution plans in the ROLAP engine to obtain OLAP functionality.

    z Decision Support Client (Presentation Layer)

    y Obtain multi-dimensional reports from the DSS Client.


  • 138

    MD-OLAP: 2 Tier DSS

    z MDDB Engine (Database Layer and Application Logic Layer)

    y Store atomic data in a proprietary data structure (MDDB), pre-calculate as many outcomes as possible, obtain OLAP functionality via proprietary algorithms running against this data.

    z Decision Support Client (Presentation Layer)

    y Obtain multi-dimensional reports from the DSS Client.


  • 139

    Typical OLAP Problems: Data Explosion

    Data Explosion Syndrome (4 levels in each dimension)

    Number of Dimensions   Number of Aggregations
    2                      16
    3                      64
    4                      256
    5                      1024
    6                      4096
    7                      16384
    8                      65536

    Source: Microsoft TechEd98
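The growth in the chart follows from multiplying the number of levels once per dimension. This short sketch assumes, as the chart does, 4 levels in every dimension; `aggregation_count` is a hypothetical helper name.

```python
def aggregation_count(dimensions, levels_per_dimension=4):
    """Number of level combinations: one choice of level per dimension."""
    return levels_per_dimension ** dimensions


counts = {d: aggregation_count(d) for d in range(2, 9)}
```

Each added dimension multiplies the count by four, so going from 2 to 8 dimensions takes it from 16 to 65536. This is why pre-calculating every possible aggregate quickly becomes impractical.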


  • 140

    Metadata Repository

    z Administrative metadata

    y source databases and their contents

    y gateway descriptions

    y warehouse schema, view & derived data definitions

    y dimensions, hierarchies

    y pre-defined queries and reports

    y data mart locations and contents

    y data partitions

    y data extraction, cleansing, transformation rules, defaults

    y data refresh and purging rules

    y user profiles, user groups

    y security: user authorization, access control


  • 141

    Metadata Repository .. 2

    z Business data

    y business terms and definitions

    y ownership of data

    y charging policies

    z Operational metadata

    y data lineage: history of migrated data and sequence of transformations applied

    y currency of data: active, archived, purged

    y monitoring information: warehouse usage statistics, error reports, audit trails.


  • Recipe for a Successful Warehouse


  • 143

    For a Successful Warehouse

    z From day one establish that warehousing is a joint user/builder project

    z Establish that maintaining data quality will be an ONGOING joint user/builder responsibility

    z Train the users one step at a time

    z Consider doing a high level corporate data model in no more than three weeks

    From Larry Greenfield, http://pwp.starnetinc.com/larryg/index.html


  • 144

    For a Successful Warehouse

    z Look closely at the data extracting, cleaning, and loading tools

    z Implement a user accessible automated directory to information stored in the warehouse

    z Determine a plan to test the integrity of the data in the warehouse

    z From the start get warehouse users in the habit of 'testing' complex queries


  • 145

    For a Successful Warehouse

    z Coordinate system roll-out with network administration personnel

    z When in a bind, ask others who have done the same thing for advice

    z Be on the lookout for small, but strategic, projects

    z Market and sell your data warehousing systems


  • 146

    Data Warehouse Pitfalls

    z You are going to spend much time extracting, cleaning, and loading data

    z Despite best efforts at project management, data warehousing project scope will increase

    z You are going to find problems with systems feeding the data warehouse

    z You will find the need to store data not being captured by any existing system

    z You will need to validate data not being validated by transaction processing systems


  • 147

    Data Warehouse Pitfalls

    z Some transaction processing systems feeding the warehousing system will not contain detail

    z Many warehouse end users will be trained and never or seldom apply their training

    z After end users receive query and report tools, requests for IS written reports may increase

    z Your warehouse users will develop conflicting business rules

    z Large scale data warehousing can become an exercise in data homogenizing


  • 148

    Data Warehouse Pitfalls

    z 'Overhead' can eat up great amounts of disk space

    z The time it takes to load the warehouse will expand to the amount of the time in the available window... and then some

    z Assigning security cannot be done with a transaction processing system mindset

    z You are building a HIGH maintenance system

    z You will fail if you concentrate on resource optimization to the neglect of project, data, and customer management issues and an understanding of what adds value to the customer


  • 149

    DW and OLAP Research Issues

    z Data cleaning

    y focus on data inconsistencies, not schema differences

    y data mining techniques

    z Physical Design

    y design of summary tables, partitions, indexes

    y tradeoffs in use of different indexes

    z Query processing

    y selecting appropriate summary tables

    y dynamic optimization with feedback

    y acid test for query optimization: cost estimation, use of transformations, search strategies

    y partitioning query processing between OLAP server and backend server.


  • 150

    DW and OLAP Research Issues .. 2

    z Warehouse Management

    y detecting runaway queries

    y resource management

    y incremental refresh techniques

    y computing summary tables during load

    y failure recovery during load and refresh

    y process management: scheduling queries, load and refresh

    y Query processing, caching

    y use of workflow technology for process management


  • Products, References, Useful Links


  • 152

    Reporting Tools

    z Andyne Computing -- GQL

    z Brio -- BrioQuery

    z Business Objects -- Business Objects

    z Cognos -- Impromptu

    z Information Builders Inc. -- Focus for Windows

    z Oracle -- Discoverer2000

    z Platinum Technology -- SQL*Assist, ProReports

    z PowerSoft -- InfoMaker

    z SAS Institute -- SAS/Assist

    z Software AG -- Esperant

    z Sterling Software -- VISION:Data


  • 153

    OLAP and Executive Information Systems

    z Andyne Computing -- Pablo

    z Arbor Software -- Essbase

    z Cognos -- PowerPlay

    z Comshare -- Commander OLAP

    z Holistic Systems -- Holos

    z Information Advantage -- AXSYS, WebOLAP

    z Informix -- Metacube

    z Microstrategy -- DSS Agent

    z Microsoft -- Plato

    z Oracle -- Express

    z Pilot -- LightShip

    z Planning Sciences -- Gentium

    z Platinum Technology -- ProdeaBeacon, Forest & Trees

    z SAS Institute -- SAS/EIS, OLAP++

    z Speedware -- Media


  • 154

    Other Warehouse Related Products

    z Data extract, clean, transform, refresh

    y CA-Ingres replicator

    y Carleton Passport

    y Prism Warehouse Manager

    y SAS Access

    y Sybase Replication Server

    y Platinum Inforefiner, Infopump


  • 155

    Extraction and Transformation Tools

    z Carleton Corporation -- Passport

    z Evolutionary Technologies Inc. -- Extract

    z Informatica -- OpenBridge

    z Information Builders Inc. -- EDA Copy Manager

    z Platinum Technology -- InfoRefiner

    z Prism Solutions -- Prism Warehouse Manager

    z Red Brick Systems -- DecisionScape Formation


  • 156

    Scrubbing Tools

    z Apertus -- Enterprise/Integrator

    z Vality -- IPE

    z Postal Soft


  • 157

    Warehouse Products

    z Computer Associates -- CA-Ingres

    z Hewlett-Packard -- Allbase/SQL

    z Informix -- Informix, Informix XPS

    z Microsoft -- SQL Server

    z Oracle -- Oracle7, Oracle Parallel Server

    z Red Brick -- Red Brick Warehouse

    z SAS Institute -- SAS

    z Software AG -- ADABAS

    z Sybase -- SQL Server, IQ, MPP


  • 158

    Warehouse Server Products

    z Oracle 8

    z Informix

    y Online Dynamic Server

    y XPS -- Extended Parallel Server

    y Universal Server for object relational applications

    z Sybase

    y Adaptive Server 11.5

    y Sybase MPP

    y Sybase IQ


  • 159

    Warehouse Server Products

    z Red Brick Warehouse

    z Tandem Nonstop

    z IBM

    y DB2 MVS

    y Universal Server

    y DB2 400

    z Teradata


  • 160

    Other Warehouse Related Products

    z Connectivity to Sources

    y Apertus

    y Information Builders EDA/SQL

    y Platinum Infohub

    y SAS Connect

    y IBM Data Joiner

    y Oracle Open Connect

    y Informix Express Gateway


  • 161

    Other Warehouse Related Products

    z Query/Reporting Environments

    y Brio/Query

    y Cognos Impromptu

    y Informix Viewpoint

    y CA Visual Express

    y Business Objects

    y Platinum Forest and Trees


  • 162

    4GL's, GUI Builders, and PC Databases

    z Information Builders -- Focus

    z Lotus -- Approach

    z Microsoft -- Access, Visual Basic

    z MITI -- SQR/Workbench

    z PowerSoft -- PowerBuilder

    z SAS Institute -- SAS/AF


  • 163

    Data Mining Products

    z DataMind -- neurOagent

    z Information Discovery -- IDIS

    z SAS Institute -- SAS/Neuronets


  • 164

    Data Warehouse

    z W.H. Inmon, Building the Data Warehouse, Second Edition, John Wiley and Sons, 1996

    z W.H. Inmon, J. D. Welch, Katherine L. Glassey, Managing the Data Warehouse, John Wiley and Sons, 1997

    z Barry Devlin, Data Warehouse from Architecture to Implementation, Addison Wesley Longman, Inc 1997


  • 165

    Data Warehouse

    z W.H. Inmon, John A. Zachman, Jonathan G. Geiger, Data Stores Data Warehousing and the Zachman Framework, McGraw Hill Series on Data Warehousing and Data Management, 1997

    z Ralph Kimball, The Data Warehouse Toolkit, John Wiley and Sons, 1996


  • 166

    OLAP and DSS

    z Erik Thomsen, OLAP Solutions, John Wiley and Sons 1997

    z Microsoft TechEd Transparencies from Microsoft TechEd 98

    z Essbase Product Literature

    z Oracle Express Product Literature

    z Microsoft Plato Web Site

    z Microstrategy Web Site


  • 167

    Data Mining

    z Michael J.A. Berry and Gordon Linoff, Data Mining Techniques, John Wiley and Sons 1997

    z Peter Adriaans and Dolf Zantinge, Data Mining, Addison Wesley Longman Ltd. 1996

    z KDD Conferences


  • 168

    Other Tutorials

    z Donovan Schneider, Data Warehousing Tutorial, Tutorial at International Conference for Management of Data (SIGMOD 1996) and International Conference on Very Large Data Bases 97

    z Umeshwar Dayal and Surajit Chaudhuri, Data Warehousing Tutorial at International Conference on Very Large Data Bases 1996

    z Anand Deshpande and S. Seshadri, Tutorial on Datawarehousing and Data Mining, CSI-97


  • 169

    Useful URLs

    z Ralph Kimball's home page

    y http://www.rkimball.com

    z Larry Greenfield's Data Warehouse Information Center

    y http://pwp.starnetinc.com/larryg/

    z Data Warehousing Institute

    y http://www.dw-institute.com/

    z OLAP Council

    y http://www.olapcouncil.com/


