
OLAP

On Line Analytic Processing

OLTP

• On Line Transaction Processing
– support for ‘real-time’ processing of orders, bookings, sales
– typically access to single rows of tables
– current data only - current products, available flights
– supports Operational level of organisational hierarchy

Analysis

• Support for
– tactical decisions (e.g. stock ordering in view of forecast demand)
– strategic decisions (e.g. where to open a new store)

• Need to ‘understand the data’ - ‘Business Intelligence’

Requirements

• Operations
– Summation
– Statistical analysis - variation from standard
– Ranking and percentages
– Comparison over time - last year-to-date

• ‘Static data picture’ to avoid ‘inconsistent reads’ while an analysis is being produced, and to allow comparisons between analyses

• Additional data - forecasting models, external data, structural data (location of tills, of products on shelves)

Analysis pre database

• 50% of code used to create complex bulk reports on how the company was performing

• inflexible, costly, untimely

• ‘raw data’ too large to retain, so aggregation required - this restricts year-to-year comparison to the level of the prior aggregation

OLAP/Data warehouse

• Specialised tools, data structures to support analysis ‘on line’ i.e. on demand

• Data will be a snap-shot to ensure consistency

• Hold base ‘fact’ table - raw basic facts - credit card transaction, item sale, booking

• + multiple analytic dimensions
– time, product, customer, store

Supermarket ‘Basket’ data

• Fact - a single line on a till receipt
– basket no
– date
– customer no
– product code
– till no
– quantity
– price

• Dimensions
– Customer, Product, Time, Till/Store

Star Schema
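The basket fact and its four dimensions can be laid out as a star: one central fact table with a foreign key to each dimension table. The following is a minimal sketch using Python's built-in sqlite3 module. The table and column names are illustrative assumptions based on the field list above, not a schema given in the slides.

```python
import sqlite3

# Hypothetical star schema for the basket data. All table and column
# names are assumptions made for illustration.
SCHEMA = """
CREATE TABLE customer (customer_no INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE product  (product_code TEXT PRIMARY KEY, name TEXT, category TEXT);
CREATE TABLE time_dim (date TEXT PRIMARY KEY, day_of_week TEXT, month TEXT, year INTEGER);
CREATE TABLE till     (till_no INTEGER PRIMARY KEY, store TEXT);
CREATE TABLE sale (                 -- fact: a single line on a till receipt
    basket_no    INTEGER,
    date         TEXT    REFERENCES time_dim(date),
    customer_no  INTEGER REFERENCES customer(customer_no),
    product_code TEXT    REFERENCES product(product_code),
    till_no      INTEGER REFERENCES till(till_no),
    quantity     INTEGER,           -- measure
    price        REAL               -- measure
);
"""


def build_star_schema():
    """Create the schema in an in-memory database and return the connection."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def fact_foreign_keys(conn):
    """Return the names of the dimension tables the fact table points at."""
    rows = conn.execute("PRAGMA foreign_key_list(sale)").fetchall()
    return sorted(row[2] for row in rows)  # column 2 is the referenced table
```

Calling `fact_foreign_keys(build_star_schema())` lists the four dimension tables, which are the points of the star around the `sale` fact table.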

Kinds of Attributes

• Measures
– continuously varying
– interval, ratio scales
– able to be summed, ranked etc.

• Dimensions
– nominal scales - no ordering
– may be classified in multiple ways

• Date/Time
– interval scale - i.e. ordered only
– can be treated as dimension and classified

Product dimension

• Product e.g. size 9 Bata slippers
– Product category - shoes
  • Product group - clothing
– Size e.g. shoe 9
– Range e.g. Bata
– Product class e.g. cheap

Time Dimension

• Date e.g. 11 Dec 2002
– Day of Week e.g. Wednesday
– Week e.g. 49
– Month e.g. December
  • Qtr e.g. 3
    – Year e.g. 2002
– Promotion period e.g. Pre-Christmas Sales
– Season e.g. Autumn

Snowflake schema

Query Processing

• Consider a typical analytic query
– How do sales of clothing vary by day of week in stores in the SW region?
– select product.name, dow.name, sum(qty*price)
– from sales, product, productCategory, productGroup, date, DOW
– where (the 5 join conditions)
– and (productGroup.name = 'Clothing')
– group by DOW.name, product.name
– order by DOW.name, product.name
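To make the cost of the joins concrete, here is a small runnable sketch of that query against a toy snowflake schema in sqlite3. The slide does not spell out the five join conditions, so the key columns used here (`category_id`, `group_id`, `dow_id` and so on) are assumptions, as is the sample data. The `date` table is renamed `date_dim` for clarity, and, like the query on the slide, the sketch does not filter on region.

```python
import sqlite3


def build_snowflake():
    """Create a toy snowflake schema with a few rows (all names hypothetical)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
    CREATE TABLE productGroup    (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE productCategory (id INTEGER PRIMARY KEY, name TEXT, group_id INTEGER);
    CREATE TABLE product         (id INTEGER PRIMARY KEY, name TEXT, category_id INTEGER);
    CREATE TABLE DOW             (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE date_dim        (id INTEGER PRIMARY KEY, iso_date TEXT, dow_id INTEGER);
    CREATE TABLE sales           (product_id INTEGER, date_id INTEGER, qty INTEGER, price REAL);

    INSERT INTO productGroup    VALUES (1, 'Clothing'), (2, 'Food');
    INSERT INTO productCategory VALUES (1, 'Shoes', 1), (2, 'Dairy', 2);
    INSERT INTO product         VALUES (1, 'Slippers', 1), (2, 'Milk', 2);
    INSERT INTO DOW             VALUES (1, 'Monday'), (2, 'Tuesday');
    INSERT INTO date_dim        VALUES (1, '2002-12-09', 1), (2, '2002-12-10', 2);
    INSERT INTO sales           VALUES (1, 1, 2, 10.0), (1, 1, 1, 10.0),
                                       (1, 2, 3, 10.0), (2, 1, 5, 1.0);
    """)
    return conn


def clothing_sales_by_day(conn):
    """Sales value of clothing products per day of week (five joins)."""
    return conn.execute("""
        SELECT product.name, DOW.name, SUM(qty * price)
        FROM sales
        JOIN product         ON sales.product_id = product.id
        JOIN productCategory ON product.category_id = productCategory.id
        JOIN productGroup    ON productCategory.group_id = productGroup.id
        JOIN date_dim        ON sales.date_id = date_dim.id
        JOIN DOW             ON date_dim.dow_id = DOW.id
        WHERE productGroup.name = 'Clothing'
        GROUP BY DOW.name, product.name
        ORDER BY DOW.name, product.name
    """).fetchall()
```

Every row of the fact table has to be carried through all five joins before it can be grouped, which is why the number of joins matters so much when the fact table is large.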

Difficulties

• Fact table is huge - must be compressed as much as possible by reducing field sizes etc.

• Since dimension data is smaller and more stable, OK to denormalise to reduce joins
– date - add dow.id, month.id, year, fiscalYear, …
– Balance required
– Denormalised dimensions result in ‘Starflake schema’

Aggregation

• Simple case:
– sales - date, product, cust, store, value
– How many aggregations are possible?
– Store(5) Product(10) Customer(30)

Aggregation operations

• Cube - all possible aggregations
– 8 for 3 dimensions

• Roll-up - aggregate in order
– e.g. Product, Store, Cust
– 4 for 3 dimensions

• Slice and Dice
– Takes parts or slices of a cube of aggregations

• Drill down
– Given an aggregation, expand an aggregated dimension e.g. expand clothing sales analysis by City
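The counts of 8 and 4 follow from how the groupings are formed: a cube aggregates over every subset of the dimensions (2^n groupings), while a roll-up aggregates over each prefix of an ordered list of dimensions (n + 1 groupings). The sketch below enumerates both and computes one aggregation. The function names and the row format are assumptions made for illustration.

```python
from itertools import combinations


def cube_groupings(dims):
    """All subsets of the dimensions: 2**n groupings, () being the grand total."""
    return [combo
            for size in range(len(dims), -1, -1)
            for combo in combinations(dims, size)]


def rollup_groupings(dims):
    """Each prefix of the ordered dimensions: n + 1 groupings."""
    return [tuple(dims[:size]) for size in range(len(dims), -1, -1)]


def aggregate(rows, group_by, measure="value"):
    """Sum the measure over rows (dicts), grouped by the given dimensions."""
    totals = {}
    for row in rows:
        key = tuple(row[dim] for dim in group_by)
        totals[key] = totals.get(key, 0) + row[measure]
    return totals
```

For `["Product", "Store", "Cust"]`, `cube_groupings` returns 8 groupings and `rollup_groupings` returns 4: (Product, Store, Cust), (Product, Store), (Product,) and the grand total (). Drilling down corresponds to moving from a shorter grouping to a longer one.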

Data Mining

• OLAP requires a priori assumptions about the categories of interest

• But a useful category may be ‘hidden’ - can it be discovered a posteriori?
– e.g. identify high risk motor insurance policies by attributes of policy - gender, age, type of vehicle, job, postcode

• Rule induction (machine learning) methods can be used

• Finding relationships
– looking for deviations from normal behaviour - e.g. to identify fraudulent transactions in a credit card company
– looking for deviations from average e.g. non-random combinations of goods in a basket - classic example is beer and nappies

• Requires heavy aggregations, statistical selection, rule induction
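One common way to quantify a ‘non-random combination’ of goods is lift: how often two items appear in the same basket, divided by how often they would if they were bought independently. The slides do not name a specific measure, so lift is an assumption here, and the basket data in the usage note is invented for illustration.

```python
from collections import Counter
from itertools import combinations


def pair_lift(baskets):
    """Return {(item_a, item_b): lift} for every pair seen together.

    lift = P(a and b) / (P(a) * P(b)); a value well above 1 suggests the
    pair occurs together more often than chance would predict.
    """
    n = len(baskets)
    item_counts = Counter(item for basket in baskets for item in set(basket))
    pair_counts = Counter(
        pair
        for basket in baskets
        for pair in combinations(sorted(set(basket)), 2)
    )
    return {
        pair: (count / n) / ((item_counts[pair[0]] / n) * (item_counts[pair[1]] / n))
        for pair, count in pair_counts.items()
    }
```

With the four baskets `{beer, nappies}`, `{beer, nappies, milk}`, `{milk, bread}` and `{bread}`, the pair (beer, nappies) has a lift of 2.0, while (beer, milk) has a lift of 1.0, i.e. no more than chance.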

ETL

• Extract - Transform - Load

• OLAP databases are often said to be read only - but all need periodic updating: new data is extracted from sources, validated, reorganised and loaded, whilst maintaining aggregations and indexes

• Extract
– new or changed data from the OLTP
– changed product and structural data
– external data

• External data may be in legacy systems, remote databases, flat files

Transform

• Filtering out bad transaction data

• Validating against database

Load

• Load new facts

• Re-de-normalising if product dimensions change

• Re-aggregation
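The transform and load steps can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a description of any particular ETL tool: the row format, the validation rules (positive quantity, non-negative price, known product code) and the function names are all hypothetical.

```python
def transform(extracted, known_products):
    """Filter out bad transaction rows and validate against the product dimension."""
    clean = []
    for row in extracted:
        if row.get("quantity", 0) <= 0 or row.get("price", 0) < 0:
            continue  # filtering out bad transaction data
        if row.get("product_code") not in known_products:
            continue  # validating against the database
        clean.append(row)
    return clean


def reaggregate(facts):
    """Recompute total sales value per product from the full fact table."""
    totals = {}
    for row in facts:
        code = row["product_code"]
        totals[code] = totals.get(code, 0) + row["quantity"] * row["price"]
    return totals


def load(facts, new_rows):
    """Append the new facts, then rebuild the aggregation."""
    facts.extend(new_rows)
    return reaggregate(facts)
```

This sketch recomputes the aggregation from scratch after each load. With a large fact table, a real system would more likely update the stored aggregates incrementally from the new rows only.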