2/10/05 Salman Azhar: Database Systems
On-Line Analytical Processing
Salman Azhar
Warehousing
Data Cubes
Data Mining
These slides use some figures, definitions, and explanations from Elmasri-Navathe’s Fundamentals of Database Systems and Molina-Ullman-Widom’s Database Systems
Overview
Traditional database systems are tuned to many small, simple queries
Some newer “analytic” applications use fewer, more time-consuming, complex queries
New architectures have been developed to handle complex “analytic” queries efficiently
The Data Warehouse
The most common form of data integration:
  Copy sources into a single DB (warehouse) and try to keep it up-to-date
  Usual method: periodic reconstruction of the warehouse, perhaps overnight
Warehouse is essential for analytic queries
OLTP
Most database operations involve On-Line Transaction Processing (OLTP).
  Short, simple, frequent queries and/or modifications
  Each involves a small number of tuples.
Examples:
  Looking up a phone number on the web
  Sales at cash registers
  Selling airline tickets
OLAP
Increasing importance of On-Line Analytical Processing (OLAP) queries
  Few, but complex queries --- may run for hours.
  Queries do not depend on having an absolutely up-to-date database.
Sometimes called Data Mining
OLAP Examples
1. Amazon analyzes purchases by its customers to recommend products of likely interest
  Compares purchases between customers
  Takes longer than customers are willing to wait
2. Wal-Mart looks for items with sales trends in a region or time period
  Presents data to vendors
  Used to determine ordering and inventory
Common Architecture
Databases at branches handle OLTP
Local databases copied to a central warehouse overnight (or periodically)
Analysts use the warehouse for OLAP
[Figure: four OLTP databases, used by Transaction Users, feed one OLAP warehouse, used by Analysts]
Data Warehouse
[Figure labels: Data Sources, Data Input, Staging Area, Data Warehouse, Data Marts, Data Access, User Data Access]
Star Schemas
A star schema is a common organization for data at a warehouse
It consists of:
  Fact table: a very large accumulation of facts such as sales
    Often “insert-only”
  Dimension tables: smaller, generally static information about the entities involved in the facts
Star Schema
Fact Table:
  Sales_Fact(TimeKey, EmployeeKey, ProductKey, CustomerKey, ShipperKey, Sales Amount, Unit Sales, ...)
Dimension Tables:
  Employee_Dim(EmployeeKey, EmployeeID, ...)
  Time_Dim(TimeKey, TheDate, ...)
  Product_Dim(ProductKey, ProductID, ...)
  Customer_Dim(CustomerKey, CustomerID, ...)
  Shipper_Dim(ShipperKey, ShipperID, ...)
Example: Star Schema
Suppose we want to record in a warehouse information about every car sale: dealer, car, buyer, day, time, price paid
The fact table is a relation:
  Sale(dealer, model, buyer, day, time, price)
Example, Continued
The dimension tables include information about the dealer, car, and buyer “dimensions”:
  Dealer(dealer, city, zip)
  Car(model, manufacturer)
  Buyer(buyer, city, phone)
Recall the fact table:
  Sale(dealer, model, buyer, day, time, price)
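The fact and dimension tables above could be declared roughly as follows. This is a minimal sketch using Python's built-in sqlite3 module; the column types, the foreign-key references, and the sample rows are assumptions, since the slides give only attribute names.

```python
import sqlite3

# In-memory database; column types are assumptions, the slides list only names.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Dealer(dealer TEXT PRIMARY KEY, city TEXT, zip TEXT);
CREATE TABLE Car(model TEXT PRIMARY KEY, manufacturer TEXT);
CREATE TABLE Buyer(buyer TEXT PRIMARY KEY, city TEXT, phone TEXT);
-- Fact table: one row per sale. The first three attributes reference the
-- keys of the dimension tables; price is the dependent attribute.
CREATE TABLE Sale(
    dealer TEXT REFERENCES Dealer(dealer),
    model  TEXT REFERENCES Car(model),
    buyer  TEXT REFERENCES Buyer(buyer),
    day    TEXT,
    time   TEXT,
    price  INTEGER
);
""")

# Made-up sample rows, for illustration only.
conn.execute("INSERT INTO Dealer VALUES ('Auto Nation', 'Indianapolis', '46201')")
conn.execute("INSERT INTO Car VALUES ('Mini Cooper', 'BMW')")
conn.execute("INSERT INTO Buyer VALUES ('Alice', 'Indianapolis', '555-0100')")
conn.execute(
    "INSERT INTO Sale VALUES "
    "('Auto Nation', 'Mini Cooper', 'Alice', '2005-02-10', '10:30', 25000)"
)

tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
print(tables)
```

The fact table grows with every sale, while the three dimension tables change only when a new dealer, car model, or buyer appears.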
Dimensions and Dependent Attributes
Two classes of fact-table attributes:
  Dimension attributes: the key of a dimension table
    Sale(dealer, model, buyer, day, time, price)
  Dependent attributes: a value determined by the dimension attributes of the row
    Sale(dealer, model, buyer, day, time, price)
    E.g., price is determined by the combination of dealer, model, buyer, day, time
Example: Dependent Attribute
price is determined by the combination of dimension attributes: dealer, car, buyer, and the time (combination of day and time attributes).
Approaches to Building Warehouses
ROLAP = “relational OLAP”:
  Tune a relational DBMS to support star schemas
MOLAP = “multidimensional OLAP”:
  Use a specialized DBMS with a model such as the “data cube”
ROLAP Techniques
Bitmap indexes:
  For each key value of a dimension table (e.g., each model for relation Cars), create a bit-vector telling which tuples of the fact table have that value
Materialized views:
  Store the answers to several useful queries (views) in the warehouse itself
  Stored views!
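The bitmap-index idea can be sketched in a few lines of Python. This is an illustration under stated assumptions, not how any particular DBMS implements it: the rows are made up, the helper names build_bitmap_index and rows_matching are hypothetical, and a Python integer stands in for each bit-vector.

```python
from collections import defaultdict

# Made-up fact-table rows (dealer, model, price); a row's position in the
# list plays the role of its tuple number.
sales = [
    ("Auto Nation", "Mini Cooper", 25000),
    ("Auto Nation", "Civic", 18000),
    ("Car Town", "Mini Cooper", 26000),
    ("Car Town", "Accord", 22000),
]

def build_bitmap_index(rows, column):
    """Map each value in `column` to an int whose bit i is set when row i has that value."""
    index = defaultdict(int)
    for i, row in enumerate(rows):
        index[row[column]] |= 1 << i
    return dict(index)

def rows_matching(rows, bits):
    """Return the rows whose bit is set in the bit-vector `bits`."""
    return [row for i, row in enumerate(rows) if (bits >> i) & 1]

by_dealer = build_bitmap_index(sales, 0)
by_model = build_bitmap_index(sales, 1)

# ANDing two bit-vectors finds the rows that satisfy both conditions
# without scanning the fact table for each condition separately.
both = by_dealer["Car Town"] & by_model["Mini Cooper"]
print(rows_matching(sales, both))
```

Selections on several dimensions then reduce to bitwise AND and OR over the stored vectors.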
Typical OLAP Queries
Often, OLAP queries begin with a “star join”: the natural join of the fact table with all or most of the dimension tables
Recall the tables:
  Sales(dealer, model, buyer, day, time, price)
  Dealers(dealer, city, zip)
  Cars(model, manufacturer)
  Buyers(buyer, city, phone)
Example:
SELECT *
FROM Sales, Dealers, Cars, Buyers
WHERE Sales.dealer = Dealers.dealer AND
      Sales.model = Cars.model AND
      Sales.buyer = Buyers.buyer;
Typical OLAP Queries --- 2
The typical OLAP query will:
1. Start with a star join
2. Select for interesting tuples, based on dimension data
3. Group by one or more dimensions
4. Aggregate certain attributes of the result
Example: OLAP Query
For each dealer in Indianapolis, find the total sales of each car manufactured by BMW
Filter: city = “Indianapolis”, manf = “BMW”
Grouping: by dealer and car
Aggregation: Sum of price
GROUP EXERCISE: Write the SQL Query
Note: Do not turn over to the next page before attempting this exercise yourself!
Example: In SQL
SELECT dealer, model, SUM(price)
FROM Sales NATURAL JOIN Dealers
           NATURAL JOIN Cars
WHERE Dealers.city = 'Indianapolis'
  AND Cars.manufacturer = 'BMW'
GROUP BY dealer, model;
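As a rough check, the query can be run against SQLite from Python. The sketch below assumes made-up rows and column types, and it uses the column name manufacturer from the Cars relation as declared earlier in the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Dealers(dealer TEXT PRIMARY KEY, city TEXT, zip TEXT);
CREATE TABLE Cars(model TEXT PRIMARY KEY, manufacturer TEXT);
CREATE TABLE Sales(dealer TEXT, model TEXT, buyer TEXT,
                   day TEXT, time TEXT, price INTEGER);

-- Made-up rows, for illustration only.
INSERT INTO Dealers VALUES ('Auto Nation', 'Indianapolis', '46201'),
                           ('Car Town', 'Chicago', '60601');
INSERT INTO Cars VALUES ('Mini Cooper', 'BMW'), ('Z4', 'BMW'), ('Civic', 'Honda');
INSERT INTO Sales VALUES
  ('Auto Nation', 'Mini Cooper', 'Alice', '2005-02-01', '10:00', 25000),
  ('Auto Nation', 'Mini Cooper', 'Bob',   '2005-02-02', '11:00', 26000),
  ('Auto Nation', 'Z4',          'Carol', '2005-02-03', '12:00', 40000),
  ('Auto Nation', 'Civic',       'Dave',  '2005-02-04', '13:00', 18000),
  ('Car Town',    'Mini Cooper', 'Erin',  '2005-02-05', '14:00', 24000);
""")

# Star join, filter on dimension data, group by dimensions, aggregate price.
# ORDER BY is added only to make the printed result deterministic.
query = """
SELECT dealer, model, SUM(price)
FROM Sales NATURAL JOIN Dealers NATURAL JOIN Cars
WHERE city = 'Indianapolis' AND manufacturer = 'BMW'
GROUP BY dealer, model
ORDER BY model
"""
result = conn.execute(query).fetchall()
print(result)
```

With this sample data only the two BMW models sold by the Indianapolis dealer survive the filter; the Honda sale and the Chicago sale are dropped before grouping.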
Using Materialized Views
A direct execution of this query from Sales and the dimension tables could take too long
If we create a materialized view that contains enough information, we may be able to answer our query much faster
Example: Materialized View
Which views could help with our query?
Key issues:
1. It must join Sales, Dealers, and Cars, at least
2. It must group by at least dealer and car
3. It must not select out Indianapolis Dealers or BMW Cars
4. It must not project out city or manf
Example --- Continued
Here is a materialized view that could help:
CREATE VIEW vSales(dealer, city, car, manf, sales) AS
  SELECT dealer, city, model, manufacturer, SUM(price) sales
  FROM Sales NATURAL JOIN Dealers
             NATURAL JOIN Cars
  GROUP BY dealer, city, model, manufacturer;
Since dealer -> city and model -> manf, adding city and manf to the GROUP BY creates no real extra grouping.
We group by them because we need city and manf in the SELECT.
Example --- Concluded
Here’s our query using the materialized view vSales:
SELECT dealer, car, sales
FROM vSales
WHERE city = 'Indianapolis'
  AND manf = 'BMW';
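The effect can be tried out in SQLite from Python. SQLite has no materialized views, so this sketch stores the view's rows in an ordinary table with CREATE TABLE ... AS, which is an assumption standing in for whatever mechanism the warehouse DBMS provides; the rows and column types are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Dealers(dealer TEXT, city TEXT, zip TEXT);
CREATE TABLE Cars(model TEXT, manufacturer TEXT);
CREATE TABLE Sales(dealer TEXT, model TEXT, buyer TEXT,
                   day TEXT, time TEXT, price INTEGER);

-- Made-up rows, for illustration only.
INSERT INTO Dealers VALUES ('Auto Nation', 'Indianapolis', '46201'),
                           ('Car Town', 'Chicago', '60601');
INSERT INTO Cars VALUES ('Mini Cooper', 'BMW'), ('Civic', 'Honda');
INSERT INTO Sales VALUES
  ('Auto Nation', 'Mini Cooper', 'Alice', '2005-02-01', '10:00', 25000),
  ('Auto Nation', 'Mini Cooper', 'Bob',   '2005-02-02', '11:00', 26000),
  ('Auto Nation', 'Civic',       'Carol', '2005-02-03', '12:00', 18000),
  ('Car Town',    'Mini Cooper', 'Dave',  '2005-02-04', '13:00', 24000);

-- Stand-in for the materialized view: the aggregated rows are stored in a
-- table, which would have to be rebuilt when Sales changes.
CREATE TABLE vSales AS
  SELECT dealer, city, model AS car, manufacturer AS manf,
         SUM(price) AS sales
  FROM Sales NATURAL JOIN Dealers NATURAL JOIN Cars
  GROUP BY dealer, city, model, manufacturer;
""")

# The query against the stored view ...
from_view = conn.execute(
    "SELECT dealer, car, sales FROM vSales "
    "WHERE city = 'Indianapolis' AND manf = 'BMW'").fetchall()
# ... and the same question answered directly from the base tables.
direct = conn.execute(
    "SELECT dealer, model, SUM(price) "
    "FROM Sales NATURAL JOIN Dealers NATURAL JOIN Cars "
    "WHERE city = 'Indianapolis' AND manufacturer = 'BMW' "
    "GROUP BY dealer, model").fetchall()
print(from_view == direct, from_view)
```

The query on vSales needs no join and no aggregation at query time, which is where the expected speedup comes from.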
MOLAP and Data Cubes
Keys of dimension tables are the dimensions of a hypercube
  Example: for the Sales data, the four dimensions are Dealers, Cars, Buyers, and time
Dependent attributes (e.g., price) appear at the points of the cube
Defining a Cube
[Figure: a cube with a Time Dimension (Q1, Q2, Q3, Q4), a Products Dimension (Apples, Cherries, Grapes, Melons), and a Market Dimension (Atlanta, Chicago, Denver, Detroit)]
Querying a Cube
[Figure: a cube with a Time Dimension (Q1, Q2, Q3, Q4), a Products Dimension (Apples, Cherries, Grapes, Melons), and a Markets Dimension (Atlanta, Chicago, Denver, Dallas), with a “Sales Fact” label]
Defining a Cube Slice
[Figure: a cube with a Time Dimension (Q1, Q2, Q3, Q4), a Products Dimension (Apples, Cherries, Grapes, Melons), and a Markets Dimension (Atlanta, Chicago, Denver, Detroit)]
Working with Dimensions and Hierarchies
Dimensions Allow You to
  Slice
  Dice
Hierarchies Allow You to
  Drill Down
  Drill Up
Marginals
The data cube also includes aggregation (typically SUM) along the margins of the cube
The marginals include aggregations over one dimension, two dimensions,…
Example: Marginals
Our 4-dimensional Sales cube includes the sum of price over each dealer, each car, each buyer, and each time unit (perhaps days)
It would also have the sum of price over all dealer-model pairs, all dealer-buyer-day triples,…
Structure of the Cube
Think of each dimension as having an additional value *
A point with one or more *’s in its coordinates aggregates over the dimensions with the *’s.
Example: Sales(“Auto Nation”, “Mini Cooper”, *, *) holds the sum over all Buyers and all time of the Mini Coopers bought at Auto Nation
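The * convention can be illustrated with a small Python sketch that computes every marginal of a tiny cube. The facts are made up and the function name build_cube is hypothetical; a real MOLAP system would not necessarily materialize every point this way.

```python
from collections import defaultdict
from itertools import product

STAR = "*"  # the extra "all values" coordinate on every dimension

# Made-up facts: (dealer, model, buyer, day) -> price.
facts = {
    ("Auto Nation", "Mini Cooper", "Alice", "Mon"): 25000,
    ("Auto Nation", "Mini Cooper", "Bob", "Tue"): 26000,
    ("Auto Nation", "Civic", "Carol", "Tue"): 18000,
    ("Car Town", "Mini Cooper", "Dave", "Mon"): 24000,
}

def build_cube(facts):
    """Add each fact to every point obtained by replacing some coordinates with *."""
    cube = defaultdict(int)
    for point, price in facts.items():
        # For each coordinate, either keep the value or replace it with *.
        for coords in product(*[(value, STAR) for value in point]):
            cube[coords] += price
    return dict(cube)

cube = build_cube(facts)
# Sum over all buyers and all days of Mini Coopers bought at Auto Nation.
print(cube[("Auto Nation", "Mini Cooper", STAR, STAR)])
# Grand total: every dimension aggregated.
print(cube[(STAR, STAR, STAR, STAR)])
```

Each fact contributes to 2^4 = 16 points here, so fully materializing the marginals grows quickly with the number of dimensions.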
Drill-Down
Drill-down = “de-aggregate” = break an aggregate into its constituents
Example: having determined that Auto Nation sells very few BMW Cars, break down its sales by particular car
Roll-Up
Roll-up = aggregate along one or more dimensions.
Example: given a table of how many Mini Coopers each buyer buys at each dealer, roll it up into a table giving total number of Mini Coopers bought by each buyer
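Roll-up and its inverse, drill-down, can be sketched on a small table held in a Python dictionary. The counts are made up and the helper names roll_up and drill_down are hypothetical.

```python
from collections import Counter

# Made-up counts of Mini Coopers bought: (buyer, dealer) -> number of cars.
minis = {
    ("Alice", "Auto Nation"): 1,
    ("Alice", "Car Town"): 2,
    ("Bob", "Auto Nation"): 1,
}

def roll_up(table, keep):
    """Aggregate `table`, keeping only the key positions listed in `keep`."""
    totals = Counter()
    for key, count in table.items():
        totals[tuple(key[i] for i in keep)] += count
    return dict(totals)

def drill_down(table, position, value):
    """Return the detailed rows that make up one rolled-up value."""
    return {key: n for key, n in table.items() if key[position] == value}

per_buyer = roll_up(minis, keep=[0])      # aggregate away the dealer
print(per_buyer)
print(drill_down(minis, 0, "Alice"))      # break Alice's total back down
```

Rolling up discards the dealer coordinate; drilling down goes back to the detailed table to recover the rows behind one total.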
Materialized Data-Cube Views
Data cubes invite materialized views that are aggregations in one or more dimensions
Dimensions may not be completely aggregated; an option is to group by an attribute of the dimension table
Example
A materialized view for our Sales data cube might:
1. Aggregate by buyer completely
2. Not aggregate at all by car
3. Aggregate by time according to the week
4. Aggregate according to the city of the dealer
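One way such a view could be computed is shown below, as a sketch in SQLite from Python. The rows are made up, day is assumed to be stored as an ISO date string, and SQLite's strftime with '%Y-%W' is used as one possible definition of "week".

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Dealers(dealer TEXT, city TEXT, zip TEXT);
CREATE TABLE Sales(dealer TEXT, model TEXT, buyer TEXT,
                   day TEXT, time TEXT, price INTEGER);

-- Made-up rows, for illustration only.
INSERT INTO Dealers VALUES ('Auto Nation', 'Indianapolis', '46201'),
                           ('Indy Motors', 'Indianapolis', '46202'),
                           ('Car Town', 'Chicago', '60601');
INSERT INTO Sales VALUES
  ('Auto Nation', 'Mini Cooper', 'Alice', '2005-02-01', '10:00', 25000),
  ('Auto Nation', 'Mini Cooper', 'Bob',   '2005-02-02', '11:00', 26000),
  ('Indy Motors', 'Mini Cooper', 'Carol', '2005-02-03', '12:00', 23000),
  ('Auto Nation', 'Mini Cooper', 'Dave',  '2005-02-09', '13:00', 27000),
  ('Car Town',    'Mini Cooper', 'Erin',  '2005-02-01', '14:00', 24000);
""")

# buyer is absent from GROUP BY (aggregated completely), model is kept as is,
# day is coarsened to a week, and dealer is coarsened to the dealer's city.
rows = conn.execute("""
SELECT model,
       strftime('%Y-%W', day) AS week,
       city,
       SUM(price) AS sales
FROM Sales NATURAL JOIN Dealers
GROUP BY model, week, city
ORDER BY week, city
""").fetchall()
for row in rows:
    print(row)
```

The two Indianapolis dealers collapse into one city-level row per week, which is the partial aggregation along the dealer dimension that the slide describes.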
Data Mining
Data mining is a popular term for queries that summarize big data sets in useful ways
Examples:
1. Clustering all Web pages by topic
2. Finding characteristics of fraudulent credit-card use
Market-Basket Data
An important form of mining from relational data involves market baskets: sets of “items” that are purchased together as a customer leaves a store
Summary of basket data is frequent itemsets: sets of items that often appear together in baskets
Example: Market Baskets
If people often buy bread and butter together, the store can:
1. Put bread and butter near each other and put potato chips between the two
2. Run a sale on bread and raise the price of butter
Finding Frequent Pairs
The simplest case is when we only want to find “frequent pairs” of items.
Assume data is in a relation Baskets(basket, item)
The support threshold s is the minimum number of baskets in which a pair appears before we are interested
Frequent Pairs in SQL
SELECT b1.item, b2.item
FROM Baskets b1, Baskets b2
WHERE b1.basket = b2.basket
AND b1.item < b2.item
GROUP BY b1.item, b2.item
HAVING COUNT(*) >= s;
Look for two Baskets tuples with the same basket and different items. First item must precede second, so we don’t count the same pair twice.
Create a group for each pair of items that appears in at least one basket.
Throw away pairs of items that do not appear at least s times.
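The frequent-pairs query can be tried on a handful of baskets with SQLite from Python. The basket contents are made up, the function name frequent_pairs is hypothetical, and the sketch assumes each (basket, item) row appears at most once; the ORDER BY is added only to make the output deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Baskets(basket INTEGER, item TEXT)")

# Made-up baskets, for illustration only.
baskets = {
    1: ["bread", "butter", "chips"],
    2: ["bread", "butter"],
    3: ["bread", "chips"],
    4: ["butter", "milk"],
}
conn.executemany(
    "INSERT INTO Baskets VALUES (?, ?)",
    [(b, item) for b, items in baskets.items() for item in items],
)

def frequent_pairs(conn, s):
    """Return the item pairs that appear together in at least s baskets."""
    return conn.execute(
        """
        SELECT b1.item, b2.item
        FROM Baskets b1, Baskets b2
        WHERE b1.basket = b2.basket
          AND b1.item < b2.item
        GROUP BY b1.item, b2.item
        HAVING COUNT(*) >= ?
        ORDER BY b1.item, b2.item
        """,
        (s,),
    ).fetchall()

pairs = frequent_pairs(conn, 2)
print(pairs)
```

Raising s shrinks the answer: with this data no pair reaches a support of 3.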
Summary
OLAP vs. OLTP
  Two different worlds
Warehousing
Data Cubes
Data Mining
Materialized views
  Storing aggregate data