March 30 2001 DGRC FedStats Visit

Aggregation in Main Memory

Kenneth A. Ross

Columbia University

Research Experience

Complex query processing
Data Warehousing
Main memory databases

Students: Kazi Zaman, Junyan Ding

Scenario A

[Diagram labels: User, Query, Mediator, Unified Results, Main-Memory DBMS, Traditional DBMS, ...]

Scenario B

[Diagram labels: User, Sequence of Interactive Queries, Main Memory DB, Data Request, Mediator, Unified Results, Web, Traditional DBMS, ...]

Scenario C

[Diagram labels: User, Graphical User Interface, Dynamic Query, Main Memory DB, Data Request, Mediator, Unified Results, Web, Traditional DBMS, ...]

Outline

Introduction to Datacubes
Frameworks for querying cubes
The Main Memory based framework
Experimental Results
Conclusions and Plan

The CUBE BY Operator

Input relation:

State  Year  Grade    Sales
CA     1997  Regular  90
NY     1997  Premium  70
CA     1998  Premium  65
NY     1998  Premium  95

Result of CUBE BY (sum Sales), showing one original record and some of the additional records:

State  Year  Grade    Sales
CA     1997  Regular  90
CA     1997  ALL      90
ALL    1997  Regular  90
CA     ALL   Regular  90
ALL    1997  ALL      160
ALL    ALL   Regular  90
CA     ALL   ALL      155
ALL    ALL   ALL      320
...

Large increase in total size, especially with many dimensions.
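
The CUBE BY computation on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the implementation from the talk: the function name `cube_by_sum` and the use of the string "ALL" as a rollup marker are assumptions made for the example. With d dimensions, each base record contributes to 2^d group-bys, which is where the size increase comes from.

```python
from collections import defaultdict
from itertools import product

# Base table from the slide: (State, Year, Grade, Sales).
ROWS = [
    ("CA", 1997, "Regular", 90),
    ("NY", 1997, "Premium", 70),
    ("CA", 1998, "Premium", 65),
    ("NY", 1998, "Premium", 95),
]

ALL = "ALL"  # assumed marker for a rolled-up dimension


def cube_by_sum(rows, num_dims):
    """Return {group key: summed measure} over every subset of dimensions."""
    cube = defaultdict(int)
    for row in rows:
        dims, measure = row[:num_dims], row[num_dims]
        # Each mask says, per dimension, whether to keep the value or roll it up to ALL.
        for mask in product((False, True), repeat=num_dims):
            key = tuple(ALL if rolled else value for value, rolled in zip(dims, mask))
            cube[key] += measure
    return dict(cube)


cube = cube_by_sum(ROWS, num_dims=3)
print(len(ROWS), "base records,", len(cube), "cube records")
print(cube[("CA", ALL, ALL)], cube[(ALL, 1997, ALL)], cube[(ALL, ALL, ALL)])
```

The three printed aggregates correspond to the (CA, ALL, ALL), (ALL, 1997, ALL) and (ALL, ALL, ALL) rows of the slide's result table.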

Lattice Representation

State, Year, Grade

State, Year State, Grade Year, Grade

State Year Grade

Modeling Queries

Slice Queries ask for a single aggregate record

    SELECT State, year, sum(sales)
    FROM BLS-12345
    GROUP BY State, year
    HAVING State = 'NY' AND year = '1998'
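
For readers who prefer code to SQL, a slice query can be sketched as a filter-and-sum over in-memory rows. The contents of BLS-12345 are not given in the slides, so this sketch reuses the small sales table from the CUBE BY slide as stand-in data; `slice_query` and `DIMS` are hypothetical names.

```python
# Stand-in data: the sales table from the CUBE BY slide (State, Year, Grade, Sales).
ROWS = [
    ("CA", 1997, "Regular", 90),
    ("NY", 1997, "Premium", 70),
    ("CA", 1998, "Premium", 65),
    ("NY", 1998, "Premium", 95),
]
DIMS = ("State", "Year", "Grade")


def slice_query(rows, **fixed):
    """Sum Sales over rows matching every fixed dimension.

    Dimensions that are not named are treated as ALL.
    Returns None when no row matches, i.e. the aggregate record does not exist.
    """
    positions = {name: i for i, name in enumerate(DIMS)}
    total, matched = 0, False
    for row in rows:
        if all(row[positions[name]] == value for name, value in fixed.items()):
            total += row[-1]
            matched = True
    return total if matched else None


# Analogue of the slide's query: State = NY and Year = 1998, Grade rolled up.
print(slice_query(ROWS, State="NY", Year=1998))
```

A scan like this touches every row, which is the cost the two-level framework later in the talk is designed to avoid.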

Existing Frameworks

State, Year, Grade

State, Year State, Grade Year, Grade

State Year Grade

Choose a subset of the cube to materialize based on the workload. Materialize it on disk.

The appropriate record is retrieved or computed for each incoming slice query.

Drawbacks: Ignores the clustering of the relation on disk. The smallest unit of materialization is too big.

Our approach

State, Year, Grade

State, Year State, Grade Year, Grade

State Year Grade

The full cube is often larger than available memory, but ...

The finest granularity aggregate may fit.

Any record can be computed without having to go to disk.

How should the finest granularity be organized?

Framework

[Diagram: a query q is checked against the Level-1 Store (selected coarse records in a hash table) and the Level-2 Store (the finest granularity cuboid: a slot directory pointing to records in linked lists).]

The Level-1 Store

Records are <Key, Value> pairs stored in a hash table.

Records can contain ALLs.

Given a query Q, form the composite key and check the level-1 store (constant time).

If the key is not found, use the level-2 store.

Key  Value
a1   55
b2   34
c2   12
...  ...
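
The lookup order described on this slide can be sketched as follows. This is an illustrative sketch under stated assumptions: a Python dict stands in for the hash table, a tuple of per-dimension values (possibly "ALL") stands in for the composite key, and `scan_fallback` is a placeholder for the level-2 store that simply scans the sales table from the CUBE BY slide. All class and function names are hypothetical.

```python
ALL = "ALL"  # assumed marker for a rolled-up dimension

# Stand-in base data (State, Year, Grade, Sales) from the CUBE BY slide.
ROWS = [
    ("CA", 1997, "Regular", 90),
    ("NY", 1997, "Premium", 70),
    ("CA", 1998, "Premium", 65),
    ("NY", 1998, "Premium", 95),
]


class Level1Store:
    """Hash table of selected coarse records, keyed by a composite key."""

    def __init__(self):
        self._table = {}

    @staticmethod
    def composite_key(values):
        # A tuple of per-dimension values (possibly ALL) serves as the composite key.
        return tuple(values)

    def put(self, values, aggregate):
        self._table[self.composite_key(values)] = aggregate

    def get(self, values):
        # Expected constant-time dictionary lookup; None signals a miss.
        return self._table.get(self.composite_key(values))


def scan_fallback(query):
    # Placeholder for the level-2 store: aggregate the base rows matching the query.
    return sum(row[3] for row in ROWS
               if all(q == ALL or q == v for q, v in zip(query, row[:3])))


def answer(query, level1, level2_lookup):
    """Check level 1 first; on a miss, compute the record from level 2."""
    hit = level1.get(query)
    if hit is not None:
        return hit, "level-1"
    return level2_lookup(query), "level-2"


level1 = Level1Store()
level1.put(("CA", ALL, ALL), 155)  # one coarse record chosen for materialization
print(answer(("CA", ALL, ALL), level1, scan_fallback))
print(answer(("NY", 1998, ALL), level1, scan_fallback))
```

The first query is answered from the hash table; the second misses and is computed by the fallback.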

The Level-2 Store

The level-2 store holds the finest granularity cuboid.

The slot directory is organized as a multidimensional array: level2[sz1][sz2][sz3][sz4]

Each slot points to a linked list of records.

Records are placed according to a set of mapping functions H.
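
A slot directory of this shape can be sketched in Python. The slides do not say what the mapping functions H are, so the byte-sum-modulo function below is purely an assumption for illustration, as are the names `Level2Store` and `make_mapper`. A flat row-major list stands in for the multidimensional array, and Python lists stand in for the linked lists.

```python
# Stand-in base data (State, Year, Grade, Sales) from the CUBE BY slide.
ROWS = [
    ("CA", 1997, "Regular", 90),
    ("NY", 1997, "Premium", 70),
    ("CA", 1998, "Premium", 65),
    ("NY", 1998, "Premium", 95),
]


def make_mapper(size):
    """Hypothetical mapping function: byte sum of the value's text, modulo the slot count."""
    def mapper(value):
        return sum(str(value).encode()) % size
    return mapper


class Level2Store:
    """Finest granularity cuboid behind a multidimensional slot directory."""

    def __init__(self, sizes):
        self.sizes = tuple(sizes)
        # H: one mapping function per dimension, value -> slot index in [0, size).
        self.mappers = [make_mapper(size) for size in self.sizes]
        total = 1
        for size in self.sizes:
            total *= size
        # Flat row-major array standing in for level2[sz1][sz2]...; one list per slot.
        self.slots = [[] for _ in range(total)]

    def coords(self, key):
        return tuple(h(value) for h, value in zip(self.mappers, key))

    def flat_index(self, coords):
        index = 0
        for coord, size in zip(coords, self.sizes):
            index = index * size + coord
        return index

    def insert(self, key, measure):
        bucket = self.slots[self.flat_index(self.coords(key))]
        for record in bucket:
            if record[0] == tuple(key):
                record[1] += measure  # same finest-granularity key: fold into one record
                return
        bucket.append([tuple(key), measure])


store = Level2Store(sizes=(2, 2, 2))
for state, year, grade, sales in ROWS:
    store.insert((state, year, grade), sales)

for index, bucket in enumerate(store.slots):
    if bucket:
        print(index, bucket)
```

The directory sizes trade memory for shorter lists: larger sizes mean more slots and fewer records per list.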

Using the Level-2 store

Query Q without ALLs: (a3, b4, c2, d5)

The mapping functions give: a3 -> Slot 4, b4 -> Slot 3, c2 -> Slot 7, d5 -> Slot 1.

Access the list denoted by level2[4][3][7][1]; aggregate the records matching (a3, b4, c2, d5).

Using the Level-2 store

Query Q with ALLs: (a3, ALL, c2, ALL)

The mapping functions give: a3 -> Slot 4, ALL -> list of slots, c2 -> Slot 7, ALL -> list of slots.

Access the lists matching level2[4][*][7][*]; aggregate the records matching (a3, *, c2, *).
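
The wildcard access pattern can be sketched as follows. This is a minimal illustration under assumptions, not the talk's implementation: it uses three dimensions (the sales table from the CUBE BY slide) rather than the four in the slide's example, a dict keyed by slot coordinates in place of the multidimensional array, and a made-up mapping function `h`. A fixed value selects one slot in its dimension; an ALL selects every slot in that dimension.

```python
from itertools import product

ALL = "ALL"        # assumed marker for a rolled-up dimension
SIZES = (2, 2, 2)  # assumed slot-directory size per dimension

# Stand-in base data (State, Year, Grade, Sales) from the CUBE BY slide.
ROWS = [
    ("CA", 1997, "Regular", 90),
    ("NY", 1997, "Premium", 70),
    ("CA", 1998, "Premium", 65),
    ("NY", 1998, "Premium", 95),
]


def h(value, size):
    """Hypothetical mapping function: byte sum of the value's text, modulo the slot count."""
    return sum(str(value).encode()) % size


def build(rows):
    """Place each record in the slot chosen by the per-dimension mapping functions."""
    level2 = {coords: [] for coords in product(*(range(s) for s in SIZES))}
    for *key, measure in rows:
        coords = tuple(h(v, s) for v, s in zip(key, SIZES))
        level2[coords].append((tuple(key), measure))
    return level2


def query(level2, q):
    """Return (aggregate, number of slots visited) for a query that may contain ALLs."""
    # Per dimension: one slot if the value is fixed, every slot if it is ALL.
    choices = [range(s) if v == ALL else (h(v, s),) for v, s in zip(q, SIZES)]
    total, visited = 0, 0
    for coords in product(*choices):
        visited += 1
        for key, measure in level2[coords]:
            # Several values can share a slot, so re-check the fixed values.
            if all(qv == ALL or qv == kv for qv, kv in zip(q, key)):
                total += measure
    return total, visited


level2 = build(ROWS)
print(query(level2, ("NY", 1998, "Premium")))  # no ALLs: a single list is accessed
print(query(level2, ("CA", ALL, "Premium")))   # one ALL: every slot along Year
```

The number of lists visited grows with the number of ALLs in the query, which is why the coarsest records are the natural candidates for the level-1 store.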

Demo

Shows a multidimensional dataset (a subset of the columns of the 5% Census sample for NY in 1990).

User asks queries: fast answers.

Future: user interface asks many queries, with the display changing interactively.

Experimental Results

Query Processing Time vs Additional Memory Used (real dataset, 10^6 records, 8 dimensions)

[Plot: average time per query in milliseconds (y-axis, 0 to 15) against additional memory used in MB (x-axis, 0 to 80); series: Query Cost.]

Scanning all records takes 194 ms.

Importance of Work

•Aggregation is fundamental to analysis.

•Make analysis interactive, even for many dimensions.

•Make a variety of aggregate granularities available, where possible.

Contributions

A Main Memory based framework for answering datacube queries efficiently.

Query times in the 2-4 ms range, which is faster than going to disk.

Plan

Integrate with user interface to generate dynamic queries.

Self-tuning capability.

Multiple data sets.

Work with agencies to generate value
- For intra-agency analysis
- For enhanced data dissemination

