+ All Categories
Home > Documents > ORDB Implementation Discussion. Ramakrishnan and Gehrke. Database Management Systems, 3 rd Edition....

ORDB Implementation Discussion. Ramakrishnan and Gehrke. Database Management Systems, 3 rd Edition....

Date post: 21-Dec-2015
Category:
View: 216 times
Download: 1 times
Share this document with a friend
Popular Tags:
21
ORDB Implementation Discussion
Transcript
Page 1: ORDB Implementation Discussion. Ramakrishnan and Gehrke. Database Management Systems, 3 rd Edition. From RDB to ORDB Issues to address when adding OO.

ORDB ImplementationDiscussion

Page 2: ORDB Implementation Discussion. Ramakrishnan and Gehrke. Database Management Systems, 3 rd Edition. From RDB to ORDB Issues to address when adding OO.

Ramakrishnan and Gehrke. Database Management Systems, 3rd Edition.

From RDB to ORDB

Issues to address whenadding OO extensions to

DBMS system

Page 3: ORDB Implementation Discussion. Ramakrishnan and Gehrke. Database Management Systems, 3 rd Edition. From RDB to ORDB Issues to address when adding OO.

Ramakrishnan and Gehrke. Database Management Systems, 3rd Edition.

Layout of Data

• Deal with large data types : ADTs/blobs• special-purpose file space for such data, with

special access methods• Large fields in one tuple :

• One single tuple may not even fit on one disk page

• Must break into sub-tuples and link via disk pointers

• Flexible layout : • constructed types may have flexible sized sets, ,

e.g., one attribute can be a set of strings.• Need to provide meta-data inside each type

concerning layout of fields within the tuple• Insertion/deletion will cause problems when

contiguous layout of ‘tuples’ is assumed

Page 4: ORDB Implementation Discussion. Ramakrishnan and Gehrke. Database Management Systems, 3 rd Edition. From RDB to ORDB Issues to address when adding OO.

Ramakrishnan and Gehrke. Database Management Systems, 3rd Edition.

Layout of Data

• More layout design choices (clustering on disk):

• Lay out complex object nested and clustered on disk (if nested and not pointer based)

• Where to store objects that are referenced (shared) by possibly several other and different structures

• Many design options for objects that are in a type hierarchy with inheritance

• Constructed types such as arrays require novel methods, like array chunking into (4x4) subarrays for non-continuous access

Page 5: ORDB Implementation Discussion. Ramakrishnan and Gehrke. Database Management Systems, 3 rd Edition. From RDB to ORDB Issues to address when adding OO.

Ramakrishnan and Gehrke. Database Management Systems, 3rd Edition.

Objects/OIDs

• OID generation : uniqueness across time and system

• Object reference handling : • must avoid dangling references• semantics for object manipulation for

shared objects

Page 6: ORDB Implementation Discussion. Ramakrishnan and Gehrke. Database Management Systems, 3 rd Edition. From RDB to ORDB Issues to address when adding OO.

Ramakrishnan and Gehrke. Database Management Systems, 3rd Edition.

ADTs

•Type representation: size/storage•Type access : import/export•Type manipulation: special methods

to serve as filter predicates and join predicates

•Special-purpose index structures : efficiency

Page 7: ORDB Implementation Discussion. Ramakrishnan and Gehrke. Database Management Systems, 3 rd Edition. From RDB to ORDB Issues to address when adding OO.

Ramakrishnan and Gehrke. Database Management Systems, 3rd Edition.

ADTs

• Mechanism to add index support along with ADT:• External storage of index file outside DBMS• Provide “access method interface” a la:

• Open(), close(), search(x), retrieve-next()• Plus, statistics on external index

• Or, generic ‘template’ index structure • Generalized Search Tree (GiST) – user-extensible• Concurrency/recovery provided

Page 8: ORDB Implementation Discussion. Ramakrishnan and Gehrke. Database Management Systems, 3 rd Edition. From RDB to ORDB Issues to address when adding OO.

Ramakrishnan and Gehrke. Database Management Systems, 3rd Edition.

Query Processing

• Query Parsing :• Type checking for methods• Subtyping/Overriding

• Query Rewriting:• May translate path expressions into join

operators• Deal with collection hierarchies (UNION?)• Indices or extraction out of collection

hierarchy

Page 9: ORDB Implementation Discussion. Ramakrishnan and Gehrke. Database Management Systems, 3 rd Edition. From RDB to ORDB Issues to address when adding OO.

Ramakrishnan and Gehrke. Database Management Systems, 3rd Edition.

Query Optimization Core

• New algebra operators must be designed :• such as nest, unnest, array-ops,

values/objects, etc.

• Query optimizer must integrate them into optimization process :• New Rewrite rules• New Costing• New Heuristics

Page 10: ORDB Implementation Discussion. Ramakrishnan and Gehrke. Database Management Systems, 3 rd Edition. From RDB to ORDB Issues to address when adding OO.

Ramakrishnan and Gehrke. Database Management Systems, 3rd Edition.

Query Optimization Revisited

• Existing algebra operators revisited : SELECT

• Where clause expressions can be expensive

• So SELECT pushdown may be bad heuristic

Page 11: ORDB Implementation Discussion. Ramakrishnan and Gehrke. Database Management Systems, 3 rd Edition. From RDB to ORDB Issues to address when adding OO.

Ramakrishnan and Gehrke. Database Management Systems, 3rd Edition.

Selection Condition Rewriting

• EXAMPLE:• (tuple.attribute < 50)

• Only CPU time (on the fly)

• (tuple.location OVERLAPS lake-object)• Possibly complex CPU-heavy computations • May Involve both IO and CPU costs

• State-of-art: • consider reduction factor only

• Now, we must consider both factors:• Cost factor : dramatic variations • Reduction factor: unrelated to cost factor

Page 12: ORDB Implementation Discussion. Ramakrishnan and Gehrke. Database Management Systems, 3 rd Edition. From RDB to ORDB Issues to address when adding OO.

Ramakrishnan and Gehrke. Database Management Systems, 3rd Edition.

Operator Ordering

op1

op2

Page 13: ORDB Implementation Discussion. Ramakrishnan and Gehrke. Database Management Systems, 3 rd Edition. From RDB to ORDB Issues to address when adding OO.

Ramakrishnan and Gehrke. Database Management Systems, 3rd Edition.

Ordering of SELECT Operators

• Cost factor : dramatic variations • Reduction factor: orthogonal to cost factor• We want: maximal reduction and minimal cost• Rank ( operator ) = (reduction) * ( 1/cost ) • Order operators by increasing ‘rank’

• High rank (good) -> low in cost, and large reduction

• Low rank (bad) -> high in cost, and small reduction

Page 14: ORDB Implementation Discussion. Ramakrishnan and Gehrke. Database Management Systems, 3 rd Edition. From RDB to ORDB Issues to address when adding OO.

Ramakrishnan and Gehrke. Database Management Systems, 3rd Edition.

Access Methods ( on what ?)

• Indexes that are ADT specific• Indexes on navigation path• Indexes on methods, not just on

columns• Indexes over collection hierarchies

(trade-offs)• Indexes for new WHERE clause

expressions not just =, <, > ; but also “overlaps”

Page 15: ORDB Implementation Discussion. Ramakrishnan and Gehrke. Database Management Systems, 3 rd Edition. From RDB to ORDB Issues to address when adding OO.

Ramakrishnan and Gehrke. Database Management Systems, 3rd Edition.

Registering New Index (to Optimizer)

• What WHERE conditions it supports• Estimated cost for “matching tuple”

• Given by index designer (user?)• Monitor statistics; even construct test plans

• Estimation of reduction factors/join factors:• Register auxiliary function to estimate factor• Provide simple defaults• Estimation of method costs (~IO/CPU)

Page 16: ORDB Implementation Discussion. Ramakrishnan and Gehrke. Database Management Systems, 3 rd Edition. From RDB to ORDB Issues to address when adding OO.

Ramakrishnan and Gehrke. Database Management Systems, 3rd Edition.

Methods

• Dynamic linking of methods (outside DB)• Overwriting methods for type hierarchy• Use of “methods” with implied semantics• Incorporation of methods into query

process : termination? • “untrusted” methods : methods corrupt

server or modify DB content (side effects)• Handling of “untrusted” methods :

• restrict language; interpret vs compile, separate address space as DB server

Page 17: ORDB Implementation Discussion. Ramakrishnan and Gehrke. Database Management Systems, 3 rd Edition. From RDB to ORDB Issues to address when adding OO.

Ramakrishnan and Gehrke. Database Management Systems, 3rd Edition.

Query Optimization with Methods

• Estimation of “costs” of method predicates• Optimization of Method execution:

• Similar idea as handling correlated nested subqueries; must recognize repetition and rewrite physical plan.

• Provide some level of precomputation and reuse

• Optimization of Method execution:• 1. If called on same input, cache that one result• 2. If on full column, presort column first (groupby)• 3. Or, precompute results of methods for each

possible value in domain; and put in hash-table : fct (val );

Look up in hash-table during query processing or even join with it, instead of recomputing : val fct (val)

Page 18: ORDB Implementation Discussion. Ramakrishnan and Gehrke. Database Management Systems, 3 rd Edition. From RDB to ORDB Issues to address when adding OO.

Ramakrishnan and Gehrke. Database Management Systems, 3rd Edition.

Query Processing• User-defined aggregate functions:

• E.g., “second largest” or “second yellowest”

• Distributive aggregates: incremental computation • Provide:

• Initialize(): set up state space• Iterate(): per tuple update the state• Terminate(): compute final result based on state; and

cleanup state

• For example : “second largest” • Initialize(): 2 fields• Iterate(): per tuple compare numbers• Terminate(): remove 2 fields

Page 19: ORDB Implementation Discussion. Ramakrishnan and Gehrke. Database Management Systems, 3 rd Edition. From RDB to ORDB Issues to address when adding OO.

Ramakrishnan and Gehrke. Database Management Systems, 3rd Edition.

Following Disk Pointers?

• Complex object structures with object pointers may exist (~ disk pointers)

• Navigate complex object into memory for a long-running transaction like in CAD design

• What to do about “pointers” between subobjects or related objects ?• Swizzle = replace OIDs dereferences by in-

memory pointers, and unswizzle back at end.• Issues : In-memory table of OIDs and their state;

indicate in each object pointer via a bit.• Different policies for swizzling: on access,

attached to object brought in, etc.

Page 20: ORDB Implementation Discussion. Ramakrishnan and Gehrke. Database Management Systems, 3 rd Edition. From RDB to ORDB Issues to address when adding OO.

Ramakrishnan and Gehrke. Database Management Systems, 3rd Edition.

Models of Persistence• Different models of persistence for OODB

implementations:• Parallel type systems:

• E.g., int and dbint• User must make decision at object creation time• Allow for user control by “casting” types

• Persistence by container management:• Objects must be placed into “persistent containers” such as

relations in order to stay around• Eg., Insert o into Collection MyBooks;• Could be rather dynamic control without casting

• Persistence by reachability :• Use global variable names to objects and structures• Objects being referenced by other objects that are reachable

by application, they by transitivity are also persistent.• need garbage collection

Page 21: ORDB Implementation Discussion. Ramakrishnan and Gehrke. Database Management Systems, 3 rd Edition. From RDB to ORDB Issues to address when adding OO.

Ramakrishnan and Gehrke. Database Management Systems, 3rd Edition.

Summary

• A lot of work to get there: From physical database

design/layout issues up to logical query optimizer extensions

• ORDB: reuses existing implementation base and incrementally adds new features on (but relation is first-class citizen)


Recommended