ORDB Implementation Discussion. Ramakrishnan and Gehrke. Database Management Systems, 3 rd Edition....

ORDB ImplementationDiscussion

Ramakrishnan and Gehrke. Database Management Systems, 3rd Edition.

From RDB to ORDB

Issues to address whenadding OO extensions to

DBMS system


Layout of Data

• Deal with large data types : ADTs/blobs• special-purpose file space for such data, with

special access methods• Large fields in one tuple :

• One single tuple may not even fit on one disk page

• Must break into sub-tuples and link via disk pointers

• Flexible layout : • constructed types may have flexible sized sets, ,

e.g., one attribute can be a set of strings.• Need to provide meta-data inside each type

concerning layout of fields within the tuple• Insertion/deletion will cause problems when

contiguous layout of ‘tuples’ is assumed


Layout of Data

• More layout design choices (clustering on disk):

• Lay out complex object nested and clustered on disk (if nested and not pointer based)

• Where to store objects that are referenced (shared) by possibly several other and different structures

• Many design options for objects that are in a type hierarchy with inheritance

• Constructed types such as arrays require novel methods, like array chunking into (4x4) subarrays for non-continuous access


Objects/OIDs

• OID generation : uniqueness across time and system

• Object reference handling : • must avoid dangling references• semantics for object manipulation for

shared objects


ADTs

•Type representation: size/storage•Type access : import/export•Type manipulation: special methods

to serve as filter predicates and join predicates

•Special-purpose index structures : efficiency


ADTs

• Mechanism to add index support along with ADT:• External storage of index file outside DBMS• Provide “access method interface” a la:

• Open(), close(), search(x), retrieve-next()• Plus, statistics on external index

• Or, generic ‘template’ index structure • Generalized Search Tree (GiST) – user-extensible• Concurrency/recovery provided


Query Processing

• Query Parsing :• Type checking for methods• Subtyping/Overriding

• Query Rewriting:• May translate path expressions into join

operators• Deal with collection hierarchies (UNION?)• Indices or extraction out of collection

hierarchy


Query Optimization Core

• New algebra operators must be designed :• such as nest, unnest, array-ops,

values/objects, etc.

• Query optimizer must integrate them into optimization process :• New Rewrite rules• New Costing• New Heuristics


Query Optimization Revisited

• Existing algebra operators revisited : SELECT

• Where clause expressions can be expensive

• So SELECT pushdown may be bad heuristic


Selection Condition Rewriting

• EXAMPLE:• (tuple.attribute < 50)

• Only CPU time (on the fly)

• (tuple.location OVERLAPS lake-object)• Possibly complex CPU-heavy computations • May Involve both IO and CPU costs

• State-of-art: • consider reduction factor only

• Now, we must consider both factors:• Cost factor : dramatic variations • Reduction factor: unrelated to cost factor


Operator Ordering

op1

op2


Ordering of SELECT Operators

• Cost factor : dramatic variations • Reduction factor: orthogonal to cost factor• We want: maximal reduction and minimal cost• Rank ( operator ) = (reduction) * ( 1/cost ) • Order operators by increasing ‘rank’

• High rank (good) -> low in cost, and large reduction

• Low rank (bad) -> high in cost, and small reduction


Access Methods ( on what ?)

• Indexes that are ADT specific• Indexes on navigation path• Indexes on methods, not just on

columns• Indexes over collection hierarchies

(trade-offs)• Indexes for new WHERE clause

expressions not just =, <, > ; but also “overlaps”


Registering New Index (to Optimizer)

• What WHERE conditions it supports• Estimated cost for “matching tuple”

• Given by index designer (user?)• Monitor statistics; even construct test plans

• Estimation of reduction factors/join factors:• Register auxiliary function to estimate factor• Provide simple defaults• Estimation of method costs (~IO/CPU)


Methods

• Dynamic linking of methods (outside DB)• Overwriting methods for type hierarchy• Use of “methods” with implied semantics• Incorporation of methods into query

process : termination? • “untrusted” methods : methods corrupt

server or modify DB content (side effects)• Handling of “untrusted” methods :

• restrict language; interpret vs compile, separate address space as DB server


Query Optimization with Methods

• Estimation of “costs” of method predicates• Optimization of Method execution:

• Similar idea as handling correlated nested subqueries; must recognize repetition and rewrite physical plan.

• Provide some level of precomputation and reuse

• Optimization of Method execution:• 1. If called on same input, cache that one result• 2. If on full column, presort column first (groupby)• 3. Or, precompute results of methods for each

possible value in domain; and put in hash-table : fct (val );

Look up in hash-table during query processing or even join with it, instead of recomputing : val fct (val)


Query Processing• User-defined aggregate functions:

• E.g., “second largest” or “second yellowest”

• Distributive aggregates: incremental computation • Provide:

• Initialize(): set up state space• Iterate(): per tuple update the state• Terminate(): compute final result based on state; and

cleanup state

• For example : “second largest” • Initialize(): 2 fields• Iterate(): per tuple compare numbers• Terminate(): remove 2 fields


Following Disk Pointers?

• Complex object structures with object pointers may exist (~ disk pointers)

• Navigate complex object into memory for a long-running transaction like in CAD design

• What to do about “pointers” between subobjects or related objects ?• Swizzle = replace OIDs dereferences by in-

memory pointers, and unswizzle back at end.• Issues : In-memory table of OIDs and their state;

indicate in each object pointer via a bit.• Different policies for swizzling: on access,

attached to object brought in, etc.


Models of Persistence• Different models of persistence for OODB

implementations:• Parallel type systems:

• E.g., int and dbint• User must make decision at object creation time• Allow for user control by “casting” types

• Persistence by container management:• Objects must be placed into “persistent containers” such as

relations in order to stay around• Eg., Insert o into Collection MyBooks;• Could be rather dynamic control without casting

• Persistence by reachability :• Use global variable names to objects and structures• Objects being referenced by other objects that are reachable

by application, they by transitivity are also persistent.• need garbage collection


Summary

• A lot of work to get there: From physical database

design/layout issues up to logical query optimizer extensions

• ORDB: reuses existing implementation base and incrementally adds new features on (but relation is first-class citizen)

Date post:	21-Dec-2015
Category:	Documents
View:	216 times
Download:	1 times

ORDB Implementation Discussion. Ramakrishnan and Gehrke. Database Management Systems, 3 rd Edition....

Documents