
Journal of System Integration, 8, 5–30 (1998). © 1998 Kluwer Academic Publishers. Manufactured in The Netherlands.

Deriving Object Oriented Federated Databases and Processing Federated Queries

EE-PENG LIM [email protected] Centre for Advanced Information Systems, Nanyang Technological University, Singapore 639798

JAIDEEP SRIVASTAVA [email protected] Department of Computer Science, University of Minnesota, Minneapolis, MN 55455

Received July 25, 1996; Revised May 29, 1997

Abstract. In this paper, we present a federated query processing approach to evaluate queries on an Object-Oriented (OO) federated database. This approach has been designed and implemented in the OO-Myriad project, which is an OO extension to the Myriad FDBS research [1]. Since data integration is performed as part of federated query processing, we have proposed outerjoin, outer-difference and generalized attribute derivation operations, together with the traditional relational operations, to be used for integration purposes. To define an OO federated database as a virtual view on multiple OO export databases, we adopt a database mapping strategy that systematically derives each of the class extents, deep class extents and relationships of the federated database using an operator tree consisting of the integration operations. By augmenting federated database queries with this algebraic mapping information, query execution plans can be generated. Based on the original Myriad query processing framework, we have realized the proposed OO federated query processing approach in the OO-Myriad prototype.

Keywords: federated query processing, database integration, object-oriented view

1. Introduction

In most organizations, information is scattered among several departmental offices and is often stored in heterogeneous databases or in files administered by file systems with primitive retrieval capabilities. In addition, overlapping information can usually be found in different data sources. This is caused either by poor coordination among departments or by a compelling need to duplicate information for convenient and timely data access. Over time, organizations are confronted with fragmented information that hinders the development of new applications requiring integrated data [2, 3]. When such situations arise, a federated database system (FDBS) is needed to hide the heterogeneity among local databases from global users. In contrast with query processing in stand-alone or distributed database systems, federated query processing must resolve schematic and semantic conflicts among data from different databases in addition to evaluating query operations. Moreover, the architectural design of an FDBS query processor must preserve the autonomy of existing local database systems so as not to affect legacy database applications.

In [2], a five-level schema architecture for FDBSs was proposed. A number of its variations have since emerged, but there is still no clear winner [4]. In this paper, we adopt a similar but simpler architecture that consists of three levels, as shown in Figure 1. The local schemas describe the local databases in their respective data models. The export schemas describe the subsets of local databases made available to federated database users. Unlike the local schemas, the export schemas are expressed in the common OO data model. One or more export schemas may be defined on a local schema, allowing different aspects of the local schema to be tapped by different FDBSs. Each export schema can participate in the construction of zero or more federated schemas¹. At the federated schema level, we reconcile the discrepancies among export databases. In this paper, the classes and objects that belong to the federated database are termed global classes and global objects, respectively. While queries formulated against local schemas and federated schemas are handled by local DBS query processors and federated query processors, respectively, queries formulated against export schemas must be handled by gateways. The development and maintenance of a gateway may be managed either by the federated database administrator or by the local database administrator controlling the export schema that the gateway supports. In either case, the gateway must perform the appropriate data model translation when it processes an export database query. Although we have adopted a three-level schema architecture, it can easily be extended to allow more schema levels. For example, specially customized views can still be defined on a federated schema for applications accessing different portions of the federated schema.

Figure 1. Three-level Schema Architecture (local schema level: Local Schemas A, B and C; export schema level: Export Schemas A1, B1, C1 and C2; federated schema level: Federated Schemas 1, 2 and 3)

In the OO-Myriad project, we design a federated query processor that addresses heterogeneity and local autonomy issues in FDBSs. The design advocates an algebraic approach to mapping export databases to federated databases². OO-Myriad supports a set of integration operations that can be used to derive the OO federated database while resolving various kinds of schema and semantic conflicts between export databases. The main advantage of this approach is its generality and flexibility. It is designed with the fundamental belief that federated query processing requires a suitable set of integration operations in addition to the widely known query operations (e.g. select, project, and join in the relational model). The two sets of operations may share common members, but distinguishing them is essential. Since a federated database is usually a virtual view built upon export data, an FDBS query processor must be able to evaluate integration operations during query processing, as well as optimize execution plans consisting of these operations. While some language constructs may be used to specify the mapping from export databases to federated databases, these constructs should eventually be transformed into a series of integration operations before query processing can begin. Furthermore, the larger the set of 'useful' integration operations, the more varied the inter-database conflicts the FDBS can resolve. In this manner, the FDBS query processor design can also achieve extensibility.

In recent years, the object-oriented data model has gained wide acceptance among database users. The ability to capture superclass-subclass (IS-A) and aggregation (IS-PART-OF) relationships between classes enables the OO data model to represent real-world objects in a natural manner. In the past, a number of implemented FDBS prototypes, e.g. Myriad [1], CORD [5] and PRECI* [6], have chosen the relational model as the canonical data model to represent global schemas and export schemas. On the other hand, the design of FDBSs supporting the OO data model is still at an early stage of research. Equipped with OO global schemas and OO query support, an FDBS can integrate local DBs that may or may not be object-oriented while providing OO semantics to global applications. In OO-Myriad, we investigate the implications of supporting OO queries against global schemas built from OO export schemas. We show that the additional semantics required by a global schema complicate both its derivation from export schemas and federated query processing. At present, we focus on supporting basic object constructs and queries in OO-Myriad; complex OO features such as methods and aggregate data types are outside the scope.

In the following, we summarize the essential features of OO-Myriad query processing.

• An algebraic approach has been developed to execute FDBS queries that involve not only query operations but also integration operations, which resolve semantic and schematic conflicts among the OO schemas and objects exported by local DBSs. The integration operations outerjoin, outer-difference and join are used to merge objects in export classes from different local databases such that: (a) the union of objects becomes the objects of a global class; (b) the difference between two sets of objects becomes the objects of a global class; and (c) the intersection between sets of objects becomes the objects of a global class (typically a subclass with multiple parent classes). The merging of exported objects in the relational domain has been studied in [7, 8, 9, 4] and is known as the entity identification problem. Another integration operation, generalized attribute derivation, is used to resolve attribute conflicts between the objects to be merged. This is known as the attribute value conflict problem and has been studied in [10, 11, 12].

• To homogenize the OO query interface to local databases, and to allow local DBAs to export OO views over the local databases, OO gateways have been developed. In this way, local DBAs can control the amount of information accessible to the federation, based on their security needs or their contract with the federation DBA. For similar reasons, the local DBAs can further determine the query operations supported by OO gateways. Thus, OO-Myriad preserves the autonomy of local DBSs. Furthermore, OO applications can be built either on the OO-Myriad query processor or on its gateways.

• Like Myriad, the OO-Myriad query processing architecture supports a distributed query execution model. This is accomplished by designating a federated query agent at each local site to coordinate query execution and to execute operations not supported by the local DBS. Nevertheless, the architecture does not exclude centralized query execution (i.e. a special site distributes subqueries to the other local DBS sites and collects their query results).
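To make the union/difference/intersection reading of these integration operations concrete, here is a minimal Python sketch that merges export objects on a shared entity key. It is an illustration under stated assumptions, not OO-Myriad's implementation: objects are plain dicts, the entity key is a single attribute assumed unique within each export class, and results are kept as pairs of (possibly missing) export objects rather than as null-padded tuples. The data and function names are hypothetical.

```python
from typing import Any, Dict, List, Optional, Tuple

Obj = Dict[str, Any]
# A merged entry pairs the (possibly missing) object from each export class.
Pair = Tuple[Optional[Obj], Optional[Obj]]


def _index(objs: List[Obj], key: str) -> Dict[Any, Obj]:
    # Assumes the entity key value is unique within each export class.
    return {o[key]: o for o in objs}


def join(left: List[Obj], right: List[Obj], key: str) -> List[Pair]:
    """Intersection: entities represented in both export classes."""
    r = _index(right, key)
    return [(o, r[o[key]]) for o in left if o[key] in r]


def outer_difference(left: List[Obj], right: List[Obj], key: str) -> List[Pair]:
    """Difference: entities of `left` with no counterpart in `right`."""
    r = _index(right, key)
    return [(o, None) for o in left if o[key] not in r]


def outerjoin(left: List[Obj], right: List[Obj], key: str) -> List[Pair]:
    """Union: matched pairs plus the unmatched objects of either side."""
    left_keys = _index(left, key)
    unmatched_right: List[Pair] = [(None, o) for o in right if o[key] not in left_keys]
    return join(left, right, key) + outer_difference(left, right, key) + unmatched_right


# Hypothetical export objects sharing the entity key "name".
emp = [{"name": "Tan", "eno": 1}, {"name": "Lee", "eno": 2}]
cust = [{"name": "Lee", "phone": "555"}, {"name": "Ng", "phone": "777"}]

print(len(outerjoin(emp, cust, "name")))  # 3 entities: Lee (merged), Tan, Ng
```

Keeping both source objects in each pair defers any conflict between their attribute values to a separate derivation step, which matches the separation between entity identification and attribute value conflict resolution described above.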


In the following table, we summarize the major differences between OO-Myriad and its ancestor, Myriad.

Comparisons between Myriad and OO-Myriad

| Myriad [1] | OO-Myriad |
|---|---|
| Both the federated and export schemata are relational | Both the federated and export schemata are object-oriented |
| SQL as the query language | Object-oriented query language (MySQL) |
| Outerjoin and generalized attribute derivation operations are introduced as integration operations | Outerjoin, outer-difference and generalized attribute derivation operations are used as integration operations |
| Global relations are the only kind of collection of instances to be derived | Both global classes and relationships need to be derived; global oid generation is an issue |

1.1. Survey of Related Work and Comparisons

In the last few years, many FDBS designs have been proposed and a number of prototypes have been developed. In this subsection, we survey some of the related research efforts.

• The IRO-DB project [13, 14, 15], initiated under ESPRIT III, attempts to build an FDBS that supports OO query processing on an integrated schema. Its database integration approach establishes path correspondences between local schemas in order to augment a local schema with virtual classes and relationship attributes. The integrated schema consists of classes derived from the local databases using the ODMG [16] language. Nevertheless, since ODMG was originally designed for standalone object databases, its language does not provide specialized operations for database integration. So far, IRO-DB has not introduced new integration operations to overcome this inadequacy of ODMG.

• Pegasus [17, 18, 19] is a completed FDBS project by HP Labs. It supports type and object integration using two special functions, namely unifier and image. The unifier function maps a local type to a global type, while the image function maps a local object to a global object. Unlike OO-Myriad, which provides flexibility in defining global classes, Pegasus integrates a set of related imported local classes by building a virtual superclass over them. When multiple local objects correspond to the same real-world entity, their OID equivalences are stored in a dictionary. As shown in Figure 2, a virtual superclass employee must be defined on two imported local classes, engineer and researcher, in order to integrate the two closely related classes. Hence, a real-world employee entity may be represented by objects in the employee, engineer and researcher classes at the same time.

• An FDBS that allows objects to be shared on a pairwise basis has been developed in the Remote-exchange project [21, 20]. The approach makes remote objects accessible to a local database by creating local surrogates for them. Functions defined on local surrogates are implemented by performing retrievals on their corresponding remote objects. Since this project adopts a pairwise approach, it is not possible to define an integrated federated schema over more than two local databases, as OO-Myriad does.

Figure 2. Pegasus Integration Strategy - Virtual Superclass (Local DB1: RESEARCHER(name, address, phone, topic); Local DB2: ENGINEER(name, address, phone, job); the imported classes researcher and engineer are integrated under the virtual superclass employee; a single real-world entity may be modelled in both RESEARCHER and ENGINEER)

In addition to the projects mentioned above, some ongoing research efforts adopt a mediator architecture to integrate information from sources with different formats and structures [22]. In the TSIMMIS project [23, 24], a new data model known as OEM (Object Exchange Model) is proposed to represent and query heterogeneous information. However, since OEM is designed for semi-structured information, it may not be appropriate for integrating structured databases. In [25], a mediator system for heterogeneous structured databases, known as Information Manifold, is described. Information Manifold differs from OO-Myriad in that it is based on the relational model and does not attempt to integrate instances from multiple sources.

1.2. Paper Outline

This paper is organized as follows. Section 2 describes the OO data model and query language adopted in this paper; an example of OO export schemas is given in Section 2.1. We present the strategy for mapping OO export schemas to the global schema, together with an example, in Section 3. In the same section, we define the new integration operations used in the mapping. The OO-Myriad query processor architecture is given in Section 4. The query processing design considerations and steps are described in Section 5. Conclusions are given in Section 6.

2. OO-Myriad OO Schemas and Query Language

In OO-Myriad, both the federated databases and the export databases are described by their respective OO schemas. The following essential OO concepts are supported in these schemas:


• Specialization and generalization: An object class can be defined as the superclass of one or more object classes, known as its subclasses. As class inheritance is supported, the subclasses inherit the attributes of their superclasses.

Every object must belong to exactly one object class. The term extent of a class denotes the objects that belong to the class directly. Since the objects of a class are also indirect members of its superclasses, we call the set of direct and indirect object members of a class the deep class extent.

• Multiple inheritance: OO-Myriad supports multiple inheritance of class attributes. That is, a class can be a subclass of multiple classes and inherit all the attributes of these superclasses. Nevertheless, the names of both inherited and non-inherited attributes must be unique in order to distinguish between them. Attributes inherited (through different superclasses) from the same attribute of an ancestor class are considered identical.

• Object identifier (oid): In the OO data model, each object is assigned an identifier that is unique across all classes in a database. The oid is therefore a mandatory system-defined attribute of every class. Due to the autonomy of local DBSs, the oids of objects from different export databases may not be unique. In other words, the same oid may exist in different export databases even though the corresponding objects are unrelated. In OO-Myriad, global oids can be derived in the two ways described in Section 3.1.

• Relationship attributes: In an object class, one can define the attributes of the objects that belong to the class. Attributes are classified into simple attributes and relationship attributes. The former take values of the primitive data types supported by the data model. The latter relate one class of objects to another class of objects.
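The difference between a class extent and a deep class extent can be sketched in a few lines of Python. The lattice below is hypothetical (its class names mirror the PERSON/STUDENT/EMP/STUD-EMP lattice used for global oid generation in Section 3.1), and, as stated above, each object is assumed to belong directly to exactly one class.

```python
from typing import Dict, List, Set

# Hypothetical class lattice: class name -> direct superclasses.
SUPERCLASSES: Dict[str, List[str]] = {
    "PERSON": [],
    "STUDENT": ["PERSON"],
    "EMP": ["PERSON"],
    "STUD-EMP": ["STUDENT", "EMP"],  # multiple inheritance
}

# Direct extents: every object id appears in exactly one class.
EXTENT: Dict[str, Set[str]] = {
    "PERSON": {"p1"},
    "STUDENT": {"s1"},
    "EMP": {"e1"},
    "STUD-EMP": {"se1"},
}


def subclasses(cls: str) -> Set[str]:
    """All direct and indirect subclasses of `cls`, excluding `cls` itself."""
    result: Set[str] = set()
    for sub, supers in SUPERCLASSES.items():
        if cls in supers:
            result |= {sub} | subclasses(sub)
    return result


def deep_extent(cls: str) -> Set[str]:
    """Direct members of `cls` plus the direct members of all its subclasses."""
    members = set(EXTENT[cls])
    for sub in subclasses(cls):
        members |= EXTENT[sub]
    return members


print(sorted(deep_extent("EMP")))  # ['e1', 'se1']
```

STUD-EMP is reachable from PERSON along two paths (via STUDENT and via EMP); using sets ensures its objects are counted once in the deep extent.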

In the design of OO-Myriad, we extend the well-accepted SQL language with some OO features. This extended language is called MySQL. MySQL adopts SQL's SELECT-FROM-WHERE structure for specifying retrieval queries. Currently, the query model supported by MySQL includes the following features:

1. Path expressions. A path expression is a dot-expression linking one object class to an attribute of another class via one or more relationship attributes, e.g. EMP.work_in.dname. Path expressions can be used in the same way as simple attributes in the SELECT and WHERE clauses. With path expressions, MySQL is able to perform the implicit joins³ embedded in the OO schemas. Moreover, path expressions allow us to formulate cyclic queries such as: Find the employees who manage departments in which they do not work.

2. Class Aliases. In a MySQL query, each class in the FROM clause can be associated withone or more class aliases. These aliases, unique within a query, are used to distinguishdifferent roles of classes in the query.

3. Tuple sets as query results. Every MySQL query returns tuples of attribute values as results. At present, OO-Myriad does not attempt to associate a class in the schema (be it an existing class or a newly created one) with the result of a query. The same approach to treating OO query results has also been adopted by ORION [26] and Postgres [27].

4. Support for explicit joins between classes. Apart from the implicit joins embedded in the OO schemas, MySQL supports explicit joins between classes. This allows classes to be joined on predicates that cannot be expressed by path expressions.

5. Querying deep class extent and direct class extent. Most OO query models support querying of the deep class extent. In some cases, this alone is too restrictive. For example, if one needs to retrieve information about people who assume only the role of employee, it is not necessary to obtain all employee information, i.e. including those who also assume the role of customer.

Figure 3. Export Databases: (a) DBa (b) DBb (c) DBc (DBa: classes EMPa and DEPTa with attributes eno, ename, phone, dname and floor, and relationships work_in, managed_by and has_emp; DBb: classes CUSTb and PRODUCTb with attributes cname, address, phone, pname and cost, and relationships buy/bought_by; DBc: class PRODUCTc with attributes pname, cost and weight; the legend distinguishes 1-1 from 1-m relationships)
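A small Python sketch of what a path expression means operationally: each dot step follows one relationship attribute, which is why a path expression amounts to an implicit join. The classes loosely follow DBa in Figure 3; the assumption that managed_by links a department to its manager, the sample objects, and the helper `path` are all hypothetical. The sketch navigates in-memory objects and does not show how OO-Myriad actually evaluates MySQL queries.

```python
from typing import Any, List, Optional


class Dept:
    def __init__(self, dname: str) -> None:
        self.dname = dname
        # Assumed relationship attribute DEPT -> EMP (the department's manager).
        self.managed_by: Optional["Emp"] = None


class Emp:
    def __init__(self, ename: str, work_in: Dept) -> None:
        self.ename = ename
        self.work_in = work_in  # relationship attribute EMP -> DEPT


def path(obj: Any, expr: str) -> Any:
    """Evaluate a dot-separated path such as 'work_in.dname' starting at obj.
    Each step follows one relationship attribute, i.e. one implicit join."""
    for attr in expr.split("."):
        obj = getattr(obj, attr)
    return obj


# Hypothetical objects loosely modelled on DBa in Figure 3.
rnd, sales = Dept("R&D"), Dept("Sales")
alice, bob = Emp("Alice", rnd), Emp("Bob", rnd)
carol = Emp("Carol", sales)
rnd.managed_by, sales.managed_by = alice, bob
emps: List[Emp] = [alice, bob, carol]
depts: List[Dept] = [rnd, sales]

# Path expression in a predicate: employees whose department is R&D.
in_rnd = [e.ename for e in emps if path(e, "work_in.dname") == "R&D"]

# Cyclic query: employees who manage departments in which they do not work.
cyclic = [d.managed_by.ename for d in depts
          if d.managed_by is not None and d.managed_by.work_in is not d]

print(in_rnd, cyclic)  # ['Alice', 'Bob'] ['Bob']
```

The cyclic query is "cyclic" because the navigation DEPT → manager → work_in returns to a DEPT object, which is then compared with the starting department.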

2.1. Example of OO Export schemas

Throughout this paper, three OO export databases are used to demonstrate our database mapping and query processing approaches. These export databases, DBa, DBb and DBc, are shown in Figure 3 and model information about a company. Their classes and attributes are self-explanatory. DBa is an export database containing employee information. Each employee has a unique name and is assigned a unique employee number. DBb maintains information about customers and the products they have purchased. DBc is a warehouse database that keeps all the product information. The cost attribute in PRODUCTb is the amount the customer paid in the last transaction. This amount may differ from the cost provided by DBc, since small discounts on certain products may be given to customers to keep the company competitive.

These export databases may be constructed on non-OO local databases, but we assume that gateways supporting OO queries on these export databases are available. We also assume that each export class supports only pseudo-object ids, i.e. the oid of an object may change within a transaction. Thus, the export oid and the export database id cannot be used to generate unique global oids.


3. Mapping from OO Export Schemas to OO Global Schema

In OO-Myriad, our objective is to allow federated DBAs to define federated databases in a flexible manner, which makes the global schema structures independent of the local schema structures. Furthermore, OO-Myriad allows multiple local database instances modeling a real-world entity to be integrated into a single global object. In the IRO-DB project [14], database mapping is achieved by specifying an SQL-like query that derives the objects in a federated database from export databases. Like OO-Myriad, IRO-DB allows a global object to be constructed from export database objects representing the same real-world entity. On the whole, IRO-DB and OO-Myriad share some common integration strategies, except that OO-Myriad distinguishes integration operations from query operations and focuses on an algebraic framework for deploying them.

OO-Myriad adopts an algebraic approach to express the database mapping. In this approach, the federated schema can be freely specified as long as its objects can be computed from the participating export databases. Hence, each global class is given an algebraic expression that computes its object instances from export classes. Moreover, we also associate with each global relationship an algebraic expression that computes the object pairs relating two global classes. In the following subsections, we describe in detail the algebraic mapping approach for deriving (i) global classes, (ii) global relationships, and (iii) deep global class extents.
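The idea that every global class is backed by an algebraic expression can be sketched as a small operator tree in Python. Only relational-style Scan, Select and Project nodes are shown; in OO-Myriad the trees also contain the integration operations, which are omitted here. The global class CHEAP_PRODUCT, the "DB.Class" naming of export classes, and the data are hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Union

Obj = Dict[str, Any]
ExportDBs = Dict[str, List[Obj]]  # "DB.Class" -> objects of that export class


@dataclass
class Scan:
    """Leaf: read all objects of one export class."""
    export_class: str

    def eval(self, dbs: ExportDBs) -> List[Obj]:
        return [dict(o) for o in dbs[self.export_class]]


@dataclass
class Select:
    pred: Callable[[Obj], bool]
    child: "Node"

    def eval(self, dbs: ExportDBs) -> List[Obj]:
        return [o for o in self.child.eval(dbs) if self.pred(o)]


@dataclass
class Project:
    attrs: List[str]
    child: "Node"

    def eval(self, dbs: ExportDBs) -> List[Obj]:
        return [{a: o.get(a) for a in self.attrs} for o in self.child.eval(dbs)]


Node = Union[Scan, Select, Project]

# Hypothetical mapping: the global class CHEAP_PRODUCT is defined by an
# operator tree over the export class PRODUCTc of DBc.
MAPPING: Dict[str, Node] = {
    "CHEAP_PRODUCT": Project(["pname", "cost"],
                             Select(lambda o: o["cost"] < 10, Scan("DBc.PRODUCTc"))),
}

dbs: ExportDBs = {"DBc.PRODUCTc": [
    {"pname": "pen", "cost": 2, "weight": 10},
    {"pname": "desk", "cost": 120, "weight": 9000},
]}

# Computing the extent of a global class = evaluating its operator tree.
print(MAPPING["CHEAP_PRODUCT"].eval(dbs))  # [{'pname': 'pen', 'cost': 2}]
```

Because the mapping is data (a tree) rather than code, a query processor can splice a global class's tree into a user query's plan and then optimize the combined tree, which is the benefit the algebraic approach aims for.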

3.1. Deriving Global Classes

In deriving the global classes, we are concerned with how the objects in a global class can be derived from their counterparts in the export classes. The three main issues to be dealt with are entity identification, attribute value conflict resolution and global oid generation.

Entity Identification

We consider three ways in which a global class is derived from the export schemas⁴:

• In the simplest case, a global class can be directly derived from an export class. In this case, a global object corresponds to a single export database object. This scenario is depicted in Figure 4(a).

• A global class can be derived from multiple export classes where each global object corresponds to a single export database object. In other words, the sets of real-world entities represented by the export classes of each database are mutually exclusive. Figure 4(b) depicts this scenario.

• A global class can be derived from multiple export classes where each global object corresponds to multiple export database objects. This is shown in Figure 4(c).

The objective of OO-Myriad is to handle all three of the above cases. Since each global object representing a real-world entity can be derived from one or more export database objects, it is essential that the export classes to be merged share some common attribute(s) that determine whether their objects correspond to the same real-world entities. We call these common attribute(s) the entity key. In [9], we showed that the entity key is not necessarily the same as the key attributes of the export classes⁵. To provide a basis for object integration, we assume that every export class has an entity key. Currently, we also require that when a global class is derived from two export classes, the entity key be preserved in the global class. Nevertheless, the database mapping approach can easily be extended to hide the entity key from the global class if necessary.

Figure 4. Class and Object Mapping Scenarios ((a) a single export class mapped to a global class; (b) and (c) two export classes, Export Class 1 and Export Class 2, mapped to one global class, from the export database layer to the federated database layer; the legend distinguishes global objects from export objects)

Attribute Value Conflict Resolution

Once the export database objects corresponding to the same real-world entities have been identified, it is often necessary to resolve any conflicts in their attribute values before merging them into a global object. For example, the same employee may have different age values stored in different export database objects. Some function, e.g. average or minimum, must be adopted to resolve such conflicts. In this respect, OO-Myriad adheres quite closely to Dayal's approach of using aggregate functions as attribute derivation functions [33]. Carrying the idea further, OO-Myriad allows these functions to be user-defined.
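A minimal Python sketch of generalized attribute derivation, using the cost conflict between PRODUCTb and PRODUCTc from Section 2.1 as a hypothetical example. It assumes the objects have already been matched on the entity key pname (for instance by an outerjoin) and represents each match as a pair; the resolution function is passed in as a parameter, mirroring the point that OO-Myriad lets such functions be user-defined. The function names, the cents-based costs and the data are assumptions.

```python
from statistics import mean
from typing import Any, Callable, Dict, List, Optional, Tuple

Obj = Dict[str, Any]
Pair = Tuple[Optional[Obj], Optional[Obj]]  # matched export objects; either may be missing


def derive_attribute(pairs: List[Pair], key: str, attr: str,
                     resolve: Callable[[List[Any]], Any]) -> List[Obj]:
    """Build one global object per pair, resolving conflicting `attr` values
    with a (possibly user-defined) function such as min, max or mean."""
    result: List[Obj] = []
    for left, right in pairs:
        present = [o for o in (left, right) if o is not None]
        values = [o[attr] for o in present if o.get(attr) is not None]
        result.append({key: present[0][key],
                       attr: resolve(values) if values else None})
    return result


def prefer_last(values: List[Any]) -> Any:
    """A hypothetical user-defined rule: trust the right-hand (warehouse) source."""
    return values[-1]


# Hypothetical matched objects: PRODUCTb (left, price paid) and PRODUCTc
# (right, warehouse cost), with costs in cents.
pairs: List[Pair] = [
    ({"pname": "pen", "cost": 180}, {"pname": "pen", "cost": 200, "weight": 10}),
    (None, {"pname": "desk", "cost": 12000, "weight": 9000}),
]

print(derive_attribute(pairs, "pname", "cost", min))
# [{'pname': 'pen', 'cost': 180}, {'pname': 'desk', 'cost': 12000}]
print(derive_attribute(pairs, "pname", "cost", mean)[0])  # {'pname': 'pen', 'cost': 190}
```

Objects present in only one export class (the desk above) pass through unchanged, since there is only one candidate value to resolve.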

Global Object Id

In OO-Myriad, each global object has a unique identifier. Since the global object is virtual (i.e. it is derived from export databases but not necessarily materialized), we have to devise techniques for generating global object ids from the corresponding export database objects. Although the export databases support oids, these cannot be used directly as global oids, for the following two reasons:

1. The local DBS may not be OO, and the oids of the same export objects may differ when retrieved by different queries within a transaction. These oids are known as pseudo-object oids.

2. Even when each export database supports unique oids, the oids across export databasesmay not be unique.


Figure 5. Global Oid Generation Example (a class lattice of PERSON, STUDENT, EMP and STUD-EMP; attributes shown: name, school, year, salary)

In response to the above two cases, OO-Myriad allows global oids to be generated by the two methods described below. To ensure the uniqueness of oids across global classes, the selected oid generation method must be applied throughout all global classes in a federated DB.

• Method 1: The global oids can be computed by combining the entity key values and the name of the global class. If the global class is involved in a class lattice⁶, it is necessary to designate a representative global class for the lattice and use it (together with the entity key values) to generate the global oids of objects that belong to the class lattice⁷. This is achieved by applying an oid generation function to the entity key and the representative global class name. Since the entity key of a global object is unique with respect to the global class it belongs to, the generated global oids are unique.

Example: Figure 5 depicts a class lattice in the federated schema consisting of the PERSON, STUDENT, EMPLOYEE and STUD-EMP global classes, which share name as the entity key. Let PERSON be the chosen representative class for this class lattice. To generate the oids of objects that belong to any of these classes, we apply an oid generation function with "PERSON" and the entity key name of the global objects as input arguments.

• Method 2: The global oids can be computed by combining the export oids and the export database name. If more than one export object models the same real-world entity, only one (export oid, export database name) pair is needed to generate the global object id.
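Both methods can be sketched in Python as functions that build a composite identifier. This is a sketch under assumptions, not OO-Myriad's oid generation function: global oids are represented as plain tuples rather than an encoded value, the REPRESENTATIVE table and all names are hypothetical, and Method 2 picks the smallest (database name, export oid) pair so that the same entity always yields the same oid, which presumes stable, unique export oids.

```python
from typing import Dict, List, Tuple

GlobalOid = Tuple[str, ...]

# Hypothetical assignment of each global class to its lattice representative.
REPRESENTATIVE: Dict[str, str] = {
    "PERSON": "PERSON", "STUDENT": "PERSON",
    "EMPLOYEE": "PERSON", "STUD-EMP": "PERSON",
}


def oid_method1(global_class: str, entity_key: Tuple[str, ...]) -> GlobalOid:
    """Method 1: combine the representative class name with the entity key values.
    A class outside any lattice represents itself."""
    return (REPRESENTATIVE.get(global_class, global_class),) + entity_key


def oid_method2(export_refs: List[Tuple[str, str]]) -> GlobalOid:
    """Method 2: combine an export database name with an export oid.
    export_refs lists the (export database name, export oid) pairs of one entity;
    taking the smallest pair keeps the choice deterministic."""
    db_name, export_oid = min(export_refs)
    return (db_name, export_oid)


# The same person seen through different classes of the lattice gets one oid.
print(oid_method1("STUD-EMP", ("Tan",)))  # ('PERSON', 'Tan')
```

With Method 1, every member of the PERSON lattice is identified by ("PERSON", name) regardless of which class it belongs to directly, so an object keeps one oid across its superclasses' deep extents. As required above, only one of the two methods should be used throughout a federated database, otherwise their tuples could in principle overlap.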

While the merit of the first method is that it accommodates pseudo-object oids, it prevents a global object from migrating from one global class to another: a migrating global object would have to assume a different global oid. The second method does not suffer from this limitation, but it requires the export databases to support unique oids.

When an export database is built upon a non-OO local DBS, the entity key or oid of its objects may not persist across transactions. Consequently, it is impossible to keep global oids unique across global transactions. To the best of our knowledge, no solution to this problem has been proposed so far. Hence, in this situation we can only settle for global oids that are unique within a global transaction but not across transactions. Such global oids are known as temporal oids [14]. Currently, OO-Myriad does not maintain the mapping between global oids and their corresponding export objects. We are now looking into extending our global oid generation functions to update such a mapping table, which may be used in processing federated queries that reference global oids.

3.2. Deriving Global Relationship

A relationship attribute links one class to another. Let R_A be a relationship attribute that links a global class C_A to another global class C_B. There are several possible ways R_A can be derived. A few of them (Note 8) are illustrated below:

• C_A and C_B directly correspond to two export classes EC_A and EC_B respectively, both of which are in the same export database. R_A corresponds to a relationship attribute ER_A that links EC_A to EC_B. An example is given in Figure 6(a).

• C_A and C_B directly correspond to two export classes EC_A and EC_B respectively, both of which are in the same export database. R_A corresponds to the reverse of a relationship attribute ER_B that links EC_B to EC_A. Figure 6(b) depicts an example of this case.

• C_A and C_B directly correspond to two export classes EC_A1 and EC_B2 respectively, each in a different export database. R_A corresponds to a simple attribute of EC_A1 whose domain is the entity key of EC_B2. An example of this is given in Figure 6(c).

• C_A is derived by merging export classes EC_A1 and EC_A2, and C_B is derived by merging export classes EC_B1 and EC_B2. EC_A1 and EC_B1 are from export database 1, and EC_A2 and EC_B2 are from export database 2. R_A is a combination of relationship attributes ER_A1 and ER_B2, where ER_A1 links EC_A1 to EC_B1 and ER_B2 links EC_B2 to EC_A2. An example of this is given in Figure 6(d).

Among the above possibilities, only the first allows us to directly translate a global relationship attribute into a relationship attribute possessed by an export class. The others require a flexible mapping mechanism that is able to derive the global relationship independently of how the relationship is represented in the export databases. In OO-Myriad, each relationship attribute in a global class is assigned an algebraic expression that computes the object pairs linking the class to the destination class. Each object pair contains the oids or entity keys of the global objects involved in the relationship. Examples of deriving global relationships are given in Section 3.5.
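As a minimal sketch of deriving a global relationship as a set of object pairs, the Python below handles the cases of Figure 6(b) and 6(c). The export data, attribute names and functions are hypothetical, and the references stored in the reverse relationship are assumed to be already resolved to entity keys.

```python
# Hypothetical export data for two of the cases in Figure 6.
# Case (b): the export class holds only the reverse relationship has_emp.
DEPT_b = [
    {"dname": "Sales", "has_emp": ["Ann", "Bob"]},
    {"dname": "R&D", "has_emp": ["Cid"]},
]
# Case (c): EMP stores a simple attribute deptNum whose domain is the
# entity key of DEPT, which lives in another export database.
EMP_c = [{"name": "Ann", "deptNum": 1}, {"name": "Dee", "deptNum": 9}]
DEPT_c = [{"deptNum": 1, "deptName": "Sales"}, {"deptNum": 2, "deptName": "R&D"}]


def workin_from_reverse(depts):
    """Case (b): invert has_emp into (employee key, department key) pairs."""
    return {(emp, d["dname"]) for d in depts for emp in d["has_emp"]}


def workin_from_attribute(emps, depts):
    """Case (c): keep (employee key, deptNum) pairs whose deptNum matches an
    existing department key, i.e. a semijoin across two export databases."""
    dept_keys = {d["deptNum"] for d in depts}
    return {(e["name"], e["deptNum"]) for e in emps if e["deptNum"] in dept_keys}
```

In both cases the result has the same shape, a set of entity-key pairs, which is what lets the rest of the query processor treat global relationships uniformly regardless of their export representation.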

3.3. Deriving Deep Class Extent

In OO-Myriad, a global object belongs to only one global class. The set of objects that belong to a class is known as the class extent. The deep extent of a class refers to the union of the class extent and the extents of all its direct and indirect subclasses. Given a class C, we use C to denote its class extent and C* to denote its deep extent. Despite the implicit mathematical relationship between the deep extent of a class and its extent, as well as the extents of all its subclasses, OO-Myriad allows the derivation of deep class extents to be based on integration semantics independent of this mathematical relationship. For each global class, we have to specify the algebraic expressions that compute its class extent and deep class extent. The following example demonstrates that deep class extents sometimes should be derived independently.

[Figure 6. Relationship Mapping Examples: panels (a) to (d) illustrate the four cases above, each showing the federated database layer (classes Employee and Department linked by workin) above the export database layer (EMP and DEPT, or Employee_a/Employee_b and Department_a/Department_b), with export attributes such as workin, has_emp, deptNum, dept# and deptName.]

Example: Figure 7 depicts three export databases DB1, DB2 and DB3, keeping up-to-date information about employees, managers and non-managers respectively. Suppose it is known that the eno, name and salary of all managers and non-managers are contained in EMP1. Furthermore, NON_MGR3 models the complete set of non-managers. To compute the extent of EMP, we may specify an algebraic expression on NON_MGR3. The extent of MGR can be derived from MGR2 algebraically. Instead of specifying the deep extent of EMP as the union of EMP's extent and MGR's extent, it is simpler to derive the deep extent EMP* from EMP1 directly.
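The Figure 7 example can be checked with a small Python sketch. The data values are hypothetical, and the extents of EMP and MGR are assumed to take salary from EMP1 by eno. Under the stated integration semantics, deriving EMP* directly from EMP1 yields the same objects as the union of the EMP and MGR extents, without the lookups and the union.

```python
# Hypothetical contents of the three export classes of Figure 7.
EMP1 = [  # DB1: eno, name and salary of every manager and non-manager
    {"eno": 1, "name": "Ann", "salary": 900},
    {"eno": 2, "name": "Bob", "salary": 700},
    {"eno": 3, "name": "Cid", "salary": 650},
]
MGR2 = [{"eno": 1, "name": "Ann", "office#": "4-01"}]              # DB2: managers
NON_MGR3 = [{"eno": 2, "name": "Bob"}, {"eno": 3, "name": "Cid"}]  # DB3

SALARY = {e["eno"]: e["salary"] for e in EMP1}


def emp_extent():
    """EMP extent: the non-managers, from NON_MGR3 (salary looked up in EMP1)."""
    return [{**n, "salary": SALARY[n["eno"]]} for n in NON_MGR3]


def mgr_extent():
    """MGR extent: the managers, from MGR2 (salary looked up in EMP1)."""
    return [{**m, "salary": SALARY[m["eno"]]} for m in MGR2]


def emp_deep_by_union():
    """EMP* as the union of the EMP and MGR extents, on EMP's attributes."""
    return {(o["eno"], o["name"], o["salary"]) for o in emp_extent() + mgr_extent()}


def emp_deep_direct():
    """EMP* derived from EMP1 alone, with no lookups or union."""
    return {(e["eno"], e["name"], e["salary"]) for e in EMP1}
```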

3.4. Integration Operations

As OO-Myriad requires algebraic expressions to compute both global classes and relationships between classes, it is important to decide the appropriate set of integration operations to be utilized in the algebraic expressions. These are also the operations to be supported by the federated query processor (FQP), so that the FQP can handle data integration at runtime.

[Figure 7. Derivation of Deep Class Extent: export classes EMP1 (DB1), MGR2 (DB2) and NON_MGR3 (DB3), with attributes drawn from name, eno, office# and salary, mapped to the federated classes EMP and MGR.]

In OO-Myriad, we focus on integration operations that are often used. In the domain of relational FDBSs, the predecessor of OO-Myriad, i.e. Myriad, has adopted the conventional relational operations, e.g. $\bowtie$, $\sigma$, $\pi$, $-$ and $\cup$, as well as two additional integration operations, namely two-way outerjoin (denoted by $\overset{\leftrightarrow}{\bowtie}$) and the generalized attribute derivation operation (denoted by GAD). The $\overset{\leftrightarrow}{\bowtie}$ operation assembles multiple sets of objects from different export databases which model the same set of real-world entities. Predicates are associated with $\overset{\leftrightarrow}{\bowtie}$ to determine the export objects that correspond to the same real-world entities and can therefore be merged together. GAD is a unary operation that derives attribute values of global objects using any system- or user-defined resolution functions. The original definition of GAD given in [28] is shown below:

Definition: (Generalized Attribute Derivation - GAD) Let R be a relation with attributes A, and let the F_i's be attribute resolution functions.

$$\mathrm{GAD}(R, F_1(X_1), F_2(X_2), \ldots, F_m(X_m)) = \{\, \langle F_1(X_1(r)), F_2(X_2(r)), \ldots, F_m(X_m(r)) \rangle \mid r \in R \,\}, \quad \text{where } X_i \subseteq A$$

In the above definition, GAD operates on only one relation. Attribute resolution functions, F_i's, can be any functions that transform or combine attribute values, and each F_i is restricted to atomic input and output values. To deploy the GAD operation in OO-Myriad, we extend GAD to operate on set arguments; hence the F_i's must accept non-atomic values as input and compute a set-valued result.
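A minimal Python sketch of GAD, including the set-valued extension, follows. The `gad` function, the `Spec` format and the `f_union` resolution function are illustrative assumptions rather than OO-Myriad's interface; `f_union` stands in for any resolution function that accepts non-atomic inputs and returns a set.

```python
from typing import Callable, Iterable

# Output attribute -> (attribute resolution function F_i, input attributes X_i).
Spec = dict[str, tuple[Callable[..., object], tuple[str, ...]]]


def gad(rows: Iterable[dict], spec: Spec) -> list[dict]:
    """GAD(R, F1(X1), ..., Fm(Xm)): for each input tuple r, apply every F_i
    to r's values of X_i. Returns a list because dicts holding set values are
    not hashable; duplicates are therefore not removed in this sketch."""
    return [
        {out: f(*(r[a] for a in attrs)) for out, (f, attrs) in spec.items()}
        for r in rows
    ]


def f_i(value):
    """Identity resolution function."""
    return value


def f_union(*values):
    """Set-valued resolution function: union of set inputs, skipping nulls."""
    result = set()
    for v in values:
        if v is not None:
            result |= set(v)
    return result


rows = [
    {"name": "Ann", "phones_a": {"111"}, "phones_b": {"111", "222"}},
    {"name": "Bob", "phones_a": None, "phones_b": {"333"}},
]
merged = gad(rows, {
    "name": (f_i, ("name",)),
    "phones": (f_union, ("phones_a", "phones_b")),
})
```

The attribute names phones_a and phones_b are hypothetical; they only illustrate a set-valued conflict that an atomic-only F_i could not resolve.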

Example: Suppose we wish to integrate the objects from export classes PRODUCTb and PRODUCTc given in Section 2.1. We can first outerjoin the objects from PRODUCTb and PRODUCTc, followed by a GAD operation that merges the matching objects together. Figure 8 depicts this (Note 9).

In this example, we employ several attribute resolution functions to merge the attributes of PRODUCTb and PRODUCTc objects. F_oid is an oid generation function that computes an oid from a class name and an entity key value; in this case, we have adopted oid generation method 1. If method 2 is used, a different oid generation function has to be used. F_any returns any non-null input value, F_avg returns the average of its input values, and F_i is the identity function.

[Figure 8. Example of using GAD operation: GAD(oid = F_oid("PRODUCT", pname_b, pname_c), pname = F_any(pname_b, pname_c), cost = F_avg(cost_b, cost_c), weight = F_i(weight_c)) applied to the outerjoin of PRODUCTb(oid_b, pname_b, cost_b) and PRODUCTc(oid_c, pname_c, cost_c, weight_c).]

[Figure 9. Example of using Outer-difference: the outer-difference of EMPa(oid_a, ename_a, eno_a, phone_a) and CUSTb(oid_b, cname_b, address_b, phone_b) on the predicate ename_a = cname_b.]
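The Figure 8 example can be sketched in Python as a two-way outerjoin followed by GAD. The sample objects are hypothetical, the outerjoin predicate pname_b = pname_c is our assumption (Figure 8 does not show it), and `f_any` simply returns the first non-null value, which is one valid reading of "any non-null input value".

```python
from statistics import mean


def two_way_outerjoin(left, right, pred, left_cols, right_cols):
    """Two-way (full) outerjoin: merge every matching pair; pad unmatched
    tuples from either side with None for the other side's attributes."""
    out, matched_right = [], set()
    for lrow in left:
        hits = [i for i, rrow in enumerate(right) if pred(lrow, rrow)]
        for i in hits:
            matched_right.add(i)
            out.append({**lrow, **right[i]})
        if not hits:
            out.append({**lrow, **dict.fromkeys(right_cols)})
    for i, rrow in enumerate(right):
        if i not in matched_right:
            out.append({**dict.fromkeys(left_cols), **rrow})
    return out


def f_any(*values):
    """Any non-null input value (this sketch takes the first), else None."""
    return next((v for v in values if v is not None), None)


def f_avg(*values):
    """Average of the non-null input values, else None."""
    present = [v for v in values if v is not None]
    return mean(present) if present else None


def f_i(value):
    return value


def f_oid(class_name, *keys):
    """Method 1 sketch: class name plus whichever entity key is non-null."""
    return (class_name, f_any(*keys))


PRODUCTb = [{"oid_b": "b1", "pname_b": "bolt", "cost_b": 2.0},
            {"oid_b": "b2", "pname_b": "nut", "cost_b": 1.0}]
PRODUCTc = [{"oid_c": "c1", "pname_c": "bolt", "cost_c": 3.0, "weight_c": 0.1},
            {"oid_c": "c2", "pname_c": "gear", "cost_c": 9.0, "weight_c": 2.5}]

joined = two_way_outerjoin(
    PRODUCTb, PRODUCTc,
    pred=lambda b, c: b["pname_b"] == c["pname_c"],
    left_cols=("oid_b", "pname_b", "cost_b"),
    right_cols=("oid_c", "pname_c", "cost_c", "weight_c"),
)
products = [
    {
        "oid": f_oid("PRODUCT", r["pname_b"], r["pname_c"]),
        "pname": f_any(r["pname_b"], r["pname_c"]),
        "cost": f_avg(r["cost_b"], r["cost_c"]),
        "weight": f_i(r["weight_c"]),
    }
    for r in joined
]
```

The matched "bolt" objects are merged into one global object with an averaged cost, while "nut" and "gear", each modeled by only one export class, survive the outerjoin with nulls for the missing attributes.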

In general, attribute resolution functions can be treated as black boxes. By permitting them to be user-defined, we have made the GAD operation flexible enough to resolve a wide variety of attribute value conflicts. Optimizing the code of arbitrary attribute resolution functions is beyond the scope of this paper. We are currently exploring how the semantics of attribute resolution functions can be exploited in query optimization.

Other than the extended GAD operation, we have defined an outer-difference operation (denoted by $\ominus$) to identify the export objects representing real-world entities that are modeled by only one of two export classes modeling overlapping sets of real-world entities. The formal definition of outer-difference is given below:

Definition: (Outer-difference) Let O_1(A) and O_2(B) be two sets of export objects, and let p(X, Y) be a predicate on attributes X of O_1 and Y of O_2 (X ⊆ A and Y ⊆ B). p is the predicate that determines whether two export objects represent the same real-world entity.

$$O_1 \ominus_{p(X,Y)} O_2 = \{\, o_1 \mid o_1 \in O_1 \wedge \neg\exists\, o_2 \in O_2 \text{ s.t. } p(o_1.X, o_2.Y) \,\}$$

Example: Using the example export databases in Section 2.1, to obtain the set of global objects that represent employees but not customers, we may perform outer-difference on EMPa and CUSTb as shown in Figure 9.
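Outer-difference reduces to a filter over O_1, as the following Python sketch shows for the Figure 9 example. The objects are hypothetical; the nested loop mirrors the definition directly, whereas an equality predicate such as ename_a = cname_b would usually be evaluated with a hash lookup instead.

```python
def outer_difference(o1, o2, p):
    """O1 outer-difference_p O2: the objects of O1 that match no object of O2
    under the entity-matching predicate p. Only O1's attributes are kept."""
    return [x for x in o1 if not any(p(x, y) for y in o2)]


# Hypothetical export objects; Bob is both an employee and a customer.
EMPa = [{"oid_a": "e1", "ename_a": "Ann", "eno_a": 1, "phone_a": "111"},
        {"oid_a": "e2", "ename_a": "Bob", "eno_a": 2, "phone_a": "222"}]
CUSTb = [{"oid_b": "c1", "cname_b": "Bob", "address_b": "Main St", "phone_b": "333"}]


def same_entity(e, c):
    return e["ename_a"] == c["cname_b"]


employees_only = outer_difference(EMPa, CUSTb, same_entity)
```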

[Figure 10. Example Global Schema: classes PERSON, EMP, CUST, EC, DEPT and PRODUCT, with attributes name, phone, eno, address, dname, floor, pname, cost and weight, and relationships work_in, has_emp, managed_by, buy and bought_by.]

Mathematically, any $\ominus$ can be represented by an expression involving $\overset{\leftrightarrow}{\bowtie}$, $\bowtie$, $-$ and $\pi$ operations. That is,

$$O_1 \ominus_p O_2 = \pi_A\big((O_1 \overset{\leftrightarrow}{\bowtie}_p O_2) - (O_1 \bowtie_p O_2)\big)$$

Hence, by including $\ominus$, we have a non-minimal set of operations (Note 10). However, in the federated database context, the $\ominus$ operation is frequently needed to define database mappings, as our database mapping example in Section 3.5 illustrates. Without $\ominus$, database mappings would have to be specified with awkward, complex expressions, which also complicate query processing and optimization. We therefore choose to include $\ominus$ as one of the integration operations supported by OO-Myriad.

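With the availability of $\ominus$, it is interesting to note that the functionality of $\overset{\leftrightarrow}{\bowtie}$, $\bowtie$ and $\ominus$ now resembles that of the set operations $\cup$, $\cap$ and $-$ respectively. The main difference between the two sets of operations lies in the kind of objects on which they operate: {$\cup$, $\cap$, $-$} is designed to operate on homogeneous objects, whereas {$\overset{\leftrightarrow}{\bowtie}$, $\bowtie$, $\ominus$} is best used with heterogeneous objects (Note 11). Their differences are summarized in the following table:

|                   | {∪, ∩, −}                              | {two-way outerjoin, join, outer-difference} |
|-------------------|----------------------------------------|---------------------------------------------|
| Operands          | operate on union-compatible sets only  | able to operate on union-incompatible sets  |
| Result attributes | identical to those of their input sets | two-way outerjoin and join return the attributes of both operands; outer-difference returns the attributes of the first operand |
| Predicate         | no predicate required                  | predicate required to identify which export objects refer to the same global objects |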

3.5. Example of Algebraic Approach to Database Mapping

In this section, we demonstrate the algebraic approach to database mapping using a federated database example. Suppose the federated schema that the global users wish to construct upon our export database example (refer to Section 2.1) is shown in Figure 10. To simplify the explanation, the example global schema is designed to preserve most export classes and attributes, with minor re-organization of the employee and customer classes. Nevertheless, this does not prevent us from specifying a global schema which excludes some export classes or attributes.


In this federated schema example, the real-world employee and customer entities have been grouped into the EMP, CUST, PERSON and EC classes. EC, being a common subclass of EMP and CUST, keeps information about people who are both employees and customers of the company.

In addition to the knowledge about the federated schema, we assume that the following integration semantics are available:

• Each global object in DEPT corresponds to an export object in DEPTa, and the department name dname is the entity key for department entities.

• The export classes EMPa and CUSTb are to be combined to form the global class lattice consisting of EMP, CUST, PERSON and EC. Suppose all entities modeled by these classes can be identified by their names, i.e. name is the entity key.

• The export classes PRODUCTb and PRODUCTc are to be combined into the PRODUCT global class. Some PRODUCT objects correspond to either PRODUCTb or PRODUCTc export objects alone, while other PRODUCT objects are merged from PRODUCTb and PRODUCTc export objects. Here, we assume that product entities have the product name as the entity key.

• The oid of each global object can be derived by a function of the global class name and its entity key.

Apart from specifying the global and export schema information, the OO-Myriad federated database administrators must define the algebraic mappings that derive (i) the extents of global classes, (ii) the deep extents of global classes, and (iii) the object pairs of global relationships. Each algebraic mapping is represented as an operator tree involving the integration operations supported by OO-Myriad.

Figure 11 depicts the operator trees that define the class extents of EMP, CUST, EC, DEPT and PRODUCT. Note that no operator tree has been defined for PERSON because its class extent is empty. Since PERSON, EMP, CUST and EC are involved in a class lattice, we have chosen PERSON to be the representative class of this collection of global classes and use it in F_oid to generate oids for objects in these global classes.

Figure 12 depicts the operator trees that define the deep extents of PERSON, EMP, CUST, EC, DEPT and PRODUCT. For classes which have no subclass, e.g. EC, DEPT and PRODUCT, the operator trees of their extent and deep extent are identical.
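To see how the extent and deep extent mappings of this section fit together, the following Python sketch derives EMP, CUST, EC and PERSON* from hypothetical EMPa and CUSTb objects. The operator in each mapping is our assumption (outer-difference for EMP and CUST, join for EC, two-way outerjoin for PERSON*), and F_oid is simplified to a (class, key) tuple. Because PERSON has an empty extent, the oids in PERSON* should equal those of EMP, CUST and EC combined.

```python
# Hypothetical export data: Bob is both an employee and a customer.
EMPa = [{"ename_a": "Ann", "eno_a": 1, "phone_a": "111"},
        {"ename_a": "Bob", "eno_a": 2, "phone_a": "222"}]
CUSTb = [{"cname_b": "Bob", "address_b": "Main St", "phone_b": "333"},
         {"cname_b": "Cy", "address_b": "Elm St", "phone_b": "444"}]


def same_person(e, c):
    return e["ename_a"] == c["cname_b"]  # name is the entity key


def f_oid(key):
    return ("PERSON", key)  # representative class PERSON + entity key


def f_any(*values):
    return next((v for v in values if v is not None), None)


def emp_extent():  # assumed: EMPa outer-difference CUSTb, then GAD
    return [{"oid": f_oid(e["ename_a"]), "name": e["ename_a"],
             "eno": e["eno_a"], "phone": e["phone_a"]}
            for e in EMPa if not any(same_person(e, c) for c in CUSTb)]


def cust_extent():  # assumed: CUSTb outer-difference EMPa, then GAD
    return [{"oid": f_oid(c["cname_b"]), "name": c["cname_b"],
             "address": c["address_b"], "phone": c["phone_b"]}
            for c in CUSTb if not any(same_person(e, c) for e in EMPa)]


def ec_extent():  # assumed: EMPa join CUSTb, then the GAD of Figure 11
    return [{"oid": f_oid(e["ename_a"]), "name": e["ename_a"], "eno": e["eno_a"],
             "phone": f_any(e["phone_a"], c["phone_b"]), "address": c["address_b"]}
            for e in EMPa for c in CUSTb if same_person(e, c)]


def person_deep_extent():  # assumed: EMPa two-way outerjoin CUSTb, then GAD
    pairs = [(e, c) for e in EMPa for c in CUSTb if same_person(e, c)]
    pairs += [(e, None) for e in EMPa if not any(same_person(e, c) for c in CUSTb)]
    pairs += [(None, c) for c in CUSTb if not any(same_person(e, c) for e in EMPa)]
    result = []
    for e, c in pairs:
        # `e and e[...]` yields None when that side of the outerjoin is missing.
        name = f_any(e and e["ename_a"], c and c["cname_b"])
        phone = f_any(e and e["phone_a"], c and c["phone_b"])
        result.append({"oid": f_oid(name), "name": name, "phone": phone})
    return result
```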

Figure 13 depicts the operator trees that define the object pairs of the relationships managed_by, has_emp, buy, bought_by and work_in. In this example, each object pair contains the entity keys of the global entities involved in the relationship. To de-reference a relationship attribute of an export class, such as managed_by in DEPTa, we have introduced a new operation DREF, which replaces one or more relationship attributes by some attributes of the destination classes. Formally, we can define DREF as follows:

Definition: (De-reference - DREF) Let O be a set of objects. Let R_1, ..., R_m be relationship attributes of O, and let X be the remaining attributes. For each i, let R_i.A_i be the attributes accessible through R_i.

$$\mathrm{DREF}(O, R_1 \rightarrow R_1.A_1, \ldots, R_m \rightarrow R_m.A_m) = \{\, \langle o.X, o.R_1.A_1, \ldots, o.R_m.A_m \rangle \mid o \in O \,\}$$

[Figure 11. Algebraic Mappings for Global Class Extent: operator trees for EMP and EC, each a GAD over EMPa(oid_a, ename_a, eno_a, phone_a, work_in_a) and CUSTb(oid_b, cname_b, address_b, phone_b, buy_b) combined on ename_a = cname_b, with oid = F_oid("PERSON", ename_a), name = F_i(ename_a) and eno = F_i(eno_a); EMP uses phone = F_i(phone_a), while EC uses phone = F_any(phone_a, phone_b) and address = F_i(address_b). The tree for PRODUCT is a GAD over PRODUCTb(oid_b, pname_b, cost_b, bought_by_b) and PRODUCTc(oid_c, pname_c, cost_c, weight_c) combined on pname_b = pname_c, with oid = F_oid("PRODUCT", F_any(pname_b, pname_c)), pname = F_any(pname_b, pname_c), cost = F_min(cost_b, cost_c) and weight = F_i(weight_c).]

Since DREF operations have to be translated into path expressions in the gateway queries, the OO-Myriad query processor ensures that a DREF operation is always translated together with its associated export class. In the current OO-Myriad gateway implementation, each object is restricted to have only atomic attribute values; export objects retrieved from a gateway are unnested if they have set attribute values. Therefore, DREF currently always generates objects with atomic attribute values.
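The DREF definition and the unnesting behaviour described above can be sketched in Python, here producing the managed_by and has_emp object pairs of Figure 13. The export objects, the single `targets` lookup table and the function names are hypothetical; a real gateway would evaluate DREF as a path expression in the export query instead.

```python
# Hypothetical export objects; targets are keyed by export oid.
EMP_OBJS = {"e1": {"ename": "Ann"}, "e2": {"ename": "Bob"}, "e3": {"ename": "Cid"}}
DEPTa = [
    {"oid_a": "d1", "dname_a": "Sales", "floor_a": 4,
     "managed_by_a": "e1", "has_emp_a": ["e1", "e2"]},
    {"oid_a": "d2", "dname_a": "R&D", "floor_a": 2,
     "managed_by_a": "e3", "has_emp_a": ["e3"]},
]


def dref(objs, derefs, targets):
    """DREF(O, R1 -> R1.A1, ..., Rm -> Rm.Am): replace each relationship
    attribute Ri by attribute Ai of the referenced object(s). Set-valued
    references are unnested, so every result has atomic attribute values."""
    result = [dict(o) for o in objs]
    for rel, attr in derefs.items():
        unnested = []
        for o in result:
            refs = o[rel] if isinstance(o[rel], list) else [o[rel]]
            unnested.extend({**o, rel: targets[ref][attr]} for ref in refs)
        result = unnested
    return result


def project(objs, *cols):
    """Relational projection with duplicate elimination."""
    return {tuple(o[c] for c in cols) for o in objs}


has_emp = project(dref(DEPTa, {"has_emp_a": "ename"}, EMP_OBJS),
                  "dname_a", "has_emp_a")
managed_by = project(dref(DEPTa, {"managed_by_a": "ename"}, EMP_OBJS),
                     "dname_a", "managed_by_a")
```

Each pair holds the entity keys (dname, employee name) of the related global objects, matching the object-pair representation used for global relationships.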

4. OO-Myriad Query Processor Architecture

In this section, we describe the OO-Myriad query processor architecture and its components, as shown in Figure 14. The query processor consists of two different kinds of components, namely the federated query manager (FQM) and federated query agents (FQAs). Residing at the query site, the FQM receives queries submitted by global users or applications and generates execution plans before sending them to the FQAs for execution. A detailed description of the plan generation strategy is given in Section 5.2. When the query results are returned from the FQAs, the FQM forwards them back to the global users or applications. FQAs reside on the sites whose export databases are involved in the queries and interface to the OO gateways that handle the export database queries. In our design, when the FQAs interpret the execution plan generated by the FQM, they execute the plan together with the gateways. FQAs are designed to collaborate with one another by sending intermediate results among themselves when inter-site joins are required.

[Figure 12. Algebraic Mappings for the Deep Extent of Global Classes: EMP* is GAD(oid = F_oid("PERSON", ename_a), name = F_i(ename_a), eno = F_i(eno_a), phone = F_i(phone_a)) over EMPa(oid_a, ename_a, eno_a, phone_a, work_in_a); PERSON* is GAD(oid = F_oid("PERSON", F_any(ename_a, cname_b)), name = F_any(ename_a, cname_b), phone = F_any(phone_a, phone_b)) over EMPa and CUSTb(oid_b, cname_b, address_b, phone_b, buy_b) combined on ename_a = cname_b.]

[Figure 13. Algebraic Mappings for Global Relationships: managed_by is π(dname_a, managed_by_a) over DREF(managed_by_a -> managed_by_a.ename) of DEPTa(oid_a, dname_a, floor_a, managed_by_a, has_emp_a); has_emp is π(dname_a, has_emp_a) over DREF(has_emp_a -> has_emp_a.ename) of DEPTa.]

The FQM and FQAs together support fully distributed query processing strategies. That is, federated query results are computed by having the FQAs at different sites perform computations and exchange intermediate results without centralized coordination by the FQM. This distributed query execution model differs from the popular client/server model supported by major database vendors. While it is more difficult to design a distributed coordination protocol for FQAs and gateways to work together, the distributed architecture makes OO-Myriad a suitable platform for experimenting with different query processing strategies.

[Figure 14. OO-Myriad Query Processor Architecture: at the query site, the Federated Query Manager (FQM) receives the federated query (MySQL), consults the directory information and the federated data dictionary (mapping from export DBs to the federated DB), and sends execution plans to the Federated Query Agents (FQAs) at the local sites; FQAs exchange intermediate results and send local queries (MySQL) to their gateways, each of which uses a gateway data dictionary (mapping from local DB to export DB) to query its local DBS; query results flow back to form the final federated query result.]

At the query site, the FQM obtains the location and capability information about FQAs, gateways and local DBSs from a directory database. The federated schema and database mapping information is kept in a federated data dictionary. To allow gateways to be generic enough to support queries on different export databases, each gateway is supplied with the local database to export database mapping kept in the gateway data dictionary. However, this does not imply that the same gateway can operate on different local DBMSs. Instead, the idea is to avoid writing different gateway code for export databases implemented on the same DBMS. While the design and implementation of the gateways used in OO-Myriad is beyond the scope of this paper, we list their functionalities as follows:

• Transaction processing: They can perform begin, commit and abort transaction operations.

• Query processing: They may be able to create and destroy temporary classes used in query processing. They can process MySQL queries on both the export classes and the temporary object classes.

• Resource management: They can allow export databases to be opened or closed when correct user names and passwords are given.


5. Generation of Global Execution Plan

5.1. Design Considerations

Preserving local autonomy and accommodating heterogeneity are the two main considerations in the design of any FDBS component. In OO-Myriad, as well as its predecessor Myriad, the global execution plans do not dictate how the operations at gateways and local DBSs are to be performed. Gateways and local DBSs enjoy full execution autonomy, as they are not part of the FDBS query processor. To adapt itself to a heterogeneous computing environment, a federated query processor must work with FQAs and gateways that have different query processing capabilities. OO-Myriad therefore maintains the capability information about its FQAs as well as the gateways using an operation catalog, and the FQM assigns the operations in the global execution plan according to this catalog. To reduce the communication overhead between FQAs and gateways, our global execution plan coalesces operations that can be executed at the gateways. This is unlike most distributed DBSs, which instruct their local processors to evaluate operations one at a time.
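The role of the operation catalog can be illustrated with a minimal Python sketch. The catalog contents and the `assign` policy (prefer the gateway, fall back to the FQA at the same site) are our assumptions for illustration only; in OO-Myriad the assignment is made by the FQM during optimization under the catalog's constraints.

```python
# Hypothetical operation catalog: component name -> operations it can run.
# Component names follow Figure 17 (GWa, FQAa, ...); the contents are assumed.
CATALOG = {
    "GWa": {"select", "project", "join", "dref"},
    "GWb": {"select", "project", "dref"},
    "FQAa": {"select", "project", "join", "outerjoin", "outer_difference", "gad"},
    "FQAb": {"select", "project", "join", "outerjoin", "outer_difference", "gad"},
}


def assign(op: str, site: str) -> str:
    """Pick a component at `site` for `op`: prefer the gateway, so adjacent
    gateway operations can later be coalesced into one export query, and fall
    back to the site's FQA when the gateway cannot run the operation."""
    for component in (f"GW{site}", f"FQA{site}"):
        if op in CATALOG.get(component, set()):
            return component
    raise ValueError(f"no component at site {site!r} supports {op!r}")
```

For example, a join at site a goes to GWa, whereas the same join at site b falls back to FQAb because GWb does not list it; integration operations such as GAD always land on an FQA in this assumed catalog.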

5.2. OO-Myriad’s Query Processing Steps

The query processing steps in OO-Myriad include (i) parsing, (ii) query augmentation, (iii) execution plan generation and (iv) plan execution. Steps (i), (ii) and (iii) are performed at the FQM, while step (iv) involves the FQM, FQAs and gateways. We describe each of these steps using a query example based on our federated schema example shown in Figure 10. The following query finds the employees who work on the 4th storey and have bought some product heavier than 300 kg from the company, and retrieves the names of these people and the departments they work in:

    select R1.name, R1.workin.dname
    from   EC R1
    where  R1.workin.floor = 4 and R1.buy.weight > 300

Parsing

In this step, the FQM performs a syntax check on the MySQL query submitted by the global application or user. Erroneous queries are rejected without further processing. For a correctly formulated query, the FQM extracts the query components in its select, from and where clauses into the internal data structures corresponding to the three clauses.

Query Augmentation

Federated database queries are targeted at federated schemas. The purpose of query augmentation is to translate these queries into queries on export classes. This is achieved by the following two steps:

• We construct, for each federated query, an operator tree whose leaf nodes represent the sets of objects or object pairs that correspond to the global classes and relationships used in the query. This is illustrated in Figure 15. In the object-oriented query model, apart from the global classes explicitly stated in the from clause (also known as the anchor classes), other global classes may be referenced by the path expressions in the select and where clauses. For these non-anchor classes, the deep extents of the global classes have to be used in forming the operator tree. In the operator tree, each relationship attribute used in a path expression is represented by the corresponding set of object pairs.

• For each class extent, deep class extent and relationship used in the initial operator tree, we replace it by the corresponding algebraic mapping supplied by the database mapping. This is illustrated in Figure 16.

[Figure 15. Query Augmentation: Step 1: leaf nodes EC (oid_ec, name_ec, phone_ec, eno_ec, address_ec; the anchor class), workin (name_w, dname_w), DEPT* (oid_d, dname_d, floor_d), buy (name_b, pname_b) and PRODUCT* (oid_p, pname_p, cost_p, weight_p), combined on name_ec = name_w, dname_w = dname_d, name_ec = name_b and pname_b = pname_p, with selections floor_d = 4 and weight_p > 300 and a final projection π(name_ec, dname_d).]
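The two augmentation steps amount to a recursive tree rewrite, sketched below in Python for part of the example query (EC, workin and DEPT*). The `Node` class and the mapping trees are hypothetical simplifications of Figures 15 and 16; in particular, the operator combining EMPa and CUSTb for EC is assumed to be a join.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class Node:
    op: str                 # "class" (global leaf), "export" (export leaf) or an operator
    arg: str = ""           # leaf name, predicate, projection list, ...
    children: list["Node"] = field(default_factory=list)


def augment(node: Node, mappings: dict[str, Node]) -> Node:
    """Step 2: replace every global leaf (class extent, deep extent or
    relationship) by a copy of its algebraic mapping; copy the rest."""
    if node.op == "class":
        return copy.deepcopy(mappings[node.arg])
    return Node(node.op, node.arg, [augment(c, mappings) for c in node.children])


def leaves(node: Node) -> list[str]:
    """Names of the leaf nodes, left to right."""
    if not node.children:
        return [node.arg]
    return [leaf for c in node.children for leaf in leaves(c)]


# Hypothetical algebraic mappings supplied by the database mapping.
mappings = {
    "EC": Node("gad", "oid, name, eno, phone, address", [
        Node("join", "ename_a=cname_b", [Node("export", "EMPa"), Node("export", "CUSTb")])]),
    "workin": Node("project", "name_w, dname_w", [
        Node("dref", "work_in_a -> work_in_a.dname", [Node("export", "EMPa")])]),
    "DEPT*": Node("gad", "oid, dname, floor", [Node("export", "DEPTa")]),
}

# Step 1: operator tree over global leaves (anchor class EC, relationship
# workin, deep extent DEPT*).
query = Node("project", "name_ec, dname_d", [
    Node("select", "floor_d=4", [
        Node("join", "dname_w=dname_d", [
            Node("join", "name_ec=name_w", [Node("class", "EC"), Node("class", "workin")]),
            Node("class", "DEPT*")])])])

augmented = augment(query, mappings)
```

After augmentation every leaf is an export class, so the tree can be handed to optimization and fragment generation.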

Query Fragment Generation

Having augmented the query operator tree with database mapping information, we next perform query optimization and determine the query fragments to be executed at the local sites. A query fragment is defined as a query processing task unit executable by either an FQA or a gateway. In the FDBS environment, the query optimization problem is known to be difficult due to autonomous local query processors and heterogeneity between local databases. While some work in federated query optimization has been reported in [29, 30, 31], the optimization of federated queries involving new integration operations has not been well studied. Recently, some transformation rules for outerjoins and GAD operations have been given in [32, 28]. In OO-Myriad, the query optimization problem is still being investigated and is beyond the scope of this paper. However, we require the end result of query optimization to be an operator tree with nodes assigned to specific FQAs and gateways at the local sites. The assignment of operations to sites and to FQAs or gateways must observe the operating constraints specified in the operation catalog mentioned in Section 5.1.

The purpose of generating query fragments is to make each FQA or gateway aware of the query processing tasks assigned to it, so that it can perform those tasks accordingly. Currently, an FQA query fragment consists of only one operation executable at the FQA. In contrast, a gateway query fragment consists of a set of adjacent operations which collectively can be performed as an export database query. If a gateway query fragment consumes one or more intermediate results, a temporary object class must be designated for each of these intermediate results, and the temporary object classes have to be created and populated before the fragment can be executed. Assume query optimization has been performed on our augmented operator tree example and each tree node has been assigned to either an FQA or a gateway (GW), as well as to a local site, as shown in Figure 17. To obtain the query fragments, OO-Myriad employs a simple recursive tree walk algorithm. The dotted regions in Figure 17 denote the query fragments extracted.

[Figure 16. Query Augmentation: Step 2: the operator tree of Figure 15 with each leaf replaced by its algebraic mapping: a GAD over EMPa and CUSTb for EC, DREF and π over EMPa for workin, a GAD over DEPTa for DEPT*, DREF and π over CUSTb for buy, and a GAD over PRODUCTb and PRODUCTc for PRODUCT*.]

Plan Execution

In this phase, the FQM disseminates the global execution plan together with the query fragment information to the FQAs at all local sites involved. The FQAs interpret the plan and execute it accordingly. To reduce query execution time, FQAs and gateways are allowed to process their tasks concurrently. The FQA whose query fragment contains the root node of the execution plan returns the final query result to the query initiator.
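The recursive tree walk used for query fragment generation can be sketched in Python as follows. It assumes that the optimizer has already labelled each node with a component such as GWa or FQAb; the `PlanNode` class, the operation names and the grouping rule (each FQA node is its own fragment, adjacent nodes on the same gateway are merged) are our reading of the text, not OO-Myriad's code. Creating temporary classes for gateway fragments that consume intermediate results is not modelled.

```python
from dataclasses import dataclass, field


@dataclass
class PlanNode:
    op: str
    component: str  # assigned by the optimizer, e.g. "GWa" or "FQAb"
    children: list["PlanNode"] = field(default_factory=list)


def query_fragments(root: PlanNode) -> list[tuple[str, list[str]]]:
    """Recursive tree walk: each FQA node forms its own one-operation
    fragment; adjacent nodes assigned to the same gateway are grouped into a
    single gateway fragment. Fragments are listed in pre-order."""
    fragments: list[tuple[str, list[str]]] = []

    def walk(node: PlanNode, open_gw_fragment):
        is_gateway = node.component.startswith("GW")
        if is_gateway and open_gw_fragment and open_gw_fragment[0] == node.component:
            fragment = open_gw_fragment          # extend the parent's gateway fragment
        else:
            fragment = (node.component, [])      # start a new fragment
            fragments.append(fragment)
        fragment[1].append(node.op)
        for child in node.children:
            walk(child, fragment if is_gateway else None)

    walk(root, None)
    return fragments


# Hypothetical assigned plan: FQA operations on top, gateway work below.
plan = PlanNode("project", "FQAa", [
    PlanNode("join", "FQAa", [
        PlanNode("select floor=4", "GWa", [PlanNode("scan DEPTa", "GWa")]),
        PlanNode("scan CUSTb", "GWb"),
    ]),
])
```

Here the selection and the scan of DEPTa are coalesced into one gateway fragment at site a, which is the communication saving motivated in Section 5.1.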

6. Conclusions

In this paper, we present a novel approach to support object-oriented query processing over multiple heterogeneous database servers. We begin with a careful examination of different object-oriented schema constructs in a federated database and their derivation from the participating export databases. The schema constructs that can be derived include the global object ids, the extent and deep extent of global classes, and global relationships. To derive a federated database, we propose a set of integration operations to be used in an algebraic mapping approach that is designed to merge export databases together. The proposed integration operations include two-way outerjoin, outer-difference ($\ominus$), generalized attribute derivation (GAD), de-reference (DREF), and the usual relational operations such as join, projection and selection. This mapping approach is flexible and extensible. The set of integration operations and the export-to-federated database mapping technique have been implemented in an FDBS prototype known as OO-Myriad. We further describe the OO-Myriad query processor architecture and its processing strategies. We also illustrate the usefulness of the proposed database mapping using an example federated database and its component export databases.

[Figure 17. Query Fragments Generation: the operator tree of Figure 16 with each node assigned to an FQA (FQAa or FQAb) or a gateway (GWa, GWb or GWc); dotted regions mark the extracted FQA and gateway query fragments at sites a, b and c.]

6.1. Status of Project

The database mapping and query processing approaches presented in this paper have been implemented in the OO-Myriad prototype. Since we extend only the database mapping and query processing aspects of the Myriad FDBS, the implementation of OO-Myriad changes only the query processing module of Myriad, leaving the transaction and communication modules intact.

Currently, the OO-Myriad prototype operates in the UNIX environment with the communication module implemented on TCP/IP. To demonstrate its functionality, we have created three local relational databases in Postgres (Note 12) [27] and have built three OO gateways to support the sample export databases given in Section 2.1. Based on the export databases, we constructed the necessary database mapping information to support the sample federated schema given in Figure 10.


6.2. Future Work

Future research in the OO-Myriad project involves:

• Federated query optimization: With a new set of integration operations, OO-Myriad requires a new set of algebraic transformation rules in order to perform either heuristic or cost-based query optimization effectively. Currently, we are investigating the possibility of incorporating a calibrated local cost model into the query processor. A systematic evaluation of different federated query optimization techniques can also be performed using the OO-Myriad prototype.

• Improvements to query augmentation: While the new database mapping technique is flexible in handling a large class of schematic and data heterogeneities between object-oriented export databases, it may generate complex query expressions that involve many joins between subexpressions representing global class extents, deep extents and relationships. For cases where global relationships directly correspond to relationships in the export schema, it is possible to modify our query augmentation strategy to avoid unnecessary joins.

• Full-fledged object-oriented data model: Although the present OO-Myriad prototype supports basic object constructs (inheritance, object ids and relationship attributes) and object-oriented queries, it has yet to incorporate some object-oriented features such as aggregate data types (sets, lists, etc.) and methods. We are currently investigating this and will include these features in OO-Myriad in the future.

Acknowledgments

The work reported in this paper is an extension of the Myriad FDBS research. The tremendous effort by the original Myriad team in making software modules reusable has enabled us to easily extend Myriad with object-oriented query facilities on object-oriented global schemas. The original Myriad members include San-Yih Hwang, Dave Clement, M. Ganesh, Satish Musukula, Kajal Claypool and Sharon Yang. We also thank Hon-Kuan Lee for implementing the new gateways for the OO-Myriad prototype.

Notes

1. We allow some export schemas to be directly accessed by the global users.

2. A federated schema is defined to be any integrated schema constructed from multiple export schemas. Hence, we do not restrict the FDBS to have only one global schema.

3. By implicit join, we mean that a path expression allows us to navigate from one class to another without performing a join operation.

4. These are by no means the only three ways, since it can be shown that a global class can be derived from some complex attributes. We also exclude the second-order integration which is sometimes necessary [34].

5. A different term, extended key, has been used in [9].

6. Any set that is partially ordered by some relation is known as a poset. A lattice is a poset such that every pair of elements has a unique greatest lower bound and least upper bound.


7. To be formally correct, it should be class poset instead of class lattice.

8. In fact, there are many more possibilities. To save space, we only present a few.

9. We have renamed the attributes of classes to avoid attribute name conflicts.

10. A set of operations is minimal if no operation in the set can be realized by some composition of the other operations in the set.

11. It can be formally shown that ∪, ∩ and − can be realized by outerjoin, join and outer-difference, respectively.

12. In this case, we use Postgres purely as a relational DBMS.
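Notes 6 and 7 distinguish a poset from a lattice. The Python sketch below checks the lattice condition by brute force for a class hierarchy given as a hypothetical mapping from each class to its direct superclasses (the class names are illustrative, not taken from OO-Myriad). It shows why multiple inheritance can easily yield a class poset that is not a lattice: two classes may have several incomparable common subclasses, or none at all.

```python
from itertools import product

def make_leq(supers):
    """Build a <= relation: a <= b iff b is a (reflexive, transitive) superclass of a."""
    def ancestors(cls):
        seen, stack = {cls}, [cls]
        while stack:
            for parent in supers[stack.pop()]:
                if parent not in seen:
                    seen.add(parent)
                    stack.append(parent)
        return seen
    anc = {cls: ancestors(cls) for cls in supers}
    return lambda a, b: b in anc[a]

def is_lattice(elements, leq):
    """True if every pair has a least upper bound and a greatest lower bound."""
    def least(xs):
        return [x for x in xs if all(leq(x, y) for y in xs)]
    def greatest(xs):
        return [x for x in xs if all(leq(y, x) for y in xs)]
    for a, b in product(elements, repeat=2):
        uppers = [u for u in elements if leq(a, u) and leq(b, u)]
        lowers = [d for d in elements if leq(d, a) and leq(d, b)]
        # In a poset a least (greatest) element, if it exists, is unique.
        if len(least(uppers)) != 1 or len(greatest(lowers)) != 1:
            return False
    return True

# Hypothetical hierarchies: class -> list of direct superclasses.
DIAMOND = {"Person": [], "Student": ["Person"], "Employee": ["Person"],
           "TA": ["Student", "Employee"]}
TWO_BOTTOMS = {**DIAMOND, "RA": ["Student", "Employee"]}

print(is_lattice(list(DIAMOND), make_leq(DIAMOND)))          # True
print(is_lattice(list(TWO_BOTTOMS), make_leq(TWO_BOTTOMS)))  # False: TA and RA have no common subclass
```

Adding the second common subclass RA keeps the hierarchy a valid poset but breaks the lattice property, which is the situation note 7 alludes to.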
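Note 11 can be checked on small examples. The Python sketch below is a toy model, not the OO-Myriad implementation: relations are lists of dicts, nulls are represented by None, and outer-difference is assumed to return the tuples of the first operand that have no match in the second, null-padded on the second operand's remaining attributes. When both operands have the same attributes and tuples are matched on all of them, no padding occurs, so outerjoin, join and outer-difference reduce to ∪, ∩ and −.

```python
def _key(row, on):
    """Project a tuple onto the matching attributes."""
    return tuple(row[a] for a in on)

def join(r, s, on):
    """Equi-join r and s on the attribute list `on`."""
    return [{**x, **y} for x in r for y in s if _key(x, on) == _key(y, on)]

def outer_difference(r, s, on, s_attrs):
    """Tuples of r with no partner in s, padded with None for s's other attributes (assumed semantics)."""
    s_keys = {_key(y, on) for y in s}
    pad = {a: None for a in s_attrs if a not in on}
    return [{**pad, **x} for x in r if _key(x, on) not in s_keys]

def outerjoin(r, s, on, r_attrs, s_attrs):
    """Full outerjoin: matched pairs plus null-padded dangling tuples from both sides."""
    return (join(r, s, on)
            + outer_difference(r, s, on, s_attrs)
            + outer_difference(s, r, on, r_attrs))

def as_set(rel):
    """Compare relations as sets of tuples, ignoring order and duplicates."""
    return {tuple(sorted(row.items())) for row in rel}

# Two union-compatible relations (hypothetical data), matched on all attributes.
A = ["name", "dept"]
R = [{"name": "ann", "dept": "cs"}, {"name": "bob", "dept": "ee"}]
S = [{"name": "bob", "dept": "ee"}, {"name": "cat", "dept": "me"}]

assert as_set(outerjoin(R, S, A, A, A)) == as_set(R) | as_set(S)      # union
assert as_set(join(R, S, A)) == as_set(R) & as_set(S)                 # intersection
assert as_set(outer_difference(R, S, A, A)) == as_set(R) - as_set(S)  # difference
```

When the operands have different attributes, the same outerjoin instead merges overlapping entities and null-pads the rest, which is why it serves both as an integration operation and as a replacement for union.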

References

1. D. Clements, M. Ganesh, S.-Y. Hwang, K. Mediratta, E.-P. Lim, J. Srivastava, J. Stenoien, and H.-R. Yang. Myriad: Design and implementation of a federated database prototype. In ACM SIGMOD International Conference on Management of Data, Minneapolis, 1993. Proposal Demonstration.

2. A.P. Sheth and J.A. Larson. Federated database systems for managing distributed, heterogeneous, and autonomous databases. ACM Computing Surveys, 22(3), September 1990.

3. M.W. Bright, A.R. Hurson, and S.H. Pakzad. A taxonomy and current issues in multidatabase systems. IEEE Computer, March 1992.

4. W. Kent. Object-orientation and interoperability. In A. Dogac, M.T. Ozsu, A. Biliris, and T. Sellis, editors, Advances in Object-Oriented Databases, volume 230 of NATO ASI Series, Series F: Computer and Systems Sciences. Springer-Verlag, 1994.

5. N. Coburn and P.-A. Larson. Information repository requirements of the CORDS multidatabase service. Technical report, University of Waterloo, 1993.

6. S.M. Deen, R.R. Amin, and M.C. Taylor. Data integration in distributed databases. IEEE Trans. on Software Engineering, SE-13(7):860–864, July 1987.

7. W. Kent. The entity join. In Proc. of the 5th VLDB Conf., 1979.

8. A. Chatterjee and A. Segev. Data manipulation in heterogeneous databases. SIGMOD Record, 20(4), December 1991.

9. E.-P. Lim, J. Srivastava, S. Prabhakar, and J. Richardson. Entity identification problem in database integration. In 9th International Conference on Data Engineering, 1993.

10. L.G. DeMichiel. Resolving database incompatibility: An approach to performing relational operations over mismatched domains. IEEE TKDE, 1(4), 1989.

11. E.-P. Lim, J. Srivastava, and S. Shekhar. Resolving attribute incompatibility in database integration: An evidential reasoning approach. In 10th International Conference on Data Engineering, 1994.

12. F.S.-C. Tseng, A.L.P. Chen, and W.-P. Yang. A probabilistic approach to query processing in heterogeneous database systems. In RIDE TQP 92, 1992.

13. R. Busse, P. Fankhauser, G. Huck, and W. Klas. IRO-DB: An object-oriented approach towards federated and interoperable DBMS. In Proceedings of the International Workshop on Advances in Databases and Information Systems (ADBIS'94), Moscow, Russia, May 1994.

14. R. Busse, P. Fankhauser, and E.J. Neuhold. Federated schemata in ODMG. In Extending Information Systems Technology - Proceedings of the Second International East-West Database Workshop, Klagenfurt, Austria, September 1994.

15. G. Gardarin, F. Sha, and Z.-H. Tang. Calibrating the query optimizer cost model of IRO-DB, an object-oriented federated database system. In Proceedings of the 22nd VLDB Conference, Mumbai, India, 1996.

16. R.G. Cattell. Object Databases: The ODMG-93 Standard. Morgan Kaufmann, 1993.

17. R. Ahmed, P.D. Smedt, W. Du, W. Kent, M. Ketabchi, W.A. Litwin, A. Rafii, and M.-C. Shan. The Pegasus heterogeneous multidatabase system. IEEE Computer, December 1991.

18. W. Du, R. Krishnamurthy, and M.C. Shan. Query optimization in a heterogeneous DBMS. In Proc. of the 18th VLDB Conf., 1992.

19. Q. Chen and M.-C. Shan. Abstract view objects for multiple OODB integration. In Object Technologies for Advanced Software: 1st JSSST Int'l Symposium, Kanazawa, Japan, November 1993.

20. D. Fang, J. Hammer, and D. McLeod. A mechanism and experimental system for function-based sharing in federated databases. In Proceedings of the IFIP DS-5 Working Conference on the Semantics of Database Interoperability, Australia, 1992.


21. D. Fang, J. Hammer, D. McLeod, and A. Si. Remote-Exchange: An approach to controlled sharing among autonomous, heterogeneous database systems. In Proceedings of the IEEE Spring Compcon, San Francisco, February 1991. IEEE.

22. G. Wiederhold. Mediators in the architecture of future information systems. IEEE Computer, 25(5), March 1992.

23. S. Chawathe, H. Garcia-Molina, J. Hammer, K. Ireland, Y. Papakonstantinou, J. Ullman, and J. Widom. The TSIMMIS project: Integration of heterogeneous information sources. In IPSJ Conference, Tokyo, 1994.

24. Y. Papakonstantinou, H. Garcia-Molina, and J. Widom. Object exchange across heterogeneous information sources. In Int'l Conf. on Data Engineering, Taipei, March 1995.

25. A.Y. Levy, A. Rajaraman, and J.J. Ordille. Querying heterogeneous information sources using source descriptions. In Proceedings of the 22nd VLDB Conference, Mumbai, India, 1996.

26. J. Banerjee, W. Kim, and K.C. Kim. Queries in object oriented databases. In Proc. IEEE Data Engineering Conf., Feb. 1988.

27. M. Stonebraker and G. Kemnitz. The POSTGRES next-generation database management system. Communications of the ACM, 34(10), Oct. 1991.

28. E.-P. Lim, J. Srivastava, and S.-Y. Hwang. An algebraic transformation framework for multidatabase queries. Distributed and Parallel Database Journal, 3(3), 1995.

29. S. Salza, G. Barone, and T. Morzy. Distributed query optimization in loosely coupled multidatabase systems. In International Conference on Database Theory, Prague, 1994.

30. H. Lu and M.-C. Shan. On global query optimization in multidatabase systems. In RIDE TQP 92, 1992.

31. Q. Zhu and P.-A. Larson. A query sampling method for estimating local cost parameters in a multidatabase system. In Proceedings of the 10th Int'l Conf. on Data Engineering, 1994.

32. A.L.P. Chen. Outerjoin optimization in multidatabase. In Proceedings of Databases in Parallel and Distributed Systems, pages 211–217, 1990.

33. U. Dayal and H.-Y. Hwang. View definition and generalization for database integration in Multibase: A system for heterogeneous distributed databases. IEEE Trans. Software Eng., SE-10(6), November 1984.

34. R. Krishnamurthy, W. Litwin, and W. Kent. Interoperability of heterogeneous databases with schematic discrepancies. In Proc. of the 1st Int'l Workshop on Interoperability in Multidatabase Systems, 1991.

