
Specification and Management of Interdependent Data in Operational Systems and Data Warehouses

Distributed and Parallel Databases 5, 121–166 (1997)
© 1997 Kluwer Academic Publishers. Manufactured in The Netherlands.

DIMITRIOS GEORGAKOPOULOS
GTE Laboratories Incorporated, 40 Sylvan Road, MS-62, Waltham, MA 02254

GEORGE KARABATIS
Bellcore, 445 South Street, Morristown, NJ 07960

SRIDHAR GANTIMAHAPATRUNI
Sybase Incorporated, 6601 Bay Street, Emeryville, CA 94608

Received June 21, 1996; Accepted November 6, 1996

Recommended by: Ahmed Elmagarmid

Abstract. (Inter)Dependent objects include data replicated or cached in multiple database systems, data collected and summarized in data warehouses for analysis, planning, and decision support, as well as any other category of objects whose states are related and they are maintained in different information systems. In this paper we discuss dependencies between objects in an environment consisting of operational systems and a data warehouse, and describe their specification and enforcement. To specify object dependencies we introduce Object Dependency Descriptors (ObjectDDs). These describe the relationships between dependent objects, and define how much inconsistency between original objects and their replicas/collections/summaries can be tolerated before it is necessary to restore their consistency. Object dependencies are enforced by extended transactions designed specifically for evaluating if dependent objects satisfy their specified relationships, evaluating whether possible inconsistencies can be tolerated, and (if not) restoring consistency. To describe the transactional behavior of such consistency evaluation and restoration transactions we use Transaction Dependency Descriptors (TransactionDDs). TransactionDDs define the transactional relationships between consistency evaluation and restoration (asynchronous) transactions, as well as the relationships between such asynchronous transactions and regular (synchronous) transactions executed directly by applications. To automatically maintain the consistency of dependent objects, we propose the concept of a Dependency Management System (DMS). A DMS monitors dependent objects, evaluates object consistency, and schedules and controls consistency restoration transactions to keep dependent objects within acceptable consistency levels. We describe key components in the DMS architecture, and a relatively simple implementation involving straightforward extensions in a relational DBMS.

Keywords: data replication, data summarization, asynchronous transactions, ECA rules, activity models, extended transactions

1. Introduction

A set of objects are (inter)dependent if they are maintained by different information systems and their states (i.e., their values and behaviors) are related to each other. In addition to defining such object relationships, object dependencies define consistency requirements for making dependent objects consistent with each other.

P1: STR/PMR/TKL/SRK P2: PMR/SFI P3: PMR/SFI QC: PMR/BSA T1: PMR

Distributed and Parallel Databases KL429-01-Georga February 27, 1997 9:50



Many organizations currently have a large number of legacy and new information systems maintaining replicated, summarized, and overlapping data. Such dependent objects may be rows in relational databases, data and functionality maintained in legacy information systems, or generalized notions of objects such as those in object-based databases. Dependent objects can be classified in the following categories:

• objects replicated or cached in different information systems;
• objects collected and summarized in data warehouses for analysis, planning, and decision support.

Replicated and cached objects are the most common categories of dependent objects. An object o′ is a replica or a cache of an object o if the state of o′ must be kept the same as the state of o. For example, consider customer data replicated in a Billing and a Service Order system in a telecommunication company. The Billing system contains customer billing information, while the Service Order system keeps track of the present and past services that have been provided to each customer. Customer data (e.g., name, telephone number, and address) are replicated in these two systems. Such object replication is usually required for the following reasons:

• to deal with heterogeneity and object access inefficiencies in legacy information systems, e.g., the Service Order system provides limited or inefficient access to its local customer data;
• to provide greater performance and availability, e.g., sending a query to another system to ask for customer information is too expensive compared to accessing local data;
• to accommodate autonomous organizations, e.g., the customer service and billing organizations require complete control of the data they use.

In addition to creating and maintaining dependent objects for dealing with heterogeneity, efficiency, and autonomy, many organizations often need to replicate and summarize objects to process analytical queries for planning and decision support. For example, the data in the Service Order system can be used to answer questions like: “Which type of services do our customers request most?” or “How many new customers asked for a particular service during the third quarter of 1995 for a specific geographical region?” Although answers to such queries may be possible to derive from Service Order data, it is often difficult or impossible to process them on operational systems for the following reasons:

• Analytical queries are often long-lived and interfere with (i.e., impose additional overhead and slow down performance of) the operational system applications updating the same data.
• Efficient data access by analytical queries may require different access methods than those provided by operational systems. For example, if the operational data reside in an IMS system and an analytical query does not correspond to the hierarchical structure of the IMS data, it may be necessary to scan the entire IMS database to provide the query results.
• Operational systems may not maintain historical data. In such cases analytical queries referencing historical objects cannot be answered.
• Operational data may be spread among heterogeneous information systems that do not provide uniform access interfaces. If multi-system analytical queries are difficult to pose or take too long to execute, replicating data in a separate homogeneous database makes such multi-system queries single-system queries.

These practices create a tendency to differentiate between two types of dependent objects: operational objects that are used for day-to-day operations and analytical objects that are used for planning and decision support. Analytical objects may be replicas of operational objects (as in the Service Order database example), or may be collections or summaries of operational objects and/or other analytical objects.

The concept of data warehouse emerged from the realization that legacy and operational systems do not address the planning and decision support requirements of an organization [26, 30, 31]. A data warehouse stores large amounts of objects in several granularities ranging from detailed information (i.e., analytical objects replicated from operational objects) to highly summarized information (i.e., analytical objects created by summarizing other analytical objects).

Objects in a data warehouse (figure 1) often form a structure that consists of three layers. The top layer contains most-recently replicated objects from the operational systems, referred to as detailed data. At this level, the data warehouse supports day-to-day decision making. The middle layer of a data warehouse is populated with summarized data from the detailed data layer, referred to as departmental data or data marts. For example, the objects in the middle layer provide information suited for departmental decision making. Finally, the lowest layer of the data warehouse contains highly summarized data derived from either the detailed or the departmental data, referred to as executive data. These objects are designed for executive decision making and are far fewer than objects stored in the other two layers.

Figure 1. A data warehouse organization.


There are several differences between operational systems and data warehouses.

1. In general, operational systems exhibit a high rate of “read” and “update” transactions, while data warehouses are mostly used read-only for analysis and decision support. When an update occurs at the operational system, it is not immediately propagated to the data warehouse as an on-line synchronous transaction. Usually a data warehouse is populated with coarse off-line bulk updates.

2. The size of the data in an operational system is relatively smaller than the size of a data warehouse. A data warehouse may contain data that originate from more than a single operational system, and old data is usually resident in the warehouse for long periods of time.

3. Data in a warehouse tend to create more complex dependencies between them, since they originate from many different and possibly dependent operational systems.

In addition to forming replication and summarization relationships, dependent objects require consistency maintenance to ensure that their states satisfy the relationships as the objects from which they were derived are updated. There are two basic approaches for restoring object consistency: immediate consistency and eventual consistency.

Immediate consistency requires dependent objects to become consistent whenever a transaction commits [35]. Eventual consistency [32, 34] allows dependent objects to become inconsistent during some period, as long as they will be made consistent periodically.

Immediate consistency can be very expensive or impossible to achieve (e.g., if the systems maintaining dependent objects are heterogeneous and autonomous). Therefore, immediate consistency is often impractical. Eventual consistency can be used whenever some inconsistency between dependent objects can be tolerated. For example, consider two dependent objects a and b that represent integers stored in different systems. Although we consider a and b replicated objects, suppose we can tolerate some discrepancy in their values as long as their difference is not greater than 5. Eventual consistency requires making a and b consistent from time to time (i.e., whenever their value difference becomes greater than 5). Object replicas that can tolerate some controlled inconsistency are called quasi-copies in [1].
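The a and b example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `within_tolerance` and `restore` are hypothetical, and treating a as the original whose value is copied onto b is an assumption; only the tolerance of 5 comes from the example above.

```python
TOLERANCE = 5  # maximum tolerated difference between a and b (from the example)

def within_tolerance(a: int, b: int, tolerance: int = TOLERANCE) -> bool:
    """True while the discrepancy between the two copies can be tolerated."""
    return abs(a - b) <= tolerance

def restore(a: int, b: int) -> tuple:
    """Hypothetical restoration: treat a as the original and overwrite b."""
    return a, a

a, b = 10, 10
history = []
for update in (2, 2, 2):           # three updates applied to a only
    a += update
    if not within_tolerance(a, b):
        a, b = restore(a, b)       # consistency is restored only when needed
    history.append((a, b))
```

In this run the first two updates leave the copies apart but within tolerance, and only the third update triggers a restoration.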

In this paper we address the problem of how to specify and manage dependent objects maintained in different information systems. We allow dependent objects to choose whether they require immediate or eventual consistency. However, since many applications can tolerate limited inconsistencies between dependent objects, we focus on eventual consistency. The techniques we propose can be applied to objects ranging from simple objects such as rows, columns, and specific column(s) in a row in relational DBMSs, to generalized notions of objects such as those supported in object DBMSs.

In Section 2, we introduce Object Dependency Descriptors (ObjectDDs) for specifying object dependencies. ObjectDDs provide the following: (i) a set-oriented language (i.e., extended SQL) for capturing the relationships between dependent objects, e.g., by describing how replicas or summaries are created from the original data, and (ii) idioms for describing tolerated inconsistencies between dependent objects, e.g., by describing how much inconsistency between original objects and their replicas/summaries can be tolerated before it is necessary to restore their consistency. We give several examples of dependencies between objects in an environment consisting of operational systems and a data warehouse, and discuss their specification and enforcement. ObjectDDs are an enhanced variation of the Data Dependency Descriptors introduced in [32].


In Section 3, we discuss how to use transactions to manage dependent objects, i.e., to enforce specified object dependencies. This involves doing the following: (i) evaluating whether dependent objects satisfy their specified relationships, (ii) evaluating whether possible inconsistencies can be tolerated, and (iii) restoring consistency (if necessary), e.g., by updating replicas with the current values of original data, recomputing summaries, etc. These tasks are performed by consistency evaluation and restoration transactions.

Although traditional ACID transactions can be used to evaluate and restore the consistency of dependent objects, many applications that involve dependent objects do not require such strict transactional properties. To illustrate this point, we discuss extended transaction models (ETMs) designed specifically for managing dependent objects. In particular, we give examples of ETMs that relax the ACID properties but maintain the correctness and reliability of dependent objects in data warehouse applications. To specify such ETMs we introduce Transaction Dependency Descriptors (TransactionDDs). These are based on our prior work described in [19, 20] and they are complementary to ObjectDDs. TransactionDDs define the transaction dependencies between consistency restoration (asynchronous) transactions, as well as the dependencies between such asynchronous transactions and regular (synchronous) transactions executed directly by applications. Conditions under which regular transactions are allowed to update dependent objects are discussed in Section 4.

In Section 5 we discuss how to automatically maintain the consistency of dependent objects. In particular, we introduce the concept of a Dependency Management System (DMS). A DMS manages ObjectDDs and TransactionDDs, i.e., monitors dependent objects, evaluates object consistency, and schedules and controls consistency restoration transactions to keep dependent objects within acceptable consistency levels. We discuss key components in the DMS architecture, and outline a relatively simple implementation involving straightforward extensions to a relational DBMS.

In Section 6 we discuss related work. Our conclusions are presented in Section 7.

2. Specification of object dependencies

In this section, we introduce object dependency descriptors (ObjectDDs) supporting declarative specification of object dependencies. ObjectDDs are similar to integrity constraints [36]. However, unlike integrity constraints they do not require integrity enforcement to occur immediately. ObjectDDs allow the specification of “how long” and “how much” inconsistency can be tolerated between dependent objects. In addition, ObjectDDs explicitly specify the relationship between such objects, the events that indicate possible violations of object consistency with respect to maintaining their relationship, and the transactions performing consistency restoration.

Specifically, an ObjectDD is a 6-element tuple that has the following form:

(o, O, E, C, D, τV)

o is the dependent object, i.e., the target object. O is the set of source objects o depends on. E, C, and D are boolean-valued predicates.


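Table 1. Evaluation of ObjectDDs.

 E | C | D | Target consistency with source(s) | ObjectDD
---+---+---+-----------------------------------+----------
   |   |   | Tolerated inconsistency           |    √
   |   | √ | Consistency                       |    √
   | √ |   | Inconsistency                     |
 √ |   |   | Tolerated inconsistency           |    √
 √ |   | √ | Consistency                       |    √
 √ | √ |   | Inconsistency                     |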

The event predicate E specifies temporal, object, and transaction events indicating changes in the source object(s). Whenever any of the specified events occurs, E evaluates to true. If there are no event occurrences, or occurred events have been handled by initiating an evaluation of C, E evaluates to false.

The consistency predicate C specifies for “how long” and/or “how much” inconsistency between the target object and the source object(s) can be tolerated. C is typically evaluated when E becomes true. If C evaluates to false, the inconsistency between the target and the source object(s) (if any) can be tolerated. Otherwise, consistency must be restored.

The object dependency predicate D defines the relationship that must hold between the target and the source object(s). If D evaluates to true, the target object is consistent with the source object(s). Otherwise, target and source objects are inconsistent. As we noted earlier, inconsistency is tolerated while C is false.

To make ObjectDDs enforceable we associate a boolean value with each ObjectDD. Table 1 illustrates how to evaluate the E, C, and D values to derive a boolean value for an ObjectDD. We use a √ mark to indicate a predicate evaluating to true, and leave a blank space to denote a predicate evaluating to false. The values of E, C, and D are illustrated in columns one, two, and three, respectively. Rows where C and D are both true are invalid and are not depicted (since it is impossible to have a situation where the target is consistent with the source objects and at the same time have C indicate an intolerable inconsistency). The fourth column in Table 1 notes whether the target object is inconsistent with the source object(s), and whether such an inconsistency can be tolerated. The ObjectDD value is depicted in the last column.

The ObjectDD value reveals whether the specified object dependency is preserved. If an ObjectDD evaluates to true, the object dependency holds (i.e., the relationship between the target object and the source object(s) holds within the specified inconsistency tolerance).
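The evaluation rule summarized in Table 1 can be written as a small function. The sketch below is an illustration only: the function names are hypothetical, and the closed form `d or not c` is our reading of the six valid rows of the table rather than a formula stated in the text.

```python
def objectdd_value(e: bool, c: bool, d: bool) -> bool:
    """Boolean value of an ObjectDD, following the rows of Table 1.

    e is accepted for completeness; in Table 1 it does not change the result.
    """
    if c and d:
        # Table 1 omits these rows: a consistent target cannot coexist
        # with an intolerable inconsistency.
        raise ValueError("C and D cannot both be true")
    return d or not c

def describe(c: bool, d: bool) -> str:
    """Fourth column of Table 1 for a valid (C, D) combination."""
    if d:
        return "Consistency"
    return "Inconsistency" if c else "Tolerated inconsistency"

# Enumerate the six valid rows in the same order as Table 1.
rows = [(e, c, d)
        for e in (False, True)
        for c in (False, True)
        for d in (False, True)
        if not (c and d)]
table = [(e, c, d, describe(c, d), objectdd_value(e, c, d)) for e, c, d in rows]
```

Under this reading the dependency fails to hold only in the two rows where C signals an intolerable inconsistency.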

The last element in an ObjectDD is the transaction vector τV. Its purpose is to establish the transactions responsible for evaluating E and C, and enforcing D as needed to maintain the consistency of the target object with the source object(s). The vector is interpreted based on the number of transaction sets it contains:

• empty vector: if the vector contains no transaction sets, the transaction(s) that create the events in E are responsible for evaluating E and C, and enforcing D.
• one transaction set: if the vector contains one transaction set, the transaction(s) in the set are responsible for evaluating C and enforcing D. Just as in the empty vector case, E is evaluated by the transaction(s) that create the events in E.
• two transaction sets: if the vector contains two transaction sets, the transaction(s) in the first set are responsible for evaluating C and the transaction(s) in the second set enforce D. Just as in the previous two cases, E is evaluated by the transaction(s) that create the events in E.
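The three cases above amount to a small dispatch on the length of the vector. The following sketch is illustrative: the function name, the dictionary layout, and the rejection of vectors with more than two sets are assumptions, since the text only describes the cases of zero, one, and two transaction sets.

```python
from typing import Dict, FrozenSet, Sequence

EVENT_CREATORS = "transactions that create the events in E"

def responsibilities(tau_v: Sequence[FrozenSet[str]]) -> Dict[str, object]:
    """Map each ObjectDD task (E, C, D) to the transactions responsible for it."""
    if len(tau_v) > 2:
        # Assumption: the text describes only vectors with 0, 1, or 2 sets.
        raise ValueError("expected at most two transaction sets")
    if len(tau_v) == 0:
        evaluate_c = enforce_d = EVENT_CREATORS
    elif len(tau_v) == 1:
        evaluate_c = enforce_d = tau_v[0]
    else:
        evaluate_c, enforce_d = tau_v
    # In all three cases E is evaluated by the transactions creating its events.
    return {"E": EVENT_CREATORS, "C": evaluate_c, "D": enforce_d}

one_set = [frozenset({"TWarehouse", "TBillingWest", "TBillingEast"})]
plan = responsibilities(one_set)
```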

The transaction vector and the specification of the D, C, and E elements of ObjectDDs are discussed further in this paper. In the following two sections we provide examples illustrating the use of ObjectDDs in describing object dependencies and explain the need for implementation-independent specifications.

2.1. A detailed object dependency specification example

Suppose each regional organization maintains a separate billing database. Data from the regional billing databases is replicated in a data warehouse located in the company’s headquarters.

Figure 2 illustrates such a telecommunications organization having two regional billing databases (BillingWest and BillingEast DBs), a central DW database (Warehouse DB), and a data mart database (DMart DB). BillingWest maintains a CUST(C#, Cname, Area#, LastBill) table containing information about the company’s customers in the western region of the company. In addition, the CUST table includes the latest telephone bill for each customer. BillingEast maintains the same customer and billing information for the east region of the company. The CUST table in the central Warehouse DB is the union of the CUST tables in the regional billing databases. This defines a union dependency between BillingWest.CUST, BillingEast.CUST, and Warehouse.CUST. The AREA_BILL table in the DMart DB is derived from the CUST table in the Warehouse DB by using the following query: For every telephone company region, get each area and average bill of all customers in this area. This defines a summarization dependency between Warehouse.CUST and DMart.AREA_BILL.

Figure 2. Object dependencies between billing and warehouse data in a telecommunications organization.

Suppose that the Warehouse DB and the DMart DB applications can tolerate some inconsistency. In particular, consider the following consistency terms for our union and summarization dependencies:

• Temporal consistency term for union: BillingWest.CUST, BillingEast.CUST, and Warehouse.CUST must become consistent at the end of every billing period, say, the first day of each month.
• State consistency term for summarization: Warehouse.CUST and DMart.AREA_BILL must become consistent whenever: (i) the average bill of the customers in a specific area stored in the AREA_BILL table is different from the same average computed using the Warehouse DB, and (ii) the difference is greater than $100.
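The state consistency term for the summarization dependency can be illustrated with a short Python sketch. It is not the paper's specification language: the function names, the in-memory row layout, the sample rows, and the handling of an area missing from the summary (treated as an average of 0) are all assumptions made for illustration. Only the CUST columns and the $100 threshold come from the text.

```python
from collections import defaultdict

def area_averages(cust_rows):
    """Average LastBill per Area# from (C#, Cname, Area#, LastBill) rows."""
    totals = defaultdict(lambda: [0.0, 0])
    for _cust_no, _cname, area_no, last_bill in cust_rows:
        totals[area_no][0] += last_bill
        totals[area_no][1] += 1
    return {area: total / count for area, (total, count) in totals.items()}

def must_restore(cust_rows, area_bill, threshold=100.0):
    """True if some stored area average is off by more than the threshold."""
    current = area_averages(cust_rows)
    # Assumption: an area absent from the summary counts as an average of 0.
    return any(abs(avg - area_bill.get(area, 0.0)) > threshold
               for area, avg in current.items())

# Hypothetical sample data.
warehouse_cust = [(1, "Ann", 617, 50.0), (2, "Bob", 617, 150.0), (3, "Cy", 973, 400.0)]
stale_summary = {617: 100.0, 973: 250.0}   # area 973 is off by $150
fresh_summary = {617: 100.0, 973: 350.0}   # area 973 is off by only $50
```

With these rows, `must_restore(warehouse_cust, stale_summary)` signals that consistency must be restored, while the $50 discrepancy in `fresh_summary` is tolerated.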

These object relationships and corresponding consistency requirements define a union dependency with a temporal consistency term and a summarization dependency with a state consistency term. Such dependencies and consistency requirements can be effectively specified by ObjectDDs. For example, the following ObjectDD specifies the union dependency between BillingWest.CUST, BillingEast.CUST, and Warehouse.CUST, including its temporal consistency term:

(o, O, E, C, D, τV)

where

o: Warehouse.CUST
O: {BillingWest.CUST, BillingEast.CUST}
E: “first day of a month”
C: current = “first day of a month”
D: Warehouse.CUST = BillingWest.CUST ∪ BillingEast.CUST
τV: {TWarehouse, TBillingWest, TBillingEast}

The BillingWest.CUST and BillingEast.CUST objects are the sources. Warehouse.CUST is their target object. The event predicate E includes the “first day of the month” event. The occurrence of this event indicates that the C consistency predicate must be checked to determine whether consistency between the target and source objects is violated. The consistency predicate C tests whether the “first day of the month” event violates the temporal consistency term we discussed earlier, i.e., whether today is the “first day of a month”. The object dependency predicate D defines that the target object is consistent if it is the union of the source objects.
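The E, C, and D elements of this union ObjectDD can be mimicked with plain Python sets. This is a sketch under stated assumptions: the function names and sample rows are hypothetical, tables are modeled as sets of (C#, Cname, Area#, LastBill) tuples, and restoration is done by simply recomputing the union.

```python
from datetime import date

def event_e(today: date) -> bool:
    """E: the temporal event "first day of a month" has occurred."""
    return today.day == 1

def predicate_c(today: date) -> bool:
    """C: current = "first day of a month", i.e. tolerance has run out."""
    return today.day == 1

def predicate_d(warehouse: set, west: set, east: set) -> bool:
    """D: Warehouse.CUST = BillingWest.CUST ∪ BillingEast.CUST."""
    return warehouse == west | east

# Hypothetical sample rows: (C#, Cname, Area#, LastBill)
west = {(1, "Ann", 415, 40.0)}
east = {(2, "Bob", 617, 55.0)}
warehouse = {(1, "Ann", 415, 35.0), (2, "Bob", 617, 55.0)}  # stale bill for C# 1

was_consistent = predicate_d(warehouse, west, east)
today = date(1997, 3, 1)
if event_e(today) and predicate_c(today) and not was_consistent:
    warehouse = west | east   # naive restoration: recompute the whole union
```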

The transaction vector τV specifies that consistency is restored by one set of transactions: {TWarehouse, TBillingWest, TBillingEast}. As we discussed earlier in Section 2, if τV contains only one set of transactions these transactions evaluate C and (if necessary) enforce D. These transactions are triggered by events in E, but they do not evaluate E. The E evaluation either does not require a transaction execution (if E consists of a simple temporal event as in this example), or E is evaluated by the transactions that create the events in E (if E is more complex than a simple temporal event). In the following section we discuss the consistency restoration tasks performed by the transactions in τV and explain why these tasks are not explicitly specified in ObjectDDs. In Section 3, we will discuss consistency restoration transactions in τV further. In particular, we discuss how to ensure that these transactions maintain consistency, i.e., they do not interfere with each other and with other transactions in the presence of concurrency and failures.

2.2. Implementation-independent specification of object dependencies

Assuming an unoptimized system for enforcing D, the transactions in our example τV perform the following consistency restoration tasks: First, the TWarehouse transaction evaluates C, i.e., determines whether today is the first day of the month. If this is true, TBillingWest retrieves the entire BillingWest.CUST table and passes it to the TWarehouse transaction. Similarly, TBillingEast retrieves and passes the BillingEast.CUST table to the TWarehouse transaction. Finally, the TWarehouse transaction takes the union of the BillingWest.CUST and BillingEast.CUST and uses the result to overwrite the Warehouse.CUST table. This may not be an efficient way of restoring object consistency, since it involves moving and combining entire tables with possibly unmodified data, e.g., the C#, Cname, and Area# columns in the CUST tables may not be updated for several billing periods.

One way to deal with efficiency problems is to design a consistency enforcement system that maintains delta objects for each updateable source object. In our telecommunications example this will involve creating and maintaining two delta tables: BillingWest.CUSTdelta and BillingEast.CUSTdelta. The purpose of such delta objects is to record updates applied to the original objects since the time the last consistency restoration took place. When consistency needs to be restored, delta objects provide a record of all updates that need to be considered. Therefore, consistency restoration is performed by applying corresponding updates to the target object and not by recreating the target from the sources. Consider the BillingWest.CUST table in our telecommunications example. At the end of a billing period BillingWest.CUST usually contains a new LastBill value for each customer. In addition, BillingWest.CUST may include some insertions of new customers as well as deletions and updates of customers that have moved since the last billing period. However, the vast majority of customers usually remains unchanged. Therefore, most updates documented in the BillingWest.CUSTdelta table are (C#, LastBill) pairs recording changes in customer bills. Using the BillingWest.CUSTdelta table to restore the consistency of Warehouse.CUST table involves: (i) having the TBillingWest transaction retrieve the BillingWest.CUSTdelta table and pass it to the TWarehouse transaction, and (ii) having the TWarehouse transaction take the union of the BillingWest.CUSTdelta and BillingEast.CUSTdelta and update the LastBill field in Warehouse.CUST table.
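A minimal sketch of delta-based restoration follows. It covers only the common case described above, where each delta record is a (C#, LastBill) pair; insertions and deletions of customers are deliberately left out. The function name, the dictionary-based table layout, and the sample data are assumptions made for illustration.

```python
def apply_deltas(warehouse_cust: dict, *delta_tables: dict) -> dict:
    """Apply (C# -> new LastBill) delta records to the target table in place."""
    for delta in delta_tables:
        for cust_no, last_bill in delta.items():
            cname, area_no, _old_bill = warehouse_cust[cust_no]
            warehouse_cust[cust_no] = (cname, area_no, last_bill)
    return warehouse_cust

# Hypothetical sample data: C# -> (Cname, Area#, LastBill)
warehouse_cust = {
    1: ("Ann", 415, 35.0),
    2: ("Bob", 617, 55.0),
    3: ("Cy", 617, 20.0),
}
west_delta = {1: 40.0}   # stands in for BillingWest.CUSTdelta
east_delta = {2: 60.0}   # stands in for BillingEast.CUSTdelta

apply_deltas(warehouse_cust, west_delta, east_delta)
```

Only the two changed rows are touched; the unchanged customer is never moved between systems, which is the efficiency gain the delta approach aims for.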

ObjectDDs do not explicitly describe how object consistency is actually enforced, i.e., whether delta objects or some other implementation technique is used. Therefore, ObjectDDs are declarative and implementation-independent. They allow dependency enforcement systems to use different techniques to enforce object consistency, and deal with different, often conflicting requirements, for providing efficiency and preserving database autonomy. Our example ObjectDD specifies the union dependency the same way for an unoptimized dependency enforcement system as well as for a system using delta objects to implement consistency restoration. The architecture for a system to enforce object consistencies is discussed further in Section 5.

In the following sections we discuss ObjectDDs in more detail. In Section 2.3 we address the specification of object dependencies and give examples. In Section 2.4 we discuss the specification of object consistency requirements. Section 2.5 focuses on the specification of events and corresponding predicates.

2.3. Specification of object dependency predicates

The simplified specification language we used in Section 2 to describe our example union dependency is not powerful enough to capture more complex dependencies such as the summarization dependency in our telecommunications example. In this section, we define the object dependency predicate D in more detail and describe a more powerful specification language for specifying the D component of ObjectDDs.

The object dependency predicate D is a boolean-valued predicate specifying the relationship that should hold between the target and source objects. This predicate is of the form:

Target_Expression setop Source_Expression

Source_Expression and Target_Expression denote set expressions identifying the source and target objects, while setop is a set operator indicating the relationship that must hold between them. The setop operator is used to express equality and containment constraints; therefore setop is one of =, ⊂, ⊃, ⊆, ⊇.
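As an illustration only, the five set operators can be mapped onto Python's built-in set comparisons. The function name holds and the toy row sets below are assumptions, not part of the specification language.

```python
import operator

# The five set operators allowed in D, mapped to Python's set comparisons.
SETOPS = {
    "=": operator.eq,
    "⊂": operator.lt,   # proper subset
    "⊃": operator.gt,   # proper superset
    "⊆": operator.le,
    "⊇": operator.ge,
}


def holds(target_rows, setop, source_rows):
    """Evaluate 'target setop source' on two collections of rows."""
    return SETOPS[setop](frozenset(target_rows), frozenset(source_rows))


west = {(1, "Ada"), (2, "Bob")}
east = {(3, "Cy")}
warehouse = {(1, "Ada"), (2, "Bob")}

equal = holds(warehouse, "=", west | east)
contained = holds(warehouse, "⊂", west | east)
```

Here the warehouse lags behind the sources, so the equality predicate fails while the proper-containment predicate holds.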

Source_Expression and Target_Expression are specified using a set-oriented language, such as SQL. Here we consider set-oriented languages that are based on the relational or an object algebra which includes selection, projection, join, union, difference, intersection, etc. Together with the basic algebra operators, we require aggregate and closure operators. Aggregate operators allow specification of aggregate functions such as sum, average, or count for a set of objects or for subsets obtained by partitioning the set according to a specified attribute. The closure operator is needed to compute the transitive closure of source objects in an ObjectDD. Aggregate and subset operators can be used inside the closure operator. Note that if an object algebra is used, ordinary functions could be directly included in the Source_Expression and Target_Expression specifications.

To illustrate the specification of object dependency predicates using a language based on such an algebra, consider the summarization dependency in our telecommunications example (figure 2). In particular, consider that the AREA_BILL table in the DMart DB is derived from the CUST table in the Warehouse DB by using the following SQL query: For every telephone company region, get each area and the average bill of all customers in this area. This summarization dependency can be specified by the following object dependency predicate:

D: Area_Bill_Expression = Cust_Expression

where

Area_Bill_Expression := SELECT Area#, Avg_LastBill FROM DMart.AREA_BILL

Cust_Expression := SELECT X.Area#, AVG(Y.LastBill)
                   FROM Warehouse.CUST X, Warehouse.CUST Y
                   WHERE X.Area# = Y.Area#

Using SQL to describe the union dependency between BillingWest.CUST, BillingEast.CUST and Warehouse.CUST is straightforward. The “Warehouse.CUST = BillingWest.CUST ∪ BillingEast.CUST” expression is replaced by the following object dependency predicate:

D: WarehouseCustExpression = OperationalCustExpression

where

WarehouseCustExpression := SELECT C#, Cname, Area#, LastBill FROM Warehouse.CUST

OperationalCustExpression := SELECT C#, Cname, Area#, LastBill FROM BillingWest.CUST
                             UNION
                             SELECT C#, Cname, Area#, LastBill FROM BillingEast.CUST

2.4. Specification of object consistency requirements

There are two basic approaches for maintaining consistency between the target and source objects in an ObjectDD: immediate consistency and eventual consistency.

Immediate consistency requires target objects to be consistent with their source objects whenever a transaction commits (even if there are transactions updating source objects without updating their target objects) [35].

Specification of immediate consistency using ObjectDDs involves:

1. adding an object update event for each source object in the event predicate E,
2. setting the consistency predicate C to true (or just leaving C blank), and
3. leaving the transaction vector τV empty.

Suppose that event.update(BillingWest.CUST) represents update events for BillingWest.CUST and event.update(BillingEast.CUST) represents update events for BillingEast.CUST. The following ObjectDD specifies immediate consistency for our example union dependency.

(o, O, E, C, D, τV)

where

o: Warehouse.CUST
O: {BillingWest.CUST, BillingEast.CUST}
E: event.update(BillingWest.CUST) OR event.update(BillingEast.CUST)
C: true
D: Warehouse.CUST = BillingWest.CUST ∪ BillingEast.CUST {using our simplified notation}
τV: { }

Enforcing immediate consistency in this example requires the ability to issue a multi-system transaction to access the BillingWest, BillingEast, and Warehouse DBs and perform all necessary updates. Such transactions are typically initiated by a user updating one of the source objects, and they are subsequently extended by the system that enforces object dependencies to perform target object updates as specified by the ObjectDD. Providing multi-system transactions requires the object dependency enforcement system to provide capabilities similar to those supported by commercial TP monitors and DBMS replication servers.
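To make the six-component tuple concrete, the following Python sketch represents the immediate-consistency ObjectDD as a small record. The class layout and the must_restore rule (a listed event has fired, C holds, and D is violated) are one possible reading of the components, offered as an assumption rather than as the enforcement algorithm.

```python
from dataclasses import dataclass, field
from typing import Callable, FrozenSet, List


@dataclass
class ObjectDD:
    """Hypothetical in-memory form of the (o, O, E, C, D, tau_V) tuple."""
    o: str                                           # target object
    O: FrozenSet[str]                                # source objects
    E: FrozenSet[str]                                # events, combined with OR
    D: Callable[[dict], bool]                        # dependency predicate
    C: Callable[[dict], bool] = lambda state: True   # a blank C means true
    tau_V: List[str] = field(default_factory=list)   # empty transaction vector

    def must_restore(self, event: str, state: dict) -> bool:
        # Assumed rule: restore when a listed event fires, C holds,
        # and the dependency D does not currently hold.
        return event in self.E and self.C(state) and not self.D(state)


union_dd = ObjectDD(
    o="Warehouse.CUST",
    O=frozenset({"BillingWest.CUST", "BillingEast.CUST"}),
    E=frozenset({"event.update(BillingWest.CUST)",
                 "event.update(BillingEast.CUST)"}),
    D=lambda s: s["Warehouse.CUST"]
    == s["BillingWest.CUST"] | s["BillingEast.CUST"],
)

state = {"Warehouse.CUST": {1}, "BillingWest.CUST": {1}, "BillingEast.CUST": {2}}
needs_restore = union_dd.must_restore("event.update(BillingEast.CUST)", state)
```

Because C is left at its default of true and the transaction vector is empty, every source update that breaks D calls for restoration, which is the immediate-consistency behavior.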

Eventual consistency allows target and source objects to become inconsistent during some period, as long as they are made consistent periodically. Thus, unlike immediate consistency, eventual consistency allows target and source objects to be inconsistent when transactions commit. Eventual consistency [32, 34] requires target objects to become consistent at certain points of time specified by a consistency condition/term. The consistency condition is specified as a combination of conditions on time¹, object state², and number and/or type of source object accesses. For example, the union dependency in Section 2 requires Warehouse.CUST to become consistent with BillingWest.CUST and BillingEast.CUST only on the first day of each month. This requires eventual consistency.

To allow the specification of eventual consistency requirements, ObjectDDs provide the consistency predicate C. Consistency predicates may consist of multiple boolean conditions, each referred to as a consistency term. In the following sections we discuss the specification of various consistency terms and give examples. In Section 2.4.1 we describe the specification of temporal consistency terms. In Section 2.4.2 we discuss state consistency terms for specifying state differences between target and source objects. In Section 2.4.3 we describe the specification of transaction consistency terms that capture inconsistencies introduced by specific transactions performing specific operations. Complex consistency terms are discussed in Section 2.4.4.

2.4.1. Temporal consistency terms. Terms in this category specify the instant in time when actions should be taken to restore the consistency between target and source objects.

Each temporal consistency term consists of the seven consecutive fields listed in Table 2. Fields are separated by spaces indicating logical AND operators. Each field may be either an asterisk “∗” (denoting all legal values listed in the second and third columns of Table 2) or a list of elements separated by commas. An element consists of either one of the legal values or two legal values separated by a minus sign “−” (indicating an inclusive range). Note that the specification of days may be performed by two fields (day of the month and weekday). If both are specified in a list of elements, both are adhered to, i.e., consistency is restored on every day that matches either field.

Table 2. Fields comprising a temporal consistency term.

Field name          Absolute values    Relative values
second              0–59               Now
minute              0–59               Now
hour                0–23               Now, noon, midnight
day of the month    1–31               Today, tomorrow
month of the year   1–12               Current month
year                nnnn, n = 0–9      Current year
weekday             0–6, or Su–Sa      Current weekday

For example, consider the following temporal consistency term between a target and a source object:

second(0) AND minute(0) AND hour(0) AND day(1, 15) AND month(*) AND year(*) AND weekday(Mo), or

second(0) minute(0) hour(0) day(1, 15) month(*) year(*) weekday(Mo)

This specifies that the target object must become consistent with its source object on the first and fifteenth of each month, as well as on every Monday.

To specify days by using only one field (e.g., day of the month), the other field (weekday) should be set to “*”. To specify that the target object in our union dependency must become consistent with the source objects only on the first day of each month, we write:

second(0) minute(0) hour(0) day(1) month(*) year(*) weekday(*)

Relative time values such as now or today are provided to allow specifying restoration of consistency at a relative time instead of the absolute times used in the previous examples. Consider the following temporal consistency term between target and source objects:

second(0) minute(0) hour(0) day(today + 5) month(*) year(*) weekday(*)

This specifies that consistency should be restored every fifth day starting from today.

Many operating systems provide services supporting specifications of temporal terms for scheduling program execution in background or batch. In addition, there is extensive literature on temporal event specification. The seven-field specification language we discussed in this section is similar to that supported by the Unix crontab service. Therefore, its basic advantage is a straightforward UNIX-based implementation of any specified temporal consistency term. In our work, we represent temporal terms in fixed-time format and not in time intervals. The main reason that we chose fixed-time format is that scheduling of restoration transactions is easier and less complicated to implement. For example, if we need to specify that a transaction must occur after 24 hours, we specify the exact time (fixed-time) that the transaction must begin execution. Additionally, it is straightforward to map our existing fixed-time format to temporal intervals, where we assume that the update happens in the leftmost time of the interval or “now”, whichever happens to be the earliest time.
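The matching rule for seven-field terms can be sketched in a few lines of Python. This is an illustrative matcher, not the crontab-based implementation mentioned above: it handles absolute values only (relative values such as now or today are not supported), it assumes the numbering Su = 0 through Sa = 6, and the function names are hypothetical.

```python
import re
from datetime import datetime

FIELDS = ("second", "minute", "hour", "day", "month", "year", "weekday")
WEEKDAYS = {"Su": 0, "Mo": 1, "Tu": 2, "We": 3, "Th": 4, "Fr": 5, "Sa": 6}


def _to_int(token):
    token = token.strip()
    return WEEKDAYS[token] if token in WEEKDAYS else int(token)


def parse_field(text):
    """Return None for '*', else the set of integers the field allows."""
    if text.strip() == "*":
        return None
    allowed = set()
    for element in text.split(","):
        low, dash, high = element.partition("-")
        start = _to_int(low)
        end = _to_int(high) if dash else start   # 'a-b' is an inclusive range
        allowed.update(range(start, end + 1))
    return allowed


def parse_term(term):
    """Parse 'second(0) minute(0) ... weekday(Mo)' into per-field sets."""
    found = dict(re.findall(r"(\w+)\(([^)]*)\)", term))
    return {name: parse_field(found[name]) for name in FIELDS}


def matches(term, when):
    spec = parse_term(term)
    actual = {
        "second": when.second, "minute": when.minute, "hour": when.hour,
        "day": when.day, "month": when.month, "year": when.year,
        "weekday": (when.weekday() + 1) % 7,   # datetime has Mo=0; here Su=0
    }

    def ok(name):
        return spec[name] is None or actual[name] in spec[name]

    if not all(ok(n) for n in ("second", "minute", "hour", "month", "year")):
        return False
    if spec["day"] is not None and spec["weekday"] is not None:
        return ok("day") or ok("weekday")      # both day lists are honored
    return ok("day") and ok("weekday")


TWICE_MONTHLY = "second(0) minute(0) hour(0) day(1, 15) month(*) year(*) weekday(Mo)"
on_fifteenth = matches(TWICE_MONTHLY, datetime(1997, 2, 15))
on_monday = matches(TWICE_MONTHLY, datetime(1997, 2, 3))
on_tuesday = matches(TWICE_MONTHLY, datetime(1997, 2, 4))
```

The fifteenth and the Monday both match, while an ordinary Tuesday does not, which mirrors the reading of the day(1, 15) weekday(Mo) example given earlier in this section.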

2.4.2. State consistency terms. State consistency terms are conditions on the states of source and (possibly) target objects that specify “how far” these objects may be allowed to diverge since the last time their consistency was restored. If the divergence exceeds a pre-specified limit, consistency between target and source objects must be restored. State consistency terms can be classified as follows:

• differences in the states of source and target objects;
• changes in the states of source objects;
• changes in the set of the source objects (e.g., in membership, cardinality).

In the rest of this section, we use the term state to refer to the (possibly composite) value and behavior of an object. In addition, we use the term state component to refer to: (i) the individual values composing a composite object value, and (ii) the individual operations comprising the behavior and interface of an object. For objects maintained in object-based databases, we consider only object state components whose specification does not violate object encapsulation. In relational databases there is no object encapsulation.

Target-source state terms: specify the tolerable differences between the states (or specific state components) of source and target objects. Target-source state terms are specified by the following boolean function:

Difference_expression(Target_expression, Source_expression)

Specification of Source_expression, Target_expression, and Difference_expression requires a SQL-like (i.e., set-oriented) language as discussed in Section 2.3. Target_expression and Source_expression specify: (i) the target and source objects, respectively, and (ii) the specific target and source state components considered in defining tolerable state differences between target and source objects. The actual specification of tolerable differences is performed in the Difference_expression. If there is a non-tolerable difference between the states of target and source objects, the target-source state term evaluates to true. Otherwise, the term evaluates to false. Difference_expression provides SQL-style return codes. In particular, whenever the target-source state term evaluates to false, Difference_expression returns zero to indicate that no action is necessary. In situations where the target-source state term evaluates to true, Difference_expression returns a positive integer indicating that consistency must be restored.

To illustrate the specification of a target-source state term, consider again the summarization dependency between the DMart.AREA_BILL and Warehouse.CUST objects. Suppose that Warehouse.CUST and DMart.AREA_BILL must become consistent whenever: (i) the average bill of the customers in a specific area stored in the DMart.AREA_BILL table is different from the same average computed using the Warehouse.CUST table, and (ii) the difference is greater than $100. This consistency requirement is a target-source state term which, using an SQL-like language, can be specified as follows:

AreaBillDifference(DMartAreaBill, DWAreaBill)

where

DMartAreaBill := SELECT Area#, Avg_LastBill FROM DMart.AREA_BILL

DWAreaBill := CREATE VIEW ObjectDDdb.AREA_BILL (Area#, Avg_LastBill) AS
              SELECT X.Area#, AVG(Y.LastBill)
              FROM Warehouse.CUST X, Warehouse.CUST Y
              WHERE X.Area# = Y.Area#

AreaBillDifference := SELECT COUNT(*)
                      FROM DMart.AREA_BILL, ObjectDDdb.AREA_BILL
                      WHERE DMart.AREA_BILL.Area# = ObjectDDdb.AREA_BILL.Area# AND
                            (DMart.AREA_BILL.Avg_LastBill − ObjectDDdb.AREA_BILL.Avg_LastBill > 100)

This target-source state term specification requires SQL extensions for ObjectDDs. In particular, the view definition “CREATE VIEW ObjectDDdb.AREA_BILL” requires two SQL extensions. The first is the introduction of the ObjectDDdb database for maintaining information about ObjectDDs. The other SQL extension involves the view itself. Such ObjectDD-related view definitions have slightly different semantics than a view definition in a relational DBMS. Basic differences include:

1. “ObjectDDdb.AREA_BILL” is a multi-system view (if dependent objects are stored in databases this becomes a multidatabase view), and

2. ObjectDDdb views are visible only to ObjectDDs, not to regular transactions. This can be accomplished by placing the “ObjectDDdb.AREA_BILL” view definition in the ObjectDDdb schema (in this case, ObjectDDdb can be viewed as a multidatabase schema created specifically for ObjectDDs).
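The return-code convention of the target-source state term can be tried out in a single SQLite database through Python's sqlite3 module. This is only a sketch under several assumptions: one in-memory database stands in for the multi-system view, dotted names become underscore-separated table names, Area_no replaces Area#, and a GROUP BY is used to compute the per-area average.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Warehouse_CUST (C_no INTEGER, Area_no INTEGER, LastBill REAL);
    CREATE TABLE DMart_AREA_BILL (Area_no INTEGER, Avg_LastBill REAL);
    -- Stand-in for the ObjectDDdb view over the warehouse table.
    CREATE VIEW ObjectDDdb_AREA_BILL AS
        SELECT Area_no, AVG(LastBill) AS Avg_LastBill
        FROM Warehouse_CUST GROUP BY Area_no;
    INSERT INTO Warehouse_CUST VALUES (1, 415, 50.0);
    INSERT INTO Warehouse_CUST VALUES (2, 415, 70.0);
    INSERT INTO Warehouse_CUST VALUES (3, 617, 30.0);
    INSERT INTO DMart_AREA_BILL VALUES (415, 200.0);
    INSERT INTO DMart_AREA_BILL VALUES (617, 30.0);
""")

AREA_BILL_DIFFERENCE = """
    SELECT COUNT(*)
    FROM DMart_AREA_BILL d, ObjectDDdb_AREA_BILL v
    WHERE d.Area_no = v.Area_no
      AND d.Avg_LastBill - v.Avg_LastBill > 100
"""


def area_bill_difference(conn):
    """Return 0 when no action is needed, a positive count otherwise."""
    return conn.execute(AREA_BILL_DIFFERENCE).fetchone()[0]


stale = area_bill_difference(conn)
conn.execute("UPDATE DMart_AREA_BILL SET Avg_LastBill = 60.0 WHERE Area_no = 415")
fresh = area_bill_difference(conn)
```

Before the update, area 415 is off by $140 and the term returns a positive count; after the data mart row is refreshed, it returns zero.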

Source-source state terms: specify the limits of tolerable changes in the state of source objects. Unlike target-source state terms that involve the states of both target and source objects, source-source state terms involve only the state of source objects. If the change in the state of the source object exceeds the specified tolerance, the state of the target object must be updated to reflect the changes in the source objects.

Source-source state terms are specified by the following boolean function:

Change_expression(Source_state_expression)

Source_state_expression specifies the source objects, and the specific source state components considered in defining tolerable state changes. Change_expression specifies tolerable changes between: (i) the current state of the source objects, and (ii) the state of the source objects the last time consistency between target and source objects was restored, referred to as the last consistent interdependent state of the source objects. If there is a non-tolerable change between the current state and last consistent interdependent state of the source objects, the source-source state term evaluates to true. Otherwise, the source-source state term evaluates to false. Just like Difference_expression in the target-source state term specification, Change_expression provides SQL-style return codes. In particular, Change_expression returns zero to indicate that no action is necessary, and returns a positive integer to indicate that consistency must be restored.

To illustrate the specification of a source-source state term consider the following summarization dependency between DMart.AREA_BILL and Warehouse.CUST objects: Suppose that Warehouse.CUST and DMart.AREA_BILL must become consistent whenever the average bill of all customers in the Warehouse.CUST table has changed by more than $100 since the last time consistency between DMart.AREA_BILL and Warehouse.CUST was restored. This is a source-source state term that can be specified as follows:

AvgCustBillChange(DWAvgCustBill)

where

DWAvgCustBill := CREATE VIEW ObjectDDdb.AVG_BILL (Avg_Bill) AS
                 SELECT AVG(LastBill) FROM Warehouse.CUST

AvgCustBillChange := SELECT COUNT(*)
                     FROM ObjectDDdb.AVG_BILL X, LAST_CONSISTENT ObjectDDdb.AVG_BILL Y
                     WHERE (X.AVG_BILL.Avg_Bill − Y.AVG_BILL.Avg_Bill) > 100

Source-source state term specification requires further SQL extensions for ObjectDDs to support the “LAST_CONSISTENT” keyword. In our target-source state term specification example we introduced ObjectDDdb, an ObjectDD-specific database for defining views for ObjectDDs. Here, we introduce a new ObjectDDdb dimension for representing the last consistent interdependent state of source objects. In particular, in our source-source state term specification we used “LAST_CONSISTENT ObjectDDdb.AVG_BILL” to represent the last consistent interdependent state of ObjectDDdb.AVG_BILL.

An ObjectDDdb implementation must do the following to support “LAST_CONSISTENT”: (i) materialize the ObjectDDdb.AVG_BILL view every time consistency between DMart.AREA_BILL and Warehouse.CUST is restored, and (ii) maintain a dimension of materialized ObjectDDdb.AVG_BILL views to allow referencing dimension elements by using the “LAST_CONSISTENT” ObjectDD view qualifier.
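The two implementation duties listed above can be pictured with a small Python sketch. It is a simplification under stated assumptions: the class SnapshotView and its methods are hypothetical, and only the most recent materialization is kept rather than a full dimension of materialized views.

```python
class SnapshotView:
    """Hypothetical stand-in for an ObjectDDdb view with a last-consistent copy."""

    def __init__(self, compute):
        self._compute = compute            # callable that evaluates the view
        self.last_consistent = compute()   # materialized when the view is created

    def current(self):
        return self._compute()

    def mark_restored(self):
        # Materialize the view each time consistency is restored.
        self.last_consistent = self._compute()


def avg_cust_bill_change(view, tolerance=100.0):
    """Return 1 if the value rose by more than the tolerance, else 0."""
    return int(view.current() - view.last_consistent > tolerance)


bills = [40.0, 60.0]
avg_bill = SnapshotView(lambda: sum(bills) / len(bills))

bills[:] = [240.0, 260.0]                  # the average moves from 50 to 250
changed = avg_cust_bill_change(avg_bill)
avg_bill.mark_restored()
settled = avg_cust_bill_change(avg_bill)
```

The term fires once the average has drifted by more than $100 from the last consistent snapshot, and stops firing as soon as restoration refreshes that snapshot.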

Source-source set terms: specify the limits for tolerable changes in the set of source objects, e.g., changes involving cardinality, membership, or percentage of elements that have certain values or operations in their interface. If a change in the set of source objects violates the specified tolerance, the target object must be updated to reflect the changes in the set of source objects.

Source-source set terms are specified by the following boolean function:

Change_expression(Source_set_expression)

Source_set_expression specifies the set of source objects, while Change_expression specifies tolerable discrepancies between: (i) the current set of source objects, and (ii) the last consistent interdependent set of the source objects with respect to the target object, i.e., the set of the source objects the last time consistency between target and source objects was restored. If there is a non-tolerable change between the current set and last consistent interdependent set of the source objects, the source-source set term evaluates to true. Otherwise, the source-source set term evaluates to false.

To illustrate a source-source set term specification consider again the union dependency between BillingWest.CUST, BillingEast.CUST and Warehouse.CUST. Suppose that the target (Warehouse.CUST) must become consistent with the source objects (BillingWest.CUST and BillingEast.CUST) whenever there are more than 100 customers in Warehouse.CUST that do not exist in BillingWest.CUST and BillingEast.CUST (e.g., because they moved since the last time consistency was restored or changed service provider). This is a source-source set term that can be specified as follows:

CustNumChange(BillingCustSet)

where

BillingCustSet := CREATE VIEW ObjectDDdb.CUSTSET (C#) AS
                  SELECT C# FROM BillingWest.CUST
                  UNION
                  SELECT C# FROM BillingEast.CUST

CustNumChange := SELECT ABS(COUNT(UNIQUE *) − 100)
                 FROM ObjectDDdb.CUSTSET
                 WHERE C# NOT IN (SELECT C# FROM LAST_CONSISTENT ObjectDDdb.CUSTSET)
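For comparison, a membership-based set term can be sketched with Python sets. The sketch follows a plain threshold reading (fire when more than 100 members of the current set were absent from the last consistent set) and does not reproduce the ABS-based return value of the SQL-like specification; the function name and the sample customer numbers are assumptions.

```python
def cust_num_change(current_set, last_consistent_set, tolerance=100):
    """Return the number of new members if it exceeds the tolerance, else 0."""
    new_members = current_set - last_consistent_set
    return len(new_members) if len(new_members) > tolerance else 0


last_consistent = set(range(1, 1001))               # C# values at last restoration
west = set(range(1, 501)) | set(range(2000, 2060))  # 60 new customers
east = set(range(501, 1001)) | set(range(3000, 3050))  # 50 new customers

result = cust_num_change(west | east, last_consistent)
```

With 110 customer numbers that were unknown at the last restoration, the term returns a positive value and consistency must be restored.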

2.4.3. Transaction consistency terms. These are conditions that specify whether specific operations performed by specific transactions introduce non-tolerable inconsistencies between the target and source objects. Transaction consistency terms can capture specific transaction types and instances, as well as specific operation types and instances. In particular, each transaction consistency term is comprised of four consecutive fields (Table 3).

Fields in a transaction consistency term are separated by spaces indicating logical AND operators. Each field may be either an asterisk “*” (denoting all legal values) or a list of legal values separated by commas. Legal values for operation_type include the types of all operations that can be performed by the transaction types defined in transaction_type. Basic operation types include read and write. Relational databases provide insert, delete, update, and query operations. These operations can be performed by all transaction types. Objects and abstract data types typically support user-defined operations that can be used in the specification of transaction consistency terms. Legal values for transaction_type include basic and user-defined transaction types. Relational databases provide generic ACID transactions and several generic transaction isolation levels as basic transaction types. User-defined transaction types are discussed further in Section 3. Transaction_instance (operation_instance) specifies how many times a transaction (operation) of a type in transaction_type (operation_type) can be executed until it violates consistency between target and source objects.

Table 3. Fields comprising a transaction consistency term.

Field name             Default value   Legal values
transaction_type       Update          Any transaction type
transaction_instance   1               Any integer
operation_type         Write           Any operation type in transaction_type
operation_instance     1               Any integer

To illustrate a transaction consistency term specification consider our example union dependency between BillingWest.CUST, BillingEast.CUST and Warehouse.CUST. Suppose that the target (Warehouse.CUST) must become consistent with the source objects (BillingWest.CUST and BillingEast.CUST) whenever the total number of inserts into BillingWest.CUST and BillingEast.CUST is greater than 100 (e.g., consistency must be restored if there are 100 new customers). This consistency requirement is specified by the following transaction consistency term:

transaction_type(*) AND transaction_instance(*) AND operation_type(insert) AND operation_instance(100), or

transaction_type(*) transaction_instance(*) operation_type(insert) operation_instance(100)

This transaction consistency term takes into account all insertions to BillingWest.CUST and BillingEast.CUST. Therefore, it will restore Warehouse.CUST consistency in situations where there are 100 customer insertions, but fewer than 100 new customers. For example, this may occur if 50 customers move from the West region to the East region of the company, since the customers moving from the West region will be deleted from BillingWest.CUST and be inserted in BillingEast.CUST. Similarly, those moving from the East to the West region will be deleted from BillingEast.CUST and be inserted into BillingWest.CUST.

Suppose that our objective is to ensure that Warehouse.CUST is made consistent only if there are 100 new customers. This can be specified by introducing a new transaction type new_cust for the transactions performing new customer insertions. This consistency requirement is specified by the following transaction consistency term:

transaction_type(new_cust) transaction_instance(*) operation_type(insert) operation_instance(100)

This transaction consistency term does not specify the number of new_cust transactions. Therefore, it allows each new_cust transaction to insert multiple new customers. Suppose now that we want to specify that consistency must be restored whenever 100 transactions of type new_cust insert at least one new customer in the source objects. This consistency requirement is specified by the following transaction consistency term:

transaction_type(new_cust) transaction_instance(100) operation_type(insert) operation_instance(*)
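The difference between counting operations and counting transactions can be made concrete with a small Python sketch. It is an interpretation, not the enforcement mechanism: the class TransactionTerm is hypothetical, a limit is treated as reached once the count equals it, and "*" is treated as a field that is always satisfied, so the four fields are combined with AND.

```python
from dataclasses import dataclass


@dataclass
class TransactionTerm:
    transaction_type: str = "*"
    transaction_instance: object = "*"    # "*" or an integer limit
    operation_type: str = "*"
    operation_instance: object = "*"      # "*" or an integer limit

    def __post_init__(self):
        self._transactions = set()        # distinct matching transactions seen
        self._operations = 0              # matching operations seen

    def record(self, txn_id, txn_type, op_type):
        """Record one operation; return True when consistency must be restored."""
        if self.transaction_type not in ("*", txn_type):
            return False
        if self.operation_type not in ("*", op_type):
            return False
        self._transactions.add(txn_id)
        self._operations += 1
        return self.violated()

    def violated(self):
        def reached(limit, count):
            return limit == "*" or count >= limit

        return (reached(self.transaction_instance, len(self._transactions))
                and reached(self.operation_instance, self._operations))


# 100 inserts issued by only 10 new_cust transactions (10 inserts each).
by_operations = TransactionTerm("new_cust", "*", "insert", 100)
by_transactions = TransactionTerm("new_cust", 100, "insert", "*")

fired_ops = [by_operations.record(i // 10, "new_cust", "insert") for i in range(100)]
fired_txns = [by_transactions.record(i // 10, "new_cust", "insert") for i in range(100)]
```

The operation-counting term fires on the hundredth insert, while the transaction-counting term stays silent because only ten distinct new_cust transactions were seen.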

2.4.4. Complex consistency terms. Temporal, state, and transaction consistency terms are simple consistency terms. Complex consistency terms are composed from consistency subterms. Each consistency subterm is either a complex consistency term or a simple consistency term. Consistency subterms are combined together with logical AND, OR and NOT operators, and PRE, a consistency term precedence operator. PRE defines a precedence relationship between consistency subterms. In particular, cx PRE cy evaluates to true if the consistency subterm cx is true before the consistency subterm cy becomes true.

To illustrate the specification of complex consistency terms, consider the summarization dependency between DMart.AREA_BILL and Warehouse.CUST discussed in the source-source state term example in Section 2.4.2. Suppose that specifying when Warehouse.CUST and DMart.AREA_BILL must become consistent involves the following consistency subterms:

1. State subterm. Whenever the average bill of all customers in the Warehouse.CUST table has changed by more than $100 since the last time consistency between DMart.AREA_BILL and Warehouse.CUST was restored. In Section 2.4.2, we specified this source-source state term as follows: “AvgCustBillChange(DWAvgCustBill)”.

2. Temporal subterm. Whenever the current day is the first day of each month. In Section 2.4.1, we specified this temporal term as follows: second(0) minute(0) hour(0) day(1) month(*) year(*) weekday(*).

3. Transaction subterm. Whenever there are 100 inserts in Warehouse.CUST. This transaction term was specified in Section 2.4.3 as follows: transaction_type(*) transaction_instance(*) operation_type(insert) operation_instance(100).

Suppose that Warehouse.CUST and DMart.AREA_BILL must become consistent if either the temporal or transaction subterm holds before the state subterm holds, i.e., the average bill of all customers in Warehouse.CUST has changed by more than $100 and this holds after the first day of the month or after 100 inserts in Warehouse.CUST. This is specified by the following complex consistency term:

[second(0) minute(0) hour(0) day(1) month(*) year(*) weekday(*) OR
 transaction_type(*) transaction_instance(*) operation_type(insert) operation_instance(100)]
PRE AvgCustBillChange(DWAvgCustBill)
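One way to picture the PRE operator is to track, for each subterm, the time at which it first became true. The Python sketch below takes that view; the timestamp representation and the helper names pre and earliest are assumptions made for illustration.

```python
def pre(first_true_at_x, first_true_at_y):
    """cx PRE cy: cx became true before cy became true.

    Each argument is the time at which the subterm first held,
    or None if it has not held yet.
    """
    if first_true_at_x is None or first_true_at_y is None:
        return False
    return first_true_at_x < first_true_at_y


def earliest(*times):
    """Time at which an OR of subterms first held (None if none held)."""
    known = [t for t in times if t is not None]
    return min(known) if known else None


temporal_at = 10        # the first day of the month was reached at t = 10
transaction_at = None   # 100 inserts have not been reached yet
state_at = 25           # the average bill changed by more than $100 at t = 25

restore = pre(earliest(temporal_at, transaction_at), state_at)
```

In this run the temporal subterm held at t = 10, before the state subterm at t = 25, so the complex term evaluates to true.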

2.5. Specification of events

Consistency terms can be, at least in principle, directly evaluated to determine if consistency between target and source objects must be restored. Therefore, it could be argued that event specification is an implementation issue for consistency terms. Although our objective is to keep ObjectDDs implementation independent, we believe that ObjectDDs must include event specification to allow ObjectDD designers to deal with the following issues:

1. cost of consistency term evaluation, and
2. event heterogeneity in multi-system environments.

The event predicate E in ObjectDDs addresses (1) by allowing specification of events whose evaluation cost is low compared to the cost of evaluating the consistency terms in C or enforcing immediate consistency. Therefore, by using E to specify such events ObjectDD designers can deal with the cost of evaluating consistency terms.

In a heterogeneous multi-system environment, local events are generated by possibly heterogeneous systems maintaining source objects. Since such local events may be heterogeneous, we need to provide a uniform global abstraction of local events. The event predicate E in ObjectDDs deals with (2) by including only canonical events. These are uniform events representing local events (or local event compositions).

Each simple consistency term is associated with one or more simple canonical events. Simple events can be composed into complex canonical events using the same operators as complex consistency terms. This is not the only approach for composing events. Several proposals have been made recently that introduce other languages for event specification. For example, such languages usually support the construction of composite events such as (a) disjunction: the event (E1 OR E2) is signalled when either E1 or E2 is signalled, (b) sequence: the event (E1; E2) is signalled when an occurrence of E1 is followed by an occurrence of E2; and (c) closure: the event (E∗) is signalled when E has occurred a non-zero number of times. In this paper, we have no intention to introduce another event specification language. An interested reader may refer to [16, 17] for details on powerful event specification languages.

To specify and deal with event occurrences we model canonical events as objects with state and behavior. When such an event object is created, this is equivalent to the event being signaled. The state of an event contains arguments that may be important in consistency term evaluation. The ObjectDD enforcement system maintains a correspondence between each event type and the consistency terms that must be evaluated if an event of this type occurs. The “create” function for each event type creates the event instance and then starts evaluating all consistency terms associated with (waiting for) an event of this type. The arguments that can accompany a canonical event (i.e., its state) are defined on subtypes of type Event. Thus, a consistency term whose triggering event is Event will be evaluated when any event occurs. A consistency term associated with an event of type Event.Transaction will be evaluated for any transaction-related event, and so on. Our canonical event types include:

• state events
• transaction events
• temporal events
• user-defined events
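The event-object model lends itself to a short Python sketch in which creating an event instance signals it and triggers the terms registered for its type and for every supertype. The class names and the register method are hypothetical; they only mirror the type hierarchy rooted at Event.

```python
class Event:
    """Root canonical event type; creating an instance signals the event."""
    _terms = {}   # event type -> list of consistency-term callbacks

    def __init__(self, **state):
        self.state = state   # arguments accompanying the event
        # Evaluate the terms registered for this type and every supertype.
        self.results = [term(self)
                        for cls in type(self).__mro__
                        for term in Event._terms.get(cls, [])]

    @classmethod
    def register(cls, term):
        Event._terms.setdefault(cls, []).append(term)


class StateEvent(Event):
    pass


class TransactionEvent(Event):
    pass


class TemporalEvent(Event):
    pass


class UserDefinedEvent(Event):
    pass


log = []
Event.register(lambda e: log.append(("any", type(e).__name__)))
TransactionEvent.register(lambda e: log.append(("txn", e.state["stage"])))

TransactionEvent(stage="Commit")
TemporalEvent(tick=0)
```

The term registered on Event runs for both events, whereas the term registered on TransactionEvent runs only for the commit, matching the subtype rule described above.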

State events occur when objects are referenced. However, objects are not uniformly defined across the various data models (relational, object-oriented, etc.). For example, in a relational database, an object state can be a column, a row, a column in a row, a table, a database, or any set of these. A state event occurs when a retrieve, delete, insert, or update operation is issued on an object. In an object-oriented database, a state event occurs when an object method is invoked. Many DBMS and object models allow definition of user-defined events (e.g., the ORACLE DBMS provides database alerters for defining and signaling user-defined events). The basic list of canonical events includes delete, insert, update, and retrieve, as well as user-defined state events.

Transaction and temporal events are defined uniformly. Transaction events occur when a transaction reaches a distinct detectable stage (Begin, Commit, Abort, etc.). Temporal events are generated by a timer.

3. Specification of transactions enforcing object dependencies

In Section 2 we noted that ObjectDDs are transactional, i.e., they use transactions to evaluate their predicates and enforce object dependencies. In this section, we discuss transaction models and properties from the perspective of ObjectDD application requirements. In addition, we discuss the specification of transaction models specifically designed for ObjectDD enforcement. More specifically, in Section 3.1 we discuss traditional and extended transaction models. In Section 3.2 we describe the use of the activity transaction model for ECA rules [8, 9] in enforcing ObjectDDs. In Section 3.3 we discuss a framework for specification of application-specific transaction models for ObjectDD enforcement.

3.1. Traditional and extended transaction models

Traditional transactions allow data/object manipulation operations to be grouped together in units, and ensure that such transaction units have the so-called ACID properties, i.e., atomicity, consistency, isolation, and durability [3]. ACID transactions are targeted towards traditional database application areas such as banking and airline reservations. ACID transactions allow sufficient reliability, performance, and functionality in such application domains. However, ACID transactions do not work as well in other application domains.

The need for introducing extended transaction models (ETMs) to support specialized applications that are not supported by the ACID transaction model provided by commercial DBMSs and TP monitors has been recognized for some time, and several ETMs have been proposed. Examples include closed nested transactions [29], sagas [15], contracts [38], flexible transactions [12], compatible transactions [13], and multitransactions [6]. ETMs extend the traditional (ACID) transaction model to:

• allow nested transaction structure and/or transaction grouping;
• use new correctness criteria that permit functionality necessary for application cooperation, improve throughput, and/or deal with the autonomy of local databases in a multidatabase environment;
• support transactional rule execution.


ETMs proposed to allow transaction cooperation and improve throughput (e.g., reduce transaction blocking and abortion caused by transaction synchronization) relax the atomicity and isolation properties of the ACID transaction model [6, 9, 11, 13, 14, 15, 29].

ETMs introduced to deal with the autonomy of local databases in a multidatabase environment support multidatabase transactions consisting of ACID transactions submitted at different local DBMSs [10, 12, 21, 38]. Since local DBMSs are autonomous, ETMs in this category were designed by integrating the ACID transaction models enforced by the local DBMSs. Examples include global serializability discussed, e.g., in [21], and quasi-serializability discussed in [10]. Since ETMs in this category must be designed to be compatible with the transaction models of the local DBMSs, they require all subtransactions that access local DBMSs to be ACID.

ETMs proposed to support transactional rule execution allow the division of the rule evaluation and execution tasks into multiple transactions and couple these transactions together in an extended transaction. For example, the activity ETM proposed for providing transactional semantics to Event-Condition-Action (ECA) rules [8, 9] supports the division of each ECA rule into up to three transactions for evaluating E, evaluating C, and performing A, respectively. These transactions may either be coupled together using the closed nested transaction ETM [29], or be executed as different ACID transactions. Activity models are particularly important for maintaining object dependencies, since they can be used to make ObjectDDs transactional.

Many of these ACID model extensions resulted in application-specific ETMs offering adequate correctness guarantees in one particular application, but being too restrictive or not ensuring correctness in others. This is true for the activity ETM and other data warehouse-specific ETMs we discuss in the following sections.

3.2. Object dependency enforcement using ECA rules and the activity ETM

In this section we compare ObjectDDs to ECA rules and discuss whether the activity ETM (introduced in [8, 9] to provide transactional ECA rule execution) can support arbitrary ObjectDDs.

ObjectDDs and ECA rules have similar E components. The only difference is in heterogeneous multi-system environments, where the E component of ObjectDDs includes only canonical events as discussed in Section 2.5.

The C component of ECA rules allows arbitrary predicates. The C component of ObjectDDs is restricted to predicates involving the consistency terms described in Section 2.2. Therefore, ECA rules can be customized to support the same consistency terms as ObjectDDs.

Basic differences between ObjectDDs and ECA rules involve the following:

1. consistency evaluation and restoration actions and corresponding transactions, and
2. ETMs provided for consistency evaluation and restoration transactions.

ObjectDDs support implementation-independent specification of object dependencies, i.e., they specify which conditions must hold between target and source objects but do not specify how to enforce these conditions. On the other hand, ECA rules are implementation-dependent, i.e., they do not specify which conditions must hold between target and source objects but provide system-specific consistency restoration actions to perform. Therefore, ObjectDDs capture object dependencies at a higher level than ECA rules.

ObjectDDs explicitly specify the transactions that restore object consistency. ECA rules do not capture consistency restoration transactions.

The activity ETM was introduced [8, 9] to add transactions to ECA rules. The activity ETM defines: (i) which transactions detect events, evaluate conditions, and/or perform actions in ECA rules, and (ii) how such transactions are coupled together. In particular, ECA rules have three possible couplings: event-condition, condition-action, and event-action. These determine when and how the condition is evaluated and the action is performed. For each coupling, four modes are provided:

• immediate: execute (the condition or action) as part of the current transaction immediately;
• deferred: execute as part of the current transaction but defer execution until the current transaction is ready to enter its prepared state;
• detached and causally independent: execute in a separate transaction;
• detached but causally dependent: execute in a separate transaction but commit this transaction only if the current transaction commits.

The activity ETM combines these coupling modes with the closed nested transaction ETM [29]. In particular, immediate and deferred execution of condition or action occurs in a subtransaction of the current transaction. The current transaction may be a top (root-level) transaction or another subtransaction. Detached execution of condition or action occurs in a new top transaction.
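The four coupling modes and the kind of transaction each one uses can be summarized in a lookup table. The sketch below is an illustrative summary only; the `CouplingMode` and `Placement` names and the wording of the string values are assumptions, not part of the activity ETM definition in [8, 9].

```python
from dataclasses import dataclass
from enum import Enum


class CouplingMode(Enum):
    IMMEDIATE = "immediate"
    DEFERRED = "deferred"
    DETACHED_INDEPENDENT = "detached and causally independent"
    DETACHED_DEPENDENT = "detached but causally dependent"


@dataclass(frozen=True)
class Placement:
    kind: str         # "subtransaction" or "top transaction"
    start: str        # when the condition or action begins executing
    commit_rule: str  # relationship to the current transaction's commit


# Hypothetical table restating the text: immediate and deferred modes run in
# a subtransaction of the current transaction; detached modes run in a new
# top transaction.
PLACEMENTS = {
    CouplingMode.IMMEDIATE: Placement(
        "subtransaction", "immediately", "part of the current transaction"),
    CouplingMode.DEFERRED: Placement(
        "subtransaction",
        "when the current transaction is ready to enter its prepared state",
        "part of the current transaction"),
    CouplingMode.DETACHED_INDEPENDENT: Placement(
        "top transaction", "in a separate transaction", "independent"),
    CouplingMode.DETACHED_DEPENDENT: Placement(
        "top transaction", "in a separate transaction",
        "commits only if the current transaction commits"),
}

for mode, placement in PLACEMENTS.items():
    print(f"{mode.value}: {placement.kind}; {placement.commit_rule}")
```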

ECA rules together with the activity ETM can be used to provide transactional enforcement of ObjectDDs in a centralized system³. However, the closed nested ETM is too restrictive (i.e., does not allow enough throughput and does not provide necessary transaction structure) to support many ObjectDD applications.

For example, consider using closed nested transactions to enforce the consistency of target objects in a data warehouse and source objects in operational systems, such as in the warehouse examples discussed in Section 2. The closed nested ETM requires warehouse transactions to acquire locks in all operational systems maintaining the source objects for the warehouse data, and hold these locks until each warehouse transaction terminates. Since data warehouse transactions are often long-lived, this will block update transactions in the operational systems for as long as data warehouse transactions are active. This is too restrictive in data warehouse environments where applications are read only. Therefore, the closed nested ETM (and the activity ETM using it) are not suited for such applications, since they introduce serious transaction throughput problems in the operational systems.

A less restrictive ETM can be introduced to ensure the consistency of target and source objects and at the same time avoid blocking updates in operational systems. We discuss such an application-specific ETM example in Section 3.3.

If no existing ETM satisfies the requirements of an application, a new ETM must be defined to do so. To define new ETMs we need a framework for implementation-independent specification of ETMs. Using such a framework to reason about ETMs is necessary, since reasoning in terms of locking, timestamp ordering, logging, transaction spawning, and the other heterogeneous techniques that are used to implement ETMs is extremely hard or impossible. In the following section we discuss such a framework proposed for designing application-specific ETMs [18, 19], and give ETM examples designed specifically for transactions enforcing ObjectDDs.

3.3. ETM specification principles

Specification of ETMs is based on the observation that extended transactions consist of a set of constituent transactions and a set of transaction dependencies between them. Each constituent transaction of an extended transaction is either a simple transaction (i.e., a transaction that has no constituent transactions) or another extended transaction (if an ETM permits nesting or grouping). Each extended transaction T that is not a constituent transaction of any other transaction has the following two kinds of transaction dependencies:

• Inter-transaction dependencies that define the relationships between T and all transactions that are not constituent transactions of T.
• Intra-transaction dependencies that define the relationships between the constituent transactions of T.

For example, consider nested transactions [29]. Intra-transaction dependencies exist between a parent and its child transactions, and among sibling transactions. Inter-transaction dependencies occur between top transactions (i.e., nested transactions that are not constituent transactions of any other transaction).

To illustrate both intra- and inter-transaction dependencies in the context of an ObjectDD, consider the union dependency between BillingWest.CUST, BillingEast.CUST, and Warehouse.CUST in figure 2:

(o, O, E, C, D, τV)

where

o: Warehouse.CUST
O: {BillingWest.CUST, BillingEast.CUST}
E: event.temporal(second(0) minute(0) hour(0) day(1) month(*) year(*) weekday(*))
C: second(0) minute(0) hour(0) day(1) month(*) year(*) weekday(*)
D: Warehouse.CUST = BillingWest.CUST ∪ BillingEast.CUST
τV: {TWest, TEast}
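As a rough illustration, the six components of this ObjectDD can be held in a small record, with the dependency D expressed as a predicate over table contents. The field names, the representation of tables as sets of customer identifiers, and the sample identifiers are assumptions made for this sketch; the paper does not prescribe a concrete encoding.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Set

Tables = Dict[str, Set[str]]  # table name -> set of (hypothetical) customer ids


@dataclass
class ObjectDD:
    target: str                            # o
    sources: List[str]                     # O
    event: str                             # E
    condition: str                         # C
    dependency: Callable[[Tables], bool]   # D
    transactions: List[str]                # tau_V


MONTHLY = "second(0) minute(0) hour(0) day(1) month(*) year(*) weekday(*)"

union_dd = ObjectDD(
    target="Warehouse.CUST",
    sources=["BillingWest.CUST", "BillingEast.CUST"],
    event=f"event.temporal({MONTHLY})",
    condition=MONTHLY,
    dependency=lambda db: db["Warehouse.CUST"]
    == db["BillingWest.CUST"] | db["BillingEast.CUST"],
    transactions=["TWest", "TEast"],
)

# Made-up contents: the warehouse lags behind both billing systems.
db = {
    "BillingWest.CUST": {"c1", "c2"},
    "BillingEast.CUST": {"c3"},
    "Warehouse.CUST": {"c1"},
}

before = union_dd.dependency(db)
db["Warehouse.CUST"] |= db["BillingWest.CUST"]   # effect of TWest alone
after_west = union_dd.dependency(db)
db["Warehouse.CUST"] |= db["BillingEast.CUST"]   # effect of TEast as well
after_both = union_dd.dependency(db)
```

With this data, D does not hold before the transfers or after the TWest transfer alone; it holds only once both transfers have been applied.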

Suppose transaction TWest retrieves the changes in the BillingWest.CUST table and uses them to update the Warehouse.CUST table. Similarly, transaction TEast applies changes in the BillingEast.CUST table to the Warehouse.CUST table. Since both these transactions access the data warehouse and a billing system, they are multi-system transactions. Consistency between the target Warehouse.CUST object and its source objects is restored only if both TWest and TEast successfully complete their execution (since successful completion of only one of these transactions does not apply all BillingWest.CUST and BillingEast.CUST updates to the Warehouse.CUST table).

Figure 3. Structural dependencies of transaction T.

To capture these requirements we create a new data warehouse-specific ETM that defines: (i) the coupling of transactions in the ObjectDD transaction vector, and (ii) the isolation between such transaction instances and all other concurrent transaction instances. In particular, our warehouse-specific ETM supports instances of an extended transaction T = (TC, TD), where TC is the set of the constituent transactions of T and TD is a set of transaction dependencies between them. TC is constructed by taking the union of all transactions in the transaction vector τV of the union dependency ObjectDD, i.e., TC = {TWest, TEast}. TD consists of two types of transaction dependencies: transaction state dependencies and correctness dependencies. Transaction correctness dependencies can be intra- and inter-transaction dependencies, while transaction state dependencies can be only intra-transaction dependencies.

Transaction state dependencies are conditions on transaction states that define the execution structure of extended transactions. Figure 3 depicts the following intra-transaction dependencies that define the execution structure of T, assuming that TWest and TEast cannot begin before T begins, and that TWest and TEast execute concurrently:

1. backward-begin-begin: TWest cannot begin before T begins
2. backward-begin-begin: TEast cannot begin before T begins
3. backward-commit-commit: T cannot commit before TWest and TEast commit
4. strong-abort-begin: TWest must begin if TWest aborts
5. strong-abort-begin: TEast must begin if TEast aborts

Note that dependency 3 requires that T cannot commit unless both TWest and TEast commit. This is necessary because consistency between the target and its source objects may not be restored unless both TWest and TEast are able to complete their tasks successfully. Dependencies 4 and 5 specify that TWest and TEast must be restarted if they fail to commit (e.g., due to a local deadlock). These dependencies are required because: (i) dependency 3 makes T unable to commit if TWest and/or TEast aborts, and (ii) the abortion of TEast (TWest) after a commitment of TWest (TEast) makes T unable to abort. Since T cannot be rolled back, our ETM requires forward recovery. This is necessary to ensure that consistency between target and source objects is always maintained.


Correctness dependencies specify which concurrent executions (schedules) of extended transactions preserve consistency and produce correct results, thereby defining a correctness criterion. Correctness dependencies include:

• Serialization dependencies, which specify whether the operations performed by a set of transactions on a set of objects must be serializable, or whether the serialization order of a transaction Ti must precede the serialization order of another transaction Tj.
• Visibility dependencies, which define whether the operations performed by a set of transactions on a set of objects must be recoverable, cascadeless, strict, rigorous, semi-rigorous, etc. [3, 5, 22].
• Cooperation dependencies, which define whether a set of transactions may perform operations on a set of objects without restrictions [13, 14, 28].

Correctness dependencies may reflect application semantics or be application-independent. To illustrate these, consider again the extended transaction T that ensures the consistency of the Warehouse.CUST object with its source BillingWest.CUST and BillingEast.CUST objects. The following dependencies are application-independent and sufficient to ensure correctness in the presence of concurrency:

• Inter-transaction serialization dependency. The schedule of operations performed by T and those performed by all committed transactions must be serializable. Note that this dependency does not require the constituent transactions of T to appear isolated from each other.
• Intra-transaction serialization dependency. The schedule of operations performed by TWest and TEast must be serializable. This dependency requires TWest and TEast to be isolated from each other.

While application-independent correctness dependencies are sufficient to ensure correctness, they may impose unnecessary restrictions in applications such as our data warehouse environment. For example, our data warehouse-specific ETM can take advantage of application semantics to allow correctness-preserving but non-serializable schedules. This is specified by the following dependencies:

• Intra-transaction cooperation dependency. TWest and TEast can interleave without any restrictions, since they do not access the same data in the Warehouse.CUST table.
• Inter-transaction cooperation dependency. T can interleave without any restrictions (does not need to be serializable) with transactions accessing the source objects BillingWest.CUST and BillingEast.CUST. This is because: (i) T cannot violate the consistency of BillingWest.CUST and BillingEast.CUST, since T does not perform updates on these objects, and (ii) data warehouse applications can tolerate inconsistencies resulting from the interleaving of T and the transactions updating these source objects.

To illustrate such application-dependent/specific correctness, consider again our data warehouse in figure 2. Suppose a warehouse application M and our example extended transaction T behave as follows: (i) M uses warehouse data updated by T to compute the average bill of all customers several times each day, and plots the daily mean, (ii) T is also executed several times a day, but at different times than M, and (iii) T is not synchronized with the transactions updating the billing information in the operational systems to ensure serializability. If, despite T being unserializable with the transactions updating the billing information in the operational systems, the computed mean stays “close enough” to the actual mean, application-dependent correctness is preserved.

In the following section we introduce transaction dependency descriptors for specifying application-specific ETMs for enforcing ObjectDDs.

3.4. ETM specification using transaction dependency descriptors

To specify state and correctness dependencies among transactions, we use transaction dependency descriptors (TransactionDDs), which are defined as a 5-element tuple of the form: (Ti, τ, O, En, Post) [19]. Ti is a dependent transaction; τ is the set of transactions that Ti depends on; and O is the set of objects the dependency must consider. If Ti is in the transaction vector of an ObjectDD, O includes the union of target and source objects specified in the ObjectDD. τ together with O define that the dependency considers only operations performed by transactions in τ on objects that belong to O. The last two elements are boolean-valued logical predicates. The enabling condition (En) specifies when the postcondition must be considered. The postcondition (Post) must evaluate to true whenever the dependency is satisfied.

A dependency descriptor can be viewed as an integrity constraint on the execution of extended transactions. A basic requirement for dependency specification is that at any point during the execution of an extended transaction T, T's dependency descriptors must reveal whether the execution that took place until then satisfies T's dependencies. However, it is not always possible to determine whether a dependency is satisfied or not, simply because the available execution schedule does not provide enough information.

For example, consider the dependency “T2 cannot begin before T1 commits” where T2 depends on T1. Until T2 begins, there is not enough information in the schedule to determine whether the dependency is satisfied. To prevent dependency descriptor evaluation until there is enough information available, the enabling condition En must be set to “T2.state = begin”. While En is false, the dependency descriptor evaluates to “don't know”. The postcondition is evaluated if and only if the enabling condition becomes true. If the postcondition evaluates to false, the dependency is violated; otherwise, it is satisfied. Figure 4 illustrates the results of the evaluation of a dependency descriptor with respect to the value of its enabling condition and postcondition.

Figure 4. Evaluation of a dependency specification.
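The three possible outcomes of evaluating a TransactionDD can be sketched as follows. This is a minimal illustration under stated assumptions: the schedule is modeled as a list of (operation, transaction) pairs, a transaction's current state is taken to be its most recent operation, and the helper names `state` and `before` are hypothetical. The framework itself is defined in [19].

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, List, Tuple

# Assumed representation: ordered state-changing operations, e.g. ("Begin", "T1").
Schedule = List[Tuple[str, str]]


def state(schedule, txn):
    """Current state of txn: its most recent operation, or None."""
    ops = [op for op, t in schedule if t == txn]
    return ops[-1] if ops else None


def before(schedule, first, second):
    """True if both operations occur and first precedes second."""
    if first not in schedule or second not in schedule:
        return False
    return schedule.index(first) < schedule.index(second)


@dataclass(frozen=True)
class TransactionDD:
    dependent: str                        # Ti
    depends_on: FrozenSet[str]            # tau
    objects: FrozenSet[str]               # O
    enabling: Callable[[Schedule], bool]  # En
    post: Callable[[Schedule], bool]      # Post

    def evaluate(self, schedule):
        # Post is considered only once the enabling condition holds.
        if not self.enabling(schedule):
            return "don't know"
        return "satisfied" if self.post(schedule) else "violated"


# "T2 cannot begin before T1 commits"
dd = TransactionDD(
    dependent="T2",
    depends_on=frozenset({"T1"}),
    objects=frozenset(),  # placeholder; the sketch ignores O
    enabling=lambda s: state(s, "T2") == "Begin",
    post=lambda s: before(s, ("Commit", "T1"), ("Begin", "T2")),
)

pending = dd.evaluate([("Begin", "T1")])
ok = dd.evaluate([("Begin", "T1"), ("Commit", "T1"), ("Begin", "T2")])
bad = dd.evaluate([("Begin", "T1"), ("Begin", "T2")])
```

The first schedule leaves the descriptor disabled, the second satisfies it, and the third violates it because T2 begins while T1 has not committed.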


Due to the existence of enabling conditions, TransactionDDs provide reliable true and false evaluation results. This allows:

• determining specification violations as soon as the enabling condition of a TransactionDD becomes true;
• evaluating TransactionDDs before a finite execution schedule is available;
• using TransactionDDs to synchronize transaction execution.

The ETM specification framework using TransactionDDs is discussed in greater detail in [19]. ObjectDDs combined with TransactionDDs allow more flexibility than ECA rules and the activity ETM. For example, an ObjectDD may use either the activity ETM or the data warehouse-specific ETM we defined in Section 3.3. If none of the ETMs is appropriate, additional ETMs can be defined as needed by using TransactionDDs.

In the following sections we illustrate the use of TransactionDDs. In Section 3.4.1 we discuss the specification of the data warehouse-specific ETM we described in Section 3.3. In Section 3.4.2 we provide the specification of the activity ETM. In Section 3.4.3 we discuss other application-specific ETMs for enforcing ObjectDDs.

3.4.1. Specification of the data warehouse-specific ETM. In Section 3.3 we noted that our data warehouse-specific ETM has the following transaction state dependencies:

1. backward-begin-begin: TWest cannot begin before T begins
2. backward-begin-begin: TEast cannot begin before T begins
3. backward-commit-commit: T cannot commit before TWest and TEast commit
4. strong-abort-begin: TWest must begin if TWest aborts
5. strong-abort-begin: TEast must begin if TEast aborts

There are two kinds of state dependencies in this ETM: backward and strong.

Backward state dependencies between a pair of transactions Ti and Tj impose conditions of the following type: Ti cannot enter state X before Tj has entered state Y. Such dependencies are defined by a descriptor of the form:

(Ti, {Tj}, O, Ti.state = X, Y(Tj) < X(Ti))

The set of transactions that Ti depends on includes only Tj, and O is the set of all objects. Ti.state = X indicates that the current state of Ti is X. X(Ti) denotes the operation that changes the state of Ti to X. The difference between Ti.state = X and X(Ti) is that the execution schedule contains X(Ti) even if Ti leaves state X. We use < to specify the order of operations in the execution schedule. The dependency descriptor is enabled when Ti enters state X. The dependency is satisfied only if Tj has entered state Y before Ti enters state X, i.e., Tj has performed Y(Tj) before Ti issues X(Ti).

The following TransactionDDs specify the three backward dependencies in the data warehouse-specific ETM.

1. (TWest, {T}, O, TWest.state = Begin, Begin(T) < Begin(TWest))
2. (TEast, {T}, O, TEast.state = Begin, Begin(T) < Begin(TEast))
3. (T, {TEast, TWest}, O, T.state = Commit, Commit(TEast) < Commit(T) AND Commit(TWest) < Commit(T))
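A small sketch can check these three backward dependencies against an execution schedule. It assumes, purely for illustration, that a schedule is a list of (operation, transaction) pairs and that a transaction's current state is its most recent operation; the `backward` and `all_of` helpers are hypothetical names, not part of the framework in [19].

```python
def position(schedule, op, txn):
    """Index of the first occurrence of op(txn), or None if absent."""
    try:
        return schedule.index((op, txn))
    except ValueError:
        return None


def current_state(schedule, txn):
    """Most recent operation of txn, or None if it has not started."""
    ops = [op for op, t in schedule if t == txn]
    return ops[-1] if ops else None


def backward(ti, tj, x, y):
    """Checker for (ti, {tj}, O, ti.state = x, y(tj) < x(ti))."""
    def check(schedule):
        if current_state(schedule, ti) != x:   # enabling condition is false
            return "don't know"
        y_pos = position(schedule, y, tj)
        x_pos = position(schedule, x, ti)
        ok = y_pos is not None and y_pos < x_pos
        return "satisfied" if ok else "violated"
    return check


def all_of(*checks):
    """Conjunction of checkers that share the same enabling condition."""
    def check(schedule):
        results = [c(schedule) for c in checks]
        if "don't know" in results:
            return "don't know"
        return "violated" if "violated" in results else "satisfied"
    return check


dep1 = backward("TWest", "T", "Begin", "Begin")
dep2 = backward("TEast", "T", "Begin", "Begin")
dep3 = all_of(backward("T", "TEast", "Commit", "Commit"),
              backward("T", "TWest", "Commit", "Commit"))

good = [("Begin", "T"), ("Begin", "TWest"), ("Begin", "TEast"),
        ("Commit", "TWest"), ("Commit", "TEast"), ("Commit", "T")]
# T commits although TEast never committed.
bad = good[:4] + [("Commit", "T")]

print(dep1(good[:2]), dep2(good[:3]), dep3(good[:4]), dep3(good), dep3(bad))
```

Each dependency is evaluated on the schedule prefix at which its enabling condition holds; on the prefix where T has not yet committed, dependency 3 stays undetermined.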

Strong state dependencies express conditions of the following type: Ti must enter state X if Tj has entered state Y. They are defined by the following descriptor:

(Ti, {Tj}, O, Tj.state = Y, X(Ti))

In strong dependencies, X(Ti) specifies an immediate operation execution. For example, the strong state dependencies in our data warehouse ETM are specified as follows:

4. (TWest, {TWest}, O, TWest.state = Abort, Begin(TWest))
5. (TEast, {TEast}, O, TEast.state = Abort, Begin(TEast))

These define that TWest and TEast must be automatically restarted whenever they are aborted. It is important to notice that some state dependencies may not be enforceable. For example, since a transaction commit cannot be guaranteed, it may be impossible to enforce a strong dependency such as “T1 must commit if T2 aborts”. In addition, forward dependencies requiring that a transaction T1 cannot abort may be impossible to enforce, since a transaction may choose to abort, or the system where the transaction executes may unilaterally abort it as a result of a local deadlock.
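The restart behavior required by the strong-abort-begin dependencies can be sketched as a retry loop. Everything here is a simplification for illustration: the `Aborted` exception, the `run_with_restart` helper, and the `max_attempts` bound (added only so the example terminates) are assumptions, and a real enforcement system would restart through the underlying transaction managers.

```python
class Aborted(Exception):
    """Raised by a transaction body to simulate an abort."""


def run_with_restart(name, body, schedule, max_attempts=5):
    """Run body as transaction `name`, issuing Begin again after each Abort."""
    for _ in range(max_attempts):
        schedule.append(("Begin", name))
        try:
            body()
        except Aborted:
            schedule.append(("Abort", name))
            continue  # strong-abort-begin: Abort(name) is followed by Begin(name)
        schedule.append(("Commit", name))
        return True
    return False  # gave up; a bound like this is not part of the ETM


def flaky(failures):
    """Hypothetical body that aborts `failures` times, then succeeds."""
    remaining = [failures]

    def body():
        if remaining[0] > 0:
            remaining[0] -= 1
            raise Aborted("local deadlock")

    return body


schedule = []
committed = run_with_restart("TWest", flaky(2), schedule)
print(committed, schedule)
```

In the resulting schedule every Abort(TWest) is immediately followed by a new Begin(TWest), and the transaction commits on its third attempt.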

In Section 3.3 we noted that the data warehouse-specific ETM has the following correctness dependencies:

1. inter-transaction cooperation dependency: T (and its constituent transactions TWest and TEast) can interleave without any restrictions with transactions accessing the source objects BillingWest.CUST and BillingEast.CUST.

2. inter-transaction serialization dependencies: The schedule of operations performed by T and those performed by all committed transactions that do not access the source objects BillingWest.CUST and BillingEast.CUST must be serializable.

The basic difference between serializability and cooperative correctness criteria is in the way they define conflicts.

Conflict definition by serializability: Traditional (conflict) serializability theory [3] considers that two operations pi and qj conflict if they are performed on the same object and at least one of them is a write operation. To capture this, in the following we assume that object operations have an object-provided property that reveals if the operation changes the state of the object on which it is performed. In particular, we consider that the type of pi(ox) is write, i.e., pi(ox).op_type = w, if the execution of an operation pi of transaction Ti changes the state of object ox without reading it. The type of pi(ox) is read, i.e., pi(ox).op_type = r, if pi only reads the state of ox. If pi both reads and changes the state of ox it is both a read and a write.

Conflicts are defined by a conflict table, which is a relation that associates pairs of operations with boolean predicates. Each object ox must be associated⁴ with a conflict table we refer to as ox.conflict_table. The predicate ox.conflict_table associated with a pair (p, q) of operations p and q is represented by ox.conflict_table(p, q). Operations p and q conflict if the predicate ox.conflict_table(p, q) evaluates to true. The default conflict table for serializability is defined as follows:

ox.conflict_table(pi, qj) := [pi.op_type = w OR qj.op_type = w]
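The default conflict table translates directly into a predicate. In this sketch the `Op` record and its field names are assumptions, operations carry exactly one type (`"r"` or `"w"`), and the wrapper `conflicts` adds the same-object test from the definition of conflict above, plus the usual restriction to operations of different transactions.

```python
from collections import namedtuple

# Hypothetical operation record: issuing transaction, type ("r" or "w"), object.
Op = namedtuple("Op", "txn op_type obj")


def default_conflict_table(p, q):
    """Default predicate: true if at least one operation is a write."""
    return p.op_type == "w" or q.op_type == "w"


def conflicts(p, q):
    """Two operations of different transactions conflict if they touch the
    same object and that object's conflict table evaluates to true."""
    return p.obj == q.obj and p.txn != q.txn and default_conflict_table(p, q)


read_1 = Op("T1", "r", "BillingWest.CUST")
read_2 = Op("T2", "r", "BillingWest.CUST")
write_2 = Op("T2", "w", "BillingWest.CUST")
write_east = Op("T2", "w", "BillingEast.CUST")

print(conflicts(read_1, read_2))      # two reads
print(conflicts(read_1, write_2))     # read and write on the same object
print(conflicts(read_1, write_east))  # different objects
```

Two reads never conflict, a read and a write on the same object do, and operations on different objects do not.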

Various notions of conflicts that consider the semantics of the operations (and possibly their return values) have been proposed in the literature [2, 23, 39, 40]. Such definitions of conflicts can be used to allow more concurrency than the default read/write conflicts.

Conflict definition by cooperative correctness criteria: Cooperative correctness criteria [3, 13, 14, 28] use less restrictive notions of conflicts that take into account transaction semantics. For example, consider again the transactions TWest and TEast in Section 3.3. These transactions enforce the union dependency between the BillingWest.CUST, BillingEast.CUST, and Warehouse.CUST tables. TWest retrieves the changes in the BillingWest.CUST table and uses them to update the Warehouse.CUST table, while TEast applies changes in the BillingEast.CUST table to the Warehouse.CUST table.

Assuming serializability as our correctness criterion, BillingWest.CUST and BillingEast.CUST use the default read/write conflict table we discussed earlier. This conflict table does not permit transactions to update BillingWest.CUST (BillingEast.CUST) until TWest (TEast) completes its execution.

The data warehouse-specific ETM in Section 3.3 allows TWest (TEast) to interleave without constraints with transactions performing billing updates in BillingWest.CUST (BillingEast.CUST). We noted that although this ETM allows non-serializable interleavings between TWest (TEast) and billing updates in BillingWest.CUST (BillingEast.CUST), it does not cause warehouse applications to produce incorrect results. Next we define a new conflict table for BillingWest.CUST (BillingEast.CUST) that allows TWest (TEast) to interleave without restrictions with transactions performing billing updates on BillingWest.CUST (BillingEast.CUST).

To represent these transactions, we introduce four corresponding transaction types. Billing update transactions have type BILLING_UPDATE; TWest has type BILLING_WEST_TRANSFER; TEast has type BILLING_EAST_TRANSFER; and the extended transaction T that groups TWest and TEast instances together has type UNION_ADJUSTMENT. To specify that instances of these transactions can cooperate, we augment the default read-write conflict table for BillingWest.CUST objects as follows:

BillingWest.CUST.conflict_table(pi, qj) :=
    BillingWest.CUST.conflict_table(pi, qj)
    AND ¬[pi.op_type = w AND qj.op_type = r
          AND type(Ti) = BILLING_UPDATE
          AND Tj ∈ T AND type(T) = UNION_ADJUSTMENT
          AND type(Tj) = BILLING_WEST_TRANSFER]
    AND ¬[pi.op_type = r AND qj.op_type = w
          AND type(Tj) = BILLING_UPDATE
          AND Ti ∈ T AND type(T) = UNION_ADJUSTMENT
          AND type(Ti) = BILLING_WEST_TRANSFER]


Figure 5. Immediate mode with subtransaction execution.

In this specification, we state that read and write operations do not conflict if (i) the read is performed by a transaction of type BILLING_WEST_TRANSFER and the write is performed by a transaction of type BILLING_UPDATE, and (ii) the transaction that performs the read is a constituent transaction of an extended transaction of type UNION_ADJUSTMENT. Specification of the BillingEast.CUST.conflict_table is similar.

The specification of cooperative correctness dependencies in this section illustrates that our data warehouse-specific ETM allows only transactions of specific types (e.g., BILLING_WEST_TRANSFER and BILLING_UPDATE) to cooperate. Thus, our ETM specifications can directly capture application semantics.
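The augmented conflict table for BillingWest.CUST can be sketched as a predicate that starts from the default read/write test and removes the two cooperating cases. The record types, the `parent_type` field (standing in for "is a constituent of an extended transaction of that type"), and the underscore spelling of the type names are assumptions of this sketch.

```python
from collections import namedtuple

# Hypothetical records: parent_type is the type of the enclosing extended
# transaction, or None for a transaction that is not a constituent.
Txn = namedtuple("Txn", "name type parent_type")
Op = namedtuple("Op", "txn op_type")  # op_type is "r" or "w"


def default_conflict_table(p, q):
    return p.op_type == "w" or q.op_type == "w"


def cooperating(writer, reader):
    """A BILLING_UPDATE write paired with a read by a BILLING_WEST_TRANSFER
    constituent of a UNION_ADJUSTMENT extended transaction."""
    return (writer.op_type == "w" and reader.op_type == "r"
            and writer.txn.type == "BILLING_UPDATE"
            and reader.txn.type == "BILLING_WEST_TRANSFER"
            and reader.txn.parent_type == "UNION_ADJUSTMENT")


def billing_west_conflict_table(p, q):
    """Default table minus the cooperating pairs, in either argument order."""
    return (default_conflict_table(p, q)
            and not cooperating(p, q)
            and not cooperating(q, p))


update = Txn("U1", "BILLING_UPDATE", None)
t_west = Txn("TWest", "BILLING_WEST_TRANSFER", "UNION_ADJUSTMENT")
report = Txn("R1", "REPORT", None)  # made-up non-cooperating type

print(billing_west_conflict_table(Op(update, "w"), Op(t_west, "r")))
print(billing_west_conflict_table(Op(update, "w"), Op(report, "r")))
print(billing_west_conflict_table(Op(update, "w"), Op(update, "w")))
```

Only the billing update write paired with the TWest read is exempted; the same write still conflicts with a read by any other transaction type and with other writes.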

3.4.2. Specification of the activity ETM. In this section we illustrate that TransactionDDs are powerful enough to specify the coupling modes defined for the activity ETM.

ETM specification for immediate: We illustrate the immediate mode graphically in figure 5. In the immediate event-condition coupling, T is the current transaction that executes some statements and then invokes an operation p which signals an event e causing the activation of an ECA rule. This creates a subtransaction Ts that evaluates the ECA rule condition. If the condition evaluates to true, the coupling mode between condition and action determines how the action is executed.

In the immediate condition-action coupling, T is the transaction that evaluates the condition, while Ts performs the action. Finally, in the immediate event-action coupling, T is the current transaction and Ts evaluates the condition and performs the action.

The following TransactionDDs specify the transaction state dependencies between T and Ts as defined by the immediate coupling mode:

(Ts, {T}, O, Ts.state = Begin, Begin(T) < Begin(Ts))
(T, {Ts}, O, T.state = Abort, Abort(Ts) < Abort(T))
(T, {Ts}, O, T.state = Commit, Commit(Ts) < Commit(T))
(Ts, {T}, O, T.state = Abort, Abort(Ts))

In addition to structure dependencies between T and Ts, TransactionDDs can specify when an ECA rule creates Ts. Since the ECA rule event predicate E becomes true when an event triggers the rule, the following TransactionDD specifies the point in time the immediate event-condition and event-action couplings create Ts:

(Ts, { }, O, E = true, Begin(Ts))

The Ts creation for immediate condition-action coupling is specified as follows (C is the ECA rule condition):

(Ts, { }, O, C = true, Begin(Ts))
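As an illustration only, the precedence constraints in the state dependencies above can be checked against a recorded history of transaction events. The sketch below is not part of the PTMM described in this paper: the History class and the function name are hypothetical, the history is assumed to be well formed, and only the three orderings Begin(T) < Begin(Ts), Abort(Ts) < Abort(T), and Commit(Ts) < Commit(T) are checked.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Event = Tuple[str, str]  # (operation, transaction), e.g. ("Begin", "T")


@dataclass
class History:
    """Ordered record of transaction events (hypothetical helper)."""
    events: List[Event] = field(default_factory=list)

    def position(self, op: str, txn: str) -> Optional[int]:
        """Index of the event in the history, or None if it never occurred."""
        try:
            return self.events.index((op, txn))
        except ValueError:
            return None

    def precedes(self, first: Event, second: Event) -> bool:
        """True if both events occurred and `first` came before `second`."""
        a, b = self.position(*first), self.position(*second)
        return a is not None and b is not None and a < b


def satisfies_immediate_mode(h: History, parent: str = "T", sub: str = "Ts") -> bool:
    """Check the immediate-mode orderings between a transaction and its subtransaction."""
    if h.position("Begin", sub) is None:
        # ASSUMPTION: if Ts was never created, no dependency applies.
        return True
    # Ts may begin only after T has begun.
    if not h.precedes(("Begin", parent), ("Begin", sub)):
        return False
    # If T aborts (commits), Ts must have aborted (committed) first.
    for op in ("Abort", "Commit"):
        if h.position(op, parent) is not None and not h.precedes((op, sub), (op, parent)):
            return False
    return True


ok = History([("Begin", "T"), ("Begin", "Ts"), ("Commit", "Ts"), ("Commit", "T")])
bad = History([("Begin", "T"), ("Begin", "Ts"), ("Commit", "T"), ("Commit", "Ts")])
print(satisfies_immediate_mode(ok), satisfies_immediate_mode(bad))
```

The second history is rejected because T commits before Ts, which violates Commit(Ts) < Commit(T).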


Figure 6. Deferred mode with subtransaction execution.

Figure 7. Detached and causally independent mode.

Figure 8. Detached and causally dependent mode.

ETM specification for deferred: We illustrate the deferred mode graphically in figure 6. Specification of transaction state dependencies between T and Ts in the deferred mode is the same as in the immediate coupling mode. The following TransactionDDs describe when an ECA rule creates Ts in the deferred coupling mode:

(Ts, { }, O, C = true AND Prepare(T), Begin(Ts)) for deferred condition-action coupling

(Ts, { }, O, E = true AND Prepare(T), Begin(Ts)) for event-condition and event-action coupling

ETM specification for detached and causally independent: The detached and causallyindependent mode is illustrated in figure 7.Td is a separate transaction detached fromT .

In this coupling modeTd cannot begin beforeT . This is specified by the flowing Tran-sactionDD:

(Td, {T}, O, Td.state= Begin, Begin(T) < begin(Td))

The creation ofTd occurs any time afterE or C become true.

ETM specification for detached but causally dependent: This mode is depicted in figure 8.The detached but causally dependent mode requiresTd to commit only if T commits.

The following TransactionDDs specify the transaction state dependencies betweenT andTd as defined by the detached but causally dependent mode:


(Td, {T}, O, Td.state = Begin, Begin(T) < Begin(Td))

(Td, {T}, O, Td.state = Commit, Commit(T) < Commit(Td))

(Td, {T}, O, T.state = Abort, Abort(Td))

In addition to these transaction state dependencies, the detached but causally dependent mode requires that the detached transaction be serialized after the current transaction. This can be enforced by (i) delaying the start of the detached transaction until the current transaction commits, or (ii) beginning the detached transaction immediately, but enforcing serializability and synchronizing the commit of the two transactions.

3.4.3. Other application-specific ETMs for object dependencies. In Section 3.4.2, we noted that ECA rule coupling modes as defined by the activity ETM can be readily expressed using TransactionDDs. From this, it is clear that the ECA coupling modes are composite features, composed of primitives captured by TransactionDDs. Additional application-specific ETMs can be specified using TransactionDDs.

To illustrate the kinds of application-specific ETMs that can be developed for a data warehouse environment, consider again the union dependency we discussed in Section 3.3. In restoring consistency between BillingWest.CUST, BillingEast.CUST, and Warehouse.CUST, the consistency restoration transaction(s) must perform the following tasks:

t1: retrieves the changes in BillingWest.CUST
t2: applies the BillingWest.CUST changes to Warehouse.CUST
t3: retrieves the changes in BillingEast.CUST
t4: applies the BillingEast.CUST changes to Warehouse.CUST

In the data warehouse-specific ETM we introduced in Section 3.4.1, we group these tasks in two transactions TWest and TEast as follows: TWest = {t1, t2} and TEast = {t3, t4}. The activity ETM groups {t1, t2, t3, t4} in ECA rule actions. Additional application-specific ETMs can be defined by grouping these tasks into transactions as follows:

1. {t1}, {t2}, {t3}, and {t4}
2. {t1}, {t3}, and {t2, t4}
3. {t2}, {t4}, and {t1, t3}

As in the data warehouse ETM, the semantics of data warehouse union dependencies allow transactions performing t1 and/or t3 to have cooperative dependencies with transactions accessing BillingWest.CUST and BillingEast.CUST in the operational systems. The specification of these ETMs using TransactionDDs is straightforward.
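The groupings listed above can be written down as partitions of the task set {t1, t2, t3, t4}. The following sketch is only a bookkeeping aid under that reading: the dictionary keys and the helper name are hypothetical, and the check confirms only that every task is assigned to exactly one transaction. It says nothing about the TransactionDDs that would accompany each grouping.

```python
from typing import Dict, FrozenSet, List

TASKS = frozenset({"t1", "t2", "t3", "t4"})


def is_valid_grouping(grouping: List[FrozenSet[str]]) -> bool:
    """True if the transactions are non-empty and cover each task exactly once."""
    seen = [task for txn in grouping for task in txn]
    return all(grouping) and len(seen) == len(TASKS) and set(seen) == TASKS


# Labels are illustrative; the groupings themselves are the ones listed in the text.
GROUPINGS: Dict[str, List[FrozenSet[str]]] = {
    "data warehouse ETM": [frozenset({"t1", "t2"}), frozenset({"t3", "t4"})],
    "alternative 1": [frozenset({t}) for t in sorted(TASKS)],
    "alternative 2": [frozenset({"t1"}), frozenset({"t3"}), frozenset({"t2", "t4"})],
    "alternative 3": [frozenset({"t2"}), frozenset({"t4"}), frozenset({"t1", "t3"})],
}

for name, grouping in GROUPINGS.items():
    print(name, is_valid_grouping(grouping))
```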

4. Updateability of target objects

Our basic requirements in the management of dependent objects are to: (i) schedule and control asynchronous transactions to restore object consistency whenever inconsistency between dependent objects becomes unacceptable, and (ii) ensure that applications do not access dependent objects that are in a state of unacceptable inconsistency. If updates are performed on source objects that are not target objects in any ObjectDD, updates are guaranteed to propagate to all the dependent objects. Therefore, in this case dependent object consistency is maintained. However, if a regular transaction (not a consistency restoration transaction) updates a target object in some ObjectDD, this may introduce inconsistencies which cannot be corrected by this ObjectDD and its consistency restoration transactions.

Figure 9. Consistency problems if we allow updates on object o2.

For example, consider an object o2 which participates in two ObjectDDs: as a target in ObjectDD1, and as a source in ObjectDD2 (figure 9).

Consider a regular transaction T that updates o2. Suppose T introduces a tolerable inconsistency between o1 and o2 (T does not violate ObjectDD1), but introduces a non-tolerable inconsistency between o2 and o3. This will cause enforcement of ObjectDD2, i.e., ObjectDD2 will propagate T's update on o2 to o3. Therefore, if T does not introduce a non-tolerable inconsistency between o1 and o2, updating o3 is necessary to maintain the consistency of o2 and o3, as well as o1 and o3 (indirectly). However, if T introduces a non-tolerable inconsistency between o1 and o2, updating o3 may further violate consistency, since in addition to making o1 and o2 inconsistent such an update may violate the consistency between o1 and o3 as defined by ObjectDD1 and ObjectDD2 combined. This applies to ObjectDDs defined by the same or different ObjectDD definers.

One possible solution to the above problem is to disallow regular transaction updates on target objects. That is, regular transactions can update only source objects that are not target objects in any ObjectDD. Only consistency restoration transactions can update target objects as specified by ObjectDDs. This is an acceptable solution for applications where target object access is read-only. For example, consider a data warehouse where operational data are source objects and warehouse data are read-only target objects. Since regular updates are applied only to operational data, ObjectDDs reliably propagate updates to warehouse data. Although this may be flexible enough for applications such as data warehouses, this solution is too restrictive for applications that allow regular transactions to perform updates on target objects.

There are two alternative solutions to this problem. The first is to allow regular transactions to perform updates on target objects as long as they introduce only tolerable inconsistencies between target and source objects. For example, in figure 9 we can allow o2 updates if they do not introduce a non-tolerable inconsistency between o1 and o2. If such updates produce a non-tolerable inconsistency between o2 and o3, this inconsistency can be corrected by ObjectDD2. However, the stricter the ObjectDD1 consistency specification is, the smaller the o2 update margin is. This may severely limit target updates.
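The first alternative amounts to an admission check on target updates. The sketch below assumes, purely for illustration, that o1 and o2 are numeric and that ObjectDD1 tolerates an absolute difference of at most `tolerance`. The consistency terms of an actual ObjectDD are more general, and the function name is hypothetical.

```python
def admit_target_update(source_value: float, proposed_target: float, tolerance: float) -> bool:
    """Return True if the proposed target value stays within the tolerated
    divergence from its source (ASSUMPTION: divergence is an absolute difference)."""
    return abs(proposed_target - source_value) <= tolerance


o1 = 100.0
# The same o2 update is admitted under a loose ObjectDD1 and rejected under a strict one.
print(admit_target_update(o1, 104.0, tolerance=5.0))
print(admit_target_update(o1, 104.0, tolerance=2.0))
```

The two calls show the trade-off noted above: tightening the tolerance from 5.0 to 2.0 shrinks the update margin for o2, so an update that was acceptable is now refused.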

The second solution is the most general since it involves defining inverse ObjectDDs from updateable target objects to their source objects. This is illustrated in figure 10.


Figure 10. Consistency problems if we allow updates on object o2.

Inverse ObjectDDs make the original target object a source, and make each of the original source objects a target. Inverse ObjectDDs maintain consistency while allowing target updates by regular transactions if the following requirements hold:

1. The consistency restoration transactions performed by an ObjectDD and those performed by its inverse ObjectDDs are vital to each other, i.e., they all commit or none commits, and

2. The source updates can be determined from the target update.

Requirement (1) can be easily satisfied by defining appropriate ObjectDDs for vital consistency restoration transactions and enforcing them (this was done at the end of Section 3.4.2, since causally dependent transactions are vital). Requirement (2) is similar to the view update problem in databases. Therefore, it is not unique to inverse ObjectDDs and cannot always be satisfied.

5. A system for managing object dependencies

In this section we discuss the architecture of a system for specifying and enforcing object dependencies. In particular, we focus on automatic enforcement of object dependencies that require “relaxed” consistency between dependent objects. We discuss a conceptual design of a generic Dependency Management System (DMS) and its major components. We then concentrate our discussion on specific design alternatives and techniques we might use in a DMS.

5.1. Object dependency management systems

Figure 11 illustrates the conceptual representation of a DMS. The dependent objects are depicted by circles and the dependencies formed between them by arrows. These objects are under the control of separate local systems, i.e., software systems that maintain objects stored at a given site. The DMS detects all events that may affect the consistency of dependent objects. Following the event detection, the DMS performs consistency term evaluation to determine whether there are non-tolerable inconsistencies between target and source objects. If the consistency between dependent objects is violated beyond the tolerated limits, the DMS executes the specified consistency restoration transactions to bring the consistency of dependent objects back to acceptable levels.
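The detect, evaluate, and restore cycle just described can be sketched for a single dependency between one source and one target. Everything in the example is an assumption made for illustration: the Dependency class, the numeric objects, and the two callables stand in for an ObjectDD, its consistency terms, and its consistency restoration transaction.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class Dependency:
    """Hypothetical stand-in for one ObjectDD managed by the DMS."""
    source: str
    target: str
    violated: Callable[[float, float], bool]  # consistency term evaluation
    restore: Callable[[float], float]         # computes the restored target value


def on_event(objects: Dict[str, float], dep: Dependency, updated: str) -> bool:
    """Run one detect / evaluate / restore cycle; return True if restoration ran."""
    if updated != dep.source:
        return False                          # event does not affect this dependency
    if not dep.violated(objects[dep.source], objects[dep.target]):
        return False                          # inconsistency is within tolerated limits
    objects[dep.target] = dep.restore(objects[dep.source])  # restore consistency
    return True


objects = {"o1": 100.0, "o2": 100.0}
replica = Dependency("o1", "o2", violated=lambda s, t: abs(s - t) > 5, restore=lambda s: s)

objects["o1"] = 103.0
print(on_event(objects, replica, "o1"), objects["o2"])   # tolerated, o2 unchanged
objects["o1"] = 110.0
print(on_event(objects, replica, "o1"), objects["o2"])   # restored, o2 follows o1
```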


Figure 11. Conceptual representation of a DMS.

Figure 12. Major DMS components.

In the following sections we focus on DMS environments where local systems are local database systems (LDBSs). Just like multidatabase systems (MDBSs), such DMSs reside on top of possibly heterogeneous LDBSs, have similar layered component structure, and support synchronous (regular) multidatabase transactions. Unlike MDBSs, DMSs provide functionality for managing ObjectDDs and support asynchronous (consistency evaluation and restoration) transactions.

5.2. DMS architecture

Figure 12 illustrates the major components of a DMS. The lower layer consists of LDBSs with their local transaction managers, schedulers and data managers. The higher layer is composed of Dependency Subsystems (DSs). There is a DS located on top of each LDBS. Each DS is responsible for monitoring events occurring in its LDBS and for maintaining the consistency of target objects stored in its local LDBS. Consistency is achieved by submitting consistency restoration transactions to the LDBS that update the target objects on behalf of the DS. All Dependency Subsystems are interconnected and communicate with one another to monitor, evaluate and preserve the consistency of dependent objects.

5.2.1. ObjectDD distribution and replication. The DMS stores and manages ObjectDD metadata, such as the ObjectDD definitions, activation status, and auxiliary objects needed to evaluate consistency terms and perform consistency restoration transactions. ObjectDD metadata can be distributed and/or replicated to increase DMS efficiency and ObjectDD availability. Therefore, a DMS must allow ObjectDD definers/administrators to determine a strategy for distributing and/or replicating ObjectDDs over participating DSs.


Basic ObjectDD distribution and replication strategies include the following:

• storing and managing ObjectDDs at the sites of their source objects
• storing and managing ObjectDDs at the sites of their target objects
• replicating ObjectDDs at all sites containing their source and target objects

To discuss the advantages and disadvantages of these approaches we use the term local site to refer to the DS site that stores and maintains an ObjectDD. If a DS site does not store and manage a specific ObjectDD, but this site contains source or target objects referenced by this ObjectDD, we use the term remote site to refer to it.

The main advantage of storing and managing ObjectDDs at the sites of their source objects is that the DMS can take advantage of local event detection. In particular, if the local site of an ObjectDD is the site where all its source objects are located, the DS at this site can monitor local events and evaluate the ObjectDD consistency terms without communicating with other DSs. Therefore, this approach makes the DMS more efficient when consistency restoration is relatively rare compared to consistency term evaluation (since restoring consistency involves communicating with a remote site containing the target object).

If the source objects of an ObjectDD are located at different sites, the ObjectDD definer/administrator must choose one of the source object sites to store and maintain the ObjectDD. For example, the ObjectDD administrator may store the ObjectDD at the site where the majority of its sources are located, or at the site that contains the source object(s) expected to have a higher degree of event activity associated with them.
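The two placement heuristics mentioned above can be sketched as a small selection function. The function name, its arguments, and the tie-breaking rule (alphabetical by site name) are assumptions introduced for this example. In the setting described here, the choice is made by the ObjectDD definer/administrator rather than computed automatically.

```python
from collections import Counter
from typing import Dict, Optional


def choose_local_site(source_sites: Dict[str, str],
                      event_rates: Optional[Dict[str, float]] = None) -> str:
    """Pick the source site that will store and manage an ObjectDD.

    source_sites maps each source object to the site that holds it.
    With event_rates (expected event activity per source object), the site
    with the highest total expected activity wins; otherwise the site that
    holds the majority of the source objects wins.
    """
    if not source_sites:
        raise ValueError("an ObjectDD needs at least one source object")
    score: Counter = Counter()
    for obj, site in source_sites.items():
        score[site] += event_rates.get(obj, 0.0) if event_rates else 1
    best = max(score.values())
    # ASSUMPTION: ties are broken by site name to keep the result deterministic.
    return min(site for site, s in score.items() if s == best)


sources = {"o1": "A", "o2": "A", "o3": "B"}
print(choose_local_site(sources))                                       # majority rule
print(choose_local_site(sources, {"o1": 1.0, "o2": 1.0, "o3": 10.0}))   # activity rule
```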

The main advantages of storing and managing ObjectDDs at the sites of their target objects are: (i) no administrative overhead in distribution, since there is only one target object in each ObjectDD, and (ii) consistency restoration involves only the local site. The main disadvantage of this approach is the communication overhead of delivering event occurrences to the site that stores and manages the ObjectDD.

Another strategy is to replicate each ObjectDD at all sites containing its source and target objects. This allows location transparency in managing ObjectDDs at the expense of additional network overhead during ObjectDD definition and update. This strategy involves no administration overhead in distributing ObjectDDs and no DMS overhead in locating ObjectDDs. Another significant advantage of this strategy is that it can be easily implemented using a commercial replication server to automatically and reliably replicate ObjectDDs to all DS sites containing their source and target objects. Therefore, ObjectDD replication simplifies the DMS design.

In the following sections we will assume that ObjectDDs are not replicated, i.e., they are stored and managed at the site where they are created, according to one of the ObjectDD distribution strategies discussed in this section.

5.2.2. Components of a dependency subsystem. Each DS consists of a collection of software components that manage dependent objects. Figure 13 illustrates these components. A DS performs the following tasks for each ObjectDD:

• monitors the occurrence of events on local source objects;
• communicates with all other DSs to monitor the occurrence of events on remote source objects;
• evaluates E;
• evaluates the consistency terms in C and C itself;
• submits consistency restoration transactions to maintain the consistency of target objects within specified tolerance.

Figure 13. Components of a dependency subsystem.

Each DS in a DMS consists of five distributed components: The Object Dependency Specification Facility (ODSF) supports the definition of ObjectDDs. The Monitor detects the occurrence of specified events on dependent objects and evaluates the E (event predicate) component of ObjectDDs. The Consistency Manager (CM) evaluates the C (consistency predicate) component of ObjectDDs. If the consistency predicate of an ObjectDD is violated, the CM initiates the execution of consistency restoration transactions by submitting them to the Programmable Transaction Management Mechanism (PTMM). The PTMM is responsible for controlling the concurrent execution of restoration transactions. In addition, the PTMM ensures the fault tolerance of the DMS by dealing with transaction and site failures. The PTMM interleaves the execution of consistency restoration transactions (asynchronous transactions) and regular transactions submitted directly by applications/users (synchronous transactions). Application-specific ETMs are defined using the Transaction Dependency Specification Facility (TDSF) that supports TransactionDD definition (as we discussed in Section 3.3). Specified ETMs are enforced by the PTMM.

The DS architecture can be simplified if consistency restoration transactions do not require application-specific ETMs, i.e., the consistency restoration transactions are not extended transactions designed to take advantage of specific application semantics. In this case, there is no need for the TDSF. In addition, the PTMM can be replaced by a TMM similar to that provided by a TP monitor. A more detailed discussion of each component is presented next.


5.2.3. Object dependency specification facility. The distribution of ObjectDDs among the various participating sites in a DMS leads naturally to the creation of a distributed ODSF. There is an ODSF component in each DS. Each of these ODSF components is responsible for the maintenance of the ObjectDDs at that site and consists of two major subcomponents: the ObjectDD Administrator and the ObjectDD Catalog.

The ObjectDD Administrator provides lifecycle and activation services to ObjectDDs. The ObjectDD lifecycle operations include: insert, delete, and modify, whereas the activation operations are: enable and disable. When a new ObjectDD is created it is assigned an object identifier that uniquely identifies the new ObjectDD in all DSs. The local Consistency Manager is informed, so that it can initialize an ObjectDDdb for the new ObjectDD. Furthermore, remote DSs may need to be notified when a new ObjectDD is created. In particular, the ODSF must notify the Monitors at remote sites storing the source objects to inform the Monitor at the site of the new ObjectDD of any updates to its source objects.

The ObjectDD Catalog maintains the activation status of each ObjectDD stored in the local site. When the ObjectDD Administrator issues a disable or enable operation on an ObjectDD, its status is updated in the ObjectDD Catalog. When an ObjectDD is enabled, all events pertinent to this ObjectDD must be observed when they occur. Therefore, the Monitors in the local DS and all remote DS sites containing source objects must be notified to start watching for these events. In a similar fashion, these remote Monitors must also be notified to cease monitoring events for descriptors that have been disabled.
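A minimal sketch of the enable/disable bookkeeping described above follows. The class, its fields, and the (site, action, ObjectDD) notification tuples are hypothetical. In a real DMS the notifications would be messages sent to the Monitors at the sites holding source objects, whereas here they are only recorded in a list.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class ObjectDDCatalog:
    """Tracks activation status and records Monitor notifications (hypothetical)."""
    source_sites: Dict[str, Set[str]]                     # ObjectDD id -> sites holding its sources
    enabled: Dict[str, bool] = field(default_factory=dict)
    notifications: List[Tuple[str, str, str]] = field(default_factory=list)

    def _set_status(self, dd_id: str, active: bool) -> None:
        self.enabled[dd_id] = active
        action = "start" if active else "stop"
        # Every Monitor at a site holding a source object is told to start
        # or stop watching the events of this ObjectDD.
        for site in sorted(self.source_sites[dd_id]):
            self.notifications.append((site, action, dd_id))

    def enable(self, dd_id: str) -> None:
        self._set_status(dd_id, True)

    def disable(self, dd_id: str) -> None:
        self._set_status(dd_id, False)


catalog = ObjectDDCatalog({"dd1": {"west", "east"}})
catalog.enable("dd1")
catalog.disable("dd1")
print(catalog.enabled, catalog.notifications)
```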

5.2.4. Monitor. In a DMS, there is a monitor component at each DS. These monitors observe primitive state, temporal, and transaction events. Monitors maintain an event list that contains all primitive events that each monitor should detect and an event detection mechanism that actually detects the events in the event list. To describe monitor behavior, consider a monitor at some specific DS site. If the Monitor observes an event specified in the E component of a locally maintained ObjectDD, the monitor evaluates E. If E evaluates to true, the Monitor informs the local Consistency Manager. If a Monitor observes an event specified in the E component of an ObjectDD maintained at a remote site, the Monitor forwards the event to the Monitor at the ObjectDD site.
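The dispatch rule in the preceding paragraph (evaluate E locally, or forward the event to the site that maintains the ObjectDD) can be sketched as follows. The classes and field names are hypothetical, events are represented as plain strings, and notification and forwarding are simulated by appending to lists instead of messaging the Consistency Manager or a remote Monitor.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Set, Tuple


@dataclass
class ObjectDDEntry:
    """Minimal stand-in for an ObjectDD as seen by a Monitor (hypothetical)."""
    name: str
    site: str                                # DS site that stores the ObjectDD
    events: Set[str]                         # primitive events named in its E component
    event_predicate: Callable[[str], bool]   # evaluates E for an observed event


@dataclass
class Monitor:
    site: str
    descriptors: List[ObjectDDEntry]
    notified: List[str] = field(default_factory=list)               # passed to the local CM
    forwarded: List[Tuple[str, str]] = field(default_factory=list)  # (remote site, event)

    def observe(self, event: str) -> None:
        for dd in self.descriptors:
            if event not in dd.events:
                continue                      # event is not in this ObjectDD's event list
            if dd.site == self.site:
                # Locally maintained ObjectDD: evaluate E, inform the local CM if true.
                if dd.event_predicate(event):
                    self.notified.append(dd.name)
            else:
                # ObjectDD maintained elsewhere: forward to the Monitor at its site.
                self.forwarded.append((dd.site, event))


monitor = Monitor("west", [
    ObjectDDEntry("dd1", "west", {"update(o1)"}, lambda e: True),
    ObjectDDEntry("dd2", "east", {"update(o1)"}, lambda e: True),
])
monitor.observe("update(o1)")
print(monitor.notified, monitor.forwarded)
```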

We note that detection of events in a DMS is more complicated than in an active database system. The main reasons for this complexity are:

• Distribution: A DMS is inherently distributed over several sites, thus events are more difficult to detect. To the best of our knowledge, there has been limited research on distributed active systems. Only centralized active databases have been designed and implemented.
• Heterogeneity and autonomy: The Monitor in each Dependency Subsystem must utilize existing detection capabilities that are provided by the LDBS. DMSs are built on top of existing LDBSs, therefore we are limited by the tools provided by the underlying LDBSs. Some commercial LDBSs do not provide event detection mechanisms, thus we must resort to additional techniques to detect events under such conditions.

In the following paragraphs we discuss how DS Monitors in a DMS detect state and transaction events that occur at their sites (note 5). We discuss detection techniques for such events generated by LDBSs that provide trigger capabilities and by those LDBSs that do not. We assume that LDBSs view monitors as ordinary users with all the privileges and limitations of a proxy account.

Detecting events on LDBSs that provide triggering capabilities. Many LDBSs support triggers which contain an Event and an Action part. A trigger is fired by the LDBS when the triggering event occurs and the action is executed. Triggers that restrict actions to be database operations cannot notify the local Monitor and they are called restricted, whereas triggers that allow actions beyond database operations (e.g., user provided procedures or programs) are called unrestricted.

If an LDBS provides restricted triggers, the local monitor can be notified of events indirectly by creating an active table for every (passive) table we wish to monitor. Active tables are tables in append-only mode, similar to the ones used in [33]. When any access is made to the passive table, the action component of the trigger appends the access information (database operations and their corresponding arguments) to the active table. These active tables are polled regularly by the local Monitor to detect local events.
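The active-table technique can be tried out with any DBMS whose triggers are limited to SQL actions. The sketch below uses SQLite through Python's standard sqlite3 module as a stand-in LDBS. The table, trigger, and column names are invented for the example. SQLite triggers fire only on INSERT, UPDATE, and DELETE, so this version records writes but not reads.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Passive table being monitored (illustrative schema).
    CREATE TABLE cust (id INTEGER PRIMARY KEY, balance REAL);

    -- Append-only active table filled by the triggers below.
    CREATE TABLE cust_active (
        seq INTEGER PRIMARY KEY AUTOINCREMENT,
        op TEXT, cust_id INTEGER, new_balance REAL
    );

    CREATE TRIGGER cust_ins AFTER INSERT ON cust BEGIN
        INSERT INTO cust_active (op, cust_id, new_balance)
        VALUES ('INSERT', NEW.id, NEW.balance);
    END;

    CREATE TRIGGER cust_upd AFTER UPDATE ON cust BEGIN
        INSERT INTO cust_active (op, cust_id, new_balance)
        VALUES ('UPDATE', NEW.id, NEW.balance);
    END;
""")


def poll(conn, last_seen):
    """Return the events appended after `last_seen` and the new high-water mark."""
    rows = conn.execute(
        "SELECT seq, op, cust_id, new_balance FROM cust_active "
        "WHERE seq > ? ORDER BY seq", (last_seen,)).fetchall()
    return rows, (rows[-1][0] if rows else last_seen)


# Regular application activity on the passive table.
conn.execute("INSERT INTO cust VALUES (1, 10.0)")
conn.execute("UPDATE cust SET balance = 25.0 WHERE id = 1")

# The Monitor polls the active table and remembers how far it has read.
events, mark = poll(conn, 0)
print(events, mark)
```

Keeping a high-water mark (`mark`) lets the Monitor poll repeatedly without reprocessing events it has already seen, which fits the append-only nature of the active table.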

An LDBS with unrestricted triggering capabilities (e.g., SYBASE trigger actions, or ORACLE pipes) can invoke DMS-provided procedures or programs to directly notify its local monitor of event occurrences.

Detecting events on LDBSs without triggering capabilities. In this case, we must modify the DBMS software, probably violating the autonomy of the LDBS. For example, [37] discusses an approach that can be applied to a DBMS with a client-server architecture. This involves modification of the Sql Connect call to intercept the database operations sent to the DBMS server, enabling the analysis of each submitted DML and DDL command to identify the events generated by the operation. This technique can be extended to notify the Monitor when such events are detected.

If it is not possible to modify the DBMS software, an alternate approach is to modify the applications to detect events. In this case the code to inform the Monitor of the occurrence of an event may be added to each application. While not providing a high degree of data independence, such a solution should be preferred to incorporating the integrity checking code itself into every application.

5.2.5. Consistency manager. Every time a Monitor detects an event that makes the E component of an ObjectDD true, it notifies the local consistency manager. The consistency manager evaluates the C component of the ObjectDD to determine whether to restore consistency between target and source objects.

The Consistency Manager evaluates simple consistency terms as follows:

State terms: To evaluate state terms, the Consistency Manager implements the ObjectDDdb database and the “LASTCONSISTENT” view qualifier, as discussed in Section 2.2.2. In particular, the CM creates a different ObjectDDdb for each ObjectDD whose C component includes state consistency terms. The ObjectDDdb databases are used to create and maintain the view definitions specified in each ObjectDD. As discussed in Section 2.2.2, a view definition in an ObjectDD has slightly different semantics than a view definition in a relational DBMS. Basic differences include: (i) ObjectDD views are heterogeneous multidatabase views, and (ii) ObjectDDdb views are visible only to the ObjectDD for which they are specified, not to other ObjectDDs or regular transactions. This is accomplished by creating a new ObjectDDdb database for each ObjectDD and placing the view definitions in this ObjectDD in the schema of its (private) ObjectDDdb.

The CM implements the “LASTCONSISTENT” view qualifier by materializing the views specified in each ObjectDD every time consistency between the ObjectDD target and source objects is restored. In a relational ObjectDDdb implementation, materialized view data are stored in separate tables (if view materialization produces multiple rows), or as rows in the same table (if view materialization produces a single row or a single value). These materialized view objects are ordered according to the time of their creation (e.g., by assigning them timestamps or inserting their identifier in a table ordered by time of creation). As discussed in Section 2.2.2, this creates a “LASTCONSISTENT” dimension in each ObjectDD.

Transaction terms: The consistency manager keeps a log for evaluating transaction consistency terms. Each log entry has the same fields as transaction consistency terms (i.e., transaction type, transaction instance, operation type, operation instance). A straightforward approach for maintaining such a log is to store it as data in a DBMS (e.g., as a table in the ObjectDDdb of each ObjectDD), and take advantage of the DBMS query capability in evaluating transaction terms.
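To make the log-as-table idea concrete, the sketch below stores log entries in an SQLite table (via Python's standard sqlite3 module) and evaluates one transaction term with a query. The column names mirror the four fields listed above. The table name, the sample rows, and the example term ("at least three writes by BILLING UPDATE transactions") are hypothetical.

```python
import sqlite3

log = sqlite3.connect(":memory:")
log.execute("""
    CREATE TABLE txn_log (
        transaction_type     TEXT,
        transaction_instance TEXT,
        operation_type       TEXT,
        operation_instance   TEXT
    )
""")

# Sample entries, invented for the example.
log.executemany("INSERT INTO txn_log VALUES (?, ?, ?, ?)", [
    ("BILLING UPDATE", "tx1", "write", "w1"),
    ("BILLING UPDATE", "tx2", "write", "w2"),
    ("BILLING UPDATE", "tx2", "write", "w3"),
])


def count_operations(conn, transaction_type, operation_type):
    """Count logged operations of one type issued by transactions of one type."""
    (n,) = conn.execute(
        "SELECT COUNT(*) FROM txn_log "
        "WHERE transaction_type = ? AND operation_type = ?",
        (transaction_type, operation_type)).fetchone()
    return n


# Hypothetical transaction term: at least 3 writes by BILLING UPDATE transactions.
term_holds = count_operations(log, "BILLING UPDATE", "write") >= 3
print(term_holds)
```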

Temporal terms: These are evaluated using a facility such as the Unix crontab.

The Consistency Manager evaluates complex consistency terms as follows: When the evaluation of a simple consistency term is completed, the consistency manager replaces the evaluated term with its boolean value. Consistency subterms of the form cx PRE cy are evaluated in the specified precedence order, i.e., cy is evaluated after cx. Next, the Consistency Manager evaluates the subterms that have all their subterms replaced by boolean values. This is repeated until all subterms are eventually evaluated. The evaluation of the top level of subterms in a complex consistency term determines the value of the complex consistency term.
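The bottom-up procedure above can be sketched as a recursive evaluator over a term tree. Several assumptions are made here and are not taken from the paper: terms are nested tuples, simple terms are zero-argument callables, the connectives are AND, OR, and NOT, and the value of a PRE subterm is the conjunction of its operands. Only the evaluation order of PRE (cx before cy) comes from the text.

```python
from typing import Callable, List, Tuple, Union

Leaf = Callable[[], bool]          # a simple consistency term
Term = Union[Leaf, Tuple]          # ("AND" | "OR" | "NOT" | "PRE", subterm, ...)


def evaluate(term: Term, trace: List[str]) -> bool:
    """Evaluate a term bottom-up, recording the order in which leaves are evaluated."""
    if callable(term):
        value = term()
        trace.append(term.__name__)
        return value
    op, *args = term
    # Operands are evaluated left to right, so in ("PRE", cx, cy)
    # cx is always evaluated before cy.
    values = [evaluate(a, trace) for a in args]
    if op == "NOT":
        return not values[0]
    if op == "OR":
        return any(values)
    if op in ("AND", "PRE"):       # ASSUMPTION: PRE yields the conjunction of its operands
        return all(values)
    raise ValueError(f"unknown operator {op!r}")


def c1() -> bool: return True
def c2() -> bool: return False
def c3() -> bool: return True


trace: List[str] = []
result = evaluate(("AND", ("PRE", c1, c3), ("OR", c2, ("NOT", c2))), trace)
print(result, trace)
```

Each recursive call returns a boolean that stands in for the evaluated subterm, which corresponds to replacing a term by its value before the enclosing subterm is evaluated.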

Whenever the C component of an ObjectDD evaluates to true, the consistency manager submits the ObjectDD consistency restoration transactions to the LDBSs through the local component of the PTMM.

5.2.6. Programmable transaction management mechanism. To enforce specified ETMs, the PTMM implements a set of distributed transactional services. The PTMM services are depicted in figure 14 and described in detail in [20]. PTMM-provided services include: (1) a transaction structure (TS) service used to enforce the transaction structure dependencies required by each ETM, such as the coupling modes in the activity ETM discussed in Section 3.4, and (2) a global correctness (GC) service used to enforce ETM-defined correctness criteria over multiple LDBSs, such as global serializability in a multidatabase environment. A TS service is implemented using simple event-action rules. In its simplest configuration the GC service is similar to a TP monitor Transaction Manager.

In-depth discussion of the interfaces and behavior of PTMM services is beyond the scope of this paper. The PTMM is discussed extensively in [18–20, 22].

DMS consistency managers can submit asynchronous transactions that behave according to an ETM defined by the TDSF and are enforced by the PTMM. At the same time, other applications can submit synchronous transactions that behave according to the same ETM or a different ETM. If asynchronous and synchronous transactions use different ETMs, these ETMs must be integrated using the TDSF.

Figure 14. TSME architecture.

5.2.7. Transaction dependency specification facility. The TDSF supports the definition of application-specific ETMs. In particular, the TDSF accepts specifications of ETMs expressed in terms of TransactionDD and conflict table definitions, as described in Section 3.3. The TDSF provides lifecycle and activation services for TransactionDDs, and maintains a TransactionDD repository.

ETM specifications are defined by a transaction model designer. A transaction model designer is the individual or group of individuals who determine the appropriate transaction model for an application. To provide an appropriate transaction model, the individual in this role must have a clear understanding of the application correctness and reliability requirements, concurrency control and recovery issues, and of various transaction models and the correctness and reliability they provide.

6. Related work and commercial products

The architecture of the DMS that we presented in this paper is similar to the architecture of an Interdependent Database System (IDS), presented in [27], where dependencies among objects are specified using Data Dependency Descriptors (D3) and enforcement of the dependencies is performed using polytransactions and the Activity model. Unlike the earlier IDS, the DMS allows a simpler implementation for consistency term evaluation that relies on straightforward extensions to a relational DBMS. In addition, the DMS supports the definition and enforcement of application-specific ETMs.

ObjectDDs can be viewed as an extension of the ECA rule paradigm to allow application-specific ETMs. This is accomplished by combining ObjectDDs with TransactionDDs.

There are two kinds of commercial products that can partially deal with data dependencies: data warehouse and replication server products.


Data warehouse products assume that data consistency is restored by explicitly reloading the warehouse database. This is a manual process that is too costly if only some of the warehouse data need to become consistent with their original data. On the other hand, if consistency restoration of data is delayed beyond the point where the correctness of warehouse application results is guaranteed, results based on old or invalid data may lead to incorrect decisions. This suggests that data warehouse updates must be done: (i) at a finer granularity, as needed to maintain the consistency of the warehouse data, and (ii) concurrently with regular transactions accessing operational data. The DMS addresses both these requirements.

Database replication servers support concurrent updating of finer granularity data replicated in different databases. However, unlike the DMS, database replication servers support only replication dependencies (e.g., they do not support dependency management for summarization dependencies). Furthermore, they do not support declarative specification of data dependencies and tend to enforce consistency immediately. Although some commercial replication servers also allow consistency restoration at specific points in time and permit user-defined consistency restoration procedures, it is the responsibility of application developers to decide how to use these facilities. Furthermore, when such decisions are made, data relationships and consistency restoration policies are buried in the application code. These problems are addressed by ObjectDDs and the DMS.

7. Summary and conclusions

In this paper, we discussed problems and corresponding solutions associated with enterprise-wide specification and management of dependent objects. To perform enterprise-wide management of all dependent data, organizations need to solve the following problems: (i) identify dependent data in legacy systems, new operational systems, and data warehouses and specify their dependencies, and (ii) manage dependent data and maintain their consistency automatically.

To solve the first problem we introduced ObjectDDs for the declarative specification of dependent objects. ObjectDDs capture the relationships between dependent objects, the limits of tolerated inconsistencies between them, and transactions that restore their consistency if it is violated beyond the acceptable limits. Important features of ObjectDDs include:

• Declarative specification of the relationship between dependent objects.
• Transaction enforcement of consistency restoration actions.
• Integration with a framework for specifying application-specific transaction models for consistency restoration transactions.
• Use of specifications to automatically schedule transactions that maintain the consistency between dependent objects.

Object dependencies are enforced using transactions. Although traditional ACID transactions can guarantee correctness and reliability of dependent objects, many applications that involve object dependencies do not require such strict properties. Application-specific ETMs relax the ACID restrictions to allow required functionality and efficiency without violating object consistency. To support the definition of such ETMs we used TransactionDDs. These are complementary to ObjectDDs. The combined ObjectDD and TransactionDD framework is the only framework we are aware of that: (i) allows declarative specification of object dependencies and corresponding consistency requirements, (ii) integrates the specification of transaction and object dependencies, and (iii) supports the definition and management of application-specific ETMs for both asynchronous and synchronous transactions.

To automatically maintain the consistency of dependent objects, we introduced the concept of a Dependency Management System (DMS). A DMS monitors dependent objects, evaluates their consistency, and uses consistency restoration transactions to keep them within acceptable consistency levels. In addition, the DMS allows the definition and enforcement of application-specific ETMs for consistency restoration transactions. We presented the conceptual design and architecture of the DMS and discussed a simple implementation involving straightforward extensions performed on a relational DBMS. In the absence of frameworks for reasoning about object dependencies and of products with DMS capabilities, this paper provides a basis for developing a system for performing enterprise-wide management of dependent objects.
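The monitor, evaluate, and restore cycle summarized above can be illustrated with a minimal Python sketch. It is not the ObjectDD syntax or the DMS implementation described in this paper: the class names `DependencyDescriptor` and `DependencyManager`, the in-memory `store`, and the use of an absolute numeric difference as the inconsistency measure are all assumptions made for illustration. A plain assignment stands in for a consistency restoration transaction.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class DependencyDescriptor:
    """Hypothetical stand-in for an ObjectDD (not the paper's syntax)."""
    name: str
    source: str                       # key of the original object
    target: str                       # key of the replica or summary
    derive: Callable[[float], float]  # relationship: target value implied by source
    tolerance: float                  # largest tolerated absolute divergence


class DependencyManager:
    """Hypothetical, single-threaded stand-in for a DMS."""

    def __init__(self, store: Dict[str, float],
                 descriptors: List[DependencyDescriptor]) -> None:
        self.store = store
        self.descriptors = descriptors

    def divergence(self, d: DependencyDescriptor) -> float:
        # Evaluate how far the target is from the value implied by the source.
        return abs(self.store[d.target] - d.derive(self.store[d.source]))

    def check_and_restore(self) -> List[str]:
        """Restore only the dependencies whose divergence exceeds the tolerance."""
        restored = []
        for d in self.descriptors:
            if self.divergence(d) > d.tolerance:
                # Assumption: a real system would run this step as a
                # consistency restoration transaction, not a bare assignment.
                self.store[d.target] = d.derive(self.store[d.source])
                restored.append(d.name)
        return restored


if __name__ == "__main__":
    store = {"orders_total": 100.0, "warehouse_total": 100.0}
    copy_dd = DependencyDescriptor(
        name="total_copy", source="orders_total", target="warehouse_total",
        derive=lambda value: value, tolerance=10.0,
    )
    dms = DependencyManager(store, [copy_dd])

    store["orders_total"] = 105.0    # within tolerance: nothing is restored
    print(dms.check_and_restore(), store["warehouse_total"])

    store["orders_total"] = 120.0    # beyond tolerance: the target is refreshed
    print(dms.check_and_restore(), store["warehouse_total"])
```

Only the dependency that exceeds its tolerance is refreshed, in contrast to reloading the whole warehouse; this is the fine-granularity behavior the sketch is meant to show under the stated assumptions.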

Notes

1. Mutual consistency requirements using timing constraints were introduced in [25, 41] and were further explored in [7, 34, 42].

2. Related work includes the specification of coherency condition used to define “how far” primary copy and quasi-copies can diverge based on time, versions, and arithmetic conditions [1].

3. The same discussion applies to triggers supported by commercial DBMSs, such as Sybase, Oracle, Informix, and Illustra.

4. For efficiency reasons, several objects, a class of objects, or even several objects and classes may be associated with the same conflict table. This issue is discussed further in [24].

5. Temporal events can be detected using a mechanism similar to the crontab service in UNIX systems.

References

1. R. Alonso, D. Barbara, and H. Garcia-Molina, “Data caching issues in an information retrieval system,” ACM-TODS, vol. 15, no. 3, Sept. 1990.

2. B.R. Badrinath and K. Ramamritham, “Semantics-based concurrency control: Beyond commutativity,” in Proceedings of 3rd International Conference on Data Engineering, 1987.

3. P. Bernstein, J. Rothnie, N. Goodman, and C. Papadimitriou, “The concurrency control mechanism of SDD-1: A system for distributed databases (The fully redundant case),” IEEE Trans. on Software Engineering, vol. 4, no. 3, May 1978.

4. P.A. Bernstein, V. Hadzilacos, and N. Goodman, Concurrency Control and Recovery in Database Systems, Addison-Wesley, 1987.

5. Y. Breitbart, D. Georgakopoulos, M. Rusinkiewicz, and A. Silberschatz, “On rigorous transaction scheduling,” IEEE Trans. on Software Engineering, vol. 17, no. 9, Sept. 1991.

6. A. Buchmann, M. Ozsu, M. Hornick, D. Georgakopoulos, and F. Manola, “A transaction model for active distributed object systems,” Database Transaction Models for Advanced Applications, A. Elmagarmid (Ed.), Morgan-Kaufmann, 1992.

7. S. Chakravarthy, B. Blaustein et al., “HiPAC: A research project in active, time-constrained database management,” Tech. Report, XEROX (XAIT), Cambridge, MA, July 1989.

8. U. Dayal, M. Hsu, and R. Ladin, “Organizing long-running activities with triggers and transactions,” in Proceedings of the ACM SIGMOD Conf. on Management of Data, 1990.

9. U. Dayal, M. Hsu, and R. Ladin, “A transactional model for long-running activities,” in Proceedings of the 17th International Conference on VLDB, 1991.

10. W. Du and A. Elmagarmid, “QSR: A correctness criterion for global concurrency control in interbase,” in Proceedings of the 15th International Conference on VLDB, 1989.

11. A. Elmagarmid (Ed.), Database Transaction Models for Advanced Applications, Morgan-Kaufmann, 1992.

12. A. Elmagarmid, Y. Leu, W. Litwin, and M. Rusinkiewicz, “A multidatabase transaction model for interbase,” in Proceedings of the 16th Int. Conf. on VLDB, 1990.

13. A. Farrag and T. Ozsu, “Using semantic knowledge of transactions to increase concurrency,” ACM Transactions on Database Systems, vol. 14, no. 4, Dec. 1989.

14. H. Garcia-Molina, “Using semantic knowledge for transaction processing in a distributed database,” ACM Trans. on Database Systems, vol. 8, no. 2, June 1983.

15. H. Garcia-Molina and K. Salem, “SAGAS,” in Proceedings of ACM SIGMOD Conf. on Management of Data, 1987.

16. S. Gatziu and K. Dittrich, “Events in an active object-oriented database system,” Technical Report 93.11, Institut fur Informatik der Universitat Zurich, Zurich, Switzerland, 1993.

17. N. Gehani, H. Jagadish, and O. Shmueli, “Event specification in an active object-oriented database,” in Proceedings of ACM SIGMOD Conference, June 1992.

18. D. Georgakopoulos and M. Hornick, “An environment for the specification and management of extended transactions and workflows in DOMS,” Tech. Report, TR-0218-09-92-165, GTE Laboratories Incorporated, Oct. 1992.

19. D. Georgakopoulos and M. Hornick, “A framework for enforceable specification of extended transaction models and transactional workflows,” International Journal of Intelligent and Cooperative Information Systems, World Scientific, Sept. 1994.

20. D. Georgakopoulos, M. Hornick, Piotr Krychniak, and Frank Manola, “Specification and management of extended transactions in a programmable transaction environment,” in Proceedings of the 10th Int. Conf. on Data Engineering, Houston, TX, Feb. 1994.

21. D. Georgakopoulos, M. Rusinkiewicz, and A. Sheth, “Using ticket-based methods to enforce the serializability of multidatabase transactions,” IEEE Trans. on Data and Knowledge Engineering, Feb. 1994.

22. D. Georgakopoulos, M. Hornick, and Frank Manola, “Customizing transaction models and mechanisms in a programmable environment supporting reliable workflow automation,” IEEE Transactions on Knowledge and Data Engineering, vol. 8, no. 4, Aug. 1996.

23. M. Herlihy, “Apologizing versus asking permission: Optimistic concurrency control for abstract data types,” ACM Transactions on Database Systems, vol. 15, no. 1, 1990.

24. M. Hornick and D. Georgakopoulos, “Extending heterogeneous transaction systems to support application-specific requirements,” Tech. Report, TR-0241-12-93-165, GTE Laboratories Incorporated, Dec. 1993.

25. M. Hsu, R. Ladin, and D. McCarthy, “An execution model for active data base management systems,” in Proceedings of the 3rd International Conference on Data and Knowledge Bases, June 1988.

26. W. Inmon, “Data warehouse defined,” Computerworld, March 1995. Special Advertising Supplement.

27. G. Karabatis, Management of Interdependent Data in a Multidatabase Environment: A Polytransaction Approach, Ph.D. thesis, University of Houston, May 1995.

28. N. Lynch, “Multilevel atomicity: A new correctness criterion for database concurrency control,” ACM Trans. on Database Systems, vol. 8, no. 4, Dec. 1983.

29. E. Moss, Nested Transactions, MIT Press: Cambridge, Mass., 1985.

30. A. Radding, “Support decision makers with a data warehouse,” Datamation, March 1995.

31. D. Rinaldi, “Metadata management separates prism from data warehouse pack,” Client/Server Computing, Mar. 1995.

32. M. Rusinkiewicz, A. Sheth, and G. Karabatis, “Specifying interdatabase dependencies in a multidatabase environment,” IEEE Computer, vol. 24, no. 12, Dec. 1991.

33. U. Schreier, H. Pirahesh, R. Agrawal, and C. Mohan, “Alert: An architecture for transforming a passive DBMS into an active DBMS,” in Proceedings of the 17th VLDB Conference, Sept. 1991.

34. A. Sheth and P. Krishnamurthy, “Redundant data management in bellcore and BCC databases,” Tech. Report TM-STS-015011, Bellcore, Dec. 1989.

35. A. Sheth, Y. Leu, and A. Elmagarmid, “Maintaining consistency of interdependent data in multidatabase systems,” Technical Report CSD-TR-91-016, Computer Sciences Department, Purdue University, March 1991.

36. E. Simon and P. Valduriez, “Integrity control in distributed database systems,” in Proceedings of the 20th Hawaii International Conference on System Sciences, 1986.

37. E. Simon, J. Kiernan, and C. de Maindreville, “Implementing high level active rules on top of a relational DBMS,” in Proceedings of the 18th VLDB Conference, 1992.

38. H. Wachter and A. Reuter, “Contracts: A means for extending control beyond transaction boundaries,” Database Transaction Models for Advanced Applications, A. Elmagarmid (Ed.), Morgan-Kaufmann, 1992.

39. W. Weihl, “Commutativity-based concurrency control for abstract data types,” IEEE Transactions on Computers, vol. 37, no. 12, Dec. 1988.

40. W. Weihl, “Local atomicity properties: Modular concurrency control for abstract data types,” ACM Transactions on Programming Languages and Systems, vol. 11, no. 2, 1989.

41. G. Wiederhold and X. Qian, “Modeling asynchrony in distributed databases,” in Proceedings of the IEEE International Conference on Data Engineering, Feb. 1987.

42. G. Wiederhold and X. Qian, “Consistency control of replicated data in federated databases,” in Proceedings of the Workshop on the Management of Replicated Data, Houston, TX, Nov. 1990.

