MC0077 - Unit 10: Transaction Management in Distributed Database Management

Structure:

10.1 Introduction

10.2 Features of Distributed vs. Centralized Databases or Differences in Distributed & Centralized Databases

10.2.1 Centralized Control vs. Decentralized Control

10.2.2 Data Independence

10.2.3 Reduction of Redundancy

10.2.4 Complex Physical Structures and Efficient Access

10.2.5 Integrity, Recovery and Concurrency Control

10.2.6 Privacy and Security

10.2.7 Distributed Query Processing

10.2.8 Distributed Directory (Catalog) Management

Self Assessment Questions

10.3 Relative Advantages of Distributed Databases over Centralized Databases:

10.3.1 Organizational and Economic Reasons

10.3.2 Incremental Growth

10.3.3 Reduced Communication Overhead

10.3.4 Performance Considerations

10.3.5 Reliability and Availability


10.3.6 Management of Distributed Data with Different Levels of Transparency

10.3.6.1 Distribution or Network Transparency

10.3.6.2 Replication Transparency

10.3.6.3 Fragmentation Transparency

Self Assessment Questions

10.4 Problem Areas of Distributed Databases

10.5 Structure of Distributed Database Management Systems

Self Assessment Question

10.6 Transaction Processing Framework

Self Assessment Question

10.7 Models of Failure

10.8 Two-Phase Commit Protocol

Self Assessment Question

10.9 Recovery in Two-Phase Commit Protocol

10.9.1 Site Failures

10.9.2 Failure of Coordinator

10.9.3 Network Partitions

Self Assessment Questions

10.10 Elimination of “Prepare” Message

10.10.1 Increasing Efficiency by Using Defaults

10.10.2 Remote Recovery Information Problem

Self Assessment Question

10.11 Three-Phase Commit Protocol


10.11.1 Recovery in Three-Phase Commit Protocol

10.11.1.1 Site Failures

10.11.1.2 Failure of the Coordinator:

10.11.1.3 Coordinator Failure Protocol

Self Assessment Questions

10.12 Classification of Concurrency Control Techniques

10.13 Two-Phase Locking Algorithm

10.14 Concurrency Control (Serializability)

Self Assessment Questions

10.15 Locking Protocols for Concurrency Control in Distributed Databases

10.15.1 Single-Lock-Manager Approach

10.15.2 Multiple Coordinators

10.15.3 Majority Protocol

10.15.4 Biased Protocol

10.15.5 Primary Copy

Self Assessment Questions

10.16 Concurrency Control Techniques

10.16.1 Timestamp-Based Algorithms

10.16.2 Conservative Timestamp Ordering Algorithm

10.16.3 Optimistic Algorithm

Self Assessment Question

10.17 Deadlock Handling

10.17.1 Deadlock Prevention


10.17.2 Deadlock Detection

10.17.3 Centralized Deadlock Detection

10.17.4 Hierarchical Deadlock Detection

10.18 Distributed Deadlock Detection

10.18.1 False Deadlock

Self Assessment Question

10.19 Summary

10.20 Terminal Questions

10.1 Introduction

Distributed Database is a collection of data which belong to the same system but are spread over the sites of a computer network. This definition implies:

1. Distribution: that data are not resident at the same site.

2. Logical Correlation: that data have some properties which tie them together.

Objectives:

Objectives of this unit are:

· Provide detailed coverage on transaction related issues in distributed database environment

· Provide a comparative detail of distributed databases with the centralized database systems

· Provide understanding of concurrent transactions and mechanisms to deal with them

10.2 Features of Distributed vs. Centralized Databases or Differences in Distributed & Centralized Databases

10.2.1 Centralized Control vs. Decentralized Control

In centralized control, one "database administrator" ensures the safety of the data. In distributed control, it is possible to use a hierarchical control structure based on a "global database administrator", who has central responsibility for the whole database, together with "local database administrators", who are responsible for their local databases.


10.2.2 Data Independence

In central databases, data independence means that the actual organization of data is transparent to the application programmer. Programs are written with a "conceptual" view of the data (the "conceptual schema"), and are unaffected by the physical organization of the data. In distributed databases, another aspect, distribution transparency, is added to the notion of data independence used in centralized databases. Distribution transparency means that programs are written as if the data were not distributed. Thus the correctness of programs is unaffected by the movement of data from one site to another; however, their speed of execution is affected.

10.2.3 Reduction of Redundancy

In centralized databases redundancy was reduced for two reasons: (a) inconsistencies among several copies of the same logical data are avoided, (b) storage space is saved. Reduction of redundancy is obtained by data sharing. In distributed databases data redundancy is desirable as (a) locality of applications can be increased if data is replicated at all sites where applications need it, (b) the availability of the system can be increased, because a site failure does not stop the execution of applications at other sites if the data is replicated. With data replication, retrieval can be performed on any copy, while updates must be performed consistently on all copies.
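A minimal sketch of this "read any copy, write all copies" discipline, in Python, might look like the following; the replica map, the site names and the choice of which copy serves a read are assumptions made purely for illustration:

    # Hedged sketch: 'read-one, write-all' over an in-memory replica map.
    replicas = {
        "site_A": {"ACC_17": 500},
        "site_B": {"ACC_17": 500},
        "site_C": {"ACC_17": 500},
    }

    def read(item):
        # Any single copy can serve a read (e.g. the local or cheapest site).
        any_site = next(iter(replicas))
        return replicas[any_site][item]

    def write(item, value):
        # An update must be applied consistently to all copies.
        for site in replicas:
            replicas[site][item] = value

    write("ACC_17", 450)
    assert all(copy["ACC_17"] == 450 for copy in replicas.values())
    print(read("ACC_17"))   # 450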

10.2.4 Complex Physical Structures and Efficient Access

In centralized databases, complex access structures such as secondary indexes and inter-file chains are used; all these features provide efficient access to data. In distributed databases, efficient access requires accessing data from different sites. For this, an efficient distributed data access plan is required, which can be generated either by the programmer or produced automatically by an optimizer.

Problems faced in the design of an optimizer can be classified in two categories:

a) Global optimization consists of determining which data must be accessed at which sites and which data files must consequently be transmitted between sites.

b) Local optimization consists of deciding how to perform the local database accesses at each site.

10.2.5 Integrity, Recovery and Concurrency Control

A transaction is an atomic unit of execution, and atomic transactions are the means to obtain database integrity. Failures and concurrency are the two dangers to atomicity. Failures may cause the system to stop in the midst of transaction execution, thus violating the atomicity requirement. Concurrent execution of different transactions may permit one transaction to observe an inconsistent, transient state created by another transaction during its execution. Concurrent execution requires synchronization among the transactions, which is much harder in a distributed system.

10.2.6 Privacy and Security

In traditional databases, the database administrator, having centralized control, can ensure that only authorized access to the data is performed.

In distributed databases, local administrators face the same problem as well as two new aspects of it: (a) security (protection) problems intrinsic to communication networks, which can represent a weak point with respect to protection; (b) in databases with a high degree of "site autonomy", the owners of local data may feel more protected because they can enforce their own protections instead of depending on a central database administrator.

10.2.7 Distributed Query Processing

The DDBMS should be capable of gathering and presenting data from more than one site to answer a single query. In theory a distributed system can handle queries more quickly than a centralized one, by exploiting parallelism and reducing disc contention; in practice the main delays (and costs) will be imposed by the communications network. Routing algorithms must take many factors into account to determine the location and ordering of operations. Communications costs for each link in the network are relevant, as are the variable processing capabilities and loadings of different nodes, and (where data fragments are replicated) trade-offs between cost and currency. If some nodes are updated less frequently than others, there may be a choice between querying a local out-of-date copy very cheaply and getting a more up-to-date answer by accessing a distant location. The ability to do query optimization is essential in this context, the main objective being to minimize the quantity of data to be moved around. As with single-site databases, one must consider both generalized operations on internal query representations and the exploitation of information about the current state of the database.

10.2.8 Distributed Directory (Catalog) Management

Catalogs for distributed databases contain information like fragmentation description, allocation description, mappings to local names, access method description, statistics on the database, protection and integrity constraints (consistency information) which are more detailed as compared to centralized databases.

Self Assessment Questions

1. How do distributed databases differ from centralized ones?

2. Explain how physical data independence is achieved in databases.

3. What do you mean by catalog management?


10.3 Relative Advantages of Distributed Databases over Centralized Databases

10.3.1 Organizational and Economic Reasons

Many organizations are decentralized, and a distributed database approach fits the structure of the organization more naturally. Organizational and economic motivations are among the main reasons for the development of distributed databases. For organizations that already have several databases and feel the need for global applications, a distributed database is the natural choice.

10.3.2 Incremental Growth

In a distributed environment, expansion of the system in terms of adding more data, increasing database size, or adding more processors is much easier.

10.3.3 Reduced Communication Overhead

Many applications are local, and these applications do not have any communication overhead. Therefore, the maximization of the locality of applications is one of the primary objectives in distributed database design.

10.3.4 Performance Considerations

Data localization reduces contention for CPU and I/O services, and simultaneously reduces the access delays involved in wide area networks. Local queries and transactions accessing data at a single site have better performance because of the smaller local databases. In addition, each site has a smaller number of transactions executing than if all transactions were submitted to a single centralized database. Moreover, inter-query and intra-query parallelism can be achieved by executing multiple queries at different sites, or by breaking up a query into a number of subqueries that execute in parallel. This contributes to improved performance.

10.3.5 Reliability and Availability

Reliability is defined as the probability that a system is running (not down) at a certain time point. Availability is the probability that the system is continuously available during a time interval. When the data and DBMS software are distributed over several sites, one site may fail while other sites continue to operate. Only the data and software that exist at the failed site cannot be accessed. This improves both reliability and availability. Further improvement is achieved by judiciously replicating data and software at more than one site.

10.3.6 Management of Distributed Data with Different Levels of Transparency

In a distributed database, the following types of transparencies are possible:

10.3.6.1 Distribution or Network Transparency


This refers to freedom for the user from the operational details of the network. It may be divided into location and naming transparency. Location transparency refers to the fact that the command used to perform a task is independent of the location of data and the location of the system where the command was issued. Naming transparency implies that once a name is specified, the named objects can be accessed unambiguously without additional specification.

10.3.6.2 Replication Transparency

Copies of the data may be stored at multiple sites for better availability, performance, and reliability. Replication transparency makes the user unaware of the existence of copies.

10.3.6.3 Fragmentation Transparency

Two main types of fragmentation are horizontal fragmentation, which divides a relation into sets of tuples (rows), and vertical fragmentation, which divides a relation into sub-relations, each defined by a subset of the columns of the original relation. A global query issued by the user must be transformed into several fragment queries. Fragmentation transparency makes the user unaware of the existence of fragments.
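As an illustration only (not taken from the unit), the two kinds of fragmentation can be sketched in Python over a small ACCOUNT relation; the column names and the fragmentation predicate are assumptions:

    # Illustrative sketch of horizontal vs. vertical fragmentation.
    ACCOUNT = [
        {"acc_no": 1, "branch": "north", "amount": 700},
        {"acc_no": 2, "branch": "south", "amount": 300},
        {"acc_no": 3, "branch": "north", "amount": 900},
    ]

    # Horizontal fragmentation: each fragment is a subset of the tuples (rows).
    north_fragment = [t for t in ACCOUNT if t["branch"] == "north"]
    south_fragment = [t for t in ACCOUNT if t["branch"] == "south"]

    # Vertical fragmentation: each fragment keeps a subset of the columns,
    # repeating the key so the relation can be reconstructed by a join.
    balances = [{"acc_no": t["acc_no"], "amount": t["amount"]} for t in ACCOUNT]
    branches = [{"acc_no": t["acc_no"], "branch": t["branch"]} for t in ACCOUNT]

    # A global query such as "all accounts" must be rewritten as the union of
    # the horizontal fragments (or a join of the vertical ones).
    assert sorted(north_fragment + south_fragment, key=lambda t: t["acc_no"]) == ACCOUNT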

Self Assessment Questions

1. What are the advantages of distributed databases over centralized databases?

2. Explain the concept of Network Transparency.

3. What is the significance of a fragment? How is fragmentation transparency achieved?

10.4 Problem Areas of Distributed Databases

The following are the crucial areas in a distributed database environment that need to be looked into carefully in order to make it successful. We shall discuss these in more detail in the following sections:

1. Distributed Database Design

2. Distributed Query Processing

3. Distributed Directory Management

4. Distributed Concurrency Control

5. Distributed Deadlock Management

6. Reliability in Distributed DBMS


7. Operating System Support

8. Heterogeneous Databases

10.5 Structure of Distributed Database Management Systems

A distributed database management system supports the creation and maintenance of distributed databases.

A) Software components which are typically necessary for building a distributed database are

1. The database management component (DB) or Data Processor

2. The data communication component (DC) or User Processor

3. The data dictionary (DD), which is extended to represent information about the distribution of data in the network.

4. The distributed database component (DDB)

It should be noted that the DDB is the only specialized distributed database component; this component is not present in conventional database management systems. The User Processor consists of the following parts:

1) The ‘User Interface Handler’ is responsible for interpreting user commands as they come in, and formatting the result data as it is sent to the user.

2) The ‘Semantic Data Controller’ uses the integrity constraints and authorizations that are defined as part of the global conceptual schema to check if the user query can be processed.

3) The ‘Global Query Optimizer and Decomposer’ determines an execution strategy to minimize a cost function, and translates the global queries into local ones using the global and local conceptual schemas as well as the global directory. The global query optimizer is responsible for generating the best strategy to execute distributed join operations.

4) The ‘Distributed Execution Monitor’ coordinates the execution of the user request. The execution monitor is also called the ‘Distributed Transaction Manager’. In executing queries in a distributed fashion, the execution monitors at various sites communicate with one another.

The ‘Data Processor’ consists of three elements:

1) The ‘Local Query Optimizer’ which acts as the ‘Access Path Selector’ is responsible for choosing the best access path to access any data item.


2) The ‘local recovery manager’ is responsible for making sure that the local database remains consistent even when failures occur.

3) The ‘run-time support processor’ physically accesses the database according to the physical commands in the schedule generated by the query optimizer. The run-time support processor is the interface to the operating system and contains the database buffer (or cache) manager, which is responsible for maintaining the main memory buffers and managing the data accesses.

The distributed execution monitor consists of two modules: a transaction manager (TM) and a scheduler (SC).

The ‘Transaction Manager’ is responsible for coordinating the execution of the database operations on behalf of the application (at the transaction’s originating site). The transaction managers implement an interface for the application programs which consists of the commands begin_transaction, read, write, commit and abort. In providing these services, a TM can communicate with schedulers and data processors at the same or at different sites.
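The begin_transaction / read / write / commit / abort interface named above can be sketched as a small Python class; buffering writes in a private workspace until commit is an assumption made purely for illustration, not the unit's prescribed implementation:

    # Hedged sketch of the TM interface seen by application programs.
    class LocalTransactionManager:
        def __init__(self, database):
            self.db = database          # committed state
            self.workspace = None       # uncommitted writes of the open transaction

        def begin_transaction(self):
            self.workspace = {}

        def read(self, item):
            # Reads see the transaction's own writes first, then the database.
            return self.workspace.get(item, self.db.get(item))

        def write(self, item, value):
            self.workspace[item] = value

        def commit(self):
            self.db.update(self.workspace)   # make all effects visible together
            self.workspace = None

        def abort(self):
            self.workspace = None            # partial results are simply discarded

    tm = LocalTransactionManager({"X": 10})
    tm.begin_transaction()
    tm.write("X", tm.read("X") + 5)
    tm.commit()
    print(tm.db["X"])   # 15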

The ‘Scheduler’ (or ‘Transaction Coordinator’) is responsible for implementing a specific concurrency control algorithm for synchronizing access to the database. Other components that participate in the management of distributed transactions are the local recovery managers that exist at each site. Their function is to implement the local procedures by which the local database can be recovered to a consistent state following a failure. The access to a remote database by an application can be performed in one of two basic ways.

In the first method, the application’s request for access to remote data is routed by the DDBMS to the site where the data is located. The request is executed at the remote site and the result is returned. In this way, the basic units shipped between systems are the database access primitive and the result obtained by executing this primitive. In this approach, distribution transparency can be implemented by providing global filenames.

In the second approach, the application requires the execution of an auxiliary program at the remote site. This auxiliary program, written by the application programmer, accesses the remote database and returns the result to the requesting application. This scheme can be more efficient if many database accesses are required, because the auxiliary program can perform all the required accesses and send only the final result back.

B) Another property of DDBMS is whether they are homogeneous or heterogeneous.

A homogeneous DDBMS has the same DBMS at each site, even if the computers and/or the operating systems are not the same.

Heterogeneous DDBMSs instead use at least two different DBMSs. Such systems add the problem of translating between the different data models of the different local DBMSs to the complexity of homogeneous DDBMSs. Although the problems of heterogeneous DDBMSs are very hard, commercially available systems provide some degree of support for heterogeneity. While none of these systems performs translation between different data models, which is a really hard problem, some of them support communication between different data communication (DC) components. This type of communication was not developed especially for building distributed databases, but for compatibility reasons within a centralized system.

Self Assessment Questions

1. What are the major areas of concern in a distributed database environment?

2. Explain the structure of a Distributed Database.

10.6 Transaction Processing Framework

A transaction is always part of an application. At some time after its invocation by the user, the application issues a begin_transaction primitive; from this moment, all actions performed by the application, until a commit or abort primitive is issued, are considered part of the same transaction. Alternatively, the beginning of a transaction is implicitly associated with the beginning of the application, and a commit/abort primitive ends a transaction and automatically begins a new one, so that an explicit begin_transaction primitive is not necessary.

In order to perform functions at different sites, a distributed application has to execute several processes at these sites. Let us call these processes agents of the application. An agent is therefore a local process which performs some actions on behalf of an application.

Any transaction must satisfy the following four properties:

Atomicity: Either all or none of the transaction’s operations are performed. In other words if a transaction is interrupted by a failure, its partial results are undone.

Consistency Preservation: A transaction is consistency preserving if its complete execution takes the database from one consistent state to another.

Isolation: Execution of a transaction should not be interfered with by any other transactions executing concurrently. It should appear that a transaction is being executed in isolation from other transactions. An incomplete transaction cannot reveal its results to other transactions before its commitment. This property is needed in order to avoid the problem of cascading aborts.

Durability (Permanency): Once a transaction has committed, the system must guarantee that the results of its operations will never be lost, independent of subsequent failures.


Since the results of a transaction, which must be preserved by the system, are stored in the database, the activity of providing the transaction’s durability is called database recovery.

Goals of Transaction Management in a Distributed Database: Efficient, reliable and concurrent execution of transactions. These three goals are strongly interrelated; moreover, there is a trade-off between them.

In order to cooperate in the execution of the global operation required by the application, the agents have to communicate. As they are resident at different sites, the communication between agents is performed through messages. Assume that

1) There exists a root agent which starts the whole transaction, so that when the user requests the execution of an application, the root agent is started; the site of the root agent is called the site of origin of the transaction.

2) The root agent has the responsibility of issuing the begin_transaction, commit and abort primitives.

3) Only the root agent can request the creation of a new agent.

In order to build a distributed transaction manager which implements the global primitives begin_transaction, commit and abort, it is convenient to assume that at each site we have a ‘local transaction manager (LTM)’ which is capable of implementing local transactions.

Let us take the example of a fund transfer to demonstrate the application of the above reference model.

FUND_TRANSFER:
    Read(terminal, $AMOUNT, $FROM_ACC, $TO_ACC);
    Begin_transaction;
    Select AMOUNT into $FROM_AMOUNT
        from ACCOUNT
        where ACCOUNT_NUMBER = $FROM_ACC;
    if $FROM_AMOUNT - $AMOUNT < 0 then abort
    else begin
        Update ACCOUNT
            set AMOUNT = AMOUNT - $AMOUNT
            where ACCOUNT_NUMBER = $FROM_ACC;
        Update ACCOUNT
            set AMOUNT = AMOUNT + $AMOUNT
            where ACCOUNT_NUMBER = $TO_ACC;
        Commit
    end

a) The FUND_TRANSFER transaction at the global level

Note: The above reference model is a conceptual model for understanding at which level an operation belongs and is not necessarily an implementation structure.

ROOT_AGENT:
    Read(terminal, $AMOUNT, $FROM_ACC, $TO_ACC);
    Begin_transaction;
    Select AMOUNT into $FROM_AMOUNT
        from ACCOUNT
        where ACCOUNT_NUMBER = $FROM_ACC;
    if $FROM_AMOUNT - $AMOUNT < 0 then abort
    else begin
        Update ACCOUNT
            set AMOUNT = AMOUNT - $AMOUNT
            where ACCOUNT_NUMBER = $FROM_ACC;
        Create AGENT1;
        Send to AGENT1($AMOUNT, $TO_ACC);
        Commit
    end

AGENT1:
    Receive from ROOT_AGENT($AMOUNT, $TO_ACC);
    Update ACCOUNT
        set AMOUNT = AMOUNT + $AMOUNT
        where ACCOUNT_NUMBER = $TO_ACC;

b) The FUND_TRANSFER transaction constituted by two agents

When a begin_transaction is issued by the root agent, the DTM has to issue a local_begin primitive to the LTM at the site of origin and at all the sites at which there are already active agents of the same application, thus transforming all agents into sub-transactions. From this time on, the activation of a new agent by the same distributed transaction requires that a local_begin be issued to the LTM at the site where the agent is activated, so that the new agent is created as a sub-transaction.

Self Assessment Questions

1. What do you mean by a transaction?

2. What are the properties of a Transaction? List and explain.

3. What are the goals of transaction management in a distributed database?

10.7 Models of Failures

Failures can be classified as

1) Transaction Failures

a) Error in transaction due to incorrect data input.

b) Present or potential deadlock.

c) ‘Abort’ of transactions due to non-availability of resources or deadlock.

2) Site Failures: From the recovery point of view, a failure has to be judged from the viewpoint of loss of memory. So failures can be classified as:


a) Failure with Loss of Volatile Storage: In these failures, the content of main memory is lost; however, the information recorded on disks is not affected. Typical failures of this kind are system crashes.

b) Media Failures (Failures with loss of Nonvolatile Storage): In these failures the content of disk storage is lost. Failures of this type can be reduced by replicating the information on several disks having ‘independent failure modes’.

Stable storage is the most resilient storage medium available in the system. It is implemented by replicating the same information on several disks with (i) independent failure modes, and (ii) the so-called careful replacement strategy: at every update operation, first one copy of the information is updated, then the correctness of the update is verified, and finally the second copy is updated.
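A minimal sketch of careful replacement, using two ordinary files to stand in for disks with independent failure modes (the file names and the read-back verification step are assumptions):

    # Hedged sketch of careful replacement over two independent copies.
    import json, os

    COPIES = ["stable_copy_1.json", "stable_copy_2.json"]

    def careful_write(record):
        data = json.dumps(record)
        for path in COPIES:
            tmp = path + ".tmp"
            with open(tmp, "w") as f:          # 1. update one copy
                f.write(data)
                f.flush()
                os.fsync(f.fileno())
            with open(tmp) as f:               # 2. verify the update succeeded
                assert f.read() == data
            os.replace(tmp, path)              # 3. only then move on to the next copy

    careful_write({"log": "ready T1"})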

3) Communication Failures: There are two basic types of possible communication errors: lost messages and partitions.

When a site X does not receive an acknowledgment of a message from a site Y within a predefined time interval, X is uncertain about the following things:

i) Did a failure occur at all, or is the system simply slow?

ii) If a failure occurred, was it a communication failure, or a crash of site Y?

iii) Has the message been delivered at Y or not? (as the communication failure or the crash can happen before or after the delivery of the message.)

Network Partition

Thus all failures can be regrouped as

i) Failure of a site

ii) Loss of message(s), with or without site failures but no partitions.

iii) Network Partition: Dealing with network partitions is a harder problem than dealing with site crashes or lost messages.

10.8 Two-Phase Commit Protocol

The basic idea of 2PC is to determine a unique decision for all participants with respect to committing or aborting all the local sub-transactions. If a participant is unable to locally commit its sub-transaction, then all participants must locally abort.

The goal of the first phase of the protocol is to reach a common decision; the goal of the second phase is to implement this decision.


Phase One: In this phase the coordinator asks all the participants to ‘prepare for commitment’; each participant answers ‘READY’ if it is ready and willing to commit. Before sending the first ‘prepare for commitment’ message, the coordinator records on stable storage a log record of a new type, called a "prepare" log record, in which the identifiers of all sub-transactions participating in the 2-phase commitment are recorded. The coordinator also activates a ‘timeout’ mechanism, which will interrupt the coordinator after a given time interval has expired.

When a participant answers ‘ready’, it ensures that it will be able to commit the sub-transaction even if failures occur at that site. This implies that each participant has to record on stable storage two things:

1. All the information which is required for locally committing the sub-transaction. This means that all the log records of the sub-transaction must be recorded on stable storage.

2. The fact that this sub-transaction has declared itself ready to commit. This means that a log record of a new type, called a "ready" log record, must be recorded on stable storage.

The coordinator decides whether to commit or abort the transaction as a result of the answers it has received from the participants. If all participants have answered "ready", it decides to commit the transaction. If instead some participant has answered "abort", or has not yet answered when the timeout expires, it decides to abort the transaction.

Phase Two: The coordinator begins the second phase of 2PC by recording on stable storage its decision. This corresponds to writing a "global_commit" or "global_abort" record in the log.

The fact that the coordinator records its decision on stable storage means that the distributed transaction will eventually be committed or aborted, in spite of failures. Then the coordinator informs all participants of its decision by sending them the command message. All the participants write a commit or abort record in the log, based on the command message received from the coordinator. From this moment, the local recovery procedure is capable of ensuring that the effect of the sub-transaction will not be lost. (All log records related to the sub-transaction can be taken offline after the next checkpoint.)

All participants send a final acknowledgment (ACK) message to the coordinator, and perform the actions required for committing or aborting the sub-transactions. When the coordinator has received an ACK message from all participants, it writes a log record of a new type, called a "complete" record. After having written this record, the coordinator can forget the outcome of the transaction; thus all records related to this transaction can be taken offline after the next checkpoint. Note that the ACK message is a regular message of the protocol, informing the coordinator that the command has been recorded in stable storage.

The 2-phase-commitment protocol is resilient to all failures in which no log information is lost.
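From the coordinator's side, the two phases could be sketched as follows; the Participant objects and the in-memory log are stand-ins for the real communication layer and stable storage, and timeouts and failures are deliberately left out:

    # Hedged sketch of the coordinator side of 2PC (no timeouts or failures shown).
    def two_phase_commit(participants, log):
        # Phase one: ask everyone to prepare and collect the votes.
        log.append("prepare")                       # 'prepare' log record
        votes = [p.prepare() for p in participants] # each answers 'READY' or 'ABORT'
        decision = "global_commit" if all(v == "READY" for v in votes) else "global_abort"

        # Phase two: force the decision to the log, then tell everyone.
        log.append(decision)
        for p in participants:
            p.deliver(decision)                     # participant writes commit/abort record
        acks = [p.ack() for p in participants]
        if all(acks):
            log.append("complete")                  # outcome can now be forgotten
        return decision

    class Participant:
        def __init__(self, ready=True):
            self.ready = ready
            self.outcome = None
        def prepare(self):
            return "READY" if self.ready else "ABORT"
        def deliver(self, decision):
            self.outcome = decision
        def ack(self):
            return True

    log = []
    print(two_phase_commit([Participant(), Participant()], log), log)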


10.9 Recovery in Two-Phase Commit Protocol

10.9.1 Site failures

Assume that site Sk fails and subsequently recovers. The following cases will have to be considered and action taken accordingly.

a) If site Sk fails before responding with a "ready" message to the coordinator, it is assumed to have responded with an "abort T" message.

b) If site Sk fails after the coordinator has received a "ready" message from the site, the rest of the commit protocol is executed in the normal fashion, ignoring the site failure. When site Sk recovers from a failure, it examines its log to determine the fate of those transactions that were in the midst of execution when the failure occurred. If the log of the site contains:

i) <commit T> record, site executes "redo" for transaction.

ii) <abort T> record, site executes "undo" for transaction.

iii) <ready T> record,

If coordinator is up,

site Sk asks the coordinator to determine the fate of the transaction; the coordinator will notify it whether the transaction committed or aborted; in case of commit, site Sk will execute "redo", else it executes "undo".

If coordinator is down,

site Sk sends a "query-status" message to all sites in the system. Other sites, on receiving such a message, consult their logs to see whether that transaction was executed at their site. Site Sk is then informed, according to the records of the other sites, whether the transaction was committed or aborted.

If no other site responds with a commit or abort status, then site Sk will periodically resend the "query-status" message to the other sites, until a site which has the information about the transaction recovers. Note that the coordinator site will have this information.

iv) If the log does not contain any control record (commit, abort or ready) concerning the transaction, it means that site Sk failed before responding to the <prepare T> message of the coordinator, so the site will execute "undo".
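The four cases above can be compressed into one small decision function; representing the log as a list of record strings and the coordinator's answer as an optional argument are assumptions of the sketch:

    # Hedged sketch of participant recovery in 2PC, following cases (i)-(iv) above.
    def recover_participant(log, coordinator_outcome=None):
        if "commit T" in log:
            return "redo T"                       # case (i)
        if "abort T" in log:
            return "undo T"                       # case (ii)
        if "ready T" in log:                      # case (iii): ask the coordinator or other sites
            if coordinator_outcome == "commit":
                return "redo T"
            if coordinator_outcome == "abort":
                return "undo T"
            return "resend query-status and wait"
        return "undo T"                           # case (iv): never answered 'prepare'

    print(recover_participant(["ready T"], coordinator_outcome="commit"))  # redo T
    print(recover_participant([]))                                         # undo T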

10.9.2 Failure of coordinator

If the coordinator fails in the midst of the execution of the commit protocol for transaction T, then the participating sites must decide the fate of T. In some cases the sites cannot decide whether to commit or abort T, and therefore these sites must wait for the recovery of the failed coordinator.

If an active participating site contains in its log

i) <commit T> record, then T must be committed.

ii) <abort T> record, then T must be aborted.

iii) If any participating and active site does not contain a <ready T> record in its log, then the coordinator cannot have decided to commit T, because a site that does not have a <ready T> record in its log cannot have sent a <ready T> message to the coordinator. Hence T must be aborted, and it is not necessary to wait for the coordinator to recover.

iv) If all participating sites have a <ready T> record in their logs, but no further record such as <commit T> or <abort T>, then the decision of the coordinator to commit or abort T cannot be known until the coordinator recovers. Thus all participating sites must wait until the coordinator recovers, and the transaction T will continue to hold system resources. Such a situation is called "blocking".

10.9.3 Network Partitions

When a network partitions, two possibilities exist:

a) The coordinator and all its participants remain in one partition. In such a case the failure has no effect on the commit protocol.

b) The coordinator and its participating sites belong to several partitions. Sites in the partition containing the coordinator conclude that some other participating sites have failed, and hence execute the usual protocol. Sites in partitions not containing the coordinator conclude that the coordinator has failed, and hence execute the protocol for dealing with coordinator failure.

Thus the major disadvantage of the 2PC protocol is that a coordinator failure may result in blocking, where a decision either to commit or to abort may have to be postponed until the coordinator recovers.

Self Assessment Questions

1. What are the types of failures that can occur in a distributed database environment?

2. Discuss the phases of a Two-Phase commit protocol with significance of each.

3. What is network partitioning? How is recovery done using the two-phase commit protocol?


10.10 Elimination of “Prepare” Message

We had assumed that the whole process is started by the coordinator when the main execution of the transaction is terminated. This is not necessary and the first phase of these protocols can be incorporated in the transaction execution. When agents finish performing their operations, they can return the READY message immediately, without waiting for a prepare message. In fact, when the root agent knows that all required operations have been performed and the transaction was not unilaterally aborted, it also knows that all agents are ready to commit.

But this approach has a serious disadvantage: once a participant declares itself ready to commit, it remains in that state until the whole transaction is terminated. Therefore, a higher probability of blocking exists, and site autonomy is reduced.

10.10.1 Increasing efficiency by using defaults

The modified protocols are based on the idea of assuming by default that a transaction is committed (or aborted) if no information about it is found in the log. These protocols are called "presumed commit" and "presumed abort". This approach is used in the R* system.

10.10.2 Remote Recovery Information Problem

If a participant fails after sending the "READY" message but before receiving the command from the coordinator (or in the case of a lost command message), the participant has to send an enquiry message to learn the outcome of the transaction. The most straightforward solution is for the participant to send its enquiry to the coordinator. However, the coordinator might have failed by the time the participant recovers. It is possible to let the participant wait and retry the enquiry later; with this solution the participant remains blocked. Two more complex alternatives exist:

a) Redirecting the Enquiry: The enquiry can be sent from the recovering site to other participants. If at least one participant has received the decision, then all other participants can find out the decision from it. This means, that in case of a network partition, all the participant-groups in which at least one participant has received the command can complete the transaction. This solution requires that information about terminated transactions be maintained also at the participant’s site.

b) Spooling the Command Message: This approach consists in assigning to each site i a set of spooling sites S(i). A spooling site has the responsibility of receiving and storing the messages which are directed to site i while it is down. When site i recovers, it receives from one of the sites of S(i) all messages which were directed to it.

With this method, the coordinator sends the command message for a crashed participant to all its spoolers, and receives the ACK messages from them. When the participant recovers, it receives the decision from the spoolers and does not interact with the coordinator. If K spoolers are used, K-resiliency is obtained, independently of the number of participants. This method is used in the SDD-1 system.

Self Assessment Questions

1. How can we eliminate the "prepare" message in the two-phase commit protocol?

10.11 Three-Phase Commit Protocol

The 3-phase commit (3PC) protocol described here has the following requirements:

1. No network partition occurs.

2. While 3PC is in progress for a transaction, at most K participating sites can fail, where K indicates the resiliency of the protocol to site failures.

3. At any point, at least K+1 sites must be up.

Let T be a transaction initiated at site Si, and let the transaction coordinator at Si be Ci.

Phase 1: This phase is the same as phase 1 of 2PC protocol.

Phase 2: If Ci receives an "abort" message from a participating site, or if Ci receives no response within a pre-specified interval from a participating site, then Ci decides to abort T. The abort decision is implemented in the same way as in the 2PC protocol. If Ci receives a "ready" message from all participating sites, Ci makes the preliminary decision to "pre-commit T". "Pre-commit" differs from commit in that T may still be aborted eventually. The pre-commit decision allows the coordinator to inform each participating site that all participating sites are ready. Ci adds a "pre-commit T" record to the log and forces the log onto stable storage. Then Ci sends a "pre-commit T" message to all participating sites. When a site receives a message from the coordinator (either abort T or pre-commit T), it records that message in its log, forces this information to stable storage, and sends an acknowledge T message to the coordinator.

Phase 3: This phase is executed only if the decision in phase 2 was to pre-commit. After the "pre-commit T" messages are sent to all participating sites, the coordinator must wait until it receives at least K acknowledge T messages. Then the coordinator reaches a commit decision: it adds a "commit T" record to its log, and forces the log onto stable storage. Then Ci sends a "commit T" message to all participating sites. When a site receives that message, it records the information in its log.
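The three coordinator phases might be sketched as follows; failures, timeouts and the election of a new coordinator are omitted, and the Site objects are stand-ins for real participants:

    # Illustrative sketch of the coordinator side of 3PC.
    def three_phase_commit(participants, log, k):
        # Phase 1: identical to 2PC's voting phase.
        votes = [p.prepare() for p in participants]
        if not all(v == "ready" for v in votes):
            log.append("abort T")
            for p in participants:
                p.deliver("abort T")
            return "abort"

        # Phase 2: preliminary decision, forced to the log before it is sent out.
        log.append("pre-commit T")
        acks = [p.deliver("pre-commit T") for p in participants]

        # Phase 3: commit only after at least K acknowledgements have arrived.
        if sum(acks) >= k:
            log.append("commit T")
            for p in participants:
                p.deliver("commit T")
            return "commit"
        return "waiting"   # cannot reach the commit decision yet

    class Site:
        def prepare(self):
            return "ready"
        def deliver(self, msg):
            return True    # acknowledge

    print(three_phase_commit([Site(), Site(), Site()], [], k=2))   # commit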

10.11.1 Recovery in 3-Phase Commit Protocol

10.11.1.1 Site failures


If the coordinator Ci detects that a site has failed, the actions taken are similar to those of the 2PC protocol: if a site fails before responding with a "ready T" message to Ci, it is assumed to have responded with an "abort T" message.

When a participating site Sj recovers from a failure, it must examine its log to determine the fate of those transactions that were in the midst of execution when the failure occurred. Let T be one such transaction. We consider each of the following cases:

i) The log contains a <commit T> record. In this case, the site executes "Redo(T)".

ii) The log contains an <abort T> record. In this case, the site executes undo(T).

iii) The log contains a <ready T> record, but no <abort T> or <pre-commit T> record.

In this case, the site attempts to consult Ci to determine the fate of T.

a) If Ci responds with a message that T aborted, the site executes "undo(T)".

b) If Ci responds with a message <pre-commit T> , the site (as in phase 2) records this information in its log, and resumes protocol by sending an <acknowledge T> message to the coordinator.

c) If Ci responds with a message that T committed, the site executes "redo T".

d) If Ci fails to respond within a pre-specified time, the site executes a coordinator failure protocol.

iv) The log contains a <pre-commit T> record, but no <abort T> or <commit T> record. As before, the site consults Ci.

a) If Ci responds with a message that T aborted, the site executes "undo(T)".

b) If Ci responds with a message <pre-commit T>, the site (as in phase 2) records this information in its log, and resumes protocol by sending an <acknowledge T> message to the coordinator.

c) If Ci responds with a message that T committed, the site executes "redo T".

d) If Ci fails to respond within a pre-specified time, the site executes a coordinator failure protocol.

10.11.1.2 Failure of the coordinator

When a participating site fails to receive a response from the coordinator, it executes the "coordinator failure protocol". This protocol results in the selection of a new coordinator. When the failed coordinator recovers, it does so in the role of a participating site. It no longer acts as coordinator; rather, it must determine the decision that has been reached by the new coordinator.

10.11.1.3 Coordinator Failure Protocol

The coordinator failure protocol is triggered by a participating site that fails to receive a response from the coordinator within a specified interval. Since we assume no network partition, the only possible cause for this situation is the failure of the coordinator.

1) The active participating sites select a new coordinator using an election protocol.

2) The new coordinator, Cnew, sends a message to each participating site requesting the local status of T.

3) Each participating site, including Cnew, determines the local status of T;

a) Committed: The log contains a <commit T> record.

b) Aborted: The log contains an <abort T> record.

c) Ready: The log contains a <ready T> record, but contains no <abort T> or <pre-commit T> record.

d) Pre-committed: The log contains a <pre-commit T> record, but contains no <abort T> or <commit T> record.

e) Not ready: The log contains neither a <ready T> nor an <abort T> record.

Each participating site sends its local status to Cnew.

4) Depending on the responses received, Cnew decides either to commit or abort T, or to restart the 3PC protocol:

a) If at least one site has local status = committed, then Cnew commits T.

b) If at least one site has local status = aborted, then Cnew aborts T. (Note that it is not possible for some site to have local status = committed while another has local status = aborted.)

c) If no site has local status = aborted, and no site has local status = committed, but at least one site has local status = pre-committed, then Cnew resumes the 3PC protocol by sending new pre-commit messages.

d) Otherwise, Cnew aborts T.
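Rules (a) to (d) translate directly into a decision function for Cnew; collecting the local statuses as a list of strings is an assumption of the sketch:

    # Hedged sketch of step 4 of the coordinator failure protocol.
    def new_coordinator_decision(local_statuses):
        if "committed" in local_statuses:
            return "commit T"                     # rule (a)
        if "aborted" in local_statuses:
            return "abort T"                      # rule (b)
        if "pre-committed" in local_statuses:
            return "resume 3PC with pre-commit"   # rule (c)
        return "abort T"                          # rule (d)

    print(new_coordinator_decision(["ready", "pre-committed", "ready"]))   # rule (c)
    print(new_coordinator_decision(["ready", "not ready"]))                # rule (d)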

The coordinator failure protocol allows the new coordinator to obtain knowledge about the state of the failed coordinator Ci. If any active site has a <commit T> record in its log, then Ci must have decided to commit. If an active site has a <pre-commit T> record in its log, then Ci must have reached a preliminary decision to pre-commit T, which means that all sites, including any that may have failed, have reached the "ready" state; it is therefore safe to commit T. However, Cnew does not commit T unilaterally; doing so would create the same blocking problem as in 2PC if Cnew were to fail. It is for this reason that phase 3 is resumed by Cnew. Consider the case where no active site has received a pre-commit message from Ci. We must consider three possibilities:

a) Ci had decided to commit T prior to failing.

b) Ci had decided to abort T prior to failing.

c) Ci had not yet decided the fate of T.

We shall show that the first of these three alternatives is not possible, and that, therefore, it is safe to abort T.

Suppose that Ci had decided to commit T. Then at least K sites must have decided to pre-commit T and have sent acknowledgement messages to Ci. Since Ci has failed and we assume that at most K sites fail while the 3PC protocol is executed for a transaction, at least one of those K sites must be active, and hence at least one active site would inform Cnew that it had received a pre-commit message. Thus, if no active site had received a pre-commit message, a commit decision certainly could not have been reached by Ci, and it is indeed safe to abort T. It is possible that Ci had not decided to abort T, so it may still be possible to commit T; however, detecting that Ci had not decided to abort T would require waiting for Ci (or for some other failed site that had received a pre-commit message) to recover. Hence, the protocol aborts T if no active site has received a pre-commit message.

In the preceding discussion, if more than K sites could fail while the 3PC protocol is executed for a transaction, it may not be possible for the surviving participants to determine the action taken by Ci prior to failing; this situation would force blocking to occur until Ci recovers. Although a large value of K is best from this standpoint, it forces the coordinator to wait for more responses before deciding to commit, thus delaying routine (failure-free) processing. Further, if fewer than K participants (in addition to the coordinator) are active, it may not be possible for the coordinator to complete the commit protocol, resulting in blocking. Thus, the choice of a value for K is crucial, as it determines the degree to which the protocol avoids blocking.

Our assumption that no network partition occurs is crucial to this discussion. It is impossible to differentiate between network failure and site failure; thus, network partitioning could lead to the election of two new coordinators (each of which believes that all sites in partitions other than its own have failed). The decisions of the two coordinators may not agree, resulting in the transaction being committed at some sites while being aborted at others.


Self Assessment Questions

1. Explain the underlying concept of three-phase commit protocol.

2. How is recovery performed using the three-phase commit protocol?

10.12 Classification of Concurrency Control Techniques

There are a number of ways to classify concurrency control approaches. Some of these are:

1) Mode of Database Distribution: fully replicated, partially replicated, or partitioned databases.

2) According to Network Topology, such as those requiring a communication subnet with broadcasting capability or those working in a star-type network or a circularly connected network.

3) Synchronizing Primitive: There are two broad classes (a) pessimistic concurrency control methods (b) optimistic concurrency control methods.

Pessimistic algorithms synchronize the concurrent execution of transactions early in their execution life cycle, whereas optimistic algorithms delay the synchronization of transactions until their termination. The ‘pessimistic’ group consists of locking-based algorithms, transaction-ordering-based algorithms and hybrid algorithms. The ‘optimistic’ group can likewise be classified as locking-based or timestamp-ordering-based.

In the ‘locking-based approach’, the synchronization of transactions is achieved by employing physical or logical locks on some portion or granule of the database. The size of these portions (also called granularity) is an important issue. These protocols can be further classified as 1) Single-Lock-Manager Approach, 2) Multiple Coordinators, 3) Majority Protocol, 4) Biased Protocol, and 5) Primary Copy Protocol.

‘Timestamp Ordering’ (TO) class involves organizing the execution order of transactions so that they maintain mutual and inter-consistency. This ordering is maintained by assigning timestamps to both the transactions and the data items that are stored in the database. These algorithms can be basic TO or conservative TO.

10.13 Two-Phase Locking Algorithm

A transaction locks a data item in "shared mode" if it wants only to read the data item and in "exclusive mode" if it wants to write the data item. A transaction is "well-formed" if it always locks a data item in shared mode before reading it, and always locks a data item in exclusive mode before writing it. The correctness of locking mechanisms is based on the assumption that all transactions are well-formed.


The following compatibility rules exist between lock modes:

1. A transaction can lock a data item in shared mode if it is not locked at all or it is locked in shared mode by another transaction.

2. A transaction can lock a data item in exclusive mode only if it is not locked at all.

Two transactions are in "conflict" if they want to lock the same data item with two incompatible modes; we have shared-exclusive (or read-write) conflicts and exclusive-exclusive (or write-write) conflicts.

Concurrent execution of transactions is correct provided that the following rules are observed:

1. Transactions are well-formed.

2. Compatibility rules for locking are observed.

3. Each transaction does not request new locks after it has released a lock.

The last condition is also called the "two-phase" condition, because each transaction passes through a first phase, during which new locks are acquired (the growing phase), and a second phase, during which locks are only released (the shrinking phase).

Although 2-phase locking is sufficient to preserve the serializability of transactions, it is not sufficient to guarantee isolation. In order to guarantee isolation we must therefore require that transactions hold all their exclusive locks until commitment.

Transactions should therefore be performed according to the following scheme:

    (Begin application)
    Begin transaction
        Acquire locks before reading or writing
    Commit
    Release locks
    (End application)

In this way transactions are well-formed, 2-phase locked, and isolated.
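A minimal lock table honouring the shared/exclusive compatibility rules above might look like the sketch below; transaction identifiers are plain strings, and blocking is reduced to a grant/deny answer for brevity:

    # Hedged sketch of shared/exclusive lock compatibility (no queueing, no deadlock handling).
    class LockManager:
        def __init__(self):
            self.table = {}   # item -> (mode, set of holders)

        def lock(self, txn, item, mode):
            entry = self.table.get(item)
            if entry is None:
                self.table[item] = (mode, {txn})
                return True
            held_mode, holders = entry
            # Rule 1: a shared lock is compatible only with other shared locks.
            if mode == "S" and held_mode == "S":
                holders.add(txn)
                return True
            # Rule 2: an exclusive lock is compatible with nothing.
            return False          # conflict: the request would have to wait

        def unlock(self, txn, item):
            mode, holders = self.table[item]
            holders.discard(txn)
            if not holders:
                del self.table[item]

    lm = LockManager()
    print(lm.lock("T1", "X", "S"))   # True
    print(lm.lock("T2", "X", "S"))   # True  (read-read is not a conflict)
    print(lm.lock("T3", "X", "X"))   # False (write conflicts with the readers)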

10.14 Concurrency Control (Serializability)


In a distributed database, each transaction performs operations at several sites. The sequence of operations performed by transactions at a site is a ‘local schedule’. An execution of n distributed transactions T1, T2, … Tn at m sites is modeled by a set of local schedules S1, S2, … Sm.

The execution of transactions T1, …, Tn is correct if:

1. Each local schedule Sk is serializable.

2. There exists a "total ordering" of T1, …, Tn such that, if Ti < Tj in the total ordering, then there is a serial schedule Sk' such that Sk is equivalent to Sk' and Ti < Tj in Serial(Sk'), for each site k where both transactions have executed some action.

In other words, let T1, …, Tn be a set of transactions, and let E be an execution of these transactions modeled by schedules S1, …, Sm. E is correct (serializable) if there exists a total ordering of the transactions such that, for each pair of conflicting operations Oi and Oj from transactions Ti and Tj respectively, Oi precedes Oj in any schedule S1, …, Sm if and only if Ti precedes Tj in the total ordering.
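One standard way to test this condition is to build a precedence graph over the conflicting operation pairs and check it for cycles; the sketch below does this for a single schedule and is offered only as an illustration of the definition above:

    # Hedged sketch: conflict-serializability test via a precedence graph.
    # A schedule is a list of (transaction, operation, item) triples.
    def serializable(schedule):
        edges = set()
        for i, (ti, oi, xi) in enumerate(schedule):
            for tj, oj, xj in schedule[i + 1:]:
                if ti != tj and xi == xj and "write" in (oi, oj):
                    edges.add((ti, tj))   # Ti must precede Tj in the total ordering

        # The schedule is serializable iff the precedence graph is acyclic.
        nodes = {t for t, _, _ in schedule}
        visiting, done = set(), set()
        def cyclic(n):
            visiting.add(n)
            for a, b in edges:
                if a == n and (b in visiting or (b not in done and cyclic(b))):
                    return True
            visiting.discard(n)
            done.add(n)
            return False
        return not any(cyclic(n) for n in nodes if n not in done)

    s = [("T1", "read", "X"), ("T2", "write", "X"), ("T1", "write", "X")]
    print(serializable(s))   # False: T1 -> T2 and T2 -> T1 form a cycle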

Two-phase locking ensures that all executions of transactions are serializable, but it does not allow all serializable executions to be produced. In other words, the 2-phase-locking mechanism is more restrictive than the serializability condition requires, and some transactions are forced to wait unnecessarily.

Serializability is the weakest (i.e., the least restrictive) criterion for preserving consistency of concurrent transactions if no semantic knowledge about them is available; however, if semantic knowledge is available, then other approaches may be more attractive.

Self Assessment Questions

1. What do you understand by Concurrent Transactions?

2. Explain two-phase locking.

3. Discuss Locking Protocol mechanism in concurrency control.

10.15 Locking Protocol for Concurrency Control in Distributed Databases

10.15.1 Single-Lock-Manager Approach


The system maintains a single lock manager that resides in a single chosen site, say Si. All lock and unlock requests are made at site Si. When a transaction needs to lock a data item, it sends a lock request to site Si. The lock manager determines whether the lock can be granted immediately. If the lock can be granted, the lock manager sends a message to that effect to the site at which the lock request was initiated. Otherwise, the request is delayed until it can be granted, at which time a message is sent to the site at which the lock request was initiated. The transaction can read the data item from any one of the sites at which a replica of the data item resides. In the case of a write, all the sites where a replica of the data item resides must be involved in the writing.
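Under stated assumptions (a designated lock-manager site, a dictionary of replica sites, and a print-based stand-in for the network), the routing just described might be sketched as:

    # Hedged sketch of single-lock-manager routing: every lock request goes to the
    # chosen site S1; reads use any replica, writes touch all replicas.
    LOCK_SITE = "S1"
    replica_sites = {"Q": ["S2", "S3", "S4"]}

    def send(site, message):                                      # stand-in for the network
        print(f"-> {site}: {message}")
        return "granted"

    def request_lock(item, mode):
        # All lock and unlock traffic is directed to the single lock-manager site.
        return send(LOCK_SITE, f"lock {item} in {mode} mode")

    def read(item):
        request_lock(item, "shared")
        return send(replica_sites[item][0], f"read {item}")       # any one replica

    def write(item, value):
        request_lock(item, "exclusive")
        for site in replica_sites[item]:                          # all replicas take part
            send(site, f"write {item} = {value}")

    write("Q", 42)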

The scheme has the following advantages:

Simple implementation: This scheme requires two messages for handling lock requests, and one message for handling unlock requests.

Simple deadlock handling: Since all lock and unlock requests are made at one site, the deadlock handling algorithms used in central databases can be applied directly.

The scheme has following disadvantages:

Bottleneck: The site Si becomes a bottleneck, since all requests must be processed there.

Vulnerability: If the site Si fails, the concurrency controller is lost. Either processing must stop, or a recovery scheme must be used so that a new site can take over lock management from Si.

10.15.2 Multiple Coordinators

A compromise between the advantages and disadvantages of the single-lock-manager scheme can be achieved through a multiple-coordinator approach, in which the lock-manager function is distributed over several sites. Each lock manager administers the lock and unlock requests for a subset of data items. Each lock manager resides in a different site. This approach reduces the degree to which the coordinator is a bottleneck, but it complicates deadlock handling, since the lock and unlock requests are not made at a single site.

10.15.3 Majority Protocol

In a majority protocol, each site maintains a local lock manager whose function is to administer the lock and unlock requests for those data items that are stored in that site.

When a transaction wishes to lock data item Q, which is not replicated and resides at site Si, a message is sent to the lock manager at site Si requesting a lock (in a particular lock mode). If data item Q is locked in an incompatible mode, then the request is delayed until it can be granted. Once it has determined that the lock request can be granted, the lock manager sends a message back to the initiator indicating that it has granted the lock request. If data item Q is replicated at different sites, then a lock-request message must be sent to more than one-half of the sites at which Q is stored. Each lock manager determines whether the lock can be granted immediately (as far as it is concerned). The response is delayed until the request can be granted. The transaction does not operate on Q until it has successfully obtained a lock on a majority of the replicas of Q.
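The majority rule for a replicated item can be sketched by counting grants from the replica sites; the replica placement and the grant decision below are simulated, not taken from the unit:

    # Hedged sketch of the majority protocol for a replicated item Q.
    def majority_lock(item, replica_sites, grant):
        granted = [site for site in replica_sites if grant(site, item)]
        needed = len(replica_sites) // 2 + 1       # strictly more than half of the replicas
        return len(granted) >= needed

    sites = ["S1", "S2", "S3", "S4", "S5"]
    # Suppose S4 currently holds Q in an incompatible mode and refuses the request.
    print(majority_lock("Q", sites, grant=lambda s, q: s != "S4"))   # True: 4 of 5 granted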

When a data item is not replicated, the scheme is simple to implement: it requires two message transfers for handling a lock request and one message transfer for handling an unlock request. The scheme also deals with replicated data in a decentralized manner, thus avoiding the drawbacks of central control.

When data is replicated, however, it suffers from the following disadvantages:

Implementation: The majority protocol is more complicated to implement than the previous schemes. It requires 2(n/2 + 1) messages for handling a lock request and (n/2 + 1) messages for handling an unlock request, where n is the number of sites at which the data item is replicated.

Deadlock handling: Since lock and unlock requests are not made at a single site, deadlock handling is more complex. In addition, it is possible for a deadlock to occur even if only one data item is being locked. Fortunately, such deadlocks can be avoided with relative ease by requiring all sites to request locks on the replicas of a data item in the same predetermined order.
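A rough Python sketch of the majority rule follows, assuming a hypothetical send_lock_request callback that reports whether a given replica site granted the lock (in practice a site that cannot grant delays its reply rather than refusing):

    # Majority locking over the replicas of a data item (illustrative only).
    def majority_lock(replica_sites, txn_id, item, mode, send_lock_request):
        """send_lock_request(site, txn_id, item, mode) -> True if that site grants."""
        needed = len(replica_sites) // 2 + 1      # more than half of the replicas
        granted = []
        # Requesting replicas in one predetermined order avoids the deadlocks that
        # can otherwise occur even when only a single data item is being locked.
        for site in sorted(replica_sites):
            if send_lock_request(site, txn_id, item, mode):
                granted.append(site)
            if len(granted) >= needed:
                return granted                    # the transaction may now operate on the item
        return None                               # majority not yet obtained

    # toy usage with a stub that always grants
    print(majority_lock(["S1", "S2", "S3"], "T1", "Q", "X",
                        lambda site, t, i, m: True))   # ['S1', 'S2']: two of three sites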

10.15.4 Biased Protocol

The biased protocol is based on a model similar to that of the majority protocol. The difference is that requests for shared locks are given more favorable treatment than requests for exclusive locks. The system maintains a lock manager at each site. Each manager manages the locks for all the data items stored at that site. Shared and exclusive locks are handled differently.

Shared Locks: When a transaction needs to lock data item Q, it simply requests a lock on Q from the lock manager at one site that contains a replica of Q.

Exclusive Locks: When a transaction needs to lock data item Q, it requests a lock on Q from the lock manager at all sites that contain a replica of Q.

The response to the request is delayed until it can be granted.

The biased scheme has the advantage of imposing less overhead on read operations than the majority protocol does. This saving is especially significant in the common case in which the frequency of reads is much greater than the frequency of writes. However, the additional overhead on writes is a disadvantage. Furthermore, the biased protocol shares the majority protocol's disadvantage of complexity in handling deadlock.
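The asymmetry between shared and exclusive locks can be summarised in a few lines of Python; again, send_lock_request is a hypothetical callback, not something defined in the text:

    # Biased protocol: one replica site suffices for a shared lock,
    # every replica site is needed for an exclusive lock (illustrative).
    def biased_lock(replica_sites, txn_id, item, mode, send_lock_request):
        if mode == "S":
            # a read locks the item at any single site holding a replica
            return send_lock_request(replica_sites[0], txn_id, item, "S")
        # a write must lock the item at all sites holding a replica
        return all(send_lock_request(site, txn_id, item, "X")
                   for site in sorted(replica_sites))

    def always_grant(site, txn_id, item, mode):
        return True

    print(biased_lock(["S1", "S2", "S3"], "T1", "Q", "S", always_grant))   # True: one site asked
    print(biased_lock(["S1", "S2", "S3"], "T2", "Q", "X", always_grant))   # True: all three asked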

10.15.5 Primary Copy

In the case of data replication, we can choose one of the replicas as the primary copy. Thus, for each data item Q, the primary copy of Q resides at precisely one site, which we call the primary site of Q. When a transaction needs to lock data item Q, it requests a lock at the primary site of Q. The response to the request is delayed until it can be granted. The primary-copy approach therefore lets concurrency control for replicated data be handled in much the same way as for unreplicated data, which allows for a simple implementation. However, if the primary site of Q fails, Q becomes inaccessible, even though other sites containing a replica may be accessible.
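A minimal sketch of the idea, with an illustrative primary_site mapping (the item names and sites are made up):

    # Primary-copy locking: every data item has exactly one primary site,
    # and all lock requests for that item are sent there (illustrative).
    primary_site = {"Q": "S2", "R": "S1"}          # data item -> its primary site

    def primary_copy_lock(txn_id, item, mode, send_lock_request):
        site = primary_site[item]                  # if this site is down, the item is unreachable
        return send_lock_request(site, txn_id, item, mode)

    print(primary_copy_lock("T1", "Q", "X", lambda s, t, i, m: True))   # True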

Self Assessment Questions

1. What is single Lock-Manager Approach? Explain Majority and Biased Protocols.

2. What are the advantages and limitations of Single-Lock-Manager Approach?

3. What is Primary Copy?

10.16 Concurrency Control Techniques

10.16.1 Timestamp-Based Algorithms

Timestamp-based concurrency control algorithms do not attempt to maintain serializability by mutual exclusion. Instead, they select, a priori, a serialization order and execute transactions accordingly. To establish this ordering, the transaction manager assigns each transaction Ti a unique timestamp, ts(Ti), at its initiation.

A timestamp is a simple identifier that serves to identify each transaction uniquely and to permit ordering. Timestamp generation has two properties:

(a) uniqueness, and (b) monotonicity.

Two timestamps generated by the same transaction manager should be monotonically increasing.

Timestamp Rule

Given two conflicting operations Oij and Okl belonging, respectively, to transactions Ti and Tk, Oij is executed before Okl if and only if ts(Ti) < ts(Tk). In this case Ti is said to be the older transaction and Tk is said to be the younger one.

A) Generation of Timestamps


Each site generates a unique local timestamp using either a logical counter or the local clock. A global timestamp is obtained by concatenating the unique local timestamp with the site identifier, which is also unique. The site identifier is placed in the least significant position to ensure that the global timestamps generated at one site are not always greater than those generated at another site.

There is still a fairness problem: if a site generates local timestamps at a faster rate than the other sites, then all timestamps generated by the fast site will be larger than those generated by the other sites.

To ensure that local timestamps are generated fairly across the system, the logical clock can be implemented as a counter that is incremented after each new local timestamp is generated. To keep the various logical clocks synchronized, we require that every site advance its logical clock whenever a transaction Ti with timestamp <x, y> visits the site and x is greater than the current value of the site's logical clock; in that case the site sets its logical clock to x + 1.
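A small Python sketch of such a generator, assuming the timestamp is encoded as counter × number_of_sites + site_id so that the counter dominates comparisons and the site identifier only breaks ties (the class and method names are illustrative):

    # Global timestamp generation with a logical counter per site (illustrative).
    class TimestampGenerator:
        def __init__(self, site_id, num_sites):
            self.site_id = site_id
            self.num_sites = num_sites
            self.counter = 0                         # the site's logical clock

        def new_timestamp(self):
            self.counter += 1
            # encode <counter, site_id> with the site id in the least significant position
            return self.counter * self.num_sites + self.site_id

        def observe(self, timestamp):
            # a visiting transaction carries <x, y>; if x is ahead of our clock, advance past it
            x = timestamp // self.num_sites
            if x > self.counter:
                self.counter = x + 1

    gen = TimestampGenerator(site_id=1, num_sites=10)
    print(gen.new_timestamp())    # 11: counter 1 at site 1
    gen.observe(73)               # a transaction with timestamp <7, 3> visits this site
    print(gen.new_timestamp())    # 91: clock advanced to 8, then incremented to 9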

B) Algorithms

Basic Timestamp Ordering Algorithms

A transaction containing an operation that is rejected by a scheduler is restarted by the transaction manager with a new timestamp. This ensures that the transaction has a chance to execute on its next try. Since transactions never wait while holding access rights to data items, the basic TO algorithm never causes deadlocks. The price of deadlock freedom, however, is that a transaction may be restarted numerous times.

When an accepted operation is passed on to the data processor, the scheduler must refrain from sending another incompatible, but acceptable, operation to the data processor until the first one has been processed and acknowledged. The main disadvantage of the basic timestamp method is the large number of restarts it causes.

Pre-writes are issued by transactions instead of write operations; they are buffered and not applied directly to the database. Only when the transaction commits are the corresponding write operations applied to the database. Thus, if the pre-writes of a transaction have been accepted (buffered), the corresponding writes will not be rejected at transaction commit. Buffering an operation means that the operation is neither executed nor rejected; instead, it is recorded together with its timestamp for subsequent execution, and it is ensured that this execution will be possible at a later time.

The basic timestamp mechanism applies the following rules:

1) Each transaction receives a timestamp when it is initiated at its site of origin.

2) Each read or write operation required by a transaction carries the timestamp of that transaction.


3) For each data item x, the largest timestamp of a read operation and the largest timestamp of a write operation are recorded; they are denoted RTM(x) and WTM(x) respectively.

4) Let TS be the timestamp of a ‘pre-write’ operation Pi on data item x.

If TS < RTM(x) or TS < WTM(x), then the operation is rejected and the issuing transaction is restarted with a new timestamp; otherwise, the pre-write Pi and its timestamp TS are buffered.

5) Let TS be the timestamp of a read operation Ri on data item x. If TS < WTM(x), the operation is rejected. However, if TS > WTM(x), then Ri is executed only if there is no pre-write operation P(x) pending on data item x having a timestamp TS(P) < TS. If there is one or more pre-write operation P(x) with TS(P) < TS, Ri is buffered until the transaction which has issued P(x) commits.

The reason why Ri is buffered is that the write operation W(x) corresponding to the pre-write P(x) cannot be rejected; therefore we must avoid TS(W) < RTM(x). But TS(W) = TS(P), because they are issued by the same transaction; hence we must avoid applying Ri, since doing so would set RTM(x) to the value of TS and make W(x) impossible. The read operation Ri is executed and removed from the buffer when no pre-writes with a timestamp smaller than that of Ri are pending on x.

6) Let TS be the timestamp of a write operation Wi on data item x. This operation is never rejected; however, it is possibly buffered if there is a pre-write operation P(x) with a timestamp TS(P) < TS, for the same reason which has been stated for buffering read operations. Wi will be executed and eliminated from the buffer when all pre-writes with smaller timestamps have been eliminated from the buffer.

Note that the use of pre-writes is equivalent to applying exclusive locks on data items for the time interval between the pre-write and the commit (write) or abort of the issuing transaction.
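A simplified Python sketch of the acceptance tests in rules 4 and 5 follows; RTM, WTM and the pre-write buffer are illustrative structures, and the commit step that turns buffered pre-writes into writes (rule 6) is left out:

    # Basic timestamp-ordering tests with pre-writes (simplified, illustrative).
    RTM = {}                   # data item -> largest timestamp of an executed read
    WTM = {}                   # data item -> largest timestamp of an executed write
    pending_prewrites = {}     # data item -> timestamps of buffered pre-writes

    def prewrite(ts, x):
        """Rule 4: reject (restart the transaction) if ts is older than RTM(x) or WTM(x)."""
        if ts < RTM.get(x, 0) or ts < WTM.get(x, 0):
            return "reject"
        pending_prewrites.setdefault(x, []).append(ts)
        return "buffered"

    def read(ts, x):
        """Rule 5: reject if older than WTM(x); wait behind any older pending pre-write."""
        if ts < WTM.get(x, 0):
            return "reject"
        if any(p < ts for p in pending_prewrites.get(x, [])):
            return "buffered"                       # wait until those pre-writes commit
        RTM[x] = max(RTM.get(x, 0), ts)
        return "execute"

    print(prewrite(10, "x"))   # buffered
    print(read(12, "x"))       # buffered: must wait for the pending write with timestamp 10
    print(read(8, "x"))        # execute: no older pre-write is pending on x
    print(prewrite(7, "x"))    # reject: timestamp 7 is older than RTM(x) = 8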

10.16.2 Conservative Timestamp Ordering Algorithms

The basic TO algorithm tries to execute an operation as soon as it is accepted; it is therefore "aggressive" and "progressive". Conservative algorithms, on the other hand, delay each operation until there is an assurance that no operation with a smaller timestamp can arrive at that scheduler. If this condition can be guaranteed, the scheduler will never reject an operation. However, this delay introduces the possibility of deadlocks. Operations of each transaction are buffered until an ordering can be established so that rejections are not possible, and they are executed in that order.

Assume that each scheduler maintains one queue for each transaction manager in the system. The scheduler at site i stores all the operations that it receives from the transaction manager at site j in queue Qij. Scheduler i has one such queue for each j.

Page 32: MC0077 - Unit 10

When an operation is received from a transaction manager, it is placed in the appropriate queue in increasing timestamp order. The scheduler at each site executes the operations from these queues in increasing timestamp order. This scheme reduces the number of restarts, but it does not guarantee that restarts are eliminated completely.

An extremely conservative timestamp-ordering scheduler actually executes transactions serially at each site. The conservative timestamp method is based on the following requirements and rules:

1) Each transaction is executed at one site only and does not activate remote programs. It can only issue read or write requests to remote sites.

2) A site i must receive all the read requests from a different site j in timestamp order. Similarly, a site i must receive all the write requests from a different site j in timestamp order.

3) Assume that site i has at least one buffered read and one buffered write operation from every other site of the network. Because of requirement 2, site i knows that no older requests can arrive from any site. The concurrency controller at site i therefore behaves in the following way (a simplified sketch follows the list):

a. For a read operation R that arrives at site i:

If there is some write operation W buffered at site i such that TS(R) > TS(W), then R is buffered until these writes are executed, else R is executed.

b. For a write operation W that arrives at site i:

If there is some read operation R buffered at site i such that TS(W) > TS(R) or there is some write operation W’ buffered at site i such that TS(W) > TS(W’), then W is buffered until these operations are executed, else W is executed.
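Below is a simplified Python sketch of the two tests in rule 3; the per-transaction-manager queues of requirement 3 and the commit bookkeeping are ignored, and the buffer contents are just lists of timestamps:

    # Conservative timestamp-ordering tests at one site (simplified, illustrative).
    buffered_reads = []     # timestamps of read operations waiting at this site
    buffered_writes = []    # timestamps of write operations waiting at this site

    def arrive_read(ts):
        # rule (a): a read waits behind any buffered write with a smaller timestamp
        if any(ts > w for w in buffered_writes):
            buffered_reads.append(ts)
            return "buffered"
        return "execute"

    def arrive_write(ts):
        # rule (b): a write waits behind any buffered read or write with a smaller timestamp
        if any(ts > r for r in buffered_reads) or any(ts > w for w in buffered_writes):
            buffered_writes.append(ts)
            return "buffered"
        return "execute"

    buffered_writes.append(4)   # say a write with timestamp 4 is still held in the buffer
    print(arrive_read(6))       # buffered: it must wait for the older write (timestamp 4)
    print(arrive_read(3))       # execute: no buffered write precedes it
    print(arrive_write(2))      # execute: nothing older is buffered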

10.16.3 Optimistic Algorithms

The optimistic approach is based on the assumption that conflicts are rare. Instead of suspending or rejecting conflicting operations, as in two-phase locking and timestamping, a transaction is always executed to completion: its updates are applied to local copies of the data, and a validation check is performed on its write operations before they are applied to the database. If the validation test is not passed, the transaction is restarted. The validation test verifies whether the execution of the transaction is serializable.

Each transaction is considered to consist of three phases:

1. The Read Phase: During this phase a transaction reads data items from the database, performs computations, and determines new values for the data items of its write-set; however, these values are not written to the database. Note that the read phase contains almost the whole execution of the transaction.

2. The Validation Phase: During this phase a test is performed to determine whether applying the updates computed by the transaction to the database would cause a loss of consistency.

3. The Write Phase: During this phase the updates are applied to the database if the validation phase has returned a positive result; otherwise, the transaction is restarted.
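The three phases can be sketched in Python as shown below; the database, the transfer transaction and, in particular, the validation test (a crude re-check that the values read are still current) are all illustrative stand-ins rather than a real validation algorithm:

    # Read / validate / write phases of optimistic execution (illustrative).
    database = {"x": 10, "y": 20}

    def run_optimistically(transaction):
        # 1. Read phase: the transaction works on a local copy only
        local_copy = dict(database)
        read_set, write_set = transaction(local_copy)

        # 2. Validation phase: a crude stand-in test - the values read must
        #    still be current, otherwise a conflicting update has intervened
        if any(database[item] != value for item, value in read_set.items()):
            return "restart"

        # 3. Write phase: apply the buffered updates
        database.update(write_set)
        return "committed"

    def transfer(copy):
        read_set = {"x": copy["x"]}                             # items read, with the values seen
        write_set = {"x": copy["x"] - 5, "y": copy["y"] + 5}    # new values, not yet written
        return read_set, write_set

    print(run_optimistically(transfer))   # committed
    print(database)                       # {'x': 5, 'y': 25}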

Our discussion emphasizes concepts rather than implementation details, as most of the current work on optimistic methods concentrates on centralized rather than distributed DBMSs, and no commercial or prototype DBMS has implemented these algorithms.

Scheme 1: Validation using Timestamps on Data Items and Transactions

This algorithm assumes a fully redundant database, that is, a copy of each data item is stored at every site. Since the database is fully redundant, each transaction is executed completely at its site of origin. Each transaction receives a unique timestamp when it starts execution. During the read phase of the transaction, the updates are written into an ‘update list’. This ‘update list’ contains:

a) The data items of the read-set with their timestamps

b) The new values of the data items of its write-set.

c) The timestamp of the transaction itself.

The validation phase consists of checking that updates can be applied at all sites.

Scheme 2: Validation using Timestamps on Transactions only

A major problem with these algorithms is the high storage cost: to validate a transaction, the optimistic mechanism has to store the read and write sets of several other terminated transactions.

Another problem is starvation. If the validation phase of a long transaction fails, in subsequent trials it is still possible that the validation will fail repeatedly.


Self Assessment Questions

1. List the various concurrency control algorithms.

2. What are the benefits of Optimistic Algorithms?

3. Explain Conservative Timestamp Ordering Algorithm.

10.17 Deadlock Handling

The two major approaches to deadlock handling are deadlock prevention and deadlock detection.

10.17.1 Deadlock Prevention

Deadlock prevention may result in unnecessary waiting and rollback. Furthermore, certain deadlock-prevention techniques may require more sites to be involved in the execution of a transaction than would otherwise be the case.

10.17.2 Deadlock Detection

The main problem in distributed systems is how to maintain the wait-for-graph.

Common techniques for dealing with this issue require that each site keep a local wait-for-graph. The nodes of the graph correspond to all the transactions (local as well as non-local) that are currently either holding or requesting any of the items local to that site. For example, the figure below depicts a system consisting of two sites, each maintaining its local wait-for-graph. Note that transactions T2 and T3 appear in both graphs, indicating that these transactions have requested items at both sites.

Figure 10.1: Local wait-for-graphs

These local wait-for-graphs are constructed in the usual manner for transactions and data items. When a transaction Ti at site S1 needs a resource at site S2, a request message is sent by Ti to site S2. If the resource is held by transaction Tj, an edge Ti –> Tj is inserted into the local wait-for-graph of site S2.

Clearly, if any local wait-for-graph has a cycle, deadlock has occurred.

On the other hand, the absence of cycles in all of the local wait-for-graphs does not mean that there are no deadlocks: the union of the wait-for-graphs above does contain a cycle. The global wait-for-graph for this example is shown below:

Figure 10.2: Global wait-for-graph

Some of the approaches taken to solve this problem are:

1. Centralized Approach

2. Hierarchical Approach

3. Fully Distributed Approach

10.17.3 Centralized Deadlock Detection


A selected site (also called the control site) runs a centralized deadlock detector that constructs a global wait-for-graph and searches it for cycles. The global wait-for-graph can be constructed:

1) Whenever a new edge is inserted in or removed from one of the local wait-for-graphs.

2) Periodically, when a number of changes have occurred in a local wait-for-graph.

3) Whenever the coordinator needs to invoke the cycle-detection algorithm.

All other sites periodically send their local wait-for-graphs to the central site. Each site detects its local deadlocks and resolves them independently.
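A Python sketch of what the control site does with the local graphs it receives is given below; the example graphs loosely mirror Figure 10.1 and are purely illustrative:

    # Centralized deadlock detection: merge local wait-for-graphs and look for a cycle.
    def merge_graphs(local_graphs):
        """Union of local wait-for-graphs, each of the form {txn: set of txns it waits for}."""
        global_wfg = {}
        for graph in local_graphs:
            for txn, waits_for in graph.items():
                global_wfg.setdefault(txn, set()).update(waits_for)
        return global_wfg

    def has_cycle(wfg):
        """Depth-first search for a cycle in a wait-for-graph."""
        visiting, done = set(), set()

        def visit(node):
            if node in visiting:
                return True                      # reached a node already on the current path
            if node in done:
                return False
            visiting.add(node)
            if any(visit(nxt) for nxt in wfg.get(node, ())):
                return True
            visiting.discard(node)
            done.add(node)
            return False

        return any(visit(node) for node in list(wfg))

    # neither local graph has a cycle on its own, but their union does
    site1 = {"T1": {"T2"}, "T2": {"T3"}}
    site2 = {"T3": {"T2"}, "T2": {"T4"}}
    print(has_cycle(site1), has_cycle(site2))          # False False
    print(has_cycle(merge_graphs([site1, site2])))     # True: T2 -> T3 -> T2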

Centralized deadlock detection algorithms are conceptually simple and easy to implement. Deadlock resolution is also simple in these algorithms, because the control site has complete information about the deadlock cycle and can resolve the deadlock optimally. Centralized deadlock detection has the following drawbacks:

i) It is vulnerable to failure of the site where the centralized detector runs.

ii) It may incur large communication costs, because the centralized detector may be located far from some of the other sites in the network. A deadlock that is not local does not necessarily involve all the sites of the network; it is quite possible, and in many applications likely, that a deadlock involves only a few sites that are close to one another. In that case they could discover the deadlock without communicating with a distant central site.

iii) It may create a bottleneck at the central site, since all other sites periodically send their communication to it.

iv) It may detect false deadlocks.

v) The central site may end up devoted largely to deadlock detection, since it has to periodically collect the local wait-for-graphs from the other sites and construct the global wait-for-graph.

vi) Heavy memory and storage requirements are imposed on the central machine, which must store the wait-for-graphs of all the other sites.

10.17.4 Hierarchical Deadlock Detection

Hierarchical deadlock detection method is an attempt to reduce communication costs found in the centralized deadlock detection schemes.


Sites are logically arranged in a hierarchical fashion, and each site is responsible for detecting deadlocks involving only its child sites. These algorithms take advantage of access patterns that are localized to a cluster of sites in order to optimize performance.

The performance of a hierarchical deadlock detection mechanism depends on the choice of the hierarchy. This choice reflects the network topology and the pattern of access requests to the different sites of the network.

10.18 Distributed Deadlock Detection

All controllers share equally the responsibility for detecting deadlocks. Every site constructs a wait-for-graph that represents a part of the total graph, depending on the dynamic behavior of the system. Many sites may participate in the detection of a global cycle. Deadlock detection is initiated only when a waiting process is suspected to be part of a deadlock cycle.

Various algorithms have been proposed for distributed deadlock detection. These can be classified under the following heads:

a) Path-Pushing Algorithm

b) Edge-Chasing Algorithm

c) Diffusion Computation

d) Global State Detection

Distributed deadlock detection algorithms are difficult to design because of the lack of globally shared memory, and proving their correctness is also difficult. Moreover, several sites may initiate deadlock detection for the same deadlock.

10.18.1 False Deadlocks

The delay associated with the transmission of the messages that carry deadlock-detection information can cause the detection of "false deadlocks". Suppose, for example, that the deadlock detector receives the information that a transaction Ti is waiting for a transaction Tj at a given time. Assume that after some time Tj releases the resource requested by Ti and then requests a resource held by Ti. If the deadlock detector receives the information that Tj requests a resource held by Ti before receiving the information that Ti is no longer blocked by Tj, a false cycle of length 2 is detected.

Another case of false deadlock occurs when a transaction Tj, which blocks a transaction Ti, aborts autonomously for some reason independent of deadlock detection, and at almost the same time Ti requests a resource that was held by Tj. In this case too, it is possible that a false deadlock will be detected.

Self Assessment Questions

1. What do you mean by a Deadlock? What are the conditions for deadlock to occur?

2. Explain the Deadlock prevention and Detection.

3. What do you mean by False Deadlocks?

10.19 Summary

This unit began with a side-by-side comparison of distributed and centralized databases, covering the differences between them and the benefits of the former over the latter. It then provided detailed coverage of transaction-related issues, including concurrent transactions, the problems they raise, and the techniques and algorithms available for dealing with them. Finally, the unit concluded with a discussion of deadlocks and related topics.

10.20 Terminal Questions

1. What are differences in Centralized and Distributed Database Systems? List the relative advantages of data distribution.

2. What are Homogeneous distributed database management systems? How are they different from Heterogeneous database systems?

3. What are differences in Global and Local Transactions in distributed database system? What are the roles of Transaction Manager and Transaction Coordinator in managing transactions in distributed database?

4. What are the basic failure types in distributed environment, which might cause a distributed transaction failure?

5. Describe the Two-Phase Commit Protocol. How does the protocol ensure consistency in distributed databases?

6. What are Commit Protocols? Explain how the Two-Phase Commit Protocol responds to the following types of failures:

i) Failure of Participating Site,

ii) Failure of Coordinator.

7. Describe distributed approach to deadlock detection in databases.


8. What do you mean by Distributed Deadlock Detection? What are the different strategies for distributed deadlock detection? What are phantom deadlocks? Explain under what circumstances a phantom deadlock may be detected.

