
SSEC\MCA\DBMS Question Bank\2010-2013 Batch

Two Marks

Unit III 

1. What do you mean by an index?

An index can be viewed as a collection of data entries, with an efficient way to locate all data entries with search key value k. Each such data entry, denoted k*, contains enough information to enable us to retrieve (one or more) data records with search key value k.

2. What are the different ways in which an entry can be made in an index?

A data entry k* allows us to retrieve one or more data records with key value k . The three main

alternatives:

• A data entry k* is an actual data record (with search key value k ).

• A data entry is a <k, rid> pair, where rid is the record id of a data record with

search key value k .

• A data entry is a <k, rid-list> pair, where rid-list is a list of record ids of data records with search key value k.

3. Differentiate between clustered and unclustered index.

When a file is organized so that the ordering of data records is the same as or close to the

ordering of data entries in some index, the index is clustered. An index that is not clustered is

called an unclustered index.

4. Differentiate between dense and sparse index.

An index is said to be dense if it contains (at least) one data entry for every search key value that appears in a record in the indexed file. A sparse index contains one entry for each page of records in the data file.

5. What do you mean by fully inverted and inverted file?

A data file is said to be inverted on a field if there is a dense secondary index on this field. A

fully inverted file is one in which there is a dense secondary index on each field that does not

appear in the primary key.

6. Define primary and secondary indices.

An index on a set of fields that includes the primary key is called a primary index. An index

that is not a primary index is called a secondary index. A primary index is guaranteed not to

contain duplicates, but an index on other (collections of) fields can contain duplicates. Thus, in

general, a secondary index contains duplicates.

7. What do you mean by a composite search key or concatenated keys?

The search key for an index can contain several fields; such keys are called composite search

keys or concatenated keys. As an example, consider a collection of employee records, with

fields name, age, and sal, stored in sorted order by name. Examples would be a composite index with key <age, sal>, a composite index with key <sal, age>, an index with key age, and an index with key sal.


8. Differentiate between a range query and an equality query.

If the search key is composite, an equality query is one in which each field in the search key

is bound to a constant. For example, retrieving all data entries with age = 20 and sal = 10. The

hashed file organization supports only equality queries, since a hash function identifies the

bucket containing desired records only if a value is specified for each field in the search key.

A range query is one in which not all fields in the search key are bound to constants.

An example of a range query is retrieving all data entries with age < 30 and sal > 40.

9. What is the advantage of tree structured indexes?

Tree-structured indexes are ideal for range selections, and also support equality selections quite

efficiently.

10.Define ISAM. What is the disadvantage of ISAM?

ISAM  is a static tree-structured index in which only leaf pages are modified by inserts and

deletes. If a leaf page is full, an overflow page is added. Unless the size of the dataset and the

data distribution remain approximately the same, overflow chains could become long and

degrade performance.

11.Define a B+ Tree and its order 

A B+ tree is a dynamic, height-balanced index structure that adapts gracefully to changing data characteristics. Each node except the root has between d and 2d entries. The number d is called the order of the tree. Each non-leaf node with m index entries has m+1 children pointers. The leaf nodes contain data entries. Leaf pages are chained in a doubly linked list.

12. How does the B+ tree handle insertion and deletion of data?

During insertion, nodes that are full are split to avoid overflow pages. Thus, an insertion might

increase the height of the tree.

During deletion, a node might go below the minimum occupancy threshold. In this case, the

entries can be either redistributed from adjacent siblings, or the node can be merged with a

sibling node. A deletion might decrease the height of the tree.

13. What is the purpose of key compression in B+ Tree?

Key compression is a technique used in B+ trees in which search key values in index nodes are shortened to ensure a high fan-out.

14. List out some of the characteristics of a B+ Tree.

• Operations (insert, delete) on the tree keep it balanced.

• A minimum occupancy of 50 percent is guaranteed for each node.

• Searching for a record requires just a traversal from the root to the appropriate leaf.


15. What do you mean by the height of the B+ Tree?

The length of a path from the root to any leaf (the same for every leaf, because the B+ tree is balanced) is referred to as the height of the tree.

16. Define a B+ Tree.

The B+ tree search structure is a balanced tree in which the internal nodes direct the search and

the leaf nodes contain the data entries. In order to retrieve all leaf pages efficiently, they are

linked using page pointers. The leaf pages can be traversed in either direction as they are

organized as a doubly linked list.

17. Give the format of an index page.

An index page has the format <P0, K1, P1, K2, P2, ..., Km, Pm>, where each Pi (0 < i < m) points to the data entries having key values in the range Ki to Ki+1, P0 points to the data entries having key values less than K1, and Pm points to the data entries having key values greater than or equal to Km.

18.Give the format of a one level index structure.

19. Give the format of an ISAM Index structure.
An ISAM index consists of a static tree of non-leaf pages that direct the search, a set of primary leaf pages containing the data entries, and overflow pages chained to primary leaf pages that have become full.


20. Explain the page allocation in ISAM
In ISAM, all primary leaf pages are allocated sequentially when the file is created, and the non-leaf pages are built over them; this allocation is static and never changes. Only overflow pages are allocated dynamically, being linked to a primary page when insertions overflow it.


21. What is the drawback of ISAM

ISAM is a static structure and suffers from the problem that long overflow chains can develop

as the file grows, leading to poor performance.

22. Explain hash based indexes.

Hash-based indexes are designed for equality queries. A hashing function is applied to a search

field value and returns a bucket number . The bucket number corresponds to a page on disk that

contains all possibly relevant records.

23.Explain static hashing technique.

A Static Hashing  index has a fixed number of  primary buckets. During insertion, if the

primary bucket for a data entry is full, an overflow page is allocated and linked to the primary

bucket. The list of overflow pages at a bucket is called its overflow chain. Static Hashing can

answer equality queries with a single disk I/O, in the absence of overflow chains. As the file

grows, however, Static Hashing suffers from long overflow chains and performance

deteriorates.
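As a rough illustration, here is a minimal Python sketch of a static hash index with primary buckets and overflow chains; the class name, bucket count, and page capacity are illustrative assumptions, and in-memory lists stand in for pages on disk:

    # Static hashing sketch: a fixed number of primary buckets, each a chain of
    # pages; a full primary page grows an overflow chain (modeled as extra lists).
    class StaticHashIndex:
        def __init__(self, num_buckets=4, page_capacity=2):
            self.num_buckets = num_buckets
            self.page_capacity = page_capacity
            # Each bucket is a chain of pages; page 0 is the primary page.
            self.buckets = [[[]] for _ in range(num_buckets)]

        def _bucket(self, key):
            return hash(key) % self.num_buckets

        def insert(self, key, rid):
            chain = self.buckets[self._bucket(key)]
            for page in chain:
                if len(page) < self.page_capacity:
                    page.append((key, rid))
                    return
            chain.append([(key, rid)])  # allocate an overflow page

        def search(self, key):
            # Equality search: only one chain is examined; with no overflow
            # pages this is a single page (one disk I/O in the cost model).
            chain = self.buckets[self._bucket(key)]
            return [rid for page in chain for (k, rid) in page if k == key]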

24. Define dynamic hashing. /Define Extendible hashing

Extendible Hashing is a dynamic index structure that extends Static Hashing by introducing a level of indirection in the form of a directory. Usually the size of the directory is 2^d for some d, which is called the global depth of the index. The correct directory entry is found by looking at the first d bits of the result of the hashing function. The directory entry points to the page on disk with the actual data entries. If a page is full and a new data entry falls into that page, data entries from the full page are redistributed according to the first l bits of the hashed values. The value l is called the local depth of the page.
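A minimal sketch of the directory lookup, assuming a fixed-width hash value whose most significant d bits index the directory; the function names and directory contents are illustrative, and bucket splitting with directory doubling is omitted:

    # Extendible hashing lookup sketch: the directory has 2**d entries
    # (d = global depth); the first d bits of the hashed key select the entry.
    def directory_index(key, global_depth, hash_bits=32):
        h = hash(key) & ((1 << hash_bits) - 1)   # fixed-width hash value
        return h >> (hash_bits - global_depth)   # first (most significant) d bits

    # With global depth 2 the directory has 2**2 = 4 entries, indexed 0..3.
    directory = [["bucket A"], ["bucket B"], ["bucket C"], ["bucket D"]]
    entry = directory[directory_index("sailor-22", global_depth=2)]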

25. What do you mean by skewed data and collision in hashing? (Or) What are the drawbacks in

hashing technique?

If the data is not distributed normally over the available domain for the data, then the data is

said to be skewed. Collisions are data entries with the same hash value.

26.Explain the linear hashing.

Linear Hashing  avoids a directory by splitting the buckets in a round-robin fashion. Linear 

Hashing proceeds in rounds. At the beginning of each round there is an initial set of buckets.

Insertions can trigger bucket splits, but buckets are split sequentially in order. Overflow pages

are required, but overflow chains are unlikely to be long because each bucket will be split at

some point.

27. Compare Extendible hashing and linear hashing.

Extendible and Linear Hashing are closely related. Linear Hashing avoids a directory structure by having a predefined order of buckets to split. The disadvantage of Linear Hashing relative

to Extendible Hashing is that space utilization could be lower, especially for skewed


distributions, because the bucket splits are not concentrated where the data density is highest,

as they are in Extendible Hashing. A directory-based implementation of Linear Hashing can

improve space occupancy, but it is still likely to be inferior to Extendible Hashing in extreme

cases.


Unit IV 

28. What do you mean by an external sorting algorithm.

An external sorting algorithm sorts a file of arbitrary length using only a limited amount of 

main memory.

29. Briefly explain the two-way merge sort algorithm.

The two-way merge sort algorithm is an external sorting algorithm that uses only three buffer 

pages at any time. Initially, the file is broken into small sorted files called runs of the size of 

one page. The algorithm then proceeds in passes. In each pass, runs are paired and merged into

sorted runs twice the size of the input runs. In the last pass, the merge of two runs results in a sorted instance of the file. The number of passes is ⌈log2 N⌉ + 1, where N is the number of pages in the file.

30. Briefly explain the external merge sort algorithm.

The external merge sort algorithm improves upon the two-way merge sort if there are B > 3

buffer pages available for sorting. The algorithm writes initial runs of B pages each instead of 

only one page. In addition, the algorithm merges B−1 runs instead of two runs during the

merge step. The number of passes is reduced to 1 + ⌈log_{B-1} N1⌉, where N1 = ⌈N/B⌉ is the number of initial runs. The average length of the initial runs can be increased to 2*B pages (using replacement sort), reducing N1 to N1 = ⌈N/(2*B)⌉.
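A compact Python sketch of the two phases, with in-memory lists standing in for pages on disk and B for the number of buffer pages; the page granularity and names are illustrative assumptions:

    import heapq

    def external_merge_sort(pages, B):
        # Phase 1: read B pages at a time and write sorted runs of B pages each.
        runs = []
        for i in range(0, len(pages), B):
            chunk = [rec for page in pages[i:i + B] for rec in page]
            runs.append(sorted(chunk))
        # Phase 2: repeatedly merge B - 1 runs at a time until one run remains.
        while len(runs) > 1:
            runs = [list(heapq.merge(*runs[j:j + (B - 1)]))
                    for j in range(0, len(runs), B - 1)]
        return runs[0] if runs else []

    # Eight one-record pages sorted with B = 3 buffer pages.
    data = [[5], [2], [9], [1], [7], [3], [8], [4]]
    assert external_merge_sort(data, B=3) == [1, 2, 3, 4, 5, 7, 8, 9]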

31. What is the advantage of blocked I/O?

In blocked I/O several consecutive pages (called a buffer block) are read/written through a single request. Blocked I/O is usually much cheaper than reading or writing the same number of pages through independent I/O requests.

32. What do you mean by double buffering?

In double buffering , each buffer is duplicated. While the CPU processes tuples in one buffer,

an I/O request for the other buffer is issued.

33. Differentiate external sorting and clustered B+ tree index.

If the file to be sorted has a clustered B+ tree index with a search key equal to the fields to be

sorted by, then we can simply scan the sequence set and retrieve the records in sorted order.

This technique is clearly superior to using an external sorting algorithm. If the index is unclustered, an external sorting algorithm will almost certainly be cheaper than using the index.

34. What is the need for sorting records? (or) explain the advantage of sorting the records.

• Users may want answers in some order; for example, by increasing age.

• Sorting records is the first step in bulk loading a tree index.

• Sorting is useful for eliminating duplicate copies in a collection of records.

• A widely used algorithm for performing a very important relational algebra operation, called join, requires a sorting step.

35. What do you mean by a run?

Each sorted subfile is called a run in an external merge sort.


36. Define access path.

The alternative ways to retrieve tuples from a relation are called access paths. An access path

is either (1) a file scan or (2) an index plus a matching selection condition.

37.When does a general selection condition match an index? What is a primary term in a selection

condition with respect to a given index?

An index is said to match a selection condition if the index can be used to retrieve just the

tuples that satisfy the condition. The general format is attr op value, where op is one of the

comparison operators <, ≤, =, ≠, ≥, or >. An index matches such a selection if the index search key is attr and either (1) the index is a tree index or (2) the index is a hash index and op is equality. The terms in a selection condition that match a given index are called the primary terms with respect to that index.

38. Define selectivity of an access path.

The selectivity of an access path is the number of pages retrieved (index pages plus data pages) if this access path is used to retrieve all desired tuples. If a relation contains an index

that matches a given selection, there are at least two access paths, namely, the index and a scan

of the data file.

39. Define most selective access path of a query.

The most selective access path is the one that retrieves the fewest pages; using the most

selective access path minimizes the cost of data retrieval.

40. Write the index nested loops join algorithm.

In an index nested loops join, one relation (say S) has an index on the join column and is made the inner relation; for each tuple of the outer relation R, the index is probed to retrieve only the matching tuples of S, instead of scanning all of S.

41. Write the block nested loops join algorithm.
In a block nested loops join, the outer relation R is read in blocks of as many pages as the buffer pool allows (all but two pages, one reserved for scanning the inner relation S and one for the output); for each such block, S is scanned once, and every tuple in the current block of R is paired with every tuple of S that satisfies the join condition, as sketched below.
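A Python sketch of this algorithm under the usual cost model, with lists of tuples standing in for pages; the buffer allocation (B - 2 pages for the outer block) and the equijoin in the usage lines are illustrative assumptions:

    def block_nested_loops_join(R_pages, S_pages, join, B):
        # Hold B - 2 pages of the outer relation R in memory at once; reserve
        # one buffer for the current page of S and one for the output.
        result = []
        block = B - 2
        for i in range(0, len(R_pages), block):
            r_block = [r for page in R_pages[i:i + block] for r in page]
            for s_page in S_pages:           # one full scan of S per R block
                for s in s_page:
                    for r in r_block:
                        if join(r, s):
                            result.append(r + s)
        return result

    R = [[(1, 'a'), (2, 'b')], [(3, 'c')]]   # pages of R(id, x)
    S = [[(1, 'p')], [(2, 'q'), (4, 'r')]]   # pages of S(id, y)
    print(block_nested_loops_join(R, S, lambda r, s: r[0] == s[0], B=4))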

42. Define conjunctive and disjunctive selections in a query.

General selection conditions can be expressed in conjunctive normal form, where each

conjunct consists of one or more terms. Conjuncts that contain ∨ (disjunction) are called disjunctive.

43. What do you mean by an index only scan?

Consider duplicate elimination in a projection: a hash-based implementation first partitions the file according to a hash function on the output attributes. Two tuples that belong to different partitions are guaranteed not to be duplicates because they have different hash values. In a subsequent step each partition is read into main memory and within-partition duplicates are eliminated. If an index contains all the output attributes, tuples can be retrieved solely from the index instead of from the data file. This technique is called an index-only scan.
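A small Python sketch of the hash-based duplicate elimination described above; the partition count and tuple layout are illustrative assumptions:

    def project_distinct(tuples, attrs, num_partitions=4):
        # Partitioning phase: hash each projected tuple to a partition;
        # duplicates always land in the same partition.
        partitions = [[] for _ in range(num_partitions)]
        for t in tuples:
            proj = tuple(t[a] for a in attrs)
            partitions[hash(proj) % num_partitions].append(proj)
        # Duplicate-elimination phase: each partition fits in memory and is
        # deduplicated independently of the others.
        result = []
        for p in partitions:
            result.extend(set(p))
        return result

    rows = [(1, 'a', 10), (2, 'a', 10), (3, 'b', 20)]
    print(project_distinct(rows, attrs=(1, 2)))  # ('a', 10), ('b', 20) in some order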


44. Compare and contrast nested loops join, block nested loops join operations

In a nested loops join, the join condition is evaluated between each pair of tuples from R and S .

A block nested loops join performs the pairing in a way that minimizes the number of disk accesses.

45. Write in short about the difference between sort-merge join and hash join.

A sort-merge join sorts R and S  on the join attributes using an external merge sort and

performs the pairing during the final merge step. A hash join first partitions R and S using a

hash function on the join attributes. Only partitions with the same hash values need to be

joined in a subsequent step. A hybrid hash join extends the basic hash join algorithm by

making more efficient use of main memory.

46.How does hybrid hash join improve upon the basic hash join algorithm?

In hybrid hash join, we avoid writing the first partitions of R and S to disk during the partitioning phase and reading them in again during the probing phase.

47. Define histogram and its variants

A histogram is a data structure that approximates a data distribution by dividing the value

range into buckets and maintaining summarized information about each bucket. In an equiwidth histogram, the value range is divided into subranges of equal size. In an equidepth histogram, the range is divided into subranges such that each subrange contains the same number of tuples.
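A small Python sketch contrasting the two variants; the bucket count and sample data are illustrative:

    def equiwidth(values, num_buckets):
        # Subranges of equal size; per-bucket tuple counts may differ.
        lo, hi = min(values), max(values)
        width = (hi - lo) / num_buckets or 1
        counts = [0] * num_buckets
        for v in values:
            counts[min(int((v - lo) / width), num_buckets - 1)] += 1
        return counts

    def equidepth(values, num_buckets):
        # Subranges chosen so each bucket holds (about) the same number of
        # tuples; the returned boundaries are the varying bucket edges.
        s = sorted(values)
        step = len(s) // num_buckets
        return [s[i * step] for i in range(num_buckets)] + [s[-1]]

    ages = [20, 21, 21, 22, 30, 31, 45, 60]
    print(equiwidth(ages, 4))   # [4, 2, 1, 1] tuples per equal-width bucket
    print(equidepth(ages, 4))   # boundaries [20, 21, 30, 45, 60]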

48. When do we say two algebraic expressions to be equivalent?

Two relational algebra expressions are equivalent if they produce the same output for all possible input instances. Several relational algebra equivalences allow a relational algebra expression to be modified to obtain an expression with a cheaper plan.

49. Enumerate the steps in optimizing a relational algebra expression.

Optimizing a relational algebra expression involves two basic steps:

• Enumerating alternative plans for evaluating the expression. Typically, an optimizer 

considers a subset of all possible plans because the number of possible plans is very

large.

• Estimating the cost of each enumerated plan, and choosing the plan with the least

estimated cost.

50. How is the cost estimated for an evaluation plan?

There are two parts to estimating the cost of an evaluation plan for a query block:

1. For each node in the tree, we must estimate the cost  of performing the corresponding

operation. Costs are affected significantly by whether pipelining is used or temporary relations

are created to pass the output of an operator to its parent.

2. For each node in the tree, we must estimate the size of the result, and whether it is sorted.

51. Define Reduction Factor.

A reduction factor is associated with each term in the WHERE clause. It is the ratio of the (expected) result size to the input size considering only the selection represented by the term. The actual size of the result can be estimated as the maximum size times the product of the reduction factors for the terms in the WHERE clause. Of course, this estimate reflects the


—unrealistic, but simplifying—assumption that the conditions tested by each term are

statistically independent.
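A worked illustration of this estimate in Python; all cardinalities and reduction factors below are invented for the example:

    # Estimated result size = product of FROM-clause relation sizes
    #                         * product of WHERE-clause reduction factors.
    sailors, reserves = 40_000, 100_000      # |Sailors| x |Reserves| candidates
    max_size = sailors * reserves
    reduction_factors = [1 / 40_000,         # Sailors.sid = Reserves.sid
                         1 / 10]             # rating > 8, assumed 1/10 selective
    estimate = max_size
    for rf in reduction_factors:
        estimate *= rf
    print(int(estimate))                     # 10000 tuples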

52. How do we estimate the size of the final result of a query?

The size of the final result of a query is estimated by taking the product of the sizes of the

relations in the FROM clause and the reduction factors for the terms in the WHERE clause. Similarly, the size of the result of each operator in a plan tree is estimated by using reduction factors, since the subtree rooted at that operator's node is itself a query block.

53. What does the rule of cascading projections state?

The rule for cascading projections says that successively eliminating columns from a relation is equivalent to simply eliminating all but the columns retained by the final projection: π_X(R) ≡ π_X(π_Y(R)) for any set of columns Y that contains X.

54. Differentiate between rule-based optimizers and randomized plan generators.
Rule-based optimizers use a set of rules to guide the generation of candidate plans, whereas randomized plan generation uses probabilistic algorithms such as simulated annealing to explore a large space of plans quickly, with a reasonable likelihood of finding a good plan.

55. Write in short about parametric query optimization and multiple-query optimization.

Parametric query optimization seeks to find good plans for a given query for each of several different conditions that might be encountered at run-time; multiple-query optimization takes the concurrent execution of several queries into account when optimizing.

56. What are the problems caused by redundancy?

Redundant storage: Some information is stored repeatedly.

Update anomalies: If one copy of such repeated data is updated, an inconsistency is created

unless all copies are similarly updated.

Insertion anomalies: It may not be possible to store some information unless some other 

information is stored as well.

Deletion anomalies: It may not be possible to delete some information without losing some

other information as well.

57. Define lossless join and dependency preservation property.

The lossless-join property enables us to recover any instance of the decomposed relation from corresponding instances of the smaller relations. The dependency preservation property enables us to enforce any constraint on the original relation by simply enforcing some constraints on each of the smaller relations.

58. Define functional dependency

A functional dependency (FD) is a kind of IC that generalizes the concept of a key. Let R be a

relation schema and let X and Y be nonempty sets of attributes in R. We say that an instance r of R satisfies the FD X → Y if the following holds for every pair of tuples t1 and t2 in r:
If t1.X = t2.X, then t1.Y = t2.Y.

59. Define super key.


If X → Y holds, where Y is the set of all attributes of the relation, then X is a superkey. If, in addition, no proper subset V of X is such that V → Y holds, then X is a (candidate) key.


60. Define Armstrong’s Axioms.

The following three rules, called Armstrong's Axioms, can be applied repeatedly to infer all FDs implied by a set F of FDs. We use X, Y, and Z to denote sets of attributes over a relation schema R:
Reflexivity: If X ⊇ Y, then X → Y.

Augmentation: If X → Y , then XZ → YZ for any Z .

Transitivity: If X → Y and Y → Z , then X → Z .

61.Define Attribute Closure of an attribute X

The attribute closure X+ of X with respect to F (a set of FDs) is defined as the set of attributes A such that X → A can be inferred using the Armstrong Axioms.

62. Write the algorithm for finding Attribute Closure of an attribute X.
Initialize the closure to X; then repeatedly look for an FD U → V in F such that U is contained in the closure but V is not, and add V to the closure; stop when no FD adds any new attribute. The final set is X+.
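A direct Python rendering of this algorithm, representing each FD as a (left-hand side, right-hand side) pair of attribute sets; the relation and FDs in the usage lines are illustrative:

    def attribute_closure(X, fds):
        # fds: iterable of (lhs, rhs) pairs of attribute sets, e.g. ({'A'}, {'B'}).
        closure = set(X)
        changed = True
        while changed:                    # repeat until no FD adds attributes
            changed = False
            for lhs, rhs in fds:
                if lhs <= closure and not rhs <= closure:
                    closure |= rhs
                    changed = True
        return closure

    F = [({'A'}, {'B'}), ({'B'}, {'C'}), ({'C', 'D'}, {'E'})]
    print(attribute_closure({'A', 'D'}, F))   # {'A', 'B', 'C', 'D', 'E'}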

63. When do we say relation in First normal form.

A relation is in first normal form if every field contains only atomic values, that is, not lists or 

sets.

64. Define fully functional dependency and partial dependency.

Full functional dependency indicates that if A and B are attributes of a relation, B is fully

functionally dependent on A if B is functionally dependent on A, but not on any proper subset

of A.

A functional dependency A → B is a partial dependency if some attribute can be removed from A and the dependency still holds.

65. Define second normal form.

Second normal form (2NF) is a relation that is in first normal form and every non-primary-key

attribute is fully functionally dependent on the primary key.

66. Define Third normal form.

Third normal form (3NF) requires that there are no functional dependencies of non-key attributes on anything other than a candidate key. A table is in 3NF if all of the non-primary-key attributes depend only on candidate keys, and not transitively through other non-key attributes.

67. Differentiate between BCNF and third normal form.

A relation is in BCNF if and only if every determinant is a candidate key. The difference between 3NF and BCNF is that for a functional dependency A → B, 3NF allows this dependency in a relation if B is a primary-key attribute and A is not a candidate key, whereas BCNF insists that for this dependency to remain in a relation, A must be a candidate key.


68.Define Multi Valued Dependency.

Multi-valued dependency (MVD) represents a dependency between attributes (for example, A, B, and C) in a relation, such that for each value of A there is a set of values for B and a set of values for C. However, the sets of values for B and C are independent of each other.
A multi-valued dependency can be further defined as being trivial or nontrivial. An MVD A →→ B in relation R is defined as being trivial if

• B is a subset of A

or 

• A ∪ B = R

A MVD is defined as being nontrivial if neither of the above two conditions is satisfied.

69. Define fourth normal form.
A relation is in fourth normal form (4NF) if it is in Boyce-Codd normal form and contains no nontrivial multi-valued dependencies.

70. Define fifth normal form.

A relation is in fifth normal form (5NF) if it has no join dependency. Fifth normal form is satisfied when all tables are broken into as many tables as possible in order to avoid redundancy. Once a relation is in fifth normal form, it cannot be broken into smaller relations without changing the facts or the meaning.

71. Define DKNF

The relation is in DKNF when there can be no insertion or deletion anomalies in the

database.

72. Define database workload

A database workload description includes the following elements:

1. A list of queries and their frequencies, as a fraction of all queries and updates.

2. A list of updates and their frequencies.

3. Performance goals for each type of query and update.

73. What are the details to be collected for queries and updates in a database workload?

For each query in the workload,

• Which relations are accessed.

• Which attributes are retained (in the SELECT clause).

• Which attributes have selection or join conditions expressed on them (in the WHERE

clause) and how selective these conditions are likely to be.

Similarly, for each update in the workload,
• Which attributes have selection or join conditions expressed on them (in the WHERE clause) and how selective these conditions are likely to be.
• The type of update (INSERT, DELETE, or UPDATE) and the updated relation.
• For UPDATE commands, the fields that are modified by the update.


74. What are the guidelines regarding indices in physical database design?

There are guidelines that help to decide whether to index, what to index, whether to use a

multiple-attribute index, whether to create an unclustered or a clustered index, and whether to

use a hash or a tree index. Indexes can speed up queries but can also slow down update operations.
75. What is a DBMS benchmark? Give examples.

A DBMS benchmark  tests the performance of a class of applications or specific aspects of a

DBMS to help users evaluate system performance. Well-known benchmarks include TPC-A,

TPC-B, TPC-C, and TPC-D.

76.What do you mean by physical database tuning?(Or) State the need for database tuning?

After an initial physical design, continuous database tuning is important to obtain the best possible

performance. Using the observed workload over time, we can reconsider our choice of indexes

and our relation schema. Other tasks include periodic reorganization of indexes and updating

the statistics in the system catalogs.

77.Write a short note on co clustering.

Co-clustering:

• It can speed up joins, in particular key–foreign key joins corresponding to 1:N

relationships.

• A sequential scan of either relation becomes slower.

• Similarly, a sequential scan of all tuples of one relation (for example, all Assembly tuples in the textbook's Parts/Assembly example) is also slower.

• Inserts, deletes, and updates that alter record lengths all become slower.

78. Define access control mechanism. Mention the types of the same.

An access control mechanism is a way to control the data that is accessible to a given user. The two different types are discretionary and mandatory access control.

79. What are the main objectives of DBMS Security? Explain with example.

1. Secrecy: Information should not be disclosed to unauthorized users. For  example, a student

should not be allowed to examine other students’ grades.

2. Integrity: Only authorized users should be allowed to modify data. For example, students

may be allowed to see their grades, yet not allowed (obviously!) to modify them.

3. Availability: Authorized users should not be denied access. For example, an instructor who

wishes to change a grade should be allowed to do so.

80. Define Discretionary Access Control.
Discretionary access control is based on the concept of access rights, or privileges, and mechanisms for giving users such privileges. A privilege allows a user to access some data object in a certain manner (e.g., to read or to modify).

81. Define Mandatory Access Control.

Mandatory access control is based on systemwide policies that cannot be changed by

individual users. In this approach each database object is assigned a security class, each user is

assigned clearance for a security class, and rules are imposed on reading and writing of 

database objects by users. The DBMS determines whether a given user can read or write a

given object based on certain rules that involve the security level of the object and the

clearance of the user. These rules seek to ensure that sensitive data can never be 'passed on' to a user without the necessary clearance.


82. Write the general format of GRANT Command. What is the use of GRANT Option in the format?
The GRANT command gives users privileges to base tables and views. The syntax of this command is as follows: GRANT privileges ON object TO users [WITH GRANT OPTION]; for example, GRANT SELECT ON Sailors TO Bob WITH GRANT OPTION (table and user names illustrative). If a user has a privilege with the grant option, he or she can pass it to another user (with or without the grant option) by using the GRANT command.

83. What are the privileges granted to the user through a GRANT Command?

SELECT: The right to access (read) all columns of the table specified as the object, including columns added later through ALTER TABLE commands.

INSERT (column-name): The right to insert rows with (non-null or nondefault) values in the

named column of the table named as object.

The privileges UPDATE (column-name) and UPDATE are similar.

DELETE: The right to delete rows from the table named as object.

REFERENCES (column-name): The right to define foreign keys (in other tables) that refer to

the specified column of the table object. REFERENCES without a column name specified denotes this right with respect to all columns, including any that are added later.

84. Define Privilege Descriptor.

The privilege descriptor specifies the following: the grantor of the privilege, the grantee who

receives the privilege, the granted privilege (including the name of the object involved), and

whether the grant option is included. When a user creates a table or view and ‘automatically’

gets certain privileges, a privilege descriptor with system as the grantor is entered into this

table.

85. Define authorization Graph.

An authorization graph is a graph in which the nodes are users—technically, they are authorization ids—and the arcs indicate how privileges are passed. There is an arc from (the

node for) user 1 to user 2 if user 1 executed a GRANT command giving a privilege to user 2;

the arc is labeled with the descriptor for the GRANT command. A GRANT command has no

effect if the same privileges have already been granted to the same grantee by the same

grantor.

86. Define multilevel and polyinstantiation.

The presence of data objects that appear to have different values to users with different clearances is called polyinstantiation. A multilevel table is a table with the surprising property that users with different security clearances will see a different collection of rows when they access the same table.

87. Describe Covert Channel.

Even if a DBMS enforces the mandatory access control scheme discussed above, information

can flow from a higher classification level to a lower classification level through indirect

means, called covert channels.

88. What is the responsibility of the database administrator?

1. Creating new accounts: Each new user or group of users must be assigned an authorization

id and a password. Note that application programs that access the database have the same

authorization id as the user executing the program.

2. Mandatory control issues: If the DBMS supports mandatory control—some customized

systems for applications with very high security requirements (for example, military data)


provide such support—the DBA must assign security classes to each database object and

assign security clearances to each authorization id in accordance with the chosen security

policy.

3. The DBA is also responsible for maintaining the audit trail, which is essentially the log of 

updates with the authorization id (of the user who is executing the transaction) added to each

log entry.

89. Define Audit Trail.

Audit trail is essentially the log of updates with the authorization id (of the user who is

executing the transaction) added to each log entry. This log is just a minor extension of the log

mechanism used to recover from crashes.

90. Define Statistical Databases and what is the security issue in the statistical databases?

A statistical database is one that contains specific information on individuals or events but is

intended to permit only statistical queries. Security in such databases poses problems because

it is possible to infer protected information (such as an individual sailor’s rating) from answers

to permitted statistical queries. Such inference opportunities represent covert channels that can compromise the security policy of the database.

Unit V 

91. Define transaction.

A transaction is defined as any one execution of a user program in a DBMS and differs from an

execution of a program outside the DBMS (e.g., a C program executing on Unix) in important

ways. (Executing the same program several times will generate several transactions.)

92. State the ACID Properties . (Or) Explain ACID

Atomicity. Either all operations of the transaction are properly reflected in the database or 

none are.

Consistency. Execution of a transaction in isolation preserves the consistency of the database.

Isolation. Although multiple transactions may execute concurrently, each transaction must be

unaware of other concurrently executing transactions. Intermediate transaction results must be

hidden from other concurrently executed transactions.

That is, for every pair of transactions Ti and Tj, it appears to Ti that either Tj finished execution before Ti started, or Tj started execution after Ti finished.

Durability. After a transaction completes successfully, the changes it has made to the

database persist, even if there are system failures.

93. What are the different states of a transaction?

Active – the initial state; the transaction stays in this state while it is executing.
Partially committed – after the final statement has been executed.
Failed – after the discovery that normal execution can no longer proceed.
Aborted – after the transaction has been rolled back and the database restored to its state prior to the start of the transaction. There are two options after it has been aborted:
• restart the transaction; this can be done only if there is no internal logical error;


• kill the transaction.
Committed – after successful completion.

94. Explain the relationship between the different states of a transaction.
A transaction starts in the active state; when its final statement has executed it becomes partially committed, and it becomes committed once its changes are made permanent. If normal execution can no longer proceed, the transaction moves from active (or partially committed) to failed, and after it is rolled back it becomes aborted, from where it may be restarted or killed.

95. What is the function of the recovery management of the database?
The recovery-management component of a database system implements the support for atomicity and durability.

96. Explain the shadow-database scheme in short.

The shadow-database scheme:

a. assume that only one transaction is active at a time.

b. a pointer called db_pointer always points to the current consistent copy of the database.

c. all updates are made on a shadow copy of the database, and db_pointer is made to

point to the updated shadow copy only after the transaction reaches partial commit and

all updated pages have been flushed to disk.

d. in case transaction fails, old consistent copy pointed to by db_pointer can be used, and

the shadow copy can be deleted.

97. Define Schedule.

A schedule is a list of actions (reading, writing, aborting, or committing) from a set of 

transactions, and the order in which two actions of a transaction T appear in a schedule must

be the same as the order in which they appear in T. Intuitively, a schedule represents an actual

or potential execution sequence.

98. Define complete schedule.

A schedule that contains either an abort or a commit for each transaction whose actions are

listed in it is called a complete schedule. A complete schedule must contain all the actions of 

every transaction that appears in it.

99. Define Serial schedule.

If the actions of different transactions are not interleaved—that is, transactions are executed

from start to finish, one by one—the schedule is called a serial schedule.

100.What is the advantage of concurrent execution?

Multiple transactions are allowed to run concurrently in the system. Advantages are:


a. increased processor and disk utilization, leading to better transaction throughput:

one transaction can be using the CPU while another is reading from or writing to the

disk 

b. reduced average response time for transactions: short transactions need not wait

behind long ones.

101. Define Concurrency Control Schemes.

Concurrency control schemes are mechanisms to achieve isolation; that is, to control the interaction among the concurrent transactions in order to prevent them from destroying the consistency of the database.

102. What is the meaning of a serializable schedule?

A serializable schedule over a set S of committed transactions is a schedule whose effect on any consistent database instance is guaranteed to be identical to that of some

complete serial schedule over S.

103. What are the different types of anomalies or conflicts that can occur while interleaving the transactions?

a. Reading Uncommitted Data (WR Conflicts)

b. Unrepeatable Reads (RW Conflicts)

c. Overwriting Uncommitted Data (WW Conflicts) 

104. What do you mean by a recoverable schedule? What is the advantage of the same?

A recoverable schedule is one in which transactions commit only after (and if!) all

transactions whose changes they read commit. If transactions read only the changes of 

committed transactions, not only is the schedule recoverable, but also aborting a

transaction can be accomplished without cascading the abort to other transactions. Such a

schedule is said to avoid cascading aborts. 

105. What do you mean by conflict serializable schedules?

Two schedules are said to be conflict equivalent if they involve the (same set of) actions

of the same transactions and they order every pair of conflicting actions of two committed transactions in the same way. A schedule is conflict serializable if it is conflict

equivalent to some serial schedule. 

106. What do you mean by a precedence graph? Where is it used?

The precedence graph for a schedule S contains:
- A node for each committed transaction in S.
- An arc from Ti to Tj if an action of Ti precedes and conflicts with one of Tj's actions.
Strict 2PL allows only those schedules whose precedence graph is acyclic.
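A Python sketch that builds the precedence graph from a schedule and tests it for cycles; the encoding of a schedule as (transaction, action, object) triples is an illustrative assumption:

    def precedence_graph(schedule):
        # schedule: list of (txn, action, obj) with action in {'R', 'W'};
        # an edge Ti -> Tj means an action of Ti precedes and conflicts with
        # an action of Tj (conflict = same object, at least one is a write).
        edges = set()
        for i, (ti, ai, oi) in enumerate(schedule):
            for tj, aj, oj in schedule[i + 1:]:
                if ti != tj and oi == oj and 'W' in (ai, aj):
                    edges.add((ti, tj))
        return edges

    def has_cycle(edges):
        # Depth-first search for a cycle; acyclic <=> conflict serializable.
        adj = {}
        for u, v in edges:
            adj.setdefault(u, []).append(v)
        def visit(u, stack):
            if u in stack:
                return True
            return any(visit(v, stack | {u}) for v in adj.get(u, []))
        return any(visit(u, frozenset()) for u in adj)

    s = [('T1', 'R', 'A'), ('T2', 'W', 'A'), ('T1', 'W', 'A')]
    print(has_cycle(precedence_graph(s)))   # True: not conflict serializable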


107. What are the rules of strict 2Phase locking?

STRICT 2 PHASE LOCKING Rules

(1) If a transaction T wants to read (respectively, modify) an object, it first requests a

shared (respectively, exclusive) lock on the object.

(2) All locks held by a transaction are released when the transaction is completed.

108. What is the difference between strict 2phase locking and 2 phase locking?

Strict 2PL releases the locks only when the transaction is completed, whereas in 2PL the second rule is replaced by: "A transaction cannot request additional locks once it releases any lock."
Thus, every transaction has a 'growing' phase in which it acquires locks, followed by a 'shrinking' phase in which it releases locks.

109. Define View serializable schedules.

Two schedules S1 and S2 are view equivalent if:
1. If Ti reads the initial value of object A in S1, it must also read the initial value of A in S2.
2. If Ti reads a value of A written by Tj in S1, it must also read the value of A written by Tj in S2.
3. For each data object A, the transaction (if any) that performs the final write on A in S1 must also perform the final write on A in S2.
A schedule is view serializable if it is view equivalent to some serial schedule.

Every conflict serializable schedule is view serializable, although the converse is not

true.

110. Define latches

Latches:
- Short-duration locks that are set before reading or writing a page to ensure an atomic operation.
- Unset immediately after the physical read/write operation is completed.

111. Define Convoys.

Convoys: A convoy is a queue of transactions that forms while waiting for a lock held by a transaction that has been suspended by the operating system's preemptive process scheduling.

112. Differentiate between lock upgradation and downgrading.

A lock upgrade request asks to upgrade a shared lock to an exclusive lock (e.g., for an UPDATE operation). In downgrading, the transaction initially obtains exclusive locks and then downgrades them to shared locks.


113. What do you mean by an update lock?

An update lock is compatible with shared locks but not with other update and exclusive locks. If the object turns out not to need updating, the lock is downgraded to a shared lock.

114. Define Deadlock. How can it be prevented?
A cycle of transactions waiting for locks to be released is called a deadlock.

Each transaction is given a timestamp when it starts up. The lower the timestamp, the

higher is the transaction’s priority;

If a transaction Ti requests a lock and transaction Tj holds a conflicting lock, the lock 

manager can use one of the following policies

• Wait-die: If Ti has higher priority, it is allowed to wait; otherwise it is aborted.

• Wound-wait: If Ti has higher priority, abort Tj; otherwise Ti waits.

In the wait-die scheme, lower priority transactions can never wait for higher priority

transactions. In the wound-wait scheme, higher priority transactions never wait for lower 

priority transactions. In either case no deadlock cycle can develop.
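The two policies can be sketched as a single decision function in Python; the function name and return values are illustrative, with lower timestamp meaning older and therefore higher priority:

    def on_lock_conflict(ts_requester, ts_holder, policy):
        # Decides what happens to a requesting transaction Ti when the
        # conflicting lock is held by Tj. Lower timestamp = higher priority.
        if policy == 'wait-die':
            # Ti waits only if it is older than Tj; otherwise it dies (aborts).
            return 'wait' if ts_requester < ts_holder else 'abort requester'
        if policy == 'wound-wait':
            # An older Ti wounds (aborts) Tj; a younger Ti waits.
            return 'abort holder' if ts_requester < ts_holder else 'wait'
        raise ValueError(policy)

    print(on_lock_conflict(5, 9, 'wait-die'))     # wait (requester is older)
    print(on_lock_conflict(9, 5, 'wound-wait'))   # wait (requester is younger)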

115. What is a waits-for graph? Give examples.

It is maintained by the lock manager to detect deadlock cycles where the Nodes denote

active transaction, and an arc from Ti to Tj denotes that Ti is waiting for Tj to release a

lock. A cycle in the waits-for graph indicates a deadlock.

116. How is a deadlock resolved?

A deadlock is resolved by aborting a transaction that is on a cycle and releasing its locks. There are various choices for deciding which transaction has to be aborted, for example the transaction
- with the fewest locks,
- that has done the least work, or
- that is farthest from its completion.

117. Define timeout mechanism.


If a transaction has been waiting too long for a lock, it is assumed to be in a deadlock cycle and is aborted.

118. What do you mean by conservative 2PL?

Conservative 2PL can also prevent deadlocks. Under Conservative 2PL, a transaction obtains all the locks it will ever need when it begins, or blocks waiting for these locks to become available.

119. What are the rules to be followed in implementing concurrency control in B+ Trees?

1. The higher levels of the tree only serve to direct searches, and all the ‘real’ data is in

the leaf levels (in the format of one of the three alternatives for data entries).

2. For inserts, a node must be locked (in exclusive mode, of course) only if a split can

propagate up to it from the modified leaf.

120. Define multiple-granularity locking.

Multiple-granularity locking allows us to efficiently set locks on objects that contain other objects. The idea is to exploit the hierarchical nature of the 'contains' relationship. A database contains a set of files, each file contains a set of pages, and each page contains a set of records. This containment hierarchy can be thought of as a tree of objects, where each node contains all its children.

121. Define Intention shared and intention exclusive locks.

Intention shared (IS) and intention exclusive (IX) locks.

• To lock a node in S (respectively X) mode, a transaction must first lock all its

ancestors in IS (respectively IX) mode.

• An SIX lock is logically equivalent to holding an S lock and an IX lock. A transaction can obtain a single SIX lock (which conflicts with any lock that conflicts with either S or IX) instead of an S lock and an IX lock.

122. Define Lock Escalation.

This is the approach for deciding the level of granularity locking by obtaining fine

granularity locks (e.g., at the record level) and after the transaction requests a certain

number of locks at that granularity, to start obtaining locks at the next higher granularity

(e.g., at the page level).


123. What are the basic premises of optimistic concurrency control?

The basic premise is that most transactions will not conflict with other transactions, and

the idea is to be as permissive as possible in allowing transactions to execute.

Transactions proceed in three phases:

1. Read: The transaction executes, reading values from the database and writing to a

private workspace.

2. Validation: If the transaction decides that it wants to commit, the DBMS checks

whether the transaction could possibly have conflicted with any other concurrently

executing transaction. If there is a possible conflict, the transaction is aborted; its private

workspace is cleared and it is restarted.

3. Write: If validation determines that there are no possible conflicts, the changes to

data objects made by the transaction in its private workspace are copied into the database.

124. Define timestamp based concurrency control.

Each transaction can be assigned a timestamp at startup, and it is ensured, at execution

time, that if action ai of transaction Ti conflicts with action aj of transaction Tj, ai occurs

before aj if TS(Ti) < TS(Tj). If an action violates this ordering, the transaction is aborted

and restarted.

125. Define Thomas write rule and justify the same.

Ignoring outdated writes is called the Thomas Write Rule. If the Thomas Write Rule is not used, that is, if a transaction is instead aborted when it attempts such an outdated write, the timestamp protocol, like 2PL, allows only conflict serializable schedules. If the Thomas Write Rule is used, some serializable schedules are permitted that are not conflict serializable.

126. What is the purpose of multiversion concurrency control?


This protocol represents yet another way of using timestamps, assigned at startup time, to

achieve serializability.

The goal is to ensure that a transaction never has to wait to read a database object, and

the idea is to maintain several versions of each database object, each with a write

timestamp, and to let transaction Ti read the most recent version whose timestamp

precedes TS(Ti). 

127. Describe the responsibilities of a transaction manager.

The recovery manager of a DBMS is responsible for ensuring transaction atomicity and durability. It ensures atomicity by undoing the actions of transactions that do not commit, and durability by making sure that all actions of committed transactions survive system crashes (e.g., a core dump caused by a bus error) and media failures (e.g., a disk is corrupted).

128. Explain the various conflicts in short.

Three types of conflicting actions lead to three different anomalies. In a write-read (WR) conflict, one transaction could read uncommitted data from another transaction. Such a read is called a dirty read. In a read-write (RW) conflict, a transaction could read a data object twice with different results. Such a situation is called an unrepeatable read. In a write-write (WW) conflict, a transaction overwrites a data object written by another transaction. If the first transaction subsequently aborts, the change made by the second transaction could be lost unless a complex recovery mechanism is used.

129. What are the functions of recovery manager?

The recovery manager of a DBMS is responsible for ensuring two important properties of 

transactions: atomicity and durability. It ensures atomicity by undoing the actions of 

transactions that do not commit and durability by making sure that all actions of committed

transactions survive system crashes, (e.g., a core dump caused by a bus error) and media

failures (e.g., a disk is corrupted).

130. What is the meaning of steal and force approaches?

If changes made by a transaction can be propagated to disk before the transaction has committed, then a steal approach is used. If all changes made by a transaction are immediately

forced to disk after the transaction commits, a force approach is said to be used.

131. Define ARIES recovery algorithm.

ARIES is a recovery algorithm that is designed to work with a steal, no-force approach.

When the recovery manager is invoked after a crash, restart proceeds in three phases:

1. Analysis: Identifies dirty pages in the buffer pool (i.e., changes that have not been written to

disk) and active transactions at the time of the crash.

2. Redo: Repeats all actions, starting from an appropriate point in the log, and restores the

database state to what it was at the time of the crash.


3. Undo: Undoes the actions of transactions that did not commit, so that the database reflects

only the actions of committed transactions.

132. What are the three main principles of ARIES?

There are three main principles behind the ARIES recovery algorithm:
Write-ahead logging: Any change to a database object is first recorded in the log; the record

in the log must be written to stable storage before the change to the database object is written

to disk.

Repeating history during Redo: Upon restart following a crash, ARIES retraces all actions of 

the DBMS before the crash and brings the system back to the exact state that it was in at the

time of the crash. Then, it undoes the actions of transactions that were still active at the time of 

the crash (effectively aborting them).

Logging changes during Undo: Changes made to the database while undoing a transaction

are logged in order to ensure that such an action is not repeated in the event of repeated

(failures causing) restarts.

133. Define a log.

The log, sometimes called the trail or journal, is a history of actions executed by the DBMS.

Physically, the log is a file of records stored in stable storage, which is assumed to survive

crashes; this durability can be achieved by maintaining two or more copies of the log on

different disks (perhaps in different locations), so that the chance of all copies of the log being

simultaneously lost is negligibly small.

134. What do you mean by a log tail?

The most recent portion of the log, called the log tail, is kept in main memory and is

periodically forced to stable storage. This way, log records and data records are written to disk 

at the same granularity (pages or sets of pages).

135. What is the purpose of the log sequence number (LSN)?
Every log record is given a unique id called the log sequence number (LSN). As with any record id, we can fetch a log record with one disk access given the LSN. Further, LSNs should be assigned in monotonically increasing order; this property is required by the ARIES recovery algorithm. If the log is a sequential file, in principle growing indefinitely, the LSN can simply be the address of the first byte of the log record.
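
As an illustration of the last point, here is a minimal Python sketch (an in-memory stand-in, not a real log manager) in which the LSN of a record is its byte offset in an append-only log, making LSNs monotonically increasing by construction.

log_file = []      # in-memory stand-in for the sequential log file
next_offset = 0    # byte offset at which the next record will start

def append_record(payload: bytes) -> int:
    """Append a log record and return its LSN (its starting byte offset)."""
    global next_offset
    lsn = next_offset
    log_file.append((lsn, payload))
    next_offset += len(payload)
    return lsn

lsn1 = append_record(b"update: page 7")
lsn2 = append_record(b"commit: T1")
assert lsn1 < lsn2   # LSNs grow monotonically, as ARIES requires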

136. Define the rules of Write Ahead Logging.
The Write-Ahead Logging Protocol:
1. Must force the log record for an update before the corresponding data page gets to disk.
2. Must write all log records for a transaction before commit.
#1 guarantees Atomicity. #2 guarantees Durability.
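
A minimal Python sketch of how a buffer manager might enforce these two rules, assuming a single counter flushed_lsn that tracks how much of the log is already on stable storage; all names here are hypothetical.

flushed_lsn = -1   # highest LSN known to be on stable storage

def flush_log_through(lsn):
    """Force the log tail to stable storage up to and including lsn."""
    global flushed_lsn
    flushed_lsn = max(flushed_lsn, lsn)

def write_page(page):
    # Rule 1: the log record for the page's latest update must reach
    # stable storage before the data page itself is written.
    if page["pageLSN"] > flushed_lsn:
        flush_log_through(page["pageLSN"])
    # ... the actual disk write of the page would happen here ...

def commit(last_lsn_of_txn):
    # Rule 2: all of the transaction's log records (through its commit
    # record) reach stable storage before commit is acknowledged.
    flush_log_through(last_lsn_of_txn)

write_page({"pageLSN": 10})
commit(12)
assert flushed_lsn == 12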

137. What are the contents of the update log record?
The pageid indicates the page id of the modified page; the length in bytes and the offset of the change are also included. The before-image is the value of the changed bytes before the change; the after-image is the value after the change. An update log record that contains both before- and after-images can be used to redo the change and to undo it.
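
The description above maps naturally onto a record structure. A minimal Python sketch (illustrative field names, not an actual DBMS layout):

from dataclasses import dataclass

@dataclass
class UpdateLogRecord:
    lsn: int
    prev_lsn: int          # LSN of this transaction's previous record
    txn_id: int
    page_id: int
    offset: int            # where on the page the change begins
    length: int            # number of bytes changed
    before_image: bytes    # old bytes: enough to undo the change
    after_image: bytes     # new bytes: enough to redo the change

def redo_change(page: bytearray, r: UpdateLogRecord) -> None:
    page[r.offset:r.offset + r.length] = r.after_image

def undo_change(page: bytearray, r: UpdateLogRecord) -> None:
    page[r.offset:r.offset + r.length] = r.before_image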

138. Define Compensation Log Record. (Or) What is the purpose of a Compensation Log Record? (Or) What are the contents of a Compensation Log Record?
A compensation log record (CLR) is written just before the change recorded in an update log record U is undone. (Such an undo can happen during normal system execution when a transaction is aborted, or during recovery from a crash.) A compensation log record C describes the action taken to undo the actions recorded in the corresponding update log record and is appended to the log tail just like any other log record. The compensation log record C also contains a field called undoNextLSN, which is the LSN of the next log record that is to be undone for the transaction that wrote update record U; this field in C is set to the value of prevLSN in U.
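
A minimal sketch of the undoNextLSN rule, continuing the illustrative record structures from the previous question (all names hypothetical):

from dataclasses import dataclass

@dataclass
class CompensationLogRecord:
    lsn: int
    txn_id: int
    undo_next_lsn: int   # next LSN to undo for this transaction

def make_clr(update_record, new_lsn):
    """Build the CLR written just before update_record is undone."""
    return CompensationLogRecord(
        lsn=new_lsn,
        txn_id=update_record.txn_id,
        # The rule above: undoNextLSN is the prevLSN of the update
        # record being undone.
        undo_next_lsn=update_record.prev_lsn,
    )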

139. Differentiate between Compensation Log Record and Update Record.
Unlike an update log record, a CLR describes an action that will never be undone; that is, we never undo an undo action. The reason is simple: an update log record describes a change made by a transaction during normal execution, and the transaction may subsequently be aborted, whereas a CLR describes an action taken to roll back a transaction for which the decision to abort has already been made. Thus, the transaction must be rolled back, and the undo action described by the CLR is definitely required.

140. What are the contents of Transaction Table?
The transaction table contains one entry for each active transaction. The entry contains, in general, the transaction id, the relations accessed by the transaction, attributes related to the transaction, the list of locks held by the transaction, the type of the transaction, the status, and a field called lastLSN, which is the LSN of the most recent log record for this transaction. The status of a transaction can be that it is in progress, is committed, or is aborted.
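
As a rough illustration, the transaction table can be thought of as a dictionary keyed by transaction id; the entry layout below is hypothetical.

transaction_table = {
    17: {
        "status": "in progress",          # or "committed" / "aborted"
        "lastLSN": 4021,                  # most recent log record of T17
        "locks": [("page 3", "X"), ("page 9", "S")],
    },
}

def record_logged(txn_id, lsn):
    """Keep lastLSN current whenever a new record is written for txn_id."""
    transaction_table[txn_id]["lastLSN"] = lsn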


141. What do you mean by a Dirty page table?
The dirty page table contains one entry for each dirty page in the buffer pool, that is, each page with changes that are not yet reflected on disk. The entry contains a field recLSN, which is the LSN of the first log record that caused the page to become dirty. This LSN identifies the earliest log record that might have to be redone for this page during restart from a crash.
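
A minimal sketch of how recLSN behaves: it is set only when a clean page first becomes dirty, and the entry disappears once the page is written to disk (hypothetical structures, for illustration only).

dirty_page_table = {}

def on_page_update(page_id, lsn):
    # Only the FIRST update since the last flush sets recLSN; later
    # updates to an already-dirty page leave it unchanged.
    if page_id not in dirty_page_table:
        dirty_page_table[page_id] = {"recLSN": lsn}

def on_page_flushed(page_id):
    # The page is clean again, so it leaves the dirty page table.
    dirty_page_table.pop(page_id, None)

on_page_update(7, 100)
on_page_update(7, 120)
assert dirty_page_table[7]["recLSN"] == 100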

142. What are the phases of restart in ARIES Recovery algorithm?
When the recovery manager is invoked after a crash, restart proceeds in three phases: Analysis, Redo, and Undo (see Question 131).

143. What do you mean by a checkpoint? (Or) How does the ARIES recovery algorithm use checkpoints? What is the purpose of a checkpoint?
Checkpoints are snapshots of the DBMS state. Checkpointing in ARIES has three steps. First, a begin checkpoint record is written to indicate when the checkpoint starts. Second, an end checkpoint record is constructed, including in it the current contents of the transaction table and the dirty page table, and appended to the log. The third step is carried out after the end checkpoint record is written to stable storage: a special master record containing the LSN of the begin checkpoint log record is written to a known place on stable storage. While the end checkpoint record is being constructed, the DBMS continues executing transactions and writing other log records; the only guarantee is that the transaction table and dirty page table are accurate as of the time of the begin checkpoint record.
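
The three steps can be sketched as follows in Python; the log is a simple in-memory list and master_record is a stand-in for the known place on stable storage (all hypothetical).

log = []
master_record = None   # stand-in for the known place on stable storage

def take_checkpoint(transaction_table, dirty_page_table):
    global master_record
    # Step 1: mark where the checkpoint begins.
    begin_lsn = len(log)
    log.append({"type": "begin_checkpoint"})
    # Step 2: snapshot both tables into the end_checkpoint record.
    # Transactions keep running meanwhile, so the snapshot is only
    # guaranteed accurate as of the begin_checkpoint record.
    log.append({"type": "end_checkpoint",
                "txn_table": dict(transaction_table),
                "dirty_page_table": dict(dirty_page_table)})
    # Step 3: after the end_checkpoint record is on stable storage,
    # point the master record at the begin_checkpoint LSN.
    master_record = begin_lsn

take_checkpoint({1: {"status": "in progress"}}, {7: {"recLSN": 42}})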

144. What are the steps in analysis phase of crash recovery?
The Analysis phase performs three tasks:
1. It determines the point in the log at which to start the Redo pass.
2. It determines (a conservative superset of the) pages in the buffer pool that were dirty at the time of the crash.
3. It identifies transactions that were active at the time of the crash and must be undone.
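
A minimal sketch of such an Analysis scan over an in-memory log; the record layout and names are hypothetical, continuing the earlier sketches.

def analysis_phase(log, begin_checkpoint_lsn):
    """Scan forward from the most recent checkpoint, rebuilding state."""
    txn_table, dirty_pages = {}, {}
    for lsn in range(begin_checkpoint_lsn, len(log)):
        rec = log[lsn]
        if rec["type"] == "end_checkpoint":
            # Start from the snapshots taken at checkpoint time.
            txn_table.update(rec["txn_table"])
            dirty_pages.update(rec["dirty_page_table"])
        elif rec["type"] == "update":
            txn_table[rec["txn"]] = {"status": "active", "lastLSN": lsn}
            dirty_pages.setdefault(rec["page"], {"recLSN": lsn})
        elif rec["type"] == "end":
            txn_table.pop(rec["txn"], None)
    # Task 1: Redo starts at the oldest recLSN of any dirty page.
    redo_start = min((e["recLSN"] for e in dirty_pages.values()),
                     default=None)
    # Tasks 2 and 3: the dirty-page superset, and the still-active
    # ("loser") transactions that Undo must roll back.
    return redo_start, dirty_pages, txn_table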

145. What do you mean by the repeating history paradigm? (Or) How does ARIES differ from other crash recovery algorithms?
During the Redo phase, ARIES reapplies the updates of all transactions, committed or otherwise. Further, if a transaction was aborted before the crash and its updates were undone, as indicated by CLRs, the actions described in the CLRs are also reapplied. This repeating history paradigm distinguishes ARIES from other proposed WAL-based recovery algorithms and causes the database to be brought to the same state that it was in at the time of the crash.

146. What are the steps in redo phase of crash recovery?
The Redo phase begins with the log record that has the smallest recLSN of all pages in the dirty page table constructed by the Analysis pass, because this log record identifies the oldest update that may not have been written to disk prior to the crash. Starting from this log record, Redo scans forward until the end of the log. For each redoable log record (update or CLR) encountered, Redo checks whether the logged action must be redone. The action must be redone unless one of the following conditions holds:
• The affected page is not in the dirty page table, or
• The affected page is in the dirty page table, but the recLSN for the entry is greater than the LSN of the log record being checked, or
• The pageLSN (stored on the page, which must be retrieved to check this condition) is greater than or equal to the LSN of the log record being checked.
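
The three conditions translate directly into a test applied to each redoable record during the forward scan. A minimal Python sketch, where fetch_page is a hypothetical helper that retrieves the page so its pageLSN can be read:

def must_redo(record, dirty_page_table, fetch_page):
    entry = dirty_page_table.get(record["page"])
    if entry is None:
        return False                 # page was not dirty at the crash
    if entry["recLSN"] > record["lsn"]:
        return False                 # this change already reached disk
    page = fetch_page(record["page"])
    if page["pageLSN"] >= record["lsn"]:
        return False                 # the page already reflects the change
    return True                      # otherwise, reapply the action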

147. What is the goal of the Undo phase of crash recovery?
The goal of this phase is to undo the actions of all transactions that were active at the time of the crash, that is, to effectively abort them. This set of transactions is identified in the transaction table constructed by the Analysis phase.

148. What do you mean by Loser Transactions?
Transactions that were active at the time of the crash are called loser transactions. All actions of losers must be undone and, further, these actions must be undone in the reverse of the order in which they appear in the log.

149. What is the sequence of actions in Undo phase of crash recovery?
The set of lastLSN values for all loser transactions is called ToUndo. Undo repeatedly chooses the largest (i.e., most recent) LSN value in this set and processes it, until ToUndo is empty. To process a log record:
1. If it is a CLR and the undoNextLSN value is not null, the undoNextLSN value is added to the set ToUndo; if the undoNextLSN is null, an end record is written for the transaction because it is completely undone, and the CLR is discarded.
2. If it is an update record, a CLR is written and the corresponding action is undone, as described in Section 20.1.1, and the prevLSN value in the update log record is added to the set ToUndo.
When the set ToUndo is empty, the Undo phase is complete. Restart is now complete, and the system can proceed with normal operations.
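
A minimal sketch of this loop, using a set for ToUndo and a dictionary mapping LSNs to records; write_clr, undo_action, and write_end_record are hypothetical helpers.

def undo_phase(log, to_undo, write_clr, undo_action, write_end_record):
    """to_undo starts as the set of lastLSN values of all losers."""
    while to_undo:
        lsn = max(to_undo)           # always the most recent LSN first
        to_undo.discard(lsn)
        rec = log[lsn]
        if rec["type"] == "CLR":
            if rec["undoNextLSN"] is not None:
                to_undo.add(rec["undoNextLSN"])
            else:
                write_end_record(rec["txn"])   # fully undone; CLR discarded
        else:                        # an update record
            write_clr(rec)           # log the compensation first
            undo_action(rec)         # restore the before-image
            if rec["prevLSN"] is not None:
                to_undo.add(rec["prevLSN"])
            else:
                # Guard for the transaction's very first record (its
                # prevLSN is null): the transaction is fully undone.
                write_end_record(rec["txn"])
    # Restart is complete once to_undo drains.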

150. Write in short the working sequence of ARIES algorithm.
After a system crash, the Analysis, Redo, and Undo phases are executed. The Redo phase repeats history by transforming the database into the state it was in at the time of the crash. The Undo phase then undoes the actions of loser transactions, that is, transactions that are aborted because they were active at the time of the crash. ARIES handles subsequent crashes during system restart by writing compensation log records (CLRs) when undoing the actions of aborted transactions; CLRs indicate which actions have already been undone and prevent undoing the same action twice.

151. How do we recover from media failure?
To be able to recover from a media failure without reading the complete log, a copy of the database is taken periodically. The procedure for copying the database is similar to creating a checkpoint.


Unit V 

24. Explain the ACID properties of a transaction. Explain the usefulness of each.

25. Draw the state diagram of a transaction and explain

26. Explain the concept of deadlock handling with deadlock prevention, detection and recovery.

27. Describe the concurrency control based on locking.

28. Discuss the concurrency control without locking.

29. Write in detail about schedules and its significance in concurrency control.

30. Write in detail about the working principle of the ARIES algorithm.

31. Write in detail about the Log Table and its significance in crash recovery.

32. Explain the importance of checkpoints in crash recovery.

33. Explain the sequence of actions in redo and undo phases of crash recovery.

34. Explain the crash recovery process in detail.

