NOORUL ISLAM COLLEGE OF ENGINEERING, KUMARACOIL
DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING
B.E. SIXTH SEMESTER
2 MARKS & 16 MARKS
CS606 - ADVANCED DATABASE TECHNOLOGY
Prepared by: J.E. Judith
Lecturer/CSE, NICE
TWO MARKS
UNIT I
1.Define ER model?
The entity-relationship model (or ER model) is a top-down approach to
database design that begins by identifying the important data, called entities, and
the relationships between them. The ER model was first proposed by Peter Pin-Shan Chen.
2.Define Entity type?
A group of objects with the same properties, which are identified by the enterprise as
having an independent existence. In an ER model, an entity type is diagrammed as a
rectangle containing the type name, such as Student.
ER diagram notation for the entity type Student
3.Define Entity occurrence?
A uniquely identifiable object of an entity type is known as an entity occurrence.
An entity occurrence is also referred to simply as an entity.
4.Define relationship type?
A relationship type is a set of meaningful associations among entity types.
For example, the student entity type is related to the team entity type because each
student is a member of a team.
ER diagram notation for the relationship type MemberOf
5.Define relationship occurrence?
A uniquely identifiable association that includes one occurrence from each
participating entity type.
6.Define degree of relationship?
The degree of a relationship type is the number of entity types that participate. If
two entity types participate, the relationship type is binary. A role name indicates the
purpose of an entity in a relationship.
7.Define recursive relationship with diagrammatic representation?
A recursive relationship is one in which the same entity participates more than
once in the relationship. The supervision relationship is a recursive relationship because
the same entity, a particular team, participates more than once in the relationship, as a
supervisor and as a supervisee.
8.What are the types of attribute?
The types of attributes are
1. Simple and composite attribute
2. Single-valued and multi-valued attribute
Simple and composite attribute
Attributes that cannot be divided into subparts are called simple or
atomic attributes. A simple attribute is composed of a single component with an independent
existence. Example: the position and salary attributes of the Staff entity.
A composite attribute is composed of multiple components, each with an
independent existence, and can be divided into smaller subparts.
For example, a Name attribute can be divided into sub-parts such as first name,
middle name and last name.
Single-valued and multi-valued attribute
Attributes that can have only a single value at a particular instant of time
are called single-valued. A person cannot have more than one age value; therefore, the age of
a person is a single-valued attribute.
A multi-valued attribute can have more than one value at one time.
For example, degree of a person is a multi-valued attribute since a person can have more
than one degree.
9.Define candidate key?
A minimal set of attributes that uniquely identifies each occurrence of an entity
type is known as a candidate key. For example, the branch number attribute is a candidate key
for the Branch entity type.
10.Define primary key?
The candidate key that is selected to uniquely identify each occurrence of an
entity type is called primary key. Primary keys may consist of a single attribute or
multiple attributes in combination.
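As an illustration, the single-attribute and composite primary keys described above can be declared directly in SQL. The sketch below uses Python's built-in sqlite3 module; the Branch and Enrolment tables and their attributes are illustrative examples, not a prescribed schema.

```python
import sqlite3

# Branch is identified by a single-attribute primary key (branch_no);
# Enrolment uses a composite primary key (student_no, course_no).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Branch (
        branch_no TEXT PRIMARY KEY,   -- candidate key chosen as primary key
        city      TEXT
    );
    CREATE TABLE Enrolment (
        student_no INTEGER,
        course_no  INTEGER,
        grade      TEXT,
        PRIMARY KEY (student_no, course_no)  -- multiple attributes in combination
    );
""")
conn.execute("INSERT INTO Branch VALUES ('B005', 'London')")
try:
    # A second occurrence with the same primary key value is rejected.
    conn.execute("INSERT INTO Branch VALUES ('B005', 'Bristol')")
    duplicate_allowed = True
except sqlite3.IntegrityError:
    duplicate_allowed = False
print(duplicate_allowed)  # False: the key uniquely identifies each occurrence
```

The rejected second insert shows why a primary key must be unique: two occurrences of the Branch entity type can never share a branch number.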
11.Differentiate strong and weak entity type?
An entity type that is not existence dependent on some other entity type
called strong entity type. For example, the entity type student is strong because its
existence does not depend on some other entity type.
An entity type that is existence dependent on some other entity type is
called weak entity type. For example, a child entity is a weak entity because it relies on
the parent entity in order for it to exist.
12.Define query processing?
Query processing transforms a query written in a high-level language into a
correct and efficient execution strategy expressed in a low-level language, and executes
that strategy to retrieve the required data.
13.Define query optimization?
Query optimization means converting a query into an equivalent form
that is more efficient to execute. It is necessary for high-level relational queries, and it
gives the DBMS an opportunity to systematically evaluate alternative query execution
strategies and to choose an optimal one.
14.What are the phases of query processing?
The phases are
1) Query Decomposition.
2) Query Optimization.
3) Code Generation.
4) Runtime Query Execution.
15.Define query decomposition and what are its stages?
Query decomposition is the first phase of query processing. Its
aims are to transform a high-level query into a relational algebra query and to check
that the query is syntactically and semantically correct.
Different stages are
1) Analysis
2) Normalization
3) Semantic analysis
4) Simplification
5) Query restructuring.
16.Define conjunctive and disjunctive normal form?
Conjunctive normal form
Conjunctive normal form is a sequence of conjuncts connected
by the AND operator, where each conjunct contains one or more terms connected
by the OR operator.
Disjunctive normal form
Disjunctive normal form is a sequence of disjuncts connected
by the OR operator, where each disjunct contains one or more terms connected by
the AND operator.
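The equivalence of the two normal forms can be checked mechanically. The following Python sketch evaluates the same selection condition written once in CNF and once in DNF; the predicate over position, salary and branch is an invented example.

```python
# The same selection predicate in conjunctive normal form (CNF):
# conjuncts joined by AND, each conjunct an OR of terms.
def cnf(position, salary, branch):
    # (position='Manager' OR salary>20000) AND (branch='B003')
    return (position == "Manager" or salary > 20000) and branch == "B003"

# The equivalent predicate in disjunctive normal form (DNF):
# disjuncts joined by OR, each disjunct an AND of terms.
def dnf(position, salary, branch):
    # (position='Manager' AND branch='B003') OR (salary>20000 AND branch='B003')
    return (position == "Manager" and branch == "B003") or \
           (salary > 20000 and branch == "B003")

# The two forms are logically equivalent: they agree on every input row.
rows = [("Manager", 30000, "B003"), ("Assistant", 12000, "B003"),
        ("Assistant", 25000, "B007")]
print([cnf(*r) == dnf(*r) for r in rows])  # [True, True, True]
```

Query optimizers use normalization into one of these forms as an early step, because a normalized predicate is easier to simplify and restructure.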
17.Differentiate dynamic and static query optimization?
Dynamic optimization
The query has to be parsed, validated and optimized each time before it can be executed,
so all the information required to select an optimum strategy is up to date.
Static optimization
The query is parsed, validated and optimized only once,
so the run-time overhead is reduced.
18.What are the problems caused by concurrency control?
The process of managing simultaneous operations on the database without
having them interfere with one another is called concurrency control. The
problems that concurrency control must prevent are
i. Lost update problem
ii. Uncommitted dependency problem
iii. Inconsistent analysis problem
19.Define 3NF and BCNF
Third Normal Form (3NF):
A relation that is in 1NF and 2NF, and in which no non-primary key
attribute is transitively dependent on the primary key.
Boyce-Codd Normal Form (BCNF):
A relation is in BCNF, if and only if, every determinant is a candidate key.
20. Define Timestamp?
A timestamp is a unique identifier created by the DBMS that indicates the relative
starting time of a transaction. Timestamping is a concurrency control protocol that orders
transactions in such a way that an older transaction, with a smaller timestamp, gets priority
in the event of conflict.
21.What are the properties of transaction?
The four basic properties of transactions are called the ACID properties.
A - atomicity
C - consistency
I - isolation
D - durability
ATOMICITY:
The "all or nothing" property. A transaction is an indivisible unit that is
either performed in its entirety or not performed at all.
CONSISTENCY:
A transaction must transform the database from one consistent state to
another consistent state.
ISOLATION:
Transactions execute independently of one another. In other words, the
partial effects of an incomplete transaction should not be visible to other transactions.
DURABILITY:
The effects of a successfully completed transaction are permanently recorded
in the database and must not be lost because of a subsequent failure.
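The atomicity property can be demonstrated with a small sketch using Python's sqlite3 module; the account table and the transfer amounts are illustrative. A failure midway through the transfer triggers a rollback, so the partial debit never becomes visible.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO account VALUES ('A', 100), ('B', 50)")
conn.commit()

# Atomicity: a transfer is all-or-nothing. If any step fails, roll back
# so the debit without the matching credit is undone.
try:
    conn.execute("UPDATE account SET balance = balance - 70 WHERE name='A'")
    raise RuntimeError("simulated failure before the credit step")
    conn.execute("UPDATE account SET balance = balance + 70 WHERE name='B'")
    conn.commit()
except RuntimeError:
    conn.rollback()  # undo the debit: database returns to a consistent state

balances = dict(conn.execute("SELECT name, balance FROM account"))
print(balances)  # {'A': 100, 'B': 50} -- unchanged, as if nothing ran
```

A commit, by contrast, makes the transfer durable: after `conn.commit()` the new balances survive even if the connection or process fails.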
22.Define concurrency control?
The process of managing simultaneous operations on the database without having them
interfere with each other.
23.What are the problems caused by concurrency control?
The problems caused by concurrency control are
1. Lost update problem,
2. Uncommitted dependency problem,
3. Inconsistent analysis problem.
LOST UPDATE:
An apparently successfully completed update operation by one user can
be overridden by another user. This is known as the lost update problem.
UNCOMMITTED DEPENDENCY:
An uncommitted dependency problem occurs when one transaction is
allowed to see the intermediate results of another transaction before it has committed.
INCONSISTENT ANALYSIS:
A problem of inconsistent analysis occurs when a transaction reads several
values from the database but a second transaction updates some of them during the execution
of the first.
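The lost update problem can be reproduced with a short Python sketch that interleaves the reads and writes of two hypothetical transactions by hand:

```python
# Two transactions both read balance=100, then each writes back its own
# result; T2's write overwrites (loses) T1's update.
balance = 100

# Interleaved schedule: read1, read2, write1, write2
t1_read = balance          # T1 reads 100
t2_read = balance          # T2 reads 100, before T1 writes
balance = t1_read + 100    # T1 deposits 100 -> 200
balance = t2_read - 10     # T2 withdraws 10, but based on the stale read
print(balance)  # 90, not the correct 190: T1's deposit is lost
```

Under a serial schedule (T1 completely before T2, or vice versa) the final balance would be 190; the interleaving is what loses the update.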
24.Define serial schedule?
A schedule where the operations of each transaction are executed consecutively,
without any interleaved operations from other transactions.
25.Define serializable?
If a set of transactions executes concurrently, we say that the (non-serial) schedule is
correct if it produces the same result as some serial execution. Such a schedule is called
serializable.
26.Define the conservative and optimistic concurrency control methods?
CONSERVATIVE METHOD:
This approach causes transactions to be delayed in case they conflict
with other transactions at some time in the future. Locking and timestamping are essentially
conservative approaches.
OPTIMISTIC METHOD:
This approach is based on the premise that conflict is rare, so transactions are
allowed to proceed unsynchronized and are only checked for conflicts at the end, when a
transaction commits.
27.Define shared and exclusive lock?
SHARED LOCK: If a transaction has a shared lock on a data item, it can read
the item but cannot update it.
EXCLUSIVE LOCK: If a transaction has an exclusive lock on a data item, it can both
read and update the item.
28.Define 2PL?
A transaction follows the two-phase locking (2PL) protocol if all locking operations precede
the first unlock operation in the transaction.
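A transaction's lock operations can be checked against the 2PL rule mechanically. The sketch below, with invented lock/unlock sequences, reports a violation as soon as a lock request follows an unlock:

```python
# Check whether a transaction's sequence of lock operations obeys 2PL:
# every lock must precede the first unlock (growing phase, then shrinking).
def obeys_2pl(ops):
    unlocked = False
    for op, _item in ops:
        if op == "unlock":
            unlocked = True          # shrinking phase has begun
        elif op == "lock" and unlocked:
            return False             # a lock after an unlock violates 2PL
    return True

good = [("lock", "x"), ("lock", "y"), ("unlock", "x"), ("unlock", "y")]
bad  = [("lock", "x"), ("unlock", "x"), ("lock", "y"), ("unlock", "y")]
print(obeys_2pl(good), obeys_2pl(bad))  # True False
```

The first sequence acquires both locks before releasing any (growing phase, then shrinking phase); the second releases x before locking y and so is not two-phase.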
29.Define ignore obsolete write rule?
Transaction T asks to write an item x whose value has already been written by a
younger transaction, that is, ts(T) < write_timestamp(x). This means that a later transaction
has already updated the value of the item, and the value that the older transaction is
writing must be based on an obsolete value of the item. In this case, the write operation
can safely be ignored. This is sometimes known as the ignore obsolete write rule, and it
allows greater concurrency.
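The rule can be sketched in a few lines of Python; the item structure and the timestamp values are illustrative.

```python
# Sketch of the ignore-obsolete-write rule: a write by an older transaction
# is silently skipped when a younger transaction has already written x.
def write(item, value, ts):
    if ts < item["write_ts"]:
        return False  # obsolete write: safely ignored, no abort needed
    item["value"], item["write_ts"] = value, ts
    return True

x = {"value": 10, "write_ts": 5}   # last written by a transaction with ts=5
write(x, 99, ts=3)                  # older transaction's write is ignored
print(x["value"])  # 10
write(x, 42, ts=8)                  # younger transaction's write is applied
print(x["value"])  # 42
```

Skipping the obsolete write instead of aborting transaction T is exactly what lets this rule admit more schedules than basic timestamp ordering.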
30.List out different db recovery facilities?
A DBMS should provide the following facilities to assist with recovery.
1. A backup mechanism, which makes periodic backup copies of the database.
2. Logging facilities, which keep track of the current state of transactions
and database changes.
3. A checkpoint facility, which enables updates to the database that are in
progress to be made permanent.
4. A recovery manager, which allows the system to restore the database to a
consistent state following a failure.
31.What is the need for db tuning?
The needs for tuning a database are:
1. Existing tables may be joined.
2. For a given set of tables, there may be an alternative design choice.
32.Define normalization?
Normalization is a bottom-up approach to database design that begins by examining
the relationships between attributes. It is a validation technique: it supports a database
designer by presenting a series of tests, which can be applied to individual relations so that
the relational schema can be normalized to a specific form to prevent the possible occurrence
of update anomalies.
33.What is flattening the table?
We remove the repeating groups by entering the appropriate data in the empty
columns of rows containing the repeated data. In other words, we fill in the blanks by
duplicating the non-repeating data where required. This approach is called flattening
the table.
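A minimal Python sketch of flattening (the branch/staff data is invented): the non-repeating branch value is duplicated into one flat row per entry of the repeating staff group.

```python
# Flattening a table with repeating groups: copy the non-repeating value
# (branch) into each row of the repeated data, giving one flat row per entry.
nested = [
    {"branch": "B005", "staff": ["White", "Beech"]},
    {"branch": "B007", "staff": ["Howe"]},
]
flat = [(row["branch"], name) for row in nested for name in row["staff"]]
print(flat)  # [('B005', 'White'), ('B005', 'Beech'), ('B007', 'Howe')]
```

The result contains no repeating groups, which is the requirement for first normal form.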
UNIT II
1.Define parallel DBMS.
A DBMS running across multiple processors and disks that is designed to
execute operations in parallel, whenever possible, in order to improve performance.
2.What are the different parallel db architectures?
Shared memory
Shared disk
Shared nothing
Hierarchical
3.Differentiate interquery and intraquery parallelism.
Interquery parallelism:
Different queries or transactions execute in parallel with one another. It
increases scale-up and throughput.
Intraquery parallelism:
It refers to the execution of a single query in parallel on multiple processors
and disks. It is important for speeding up long-running queries.
4.Differentiate intraoperation parallelism and interoperation parallelism.
Intraoperation parallelism:
Speed up processing of a query by parallelising the execution of each
individual operation.
Interoperation parallelism:
Speed up processing of a query by executing in parallel the different
operations in a query expression.
There are two types:
pipelined parallelism
independent parallelism
5.Define distributed DBMS.
The software system that permits the management of the distributed database
and makes the distribution transparent to the user.
6.What is the fundamental principle of distributed DBMS?
The fundamental principle of a DDBMS is to make the distribution transparent
to the user, that is, to make the distributed system appear like a centralized
system.
7.List any four advantages and disadvantages of DDBMS.
Advantages:
capacity and incremental growth
reliability and availability
efficiency and flexibility
sharing
Disadvantages:
managing and controlling is complex
less security because data is at different sites.
8.Define homogeneous and heterogeneous DDBMS.
Homogeneous DDBMS:
At all sites the same DBMS product is used. It is easier to design and manage.
Advantages: easy communication, it is possible to add more sites, and it provides increased
performance.
Heterogeneous DDBMS:
Sites may run different DBMS products, which need not be based on the same data
model. Translations are required for communication between the different DBMSs. Data may
be required from another site that has different hardware, a different DBMS product,
or both.
9.What are the major components of DDBMS?
There are four major components in DDBMS,
(1)Local DBMS component(LDBMS)
(2)Data Communication component(DC)
(3)Global System Catalog(GSC)
(4)Distributed DBMS component
10.What are the correctness rules for fragmentation?
Any fragment should follow the correctness rules. There are three correctness
rules. They are,
(1)Completeness
(2)Reconstruction
(3)Disjointness
11. Define multiple copy consistency problem?
The multiple copy consistency problem occurs when there is more than
one copy of a data item at different locations. To maintain consistency of the global
database, when a replicated data item is updated at one site, all other copies of the data
item must also be updated. If a copy is not updated, the database becomes inconsistent.
12. Define distributed serializability?
If the schedule of transaction execution at each site is serializable, then the global
schedule is also serializable provided local serialization orders are identical. This is called
distributed serializability.
13. What are the different types of locking protocols in DDBMS?
The different types of locking protocols employed to ensure serializability in
DDBMS are centralized 2PL, primary copy 2PL, distributed 2PL and majority locking.
14. What are the types of deadlock detection in DDBMS?
There are three common methods for deadlock detection in DDBMSs:
centralized, hierarchical and distributed deadlock detection.
15. What is the general approach for timestamping in DDBMS?
The general approach for timestamping in DDBMS is to use the concatenation of
the local timestamp with a unique identifier, <local timestamp, site identifier>. The site
identifier is placed in the least significant position to ensure that events can be ordered
according to their occurrence as opposed to their location.
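Because Python tuples compare element by element, putting the site identifier in the least significant position falls out naturally in a sketch like this; the timestamp and site values are invented.

```python
# Global timestamps as <local timestamp, site identifier>: tuples compare
# first on the local time, so the site id only breaks ties and events are
# ordered by occurrence, not by location.
ts_a = (17, 2)   # event at local time 17 on site 2
ts_b = (17, 5)   # same local time on site 5: the site id breaks the tie
ts_c = (12, 9)   # earlier event on a high-numbered site
events = sorted([ts_a, ts_b, ts_c])
print(events)  # [(12, 9), (17, 2), (17, 5)]
```

Note that the earlier event on site 9 sorts first even though its site identifier is the largest, which is exactly why the site identifier must occupy the least significant position.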
16. What are the phases of 2PC protocol?
The two phases of 2PC protocol are:
a voting phase and
a decision phase.
17. Define cooperative termination protocol?
The cooperative termination protocol applies when a participant is blocked
without any information about the global decision. Rather than remaining blocked, the
participant contacts each of the other participants, attempting to find one that knows
the decision.
18. What is the use of election protocols?
If the participants detect the failure of the coordinator they can elect a new site to
act as coordinator by using election protocols. This protocol is relatively efficient.
19. Define 3PC?
Three-phase commit (3PC) is an alternative non-blocking protocol. It is non-blocking
for all site failures, except in the event of the failure of all sites. The basic idea of 3PC
is to remove the uncertainty period for participants that have voted COMMIT and are
awaiting the global decision from the coordinator. 3PC introduces a third phase, called
pre-commit, between voting and the global decision.
20. Define Distributed Query Processing?
The process of converting a high-level query into a low-level language with an
effective execution strategy, in order to achieve good performance, is called query
processing. In distributed query processing, the query is distributed and processed at
different locations.
21. Write the differences between locking and non-locking protocols?
Locking protocol:
1. Locking guarantees that the concurrent execution is equivalent to some
serial execution of those transactions.
2. It involves checking for deadlock at each local level and at the global level.
3. It does not involve the generation of timestamps.
Non-locking protocol:
1. Timestamping guarantees that the concurrent execution is equivalent to a
specific serial execution of those transactions, corresponding to the order of
the timestamps.
2. It does not involve checking for deadlock at any level.
3. It involves the generation of unique timestamps, both globally and locally.
UNIT III
1.Define OODM?
OODM - Object-Oriented Data Model
A (logical) data model that captures the semantics of objects supported in
object-oriented programming.
2. Define OODB?
OODB - Object-Oriented Database
A persistent and sharable collection of objects defined by an OODM.
3. Define OODBMS?
OODBMS - Object-Oriented Database Management System
An OODBMS is the manager of an OODB.
"OO" refers to an abstract database plus inheritance and object identity.
An OODBMS is the combination of OO capability and database capability.
4 . What are the types of OID?
There are two types of OID:
Logical OID
Physical OID
5. Define pointer swizzling or object faulting?
To achieve the required performance, the OODBMS must be able to convert
OIDs to and from in-memory pointers. This conversion technique is known as pointer
swizzling or object faulting.
6. What is the aim of pointer swizzling ?
The aim of pointer swizzling is to optimize access to objects. As just
mentioned, references between objects are normally represented using OIDs.
7. List the classification of pointer swizzling ?
Classification of techniques for pointer swizzling:
Copy vs in-place swizzling
Eager vs lazy swizzling
Direct vs indirect swizzling
8. Define persistent object ?
An object that exists even after the session is over is called a
persistent object.
There are two types of objects:
Persistent
Transient
9. Define transient object ?
A transient object lasts only for the invocation of the program.
The object's memory is allocated and deallocated by the programming
language's run-time system.
10. List the scheme for implementing persistence within OODBMS?
Persistence schemes
There are three schemes for implementing persistence in an OODBMS:
Check pointing
Serialization
Explicit paging
11. List the two methods for creating or updating persistent objects using explicit paging?
Reachability based method
Allocation based method
12. What are the fundamental principles of orthogonal persistence ?
It is based on three fundamental principles:
Persistence independence
Data type orthogonality
Transitive persistence
13. Define nested transaction model ?
A transaction is viewed as a collection of related subtransactions, each of
which may also contain any number of subtransactions.
14. Define sagas ?
A saga is a sequence of flat transactions that can be interleaved with other
transactions. Sagas are based on the use of compensating transactions. The DBMS
guarantees that either all the transactions in a saga are successfully completed, or
compensating transactions are run to recover from a partial execution.
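A saga's compensation behaviour can be sketched as follows in Python; the flight/hotel steps are invented and the raised exception stands in for a failed subtransaction.

```python
# Saga sketch: run each step in order; if a step fails, run the compensating
# transactions of all completed steps in reverse order.
def run_saga(steps):
    done = []
    for action, compensate in steps:
        try:
            action()
            done.append(compensate)
        except Exception:
            for comp in reversed(done):
                comp()        # undo the partial execution
            return False
    return True

log = []

def book_hotel():
    raise RuntimeError("hotel full")  # this subtransaction fails

ok = run_saga([
    (lambda: log.append("book flight"), lambda: log.append("cancel flight")),
    (book_hotel,                        lambda: log.append("cancel hotel")),
])
print(ok, log)  # False ['book flight', 'cancel flight']
```

The flight booking completed, so its compensating transaction runs; the hotel step never completed, so its compensator does not.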
15. How the Concurrency Control is implemented in OODBMS?
Concurrency control in an OODBMS can be implemented using a multiversion
concurrency control protocol.
16.List the basic architecture for client server DBMS?
The three basic architectures for a client-server DBMS are:
Object server
Page server
Database server
17. Define POSTGRES?
POSTGRES is a research system from the designers of INGRES that attempts to extend
the relational model with abstract data types, procedures and rules.
18.What is a GEMSTONE?
GemStone is a product that extends an existing object-oriented
programming language with database capability.
It extends three languages: Smalltalk, C++ and Java.
19.What is OQL?
OQL - Object Query Language
An OQL query is a function that delivers an object whose type may be inferred
from the operators contributing to the query expression.
OQL is used for both associative and navigational access.
20. Advantage and Disadvantage of OODBMS?
Advantages:
Enriched modeling capabilities
Extensibility
Removal of impedance mismatch
Improved performance
Disadvantages:
Lack of a universal data model
Lack of experience
Lack of standards
Complexity
UNIT IV
1.Define Data Mining.
The process of extracting valid, previously unknown, comprehensible
and actionable information from large databases, and using it to make crucial business
decisions.
2.List the different steps in data mining.
Data cleaning
Data integration
Data selection
Data transformation
Data mining
Pattern evaluation
Knowledge presentation
3.Define Classification.
Classification is used to assign a specific, predetermined class to each record
in a database from a finite set of possible class values.
4. Define Clustering.
Clustering can be considered the most important unsupervised learning problem.
A cluster is a collection of objects that are "similar" to one another and
"dissimilar" to the objects belonging to other clusters.
5.Define data warehousing.
A subject-oriented, integrated, time-variant and nonvolatile collection of data in
support of management's decision-making process.
6.Define web database.
A database used for web applications, which use an architecture called the
three-tier architecture, consisting of a web browser, a web server and a database server.
7.Define mobile database.
A database that is portable and physically separate from a centralized database
server, but is capable of communicating with that server from remote sites, allowing the
sharing of corporate data.
8.Define upflow.
Upflow means adding value to the data in the data warehouse through summarizing,
packaging and distribution of the data.
9.Define downflow.
Downflow means archiving and backing up the data in the warehouse.
10.What are the different groups of end user access tools?
Reporting and query tools.
Application development tools.
Executive information system tools.
Online analytical processing tools.
Data mining tools.
11.What are the four main operations associated with data mining techniques.
1. Predictive modeling.
2. Database segmentation.
3. Link analysis.
4. Deviation detection.
12.Define outliers.
Outliers are values that express deviation from previously known
expectations and norms.
13.List the benefits of data warehousing.
1. Potential high returns on investment.
2. Competitive advantage.
3. Increased productivity of corporate decision makers.
14.Define XML.
The basic object in XML is the XML document. Two main structuring concepts
are used to construct an XML document: elements and attributes. Attributes in XML
provide additional information that describes elements.
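The element/attribute distinction can be seen with the standard library's XML parser; the book document below is an invented example.

```python
# Elements vs attributes in an XML document, using the standard library.
import xml.etree.ElementTree as ET

doc = ET.fromstring(
    '<book isbn="1-55860-456-1">'       # isbn is an attribute of <book>
    '<title>Advanced Databases</title>'  # title is a nested element
    '</book>'
)
print(doc.tag, doc.attrib["isbn"], doc.find("title").text)
# book 1-55860-456-1 Advanced Databases
```

The attribute (`isbn`) describes the `book` element itself, while the `title` element carries structured content nested inside it.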
15.What are the uses of DTD?
A DTD gives an overview of the structure of an XML document. It specifies the
elements and their nesting structure.
16.Define data mart.
Data marts generally are targeted to a subset of the organization, such as a
department and are more tightly focused.
17.Define client/server model.
The client-server model is a two-tier architecture consisting of two tiers, namely
client and server. The client performs presentation services and the server performs data
services. The client is called a fat client because it requires more resources.
18.List the use of data mining tools.
Data preparation.
Selection of data mining operation.
To provide scalability and improve performance.
Facilities for visualization of result.
19.Define OLAP.
OLAP is a term used to describe the analysis of complex data from the
data warehouse. OLAP tools use distributed computing capabilities for analyses that
require more storage and processing power.
20.List the problems of data warehousing.
Project management is an important and challenging consideration
that should not be underestimated.
Administration of a data warehouse is an intensive enterprise,
proportional to the size and complexity of the data warehouse.
21.List some examples of data mining application.
Marketing.
Finance.
Manufacturing.
Health care.
UNIT V
1.Define deductive database.
A deductive database includes capabilities to define (deductive) rules, which
can deduce or infer additional information from the facts that are stored in the database.
Because part of the theoretical foundation for some deductive database systems is
mathematical logic, such databases are often referred to as logic databases.
2.Define spatial database.
Spatial databases provide concepts for databases that keep track of objects in a
multidimensional space.
3.Define multimedia database.
Multimedia databases provide features that allow users to store and query different
types of multimedia information, including images (such as photos or drawings), video clips
(such as movies, newsreels, or home videos), audio clips (such as songs, phone messages,
or speeches), and documents (such as books or articles).
4.List the different spatial query types.
The different spatial query types are
1. Range query
2. Nearest neighbor query
3. Spatial joins or overlays.
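The range query and nearest-neighbour query can be sketched over plain 2-D points; the city coordinates are invented, and a linear scan stands in for a real spatial index such as an R-tree.

```python
# Range query and nearest-neighbour query over 2-D points (linear scan;
# a spatial index would avoid examining every point).
import math

cities = {"A": (0, 0), "B": (3, 4), "C": (10, 1)}

def range_query(points, center, radius):
    # All points within the given distance of the center.
    cx, cy = center
    return {n for n, (x, y) in points.items()
            if math.hypot(x - cx, y - cy) <= radius}

def nearest_neighbor(points, query):
    # The single point closest to the query location.
    qx, qy = query
    return min(points, key=lambda n: math.hypot(points[n][0] - qx,
                                                points[n][1] - qy))

print(sorted(range_query(cities, (0, 0), 6)))  # ['A', 'B']
print(nearest_neighbor(cities, (9, 0)))        # C
```

A spatial join would pair points from two such sets whose distance falls under a threshold, generalizing the range query to two relations.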
5. Define inference engine.
An inference engine (or deductive mechanism) within the system can deduce new
facts from the database by interpreting these rules. The model used for deductive
databases is closely related to the relational data model, and particularly to the domain
relational calculus formalism. It is related to the field of logic programming and the
prolog language.
6.Example for spatial database.
An example of a spatial database is a cartographic database that stores maps:
it includes two-dimensional spatial descriptions of its objects, from countries and states to
rivers, cities, roads, seas and so on. These applications are also known as Geographical
Information Systems (GIS), and are used in areas such as environmental, emergency, and
battle management. Other databases, such as meteorological databases for weather
information, are three-dimensional, since temperatures and other meteorological
information are related to three-dimensional spatial points.
7. Define active database.
Active databases provide additional functionality for specifying
active rules. These rules can be automatically triggered by events that occur, such as
database updates or certain times being reached, and can initiate certain actions that have
been specified in the rule declaration if certain conditions are met.
8. Example for multimedia database.
For example, one may want to locate all video clips in a video database that
include a certain person, say Bill Clinton. One may also want to retrieve video clips
based on certain activities included in them, such as video clips where a soccer goal is
scored by a certain player or team.
9. Define Quad trees.
Quad trees generally divide each space or subspace into equally sized areas, and
proceed with the subdivisions of each subspace to identify the positions of various
objects.
10. What are the two main methods of defining the truth values of predicates in actual
datalog programs?
There are two main methods of defining the truth values of predicates in actual
Datalog programs:
1. Fact-defined predicates (or relations)
2. Rule-defined predicates (or views).
11. What is Fact-defined predicates?
Fact-defined predicates (or relations) are defined by listing all the combinations of
values (the tuples) that make the predicate true. These correspond to base relations
whose contents are stored in the database system.
12. What is Rule-defined predicates?
Rule-defined predicates (or views) are defined by being the head of one or more
Datalog rules; they correspond to virtual relations whose contents can be inferred by the
inference engine.
13. What is the use of relational operations?
It is straightforward to specify many operations of the relational algebra in the
form of Datalog rules that define the result of applying these operations on the database
relations (fact predicates). This means that relational queries and views can easily be
specified in Datalog.
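As a sketch, a rule-defined predicate can be evaluated bottom-up over fact relations in Python; the supervise facts and the recursive superior rule are an invented example in the spirit of Datalog.

```python
# Facts: supervise(X, Y). Rules, Datalog-style:
#   superior(X, Y) :- supervise(X, Y).
#   superior(X, Y) :- supervise(X, Z), superior(Z, Y).
supervise = {("james", "franklin"), ("franklin", "john")}

def superior(facts):
    # Naive bottom-up evaluation: apply the recursive rule until no new
    # tuples can be inferred (a fixpoint is reached).
    result = set(facts)
    changed = True
    while changed:
        changed = False
        new = {(x, y) for x, z in facts for z2, y in result if z == z2}
        added = new - result
        if added:
            result |= added
            changed = True
    return result

print(sorted(superior(supervise)))
# [('franklin', 'john'), ('james', 'franklin'), ('james', 'john')]
```

The derived tuple ('james', 'john') exists only in the rule-defined (virtual) relation; it is inferred by the evaluation, not stored as a fact.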
14. What are the characteristics of Nature of Multimedia Applications?
Applications may be categorized based on their data management characteristics
as follows:
1. Repository applications
2. Presentation applications
3. Collaborative work using multimedia information.
15. What are the terms included in multimedia information systems?
Multimedia Information Systems are complex, and embrace a large set of issues,
including the following:
1. Modeling
2. Design
3. Storage
4. Queries and retrieval
5. Performance
16. What are the different characteristics of Hypermedia links or hyperlinks?
1. Links can be specified with or without associated information, and they may
have large descriptions associated with them.
2. Links can start from a specific point within a node or from the whole node.
3. Links can be directional, or nondirectional when they can be traversed in
either direction.
17. What are the applications of multimedia database?
1. Documents and records management
2. Knowledge dissemination
3. Education and training
4. Marketing, advertising, retailing, entertainment, and travel
5. Real-time and monitoring.
18. What are the three main possibilities for rule consideration?
There are the three main possibilities for rule consideration:
1. Immediate consideration
2. Deferred consideration
3. Detached consideration
19. What are Horn clauses?
In Datalog, rules are expressed as a restricted form of clauses called Horn
clauses, in which a clause can contain at most one positive literal.
20. What are the two alternatives for interpreting the theoretical meaning of rules?
There are two main alternatives for interpreting the theoretical meaning of rules:
1. Proof-theoretic
2. Model-theoretic.
16 MARKS
UNIT I
1. Explain the different phases in query processing.
Query processing comprises the activities involved in retrieving data from the
database. The different phases involved in query processing are
i. Query decomposition
ii. Query optimization
iii. Code generation
iv. Runtime query execution
Query decomposition:
It transforms a high-level language query into a relational algebra expression and
checks whether the query is syntactically and semantically correct.
The different stages of query decomposition are
Analysis
Normalization
Semantic analysis
Simplification
Query restructuring
Query optimization:
Query optimization is the activity of choosing an efficient execution
strategy for processing a query. It is of two types:
1. Dynamic query optimization
2. Static query optimization
Heuristical approach to query processing:
It uses transformation rules to convert one relational
algebra expression into an equivalent form that is more efficient.
Transformation rules for relational algebra operation
Heuristical processing strategy
2. Explain the heuristical approach to query optimization.
It uses transformation rules to convert one relational algebra
expression into an equivalent form that is more efficient.
Transformation rules for relational algebra operation
- write 12 rules
Heuristical processing strategy
- write 5 strategies
3. Explain the problems caused by concurrency control.
The process of managing simultaneous operations on the database without
having them interfere with one another is called concurrency control. The
problems caused by concurrency control are
i. Lost update problem
ii. Uncommitted dependency problem
iii. Inconsistent analysis problem
i. Lost update problem:
An apparently successfully completed update operation by one
user can be overridden by another user.
ii. Uncommitted dependency problem(Dirty read problem):
It occurs when one transaction is allowed to see
intermediate results of another transaction before it has
committed.
iii. Inconsistent analysis problem:
It occurs when a transaction reads several values from the
database while a second transaction updates some of them
during the execution of the first.
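The lost update problem can be made concrete with a small deterministic simulation; the schedule, transaction names and balance values below are invented for illustration.

```python
# A deterministic simulation of the lost update problem: two transactions
# interleave read and write steps on a shared balance.

def run_schedule(steps, db):
    """Apply (transaction, op) steps in order against a shared db dict."""
    local = {}  # each transaction's private workspace
    for txn, op in steps:
        if op == "read":
            local[txn] = db["balance"]
        elif op == "add_100":
            local[txn] += 100
        elif op == "write":
            db["balance"] = local[txn]
    return db["balance"]

# Interleaved schedule: T2 reads before T1 writes, so T1's update is lost.
bad = run_schedule(
    [("T1", "read"), ("T2", "read"),
     ("T1", "add_100"), ("T1", "write"),
     ("T2", "add_100"), ("T2", "write")],
    {"balance": 500})
assert bad == 600  # T1's +100 was overwritten

# Serial schedule: T2 starts after T1 finishes, so both updates survive.
good = run_schedule(
    [("T1", "read"), ("T1", "add_100"), ("T1", "write"),
     ("T2", "read"), ("T2", "add_100"), ("T2", "write")],
    {"balance": 500})
assert good == 700
```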
4. Explain the different steps of using locks and how concurrency control
problems can be prevented using 2PL?
Steps of using locks:
Any transaction that needs to access a data item must first lock the
item.
If the item is not already locked by another transaction, the lock
will be granted.
If the item is currently locked, the DBMS determines whether the
request is compatible with the existing lock.
A transaction continues to hold a lock until it explicitly releases it
either during execution or when it terminates.
Two-Phase Locking (2PL):
A transaction follows the 2PL protocol if all locking operations
precede the first unlock operation in the transaction. The two
phases are
Growing phase
Shrinking phase
Preventing the lost update problem using 2PL
Preventing the uncommitted dependency problem using 2PL
Preventing the inconsistent analysis problem using 2PL
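A minimal sketch of the 2PL rule itself, assuming a single in-memory transaction object (no real lock manager or conflict detection; all names are illustrative):

```python
# Two-phase locking rule: locks may be acquired only in the growing
# phase; the first unlock starts the shrinking phase, after which any
# further lock request violates 2PL.

class TwoPLTransaction:
    def __init__(self, name):
        self.name = name
        self.shrinking = False   # becomes True after the first unlock
        self.held = set()

    def lock(self, item):
        if self.shrinking:
            raise RuntimeError("2PL violation: lock after first unlock")
        self.held.add(item)

    def unlock(self, item):
        self.held.discard(item)
        self.shrinking = True    # growing phase is over

t = TwoPLTransaction("T1")
t.lock("x")
t.lock("y")      # still growing: allowed
t.unlock("x")    # shrinking phase begins
try:
    t.lock("z")  # violates 2PL
    violated = False
except RuntimeError:
    violated = True
assert violated
```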
5. Explain the basic timestamp ordering protocol and Thomas write rule.
Timestamp is a unique identifier created by the DBMS that indicates the
relative starting time of a transaction.
Timestamping is a concurrency control protocol that orders transactions in such a way
that an older transaction with a smaller timestamp gets priority in the event of conflict. The
basic timestamp ordering protocol works as follows:
1. The transaction T issues a read(x)
a) ts(T)<write_timestamp(x)
Transaction T is aborted and restarted with a new timestamp.
b) ts(T)>=write_timestamp(x)
Read operation can proceed and set
read_timestamp(x)=max(ts(T),read_timestamp(x))
2. The transaction T issues a write(x)
a) ts(T)<read_timestamp(x)
Transaction T is rolled back and restarted with a new timestamp.
b) ts(T)<write_timestamp(x)
Transaction T is rolled back and restarted with a new timestamp.
c) Otherwise,
Write operation can proceed and set write_timestamp(x)=ts(T).
Thomas Write Rule:
a) ts(T)<read_timestamp(x)
Transaction T is rolled back and restarted with a new timestamp.
b) ts(T)<write_timestamp(x)
Ignore the write operation.
c) Otherwise,
Write operation can proceed and set write_timestamp(x)=ts(T).
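The read/write tests above can be sketched as two small functions; each item's read/write timestamps live in a dict, and all timestamp values are illustrative.

```python
# Basic timestamp ordering, with the Thomas write rule as a variant:
# conflicting operations either proceed, force a restart, or (Thomas)
# silently skip an obsolete write.

def read(ts, item):
    """Return True if the read proceeds, False if T must restart."""
    if ts < item["wts"]:              # a younger txn already wrote x
        return False                  # abort and restart with new timestamp
    item["rts"] = max(item["rts"], ts)
    return True

def write(ts, item, thomas=False):
    """Return True if T continues, False if T must restart."""
    if ts < item["rts"]:              # a younger txn already read x
        return False
    if ts < item["wts"]:              # a younger txn already wrote x
        return thomas                 # Thomas: ignore the obsolete write
    item["wts"] = ts
    return True

x = {"rts": 0, "wts": 0}
assert read(5, x)                  # ok; read_timestamp(x) becomes 5
assert not write(3, x)             # basic TO: ts(T)=3 < rts=5, restart

x2 = {"rts": 0, "wts": 10}
assert write(7, x2, thomas=True)   # Thomas write rule: write ignored
assert x2["wts"] == 10             # the obsolete write left no trace
```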
6. Explain 1NF, 2NF, 3NF, BCNF, 4NF and 5NF with examples.
First Normal Form (1NF):
A relation in which the intersection of each row and column contains one
and only one value.
Second Normal Form (2NF):
A relation that is in 1NF and every non-primary key attribute is fully
functionally dependent on the primary key.
Third Normal Form (3NF):
A relation that is in 1NF and 2NF, and in which no non-primary key
attribute is transitively dependent on the primary key.
Boyce-Codd Normal Form (BCNF):
A relation is in BCNF, if and only if, every determinant is a candidate key.
Fourth Normal Form (4NF):
A relation that is in BCNF and contains no nontrivial multi-valued
dependencies.
Fifth Normal Form (5NF):
A relation that has no join dependency.
UNIT II
1.Explain the types of fragmentation.
Fragmentation definition – 3 Correctness rule – completeness – Reconstruction-
Disjointness – 4 types - Horizontal fragmentation explanation with eg – Vertical
fragmentation explanation with eg– Mixed fragmentation explanation with eg – Derived
fragmentation explanation with eg.
2.Explain the different types of locking protocols in DDBMS.
Locking protocols definition – Ensure serializability – 4 types of locking
protocols – Centralized – lock manager is centralized – 5 messages - adv and disadv –
Primary copy 2 PL – Many lock managers are available – adv and disadv – Distributed 2
PL – Lock manager is distributed in every site - adv and disadv – Majority locking- adv
and disadv.
3.Explain the distributed deadlock management.
Distributed deadlock management definition – transaction waiting for another
transaction – Wait For Graph(WFG) – Local Wait For Graph – Combined Wait For
Graph – Handling deadlocks – Centralized definition with advantage and disadvantage
– Hierarchical definition with advantage and disadvantage - Distributed definition with
advantage and disadvantage.
4.Explain the reference architecture for DDBMS and Component architecture for
DDBMS
Reference Architecture
Diagram - Global external schema, fragmentation schema, allocation schema,
global conceptual schema, local mapping schema
Component Architecture
Diagram – 4 major components – Local DBMS (LDBMS) – Data
Communication Components(DC) – Global System Catalog – Distributed DBMS
5.Explain the phases of 2 PC.
2 PC – Blocking protocol – coordinator and participant definition – Two phases
of 2 PC – Voting phase – explanation - Decision phase – explanation- Procedure for
coordinator and procedure for participants
7.Give notes on Distributed Transaction Management
Distributed transaction management definition – Modules – Transaction manager,
Scheduler, Recovery Manager, Buffer Manager – Locking Manager - Data
Communication component – Procedure to execute a global transaction initiated at site
S1 – Distributed concurrency control – Concurrency control problem – Distributed
serializability.
8.Explain about 3 PC
3 PC definition – Non blocking protocol – Coordinator – states of coordinator –
Initial, Waiting, Decided – Participant – states of participant – Initial, Prepared,
Precommit, abort, commit.
UNIT III
1.Explain the schemes for implementing persistence?
*the DBMS must provide for the storage of persistent objects
*there are 3 schemes for implementing persistence
*they are
-checkpointing
-serialization
-explicit paging
checkpointing:
*copy all or part of the program's address space to secondary storage
*if the complete address space is saved, the program can be restarted from the
checkpoint
*in other cases only the program's heap is saved
drawbacks:
*the data can be used only by the program that created it
*it may save a large amount of data that is not useful
serialization:
*implements persistence by copying the closure of the data items to disk
*reading back this flattened data structure produces a new copy of the original data
*called serialization, pickling or, in a distributed computing context, marshalling
drawbacks:
*does not preserve object identity
*it is not incremental
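Python's pickle module is one concrete instance of this serialization scheme, and it also demonstrates the first drawback (loss of object identity):

```python
# The closure of an object graph is flattened to bytes; reading it back
# produces a NEW copy, so object identity is not preserved.

import pickle

class Node:
    def __init__(self, value):
        self.value = value
        self.next = None

a = Node(1)
a.next = Node(2)          # a small object graph (its "closure")

flat = pickle.dumps(a)    # flatten the whole reachable graph to bytes
copy = pickle.loads(flat) # rebuild a fresh copy from the flat form

assert copy.value == 1 and copy.next.value == 2  # structure preserved
assert copy is not a                             # identity is NOT preserved
```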
explicit paging:
*involves the application programmer explicitly paging objects between the
application heap and the persistent store
*reachability-based persistence means that an object will persist if it is reachable
from a persistent root object
*the programmer does not need to decide at object creation time whether the object
should be persistent
*after creation an object can become persistent by adding it to the reachability
tree
*allocation-based persistence means that an object is made persistent only if it is
explicitly declared as such within the application program
*by class - a class is statically declared to be persistent and all instances of the
class are made persistent when they are created
*alternatively a class may be a subclass of a system-supplied persistent class
2.Explain the classification of pointer swizzling techniques?
*pointer swizzling is the action of converting object identifiers (OIDs) to main
memory pointers, and back again
*the aim of pointer swizzling is to optimize access to objects
*if we read an object from secondary storage into the database cache, we should
be able to locate any referenced objects on secondary storage using their OIDs
*we want to record which objects are held in main memory
*pointer swizzling attempts to provide a more efficient strategy by storing the
main memory pointers in place of OIDs
no swizzling:
the earliest implementation of faulting objects into memory
objects are faulted into memory by the underlying object manager
a handle is passed back to the application
the system maintains some sort of lookup table so that the object's virtual memory
pointer can be located and then used to access the object
suitable when an application tends to access an object only once
Moss proposed an analytical model for evaluating the conditions in which
swizzling is appropriate
classification of pointer swizzling:
techniques can be classified along three dimensions
copy vs in-place swizzling
eager vs lazy swizzling
direct vs indirect swizzling
copy vs in-place swizzling:
data can either be copied into the application's local object cache or it can be
accessed in place within the object manager's database cache
with the copy approach, only modified objects have to be swizzled back to their OIDs
the in-place technique may have to unswizzle an entire page of objects if one
object on the page is modified
with the copy approach, every object must be explicitly copied into the local
object cache
eager vs lazy swizzling:
Moss and Eliot define eager swizzling as swizzling all OIDs in all data pages
used by the application before any objects can be accessed
Kemper and Kossmann provide a more relaxed definition,
restricting the swizzling to all persistent OIDs within the objects the application
wishes to access
lazy swizzling involves less overhead when an object is faulted into memory, but
it does mean that two different types of pointer must be handled for every object
access
direct vs indirect swizzling:
this is an issue only when it is possible for a swizzled pointer to refer to an
object that is no longer in virtual memory
in direct swizzling the virtual memory pointer of the referenced object is placed
directly in the swizzled pointer
in indirect swizzling the virtual memory pointer is placed in an intermediate
object, which acts as a placeholder for the actual object
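A toy sketch of copy-based, lazy, direct swizzling, assuming an invented on-disk layout where references are stored as integer OIDs; none of these names come from a real system.

```python
# References are stored on "disk" as OIDs; on first access an OID is
# faulted in and replaced (swizzled) by a direct in-memory reference,
# so later accesses skip the lookup table.

disk = {                      # secondary storage keyed by OID
    1: {"name": "order-1", "ref": 2},      # "ref" holds an OID
    2: {"name": "customer-7", "ref": None},
}
cache = {}                    # OID -> in-memory object (the object cache)

def fault_in(oid):
    """Load an object into the cache if not already resident."""
    if oid not in cache:
        cache[oid] = dict(disk[oid])       # copy into the object cache
    return cache[oid]

def follow(obj):
    """Dereference obj['ref'], swizzling the OID on first use."""
    if isinstance(obj["ref"], int):        # still an unswizzled OID
        obj["ref"] = fault_in(obj["ref"])  # replace with direct pointer
    return obj["ref"]

root = fault_in(1)
target = follow(root)          # first access: lazy swizzle happens here
assert target["name"] == "customer-7"
assert follow(root) is target  # later accesses use the direct pointer
```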
3.Explain the locking protocols?
Centralized 2PL:
* with the centralized 2PL protocol there is a single site that maintains all locking
information
* there is only one scheduler, or lock manager, for the whole of the distributed DBMS
that can grant and release locks
* the transaction coordinator at site S1 divides the transaction into a number of
subtransactions, using information held in the global system catalog
* the coordinator has responsibility for ensuring that consistency is maintained
* if a data item is replicated, the coordinator must ensure that all copies of the data
item are updated
* thus the coordinator requests exclusive locks on all copies
* the local transaction managers involved in the global transaction request and release
locks
* the advantage of centralized 2PL is that the implementation is straightforward
* deadlock detection is no more difficult than that of a centralized DBMS
* the disadvantages of centralization in a distributed DBMS are bottlenecks and
lower reliability
* for example, a global update operation that has agents (subtransactions) at n sites may
require a minimum of 2n+3 messages with a centralized lock manager:
-1 lock request;
-1 lock grant message;
-n update messages;
-n acknowledgements;
-1 unlock request.
Primary copy 2PL
* distributes the lock managers to a number of sites
* each lock manager is then responsible for managing the locks for a set of data items
* for each replicated data item one copy is chosen as the primary copy
* the other copies are called slave copies
* the choice of primary site is flexible
* the site that is chosen to manage the locks for a primary copy need not hold the
primary copy of the item
* the protocol is a straightforward extension of centralized 2PL: lock requests are
sent to the appropriate lock manager for the primary copy
* only the primary copy need be locked for an update, so reading a slave copy may
return an out-of-date value
* the disadvantages of this approach are that deadlock handling is more complex and
that lock requests for a specific primary copy can be handled only by one site,
unless backup sites hold the locking information
* the advantages are lower communication costs and better performance than
centralized 2PL
Distributed 2PL
* distributes the lock managers to every site, each responsible for the locks on the
data at its own site
* if the data is not replicated the protocol is equivalent to primary copy 2PL;
otherwise distributed 2PL implements a Read-One-Write-All (ROWA) replica control
protocol
* deadlock handling is more complex, and a global update with agents at n sites may
require a minimum of 5n messages:
-n lock request messages
-n lock grant messages
-n update messages
-n acknowledgements
-n unlock requests
Majority Locking
* avoids the need to lock all copies of a replicated item before an update
* when a transaction receives a majority of the grants, it has the lock and informs all
the sites that it has the lock
* the disadvantages are that the protocol is more complicated, and that there must be
at least (n+1)/2 messages for lock requests and (n+1)/2 messages for unlock requests
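The majority granting rule reduces to a simple quorum test; this sketch assumes n replica sites and counts grants only (no messages are modelled).

```python
# Majority locking: with n copies of an item, the lock is held only when
# strictly more than half of the n sites have granted it.

def majority_granted(grants, n):
    """True when the granting sites form a majority of the n copies."""
    return grants > n // 2

n = 5                               # five replicated copies
assert not majority_granted(2, n)   # 2 of 5 is not a majority
assert majority_granted(3, n)       # 3 of 5 meets the (n+1)/2 threshold
```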
4.Explain the strategies for developing an OODBMS
*extend an existing object-oriented programming language with database capabilities
- traditional database capabilities are added; the languages used are Smalltalk, C++
or Java; this approach is taken by GemStone
*provide extensible object-oriented DBMS libraries
- class libraries are provided that support persistence, aggregation, data types,
transactions and concurrency; this approach is taken by Ontos, Versant and ObjectStore
*embed object-oriented database language constructs in a conventional host language
- similar to how SQL can be embedded in a conventional host language; this approach
is taken by O2
*extend an existing database language with object-oriented capabilities
- this approach is being pursued by both RDBMS and OODBMS vendors, such as Ontos
and Versant
*develop a novel database data model/data language
- start from the beginning and develop an entirely new database; this approach is
taken by SIM (Semantic Information Manager)
5) Give notes on (i)nested transaction model (ii)sagas transaction model (iii)multilevel
transaction model
(i)nested transaction model:-
Introduced by Moss
The complete transaction is depicted as a tree or a hierarchy of
subtransactions
The top-level transaction can have a number of child transactions, and a child
transaction can also have nested transactions
Eg:-
Transactions have to commit from the bottom up
A transaction abort at one level does not affect transactions in progress at
a higher level
Instead a parent is allowed to perform its own recovery
Different ways for recovery:-
abort the transaction
ignore the failure, in which case the subtransaction is called non-vital
retry the subtransaction
run an alternative subtransaction, called a contingency
subtransaction
Advantages:-
modularity
granularity
intra transaction parallelism
intra transaction recovery
(ii)sagas:-
a saga is a sequence of flat transactions that can be interleaved with other
transactions
sagas are based on the use of compensating transactions
the DBMS guarantees that either all the transactions in the saga are
successfully completed or compensating transactions are run to
recover from partial execution
if we have a saga comprising a sequence of n transactions
T1,T2,...,Tn with compensating transactions C1,C2,...,Cn, the
final outcome is T1,T2,...,Tn if the saga completes
successfully, or T1,T2,...,Ti,Ci-1,...,C2,C1 if transaction Ti fails
sagas relax the isolation property
it may be difficult to define a compensating transaction in advance
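The saga outcome described above (run T1..Tn, or run the compensations in reverse on failure) can be sketched as a small coordinator loop; the transaction names are illustrative.

```python
# A saga coordinator: each step is a (transaction, compensation) pair.
# If some Ti fails, the compensations for every committed step run in
# reverse order, undoing the partial execution.

def run_saga(steps):
    """steps: list of (transaction, compensation) callables.
    Returns the log of everything executed."""
    log, done = [], []
    for txn, comp in steps:
        try:
            txn(log)
            done.append(comp)
        except Exception:
            for comp in reversed(done):   # compensate: Ci-1, ..., C1
                comp(log)
            break
    return log

def ok(name):
    return lambda log: log.append(name)

def fail(name):
    def t(log):
        raise RuntimeError(name)
    return t

# T3 fails, so C2 and C1 compensate the partial execution.
log = run_saga([(ok("T1"), ok("C1")),
                (ok("T2"), ok("C2")),
                (fail("T3"), ok("C3"))])
assert log == ["T1", "T2", "C2", "C1"]
```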
(iii)multilevel transaction model:-
2 types
(i)closed nested transaction
(ii)open nested transaction
6)Explain ODMG model?
ODMG - Object Data Management Group
The ODMG object model is a superset of the OM (object model), which enables
both designs and implementations to be ported between compliant systems
Basic modelling primitives:-
(i)objects
(ii)literals
(i)objects:-
Described by 4 characteristics
(i)object structure
(ii)object identifier
(iii)object name
(iv)object lifetime
(ii)literals:-
Decomposed into
(i)atomic
(ii)collections
(iii)structured
(iv)null
Two types of collection: (i) ordered (ii) unordered
5 different built-in collection types
(i)set
(ii)bag
(iii)list
(iv)array
(v)dictionary
7)Explain the features of postgres?
Postgres is a research database system designed to be a successor to the INGRES
RDBMS
Objectives:-
(i)to provide better support for complex objects
(ii)to provide user extensibility for data types,operators & access methods
(iii) to provide active db facilities and inferencing support
(iv)to simplify the DBMS code for crash recovery
(v)to make as few changes as possible to the relational model
Postgres extends the relational model to include the following mechanism
(i)abstract datatypes
(ii)data of type procedure
(iii)rules
GEMSTONE:
The GemStone server consists of a single database monitor process, or Stone,
and one or more Gem processes
A Gem process incorporates a data management kernel and an interpreter for
OPAL
A GemStone configuration may include multiple hosts in the network, with
clients simultaneously accessing the database on multiple hosts
The user interface to each Gem process is provided by a program running as a
separate process on the host machine
A special-purpose GemStone interface, the OPAL programming
environment, is provided for application development
The GemStone architecture is a client-server architecture
The Stone process allocates object identifiers and performs transaction
management
FEATURES:
GemStone is a highly scalable client-multiserver database for commercial
applications.
GemStone's features include:
Server Smalltalk
Concurrent Support for Multiple Languages
Flexible multi-user transaction control
Object-level security
Dynamic schema and object evolution
Production Services
Scalability
Legacy Gateways
Developer Tools
Database Administration Tools
MANAGEMENT FUNCTIONS:
GemStone sessions access the database directly, thus eliminating a
potential performance bottleneck
It provides the facilities necessary to allow external application programs
written in C or Smalltalk to access GemStone
(Gemstone objectNamed: 'objectname') asLocalObject
localBolt <- (Gemstone objectNamed: 'Bolt') asLocalObject
localBolt remotePerform: #partName
The first message has the selector remotePerform: with the literal part name.
UNIT – IV
1.Explain the operations associated with data mining techniques?
There are 4 main operations associated with data mining. They are:
Predictive Modeling
Database Segmentation
Link Analysis
Deviation Detection
Predictive Modeling:
The model is developed using a supervised learning approach. There are 2
phases:
i) Training
ii) Testing
There are 2 steps associated with predictive modeling:
i) Classification
ii) Value prediction
Classification:
This is used to establish a specific, predetermined class for each record in a database from
a finite set of possible class values.
2 specializations of classification:
i) Tree induction
ii) Neural induction
Tree induction:
Creates a decision tree that, for example, identifies likely customers as those who
have rented for more than 2 years and are over 25 years old.
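A hand-written stand-in for such a decision tree, using the two splits from the example above (a real tree would be induced from training data, and the labels here are invented):

```python
# A two-level decision tree as plain nested conditions: split first on
# rental tenure, then on age, exactly as in the example.

def predict(customer):
    if customer["years_rented"] > 2:
        if customer["age"] > 25:
            return "likely renter"
        return "unlikely"
    return "unlikely"

assert predict({"years_rented": 3, "age": 30}) == "likely renter"
assert predict({"years_rented": 3, "age": 22}) == "unlikely"
assert predict({"years_rented": 1, "age": 40}) == "unlikely"
```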
Neural induction:
Constructs a neural network containing a collection of input, processing and output nodes.
Value Prediction:
Used to estimate a continuous numerical value that is associated with the database
records.
2 techniques:
i) Linear regression
ii) Non linear regression
Database Segmentation:
Partitions a database into an unknown number of segments, or clusters, of similar
records.
Link analysis:
There are 3 specializations of link analysis:
i) Association discovery
ii) Sequential pattern discovery
iii) Time sequence discovery
Deviation Detection:
This is a source of true discovery because it identifies outliers.
2.Describe the Data Warehouse architecture.
The components are:
Operational data:
The source of data for the data warehouse is supplied from mainframe operational
data.
Mainframe operational source:
Held in hierarchical and network databases.
Departmental operational data:
The data that is available in relational DBMS.
Private data:
Data available in private database.
External System:
Data held in external system.Eg.Internet
Operational Data Store (ODS):
Data from different sources is stored in the ODS, which is mainly used for
analysing the data.
Load Manager:
It gets the data directly from the operational data sources or from the ODS.
Warehouse Manager:
Performs the management of the data in the data warehouse.
Query Manager:
Performs the management of user queries. Query profiles are used to determine which
indexes and aggregations are appropriate.
Detailed Data:
This area of warehouse stores all the detailed data in the database schema.
Purpose of summary information:
It is to speed up the performance of queries.
Archive/Backup data:
Both detailed and summarized data are stored as backup data that is used for
recovery.
Metadata:
This area of the warehouse stores all the metadata, i.e. data about data.
End-user access tools:
Users interact with the data warehouse using end-user access tools.
Data warehouse and data flows:
There are 5 primary data flows:
Inflow
Outflow
Upflow
Downflow
Metaflow
3.Explain Mobile DataBase with Diagram.
A database that is portable and physically separate from a centralised database
server but is capable of communicating with that server from remote sites allowing the
sharing of corporate data.
Architecture of Mobile Database Environment:
Components:
1) Corporate database server and DBMS:
Manages and stores corporate data and provides corporate applications.
2) Remote database and DBMS:
Manages and stores mobile data and provides mobile applications.
3) Mobile database platform:
Laptops, PDAs, or other Internet access devices.
A two-way communication link exists between the corporate and mobile DBMS.
Issues associated with mobile database:
i) Management of the mobile database
ii) Communication between the mobile and corporate databases
Additional functionality required by mobile DBMS:
1) Ability to communicate with centralized db server through modes such as wireless.
2) Ability to replicate data available on mobile device or centralized server.
3) Synchronize data on the centralized db server and mobile device.
4) Capture data from different sources
5) Manage data on the mobile device
6) Analyse the data on mobile device
7) Create customized mobile application
4.Explain Web DBMS Architecture.
The Web was developed at CERN in 1990.
Hyperlinks are used for moving from one page to another. A website is a collection of
web pages.
Web as a database application platform:
Integrating the Web with the DBMS.
Requirements for web dbms integration:
1) Security
2) Independent connectivity
3) Ability to interface to the database
4) A connectivity solution that takes advantage of all the features of an organisation's
DBMS
5) Open architecture support
6) It must provide scalability
7) Multiple HTTP request
8) Support for session and application based authentication
9) Acceptable performance
10) Minimal administration overhead
Web DBMS Architecture:
Traditionally it uses 2-tier client-server architecture.
Where,
Tier1-client
Tier2-server
The task performed by the client is presentation services.
The task performed by the server is data services.
Two problems in 2-tier:
i) A fat client
ii) A significant client-side administration overhead.
3-tier Architecture:
There are 3 layers.
Tier1-client
Tier2-business logic and data processing layer
Tier3-DBMS
The client is known as a thin client.
Application layer - performs the main processing.
Advantage of 3-tier architecture:
1) Less expensive hardware
2) Application maintenance is centralized
3) Load balancing is easier
The 3-tier architecture can be extended to an n-tier architecture. The application layer
is divided into i) application server ii) web server
5.Approaches to integrate web and DBMS.
1) Scripting languages such as VB script and java script
2) Using CGI
3) Using HTTP cookies
4) Extensions to the web server such as the Netscape API and Microsoft Internet
Information Server API
5) Java and JDBC,SQLJ,Servlets,JSP
6) Microsoft's Web Solution Platform with ASP and ActiveX Data Objects
7) Oracle's Internet Platform
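All of these approaches share one server-side pattern: a handler receives a request, queries the DBMS, and renders the rows as HTML. A minimal sketch using Python's sqlite3 as a stand-in for the corporate DBMS (table, data and handler are invented; the real HTTP layer is omitted):

```python
# One parameterised query turned into an HTML fragment, the core of a
# CGI script / servlet / server API extension.

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE staff (name TEXT, branch TEXT)")
conn.executemany("INSERT INTO staff VALUES (?, ?)",
                 [("Ann", "London"), ("Bob", "Leeds")])

def handle_request(branch):
    """Query the DBMS for one branch and render the result as HTML."""
    rows = conn.execute(
        "SELECT name FROM staff WHERE branch = ?", (branch,)).fetchall()
    items = "".join(f"<li>{name}</li>" for (name,) in rows)
    return f"<ul>{items}</ul>"

assert handle_request("London") == "<ul><li>Ann</li></ul>"
```

The parameterised `?` placeholder is the detail that matters for the security requirement listed above: the request value never reaches the SQL text directly.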
UNIT V
1.Explain the deductive database concepts.
Deductive database:
A deductive database includes capabilities to define (deductive) rules, which
can deduce or infer additional information from the facts that are stored in a database.
Because part of the theoretical foundation for some deductive database systems is
mathematical logic, such systems are often referred to as logic databases.
Inference engine:
An inference engine (or deductive mechanism) within the system can deduce new
facts from the database by interpreting these rules. The model used for deductive
databases is closely related to the relational data model, and particularly to the domain
relational calculus formalism. It is related to the field of logic programming and the
prolog language.
Horn Clauses
In Datalog, rules are expressed as a restricted form of clauses called Horn
Clauses, in which a clause can contain at most one positive literal.
Use of relational operations:
It is straightforward to specify many operations of the relational algebra in the
form of Datalog rules that define the result of applying these operations on the database
relations (fact predicates). This means that relational queries and views can easily be
specified in Datalog.
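A tiny illustration of such deductive rules: from stored parent facts, the recursive rule ancestor(X,Z) :- parent(X,Y), ancestor(Y,Z), together with the base rule that every parent is an ancestor, infers new facts. The naive fixpoint loop below is a sketch of what an inference engine does, not a real Datalog evaluator, and the facts are invented.

```python
# Fact predicates are sets of tuples; the inference engine repeatedly
# applies the recursive rule until no new facts are deduced (a fixpoint).

parent = {("abe", "homer"), ("homer", "bart")}

def infer_ancestors(parent_facts):
    ancestor = set(parent_facts)          # base rule: parents are ancestors
    while True:
        # ancestor(X,Z) :- parent(X,Y), ancestor(Y,Z)
        new = {(x, z) for (x, y) in parent_facts
                      for (y2, z) in ancestor if y == y2}
        if new <= ancestor:               # fixpoint reached
            return ancestor
        ancestor |= new

facts = infer_ancestors(parent)
assert ("abe", "bart") in facts           # deduced, not stored
```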
2.Explain multimedia database.
Multimedia database:
Multimedia databases provide features that allow users to store and query different types of
multimedia information, which includes images (such as photos or drawing), videoclips
(such as movies, newsreels, or home videos), audio clips (such as songs, phone messages,
or speeches), and documents (such as books or articles).
Multimedia Information Systems are complex, and embrace a large set of issues,
including the following:
1. Modeling
2. Design
3. Storage
4. Queries and retrieval
5. Performance
Applications of multimedia database:
1. Documents and records management
2. Knowledge dissemination
3. Education and training
4. Marketing, advertising, retailing, entertainment, and travel
5. Real-time and monitoring.
3. Explain in detail about spatial database.
Spatial database:
Spatial databases provide concepts for databases that keep track of objects in a
multi dimensional space.
Different types of spatial query:
The main types of spatial query are
1. Range query
2. Nearest neighbour query
Quad trees:
Quad trees generally divide each space or subspace into equally sized areas, and proceed
with the subdivisions of each subspace to identify the positions of various objects.
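A minimal point quadtree sketch matching this description; the node capacity and the coordinates below are invented for the example.

```python
# Each node covers a square region; when it holds more than CAPACITY
# points it splits into four equally sized quadrants, and the
# subdivision repeats in whichever quadrant fills up next.

CAPACITY = 2

class QuadTree:
    def __init__(self, x, y, size):
        self.x, self.y, self.size = x, y, size   # lower-left corner + side
        self.points = []
        self.children = None                     # four subtrees after split

    def insert(self, px, py):
        if self.children is None:
            self.points.append((px, py))
            if len(self.points) > CAPACITY:
                self._split()
        else:
            self._child_for(px, py).insert(px, py)

    def _split(self):
        h = self.size / 2
        self.children = [QuadTree(self.x + dx, self.y + dy, h)
                         for dx in (0, h) for dy in (0, h)]
        for px, py in self.points:               # redistribute points
            self._child_for(px, py).insert(px, py)
        self.points = []

    def _child_for(self, px, py):
        h = self.size / 2
        i = (2 if px >= self.x + h else 0) + (1 if py >= self.y + h else 0)
        return self.children[i]

t = QuadTree(0, 0, 100)
for p in [(10, 10), (80, 80), (20, 90), (60, 15)]:
    t.insert(*p)
assert t.children is not None   # the third insert forced a subdivision
```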
4.Explain in detail about active databases.
Active databases:
Active databases provide additional functionality for specifying active
rules. These rules can be automatically triggered by events that occur, such as database
updates or certain times being reached, and can initiate certain actions that have been
specified in the rule declaration to occur if certain conditions are met.
There are three main possibilities for rule consideration:
1. Immediate consideration
2. Deferred consideration
3. Detached consideration
Triggers - a technique for specifying certain types of active rules.
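A sketch of an event-condition-action rule with immediate consideration, using an in-memory dict as the "database"; all names and the 50% threshold are invented for illustration.

```python
# Event: an update to the database. Condition: checked immediately after
# the event. Action: runs only when the condition holds.

rules = []

def on_update(condition, action):
    """Register an active (ECA) rule for update events."""
    rules.append((condition, action))

def update(db, key, value):
    old = db.get(key)
    db[key] = value
    for condition, action in rules:      # immediate consideration
        if condition(old, value):
            action(db, key, old, value)

alerts = []
# Rule: if a salary is raised by more than 50%, log an alert.
on_update(lambda old, new: old is not None and new > old * 1.5,
          lambda db, k, old, new: alerts.append((k, old, new)))

db = {"ann": 100}
update(db, "ann", 120)    # +20%: condition false, no action fires
update(db, "ann", 300)    # +150%: the rule fires
assert alerts == [("ann", 120, 300)]
```

Deferred consideration would instead queue the condition checks until commit, and detached consideration would run them in a separate transaction; only the timing of the loop changes.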