NOORUL ISLAM COLLEGE OF ENGINEERING, KUMARACOIL
DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING
B.E. SIXTH SEMESTER
2 MARKS & 16 MARKS
CS606 - ADVANCED DATABASE TECHNOLOGY
Prepared by: J.E. Judith
Lecturer/CSE, NICE
TWO MARKS
UNIT I
1.Define ER model?
The entity-relationship model (or ER model) is a top-down approach to
database design that begins by identifying the important data, called entities, and
the relationships between them. The ER model was first proposed by Peter Pin-Shan Chen.
2.Define Entity type?
A group of objects with the same properties, which are identified by the enterprise as
having an independent existence. In an ER model, an entity type is diagrammed as a
rectangle containing the type name, such as Student.
ER diagram notation for the entity type Student
3.Define Entity occurrence?
A uniquely identifiable object of an entity type is known as an entity occurrence.
An entity occurrence is also referred to simply as an entity.
4.Define relationship type?
A relationship type is a set of meaningful associations among entity types.
For example, the student entity type is related to the team entity type because each
student is a member of a team.
ER diagram notation for the relationship type MemberOf
5.Define relationship occurrence?
A uniquely identifiable association that includes one occurrence from each
participating entity type.
6.Define degree of relationship?
The degree of a relationship type is the number of entity types that participate. If
two entity types participate, the relationship type is binary. A role name indicates the
purpose of an entity in a relationship.
7.Define recursive relationship with diagrammatic representation?
A recursive relationship is one in which the same entity participates more than
once in the relationship. The supervision relationship is a recursive relationship because
the same entity, a particular team, participates more than once in the relationship, as a
supervisor and as a supervisee.
8.What are the types of attribute?
The types of attributes are
1. Simple and composite attribute
2. Single-valued and multi-valued attribute
Simple and composite attribute
Attributes that cannot be divided into subparts are called simple or
atomic attributes. A simple attribute is composed of a single component with an independent
existence. Example: the position and salary attributes of the Staff entity.
A composite attribute is composed of multiple components, each with an
independent existence, and can be divided into smaller subparts.
For example, a Name attribute can be divided into sub-parts such as first name,
middle name and last name.
Single-valued and multi-valued attribute
Attributes that can have only a single value at a particular instant of time
are called single-valued. A person cannot have more than one age value; therefore, the age of
a person is a single-valued attribute.
A multi-valued attribute can have more than one value at one time.
For example, degree of a person is a multi-valued attribute since a person can have more
than one degree.
9.Define candidate key?
A minimal set of attributes that uniquely identifies each occurrence of an entity
type is known as a candidate key. For example, the branch number attribute is a candidate key
for the Branch entity type.
10.Define primary key?
The candidate key that is selected to uniquely identify each occurrence of an
entity type is called primary key. Primary keys may consist of a single attribute or
multiple attributes in combination.
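As an illustration, the single-attribute and composite primary keys described above can be declared directly in SQL. The sketch below uses Python's built-in sqlite3 module; the Branch and Enrolment tables and their attributes are illustrative examples, not a prescribed schema.

```python
import sqlite3

# Branch is identified by a single-attribute primary key (branch_no);
# Enrolment uses a composite primary key (student_no, course_no).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Branch (
        branch_no TEXT PRIMARY KEY,   -- candidate key chosen as primary key
        city      TEXT
    );
    CREATE TABLE Enrolment (
        student_no INTEGER,
        course_no  INTEGER,
        grade      TEXT,
        PRIMARY KEY (student_no, course_no)  -- multiple attributes in combination
    );
""")
conn.execute("INSERT INTO Branch VALUES ('B005', 'London')")
try:
    # A second occurrence with the same primary key value is rejected.
    conn.execute("INSERT INTO Branch VALUES ('B005', 'Bristol')")
    duplicate_allowed = True
except sqlite3.IntegrityError:
    duplicate_allowed = False
print(duplicate_allowed)  # False: the key uniquely identifies each occurrence
```

The rejected second insert shows why a primary key must be unique: two occurrences of the Branch entity type can never share a branch number.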
11.Differentiate strong and weak entity type?
An entity type that is not existence dependent on some other entity type
called strong entity type. For example, the entity type student is strong because its
existence does not depend on some other entity type.
An entity type that is existence dependent on some other entity type is
called weak entity type. For example, a child entity is a weak entity because it relies on
the parent entity in order for it to exist.
12.Define query processing?
Query processing transforms a query written in a high-level language into a
correct and efficient execution strategy expressed in a low-level language, and executes
that strategy to retrieve the required data.
13.Define query optimization?
Query optimization means converting a query into an equivalent form
that is more efficient to execute. It is necessary for high-level relational queries, and it
gives the DBMS an opportunity to systematically evaluate alternative query execution
strategies and to choose an optimal one.
14.What are the phases of query processing?
The phases are
1) Query Decomposition.
2) Query Optimization.
3) Code Generation.
4) Runtime Query Execution.
15.Define query decomposition and what are its stages?
Query decomposition is the first phase of query processing. Its
aims are to transform a high-level query into a relational algebra query and to check
that the query is syntactically and semantically correct.
Different stages are
1) Analysis
2) Normalization
3) Semantic analysis
4) Simplification
5) Query restructuring.
16.Define conjunctive and disjunctive normal form?
Conjunctive normal form
Conjunctive normal form is a sequence of conjuncts connected
by the AND operator, where each conjunct contains one or more terms connected
by the OR operator.
Disjunctive normal form
Disjunctive normal form is a sequence of disjuncts connected
by the OR operator, where each disjunct contains one or more terms connected by
the AND operator.
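The equivalence of the two normal forms can be checked mechanically. The following Python sketch evaluates the same selection condition written once in CNF and once in DNF; the predicate over position, salary and branch is an invented example.

```python
# The same selection predicate in conjunctive normal form (CNF):
# conjuncts joined by AND, each conjunct an OR of terms.
def cnf(position, salary, branch):
    # (position='Manager' OR salary>20000) AND (branch='B003')
    return (position == "Manager" or salary > 20000) and branch == "B003"

# The equivalent predicate in disjunctive normal form (DNF):
# disjuncts joined by OR, each disjunct an AND of terms.
def dnf(position, salary, branch):
    # (position='Manager' AND branch='B003') OR (salary>20000 AND branch='B003')
    return (position == "Manager" and branch == "B003") or \
           (salary > 20000 and branch == "B003")

# The two forms are logically equivalent: they agree on every input row.
rows = [("Manager", 30000, "B003"), ("Assistant", 12000, "B003"),
        ("Assistant", 25000, "B007")]
print([cnf(*r) == dnf(*r) for r in rows])  # [True, True, True]
```

Query optimizers use normalization into one of these forms as an early step, because a normalized predicate is easier to simplify and restructure.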
17.Differentiate dynamic and static query optimization?
Dynamic optimization
The query has to be parsed, validated and optimized each time before it can be executed,
so all the information required to select an optimum strategy is up to date.
Static optimization
The query is parsed, validated and optimized only once,
so the run-time overhead is reduced.
18.What are the problems caused by concurrency control?
The process of managing simultaneous operations on the database without
having them interfere with one another is called concurrency control. The
problems that concurrency control must prevent are
i. Lost update problem
ii. Uncommitted dependency problem
iii. Inconsistent analysis problem
19.Define 3NF and BCNF
Third Normal Form (3NF):
A relation that is in 1NF and 2NF, and in which no non-primary key
attribute is transitively dependent on the primary key.
Boyce-Codd Normal Form (BCNF):
A relation is in BCNF, if and only if, every determinant is a candidate key.
20. Define Timestamp?
A timestamp is a unique identifier created by the DBMS that indicates the relative
starting time of a transaction. Timestamping is a concurrency control protocol that orders
transactions in such a way that an older transaction, with a smaller timestamp, gets priority
in the event of conflict.
21.What are the properties of transaction?
The four basic properties of transactions are called the ACID properties.
A - atomicity
C - consistency
I - isolation
D - durability
ATOMICITY:
The "all or nothing" property. A transaction is an indivisible unit that is
either performed in its entirety or not performed at all.
CONSISTENCY:
A transaction must transform the database from one consistent state to
another consistent state.
ISOLATION:
Transactions execute independently of one another. In other words, the
partial effects of an incomplete transaction should not be visible to other transactions.
DURABILITY:
The effects of a successfully completed transaction are permanently recorded
in the database and must not be lost because of a subsequent failure.
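The atomicity property can be demonstrated with a small sketch using Python's sqlite3 module; the account table and the transfer amounts are illustrative. A failure midway through the transfer triggers a rollback, so the partial debit never becomes visible.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO account VALUES ('A', 100), ('B', 50)")
conn.commit()

# Atomicity: a transfer is all-or-nothing. If any step fails, roll back
# so the debit without the matching credit is undone.
try:
    conn.execute("UPDATE account SET balance = balance - 70 WHERE name='A'")
    raise RuntimeError("simulated failure before the credit step")
    conn.execute("UPDATE account SET balance = balance + 70 WHERE name='B'")
    conn.commit()
except RuntimeError:
    conn.rollback()  # undo the debit: database returns to a consistent state

balances = dict(conn.execute("SELECT name, balance FROM account"))
print(balances)  # {'A': 100, 'B': 50} -- unchanged, as if nothing ran
```

A commit, by contrast, makes the transfer durable: after `conn.commit()` the new balances survive even if the connection or process fails.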
22.Define concurrency control?
The process of managing simultaneous operations on the database without having them
interfere with each other.
23.What are the problems caused by concurrency control?
The problems caused by concurrency control are
1. Lost update problem,
2. Uncommitted dependency problem,
3. Inconsistent analysis problem.
LOST UPDATE:
An apparently successfully completed update operation by one user can
be overridden by another user. This is known as the lost update problem.
UNCOMMITTED DEPENDENCY:
An uncommitted dependency problem occurs when one transaction is
allowed to see the intermediate results of another transaction before it has committed.
INCONSISTENT ANALYSIS:
A problem of inconsistent analysis occurs when a transaction reads several
values from the database but a second transaction updates some of them during the execution
of the first.
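The lost update problem can be reproduced with a short Python sketch that interleaves the reads and writes of two hypothetical transactions by hand:

```python
# Two transactions both read balance=100, then each writes back its own
# result; T2's write overwrites (loses) T1's update.
balance = 100

# Interleaved schedule: read1, read2, write1, write2
t1_read = balance          # T1 reads 100
t2_read = balance          # T2 reads 100, before T1 writes
balance = t1_read + 100    # T1 deposits 100 -> 200
balance = t2_read - 10     # T2 withdraws 10, but based on the stale read
print(balance)  # 90, not the correct 190: T1's deposit is lost
```

Under a serial schedule (T1 completely before T2, or vice versa) the final balance would be 190; the interleaving is what loses the update.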
24.Define serial schedule?
A schedule where the operations of each transaction are executed consecutively,
without any interleaved operations from other transactions.
25.Define serializable?
If a set of transactions executes concurrently, we say that the (non-serial) schedule is
correct if it produces the same result as some serial execution. Such a schedule is called
serializable.
26.Define the conservative and optimistic concurrency control methods?
CONSERVATIVE METHOD:
This approach causes transactions to be delayed in case they conflict
with other transactions at some time in the future. Locking and timestamping are essentially
conservative approaches.
OPTIMISTIC METHOD:
This approach is based on the premise that conflict is rare, so transactions are
allowed to proceed unsynchronized and are only checked for conflicts at the end, when a
transaction commits.
27.Define shared and exclusive lock?
SHARED LOCK: If a transaction has a shared lock on a data item, it can read
the item but cannot update it.
EXCLUSIVE LOCK: If a transaction has an exclusive lock on a data item, it can both
read and update the item.
28.Define 2PL?
A transaction follows the two-phase locking (2PL) protocol if all locking operations precede
the first unlock operation in the transaction.
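A transaction's lock operations can be checked against the 2PL rule mechanically. The sketch below, with invented lock/unlock sequences, reports a violation as soon as a lock request follows an unlock:

```python
# Check whether a transaction's sequence of lock operations obeys 2PL:
# every lock must precede the first unlock (growing phase, then shrinking).
def obeys_2pl(ops):
    unlocked = False
    for op, _item in ops:
        if op == "unlock":
            unlocked = True          # shrinking phase has begun
        elif op == "lock" and unlocked:
            return False             # a lock after an unlock violates 2PL
    return True

good = [("lock", "x"), ("lock", "y"), ("unlock", "x"), ("unlock", "y")]
bad  = [("lock", "x"), ("unlock", "x"), ("lock", "y"), ("unlock", "y")]
print(obeys_2pl(good), obeys_2pl(bad))  # True False
```

The first sequence acquires both locks before releasing any (growing phase, then shrinking phase); the second releases x before locking y and so is not two-phase.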
29.Define ignore obsolete write rule?
Transaction T asks to write an item x whose value has already been written by a
younger transaction, that is, ts(T) < write_timestamp(x). This means that a later transaction
has already updated the value of the item, and the value that the older transaction is
writing must be based on an obsolete value of the item. In this case, the write operation
can safely be ignored. This is sometimes known as the ignore obsolete write rule, and it
allows greater concurrency.
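The rule can be sketched in a few lines of Python; the item structure and the timestamp values are illustrative.

```python
# Sketch of the ignore-obsolete-write rule: a write by an older transaction
# is silently skipped when a younger transaction has already written x.
def write(item, value, ts):
    if ts < item["write_ts"]:
        return False  # obsolete write: safely ignored, no abort needed
    item["value"], item["write_ts"] = value, ts
    return True

x = {"value": 10, "write_ts": 5}   # last written by a transaction with ts=5
write(x, 99, ts=3)                  # older transaction's write is ignored
print(x["value"])  # 10
write(x, 42, ts=8)                  # younger transaction's write is applied
print(x["value"])  # 42
```

Skipping the obsolete write instead of aborting transaction T is exactly what lets this rule admit more schedules than basic timestamp ordering.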
30.List out different db recovery facilities?
A DBMS should provide the following facilities to assist with recovery.
1. A backup mechanism, which makes periodic backup copies of the database.
2. Logging facilities, which keep track of the current state of transactions
and database changes.
3. A checkpoint facility, which enables updates to the database that are in
progress to be made permanent.
4. A recovery manager, which allows the system to restore the database to a
consistent state following a failure.
31.What is the need for db tuning?
The needs for tuning a database are:
1. Existing tables may be joined.
2. For a given set of tables, there may be an alternative design choice.
32.Define normalization?
Normalization is a bottom-up approach to database design that begins by examining
the relationships between attributes. It is a validation technique: it supports a database
designer by presenting a series of tests, which can be applied to individual relations so that
the relational schema can be normalized to a specific form to prevent the possible occurrence
of update anomalies.
33.What is flattening the table?
We remove the repeating groups by entering the appropriate data in the empty
columns of rows containing the repeated data. In other words, we fill in the blanks by
duplicating the non-repeating data where required. This approach is called flattening
the table.
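A minimal Python sketch of flattening (the branch/staff data is invented): the non-repeating branch value is duplicated into one flat row per entry of the repeating staff group.

```python
# Flattening a table with repeating groups: copy the non-repeating value
# (branch) into each row of the repeated data, giving one flat row per entry.
nested = [
    {"branch": "B005", "staff": ["White", "Beech"]},
    {"branch": "B007", "staff": ["Howe"]},
]
flat = [(row["branch"], name) for row in nested for name in row["staff"]]
print(flat)  # [('B005', 'White'), ('B005', 'Beech'), ('B007', 'Howe')]
```

The result contains no repeating groups, which is the requirement for first normal form.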
UNIT II
1.Define parallel DBMS.
A DBMS running across multiple processors and disks that is designed to
execute operations in parallel, whenever possible, in order to improve performance.
2.What are the different parallel db architectures?
Shared memory
Shared disk
Shared nothing
Hierarchical
3.Differentiate interquery and intraquery parallelism.
Interquery parallelism:
Different queries or transactions execute in parallel with one another. It
increases scale-up and throughput.
Intraquery parallelism:
It refers to the execution of a single query in parallel on multiple processors
and disks. It is important for speeding up long-running queries.
4.Differentiate intraoperation parallelism and interoperation parallelism.
Intraoperation parallelism:
Speed up processing of a query by parallelising the execution of each
individual operation.
Interoperation parallelism:
Speed up processing of a query by executing in parallel the different
operations in a query expression.
There are two types:
pipelined parallelism
independent parallelism
5.Define distributed DBMS.
The software system that permits the management of the distributed database
and makes the distribution transparent to the user.
6.What is the fundamental principle of distributed DBMS?
The fundamental principle of a DDBMS is to make the distribution transparent
to the user, that is, to make the distributed system appear like a centralized
system.
7.List any four advantages and disadvantages of DDBMS.
Advantages:
capacity and incremental growth
reliability and availability
efficiency and flexibility
sharing
Disadvantages:
managing and controlling is complex
less security because data is at different sites.
8.Define homogeneous and heterogeneous DDBMS.
Homogeneous DDBMS:
At all sites the same DBMS product is used. It is easier to design and manage.
Advantages: easy communication, it is possible to add more sites, and it provides increased
performance.
Heterogeneous DDBMS:
Sites may run different DBMS products, which need not be based on the same data
model. Translations are required for communication between the different DBMSs. Data may
be required from another site that has different hardware, a different DBMS product,
or both.
9.What are the major components of DDBMS?
There are four major components in DDBMS,
(1)Local DBMS component(LDBMS)
(2)Data Communication component(DC)
(3)Global System Catalog(GSC)
(4)Distributed DBMS component
10.What are the correctness rules for fragmentation?
Any fragment should follow the correctness rules. There are three correctness
rules. They are,
(1)Completeness
(2)Reconstruction
(3)Disjointness
11. Define multiple copy consistency problem?
The multiple copy consistency problem occurs when there is more than
one copy of a data item at different locations. To maintain consistency of the global
database, when a replicated data item is updated at one site, all other copies of the data
item must also be updated. If a copy is not updated, the database becomes inconsistent.
12. Define distributed serializability?
If the schedule of transaction execution at each site is serializable, then the global
schedule is also serializable provided local serialization orders are identical. This is called
distributed serializability.
13. What are the different types of locking protocols in DDBMS?
The different types of locking protocols employed to ensure serializability in
DDBMS are centralized 2PL, primary copy 2PL, distributed 2PL and majority locking.
14. What are the types of deadlock detection in DDBMS?
There are three common methods for deadlock detection in DDBMSs:
centralized, hierarchical and distributed deadlock detection.
15. What is the general approach for timestamping in DDBMS?
The general approach for timestamping in DDBMS is to use the concatenation of
the local timestamp with a unique identifier, <local timestamp, site identifier>. The site
identifier is placed in the least significant position to ensure that events can be ordered
according to their occurrence as opposed to their location.
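Because Python tuples compare element by element, putting the site identifier in the least significant position falls out naturally in a sketch like this; the timestamp and site values are invented.

```python
# Global timestamps as <local timestamp, site identifier>: tuples compare
# first on the local time, so the site id only breaks ties and events are
# ordered by occurrence, not by location.
ts_a = (17, 2)   # event at local time 17 on site 2
ts_b = (17, 5)   # same local time on site 5: the site id breaks the tie
ts_c = (12, 9)   # earlier event on a high-numbered site
events = sorted([ts_a, ts_b, ts_c])
print(events)  # [(12, 9), (17, 2), (17, 5)]
```

Note that the earlier event on site 9 sorts first even though its site identifier is the largest, which is exactly why the site identifier must occupy the least significant position.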
16. What are the phases of 2PC protocol?
The two phases of 2PC protocol are:
a voting phase and
a decision phase.
17. Define cooperative termination protocol?
The cooperative termination protocol applies when a participant is blocked
without any information about the global decision. Rather than remaining blocked, the
participant contacts each of the other participants, attempting to find one that knows
the decision.
18. What is the use of election protocols?
If the participants detect the failure of the coordinator they can elect a new site to
act as coordinator by using election protocols. This protocol is relatively efficient.
19. Define 3PC?
Three-phase commit (3PC) is an alternative non-blocking protocol. It is non-blocking
for all site failures, except in the event of the failure of all sites. The basic idea of 3PC
is to remove the uncertainty period for participants that have voted COMMIT and are
awaiting the global decision from the coordinator. 3PC introduces a third phase, called
pre-commit, between voting and the global decision.
20. Define Distributed Query Processing?
The process of converting a high-level query into a low-level language with an
effective execution strategy, in order to achieve good performance, is called query
processing. In distributed query processing, the query is distributed and processed at
different locations.
21. Write the differences between locking and non-locking protocols?
Locking protocol:
1. Locking guarantees that the concurrent execution is equivalent to some
serial execution of those transactions.
2. It involves checking for deadlock at each local level and at the global level.
3. It does not involve the generation of timestamps.
Non-locking protocol:
1. Timestamping guarantees that the concurrent execution is equivalent to a
specific serial execution of those transactions, corresponding to the order of
the timestamps.
2. It does not involve checking for deadlock at any level.
3. It involves the generation of unique timestamps, both globally and locally.
UNIT III
1.Define OODM?
OODM - Object-Oriented Data Model
A (logical) data model that captures the semantics of objects supported in
object-oriented programming.
2. Define OODB?
OODB - Object-Oriented Database
A persistent and sharable collection of objects defined by an OODM.
3. Define OODBMS?
OODBMS - Object-Oriented Database Management System
An OODBMS is the manager of an OODB.
"OO" refers to an abstract database plus inheritance and object identity.
An OODBMS is the combination of OO capability and database capability.
4 . What are the types of OID?
There are two types of OID:
Logical OID
Physical OID
5. Define pointer swizzling or object faulting?
To achieve the required performance, the OODBMS must be able to convert
OIDs to and from in-memory pointers. This conversion technique is known as pointer
swizzling or object faulting.
6. What is the aim of pointer swizzling ?
The aim of pointer swizzling is to optimize access to objects. As just
mentioned, references between objects are normally represented using OIDs.
7. List the classification of pointer swizzling ?
Classification of techniques for pointer swizzling:
Copy vs in-place swizzling
Eager vs lazy swizzling
Direct vs indirect swizzling
8. Define persistent object ?
An object that exists even after the session is over is called a
persistent object.
There are two types of objects:
Persistent
Transient
9. Define transient object ?
A transient object lasts only for the invocation of the program.
The object's memory is allocated and deallocated by the programming
language's run-time system.
10. List the scheme for implementing persistence within OODBMS?
Persistence schemes
There are three schemes for implementing persistence in an OODBMS:
Check pointing
Serialization
Explicit paging
11. List the two methods for creating or updating persistent objects using explicit paging?
Reachability based method
Allocation based method
12. What are the fundamental principles of orthogonal persistence ?
It is based on three fundamental principles:
Persistence independence
Data type orthogonality
Transitive persistence
13. Define nested transaction model ?
A transaction is viewed as a collection of related subtransactions, each of
which may also contain any number of subtransactions.
14. Define sagas ?
A saga is a sequence of flat transactions that can be interleaved with other
transactions. Sagas are based on the use of compensating transactions. The DBMS
guarantees that either all the transactions in a saga are successfully completed, or
compensating transactions are run to recover from a partial execution.
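A saga's compensation behaviour can be sketched as follows in Python; the flight/hotel steps are invented and the raised exception stands in for a failed subtransaction.

```python
# Saga sketch: run each step in order; if a step fails, run the compensating
# transactions of all completed steps in reverse order.
def run_saga(steps):
    done = []
    for action, compensate in steps:
        try:
            action()
            done.append(compensate)
        except Exception:
            for comp in reversed(done):
                comp()        # undo the partial execution
            return False
    return True

log = []

def book_hotel():
    raise RuntimeError("hotel full")  # this subtransaction fails

ok = run_saga([
    (lambda: log.append("book flight"), lambda: log.append("cancel flight")),
    (book_hotel,                        lambda: log.append("cancel hotel")),
])
print(ok, log)  # False ['book flight', 'cancel flight']
```

The flight booking completed, so its compensating transaction runs; the hotel step never completed, so its compensator does not.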
15. How the Concurrency Control is implemented in OODBMS?
Concurrency control in an OODBMS can be implemented using a multiversion
concurrency control protocol.
16.List the basic architecture for client server DBMS?
The three basic architectures for a client-server DBMS are:
Object server
Page server
Database server
17. Define POSTGRES?
POSTGRES is a research system from the designers of INGRES that attempts to extend
the relational model with abstract data types, procedures and rules.
18.What is a GEMSTONE?
GemStone is a product that extends an existing object-oriented
programming language with database capability.
It extends three languages: Smalltalk, C++ and Java.
19.What is OQL?
OQL - Object Query Language
An OQL query is a function that delivers an object whose type may be inferred
from the operators contributing to the query expression.
OQL is used for both associative and navigational access.
20. Advantage and Disadvantage of OODBMS?
Advantages:
Enriched modeling capabilities
Extensibility
Removal of impedance mismatch
Improved performance
Disadvantages:
Lack of a universal data model
Lack of experience
Lack of standards
Complexity
UNIT IV
1.Define Data Mining.
The process of extracting valid, previously unknown, comprehensible
and actionable information from large databases, and using it to make crucial business
decisions.
2.List the different steps in data mining.
Data cleaning
Data integration
Data selection
Data transformation
Data mining
Pattern evaluation
Knowledge presentation
3.Define Classification.
Classification is used to assign a specific, predetermined class to each record
in a database from a finite set of possible class values.
4. Define Clustering.
Clustering can be considered the most important unsupervised learning problem.
A cluster is a collection of objects that are "similar" to one another and
"dissimilar" to the objects belonging to other clusters.
5.Define data warehousing.
A subject-oriented, integrated, time-variant and nonvolatile collection of data in
support of management's decision-making process.
6.Define web database.
A database used for web applications, which use an architecture called the
three-tier architecture, consisting of a web browser, a web server and a database server.
7.Define mobile database.
A database that is portable and physically separate from a centralized database
server, but is capable of communicating with that server from remote sites, allowing the
sharing of corporate data.
8.Define upflow.
Upflow means adding value to the data in the data warehouse through summarizing,
packaging and distribution of the data.
9.Define downflow.
Downflow means archiving and backing up the data in the warehouse.
10.What are the different groups of end user access tools?
Reporting and query tools.
Application development tools.
Executive information system tools.
Online analytical processing tools.
Data mining tools.
11.What are the four main operations associated with data mining techniques.
1. Predictive modeling.
2. Database segmentation.
3. Link analysis.
4. Deviation detection.
12.Define outliers.
Outliers are values that express deviation from previously known
expectations and norms.
13.List the benefits of data warehousing.
1. Potential high returns on investment.
2. Competitive advantage.
3. Increased productivity of corporate decision makers.
14.Define XML.
The basic object in XML is the XML document. Two main structuring concepts
are used to construct an XML document: elements and attributes. Attributes in XML
provide additional information that describes elements.
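The element/attribute distinction can be seen with the standard library's XML parser; the book document below is an invented example.

```python
# Elements vs attributes in an XML document, using the standard library.
import xml.etree.ElementTree as ET

doc = ET.fromstring(
    '<book isbn="1-55860-456-1">'       # isbn is an attribute of <book>
    '<title>Advanced Databases</title>'  # title is a nested element
    '</book>'
)
print(doc.tag, doc.attrib["isbn"], doc.find("title").text)
# book 1-55860-456-1 Advanced Databases
```

The attribute (`isbn`) describes the `book` element itself, while the `title` element carries structured content nested inside it.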
15.What are the uses of DTD?
A DTD gives an overview of the structure of an XML document. It specifies the
elements and their nesting structure.
16.Define data mart.
Data marts generally are targeted to a subset of the organization, such as a
department and are more tightly focused.
17.Define client/server model.
The client-server model is a two-tier architecture consisting of two tiers, namely
client and server. The client performs presentation services and the server performs data
services. The client is called a fat client because it requires more resources.
18.List the use of data mining tools.
Data preparation.
Selection of data mining operation.
To provide scalability and improve performance.
Facilities for visualization of result.
19.Define OLAP.
OLAP is a term used to describe the analysis of complex data from the
data warehouse. OLAP tools use distributed computing capabilities for analyses that
require more storage and processing power.
20.List the problems of data warehousing.
Project management is an important and challenging consideration
that should not be underestimated.
Administration of a data warehouse is an intensive enterprise,
proportional to the size and complexity of the data warehouse.
21.List some examples of data mining application.
Marketing.
Finance.
Manufacturing.
Health care.
UNIT V
1.Define deductive database.
A deductive database includes capabilities to define (deductive) rules, which
can deduce or infer additional information from the facts that are stored in the database.
Because part of the theoretical foundation for some deductive database systems is
mathematical logic, such databases are often referred to as logic databases.
2.Define spatial database.
Spatial databases provide concepts for databases that keep track of objects in a
multidimensional space.
3.Define multimedia database.
Multimedia databases provide features that allow users to store and query different
types of multimedia information, including images (such as photos or drawings), video clips
(such as movies, newsreels, or home videos), audio clips (such as songs, phone messages,
or speeches), and documents (such as books or articles).
4.List the different spatial query types.
The different spatial query types are
1. Range query
2. Nearest neighbor query
3. Spatial joins or overlays.
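The range query and nearest-neighbour query can be sketched over plain 2-D points; the city coordinates are invented, and a linear scan stands in for a real spatial index such as an R-tree.

```python
# Range query and nearest-neighbour query over 2-D points (linear scan;
# a spatial index would avoid examining every point).
import math

cities = {"A": (0, 0), "B": (3, 4), "C": (10, 1)}

def range_query(points, center, radius):
    # All points within the given distance of the center.
    cx, cy = center
    return {n for n, (x, y) in points.items()
            if math.hypot(x - cx, y - cy) <= radius}

def nearest_neighbor(points, query):
    # The single point closest to the query location.
    qx, qy = query
    return min(points, key=lambda n: math.hypot(points[n][0] - qx,
                                                points[n][1] - qy))

print(sorted(range_query(cities, (0, 0), 6)))  # ['A', 'B']
print(nearest_neighbor(cities, (9, 0)))        # C
```

A spatial join would pair points from two such sets whose distance falls under a threshold, generalizing the range query to two relations.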
5. Define inference engine.
An inference engine (or deductive mechanism) within the system can deduce new
facts from the database by interpreting these rules. The model used for deductive
databases is closely related to the relational data model, and particularly to the domain
relational calculus formalism. It is related to the field of logic programming and the
prolog language.
6.Example for spatial database.
An example of a spatial database is a cartographic database that stores maps:
it includes two-dimensional spatial descriptions of its objects, from countries and states to
rivers, cities, roads, seas and so on. These applications are also known as Geographical
Information Systems (GIS), and are used in areas such as environmental, emergency, and
battle management. Other databases, such as meteorological databases for weather
information, are three-dimensional, since temperatures and other meteorological
information are related to three-dimensional spatial points.
7. Define active database.
Active databases provide additional functionality for specifying
active rules. These rules can be automatically triggered by events that occur, such as
database updates or certain times being reached, and can initiate certain actions that have
been specified in the rule declaration if certain conditions are met.
8. Example for multimedia database.
For example, one may want to locate all video clips in a video database that
include a certain person, say Bill Clinton. One may also want to retrieve video clips
based on certain activities included in them, such as video clips where a soccer goal is
scored by a certain player or team.
9. Define Quad trees.
Quad trees generally divide each space or subspace into equally sized areas, and
proceed with the subdivisions of each subspace to identify the positions of various
objects.
10. What are the two main methods of defining the truth values of predicates in actual
datalog programs?
There are two main methods of defining the truth values of predicates in actual
Datalog programs:
1. Fact-defined predicates (or relations)
2. Rule-defined predicates (or views).
11. What is Fact-defined predicates?
Fact-defined predicates (or relations) are defined by listing all the combinations of
values (the tuples) that make the predicate true. These correspond to base relations
whose contents are stored in the database system.
12. What is Rule-defined predicates?
Rule-defined predicates (or views) are defined by being the head of one or more
Datalog rules; they correspond to virtual relations whose contents can be inferred by the
inference engine.
13. What is the use of relational operations?
It is straightforward to specify many operations of the relational algebra in the
form of Datalog rules that define the result of applying these operations on the database
relations (fact predicates). This means that relational queries and views can easily be
specified in Datalog.
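As a sketch, a rule-defined predicate can be evaluated bottom-up over fact relations in Python; the supervise facts and the recursive superior rule are an invented example in the spirit of Datalog.

```python
# Facts: supervise(X, Y). Rules, Datalog-style:
#   superior(X, Y) :- supervise(X, Y).
#   superior(X, Y) :- supervise(X, Z), superior(Z, Y).
supervise = {("james", "franklin"), ("franklin", "john")}

def superior(facts):
    # Naive bottom-up evaluation: apply the recursive rule until no new
    # tuples can be inferred (a fixpoint is reached).
    result = set(facts)
    changed = True
    while changed:
        changed = False
        new = {(x, y) for x, z in facts for z2, y in result if z == z2}
        added = new - result
        if added:
            result |= added
            changed = True
    return result

print(sorted(superior(supervise)))
# [('franklin', 'john'), ('james', 'franklin'), ('james', 'john')]
```

The derived tuple ('james', 'john') exists only in the rule-defined (virtual) relation; it is inferred by the evaluation, not stored as a fact.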
14. What are the characteristics of Nature of Multimedia Applications?
Applications may be categorized based on their data management characteristics
as follows:
1. Repository applications
2. Presentation applications
3. Collaborative work using multimedia information.
15. What are the terms included in multimedia information systems?
Multimedia Information Systems are complex, and embrace a large set of issues,
including the following:
1. Modeling
2. Design
3. Storage
4. Queries and retrieval
5. Performance
16. What are the different characteristics of Hypermedia links or hyperlinks?
1. Links can be specified with or without associated information, and they may
have large descriptions associated with them.
2. Links can start from a specific point within a node or from the whole node.
3. Links can be directional, or nondirectional when they can be traversed in
either direction.
17. What are the applications of multimedia database?
1. Documents and records management
2. Knowledge dissemination
3. Education and training
4. Marketing, advertising, retailing, entertainment, and travel
5. Real-time and monitoring.
18. What are the three main possibilities for rule consideration?
There are the three main possibilities for rule consideration:
1. Immediate consideration
2. Deferred consideration
3. Detached consideration
19. What are Horn clauses?
In Datalog, rules are expressed as a restricted form of clauses called Horn
clauses, in which a clause can contain at most one positive literal.
20. What are the two alternatives for interpreting the theoretical meaning of rules?
There are two main alternatives for interpreting the theoretical meaning of rules:
1. Proof-theoretic
2. Model-theoretic.
16 MARKS
UNIT I
1. Explain the different phases in query processing.
Query processing comprises the activities involved in retrieving data from the
database. The different phases involved in query processing are
i. Query decomposition
ii. Query optimization
iii. Code generation
iv. Runtime query execution
Query decomposition:
It transforms a high-level language query into a relational algebra expression and
checks whether the query is syntactically and semantically correct.
The different stages of query decomposition are
Analysis
Normalization
Semantic analysis
Simplification
Query restructuring
Query optimization:
Query optimization is the activity of choosing an efficient execution
strategy for processing a query. It is of two types:
1. Dynamic query optimization
2. Static query optimization
Heuristical approach to query processing:
It uses transformation rules to convert one relational
algebra expression into an equivalent form that is more efficient.
Transformation rules for relational algebra operation
Heuristical processing strategy
2. Explain the heuristical approach to query optimization.
It uses transformation rules to convert one relational algebra
expression into an equivalent form that is more efficient.
Transformation rules for relational algebra operation
- write 12 rules
Heuristical processing strategy
- write 5 strategies
3. Explain the problems caused by concurrency control.
The process of managing simultaneous operations on the database without
having them interfere with one another is called concurrency control. The
problems caused by concurrency control are
i. Lost update problem
ii. Uncommitted dependency problem
iii. Inconsistent analysis problem
i. Lost update problem:
An apparently successfully completed update operation by one
user can be overridden by another user.
ii. Uncommitted dependency problem(Dirty read problem):
It occurs when one transaction is allowed to see
intermediate results of another transaction before it has
committed.
iii. Inconsistent analysis problem:
It occurs when a transaction reads several values from the
database while a second transaction updates some of them
during the execution of the first.
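The lost update problem can be made concrete with a small deterministic simulation; the schedule, transaction names and balance values below are invented for illustration.

```python
# A deterministic simulation of the lost update problem: two transactions
# interleave read and write steps on a shared balance.

def run_schedule(steps, db):
    """Apply (transaction, op) steps in order against a shared db dict."""
    local = {}  # each transaction's private workspace
    for txn, op in steps:
        if op == "read":
            local[txn] = db["balance"]
        elif op == "add_100":
            local[txn] += 100
        elif op == "write":
            db["balance"] = local[txn]
    return db["balance"]

# Interleaved schedule: T2 reads before T1 writes, so T1's update is lost.
bad = run_schedule(
    [("T1", "read"), ("T2", "read"),
     ("T1", "add_100"), ("T1", "write"),
     ("T2", "add_100"), ("T2", "write")],
    {"balance": 500})
assert bad == 600  # T1's +100 was overwritten

# Serial schedule: T2 starts after T1 finishes, so both updates survive.
good = run_schedule(
    [("T1", "read"), ("T1", "add_100"), ("T1", "write"),
     ("T2", "read"), ("T2", "add_100"), ("T2", "write")],
    {"balance": 500})
assert good == 700
```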
4. Explain the different steps of using locks and how concurrency control
problems can be prevented using 2PL?
Steps of using locks:
Any transaction that needs to access a data item must first lock the
item.
If the item is not already locked by another transaction, the lock
will be granted.
If the item is currently locked, the DBMS determines whether the
request is compatible with the existing lock.
A transaction continues to hold a lock until it explicitly releases it
either during execution or when it terminates.
Two-Phase Locking (2PL):
A transaction follows the 2PL protocol if all locking operations
precede the first unlock operation in the transaction. The two
phases are
Growing phase
Shrinking phase
Preventing the lost update problem using 2PL
Preventing the uncommitted dependency problem using 2PL
Preventing the inconsistent analysis problem using 2PL
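A minimal sketch of the 2PL rule itself, assuming a single in-memory transaction object (no real lock manager or conflict detection; all names are illustrative):

```python
# Two-phase locking rule: locks may be acquired only in the growing
# phase; the first unlock starts the shrinking phase, after which any
# further lock request violates 2PL.

class TwoPLTransaction:
    def __init__(self, name):
        self.name = name
        self.shrinking = False   # becomes True after the first unlock
        self.held = set()

    def lock(self, item):
        if self.shrinking:
            raise RuntimeError("2PL violation: lock after first unlock")
        self.held.add(item)

    def unlock(self, item):
        self.held.discard(item)
        self.shrinking = True    # growing phase is over

t = TwoPLTransaction("T1")
t.lock("x")
t.lock("y")      # still growing: allowed
t.unlock("x")    # shrinking phase begins
try:
    t.lock("z")  # violates 2PL
    violated = False
except RuntimeError:
    violated = True
assert violated
```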
5. Explain the basic timestamp ordering protocol and Thomas write rule.
Timestamp is a unique identifier created by the DBMS that indicates the
relative starting time of a transaction.
Timestamping is a concurrency control protocol that orders transactions in such a way
that an older transaction with a smaller timestamp gets priority in the event of conflict. The
basic timestamp ordering protocol works as follows:
1. The transaction T issues a read(x)
a) ts(T)<write_timestamp(x)
Transaction T is aborted and restarted with a new timestamp.
b) ts(T)>=write_timestamp(x)
Read operation can proceed and set
read_timestamp(x)=max(ts(T),read_timestamp(x))
2. The transaction T issues a write(x)
a) ts(T)<read_timestamp(x)
Transaction T is rolled back and restarted with a new timestamp.
b) ts(T)<write_timestamp(x)
Transaction T is rolled back and restarted with a new timestamp.
c) Otherwise,
Write operation can proceed and set write_timestamp(x)=ts(T).
Thomas Write Rule:
a) ts(T)<read_timestamp(x)
Transaction T is rolled back and restarted with a new timestamp.
b) ts(T)<write_timestamp(x)
Ignore the write operation.
c) Otherwise,
Write operation can proceed and set write_timestamp(x)=ts(T).
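The read/write tests above can be sketched as two small functions; each item's read/write timestamps live in a dict, and all timestamp values are illustrative.

```python
# Basic timestamp ordering, with the Thomas write rule as a variant:
# conflicting operations either proceed, force a restart, or (Thomas)
# silently skip an obsolete write.

def read(ts, item):
    """Return True if the read proceeds, False if T must restart."""
    if ts < item["wts"]:              # a younger txn already wrote x
        return False                  # abort and restart with new timestamp
    item["rts"] = max(item["rts"], ts)
    return True

def write(ts, item, thomas=False):
    """Return True if T continues, False if T must restart."""
    if ts < item["rts"]:              # a younger txn already read x
        return False
    if ts < item["wts"]:              # a younger txn already wrote x
        return thomas                 # Thomas: ignore the obsolete write
    item["wts"] = ts
    return True

x = {"rts": 0, "wts": 0}
assert read(5, x)                  # ok; read_timestamp(x) becomes 5
assert not write(3, x)             # basic TO: ts(T)=3 < rts=5, restart

x2 = {"rts": 0, "wts": 10}
assert write(7, x2, thomas=True)   # Thomas write rule: write ignored
assert x2["wts"] == 10             # the obsolete write left no trace
```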
6. Explain 1NF, 2NF, 3NF, BCNF, 4NF and 5NF with examples.
First Normal Form (1NF):
A relation in which the intersection of each row and column contains one
and only one value.
Second Normal Form (2NF):
A relation that is in 1NF and every non-primary key attribute is fully
functionally dependent on the primary key.
Third Normal Form (3NF):
A relation that is in 1NF and 2NF, and in which no non-primary key
attribute is transitively dependent on the primary key.
Boyce-Codd Normal Form (BCNF):
A relation is in BCNF, if and only if, every determinant is a candidate key.
Fourth Normal Form (4NF):
A relation that is in BCNF and contains no nontrivial multi-valued
dependencies.
Fifth Normal Form (5NF):
A relation that has no join dependency.
UNIT II
1.Explain the types of fragmentation.
Fragmentation definition – 3 Correctness rule – completeness – Reconstruction-
Disjointness – 4 types - Horizontal fragmentation explanation with eg – Vertical
fragmentation explanation with eg– Mixed fragmentation explanation with eg – Derived
fragmentation explanation with eg.
2.Explain the different types of locking protocols in DDBMS.
Locking protocols definition – Ensure serializability – 4 types of locking
protocols – Centralized – lock manager is centralized – 5 messages - adv and disadv –
Primary copy 2 PL – Many lock managers are available – adv and disadv – Distributed 2
PL – Lock manager is distributed in every site - adv and disadv – Majority locking- adv
and disadv.
3.Explain the distributed deadlock management.
Distributed deadlock management definition – transaction waiting for another
transaction – Wait For Graph(WFG) – Local Wait For Graph – Combined Wait For
Graph – Handling deadlocks – Centralized definition with advantage and disadvantage
– Hierarchical definition with advantage and disadvantage - Distributed definition with
advantage and disadvantage.
4.Explain the reference architecture for DDBMS and Component architecture for
DDBMS
Reference Architecture
Diagram - Global external schema, fragmentation schema, allocation schema,
global conceptual schema, local mapping schema
Component Architecture
Diagram – 4 major components – Local DBMS (LDBMS) – Data
Communication Components(DC) – Global System Catalog – Distributed DBMS
5.Explain the phases of 2 PC.
2 PC – Blocking protocol – coordinator and participant definition – Two phases
of 2 PC – Voting phase – explanation - Decision phase – explanation- Procedure for
coordinator and procedure for participants
7.Give notes on Distributed Transaction Management
Distributed transaction management definition – Modules – Transaction manager,
Scheduler, Recovery Manager, Buffer Manager – Locking Manager - Data
Communication component – Procedure to execute a global transaction initiated at site
S1 – Distributed concurrency control – Concurrency control problem – Distributed
serializability.
8.Explain about 3 PC
3 PC definition – Non blocking protocol – Coordinator – states of coordinator –
Initial, Waiting, Decided – Participant – states of participant – Initial, Prepared,
Precommit, abort, commit.
UNIT III
1.Explain the schemes for implementing persistence?
*the DBMS must provide for the storage of persistent objects
*there are 3 schemes for implementing persistence
*they are
-checkpointing
-serialization
-explicit paging
checkpointing:
*copy all or part of the program's address space to secondary storage
*if the complete address space is saved, the program can be restarted from the
checkpoint
*in other cases only the program's heap is saved
drawbacks:
*the data can be used only by the program that created it
*it may save a large amount of data that is not useful
serialization:
*implements persistence by copying the closure of the data items to disk
*reading back this flattened data structure produces a new copy of the original data
*called serialization, pickling or, in a distributed computing context, marshalling
drawbacks:
*does not preserve object identity
*it is not incremental
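Python's pickle module is one concrete instance of this serialization scheme, and it also demonstrates the first drawback (loss of object identity):

```python
# The closure of an object graph is flattened to bytes; reading it back
# produces a NEW copy, so object identity is not preserved.

import pickle

class Node:
    def __init__(self, value):
        self.value = value
        self.next = None

a = Node(1)
a.next = Node(2)          # a small object graph (its "closure")

flat = pickle.dumps(a)    # flatten the whole reachable graph to bytes
copy = pickle.loads(flat) # rebuild a fresh copy from the flat form

assert copy.value == 1 and copy.next.value == 2  # structure preserved
assert copy is not a                             # identity is NOT preserved
```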
explicit paging:
*involves the application programmer explicitly paging objects between the
application heap and the persistent store
*reachability-based persistence means that an object will persist if it is reachable
from a persistent root object
*the programmer does not need to decide at object creation time whether the object
should be persistent
*after creation an object can become persistent by adding it to the reachability
tree
*allocation-based persistence means that an object is made persistent only if it is
explicitly declared as such within the application program
*by class - a class is statically declared to be persistent and all instances of the
class are made persistent when they are created
*alternatively a class may be a subclass of a system-supplied persistent class
2.Explain the classification of pointer swizzling techniques?
*pointer swizzling is the action of converting object identifiers (OIDs) to main
memory pointers, and back again
*the aim of pointer swizzling is to optimize access to objects
*if we read an object from secondary storage into the database cache, we should
be able to locate any referenced objects on secondary storage using their OIDs
*we want to record which objects are held in main memory
*pointer swizzling attempts to provide a more efficient strategy by storing the
main memory pointers in place of OIDs
no swizzling:
the earliest implementation of faulting objects into memory
objects are faulted into memory by the underlying object manager
a handle is passed back to the application
the system maintains some sort of lookup table so that the object's virtual memory
pointer can be located and then used to access the object
suitable when an application tends to access an object only once
Moss proposed an analytical model for evaluating the conditions in which
swizzling is appropriate
classification of pointer swizzling:
techniques can be classified along three dimensions
copy vs in-place swizzling
eager vs lazy swizzling
direct vs indirect swizzling
copy vs in-place swizzling:
data can either be copied into the application's local object cache or it can be
accessed in place within the object manager's database cache
with the copy approach, only modified objects have to be swizzled back to their OIDs
the in-place technique may have to unswizzle an entire page of objects if one
object on the page is modified
with the copy approach, every object must be explicitly copied into the local
object cache
eager vs lazy swizzling:
Moss and Eliot define eager swizzling as swizzling all OIDs in all data pages
used by the application before any objects can be accessed
Kemper and Kossmann provide a more relaxed definition,
restricting the swizzling to all persistent OIDs within the objects the application
wishes to access
lazy swizzling involves less overhead when an object is faulted into memory, but
it does mean that two different types of pointer must be handled for every object
access
direct vs indirect swizzling:
this is an issue only when it is possible for a swizzled pointer to refer to an
object that is no longer in virtual memory
in direct swizzling the virtual memory pointer of the referenced object is placed
directly in the swizzled pointer
in indirect swizzling the virtual memory pointer is placed in an intermediate
object, which acts as a placeholder for the actual object
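A toy sketch of copy-based, lazy, direct swizzling, assuming an invented on-disk layout where references are stored as integer OIDs; none of these names come from a real system.

```python
# References are stored on "disk" as OIDs; on first access an OID is
# faulted in and replaced (swizzled) by a direct in-memory reference,
# so later accesses skip the lookup table.

disk = {                      # secondary storage keyed by OID
    1: {"name": "order-1", "ref": 2},      # "ref" holds an OID
    2: {"name": "customer-7", "ref": None},
}
cache = {}                    # OID -> in-memory object (the object cache)

def fault_in(oid):
    """Load an object into the cache if not already resident."""
    if oid not in cache:
        cache[oid] = dict(disk[oid])       # copy into the object cache
    return cache[oid]

def follow(obj):
    """Dereference obj['ref'], swizzling the OID on first use."""
    if isinstance(obj["ref"], int):        # still an unswizzled OID
        obj["ref"] = fault_in(obj["ref"])  # replace with direct pointer
    return obj["ref"]

root = fault_in(1)
target = follow(root)          # first access: lazy swizzle happens here
assert target["name"] == "customer-7"
assert follow(root) is target  # later accesses use the direct pointer
```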
3.Explain the locking protocols?
Centralized 2PL:
* with the centralized 2PL protocol there is a single site that maintains all locking
information
* there is only one scheduler, or lock manager, for the whole of the distributed DBMS
that can grant and release locks
* the transaction coordinator at site S1 divides the transaction into a number of
subtransactions, using information held in the global system catalog
* the coordinator has responsibility for ensuring that consistency is maintained
* if a data item is replicated, the coordinator must ensure that all copies of the data
item are updated
* thus the coordinator requests exclusive locks on all copies
* the local transaction managers involved in the global transaction request and release
locks
* the advantage of centralized 2PL is that the implementation is straightforward
* deadlock detection is no more difficult than that of a centralized DBMS
* the disadvantages of centralization in a distributed DBMS are bottlenecks and
lower reliability
* for example, a global update operation that has agents (subtransactions) at n sites may
require a minimum of 2n+3 messages with a centralized lock manager:
-1 lock request;
-1 lock grant message;
-n update messages;
-n acknowledgements;
-1 unlock request.
Primary copy 2PL
* distributes the lock managers to a number of sites
* each lock manager is then responsible for managing the locks for a set of data items
* for each replicated data item one copy is chosen as the primary copy
* the other copies are called slave copies
* the choice of primary site is flexible
* the site that is chosen to manage the locks for a primary copy need not hold the
primary copy of the item
* the protocol is a straightforward extension of centralized 2PL: lock requests are
sent to the appropriate lock manager for the primary copy
* only the primary copy need be locked for an update, so reading a slave copy may
return an out-of-date value
* the disadvantages of this approach are that deadlock handling is more complex and
that lock requests for a specific primary copy can be handled only by one site,
unless backup sites hold the locking information
* the advantages are lower communication costs and better performance than
centralized 2PL
Distributed 2PL
* distributes the lock managers to every site, each responsible for the locks on the
data at its own site
* if the data is not replicated the protocol is equivalent to primary copy 2PL;
otherwise distributed 2PL implements a Read-One-Write-All (ROWA) replica control
protocol
* deadlock handling is more complex, and a global update with agents at n sites may
require a minimum of 5n messages:
-n lock request messages
-n lock grant messages
-n update messages
-n acknowledgements
-n unlock requests
Majority Locking
* avoids the need to lock all copies of a replicated item before an update
* when a transaction receives a majority of the grants, it has the lock and informs all
the sites that it has the lock
* the disadvantages are that the protocol is more complicated, and that there must be
at least (n+1)/2 messages for lock requests and (n+1)/2 messages for unlock requests
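The majority granting rule reduces to a simple quorum test; this sketch assumes n replica sites and counts grants only (no messages are modelled).

```python
# Majority locking: with n copies of an item, the lock is held only when
# strictly more than half of the n sites have granted it.

def majority_granted(grants, n):
    """True when the granting sites form a majority of the n copies."""
    return grants > n // 2

n = 5                               # five replicated copies
assert not majority_granted(2, n)   # 2 of 5 is not a majority
assert majority_granted(3, n)       # 3 of 5 meets the (n+1)/2 threshold
```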
4.Explain the strategies for developing an OODBMS
*extend an existing object-oriented programming language with database capabilities
- traditional database capabilities are added; the languages used are Smalltalk, C++
or Java; this approach is taken by GemStone
*provide extensible object-oriented DBMS libraries
- class libraries are provided that support persistence, aggregation, data types,
transactions and concurrency; this approach is taken by Ontos, Versant and ObjectStore
*embed object-oriented database language constructs in a conventional host language
- similar to how SQL can be embedded in a conventional host language; this approach
is taken by O2
*extend an existing database language with object-oriented capabilities
- this approach is being pursued by both RDBMS and OODBMS vendors, such as Ontos
and Versant
*develop a novel database data model/data language
- start from the beginning and develop an entirely new database; this approach is
taken by SIM (Semantic Information Manager)
5) Give notes on (i)nested transaction model (ii)sagas transaction model (iii)multilevel
transaction model
(i)nested transaction model:-
Introduced by Moss
The complete transaction is depicted as a tree or a hierarchy of
subtransactions
The top-level transaction can have a number of child transactions, and a child
transaction can also have nested transactions
Eg:-
Transactions have to commit from the bottom up
A transaction abort at one level does not affect transactions in progress at
a higher level
Instead a parent is allowed to perform its own recovery
Different ways for recovery:-
abort the transaction
ignore the failure, in which case the subtransaction is called non-vital
retry the subtransaction
run an alternative subtransaction, called a contingency
subtransaction
Advantages:-
modularity
granularity
intra transaction parallelism
intra transaction recovery
(ii)sagas:-
a saga is a sequence of flat transactions that can be interleaved with other
transactions
sagas are based on the use of compensating transactions
the DBMS guarantees that either all the transactions in the saga are
successfully completed or compensating transactions are run to
recover from partial execution
if we have a saga comprising a sequence of n transactions
T1,T2,...,Tn with compensating transactions C1,C2,...,Cn, the
final outcome is T1,T2,...,Tn if the saga completes
successfully, or T1,T2,...,Ti,Ci-1,...,C2,C1 if transaction Ti fails
sagas relax the isolation property
it may be difficult to define a compensating transaction in advance
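The saga outcome described above (run T1..Tn, or run the compensations in reverse on failure) can be sketched as a small coordinator loop; the transaction names are illustrative.

```python
# A saga coordinator: each step is a (transaction, compensation) pair.
# If some Ti fails, the compensations for every committed step run in
# reverse order, undoing the partial execution.

def run_saga(steps):
    """steps: list of (transaction, compensation) callables.
    Returns the log of everything executed."""
    log, done = [], []
    for txn, comp in steps:
        try:
            txn(log)
            done.append(comp)
        except Exception:
            for comp in reversed(done):   # compensate: Ci-1, ..., C1
                comp(log)
            break
    return log

def ok(name):
    return lambda log: log.append(name)

def fail(name):
    def t(log):
        raise RuntimeError(name)
    return t

# T3 fails, so C2 and C1 compensate the partial execution.
log = run_saga([(ok("T1"), ok("C1")),
                (ok("T2"), ok("C2")),
                (fail("T3"), ok("C3"))])
assert log == ["T1", "T2", "C2", "C1"]
```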
(iii)multilevel transaction model:-
2 types
(i)closed nested transaction
(ii)open nested transaction
6)Explain ODMG model?
ODMG - Object Data Management Group
The ODMG object model is a superset of the OM (object model), which enables
both designs and implementations to be ported between compliant systems
Basic modelling primitives:-
(i)objects
(ii)literals
(i)objects:-
Described by 4 characteristics
(i)object structure
(ii)object identifier
(iii)object name
(iv)object lifetime
(ii)literals:-
Decomposed into
(i)atomic
(ii)collections
(iii)structured
(iv)null
Two types of collection: (i) ordered (ii) unordered
5 different built-in collection types
(i)set
(ii)bag
(iii)list
(iv)array
(v)dictionary
7)Explain the features of postgres?
Postgres is a research database system designed to be a successor to the INGRES
RDBMS
Objectives:-
(i)to provide better support for complex objects
(ii)to provide user extensibility for data types,operators & access methods
(iii) to provide active db facilities and inferencing support
(iv)to simplify the DBMS code for crash recovery
(v)to make as few changes as possible to the relational model
Postgres extends the relational model to include the following mechanism
(i)abstract datatypes
(ii)data of type procedure
(iii)rules
GEMSTONE:
The GemStone server consists of a single database monitor process, or Stone,
and one or more Gem processes
A Gem process incorporates a data management kernel and an interpreter for
OPAL
A GemStone configuration may include multiple hosts in the network, with
clients simultaneously accessing the database on multiple hosts
The user interface to each Gem process is provided by a program running as a
separate process on the host machine
A special-purpose GemStone interface, the OPAL programming
environment, is provided for application development
The GemStone architecture is a client-server architecture
The Stone process allocates object identifiers and performs transaction
management
FEATURES:
GemStone is a highly scalable client-multiserver database for commercial
applications.
GemStone's features include:
Server Smalltalk
Concurrent Support for Multiple Languages
Flexible multi-user transaction control
Object-level security
Dynamic schema and object evolution
Production Services
Scalability
Legacy Gateways
Developer Tools
Database Administration Tools
MANAGEMENT FUNCTIONS:
GemStone sessions access the database directly, thus eliminating a
potential performance bottleneck
It provides the facilities necessary to allow external application programs
written in C or Smalltalk to access GemStone
(Gemstone objectNamed: 'objectname') asLocalObject
localBolt <- (Gemstone objectNamed: 'Bolt') asLocalObject
localBolt remotePerform: #partName
The first message has the selector remotePerform: with the literal part name.
UNIT – IV
1.Explain the operations associated with data mining techniques?
There are 4 main operations associated with data mining. They are:
Predictive Modeling
Database Segmentation
Link Analysis
Deviation Detection
Predictive Modeling:
The model is developed using a supervised learning approach. There are 2
phases:
i) Training
ii) Testing
There are 2 steps associated with predictive modeling:
i) Classification
ii) Value prediction
Classification:
This is used to establish a specific, predetermined class for each record in a database from
a finite set of possible class values.
2 specializations of classification:
i) Tree induction
ii) Neural induction
Tree induction:
Creates a decision tree that, for example, identifies likely customers as those who
have rented for more than 2 years and are over 25 years old.
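A hand-written stand-in for such a decision tree, using the two splits from the example above (a real tree would be induced from training data, and the labels here are invented):

```python
# A two-level decision tree as plain nested conditions: split first on
# rental tenure, then on age, exactly as in the example.

def predict(customer):
    if customer["years_rented"] > 2:
        if customer["age"] > 25:
            return "likely renter"
        return "unlikely"
    return "unlikely"

assert predict({"years_rented": 3, "age": 30}) == "likely renter"
assert predict({"years_rented": 3, "age": 22}) == "unlikely"
assert predict({"years_rented": 1, "age": 40}) == "unlikely"
```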
Neural induction:
Constructs a neural network containing a collection of input, processing and output nodes.
Value Prediction:
Used to estimate a continuous numerical value that is associated with the database
records.
2 techniques:
i) Linear regression
ii) Non linear regression
Database Segmentation:
Partitions a database into an unknown number of segments, or clusters, of similar
records.
Link analysis:
There are 3 specializations of link analysis:
i) Association discovery
ii) Sequential pattern discovery
iii) Time sequence discovery
Deviation Detection:
This is a source of true discovery because it identifies outliers.
2.Describe the Data Warehouse architecture.
The components are:
Operational data:
The source of data for the data warehouse is supplied from mainframe operational
data.
Mainframe operational source:
Held in hierarchical and network databases.
Departmental operational data:
The data that is available in relational DBMS.
Private data:
Data available in private database.
External System:
Data held in external system.Eg.Internet
Operational Data Store (ODS):
Data from different sources is stored in the ODS, which is mainly used for
analysing the data.
Load Manager:
It gets the data directly from the operational data sources or from the ODS.
Warehouse Manager:
Performs the management of the data in the data warehouse.
Query Manager:
Performs the management of user queries. Query profiles are used to determine which
indexes and aggregations are appropriate.
Detailed Data:
This area of warehouse stores all the detailed data in the database schema.
Purpose of summary information:
It is to speed up the performance of queries.
Archive/Backup data:
Both detailed and summarized data are stored as backup data that is used for
recovery.
Metadata:
This area of the warehouse stores all the metadata, i.e. data about data.
End-user access tools:
Users interact with the data warehouse using end-user access tools.
Data warehouse and data flows:
There are 5 primary data flows:
Inflow
Outflow
Upflow
Downflow
Metaflow
3.Explain Mobile DataBase with Diagram.
A database that is portable and physically separate from a centralised database
server but is capable of communicating with that server from remote sites allowing the
sharing of corporate data.
Architecture of Mobile Database Environment:
Components:
1) Corporate database server and DBMS:
Manages and stores corporate data and provides corporate applications.
2) Remote database and DBMS:
Manages and stores mobile data and provides mobile applications.
3) Mobile database platform:
Laptops, PDAs, or other Internet access devices.
A two-way communication link exists between the corporate and mobile DBMS.
Issues associated with mobile database:
i) Management of the mobile database
ii) Communication between the mobile and corporate databases
Additional functionality required by mobile DBMS:
1) Ability to communicate with centralized db server through modes such as wireless.
2) Ability to replicate data available on mobile device or centralized server.
3) Synchronize data on the centralized db server and mobile device.
4) Capture data from different sources
5) Manage data on the mobile device
6) Analyse the data on mobile device
7) Create customized mobile application
4.Explain Web DBMS Architecture.
The Web was developed at CERN in 1990.
Hyperlinks are used for moving from one page to another. A website is a collection of
web pages.
Web as a database application platform:
Integrating the Web with the DBMS.
Requirements for web dbms integration:
1) Security
2) Independent connectivity
3) Ability to interface to the database
4) A connectivity solution that takes advantage of all the features of an organisation's
DBMS
5) Open architecture support
6) It must provide scalability
7) Multiple HTTP request
8) Support for session and application based authentication
9) Acceptable performance
10) Minimal administration overhead
Web DBMS Architecture:
Traditionally it uses 2-tier client-server architecture.
Where,
Tier1-client
Tier2-server
The task performed by the client is presentation services.
The task performed by the server is data services.
Two problems in 2-tier:
i) A fat client
ii) A significant client-side administration overhead.
3-tier Architecture:
There are 3 layers.
Tier1-client
Tier2-business logic and data processing layer
Tier3-DBMS
The client is known as a thin client.
Application layer - performs the main processing.
Advantage of 3-tier architecture:
1) Less expensive hardware
2) Application maintenance is centralized
3) Load balancing is easier
The 3-tier architecture can be extended to an n-tier architecture. The application layer
is divided into i) application server ii) web server
5.Approaches to integrate web and DBMS.
1) Scripting languages such as VB script and java script
2) Using CGI
3) Using HTTP cookies
4) Extensions to the web server such as the Netscape API and Microsoft Internet
Information Server API
5) Java and JDBC,SQLJ,Servlets,JSP
6) Microsoft's Web Solution Platform with ASP and ActiveX Data Objects
7) Oracle's Internet Platform
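All of these approaches share one server-side pattern: a handler receives a request, queries the DBMS, and renders the rows as HTML. A minimal sketch using Python's sqlite3 as a stand-in for the corporate DBMS (table, data and handler are invented; the real HTTP layer is omitted):

```python
# One parameterised query turned into an HTML fragment, the core of a
# CGI script / servlet / server API extension.

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE staff (name TEXT, branch TEXT)")
conn.executemany("INSERT INTO staff VALUES (?, ?)",
                 [("Ann", "London"), ("Bob", "Leeds")])

def handle_request(branch):
    """Query the DBMS for one branch and render the result as HTML."""
    rows = conn.execute(
        "SELECT name FROM staff WHERE branch = ?", (branch,)).fetchall()
    items = "".join(f"<li>{name}</li>" for (name,) in rows)
    return f"<ul>{items}</ul>"

assert handle_request("London") == "<ul><li>Ann</li></ul>"
```

The parameterised `?` placeholder is the detail that matters for the security requirement listed above: the request value never reaches the SQL text directly.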
UNIT V
1.Explain the deductive database concepts.
Deductive database:
A deductive database includes capabilities to define (deductive) rules, which
can deduce or infer additional information from the facts that are stored in a database.
Because part of the theoretical foundation for some deductive database systems is
mathematical logic, such systems are often referred to as logic databases.
Inference engine:
An inference engine (or deductive mechanism) within the system can deduce new
facts from the database by interpreting these rules. The model used for deductive
databases is closely related to the relational data model, and particularly to the domain
relational calculus formalism. It is related to the field of logic programming and the
prolog language.
Horn Clauses
In Datalog, rules are expressed as a restricted form of clauses called Horn
Clauses, in which a clause can contain at most one positive literal.
Use of relational operations:
It is straightforward to specify many operations of the relational algebra in the
form of Datalog rules that define the result of applying these operations on the database
relations (fact predicates). This means that relational queries and views can easily be
specified in Datalog.
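A tiny illustration of such deductive rules: from stored parent facts, the recursive rule ancestor(X,Z) :- parent(X,Y), ancestor(Y,Z), together with the base rule that every parent is an ancestor, infers new facts. The naive fixpoint loop below is a sketch of what an inference engine does, not a real Datalog evaluator, and the facts are invented.

```python
# Fact predicates are sets of tuples; the inference engine repeatedly
# applies the recursive rule until no new facts are deduced (a fixpoint).

parent = {("abe", "homer"), ("homer", "bart")}

def infer_ancestors(parent_facts):
    ancestor = set(parent_facts)          # base rule: parents are ancestors
    while True:
        # ancestor(X,Z) :- parent(X,Y), ancestor(Y,Z)
        new = {(x, z) for (x, y) in parent_facts
                      for (y2, z) in ancestor if y == y2}
        if new <= ancestor:               # fixpoint reached
            return ancestor
        ancestor |= new

facts = infer_ancestors(parent)
assert ("abe", "bart") in facts           # deduced, not stored
```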
2.Explain multimedia database.
Multimedia database:
Multimedia databases provide features that allow users to store and query different types of
multimedia information, which includes images (such as photos or drawing), videoclips
(such as movies, newsreels, or home videos), audio clips (such as songs, phone messages,
or speeches), and documents (such as books or articles).
Multimedia Information Systems are complex, and embrace a large set of issues,
including the following:
1. Modeling
2. Design
3. Storage
4. Queries and retrieval
5. Performance
Applications of multimedia database:
1. Documents and records management
2. Knowledge dissemination
3. Education and training
4. Marketing, advertising, retailing, entertainment, and travel
5. Real-time and monitoring.
3. Explain in detail about spatial database.
Spatial database:
Spatial databases provide concepts for databases that keep track of objects in a
multi dimensional space.
Different types of spatial query:
The main types of spatial query are
1. Range query
2. Nearest neighbour query
Quad trees:
Quad trees generally divide each space or subspace into equally sized areas, and proceed
with the subdivisions of each subspace to identify the positions of various objects.
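A minimal point quadtree sketch matching this description; the node capacity and the coordinates below are invented for the example.

```python
# Each node covers a square region; when it holds more than CAPACITY
# points it splits into four equally sized quadrants, and the
# subdivision repeats in whichever quadrant fills up next.

CAPACITY = 2

class QuadTree:
    def __init__(self, x, y, size):
        self.x, self.y, self.size = x, y, size   # lower-left corner + side
        self.points = []
        self.children = None                     # four subtrees after split

    def insert(self, px, py):
        if self.children is None:
            self.points.append((px, py))
            if len(self.points) > CAPACITY:
                self._split()
        else:
            self._child_for(px, py).insert(px, py)

    def _split(self):
        h = self.size / 2
        self.children = [QuadTree(self.x + dx, self.y + dy, h)
                         for dx in (0, h) for dy in (0, h)]
        for px, py in self.points:               # redistribute points
            self._child_for(px, py).insert(px, py)
        self.points = []

    def _child_for(self, px, py):
        h = self.size / 2
        i = (2 if px >= self.x + h else 0) + (1 if py >= self.y + h else 0)
        return self.children[i]

t = QuadTree(0, 0, 100)
for p in [(10, 10), (80, 80), (20, 90), (60, 15)]:
    t.insert(*p)
assert t.children is not None   # the third insert forced a subdivision
```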
4.Explain in detail about active databases.
Active databases:
Active databases provide additional functionality for specifying active
rules. These rules can be automatically triggered by events that occur, such as database
updates or certain times being reached, and can initiate certain actions that have been
specified in the rule declaration to occur if certain conditions are met.
There are three main possibilities for rule consideration:
1. Immediate consideration
2. Deferred consideration
3. Detached consideration
Triggers - a technique for specifying certain types of active rules.
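A sketch of an event-condition-action rule with immediate consideration, using an in-memory dict as the "database"; all names and the 50% threshold are invented for illustration.

```python
# Event: an update to the database. Condition: checked immediately after
# the event. Action: runs only when the condition holds.

rules = []

def on_update(condition, action):
    """Register an active (ECA) rule for update events."""
    rules.append((condition, action))

def update(db, key, value):
    old = db.get(key)
    db[key] = value
    for condition, action in rules:      # immediate consideration
        if condition(old, value):
            action(db, key, old, value)

alerts = []
# Rule: if a salary is raised by more than 50%, log an alert.
on_update(lambda old, new: old is not None and new > old * 1.5,
          lambda db, k, old, new: alerts.append((k, old, new)))

db = {"ann": 100}
update(db, "ann", 120)    # +20%: condition false, no action fires
update(db, "ann", 300)    # +150%: the rule fires
assert alerts == [("ann", 120, 300)]
```

Deferred consideration would instead queue the condition checks until commit, and detached consideration would run them in a separate transaction; only the timing of the loop changes.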