
Section 05 Concepts Of DBMS 1

HSQ - DATABASES & SQL

And Franchise Colleges

05s Concepts of DBMS By MANSHA NAWAZ


Distributed Database Management Systems (DDBMS)

• Multiple physically connected sites, where users at one site can access data held at another site.

• A logically interrelated collection of shared data (and a description of this data), physically distributed over a computer network.

• Distributed Databases (2 site example)

[Figure: two sites, each with a Local DDBMS, linked by System Messages and Data Exchange.]


• In a true distributed database, the data itself is located on more than one machine.

• There are various possible approaches, depending on the needs of the application and the degree of emphasis placed on central control versus local autonomy.

• In general, organisations may wish to:

– Reduce data communications costs by putting data at the location where it is most often used,

– Aggregate information from different sources,

– Provide a more robust system (e.g. when one node goes down the others continue working),

– Build in extra security by maintaining copies of the database at different sites.


• Distributed Database systems are not always designed that way originally.

• Traditional systems may develop into Distributed Database Systems as organisational needs become apparent.

• One approach is for a complete central database to be maintained and updated in the normal way.

– Local copies (in whole or part) are sent periodically (e.g. daily) to remote sites, to be used for fast and cheap retrieval.

– Any local updates have no effect on the central database.

– This approach is only effective if consistency between all copies of the database at all times is not crucial.


Types of DDBMS

• Homogeneous: All sites use the same DBMS product.

• Heterogeneous: Distributed database development may involve the linking together of previously separate systems, perhaps running on different machine architectures with different software packages.

– Individual sites manage and update their own databases for standard operational applications, but that information is collected and aggregated for higher-level decision support functions.

– In this case there is no single location where the whole database is stored; it is genuinely split over two or more sites.

• Homogeneous DDBMSs are generally based on a Relational Database Management System.


DDBMS Overview

• A collection of logically related data.

• The data is split into a number of fragments.

• Fragments may be replicated.

• Fragments / Replicas are allocated to sites.

• Sites are linked by a communications network.

• The data at each site is under the control of a DBMS.

• The DBMS at each site can handle local applications, autonomously.

• Each DBMS participates in at least one global application.


Advantages of a DDBMS Approach

• Organisational structure

– Many organisations are naturally distributed over several locations.

• Shareability and local autonomy

– The geographical distribution of an organisation can be reflected in the distribution of the data.

• Improved availability

– Local data is kept locally - some local technical support required.

• Improved reliability

– The failure of a node or a communication link does not necessarily make the data inaccessible.

• Improved performance

– Local data storage is faster - total storage can be greater.

– Distributed transactions may be faster - a complex issue.

– Less contention for centralised CPU and I/O.

• Modular growth

– It is easier to handle expansion.


Disadvantages of a DDBMS Approach

• Complexity - more complex than a centralised DBMS

• Cost - Increased complexity means the costs for a DDBMS will be higher

– Experienced staff also required.

• Security issues.

– Security becomes more important and complex

• Integrity control more difficult

– Database integrity refers to the validity and consistency of stored data

• Lack of standards - No standard for DDBMSs available

• Lack of experience - finding experienced staff is difficult.

• Database design more complex - Fragmentation, allocation, and data replication, etc.

• However, to date, general-purpose distributed DBMSs have not been widely accepted


• The user sees the system at the conceptual level as if it were physically and logically centralised.

[Figure: User viewpoint of a DDBMS. A Global View (Global Schema) sits above Site A, Site B, Site C and Site D.]


• DDBMSs aim to support:

– Location transparency

– Fragmentation Transparency

• Horizontal

• Vertical

– Replication Transparency

Replication and Fragmentation overview.


Replication

• Copies of tables (or fragments) duplicated at a number of sites.

– The DRDBMS keeps data consistent between sites.

• Increase availability / parallelism.

– Parallelism involves complex query optimisation.

• Reduction in data movement and thus comms costs.

– Local data stays local.

– Large local 'read only' transactions are more efficient.

• Increased resilience to failure.

– If one site fails the data at that site can be available in a replica on another site.

• Problems of integrity / concurrency etc.

– DRDBMS provide facilities to support this aspect.


Horizontal fragmentation

• Relations are split into a number of row-subset relations.

– Local sites have their own rows.

– Fragments can be replicas of data from other sites.

• The original relation can be re-constructed by the relational union operation.

• Increased localisation of data through horizontal fragmentation.

• Queries that require rows from many fragments (at many sites) are handled transparently by the DRDBMS.

An example would be horizontal fragmentation of an ORDER table, fragmented such that rows physically reside at the branch that generated them.

• Queries like .. SELECT * FROM ORDER WHERE date > '11-OCT-2000'; do not require the user to know where the data actually resides or which fragment / replica is used.
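The union-based reconstruction can be sketched in Python with the standard sqlite3 module, using two in-memory databases to stand in for two branch sites. This is only an illustration under stated assumptions: the table name orders (chosen because ORDER is a reserved word in SQL), its columns, the branch names and the sample rows are all hypothetical, and a real DRDBMS would perform the union transparently rather than in application code.

```python
import sqlite3

def make_site(rows):
    """Create an in-memory database holding one horizontal fragment."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE orders (order_no INTEGER, branch TEXT, order_date TEXT)"
    )
    con.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    return con

# Each branch stores only the rows it generated (hypothetical data).
site_a = make_site([(1, "LEEDS", "2000-10-09"), (2, "LEEDS", "2000-10-12")])
site_b = make_site([(3, "YORK", "2000-10-15")])

def global_query(sites, sql, params=()):
    """Run the same query at every site and combine the results.

    The fragments are disjoint, so concatenating the per-site results
    is equivalent to the relational union.
    """
    result = []
    for con in sites:
        result.extend(con.execute(sql, params).fetchall())
    return sorted(result)

recent = global_query(
    [site_a, site_b],
    "SELECT * FROM orders WHERE order_date > ?",
    ("2000-10-11",),
)
print(recent)
```

The caller of global_query never names a site, which is the point of fragmentation transparency: the same query text is valid however the rows are distributed.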


Vertical Fragmentation.

• Tables are vertically split, the resulting tables containing a subset of attributes. (Relational Projection etc.)

• The original relation can be reconstructed by use of the relational join operator.

• A simple example:

– using the table SALE from the Winsor & Allsthop Conservatories scenario.

– A vertically fragmented replica of all the SALEs is placed at the head office limited to (sale#, model#, branch#).

– Horizontally fragmented sections of the SALE table, including all other attributes, are kept at the appropriate branches.

• Provides increased localisation of data through vertical fragmentation (and horizontal in the example above).

• Queries that require rows from many fragments (at many sites) are handled transparently by the DRDBMS - the user sees a global database only.
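Reconstruction by join can be sketched as follows, again with Python's sqlite3 module. Both fragments are held in one in-memory database purely for brevity. The fragment names, the columns sale_date and price, and the sample rows are assumptions made for illustration; only (sale#, model#, branch#) comes from the scenario above, and the sketch assumes the key sale# is kept in both fragments so that the join is possible.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    -- Head-office fragment: (sale#, model#, branch#) only.
    CREATE TABLE sale_head_office (
        sale_no INTEGER PRIMARY KEY, model_no TEXT, branch_no TEXT);
    -- Branch fragment: the key plus the remaining (hypothetical) attributes.
    CREATE TABLE sale_branch (
        sale_no INTEGER PRIMARY KEY, sale_date TEXT, price REAL);
    INSERT INTO sale_head_office VALUES (1, 'M10', 'B1'), (2, 'M20', 'B1');
    INSERT INTO sale_branch VALUES
        (1, '2000-10-09', 4500.0), (2, '2000-10-12', 7200.0);
""")

# Joining the fragments on the shared key rebuilds the original relation.
rows = con.execute("""
    SELECT h.sale_no, h.model_no, h.branch_no, b.sale_date, b.price
    FROM sale_head_office AS h
    JOIN sale_branch AS b ON b.sale_no = h.sale_no
    ORDER BY h.sale_no
""").fetchall()
print(rows)
```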


More Disadvantages?

• Communication and transfer of data can slow down response rate

– 3 sites: A, B & C

– A needs to join 10,000 records at A with 5 at B

• Possible approaches:

– Send 10,000 rows to B, join there and send result to A

– Send 5 to A, join there!!

– Send 10,000 from A to C, 5 from B to C, join there, and send result to A

• Thus using the CPU power of C if A & B are busy.

• Difference of 1 second vs. 'a long time'.

• Needs an SQL intelligent optimiser!

– Theories of optimising parallel query processing are a favourite research topic. (C.J. Date et al)
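The three approaches can be compared with a rough count of rows sent over the network. The sketch below is a back-of-envelope model, not an optimiser: the function name, the strategy labels and the assumed result size of 5 rows are hypothetical, and the model ignores row width, network latency and the CPU load at each site.

```python
def rows_shipped(rows_at_a, rows_at_b, result_rows):
    """Rows sent over the network per strategy; the result is needed at A."""
    return {
        # Ship A's rows to B, then ship the join result back to A.
        "join at B": rows_at_a + result_rows,
        # Ship B's rows to A; the result is already where it is needed.
        "join at A": rows_at_b,
        # Ship both inputs to C, then ship the join result to A.
        "join at C": rows_at_a + rows_at_b + result_rows,
    }

# 10,000 rows at A and 5 at B, as in the example; result size is assumed.
costs = rows_shipped(10_000, 5, result_rows=5)
best = min(costs, key=costs.get)
print(costs, best)
```

On transfer volume alone, joining at A wins by a wide margin. A real optimiser would weigh this against how busy each site is, which is why the third approach can still be worth considering.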


• Primary copy approach

• Concurrency over sites

– Global deadlock problem

[Figure: transactions Ta, Tb, Tc and Td at Site A and Site B.]

• A DRDBMS must provide a distributed concurrency control mechanism.
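One way to picture the global deadlock problem is with wait-for graphs: each site may see no cycle locally, while the combined graph contains one. The Python sketch below assumes a particular set of waits between Ta, Tb, Tc and Td (the assignment of waits to sites is hypothetical) and detects the cycle by merging the local graphs. It illustrates the problem only and is not a description of any specific product's mechanism.

```python
def has_cycle(wait_for):
    """Depth-first search for a cycle in {txn: set of txns it waits for}."""
    visiting, done = set(), set()

    def visit(txn):
        if txn in done:
            return False
        if txn in visiting:
            return True  # reached a transaction already on the current path
        visiting.add(txn)
        if any(visit(nxt) for nxt in wait_for.get(txn, ())):
            return True
        visiting.discard(txn)
        done.add(txn)
        return False

    return any(visit(txn) for txn in list(wait_for))

def merge(*graphs):
    """Combine the local wait-for graphs into one global graph."""
    merged = {}
    for graph in graphs:
        for txn, waits in graph.items():
            merged.setdefault(txn, set()).update(waits)
    return merged

# Hypothetical waits: neither site sees a cycle on its own.
site_a = {"Ta": {"Tb"}, "Tc": {"Td"}}
site_b = {"Tb": {"Tc"}, "Td": {"Ta"}}

local_deadlock = has_cycle(site_a) or has_cycle(site_b)
global_deadlock = has_cycle(merge(site_a, site_b))
print(local_deadlock, global_deadlock)
```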

Updates with Replication – site fails?


Recovery - brief overview

• Multiple updates and aborts

– A DRDBMS must provide a distributed recovery control mechanism.

• Two-phase commit is used (each site first prepares and votes locally; the transaction is then committed or aborted globally)
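As usually described, two-phase commit has a voting phase, in which every site is asked whether it can commit, followed by a decision phase, in which all sites commit only if every vote was yes. The Python sketch below is a deliberately simplified model: the Site class and its fields are hypothetical, and it omits logging, timeouts and coordinator failure, all of which a real implementation has to handle.

```python
class Site:
    """A participant that can vote on, commit, or abort its local work."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "active"

    def prepare(self):
        # Phase 1: the site votes yes only if it can guarantee a commit.
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self):
        self.state = "committed"

    def abort(self):
        self.state = "aborted"

def two_phase_commit(sites):
    """Coordinator logic: collect all votes, then apply one global decision."""
    votes = [site.prepare() for site in sites]  # phase 1: ask every site
    if all(votes):
        for site in sites:                      # phase 2: global commit
            site.commit()
        return "committed"
    for site in sites:                          # phase 2: global abort
        site.abort()
    return "aborted"

ok_sites = [Site("A"), Site("B")]
bad_sites = [Site("A"), Site("B", can_commit=False)]
ok_outcome = two_phase_commit(ok_sites)
bad_outcome = two_phase_commit(bad_sites)
print(ok_outcome, bad_outcome)
```

The key property shown is that a single "no" vote aborts the update everywhere, so no site is left holding a change that the others rejected.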


Database Optimisation and Tuning

• Optimisation and Tuning

• DBMS Front end features

• Database tuning involves ensuring that the database is configured so that it performs at maximum speed for all applications. In practice this is difficult to achieve because different application programs may have conflicting needs. Further, it is common for databases to be multi-user and so many different applications may be accessing a database at the same time. In the case of a single user database the problems are generally less severe as often only one application will be running at a time. The software has the whole resources of the computer and often such systems only have a limited amount of data to deal with and thus perform quickly anyway.

• Database tuning becomes ever more important as the volume of data in tables grows. For example, if a system has an order table with 500 orders, then searching for a single order will be simple as the whole order table will often be loaded into main memory, making sophisticated search techniques largely unnecessary. However, if a multi-user system has 400,000 orders stored and up to 40 users (telephone sales operators) using this data then the problem is quite different.


Indexes

• Database indexes are the key method of speeding up database access. In relational databases rows are not stored in any particular order. Thus if a customer table has 80,000 rows and a telesales operator wants to see the account of a customer called 'SMITH' then there are 80,000 rows to search.

SELECT *

FROM customer

WHERE cust_name = 'SMITH';

• Perhaps many 'SMITH' rows would be returned, but without an index every row would need to be examined to find them.

• Creating a secondary index on the attribute 'cust_name' would make a major difference to the speed of this query.

CREATE INDEX nameidx

ON customer (cust_name);

• The index is called a secondary index to distinguish it from a primary index. A primary index is similar but ensures that all values indexed are unique. e.g.

CREATE UNIQUE INDEX cust_primary_idx

ON customer (cust_number);
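The effect of the secondary index can be observed in a small experiment with Python's sqlite3 module, which reports whether a query scans the whole table or searches an index. The customer table and the index name nameidx follow the statements above; the generated sample rows and the helper plan are assumptions for illustration, and the exact wording of the plan text varies between SQLite versions.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE customer (cust_number INTEGER, cust_name TEXT)")
# 1,000 hypothetical customers; every hundredth one is called SMITH.
con.executemany(
    "INSERT INTO customer VALUES (?, ?)",
    [(n, "SMITH" if n % 100 == 0 else f"NAME{n}") for n in range(1, 1001)],
)

query = "SELECT * FROM customer WHERE cust_name = 'SMITH'"

def plan(sql):
    """Return the 'detail' text of SQLite's query plan for sql."""
    return " ".join(row[3] for row in con.execute("EXPLAIN QUERY PLAN " + sql))

plan_before = plan(query)  # no index yet: expect a full table scan
con.execute("CREATE INDEX nameidx ON customer (cust_name)")
plan_after = plan(query)   # the plan should now mention nameidx

print(plan_before)
print(plan_after)
```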


Indexing Technology

• Modern Relational Database Management Systems use powerful indexing routines, generally making use of B+Tree technology. The speed and power of the indexing system is a highly important aspect of developing a competitive RDBMS product. The B+Tree index is fast and flexible. It is excellent at finding exact targets such as 'SMITH' but is also good at finding the results of range queries. For example, finding all the customers whose name starts with 'S':

SELECT *

FROM customer

WHERE cust_name LIKE 'S%';

• Use of indexes also provides a more consistent response time for queries. The time to find a particular target row is not dependent on its position in the table. The response time is dependent mainly on the depth of the index and all queries have to navigate the full depth of the index. Thus response time is more even than when, for example, one query into an non-indexed table finds its target in the first of 80,000 rows and the next finds its target in the last of 80,000 rows.
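The reason an ordered index handles both exact matches and range queries can be shown with a much simpler structure than a B+Tree: a sorted list searched by bisection. The Python sketch below is a stand-in under that assumption, not an implementation of a B+Tree, and the function names and sample names are hypothetical. What it shares with a B+Tree is that keys are kept in order, so a lookup costs roughly the same wherever the target row sits in the table.

```python
import bisect

def build_index(names):
    """Sorted (key, row position) pairs acting as a secondary index."""
    return sorted((name, pos) for pos, name in enumerate(names))

def find_exact(index, key):
    """Row positions whose key equals key, found by bisection."""
    lo = bisect.bisect_left(index, (key, -1))
    hi = bisect.bisect_right(index, (key, float("inf")))
    return sorted(pos for _, pos in index[lo:hi])

def find_prefix(index, prefix):
    """Row positions whose key starts with prefix (like LIKE 'S%')."""
    # The smallest string above every string with this prefix:
    # for 'S' that is 'T'.
    upper = prefix[:-1] + chr(ord(prefix[-1]) + 1)
    lo = bisect.bisect_left(index, (prefix, -1))
    hi = bisect.bisect_left(index, (upper, -1))
    return sorted(pos for _, pos in index[lo:hi])

table = ["SMITH", "JONES", "SAHA", "TAYLOR", "SMITH", "ADAMS"]
index = build_index(table)
smiths = find_exact(index, "SMITH")
s_names = find_prefix(index, "S")
print(smiths, s_names)
```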

Response Times


Disadvantages of Indexing

• One problem of creating indexes to improve the performance of queries (SELECT) is that each index is itself often a huge file of information. This file is usually hidden from the users but does consume a lot of space. Further, inserting and deleting rows in a database table results in all the indexes on that table needing to be updated. If every attribute and useful combination of attributes in a table has a separate index then the overhead of insertion and deletion will be very large.

General Tuning Approaches

• A useful approach to database tuning is to implement the following indexes. This approach does require analysis of probable query types. However, this is normally done as part of Systems Analysis and Design techniques during the development of software.

• Indexes on:

– Primary keys - primary indexes.

– Foreign keys (to speed joining tables) - secondary indexes.

– Attributes that are frequently queried and yield a small set of rows - secondary indexes.

– Attributes that are frequently used for displaying data in a sorted output - secondary indexes.


Other Tuning Approaches

• Hashing

• Another method of storing data in a way in which it can be found quickly is HASHING. This method works by taking the data, say a customer number, and applying some mathematical formula to the value.

• The outcome of this formula is then used as a physical address in a file for the location where this record will be stored. So the record for the Customer with the code 'C99762' is stored in a calculated position. To retrieve the record, the system runs the calculation again to find where it originally put the record.
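The store-and-recalculate idea can be sketched in a few lines of Python. The choice of CRC-32 as the "mathematical formula", the file size of 8 buckets and the record contents are all assumptions for illustration. A real DBMS uses its own hash function and must also manage overflow when many keys land on the same address, which this sketch handles only by keeping a short list per bucket.

```python
import zlib

BUCKETS = 8  # assumed size of the hashed file, in buckets
hashed_file = [[] for _ in range(BUCKETS)]

def address(key):
    """Apply the formula to the key to get a bucket number."""
    return zlib.crc32(key.encode("utf-8")) % BUCKETS

def store(key, record):
    """Place the record at the calculated address."""
    hashed_file[address(key)].append((key, record))

def fetch(key):
    """Re-run the calculation and look only in that one bucket."""
    for stored_key, record in hashed_file[address(key)]:
        if stored_key == key:
            return record
    return None

store("C99762", {"cust_name": "SMITH"})
found = fetch("C99762")
missing = fetch("C00001")
print(found, missing)
```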

• Clustering

• In this method the Database Administrator uses DBMS facilities to physically sort database tables based on likely access. Thus for an ORDER table, perhaps it is useful to physically group all the rows for each CUSTOMER together. Within that grouping ORDERS might also be sorted into date sequence if this is how they are normally retrieved. This also means that simpler, and thus faster, indexing techniques can be used. Each subset of rows that need to be stored together is called a 'cluster'.

• Clustering can go beyond this approach, depending on the facilities provided by the DBMS. For example, each cluster of ORDERS belongs to a single CUSTOMER but each order is associated with a number of ORDER LINES in the ORDER LINE table.

• A more powerful clustering approach is to store the ORDER rows for each customer together followed physically by the appropriate ORDER LINES. The ORDER LINES would also be appropriately physically sorted. Thus when SQL queries require the joining of ORDERS and ORDER LINES for a customer then all the data is quickly available.

• Some Database Management Systems allow you to go even further and interleave, for example, individual ORDERS and matching ORDER LINES.
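The benefit of grouping a customer's rows together can be estimated by counting how many disc pages hold those rows before and after clustering. The Python sketch below uses an assumed page size of 4 rows and 24 hypothetical orders arriving in round-robin order from 4 customers. It models only the row placement, not any particular DBMS's clustering facility.

```python
ROWS_PER_PAGE = 4  # assumed number of rows stored per disc page

def pages_touched(table, customer):
    """Number of distinct pages holding rows for the given customer."""
    return len({
        pos // ROWS_PER_PAGE
        for pos, (cust, _order_no) in enumerate(table)
        if cust == customer
    })

# Orders stored in arrival order: customers C0..C3 take turns.
arrival_order = [(f"C{i % 4}", i) for i in range(24)]
# Clustered storage: rows physically sorted by customer, then order number.
clustered = sorted(arrival_order)

unclustered_pages = pages_touched(arrival_order, "C0")
clustered_pages = pages_touched(clustered, "C0")
print(unclustered_pages, clustered_pages)
```

In this toy layout the six orders for one customer are spread over six pages in arrival order but occupy only two pages once clustered, which is where the retrieval saving comes from.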


Disadvantages of Physical Clustering

• Any kind of physical clustering is difficult to maintain. As new data is entered it will need to be physically clustered, and that may involve significant database reorganisation. Often a DBMS will offer facilities to optimise clustering during quiet periods (overnight etc.).

• Clustering is a powerful way of tuning database performance but will only be useful if a particular type of query that matches the clustering is the overwhelmingly dominant query for this data.

• Would this approach work for the databases supporting ATMs?


Client Server Systems

[Figure: Typical Client / Server Architecture. A server running the DBMS and holding the data is linked by a network connection to Client 1, Client 2, Client 3 and Client 4, running App 1, App 2, App 3 and App 4, etc.]

• Databases are by their very nature a shared resource. So far we have only made use of single user systems. However, databases in a commercial environment are nearly always a shared resource with, perhaps, many users adding, editing and deleting data. In this situation client/server systems are very effective.


Client

• The client is a machine that provides a 'Front End' to the database. The front end is used to provide a suitable user interface for the users. The front end software might be written in Java or Visual Basic (etc.), or it might be an SQL interpreter, a report generator or a full database tool like Microsoft Access.

• Thus the front end may be a specially written application in some language. It could also be a more general purpose interface, allowing users to access a remote database but to configure software (for example, write queries) on the local machine.


Database Server

• The database server is also connected to the network and is referred to as the 'Back End'. The back end Database Management System is installed on the server. This does not have to be, in the case of a MS Access front end, any particular Database Management System although Microsoft produce SQL Server for this purpose. Generally any high power DBMS can be configured to operate as the back end. ORACLE is a common choice due to its high performance and depth of technical resource.

• The back end server is generally a dedicated database machine. This implies that it will have far greater data bandwidth than a typical PC; that is, it can transfer data to and from its hard disc system very quickly.

• Ultra fast hard disc access is expensive especially if it is needed on many small desktop machines. By placing the database on a specialist machine the technology used can be far more appropriate to the needs of a busy DBMS.


Database Server cont..

• DBMS servers typically have:

– Unix operating systems (not essential)

– Very fast disc interfaces

– Large main memory

– Large dedicated hardware disc cache memory

– Built in fast backup facilities

– Un-interruptible power supplies (UPS)

– Expert management

• To provide these facilities on individual desktop systems would be far more expensive. The other main advantage of the back end database server approach is that it naturally places all the data in a single location making data sharing easier.

• Another advantage is that the huge load of handling a large volume of data and the processing requirements of running a complex database management system are removed from the local desktop machines. Further, the requirement for relentless backup of data is moved from many machines to one, where automatic, fast and reliable backup technology can be used.

• Losing commercially sensitive data is a potential disaster for most companies - for example, what would happen to a large mail order company that lost details of all current orders, deliveries and outstanding accounts?


The following summarises client/server functions.

Database Client - Desktop PC

• Manages User Interface

• Accepts (& validates) user data

• Processes application program logic

• Generates database requests (SQL)

• Transmits requests (SQL) to the server

• Receives results from server

• Formats and displays results according to application software (could be tables, reports or graphical output)

• May import data into local system for local processing

Database Server

• Accepts database requests (SQL)

• Processes database requests:

– Performs integrity checks

– Handles concurrent access

– Optimises SQL queries

– Performs security checks (user access)

– Provides database recovery from system failures (crashes)

• Transmits results of database requests to client

• Database physical optimisation

• Provides statistical information on database

• May import data from foreign systems

• Provides facilities for database administrator to optimise and tune database access performance


Summary: Distributed Databases

• Usually homogeneous & relational

• Advantages & Disadvantages (many of both!)

• Transparency: Location, Fragmentation, Replication

Further Client / Server Configurations

• More complex systems are possible where there are several database servers. These may be in different geographic locations and the connection of the network may include elements of both Local Area and Wide Area networks (internet).


End of Lecture

