Lecture December 21, 2011
Distributed Database Management System
By
Mangesh R. Wanjari, Asst. Professor, Department of CSE
Shri Ramdeobaba College of Engineering and Management, Nagpur
Distributed Database Systems, Wednesday, December 21, 2011
Evolution of DDBMS
Distributed database management systems (DDBMS)
- Interconnected computer systems
- Data/processing functions reside on multiple sites
1970’s: Centralized DBMS
1980’s: Social and Technical Changes
- Ad hoc capability required
- Decentralized management structure common
1990’s: New forces
- Computational capacity of Personal Computers
- Internet and the World Wide Web used for data access and distribution
- Data analysis through data mining and data warehousing
Overview
• What and why?
• The Distributed Database Management Systems
• The Reference Architecture for Distributed Databases
• Data Fragmentation
• Distributed Transparency
• Distributed Database Design
Definition
• A distributed database (DDB) is a collection of multiple, logically interrelated databases distributed over a computer network.
• A distributed database management system (DDBMS) is the software that manages the DDB and provides an access mechanism that makes this distribution transparent to the users.
• Distributed database system (DDBS) = DDB + DDBMS
Features of Distributed Versus Centralized Databases
• Centralized Control
• Data Independence
• Reduction in Redundancy
• Complex Physical Structures and efficient access
• Integrity, Recovery, and Concurrency control
• Privacy and Security
Why Distributed Databases
• Organizational and economic reasons
• Interconnection of existing databases
• Incremental growth
• Reduced communication overhead
• Performance considerations
• Reliability and availability
Introduction
• The traditional database approach keeps all data at a central site and accesses them mostly through a client-server model.
• In a distributed database system, data are distributed across geographically separate sites.
• For example, consider a bank with four branches at different sites.
• There are two types of transactions: local transactions and global transactions.
• A global transaction has to access data at more than one site, which raises additional concerns such as communication over the network, speed, efficient access, integrity, recovery, concurrency control, privacy, and security.
Centralized Database Management System
Distributed Processing Environment
Distributed Database Environment
Traditional distributed processing architecture
[Figure: client LANs in Mumbai, Nagpur, Delhi, and Bangalore connected over a wide area network to a single DBMS]
Distributed Database Architecture
[Figure: client LANs in Mumbai, Nagpur, Delhi, and Bangalore connected over a wide area network, each site with its own DBMS]
Distributed Database Management System
Components of a Distributed DBMS
Possible access methods
Reference Architecture for Distributed DBMS
The reference model has two main parts:
1. Site independent schemas
2. Site dependent schemas
FRAGMENTS AND PHYSICAL IMAGES FOR A GLOBAL RELATION
What is so fascinating about this architecture?
The three most important features that motivate the design of this architecture are:
• Separation of data fragmentation and allocation.
• The control of redundancy.
• The independence from local DBMS.
Types of Data Fragmentation
• Horizontal Fragmentation
• Vertical Fragmentation
• Hybrid/Mixed Fragmentation
There are some rules that must be followed when defining fragments:
• Completeness condition
• Reconstruction condition
• Disjointness condition
Horizontal Fragmentation
Let a global relation be
SUPPLIER (SNUM, NAME, CITY)
Here SUPPLIER contains the supplier number, the supplier name, and the city where the supplier is located. If every supplier is located in either Nagpur ("NGP") or Mumbai ("MUM"), then a horizontal fragmentation can be defined in the following way (SL denotes selection):
SUPPLIER1 = SL_{CITY = "NGP"} SUPPLIER
SUPPLIER2 = SL_{CITY = "MUM"} SUPPLIER
The fragments are defined by the qualifications q1: CITY = "NGP" and q2: CITY = "MUM".
It is always possible to reconstruct the SUPPLIER global relation through the union operation (UN):
SUPPLIER = SUPPLIER1 UN SUPPLIER2
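The fragmentation above, together with the three conditions named on the previous slide, can be checked on a toy instance. The following is a minimal Python sketch, not part of the original slides; the supplier rows and the helper name `select` are invented for illustration.

```python
# Toy SUPPLIER relation: each tuple is (SNUM, NAME, CITY). The rows are made up.
SUPPLIER = {
    (1, "Asha", "NGP"),
    (2, "Ravi", "MUM"),
    (3, "Meena", "NGP"),
}

def select(relation, predicate):
    """Selection (SL): keep the tuples that satisfy the predicate."""
    return {t for t in relation if predicate(t)}

# Qualifications q1 and q2 of the two horizontal fragments.
def q1(t):
    return t[2] == "NGP"

def q2(t):
    return t[2] == "MUM"

SUPPLIER1 = select(SUPPLIER, q1)
SUPPLIER2 = select(SUPPLIER, q2)

# Completeness: every tuple satisfies at least one qualification.
complete = all(q1(t) or q2(t) for t in SUPPLIER)
# Disjointness: no tuple lands in both fragments.
disjoint = SUPPLIER1.isdisjoint(SUPPLIER2)
# Reconstruction: the union (UN) of the fragments gives back SUPPLIER.
reconstructed = SUPPLIER1 | SUPPLIER2
```

If a supplier from a third city were added, `complete` would become false, which is why the fragmentation relies on every supplier being in one of the two cities.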
Horizontal Fragmentation CNTD..
[Figure: the tuples T1 ... Tn of a relation with attributes A1, A2, ..., An are split by rows between Site 1 and Site 2]
Derived Horizontal Fragmentation
SUPPLY (SNUM, PNUM, DEPTNUM, QUAN)
SUPPLY1 = SUPPLY SJ_{SNUM = SNUM} SUPPLIER1
SUPPLY2 = SUPPLY SJ_{SNUM = SNUM} SUPPLIER2
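A derived fragment keeps exactly the SUPPLY tuples whose SNUM appears in the corresponding SUPPLIER fragment, which is what the semi-join (SJ) expresses. A small Python sketch under invented data (the rows and the helper name `semi_join_on_snum` are assumptions for illustration):

```python
# Fragments of SUPPLIER as (SNUM, NAME, CITY); the rows are made up.
SUPPLIER1 = {(1, "Asha", "NGP"), (3, "Meena", "NGP")}
SUPPLIER2 = {(2, "Ravi", "MUM")}

# SUPPLY tuples are (SNUM, PNUM, DEPTNUM, QUAN); the rows are made up.
SUPPLY = {
    (1, 10, 5, 100),
    (2, 11, 5, 50),
    (3, 10, 7, 20),
    (2, 12, 9, 75),
}

def semi_join_on_snum(supply, supplier_fragment):
    """Semi-join (SJ) on SNUM: SUPPLY tuples that match some supplier in the fragment."""
    snums = {s[0] for s in supplier_fragment}
    return {t for t in supply if t[0] in snums}

SUPPLY1 = semi_join_on_snum(SUPPLY, SUPPLIER1)
SUPPLY2 = semi_join_on_snum(SUPPLY, SUPPLIER2)
```

Because every SNUM in this toy SUPPLY belongs to exactly one SUPPLIER fragment, the derived fragments are disjoint and their union is SUPPLY again.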
VERTICAL FRAGMENTATION
[Figure: the original relation R with attributes A1 to A4 and tuples t1 ... tn is split by columns into RS1 at SITE1 and RS2 at SITE2; each fragment also carries a TID column]
TID (tuple ID): a hidden attribute that ensures accurate and simple join reconstruction.
How to reconstruct: R = RS1 JN RS2 JN ... JN RSn
Join condition: RS1.TID = RS2.TID
VERTICAL FRAGMENTATION
EMPLOYEE (EMPNUM, NAME, SAL, TAX, MGRNUM, DEPTNUM)
A vertical fragmentation of this relation can be defined as (PJ denotes projection)
EMPLOYEE1 = PJ_{EMPNUM, NAME, MGRNUM, DEPTNUM} EMPLOYEE
EMPLOYEE2 = PJ_{EMPNUM, SAL, TAX} EMPLOYEE
The fragmentation could, for instance, reflect an organization in which salaries and taxes are managed separately. The reconstruction of relation EMPLOYEE can be obtained as (JN denotes join)
EMPLOYEE = EMPLOYEE1 JN_{EMPNUM = EMPNUM} EMPLOYEE2
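The projection and join in this example can be tried out on a few rows. This is a minimal Python sketch with invented employee data; it assumes NAME is an attribute of EMPLOYEE (the first fragment projects it) and that EMPNUM is unique, which is what makes the join rebuild each tuple exactly once. The helper names `project` and `join_on` are hypothetical.

```python
ATTRS = ("EMPNUM", "NAME", "SAL", "TAX", "MGRNUM", "DEPTNUM")

# Invented rows of the global relation EMPLOYEE.
EMPLOYEE = [
    dict(zip(ATTRS, (1, "Asha", 50000, 5000, 3, 10))),
    dict(zip(ATTRS, (2, "Ravi", 40000, 4000, 3, 20))),
    dict(zip(ATTRS, (3, "Meena", 70000, 9000, None, 10))),
]

def project(relation, attrs):
    """Projection (PJ): keep only the listed attributes of every row."""
    return [{a: row[a] for a in attrs} for row in relation]

def join_on(left, right, key):
    """Join (JN) on a key attribute that is assumed unique in `right`."""
    index = {row[key]: row for row in right}
    return [{**l, **index[l[key]]} for l in left if l[key] in index]

EMPLOYEE1 = project(EMPLOYEE, ("EMPNUM", "NAME", "MGRNUM", "DEPTNUM"))
EMPLOYEE2 = project(EMPLOYEE, ("EMPNUM", "SAL", "TAX"))

# Reconstruction: join the two fragments on the shared key EMPNUM.
rebuilt = join_on(EMPLOYEE1, EMPLOYEE2, "EMPNUM")
```

Both fragments repeat EMPNUM on purpose: without a shared key (or a hidden tuple ID), the join could not pair the two halves of a tuple.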
MIXED FRAGMENTATION
[Figure: relation R with attributes A1 to A5 is split by columns into the groups A1-A3 and A4-A5 (labelled salary attributes and benefit attributes) and by rows into USA and Europe, giving fragments Rs1 to Rs4]
MIXED FRAGMENTATION
EMPLOYEE (EMPNUM, NAME, SAL, TAX, MGRNUM, DEPTNUM)
The following is a mixed fragmentation that is obtained by applying the vertical fragmentation of the previous example, followed by a horizontal fragmentation on DEPTNUM:
EMPLOYEE1 = SL_{DEPTNUM <= 10} PJ_{EMPNUM, NAME, MGRNUM, DEPTNUM} EMPLOYEE
EMPLOYEE2 = SL_{10 < DEPTNUM <= 20} PJ_{EMPNUM, NAME, MGRNUM, DEPTNUM} EMPLOYEE
EMPLOYEE3 = SL_{DEPTNUM > 20} PJ_{EMPNUM, NAME, MGRNUM, DEPTNUM} EMPLOYEE
EMPLOYEE4 = PJ_{EMPNUM, NAME, SAL, TAX} EMPLOYEE
The reconstruction of relation EMPLOYEE is defined by the following expression:
EMPLOYEE = UN (EMPLOYEE1, EMPLOYEE2, EMPLOYEE3) JN_{EMPNUM = EMPNUM} PJ_{EMPNUM, SAL, TAX} EMPLOYEE4
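The mixed fragmentation and its reconstruction can be sketched in Python as follows. The rows are invented, the helper names are hypothetical, and the three horizontal ranges are assumed to be DEPTNUM <= 10, 10 < DEPTNUM <= 20, and DEPTNUM > 20 so that they do not overlap.

```python
ATTRS = ("EMPNUM", "NAME", "SAL", "TAX", "MGRNUM", "DEPTNUM")

# Invented rows of the global relation EMPLOYEE, one per DEPTNUM range.
EMPLOYEE = [
    dict(zip(ATTRS, (1, "Asha", 50000, 5000, 3, 5))),
    dict(zip(ATTRS, (2, "Ravi", 40000, 4000, 3, 15))),
    dict(zip(ATTRS, (3, "Meena", 70000, 9000, None, 25))),
]

def project(relation, attrs):
    """Projection (PJ): keep only the listed attributes of every row."""
    return [{a: row[a] for a in attrs} for row in relation]

def select(relation, predicate):
    """Selection (SL): keep the rows that satisfy the predicate."""
    return [row for row in relation if predicate(row)]

def join_on(left, right, key):
    """Join (JN) on a key attribute that is assumed unique in `right`."""
    index = {row[key]: row for row in right}
    return [{**l, **index[l[key]]} for l in left if l[key] in index]

# Vertical step first, then the horizontal step on DEPTNUM.
admin = project(EMPLOYEE, ("EMPNUM", "NAME", "MGRNUM", "DEPTNUM"))
EMPLOYEE1 = select(admin, lambda row: row["DEPTNUM"] <= 10)
EMPLOYEE2 = select(admin, lambda row: 10 < row["DEPTNUM"] <= 20)
EMPLOYEE3 = select(admin, lambda row: row["DEPTNUM"] > 20)
EMPLOYEE4 = project(EMPLOYEE, ("EMPNUM", "NAME", "SAL", "TAX"))

# Reconstruction: union of the horizontal fragments, joined with the pay columns.
union = EMPLOYEE1 + EMPLOYEE2 + EMPLOYEE3
rebuilt = join_on(union, project(EMPLOYEE4, ("EMPNUM", "SAL", "TAX")), "EMPNUM")
```

Plain list concatenation stands in for the union here only because the three ranges are disjoint; overlapping ranges would produce duplicate rows.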
Transparencies as seen by simple application
Fragmentation transparency
Select NAME into $NAME from SUPPLIER where SNUM=$SNUM
Location transparency
Local Mapping transparency
Transparencies as seen by simple application
No transparency
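To make the levels concrete, here is a rough Python sketch of the same lookup (NAME for a given SNUM) written against three of them. It assumes the usual reading of the levels: with fragmentation transparency the application names only the global relation, with location transparency it names fragments but not sites, and with local mapping transparency it names both. The allocation table, the site names, and all function names are invented for illustration and are not a real DDBMS interface.

```python
# Hypothetical allocation: fragment -> site -> {SNUM: NAME}. Sites and rows are invented.
ALLOCATION = {
    "SUPPLIER1": {"site1": {1: "Asha", 3: "Meena"}},
    "SUPPLIER2": {"site2": {2: "Ravi"}, "site3": {2: "Ravi"}},
}

def read(fragment, snum, site=None):
    """Read NAME for SNUM from a fragment; if no site is named, the system picks a copy."""
    copies = ALLOCATION[fragment]
    chosen = site if site is not None else sorted(copies)[0]
    return copies[chosen].get(snum)

def ddbms_select_name(snum):
    """System-side lookup on the global relation SUPPLIER (hides fragments and sites)."""
    for fragment in ALLOCATION:
        name = read(fragment, snum)
        if name is not None:
            return name
    return None

def app_fragmentation_transparency(snum):
    # The application names only the global relation.
    return ddbms_select_name(snum)

def app_location_transparency(snum):
    # The application names fragments, but not the sites that hold them.
    name = read("SUPPLIER1", snum)
    if name is None:
        name = read("SUPPLIER2", snum)
    return name

def app_local_mapping_transparency(snum):
    # The application names both the fragment and the site of the copy it reads.
    name = read("SUPPLIER1", snum, site="site1")
    if name is None:
        name = read("SUPPLIER2", snum, site="site3")
    return name
```

All three functions return the same answer; what changes from level to level is how much of the fragmentation and allocation the application code has to know.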
Topics left for you from the syllabus
• Distributed database access primitives
• Integrity constraints in Distributed databases
Distributed Database Design
Any database design has to address the following issues:
1. Designing the conceptual schema
2. Designing the physical storage
Distribution of data adds two more:
3. Designing how to fragment data
4. Designing how to allocate fragments to sites
Distributed Database Design
Objectives of the design of data distribution
• Processing locality
• Availability and reliability of distributed data
• Workload distribution
• Storage cost and availability
• Distributed database design
Distributed Database Design
Two approaches to design
1. Top-down approach
• Start by designing the global schema, then design the fragmentation of the database, and then allocate the fragments to the sites, creating the physical images.
• Suitable for systems which are developed from scratch
2. Bottom-up approach
• The selection of a common database model for describing the global schema of the database.
• The translation of each local schema into the common data model.
• The integration of the local schemata into a common global schema.
The Design of Database Fragmentation
Horizontal Fragmentation (Primary)
Let P = {p1, p2, …, pn} be a set of simple predicates. For P to represent the fragmentation correctly and efficiently, P must be complete and minimal.
1. A set P is complete iff any two tuples belonging to the same fragment are referenced with the same probability by any application.
2. A set P is minimal if all its predicates are relevant.
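One common way to derive fragments from a set P of simple predicates, assumed here rather than stated on the slide, is to place two tuples in the same fragment exactly when they satisfy the same predicates of P. A small Python sketch with invented rows and two hypothetical predicates:

```python
# Invented EMPLOYEE rows; only the attributes used by the predicates are shown.
rows = [
    {"EMPNUM": 1, "DEPTNUM": 5, "SAL": 30000},
    {"EMPNUM": 2, "DEPTNUM": 5, "SAL": 80000},
    {"EMPNUM": 3, "DEPTNUM": 15, "SAL": 40000},
    {"EMPNUM": 4, "DEPTNUM": 15, "SAL": 45000},
]

# A hypothetical set P of simple predicates.
P = {
    "DEPTNUM <= 10": lambda row: row["DEPTNUM"] <= 10,
    "SAL > 50000": lambda row: row["SAL"] > 50000,
}

def fragments_from_predicates(rows, predicates):
    """Group rows by the truth value of every simple predicate."""
    fragments = {}
    for row in rows:
        signature = tuple(test(row) for test in predicates.values())
        fragments.setdefault(signature, []).append(row["EMPNUM"])
    return fragments

fragments = fragments_from_predicates(rows, P)
```

Whether this P is complete or minimal cannot be read off the data alone: it depends on how the applications reference the tuples. If no application ever distinguishes employees by salary, for instance, the second predicate would not be relevant and could be dropped.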
The Design of Database Fragmentation
Horizontal Fragmentation (Derived)
• A distributed join is a join between horizontally fragmented relations, which is represented by join graphs
• Join Graphs
  • Total
  • Reduced
    • Simple
    • Partitioned
Derived Fragments: Ri = R SJ_F Si
The Design of Database Fragmentation
Vertical Fragmentation
1. Split approach
2. Grouping approach
The Design of Database Fragmentation
Mixed Fragmentation
1. Applying Vertical Fragmentation to Horizontal fragments
2. Applying Horizontal Fragmentation to Vertical fragments
The Allocation of Fragments
General criteria for fragment allocation
1. Redundant
2. Non-Redundant
If fragments are replicated, the complexity of the problem is high because:
• The degree of replication of each fragment becomes a variable of the problem.
• Modeling read applications is complicated by the fact that the applications can now select among several alternative sites for accessing fragments.
The Allocation of Fragments
For determining the redundant allocation of fragments, either of the following methods can be used:
1. All beneficial sites: Determine the set of all sites where the benefit of allocating one copy of the fragment is higher than the cost, and allocate a copy of the fragment to each site in this set.
2. Additional replication: First determine the solution of the non-replicated problem, and then progressively introduce replicated copies starting from the most beneficial; the process terminates when no additional replication is beneficial.
Measure of costs and benefits of fragment allocation
Some Definitions
• $i$ is the fragment index
• $j$ is the site index
• $k$ is the application index
• $f_{kj}$ is the frequency of application $k$ at site $j$
• $r_{ki}$ is the number of retrieval references of application $k$ to fragment $i$
• $u_{ki}$ is the number of update references of application $k$ to fragment $i$
• $n_{ki} = r_{ki} + u_{ki}$ is the total number of references of application $k$ to fragment $i$
Measure of costs and benefits of fragment allocation
Horizontal fragmentation:
1. Using the ‘best-fit’ approach for a non-replicated allocation, we place $R_i$ at the site where the number of references to $R_i$ is maximum. The number of local references to $R_i$ at site $j$ is
$B_{ij} = \sum_k f_{kj} n_{ki}$
$R_i$ is allocated at site $j^*$ such that $B_{ij^*}$ is maximum.
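The best-fit rule is a weighted sum followed by an argmax, which a few lines of Python can illustrate. This sketch takes the number of references of application k to fragment i as retrievals plus updates; the frequency and reference tables are invented numbers, and the function name is hypothetical.

```python
# f[k][j]: frequency of application k at site j (2 applications, 3 sites; invented).
f = [[10, 0, 5],
     [0, 20, 5]]
# r[k][i], u[k][i]: retrieval and update references of application k to fragment i.
r = [[4, 1],
     [2, 6]]
u = [[1, 0],
     [1, 2]]

def best_fit_site(i, f, r, u):
    """Return (best site index, list of B_ij over all sites) for fragment i."""
    apps = range(len(f))
    sites = range(len(f[0]))
    benefit = [
        sum(f[k][j] * (r[k][i] + u[k][i]) for k in apps)  # B_ij = sum_k f_kj * n_ki
        for j in sites
    ]
    best = max(sites, key=lambda j: benefit[j])
    return best, benefit

site, benefit = best_fit_site(0, f, r, u)
```

For fragment 0 the values of B are 50, 60, and 40 at sites 0, 1, and 2, so the single copy goes to site 1.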
2. Using the ‘all beneficial sites’ method for replicated allocation, we place $R_i$ at all sites $j$ where the cost of retrieval references of applications is larger than the cost of update references to $R_i$ from applications at any other site. $B_{ij}$ is evaluated as the difference:
$B_{ij} = \sum_k f_{kj} r_{ki} - C \sum_k \sum_{j' \neq j} f_{kj'} u_{ki}$
$C$ is a constant which measures the ratio between the cost of an update access and the cost of a retrieval access.
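The same kind of toy data can be used to evaluate the replicated case. In this Python sketch a site receives a copy when its B value is positive, which is one reading of "benefit higher than cost"; the tables, the value C = 1.0, and the function names are assumptions for illustration.

```python
# f[k][j]: frequency of application k at site j (2 applications, 3 sites; invented).
f = [[10, 0, 5],
     [0, 20, 5]]
# r[k][i], u[k][i]: retrieval and update references of application k to fragment i.
r = [[4, 1],
     [2, 6]]
u = [[1, 0],
     [1, 2]]
C = 1.0  # assumed ratio between the cost of an update and of a retrieval

def replication_benefit(i, j, f, r, u, C):
    """B_ij: local retrievals gained minus C times the updates arriving from other sites."""
    apps = range(len(f))
    sites = range(len(f[0]))
    local_retrievals = sum(f[k][j] * r[k][i] for k in apps)
    remote_updates = sum(f[k][jp] * u[k][i] for k in apps for jp in sites if jp != j)
    return local_retrievals - C * remote_updates

def all_beneficial_sites(i, f, r, u, C):
    """Sites where a copy of fragment i has a positive benefit."""
    return [j for j in range(len(f[0])) if replication_benefit(i, j, f, r, u, C) > 0]

benefits_fragment0 = [replication_benefit(0, j, f, r, u, C) for j in range(3)]
```

With these numbers fragment 0 is worth replicating at sites 0 and 1, while fragment 1 is beneficial only at site 1. When no site has a positive value, a single copy still has to be placed somewhere, for example with the best-fit rule.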
Measure of costs and benefits of fragment allocation
3. Using the ‘additional replication’ method for replicated allocation, we can measure the benefit of placing a new copy of $R_i$ in terms of increased reliability and availability of the system.
Let $d_i$ denote the degree of redundancy of $R_i$, and let $F_i$ denote the benefit of having $R_i$ fully replicated at each site. The following function was introduced to measure this benefit:
$\beta(d_i) = (1 - 2^{1 - d_i}) F_i$
Note that $\beta(1) = 0$, $\beta(2) = F_i / 2$, $\beta(3) = 3 F_i / 4$, and so on.
We evaluate the benefit of introducing a new copy of $R_i$ at site $j$ by modifying the formula of case 2 as follows:
$B_{ij} = \sum_k f_{kj} r_{ki} - C \sum_k \sum_{j' \neq j} f_{kj'} u_{ki} + \beta(d_i)$
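The reliability term is easy to tabulate. The Python sketch below evaluates β for a few degrees of redundancy and adds it to a benefit value from case 2; the value F = 8 and the function names are invented, and which degree of redundancy to plug in when a copy is being added is left to the caller.

```python
def beta(d, F):
    """Benefit of holding d copies of a fragment; F is the benefit of full replication."""
    return (1 - 2 ** (1 - d)) * F

def benefit_with_replication(base_benefit, d, F):
    """B_ij of case 2 plus the reliability and availability term beta(d)."""
    return base_benefit + beta(d, F)

F = 8  # invented value for the benefit of full replication
table = [beta(d, F) for d in (1, 2, 3, 4)]
```

The table comes out as 0, 4, 6, 7 for one to four copies: each further copy adds half as much as the previous one, so replication stops paying off once the update cost of a new copy outweighs this shrinking gain.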
Motivation is what gets you started and Habit is what keeps you going…
Thanks a lot for patient listening!!
Questions?
You can reach me at mangeshwanjari[at]gmail.com