Date post: | 19-Jan-2016 |
Upload: | valentine-carpenter |
1
Lecture 7
Distributed Data Bases: Principles and Architectures
2
Distributed Data Base (DDB) – Definition
A logically interrelated collection of shared data, physically distributed over a computer network.
Implies data description at two levels: Global (the view of the whole); Local (where data is actually held).
3
DDBMS (Distributed DBMS)
The software system that permits the management of the distributed data base and makes the distribution transparent to users.
Transparent – users are unaware of the underlying local structure.
Data requests do not specify distribution sites – but users may notice performance differences (e.g. if local data has to be moved to another site along a slow line).
4
Characteristics of a DDB
Collection of logically related shared data.
Data is split into a number of fragments (horizontal or vertical – restrict or project).
Fragments may be replicated.
Fragments / replicas are allocated to sites.
Fragments are in effect views.
Replicas are duplicates – only acceptable if redundancy is controlled.
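The fragmentation terms above can be illustrated with a minimal sketch. The relation, its column names, and the site names are all hypothetical; the point is only that a horizontal fragment is a restrict, a vertical fragment is a project that keeps the key, and the original relation can be rebuilt by union or join respectively.

```python
# Hypothetical global relation: one dict per row.
EMPLOYEE = [
    {"emp_id": 1, "name": "Ada", "office": "London", "salary": 50000},
    {"emp_id": 2, "name": "Bob", "office": "Leeds", "salary": 42000},
    {"emp_id": 3, "name": "Cy", "office": "London", "salary": 39000},
]


def restrict(rows, predicate):
    """Horizontal fragment: keep whole rows that satisfy the predicate."""
    return [dict(r) for r in rows if predicate(r)]


def project(rows, columns):
    """Vertical fragment: keep only the named columns of every row."""
    return [{c: r[c] for c in columns} for r in rows]


# Horizontal fragmentation by office; the union of the fragments rebuilds the relation.
london = restrict(EMPLOYEE, lambda r: r["office"] == "London")
leeds = restrict(EMPLOYEE, lambda r: r["office"] == "Leeds")
rebuilt_h = sorted(london + leeds, key=lambda r: r["emp_id"])

# Vertical fragmentation; both fragments keep the key so a join can rebuild rows.
public = project(EMPLOYEE, ["emp_id", "name", "office"])
payroll = project(EMPLOYEE, ["emp_id", "salary"])
pay_by_id = {r["emp_id"]: r for r in payroll}
rebuilt_v = [{**p, **pay_by_id[p["emp_id"]]} for p in public]

# Allocation of fragments to (hypothetical) sites; "payroll" is replicated.
allocation = {
    "london": ["site_london"],
    "leeds": ["site_leeds"],
    "payroll": ["site_london", "site_leeds"],
}
```

Keeping `emp_id` in both vertical fragments is what makes the join lossless; dropping it from either fragment would make the original rows unrecoverable.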
5
Why distribute?
Natural match of data with location.
Each division, department or office can hold its own data with some degree of autonomy.
Autonomy – to have control (self-determination, self-rule).
Users can decide policies locally (devolved).
Still need a global DBA to ensure the entire system works.
6
Why distribute? (cont.)
More flexible operation:
Improved availability - one node failure does not bring the whole system down.
Improved reliability - replication ensures that copies of data are still available if a node fails.
Improved performance - accessing most data locally reduces network overheads.
Readily handles expansion - can add new nodes with local schema, followed by simple adjustments to the global definition.
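The availability and performance points can be sketched as a read that prefers the local copy and falls back to a replica when a node is down. The `Site` class, the `read_any` helper, and the sample key are hypothetical names introduced for illustration only; a real DDBMS would also have to keep replicas consistent on update, which this sketch does not attempt.

```python
class Site:
    """A toy node holding a copy of some data."""

    def __init__(self, name, data):
        self.name = name
        self.data = dict(data)
        self.up = True

    def read(self, key):
        if not self.up:
            raise ConnectionError(f"{self.name} is down")
        return self.data[key]


def read_any(replicas, key):
    """Try each replica in order; return (value, name of the site that answered)."""
    for site in replicas:
        try:
            return site.read(key), site.name
        except ConnectionError:
            continue
    raise ConnectionError("no replica available")


local = Site("local", {"stock:42": 7})
remote = Site("remote", {"stock:42": 7})
replicas = [local, remote]  # local copy listed first, so it is preferred

first = read_any(replicas, "stock:42")   # answered by the local site
local.up = False                         # simulate a node failure
second = read_any(replicas, "stock:42")  # answered by the remote replica
```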
7
Problems with Distribution
Complexity:
Global and local schema must be integrated.
Design techniques involve more stages.
Replication must be rigorously handled.
Network needs to be robust.
Costs:
More people effort needed to handle the complexity – although it is cheaper to buy power with smaller machines rather than larger ones.
8
Problems with Distribution (cont.)
Security:
Many more potential access points for would-be violators.
Integrity:
Need to ensure that the combination of local and global constraints gives the required effect.
Experience:
Fairly immature technology – not yet translated to standards.
9
Homogeneous and Heterogeneous DDBMS
A homogeneous DDBMS uses the same database product at all sites.
A heterogeneous DDBMS uses different database products at various sites – may arise from corporate mergers.
10
Degrees of Heterogeneity
Same software, different hardware: can be handled fairly easily.
Oracle 9i : Oracle 8i – differences slight.
Oracle 9i : SQL Server – same underlying relational model, different syntax in places.
Oracle 9i : Objectivity – object-relational (SQL-1999) and ODMG respectively, different underlying model.
11
Interoperability
Ability to work with each other.
In a loosely coupled environment:
Full details of each system not needed – BUT need to have interfaces for reliably exchanging messages without error or misunderstanding.
Solutions:
Standardized specifications;
Mediation.
Differences in implementation may still lead to breakdowns in communication.
12
Simple Problem in Interoperability - 1
Two schemas in SQL-1999:
Schema A:
author varchar2(50),
title varchar2(300),
keyword1 varchar2(30),
keyword2 varchar2(30);
Schema B:
author_surname varchar2(40),
author_initials varchar2(10),
title varchar2(200),
keywd keywordarr;
CREATE TYPE keywordarr AS VARRAY(8) OF varchar2(30);
Note: homogeneous model – both SQL-1999 – but difficulties.
13
Different Standards
e.g. Names:
Person (surname, first_name, …) or
Person (first_name, surname, …) or
Person (name, …)
The first two may easily be made equivalent, but the convention in the third needs to be understood.
Note also the possibilities of A.N.Other, AN Other, A N Other, etc.
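The variant spellings noted above can be brought to one form with a small normaliser. This is a sketch under a loudly stated assumption: the value follows an "initials then surname" convention with a single-token surname. The function name `split_name` is hypothetical, and other conventions (for example "Other, A.N.") would need their own rule, which is exactly why the convention has to be understood first.

```python
import re


def split_name(raw):
    """Split 'initials surname' text into (initials, surname).

    Assumes the last token is the surname and everything before it is initials;
    dots and spaces between initials are treated as noise.
    """
    tokens = [t for t in re.split(r"[.\s]+", raw.strip()) if t]
    if not tokens:
        raise ValueError("empty name")
    surname = tokens[-1]
    initials = "".join(tokens[:-1]).upper()
    return initials, surname


variants = ["A.N.Other", "AN Other", "A N Other"]
normalised = [split_name(v) for v in variants]
```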
14
Possible Solutions
In schema B, define a function which amalgamates the two parts of author into one value.
Will need to look manually at the format of author in schema A.
If the format is inconsistent, some pre-processing is needed.
Other inconsistencies require decisions:
Fixed two entries for keyword vs. array dimension 8.
Different name for keyword attribute.
Different size for title fields (presumably adopt the higher).
In a heterogeneous environment, we also need to relate schema constructions, e.g. is CLASS the same as TABLE?
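One way to picture these decisions is a pair of mapping functions that bring rows from both schemas into a common shape. This is only a sketch: the common record layout, the function names, and the assumption that schema A stores author as "initials surname" are all illustrative choices, and the B-side field names `author_surname` and `author_initials` are assumed from the two-part author described above.

```python
TITLE_MAX = 300    # adopt the larger of the two title sizes
KEYWORD_MAX = 8    # adopt the array dimension of keywordarr


def from_a(row):
    """Map a schema-A row: two fixed keyword columns become a list."""
    keywords = [k for k in (row.get("keyword1"), row.get("keyword2")) if k]
    return {"author": row["author"], "title": row["title"], "keywords": keywords}


def from_b(row):
    """Map a schema-B row: amalgamate the two author parts into one value."""
    author = f'{row["author_initials"]} {row["author_surname"]}'
    keywords = list(row["keywd"])[:KEYWORD_MAX]
    return {"author": author, "title": row["title"], "keywords": keywords}


row_a = {"author": "AN Other", "title": "On Schemas",
         "keyword1": "ddb", "keyword2": None}
row_b = {"author_surname": "Other", "author_initials": "AN",
         "title": "On Schemas", "keywd": ["ddb"]}

common_a = from_a(row_a)
common_b = from_b(row_b)
```

If the author format in schema A turns out to be inconsistent, a pre-processing step would have to run before `from_a`; the mapping itself cannot repair that.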
15
Simple Problem in Interoperability - 2
Homogeneous Models
The same information may be held as an attribute name, a relation name or a value in different data bases.
e.g. library fines could be held:
in a dedicated relation Fine (amount, borrower_id) –
or as an attribute of Loan (id, isbn, date_out, fine) –
or as a value in Charge (1.25, ‘fine’).
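The three placements of the fine can be compared side by side in a short sketch. The sample rows are invented, and the column names `amount` and `kind` for Charge are assumptions, since the slide gives only values for that relation. Each representation needs a different extraction rule to answer the same question.

```python
from decimal import Decimal

# 1. Fine is a relation name: Fine (amount, borrower_id).
fine_rows = [{"amount": Decimal("1.25"), "borrower_id": 7}]

# 2. Fine is an attribute name: Loan (id, isbn, date_out, fine).
loan_rows = [
    {"id": 1, "isbn": "111", "date_out": "2016-01-05", "fine": Decimal("1.25")},
    {"id": 2, "isbn": "222", "date_out": "2016-01-09", "fine": None},
]

# 3. Fine is a value: Charge (1.25, 'fine'); column names are assumed.
charge_rows = [
    {"amount": Decimal("1.25"), "kind": "fine"},
    {"amount": Decimal("3.00"), "kind": "reservation"},
]


def fines_from_relation(rows):
    """Every row of the dedicated relation is a fine."""
    return [r["amount"] for r in rows]


def fines_from_attribute(rows):
    """Only loans with a non-null fine attribute carry a fine."""
    return [r["fine"] for r in rows if r["fine"] is not None]


def fines_from_value(rows):
    """Only charges whose kind column holds the value 'fine' are fines."""
    return [r["amount"] for r in rows if r["kind"] == "fine"]
```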
16
Architectures for Interoperability 1
1. Global Schema Integration
Produces a single new schema (C) for the different information systems with schemas (A, B).
[Diagram: schemas A and B are integrated into the single new schema C.]
17
Global Schema Integration
Advantages:
Transparent to end users – appears as a single information system.
Disadvantages:
Difficult to perform integration – needs human understanding.
Local autonomy lost.
Static – does not evolve automatically.
Tightly-coupled.
18
Architectures for Interoperability 2
2. Federated Data Base Systems
Less tightly coupled schema than in 1.
Each service specifies sharable objects through an export schema.
Common data model.
Internal command language.
Decentralised control (local autonomy).
5-level architecture for federated system.
e.g. Objectivity as Federated OODBMS
19
FDBMS Terminology
IS – Internal Schema defining layout on disk of a conceptual schema
CS – Conceptual Schema defining logical data base (e.g. relational – tables, attributes, domains).
ES – External Schema defining views on conceptual schema.
20
Federated DBMS – 5-level Architecture
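[Diagram: 5-level architecture. Levels shown: Global ES (two instances), Global CS, Local ES (one per site), Local CS (one per site), Local IS (one per site), each local stack resting on its own DB.]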
21
Federated Data Base: Loosely coupled
Created by users.
AE, BE are export schemas.
V is a view.
A, B are base schemas, retaining autonomy over those parts not exported.
[Diagram: base schemas A and B with their export schemas AE and BE, and the view V.]
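The roles of A, B, AE, BE and V can be sketched as follows, assuming (as is usual for loosely coupled federations) that the user's view V is defined over the export schemas only. The rows, column names, and the helper names `export` and `view` are hypothetical. Each site decides what it exports; the private columns never reach the view.

```python
# Hypothetical base schemas at two sites; "cost" and "shelf" stay private.
A = [
    {"isbn": "111", "title": "Databases", "cost": 30},
    {"isbn": "222", "title": "Networks", "cost": 25},
]
B = [
    {"isbn": "222", "title": "Networks", "shelf": "N4"},
    {"isbn": "333", "title": "Compilers", "shelf": "C1"},
]


def export(rows, shared_columns):
    """Export schema: only the columns the site agrees to share."""
    return [{c: r[c] for c in shared_columns} for r in rows]


def view(*export_schemas):
    """User-defined view: union of export schemas, de-duplicated on isbn."""
    merged = {}
    for rows in export_schemas:
        for r in rows:
            merged.setdefault(r["isbn"], r)
    return [merged[k] for k in sorted(merged)]


AE = export(A, ["isbn", "title"])
BE = export(B, ["isbn", "title"])
V = view(AE, BE)
```

Because V is built by a user rather than an administrator, two users may build slightly different views over the same export schemas, which is one source of the duplication of effort associated with this approach.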
22
Federated Data Bases: Tightly coupled
Created by administrators.
Global schema integration on all export schemas.
More formal than loosely-coupled.
Much effort to resolve semantic inconsistencies.
23
Federated Data Base Systems – General Advantages
Local autonomy preserved.
Not all data needs to be integrated.
Provide meta-data structures for views (external and export schema, data dictionary).
24
Federated Database Systems - Disadvantages by Approach
Loosely coupled:
Duplication by different users in building views.
Updating data defined in views can be difficult.
Tightly coupled:
Similar to global schema integration:
Complex, difficult to make changes dynamically.
Much effort needed to resolve semantic inconsistencies.
25
Multi-Data-Base Language Approach
No attempt at schema integration.
All sites maintain complete autonomy.
The various schemas can be heterogeneous, inconsistent w.r.t. services provided, and duplicate information in different ways.
Language (e.g. MSQL) is used to integrate data bases at run time.
Relational model used as Common Data Model.
26
Multi-Data-Base Language Approach
A, B are schemas.
MSQL is the run-time language.
[Diagram: schemas A and B, both accessed through MSQL.]
27
Multi-Data-Base Language Approach – Advantages
No preparatory work to understand semantics of schema.
Dynamic – access latest versions.
Very skilled users can succeed in reaching their goals.
Interesting work on multi-data-base dependencies.
28
An Example Multi-Data-Base Language
MSQL (Multidatabase SQL)
Biased towards the relational model.
Illustrates problems.
Consider 2 data bases, each on publications of a computing society, and the query:
“What is the name, e-mail address and title for each publication of an author appearing in both of the society’s data bases?”
29
MSQL Schema
Schema 1 (for AIIA Database):
Contacts (Person_ID, Name, Email, …)
Conference (Name, Type, …)
Attendees (ID, Conf_ID, Speaker, …)
Publ_Papers (P_ID, Title, Author_ID, …)
Schema 2 (for IFIP Database):
Member_Socs (Soc_Name, …)
Conf (Conf_ID, …)
Publ_Papers (P_Ref, Title, Conf_Ref, …)
Authors (Name, Email, Paper_ID, …)
Underlined attributes are primary keys. Attributes in italics are foreign keys.
30
MSQL for Query
USE AIIA, IFIP
SELECT Name, Email, Title
FROM Authors, IFIP.Publ_Papers IFIP_Paper, Contacts, AIIA.Publ_Papers AIIA_Paper
WHERE Authors.Name = Contacts.Name
AND Contacts.Person_ID = AIIA_Paper.Author_ID
AND Authors.Paper_ID = IFIP_Paper.P_Ref;
The USE clause declares the multi-data-bases, which are used as qualifiers in the FROM clause to distinguish tables with the same name (thereafter distinguished by aliasing).
Retrieves Name, E-mail address and Title from both data bases.
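The shape of this query can be tried out with an emulation. The sketch below is not MSQL: it uses SQLite's ATTACH DATABASE from Python so that two separate in-memory databases, named AIIA and IFIP here, can be joined in one statement, with the database name acting as the qualifier. The sample rows are invented, only the columns needed by the query are created, and the select list is qualified explicitly because both sides have Name, Email and Title columns.

```python
import sqlite3

# One connection plays the role of the multi-data-base session; each ATTACH
# creates a separate in-memory database standing in for one society's site.
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS AIIA")
conn.execute("ATTACH DATABASE ':memory:' AS IFIP")

conn.executescript("""
CREATE TABLE AIIA.Contacts (Person_ID INTEGER PRIMARY KEY, Name TEXT, Email TEXT);
CREATE TABLE AIIA.Publ_Papers (P_ID INTEGER PRIMARY KEY, Title TEXT, Author_ID INTEGER);
CREATE TABLE IFIP.Publ_Papers (P_Ref INTEGER PRIMARY KEY, Title TEXT, Conf_Ref INTEGER);
CREATE TABLE IFIP.Authors (Name TEXT, Email TEXT, Paper_ID INTEGER);

INSERT INTO AIIA.Contacts VALUES
    (1, 'A N Other', 'ano@example.org'), (2, 'B Smith', 'bs@example.org');
INSERT INTO AIIA.Publ_Papers VALUES (10, 'Paper X', 1), (11, 'Paper Y', 2);
INSERT INTO IFIP.Publ_Papers VALUES (100, 'Paper Z', 7), (101, 'Paper W', 7);
INSERT INTO IFIP.Authors VALUES
    ('A N Other', 'ano@example.org', 100), ('C Jones', 'cj@example.org', 101);
""")

# Same join conditions as the MSQL query; tables are qualified by database name.
rows = conn.execute("""
    SELECT Authors.Name, Authors.Email, IFIP_Paper.Title, AIIA_Paper.Title
    FROM IFIP.Authors AS Authors,
         IFIP.Publ_Papers AS IFIP_Paper,
         AIIA.Contacts AS Contacts,
         AIIA.Publ_Papers AS AIIA_Paper
    WHERE Authors.Name = Contacts.Name
      AND Contacts.Person_ID = AIIA_Paper.Author_ID
      AND Authors.Paper_ID = IFIP_Paper.P_Ref
""").fetchall()
```

Note that the join on `Authors.Name = Contacts.Name` only works if both data bases spell the author's name identically, which is one of the problems the approach leaves to the user.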
31
Potential Problems with MSQL
Are names and domains of corresponding attributes the same?
Can use LET command to create equivalences of names, but this does not solve domain incompatibility.
What if one schema is not relational? The E-R model is often used as a neutral schema for translation and comparison of heterogeneous features.
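The difference between name equivalence and domain compatibility can be seen in a small sketch. Everything here is hypothetical: the rename table stands in for the kind of name equivalence a LET-style command provides, and the "P-100" text format is an invented example of a domain mismatch. After renaming, the columns line up but an equality join still finds nothing until the values are converted to a common domain.

```python
# Name equivalence: treat P_Ref in one source as Paper_ID.
RENAMES = {"P_Ref": "Paper_ID"}


def rename(row, mapping):
    """Apply a column-name equivalence; values are left untouched."""
    return {mapping.get(k, k): v for k, v in row.items()}


def to_int_id(value):
    """Domain conversion: accept 100 or 'P-100' and return the integer 100."""
    text = str(value)
    return int(text[2:] if text.startswith("P-") else text)


authors = [{"Name": "A N Other", "Paper_ID": 100}]   # integer domain
papers = [{"P_Ref": "P-100", "Title": "Paper Z"}]    # text domain

renamed = [rename(p, RENAMES) for p in papers]

# Names now match, but 100 != 'P-100', so the join is empty.
naive = [(a["Name"], p["Title"])
         for a in authors for p in renamed
         if a["Paper_ID"] == p["Paper_ID"]]

# Converting both sides to one domain makes the join succeed.
fixed = [(a["Name"], p["Title"])
         for a in authors for p in renamed
         if to_int_id(a["Paper_ID"]) == to_int_id(p["Paper_ID"])]
```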
32
Multi-data-base Language – Disadvantages in General
Distribution is not transparent.
Users must resolve inconsistencies themselves.
Common language may restrict scope of heterogeneity (relational bias).
Local autonomous systems may change their schema freely (so existing queries fail).
33
Comparison of Approaches
By coupling:
How tightly is the interoperable system connected to its underlying systems?
By adaptability:
How freely can the interoperable system evolve in line with the underlying schema?
By transparency:
How much understanding of the interoperable system do end-users need to have?
34
Comparison of Approaches
Approach                     Coupling   Adaptability   Transparency
Global Schema Integration    Tight      Low            High
Federated Data Bases         Medium     Medium         Medium
Multi-data-base Languages    Loose      High           Low
35
Summary
Trend:
From Global Schema Integration – through Federated Data Bases – to Multi-data-base Language.
Towards looser coupling, higher adaptability, and lower transparency.
36
Further Reading
Management of Heterogeneous and Autonomous Database Systems
Elmagarmid, Ahmed; Rusinkiewicz, Marek; Sheth, Amit
Morgan Kaufmann (1999).