+ All Categories
Home > Documents > 1 Lecture 7 Distributed Data Bases: Principles and Architectures.

1 Lecture 7 Distributed Data Bases: Principles and Architectures.

Date post: 19-Jan-2016
Category:
Upload: valentine-carpenter
View: 216 times
Download: 0 times
Share this document with a friend
36
1 Lecture 7 Distributed Data Bases: Principles and Architectures
Transcript
Page 1: 1 Lecture 7 Distributed Data Bases: Principles and Architectures.

1

Lecture 7

Distributed Data Bases: Principles and Architectures

Page 2: 1 Lecture 7 Distributed Data Bases: Principles and Architectures.

2

Distributed Data Base (DDB) – Definition

A logically interrelated collection of shared data, physically distributed over a computer network.

Implies data description at two levels: Global (the view of the whole); Local (where data is actually held).

Page 3: 1 Lecture 7 Distributed Data Bases: Principles and Architectures.

3

DDBMS (Distributed DBMS)

The software system that permits the management of the distributed data base and makes the distribution transparent to users: Transparent – users are unaware of the

underlying local structure: Data requests do not specify distribution sites- but they may notice performance differences (e.g.

if local data has to be moved to another site along a slow line).

Page 4: 1 Lecture 7 Distributed Data Bases: Principles and Architectures.

4

Characteristics of a DDB

Collection of logically related shared data. Data is split into a number of fragments

(horizontal or vertical – restrict or project). Fragments may be replicated. Fragments / replicas are allocated to sites.

Fragments are in effect views; Replicas are duplicates – only acceptable if

redundancy is controlled.

Page 5: 1 Lecture 7 Distributed Data Bases: Principles and Architectures.

5

Why distribute?

Natural match of data with location Can have each division, department or office

hold its own data with some degree of autonomy. Autonomy – to have control (self-determination,

self-rule)

Users can decide policies locally (devolved). Still need global DBA to ensure entire system

works.

Page 6: 1 Lecture 7 Distributed Data Bases: Principles and Architectures.

6

Why distribute? (cont.)

More flexible operation: Improved availability

- one node failure does not bring the whole system down. Improved reliability

- replication ensures that copies of data are still available if a node fails.

Improved performance- accessing most data locally reduces network overheads.

Readily handle expansion - can add new nodes with local schema - followed by simple adjustments to global definition.

Page 7: 1 Lecture 7 Distributed Data Bases: Principles and Architectures.

7

Problems with Distribution Complexity

Global and local schema must be integrated. Design techniques involve more stages. Replications rigorously handled. Network needs to be robust.

Costs More people effort needed to handle the complexity– although cheaper to buy power with smaller

machines rather than larger ones.

Page 8: 1 Lecture 7 Distributed Data Bases: Principles and Architectures.

8

Problems with Distribution (cont.)

Security Many more potential access points for would-

be violators. Integrity

Need to ensure that combination of local and global constraints gives the required effect.

Experience Fairly immature technology– not yet translated to standards.

Page 9: 1 Lecture 7 Distributed Data Bases: Principles and Architectures.

9

Homogeneous and Heterogeneous DDBMS

A homogeneous DDBMS uses the same database product at all sites.

A heterogeneous DDBMS uses different data base products at various sites– may arise from corporate mergers.

Page 10: 1 Lecture 7 Distributed Data Bases: Principles and Architectures.

10

Degrees of Heterogeneity

Same software, different hardware can be handled fairly easily.

Oracle 9i : Oracle 8i – differences slight. Oracle 9i : SQL Server – same underlying

relational model, different syntax in places. Oracle 9i : Objectivity – object-relational

(SQL-1999) and ODMG respectively, different underlying model.

Page 11: 1 Lecture 7 Distributed Data Bases: Principles and Architectures.

11

Interoperability

Ability to work with each other. In a loosely coupled environment:

Full details of each system not needed– BUT need to have interfaces for reliably exchanging

messages without error or misunderstanding Solutions:

Standardized specifications; Mediation.

Differences in implementation may still lead to breakdowns in communication.

Page 12: 1 Lecture 7 Distributed Data Bases: Principles and Architectures.

12

Simple Problem in Interoperability - 1

Two schemas in SQL-1999:

A Bauthor varchar2(50), author_surname varchar2(40), author, initials varchar2(10),title varchar2(300), title varchar2(200),keyword1 varchar2(30), keywd keywordarr;keyword2 varchar2(30);

CREATE TYPE keywordarr AS VARRAY(8) OF varchar2(30);

Note: homogeneous model – both SQL-1999 – but difficulties.

Page 13: 1 Lecture 7 Distributed Data Bases: Principles and Architectures.

13

Different Standards

e.g. Names: Person (surname, first_name, …) or Person (first_name, surname, …) or Person (name, …)

First two may easily be made equivalent but convention in third needs to be understood. Note also possibilities of A.N.Other, AN

Other, A N Other, etc.

Page 14: 1 Lecture 7 Distributed Data Bases: Principles and Architectures.

14

Possible Solutions

In schema B, define a function which amalgamates the two parts of author into one value. Will need to look manually at format of author in

schema A. If format inconsistent, need some pre-processing.

Other inconsistencies require decisions: Fixed two entries for keyword vs. array dimension 8. Different name for keyword attribute. Different size for title fields (presumably adopt higher).

In a heterogeneous environment, we need also to relate schema constructions, e.g. is CLASS the same as TABLE?

Page 15: 1 Lecture 7 Distributed Data Bases: Principles and Architectures.

15

Simple Problem in Interoperability - 2

Homogeneous Models The same information may be held as attribute

name, relation name or a value in different data bases.

e.g. library fines could be held in a dedicated relation

Fine (amount, borrower_id)– or as an attribute of Loan (id, isbn, date_out, fine)– or as a value Charge (1.25, ‘fine’).

Page 16: 1 Lecture 7 Distributed Data Bases: Principles and Architectures.

16

Architectures for Interoperability 11. Global schema integration

Produces a single new schema (C) for the different information systems with schemas (A, B).

A

C

B

Page 17: 1 Lecture 7 Distributed Data Bases: Principles and Architectures.

17

Global Schema Integration

Advantages Transparent to end users – appears as a single

information system. Disadvantages

Difficult to perform integration – needs human understanding.

Local autonomy lost. Static – does not evolve automatically. Tightly-coupled.

Page 18: 1 Lecture 7 Distributed Data Bases: Principles and Architectures.

18

Architectures for Interoperability 2

2. Federated Data Base Systems Less tightly coupled schema than in 1. Each service specifies sharable objects

through an export schema. Common data model. Internal command language. Decentralised control (local autonomy). 5-level architecture for federated system.

e.g. Objectivity as Federated OODBMS

Page 19: 1 Lecture 7 Distributed Data Bases: Principles and Architectures.

19

FDBMS Terminology

IS – Internal Schema defining layout on disk of a conceptual schema

CS – Conceptual Schema defining logical data base (e.g. relational – tables, attributes, domains).

ES – External Schema defining views on conceptual schema.

Page 20: 1 Lecture 7 Distributed Data Bases: Principles and Architectures.

20

Federated DBMS – 5-level Architecture

Local CS Local CS

Local IS Local IS

DB

Global CS

Global ESGlobal ES

Local ESLocal ES

DB

Page 21: 1 Lecture 7 Distributed Data Bases: Principles and Architectures.

21

Federated Data Base:Loosely coupled

Created by users. AE, BE are export schemas. V is a view. A, B are base schemas, retaining autonomy over

those parts not exported.

A B

V

AEBE

Page 22: 1 Lecture 7 Distributed Data Bases: Principles and Architectures.

22

Federated Data Bases:Tightly coupled

Created by administrators. Global schema integration on all export

schemas. More formal than loosely-coupled. Much effort to resolve semantic

inconsistencies.

Page 23: 1 Lecture 7 Distributed Data Bases: Principles and Architectures.

23

Federated Data Base Systems – General Advantages

Local autonomy preserved. Not all data needs to be integrated. Provide meta-data structures for views

(external and export schema, data dictionary).

Page 24: 1 Lecture 7 Distributed Data Bases: Principles and Architectures.

24

Federated Database Systems - Disadvantages by Approach

Loosely coupled Duplication by different users in building views. Updating data defined in views can be difficult.

Tightly coupled Similar to global schema integration:

Complex, difficult to make changes dynamically.Much effort needed to resolve semantic inconsistencies.

Page 25: 1 Lecture 7 Distributed Data Bases: Principles and Architectures.

25

Multi-Data-Base Language Approach

No attempt at schema integration. All sites maintain complete autonomy. The various schemas can be heterogeneous,

inconsistent w.r.t. services provided, and duplicate information in different ways.

Language (e.g. MSQL) is used to integrate data bases at run time.

Relational model used as Common Data Model.

Page 26: 1 Lecture 7 Distributed Data Bases: Principles and Architectures.

26

Multi-Data-Base Language Approach

A, B are schemas. MSQL is the run-time language.

A B

MSQL

Page 27: 1 Lecture 7 Distributed Data Bases: Principles and Architectures.

27

Multi-Data-Base Language Approach – Advantages

No preparatory work to understand semantics of schema.

Dynamic – access latest versions. Very skilled users can succeed in reaching

their goals. Interesting work on multi-data-base

dependencies.

Page 28: 1 Lecture 7 Distributed Data Bases: Principles and Architectures.

28

An Example Multi-Data-Base Language

MSQL (Multidatabase SQL) Biased towards the relational model. Illustrates problems.

Consider 2 data bases: Each on publications of a computing society; and query: “What is the name, e-mail address, title for

each publication of an author appearing in both of the society’s data bases?”

Page 29: 1 Lecture 7 Distributed Data Bases: Principles and Architectures.

29

MSQL Schema

Schema 1 (for AIIA Database): Contacts (PersonID, Name, Email, …) Conference (Name, Type, …) Attendees (ID, Conf_ID, Speaker, …) Publ_Papers (P_ID, Title, Author_ID, …)

Schema 2 (for IFIP Database): Member_Socs (Soc_Name, …) Conf (Conf_ID, …) Publ_Papers (P_Ref, Title, Conf_Ref, …) Authors (Name, Email, Paper_ID, …)

Underlined attributes are primary keys.Attributes in italics are foreign keys.

Page 30: 1 Lecture 7 Distributed Data Bases: Principles and Architectures.

30

MSQL for QueryUSE AIIA, IFIPSELECT Name, Email, TitleFROM Authors,IFIP.Publ_Papers IFIP_Paper,Contacts,AIIA.Publ_papers AIIA_PaperWHERE Authors.Name = Contacts.NameAND Contacts.Person_ID = AIIA_Paper.Author_IDAND Authors.Paper_ID = IFIP_Paper.P_Ref;

The USE clause declares the multi-data-bases which are used as qualifiers in the FROM clause to distinguish tables with the same name (thereafter distinguished by aliasing).Retrieves Name, E-mail address and Title from both data bases.

Page 31: 1 Lecture 7 Distributed Data Bases: Principles and Architectures.

31

Potential Problems with MSQL

Are names and domains of corresponding attributes the same?

Can use LET command to create equivalences of names, but this does not solve domain incompatibility.

What if one schema is not relational? The E-R model is often used as a neutral schema for translation and comparison of heterogeneous features.

Page 32: 1 Lecture 7 Distributed Data Bases: Principles and Architectures.

32

Multi-data-base Language – Disadvantages in General

Distribution is not transparent. Users must resolve inconsistencies

themselves. Common language may restrict scope of

heterogeneity (relational bias). Local autonomous systems may change

their schema freely (so existing queries fail).

Page 33: 1 Lecture 7 Distributed Data Bases: Principles and Architectures.

33

Comparison of Approaches

By coupling: How tightly is the interoperable system

connected to its underlying systems? By adaptability:

How freely can the interoperable system evolve in line with the underlying schema?

By transparency: How much understanding of the interoperable

system do end-users need to have?

Page 34: 1 Lecture 7 Distributed Data Bases: Principles and Architectures.

34

Comparison of Approaches

Approach Coupling AdaptabilityTransparency

Global Schema Tight Low High

Integration

Federated Medium Medium Medium

Data Bases

Multi-data-base Loose High Low

Languages

Page 35: 1 Lecture 7 Distributed Data Bases: Principles and Architectures.

35

Summary

Trend:

From Global Schema Integration– through Federated Data Bases– to Multi-data-base Language.

Towards looser coupling, higher adaptability, and lower transparency.

Page 36: 1 Lecture 7 Distributed Data Bases: Principles and Architectures.

36

Further Reading

Management of Heterogeneous and Autonomous Database Systems

Elmagarmid, AhmedRusinkiewicz, MarekSheth, Amit

Morgan Kaufmann (1999).


Recommended