+ All Categories
Home > Documents > A Brief History of DB Theory

A Brief History of DB Theory

Date post: 12-Feb-2016
Category:
Upload: shania
View: 39 times
Download: 0 times
Share this document with a friend
Description:
A Brief History of DB Theory. Functional dependencies --- Normalization More dependencies: multivalued, general Universal relations Acyclic hypergraphs Logical query languages: Datalog. Functional Dependencies. - PowerPoint PPT Presentation
47
1 A Brief History of DB Theory • Functional dependencies --- Normalization • More dependencies: multivalued, general • Universal relations • Acyclic hypergraphs • Logical query languages: Datalog
Transcript
Page 1: A Brief History of DB Theory

1

A Brief History of DB Theory• Functional dependencies ---

Normalization• More dependencies: multivalued,

general• Universal relations• Acyclic hypergraphs• Logical query languages: Datalog

Page 2: A Brief History of DB Theory

2

Functional Dependencies• Setting: A relation = table with

column headers (schema ) and a finite number of rows (tuples ).

A B C0 1 20 3 4

Page 3: A Brief History of DB Theory

3

FD X ->Y

• A statement about which instances (finite sets of tuples) are “legal” for a given relation.

• If two tuples agree in all the attributes of set X , then then must also agree in all attributes of set Y.– Common case: X is the key of the

relation, Y is the other attributes.

Page 4: A Brief History of DB Theory

4

Issue #1: Inference• Armstrong’s Axioms

– X ->Y if Y is a subset of X– X ->Y implies XZ ->YZ (XZ , etc. =

“union”)– X ->Y and YZ ->W implies XZ ->W

• Closure test (Bernstein): given FD’s and a possible consequent X ->Y , use the FD’s to see what X determines, and then see if Y is contained therein.

Page 5: A Brief History of DB Theory

5

Issue #2: Redundancy• Certain combinations of FD’s lead to

redundancy and other “anomalies.”• Example: B ->C is the only FD:

A B C0 1 23 1 ?

• The FD lets us predict the value of ?.

Page 6: A Brief History of DB Theory

6

Issue #3: Normalization• Eliminate redundancy by splitting

schemas.• A FD X ->Y allows us to split schema

XYZ into XY and XZ .• Without the FD, the decomposition

would lack a lossless join : the ability to reconstruct the original XYZ relation from the decomposed relations.

Page 7: A Brief History of DB Theory

7

How I Met Moshe & Learned Database Theory

Tsichritzis

Bernstein Beeri

Ullman Vardi

Page 8: A Brief History of DB Theory

8

Why I Like Working With Vardi• Taste, selection of issues.• Best inventor of constructions.• Name comes after mine in

alphabet.

Page 9: A Brief History of DB Theory

9

Lossless Joins• When relations are decomposed,

do the pieces allow reconstruction of the original?– Only way: join the projected relations.– You always get back what you started

with.– Bad case is when you get more.

Page 10: A Brief History of DB Theory

10

ExampleA B C0 1 23 1 4

AB A B C B C0 1 0 1 4 1 23 1 3 1 2 1 4

0 1 23 1 4

Page 11: A Brief History of DB Theory

11

More Kinds of Dependencies• In essence, a dependency is any

predicate that tells whether a given set of tuples is OK for a given schema.

• The language chosen determines what constraints can be expressed and what we can decide about relations.– FD’s are just one, simple example

Page 12: A Brief History of DB Theory

12

Multivalued Dependencies• These occur when a relation tries

to connect one class of objects to independent sets from two other classes..

• Notation: X ->->Y means:– If two tuples agree on X , then we may

swap the Y components and get two tuples that are also in the relation.

Page 13: A Brief History of DB Theory

13

Example• EmpID ->-> Phone

EmpID Addr Phone ProjectIf: 123 a1 p1 j1123 a2 p2 j2Then: 123 a1 p2 j1123 a2 p1 j2

Page 14: A Brief History of DB Theory

14

MVD Origins• Independent work of Delobel,

Fagin, Zaniolo• Inference of MVD’s + FD’s: Beeri,

Fagin, and Howard.

Page 15: A Brief History of DB Theory

15

Generalized Dependencies• Equality-generating dependencies:

– If tuples with this pattern of equal symbols appear in a relation instance, then certain symbols must also be equal.

– Generalizes FD’s.• Tuple-generating dependencies:

– If tuples with this pattern of equal symbols appear in a relation instance, then a certain tuple must also appear.

– Generalizes MVD’s.

Page 16: A Brief History of DB Theory

16

Example: EGD• FD A ->B in schema ABC :

A B Ca b1 c1

(Hypotheses)a b2 c2 b1 = b2

(Conclusion)

Page 17: A Brief History of DB Theory

17

Example: TGD• MVD A ->->B in schema ABC :

A B Ca b1 c1

(Hypotheses)a b2 c2a b1 c2 (Conclusion)

Page 18: A Brief History of DB Theory

18

Full Versus Embedded GD’s• Full GD’s have a conclusion with only

symbols that appear in the hypotheses.• Embedded GD’s may have new symbols

in the conclusion.– Existentially quantified.

• Embedded EGD makes no sense, but embedded TGD’s are quite interesting.

Page 19: A Brief History of DB Theory

19

The Chase for Inferring GD’s• Test whether a GD G follows from given

GD’s.• Start with R = the hypotheses of G .• Apply GD H by mapping the hypotheses

of H to some rows of R .– If so, infer the (mapped) conclusion of H ---

equate two symbols of R or add a tuple to R .• If you eventually infer the conclusion of G

, then G follows, else it does not.

Page 20: A Brief History of DB Theory

20

Example: If A ->B Then A ->->B• R has tuples (a,b1,c1) and

(a,b2,c2). Does it have (a,b1,c2)?• Apply A ->B .

– It’s hypotheses are the same as the two tuples of R , so surely they map.

– Lets us conclude b1=b2.– Thus, since R has (a,b2,c2), It surely

has (a,b1,c2).

Page 21: A Brief History of DB Theory

21

Decision Properties of GD’s• If all dependencies are full, no new

symbols ever appear, so the chase must terminate.– Either we prove the desired

conclusion, or R becomes a relation that provides a counterexample --- it satisfies all the given GD’s, but not the one we were trying to infer.

Page 22: A Brief History of DB Theory

22

Undecidability of Embedded GD’s• First undecidability results involved the

inference of untyped GD’s (symbols may appear in several columns).– Beeri and Vardi– Chandra, Lewis, and Makoswky

• Tough result is undecidability even when GD’s are typed (one column per symbol).– Vardi– Gurevich and Lewis

Page 23: A Brief History of DB Theory

23

History of GD’s• Similar ideas were developed

independently by several people:– Yannakakis and Papadimitriou– Beeri and Vardi– Fagin– Sadri and Ullman

Page 24: A Brief History of DB Theory

24

The Universal Relation• Idea: attributes carry information

regardless of how they are placed in relation schemas.

• One of the cool things about Moshe Vardi is the collection he keeps of re-inventions of this universal relation concept.– Typically touted by some HCI type as a

“wonderful new interface to databases.”

Page 25: A Brief History of DB Theory

25

UR Query Systems Query

UR

StoredRelations

Page 26: A Brief History of DB Theory

26

Example• Stored relations: ES(Emp,Sal),

ED(Emp,Dept), DM(Dept,Mgr)• Universal relation: U(Emp,Sal, Dept,Mgr)• UR query: SELECT Mgr WHERE Emp=a• Translated to: SELECT Mgr FROM ED, DM WHERE Emp=a AND ED.Dept=DM.Dept

• U != join(ES, ED, DM); empty ES proves it.

Page 27: A Brief History of DB Theory

27

Theory of Universal Relations• Some were quite upset by the UR idea.

– Objection: the informal or ad-hoc way queries were translated from UR to stored relations.

• Codd: “Neither the collection of all base relations nor the collection of all views should be cast by the DBMS in the form of a `universal relation’ (in the Stanford University sense) [Vardi, 1988].”

Page 28: A Brief History of DB Theory

28

Some Early Approaches to the UR• Selected joins of stored relations.• Representative instance = pad stored

relations with nulls in missing attributes; then chase with given dependencies.– Contributions by Vassiliou, Honeyman,

Mendelzon, Sagiv, Yannakakis.• Maximal objects = union of joins within

“acyclic hypergraphs.”– Maier, Ullman

Page 29: A Brief History of DB Theory

29

Window Functions• Two stage process:

1. If query involves set of attributes X , compute the window function [X ] = some relation with schema X derived from the UR.2. Apply the query to [X ].

• Different UR definitions give different window functions, e.g., join-based, rep.-instance, maximal-object.

• From Maier, Ullman, Vardi [1984].

Page 30: A Brief History of DB Theory

30

Acyclic Hypergraphs• Several different applications led to the

identification of a class of database schemas (collection of relation schemas) with useful properties.– Sensible connections among stored relations

in maximal-object theory of UR.– Joins of many relations in time polynomial in

the size of the input and output (Yannakakis).– Etc., etc.

Page 31: A Brief History of DB Theory

31

Hypergraphs• Nodes (typically attributes)• (Hyper)edges = sets of any

number of nodes.

A B C D E F

Page 32: A Brief History of DB Theory

32

GYO Reduction• Graham, Yu-Ozsuyoglu, independently

defined acyclic hypergraphs thusly: • Two transformations:

1. Delete a node that is in only one hyperedge.2. Delete a hyperedge contained in another.

• Hypergraph is acyclic iff the result is a single, empty edge.– Limit is unique for any hypergraph.

Page 33: A Brief History of DB Theory

33

Example• GYO steps: delete C; delete D;

delete {B}; delete F; delete {B,E}; delete A; delete B; delete E;

A B C D E F

Page 34: A Brief History of DB Theory

34

History• Acyclic hypergraph idea from Bernstein

and Goodman.– But the definition is rather different.

• Hypergraph version from Fagin, Mendelzon, Beeri, Maier, others.– And don’t forget the GYO folks.

• Fagin defined several related-but-distinct concepts of “acyclic.”– This one is “alpha-acyclic.”

Page 35: A Brief History of DB Theory

35

Logical Query Languages• Collections of Horn-clause rules

without function symbols behave almost like SQL, but can do recursion.

• Conjunctive queries = single Horn-clause rules w/o function symbols have decidable containment (Chandra and Merlin).

Page 36: A Brief History of DB Theory

36

Conjunctive Queries• Example: path(X,Y) :- edge(X,Z) &

edge(Z,Y)• Atoms in the body (hypothesis) refer to

stored relations (EDB or “Extensional Database”).

• Atom in head (conclusion) is the result.• Q1 is contained in Q2 if for every

database D, Q1(D ) is a subset of Q2(D ).

Page 37: A Brief History of DB Theory

37

Containment Mappings• Mapping from the variables of Q2

to the variables of Q1 that:1. Turns the head of Q2 into the head of Q1.2. Turns every atom in the body of Q2 into some atom in the body of Q1.

• Containment mapping exists iff Q1 is contained in Q2.

Page 38: A Brief History of DB Theory

38

ExampleQ1: answer(X,Y) :- e(X,Y) & e(Y,X)

Q2: answer(X,Y) :- e(X,W) & e(W,Z) & e(Z,Y)

Page 39: A Brief History of DB Theory

39

Datalog Programs• Collection of conjunctive queries.• Some predicates are EDB (stored).• Other predicates are IDB

(intensional database; defined by the CQ’s only).

• One IDB predicate is the answer = least fixedpoint of the CQ’s.

Page 40: A Brief History of DB Theory

40

Examplepath(X,Y) :- edge(X,Y)path(X,Y) :- path(X,Z) & edge(Z,Y)

• Value of path is all pairs of nodes such that there is a path from the first to the second according to EDB predicate edge .

Page 41: A Brief History of DB Theory

41

Optimization of Datalog Programs• Problem: often the query asks for

only a fraction of the answer.– e.g., find path(0,Y) for fixed node 0.

• Bottom-up methods (“seminaive”) evaluate the whole least-fixedpoint, throw most away.

• Top-down (goal seeking) can get stuck in left-recursion.

Page 42: A Brief History of DB Theory

42

Linear-Recursion Methods• Recursion is linear if at most one IDB

atom in the body.– Henschen-Naqvi.– Magic-sets for linear (Bancilhon, Maier,

Sagiv, Ullman).– Left- and right-linear special cases

(Naughton).– Conversion of nonlinear to linear

(Ramakrishnan, Sagiv, Ullman, Vardi).

Page 43: A Brief History of DB Theory

43

Magic Sets• Several similar techniques for rewriting or

executing Datalog programs in a way that avoids generation of useless facts.– Rohmer, Lescoeur, and Kerisit (earliest

exposition).– Beeri and Ramakrishnan (reordering of atoms).– Sacca and Zaniolo (simplification of rules).– Dietrich and Warren (tabulation).– Vielle (query-subquery).

Page 44: A Brief History of DB Theory

44

Bounded Recursion• Sometimes a recursive Datalog program

is equivalent to a nonrecursive program.• Sufficient conditions (Naughton,

Ioannidis, Naughton and Sagiv).• Undecidability (Gaifman, Mairson, Sagiv,

and Vardi).• Decidable cases (Vardi, Cosmodakis,

Gaifman, Kanellakis, and Vardi)

Page 45: A Brief History of DB Theory

45

Containment for Datalog• Undecidable if one Datalog program

is contained in another (Shmueli).• NP-complete whether a CQ is

contained in a Datalog program.• Triply exponential whether a

Datalog program is contained in a CQ (Chaudhuri and Vardi).

Page 46: A Brief History of DB Theory

46

What Is It All Good For?• Dependencies and normalization are

now familiar to most CS graduates.– Difference between BCNF and 3NF.– Tradeoffs: lossless joins versus maintaining

dependencies.• Magic-sets, recursion used in IBM’s

DB/2.• Universal-relation interfaces despite

Codd.

Page 47: A Brief History of DB Theory

47

More “What Is It All Good For?”• Information integration . While

logical query languages have not caught on, logic has been used to specify how legacy databases are combined into a uniform whole.– Tsimmis (Papakonstantinou, Vassalos).– Information Manifold (Levy).– Infomaster (Duschka, Genesereth).


Recommended