Post on 13-Mar-2018
transcript
CSCI1270: Introduction to Database Systems
Another Use for FD’s: Schema Design
Schema Design: Approach #1
Schema Design: Approach #2
Schema Design: Approach #3
1. Construct E/R diagram 2. Translate into tables
Subjective: How do we know if any good?
1. Start with universal relation 2. Determine FD’s 3. “Decompose” UR using FD’s as guide
1. Construct E/R diagram to come up with 1st cut design 2. Use FD’s to verify or refine
CSCI1270: Introduction to Database Systems
Decomposition
1. Decomposing the Schema R = (bname, bcity, assets, cname, lno, amt)
R1 = (bname, bcity, assets, cname)
R2 = (cname, lno, amt)
Notation: R = R1 ∪ R2
CSCI1270: Introduction to Database Systems
Decomposition
1000 L-17 Hayes 9M Bkln Dntn 500 L-93 Jones 1.7M Hnck Mianus
2000 L-23 Johnson 9M Bkln Dntn 1000 L-17 Jones 9M Bkln Dntn amt lno cname assets bcity bname
Hayes 9M Bkln Dntn Jones 1.7M Hnck Mianus
Johnson 9M Bkln Dntn Jones 9M Bkln Dntn cname assets bcity bname
1000 L-17 Hayes 500 L-93 Jones 2000 L-23 Johnson 1000 L-17 Jones amt lno cname
2. Decomposing the Instance
R1 =
R =
R2 =
BTW: Not a Good Decomposition
CSCI1270: Introduction to Database Systems
Want to be able to reconstruct big relation by joining smaller ones (Natural join) (i.e.: R1 R2 = R?)
Goals of Decomposition 1. Lossless Joins
2. Dependency Preservation
3. Redundancy Avoidance
Want to minimize the cost of global integrity constraints based on FD’s (i.e.: Avoid big joins in assertions)
Summary: LJ: Information loss DP: Efficiency (time) RA: Efficiency (space), update anomalies
Avoid unnecessary data dupl. (motivation for decomposition)
CSCI1270: Introduction to Database Systems
Another Use for FD’s: Schema Design
1000 L-17 Hayes 9M Bkln Dntn 500 L-93 Jones 1.7M Hnck Mianus 2000 L-23 Johnson 9M Bkln Dntn 1000 L-17 Jones 9M Bkln Dntn amt lno cname assets bcity bname
Example:
R: “Universal Relation”
Design:
R =
Tuple meaning: Jones has a loan (L-17) for $1000 taken out of the Dntn branch in Bkln which has assets of $9M
pro : Fast queries (No need for joins!) con : Redundancy, update anomalies, deletion anomalies
CSCI1270: Introduction to Database Systems
Decomposition Goal #1: Lossless Joins
Hayes 9M Bkln Dntn Jones 1.7M Hnck Mianus
Johnson 9M Bkln Dntn Jones 9M Bkln Dntn cname assets bcity bname
1000 L-17 Hayes 500 L-93 Jones 2000 L-23 Johnson 1000 L-17 Jones amt lno cname
=
A Bad Decomposition
1000 L-17 Hayes 9M Bkln Dntn 500 L-93 Jones 1.7M Hnck Mianus 1000 L-17 Jones 1.7M Hnck Mianus 3000 L-23 Johnson 9M Bkln Dntn 500 L-93 Jones 9M Bkln Dntn 1000 L-17 Jones 9M Bkln Dntn amt lno cname assets bcity bname
CSCI1270: Introduction to Database Systems
Decomposition Goal #1: Lossless Joins
1000 L-17 Hayes 9M Bkln Dntn 500 L-93 Jones 1.7M Hnck Mianus 1000 L-17 Jones 1.7M Hnck Mianus 3000 L-23 Johnson 9M Bkln Dntn 500 L-93 Jones 9M Bkln Dntn 1000 L-17 Jones 9M Bkln Dntn amt lno cname assets bcity bname
→
→ =
A Bad Decomposition
Problem:
“Lossy join”: By adding noise, have lost meaningful information as a result of decomposition
adds meaningless tuples
CSCI1270: Introduction to Database Systems
(R1 R2 has 7 tuples, whereas R has 4)
A: Lossy. R1 R2 includes:
Lossless Joins
Hayes 9M Bkln Dntn Jones 1.7M Hnck Mianus
Johnson 9M Bkln Dntn Jones 9M Bkln Dntn cname assets bcity bname
500 L-93 Mianus 2000 L-23 Dntn 1000 L-17 Dntn amt lno bname
… 2000 L-23 Hayes 9M Bkln Dntn 1000 L-17 Johnson 9M Bkln Dntn 2000 L-23 Jones 9M Bkln Dntn
… amt lno cname assets bcity bname
Is the Following Decomposition Lossless or Lossy? R1 = R2 =
CSCI1270: Introduction to Database Systems
Is the Following Decomposition Lossless or Lossy?
A: Lossless. R1 R2 has 4 tuples
Lossless Joins
L-17 Hayes 9M Dntn L-93 Jones 1.7M Mianus L-23 Johnson 9M Dntn L-17 Jones 9M Dntn lno cname assets bname
500 Hnck L-93 2000 Bkln L-23 1000 Bkln L-17 amt bcity lno R1 = R2 =
CSCI1270: Introduction to Database Systems
A: Lossless. R1 R2 has 4 tuples
Lossless Joins
Hayes 1000 L-17 Dntn Jones 500 L-93 Mianus
Johnson 2000 L-23 Dntn Jones 1000 L-17 Dntn cname amt lno bname
1.7M Bkln Mianus 9M Bkln Dntn
assets bcity bname
Lossless or Lossy?
Q: When is decomposition lossless?
R1 = R2 =
CSCI1270: Introduction to Database Systems
▪ A a key ⇒ |R2| ≤ n, & Relationship with R1 is n:1
Ensuring Lossless Joins
● ● ● ● ●
● ● ●
▪ A not a key ⇒ |R1| = n ∴ n tuples in result
A Decomposition of R, R = R1 ∪ R2 is Lossless iff
Intuition: Original relation R has n tuples
R1 R2 A A
…
R1 ∩ R2 → R1 or R1 ∩ R2 → R2
(i.e.: Intersecting atts must form a super key for one of the resulting smaller relations)
CSCI1270: Introduction to Database Systems
Decomposition Goal #2: Dependency Preservation
Goal: Efficient integrity checks of FD’s
An Example With No Dependency Preservation:
Decomposition: R = R1 ∪ R2
Lossless, but Not DP. Why?
R = (bname, bcity, assets, cname, lno, amt) bname → bcity assets lno → amt bname
R1 = (bname, assets, cname, lno) R2 = (lno, bcity, amt)
CSCI1270: Introduction to Database Systems
Decomposition Goal #2: Dependency Preservation (cont.)
CREATE ASSERTION bname-bcity CHECK NOT EXISTS (SELECT * FROM R1 AS x1, R2 AS y1,R1 AS x2, R2 AS y2 WHERE x1.lno = y1.lno AND x2.lno = y2.lno AND x1.lno = x2.lno AND x1.bname = x2.bname AND
y1.bcity <> y2.bcity)
Decomposition (cont.): R = R1 ∪ R2
Lossless, but Not DP. Why?
R1 = (bname, assets, cname, lno) R2 = (lno, bcity, amt)
A: bname → bcity crosses 2 tables
CSCI1270: Introduction to Database Systems
Decomposition Goal #2: Dependency Preservation
… A1 … An B1 … Bm …
To Ensure Best Possible Efficiency of FD Checks
Above: Ri “covers” the FD, A1, …, An → B1, …, Bm To Test if Decomposition R = R1 ∪ … Rn is DP,
Ensure that only a SINGLE table be examined for each FD i.e.: Ensure that A1, …, An → B1, …, Bm can be checked
by examining one table as in:
1. See which FD’s of R are covered by R1, …, Rn 2. Compare closure of (1) to closure of FD’s of R
Ri =
CSCI1270: Introduction to Database Systems
Decomposition Goal #2: Dependency Preservation
More Formally: To test if R = R1 ∪ … ∪ Rn is dependency preserving wrt R’s FD set, F:
1. Compute F+ 2. Compute G
G ← ∅ For i ← 1 to n DO
Add to G those FD’s in F+ covered by Ri
3. Compute G+
4. If F+ = G+: Decomposition is DP If F+ ≠ G+: Decomposition is not DP
CSCI1270: Introduction to Database Systems
Decomposition Goal #2: Dependency Preservation (cont.)
More Formally (cont.):
Example:
To test if R = R1 ∪ … ∪ Rn is dependency preserving wrt R’s FD set, F:
1. Compute F+ 2. Compute G 3. Compute G+ 4. Compute F+ - G+
F = {A → B, AB → D, C → D} R1 = (A, B, C); R2 = (C, D)
Is this decomposition of (A, B, C, D) DP?
CSCI1270: Introduction to Database Systems
Aside: Computing F+
Many Algorithms Call For It If you know Armstrong’s Axioms cold, can generate lazily:
1. Compute Fc 2. Use Armstrong’s Axioms to derive (X → Y) ∈ Fc+ as
needed
CSCI1270: Introduction to Database Systems
Decomposition Goal #2: Dependency Preservation
Example: F = {A → B, AB → D, C → D} R1 = (A, B, C); R2 = (C, D)
Is R = R1 ∪ R2 DP?
A: 1. F+ = {A → B, AB → D, C → D}+ Note: (A → D) ∈ F+ 2. G = ∅ ∪ {A → B, …} ∪ {C → D, …} Note: (A → D) ∉ G 3. G+ = {…} Note: (A → D) ∉ G+ 4. F+ ≠ G+ because (A → D) ∈ (F+ - G+)
∴ Decomposition is not DP
CSCI1270: Introduction to Database Systems
Decomposition Goal #2: Dependency Preservation
Example:
Q: Does it satisfy lossless joins?
F = {A → B, AB → D, C → D} What is a DP decomposition of F?
A: R = R1 ∪ R2 s.t. R1 = (A, B, D); R2 = (C, D) 1. F+ = {A → B, AB → D,C → D}+ 2. G+ = {A → B,AB → D, C → D}+ 3. F+ = G+ Note: G+ cannot introduce FD’s not in F+
∴ Decomposition is DP
A: No
CSCI1270: Introduction to Database Systems
Decomposition Goals Summary
Lossless Joins
Dependency Preservation Motivation: Efficient FD assertions Idea: No gic’s require joins of more than 1 table with itself Test: R = R1 ∪ … ∪ Rn is DP if closure of FD’s covered by
each Ri = closure of FD’s covered by R = F+ Ensured for: 3NF
Motivation: Avoid information loss Idea: No noise introduced when reconstitution universal
relation via joins Test: At each decomposition test: R = R1 ∪ R2 (R1 ∩ R2) → R1 or (R1 ∩ R2) → R2 Ensured for: BCNF, 3NF
CSCI1270: Introduction to Database Systems
Decomposition Goal #3 Redundancy Avoidance
Redundancy:
1. Name FD of this relation? Ans: B → C
2. Name the super keys of this relation A: All sets of atts that include A
3. When do we have redundancy? A: When ∃ some FD, X → Y covered by
relation & X not a super key
1 z p
1 z n
2 y m
2 y h
2 y g
1 x e
1 x a
CB A
CSCI1270: Introduction to Database Systems
Redundancy Avoidance
Decomposition Goals Summary (cont.)
Motivation: Avoid update, deletion anomalies Idea: Avoid update anomalies, wasted space
Test: For any X → Y covered by Ri, X should be a superkey of Ri Ensured for: BCNF
CSCI1270: Introduction to Database Systems
Boyce-Codd Normal Form
What is a Normal Form?
BCNF:
Characterization of schema decomposition in terms of properties it satisfies
Guarantees no redundancy and lossless joins (Not DP!)
Defined: Relation schema R, with FD set F, is in BCNF if: For all nontrivial X → Y in F+: X → R (i.e.: X is a super key)
CSCI1270: Introduction to Database Systems
1. A → B, A → R (A is a key) 2. A → C, A → R (A is a key) 3. B → C, B → A (B is not a key)
BCNF
Example: R = (A, B, C) F = {A → B, B → C}
A: Consider the nontrivial dependencies in F+:
Therefore, R not in BCNF
Is R in BCNF?
CSCI1270: Introduction to Database Systems
BCNF
Example: R = R1 ∪ R2 R1 = (A, B); R2 = (B, C) F = {A → B, B → C}
A: 1. Test R1: A → B covered, A → R1 (all other FD’s covered trivial) 2. Test R2: B → C covered, B → R2 (all other FD’s covered trivial)
∴ R1, R2 in BCNF
Are R1, R2 in BCNF?
CSCI1270: Introduction to Database Systems
a. Choose (X → Y) ∈ F+ s.t. → (X → Y) covered by Ri → X → Ri
ALGORITHM BCNF (R: Relation, F: FD set) BEGIN
BCNF
Decomposition Algorithm
1. Compute F+
2. Result ← {R}
3. While some Ri ∈ Result not in BCNF, DO
b. Decompose Ri on (X → Y) Ri1 ← X ∪ Y Ri2 ← Ri – Y
c. Result ← Result – {Ri} ∪ {Ri1,Ri2} 4. Return result
END
CSCI1270: Introduction to Database Systems
Ri = {A, B, C, D, E) (B → CD) ∈ F+, B → Ri
BCNF
Decomposition Algorithm Each Step:
Decompose Ri that is not in BCNF
Ri1 = (B, C, D) Note: B → CD Covered, and
B → Ri1
Ri2 = (A, B, E)
Progress!
CSCI1270: Introduction to Database Systems
BCNF
Decomposition Algorithm (cont.)
Example:
R = (A, B, C, D) F = {A → B, AB → D, B → C}
1. Compute F+: F+ = {A → B, AB → D, B → C,
A → C, A → D, AB → C, AC → D, AD → C, ABC → D, ABD → C} + all trivial dep’s
Decompose R into BCNF?
CSCI1270: Introduction to Database Systems
Ri = {A, B, C, D) B → C covered, B → Ri
BCNF
Decomposition Algorithm (cont.)
R1 = (B, C) In BCNF
1. B → C & B → R1
R2 = (A, B, D) In BCNF 2. A → B, 3. AB → D, 4. A → D covered &
A → R2, AB → R2 ∴ Solution is R = R1 ∪ R2
CSCI1270: Introduction to Database Systems
BCNF
Note: This will suffice!
Find 2 decompositions, 1 DP and 1 not DP
R = (A, B, C, D, E, H) F = {A → BC, E → HA}
Decompose R into BCNF: F+ = {A → B, A → C, A → BC E → H, E → A, E → HA
E → B, E → C, E → BC E → HB, E → HC, E → AB E → AC, AE → …, ABE → …, ACE → …, ADE → …, …} + all trivial dep’s
CSCI1270: Introduction to Database Systems
BCNF Decomposition
R = (A, B, C, D, E, H) F = {A → BC, E → HA}
Decomposition #1: R = R1 ∪ R3 ∪ R4
Q: Is this DP? A: Yes. All Fc covered by R1, R3, R4. Therefore F+ covered
R = (A, B, C, D, E, H) Decompose on A → BC
R1 = (A, B, C) R2 = (A, D, E, H) Decompose on E → HA
R4 = (D, E) R3 = (A, E, H)
CSCI1270: Introduction to Database Systems
R4 = (C, D, E) Decompose on E → C
BCNF Decomposition (cont.)
R = (A, B, C, D, E, H) F = {A → BC, E → HA}
(Note: Fc = F)
Decomposition #2: R = R1 ∪ R3 ∪ R5 ∪ R6
Q: Not DP. Why? A: A → C not covered by R1, R3, R5 , R6.
R = (A, B, C, D, E, H) Decompose on A → B
R1 = (A, B) R2 = (A, C, D, E, H) Decompose on E → HA
R3 = (A, E, H)
R5 = (C, E) R6 = (E, D)
CSCI1270: Introduction to Database Systems
More BCNF (cont.)
Q: Can we decompose on FD’s in Fc to get a DP BCNF decomposition? A: Sometimes, BCNF + DP not possible
Decomposition #1: Decomposition #2:
R1 = (L, K) R2 = (J, L) R1 = (J, K, L) R2 = (J, L)
Not DP: JK → L not covered Still not in BCNF (L not a superkey)
R = (J, K, L) Decompose on L → K
R = (J, K, L) Decompose on JK → L
R = (J, K, L) F = {JK → L, L → K}
CSCI1270: Introduction to Database Systems
Aside
Is This a Realistic Example?
JK → L
L → K
A: BankerName → BranchName BranchName CustomerName → BankerName
Every banker works at one branch A customer works with the same banker at a given branch
CSCI1270: Introduction to Database Systems
Testing for FDs Across Relations • Decomposition not dependency preserving => an extra
materialized view (MV) for each dependency α →β in Fc that is not preserved in the decomposition
• The MV is a projection on α β of the join of the relations in the decomposition
• DBMS maintains MV when the relations are updated. è No extra coding effort for programmer.
- Space overhead: storing MV - Time overhead: keeping MV up to date
CSCI1270: Introduction to Database Systems
Multivalued Dependencies
• There are database schemas in BCNF that do not seem to be sufficiently normalized
• Consider a database classes(course, teacher, book)
• The database lists for each course the set of teachers any one of which can be the course’s instructor, and the set of books, all of which are required for the course (no matter who teaches it).
CSCI1270: Introduction to Database Systems
(course, teacher, book) is the only key, and therefore the relation is in BCNF
Insertion anomalies – i.e., if Sara is a new teacher that can teach database, two tuples need to be inserted
(database, Sara, DB Concepts)(database, Sara, Ullman)
course teacher bookdatabasedatabasedatabasedatabasedatabasedatabaseoperating systemsoperating systemsoperating systemsoperating systems
AviAviHankHankSudarshanSudarshanAviAvi Jim Jim
DB ConceptsUllmanDB ConceptsUllmanDB ConceptsUllmanOS ConceptsShawOS ConceptsShaw
classes
CSCI1270: Introduction to Database Systems
Therefore, it is better to decompose classes into:
course teacherdatabasedatabasedatabaseoperating systemsoperating systems
AviHankSudarshanAvi Jim
teaches
course bookdatabasedatabaseoperating systemsoperating systems
DB ConceptsUllmanOS ConceptsShaw
textWe shall see that these two relations are in Fourth Normal Form (4NF)
CSCI1270: Introduction to Database Systems
Multivalued Dependencies (MVDs) Let R be a relation schema and let α ⊆ R and β ⊆ R.
The multivalued dependency α →→ β
holds on R if in any legal relation r(R), for all pairs for tuples t1 and t2 in r such that t1[α] = t2 [α], there exist tuples t3 and t4 in r such that:
t1[α] = t2 [α] = t3 [α] = t4 [α] t3[β] = t1 [β] t3[R – β] = t2[R – β] t4 β] = t2[β] t4[R – β] = t1[R – β]
CSCI1270: Introduction to Database Systems
Example • Let R be a relation schema with a set of attributes
that are partitioned into 3 nonempty subsets.Y, Z, W
• We say that Y →→ Z (Y multidetermines Z)if and only if for all possible relations r(R)
< y1, z1, w1 > ∈ r and < y1, z2, w2 > ∈ rimplies
< y1, z1, w2 > ∈ r and < y1, z2, w1 > ∈ r• Note that since the behavior of Z and W are
identical it follows that Y →→ Z if Y →→ W
CSCI1270: Introduction to Database Systems
Example (Cont.) • In our example:
course →→ teacher course →→ book
• The above formalizes the notion that a particular value of Y (course) has associated with it a set of values of Z (teacher) and a set of values of W (book), and these two sets are in some sense independent of each other.
Note: If Y → Z then Y →→ ZIndeed we have (in above notation) Z1 = Z2
The claim follows.
CSCI1270: Introduction to Database Systems
Use of Multivalued Dependencies • We use multivalued dependencies in two ways:
1. To test relations to determine whether they are legal under a given set of functional and multivalued dependencies
2. To specify constraints on the set of legal relations. We shall thus concern ourselves only with relations that satisfy a given set of functional and multivalued dependencies.
• If a relation r fails to satisfy a given multivalued dependency, we can construct a relation rʹ that does satisfy the multivalued dependency by adding tuples to r.
CSCI1270: Introduction to Database Systems
Fourth Normal Form
• A relation schema R is in 4NF with respect to a set D of functional and multivalued dependencies if for all multivalued dependencies in D+ of the form α →→ β, where α ⊆ R and β ⊆ R, at least one of the following hold:α →→ β is trivial (i.e., β ⊆ α or α ∪ β = R)α is a superkey for schema R
• If a relation is in 4NF it is in BCNF