Date post: | 04-Apr-2018 |
Upload: | ahmad-shdifat |
7/30/2019 Advance SQL
SQL and More Databases Final
Simple SQL Queries
A SQL query has the form:
SELECT . . .
FROM . . .
WHERE . . .;
The SELECT clause indicates which attributes should appear in the output.
The FROM clause gives the relation(s) the query refers to.
The WHERE clause is a Boolean expression indicating which tuples are of interest.
The query result is a relation.
Note that the result relation is unnamed.
Example SQL Query
Relation schema:
Course (courseNumber, name, noOfCredits)
Query: Find all the courses stored in the database
Query in SQL:
SELECT *
FROM Course;
Note:
* means all the attributes in the relations involved.
Example SQL Query
Relation schema:
Movie (title, year, length, filmType)
Query: Find the titles of all movies stored in the database
Query in SQL:
SELECT title
FROM Movie;
Example SQL Query
Relation schema:
Student (ID, firstName, lastName, address, GPA)
Query: Find the ID of every student who has GPA > 3
Query in SQL:
SELECT ID
FROM Student
WHERE GPA > 3;
Example SQL Query
Relation schema:
Student (ID, firstName, lastName, address, GPA)
Query: Find the ID and last name of every student with first name John who has GPA > 3
Query in SQL:
SELECT ID, lastName
FROM Student
WHERE firstName = 'John' AND GPA > 3;
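The query above can be tried out with Python's built-in sqlite3 module. This is only a sketch: the schema comes from the slides, but the sample rows are made up for illustration, and SQLite stands in for whatever DBMS the course uses.

```python
import sqlite3

# In-memory database; the rows below are hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Student (ID INTEGER, firstName TEXT, lastName TEXT, "
    "address TEXT, GPA REAL)"
)
conn.executemany(
    "INSERT INTO Student VALUES (?, ?, ?, ?, ?)",
    [
        (111, "John", "Smith", "45 Pine av.", 3.6),
        (222, "John", "Brown", "71 Main st.", 2.9),
        (333, "Ann", "Johns", "39 Bay st.", 3.7),
    ],
)

# String constants need single quotes; an unquoted John would be
# interpreted as a column name.
rows = conn.execute(
    "SELECT ID, lastName FROM Student WHERE firstName = 'John' AND GPA > 3"
).fetchall()
print(rows)
```

Only the first sample row satisfies both conditions, so a single (ID, lastName) pair is returned.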
WHERE clause
The expressions that may follow WHERE are conditions
Standard comparison operators include { =, <>, <, >, <=, >= }
The values that may be compared include constants and attributes of the relation(s) mentioned in the FROM clause
Simple expressions:
A op Value
A op B
where A and B are attributes and op is a comparison operator
We may also apply the usual arithmetic operators (+, -, *, /, etc.) to numeric values before comparing them, e.g.,
(year - 1930) * (year - 1930) < 100
The result of a comparison is a Boolean value, TRUE or FALSE
Boolean expressions can be combined with the logical operators AND, OR, and NOT
Example SQL Query
Relation schema:
Movie (title, year, length, filmType)
Query: Find the titles of all color movies produced in 1990
Query in SQL:
SELECT title
FROM Movie
WHERE filmType = 'color' AND year = 1990;
Example SQL Query
Relation schema:
Movie (title, year, length, filmType)
Query: Find the titles of all color movies that are either made after 1970 or are less than 90 minutes long
Query in SQL:
SELECT title
FROM Movie
WHERE (year > 1970 OR length < 90) AND filmType = 'color';
Note on precedence rules:
AND takes precedence over OR, and NOT takes precedence over both
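To see why the parentheses matter, the sketch below runs the query with and without them on a few made-up Movie rows (hypothetical data, SQLite via Python's sqlite3 module).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Movie (title TEXT, year INTEGER, length INTEGER, filmType TEXT)"
)
# Hypothetical rows chosen so that the two queries differ.
conn.executemany(
    "INSERT INTO Movie VALUES (?, ?, ?, ?)",
    [
        ("M1", 1980, 120, "color"),
        ("M2", 1980, 120, "blackAndWhite"),
        ("M3", 1960, 80, "color"),
        ("M4", 1960, 120, "color"),
    ],
)

# Parenthesized: (year > 1970 OR length < 90) AND filmType = 'color'
with_parens = [r[0] for r in conn.execute(
    "SELECT title FROM Movie "
    "WHERE (year > 1970 OR length < 90) AND filmType = 'color' ORDER BY title"
)]

# Unparenthesized: parsed as year > 1970 OR (length < 90 AND filmType = 'color')
without_parens = [r[0] for r in conn.execute(
    "SELECT title FROM Movie "
    "WHERE year > 1970 OR length < 90 AND filmType = 'color' ORDER BY title"
)]

print(with_parens, without_parens)
```

Without the parentheses, the black-and-white movie M2 slips into the result because year > 1970 alone is enough to satisfy the condition.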
Products and Joins
SQL has a simple way to couple relations in one query: list each relevant relation in the FROM clause
All the relations in the FROM clause are coupled through Cartesian product (×, in algebra)
Cartesian Product
From Set Theory:
The Cartesian product of two sets R and S is the set of all pairs (a, b) such that a ∈ R and b ∈ S. It is denoted R × S.
Note:
In general, R × S ≠ S × R
Example
Instance R:
A B
1 2
3 4
Instance S:
B C D
2 5 6
4 7 8
9 10 11
R × S:
A R.B S.B C D
1 2 2 5 6
1 2 4 7 8
1 2 9 10 11
3 4 2 5 6
3 4 4 7 8
3 4 9 10 11
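The product above can be reproduced in a few lines of Python. This is a sketch using itertools.product on the two instances from the slide, with tuples represented positionally (an illustrative choice, not part of the slides).

```python
from itertools import product

R = [(1, 2), (3, 4)]                     # columns A, B
S = [(2, 5, 6), (4, 7, 8), (9, 10, 11)]  # columns B, C, D

# Every tuple of R is paired with every tuple of S.
r_times_s = [r + s for r, s in product(R, S)]
# Swapping the operands changes the column order of each result tuple.
s_times_r = [s + r for s, r in product(S, R)]

for row in r_times_s:
    print(row)
```

The result has |R| × |S| = 6 tuples, and r_times_s differs from s_times_r because the attribute order differs.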
Example
Instance of Student:
ID   firstName  lastName  GPA  Address
111  Joe        Smith     4.0  45 Pine av.
222  Sue        Brown     3.1  71 Main st.
333  Ann        Johns     3.7  39 Bay st.
Instance of Course:
courseNumber  name             noOfCredits
Comp352       Data structures  3
Comp353       Databases        4
SELECT *
FROM Student, Course;
ID   firstName  lastName  GPA  Address      courseNumber  name             noOfCredits
111  Joe        Smith     4.0  45 Pine av.  Comp352       Data structures  3
111  Joe        Smith     4.0  45 Pine av.  Comp353       Databases        4
222  Sue        Brown     3.1  71 Main st.  Comp352       Data structures  3
222  Sue        Brown     3.1  71 Main st.  Comp353       Databases        4
333  Ann        Johns     3.7  39 Bay st.   Comp352       Data structures  3
333  Ann        Johns     3.7  39 Bay st.   Comp353       Databases        4
Example
Instance of Student:
ID   firstName  lastName  GPA  Address
111  Joe        Smith     4.0  45 Pine av.
222  Sue        Brown     3.1  71 Main st.
333  Ann        Johns     3.7  39 Bay st.
Instance of Course:
courseNumber  name             noOfCredits
Comp352       Data structures  3
Comp353       Databases        4
SELECT ID, courseNumber
FROM Student, Course;
ID   courseNumber
111  Comp352
111  Comp353
222  Comp352
222  Comp353
333  Comp352
333  Comp353
Example
Relation schemas:
Student (ID, firstName, lastName, address, GPA)
Ugrad (ID, major)
Query: Find all information available about every undergraduate student
We can try to compute the Cartesian product (×):
SELECT *
FROM Student, Ugrad;
Example
Instance of Student:
ID   firstName  lastName  GPA  Address
111  Joe        Smith     4.0  45 Pine av.
222  Sue        Brown     3.1  71 Main st.
333  Ann        Johns     3.7  39 Bay st.
Instance of Ugrad:
ID   major
111  CS
333  EE
SELECT *
FROM Student, Ugrad;
ID   firstName  lastName  GPA  Address      ID   major
111  Joe        Smith     4.0  45 Pine av.  111  CS
111  Joe        Smith     4.0  45 Pine av.  333  EE
222  Sue        Brown     3.1  71 Main st.  111  CS
222  Sue        Brown     3.1  71 Main st.  333  EE
333  Ann        Johns     3.7  39 Bay st.   111  CS
333  Ann        Johns     3.7  39 Bay st.   333  EE
Which tuples should be in the query result and which shouldn't?
Example
Instance of Student:
ID   firstName  lastName  GPA  Address
111  Joe        Smith     4.0  45 Pine av.
222  Sue        Brown     3.1  71 Main st.
333  Ann        Johns     3.7  39 Bay st.
Instance of Ugrad:
ID   major
111  CS
333  EE
SELECT *
FROM Student, Ugrad
WHERE Student.ID = Ugrad.ID;
ID   firstName  lastName  GPA  Address      ID   major
111  Joe        Smith     4.0  45 Pine av.  111  CS
333  Ann        Johns     3.7  39 Bay st.   333  EE
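The effect of the join condition can be checked with a short sqlite3 sketch that loads the two instances shown on the slide. SQLite is an assumption here; any SQL system should behave the same way for this query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Student (ID INTEGER, firstName TEXT, lastName TEXT, GPA REAL, address TEXT);
CREATE TABLE Ugrad (ID INTEGER, major TEXT);
INSERT INTO Student VALUES (111, 'Joe', 'Smith', 4.0, '45 Pine av.');
INSERT INTO Student VALUES (222, 'Sue', 'Brown', 3.1, '71 Main st.');
INSERT INTO Student VALUES (333, 'Ann', 'Johns', 3.7, '39 Bay st.');
INSERT INTO Ugrad VALUES (111, 'CS');
INSERT INTO Ugrad VALUES (333, 'EE');
""")

# Plain Cartesian product: every student paired with every Ugrad row.
product_rows = conn.execute("SELECT * FROM Student, Ugrad").fetchall()

# Join: keep only the pairs that describe the same person.
join_rows = conn.execute(
    "SELECT Student.ID, lastName, major FROM Student, Ugrad "
    "WHERE Student.ID = Ugrad.ID ORDER BY Student.ID"
).fetchall()

print(len(product_rows), join_rows)
```

The product has 6 rows, while the join keeps only the 2 rows in which both ID values agree.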
Joins in SQL
The above query is an example of a join operation
There are various kinds of joins, and we will study them later in detail
To join relations R1, …, Rn in SQL:
List all these relations in the FROM clause
Express the conditions in the WHERE clause in order to get the desired join
Joining Relations
Relation schemas:
Movie (title, year, length, filmType)
Owns (title, year, studioName)
Query: Find the title, length, and studio name of every movie
Query in SQL:
SELECT Movie.title, Movie.length, Owns.studioName
FROM Movie, Owns
WHERE Movie.title = Owns.title AND Movie.year = Owns.year;
Is "Owns" in Owns.studioName necessary?
Joining Relations
Relation schemas:
Movie (title, year, length, filmType)
Owns (title, year, studioName)
Query: Find the title and length of every movie produced by Disney
Query in SQL:
SELECT Movie.title, length
FROM Movie, Owns
WHERE Movie.title = Owns.title AND Movie.year = Owns.year AND studioName = 'Disney';
Joining Relations
Relation schemas:
Movie (title, year, length, filmType)
Owns (title, year, studioName)
StarsIn (title, year, starName)
Query: Find the title and length of each movie with Julia Roberts, produced by Disney
Query in SQL:
SELECT Movie.title, Movie.length
FROM Movie, Owns, StarsIn
WHERE Movie.title = Owns.title AND Movie.year = Owns.year
AND Movie.title = StarsIn.title AND Movie.year = StarsIn.year
AND studioName = 'Disney' AND starName = 'Julia Roberts';
Example
Movie:
title  year  length  filmType
T1     1990  124     color
T2     1991  144     color
Owns:
title  year  studioName
T1     1990  Disney
T2     1991  MGM
StarsIn (JR stands for Julia Roberts):
title  year  starName
T1     1990  JR
T2     1991  JR
SELECT Movie.title, Movie.length
FROM Movie, Owns, StarsIn
WHERE Movie.title = Owns.title AND Movie.year = Owns.year AND
Movie.title = StarsIn.title AND Movie.year = StarsIn.year AND
studioName = 'Disney' AND starName = 'Julia Roberts';
Result:
title  length
T1     124
Example
Relation schemas:
Movie (title, year, length, filmType, studioName, producerC#)
Exec (name, address, cert#, netWorth)
Query: Find the name of the producer of Star Wars
Query in SQL:
SELECT Exec.name
FROM Movie, Exec
WHERE Movie.title = 'Star Wars' AND
Movie.producerC# = Exec.cert#;
Example
Relation schemas:
Movie (title, year, length, filmType, studioName, producerC#)
Exec (name, address, cert#, netWorth)
Query: Find the name of the producer of Star Wars
Query with subquery:
SELECT name
FROM Exec
WHERE cert# = (SELECT producerC#
               FROM Movie
               WHERE title = 'Star Wars');
Example
Relation schemas:
Movie (title, year, length, filmType, studioName, producerC#)
Exec (name, address, cert#, netWorth)
StarsIn (title, year, starName)
Query: Find the names of the producers of Harrison Ford's movies
Query in SQL:
SELECT name
FROM Exec
WHERE cert# IN (SELECT producerC#
                FROM Movie
                WHERE (title, year) IN (SELECT title, year
                                        FROM StarsIn
                                        WHERE starName = 'Harrison Ford'));
Example
Relation schemas:
Movie (title, year, length, filmType, studioName, producerC#)
Exec (name, address, cert#, netWorth)
StarsIn (title, year, starName)
Query: Find the names of the producers of Harrison Ford's movies
Query in SQL:
SELECT Exec.name
FROM Exec, Movie, StarsIn
WHERE Exec.cert# = Movie.producerC# AND
Movie.title = StarsIn.title AND Movie.year = StarsIn.year AND
starName = 'Harrison Ford';
Correlated Subqueries
Relation schema:
Movie (title, year, length, filmType, studioName, producerC#)
Query: Find movie titles that appear more than once
Query in SQL:
SELECT title
FROM Movie Old
WHERE year < ANY (SELECT year
                  FROM Movie
                  WHERE title = Old.title);
Note the scopes of the variables in this query.
Correlated Subqueries
Query in SQL:
SELECT title
FROM Movie Old
WHERE year < ANY (SELECT year
                  FROM Movie
                  WHERE title = Old.title);
The condition in the outer WHERE is true only if there is a movie with the same title as Old.title that has a later year
The query will produce a title one fewer times than there are movies with that title
What would be the result if we used <> instead of <?
For a movie title appearing 3 times, we would get 3 copies of the title in the output
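The counting argument above can be checked experimentally. One caveat: SQLite, which ships with Python, does not support the ANY operator, so this sketch rewrites `year < ANY (subquery)` as an equivalent EXISTS test. The table is reduced to two columns and the rows are illustrative sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Movie (title TEXT, year INTEGER)")
conn.executemany(
    "INSERT INTO Movie VALUES (?, ?)",
    [("King Kong", 1933), ("King Kong", 1976), ("King Kong", 2005), ("Jaws", 1975)],
)

# Equivalent of: WHERE year < ANY (SELECT year FROM Movie WHERE title = Old.title)
later = conn.execute("""
    SELECT title FROM Movie AS Old
    WHERE EXISTS (SELECT 1 FROM Movie
                  WHERE title = Old.title AND year > Old.year)
""").fetchall()

# Equivalent of using <> in place of <
different = conn.execute("""
    SELECT title FROM Movie AS Old
    WHERE EXISTS (SELECT 1 FROM Movie
                  WHERE title = Old.title AND year <> Old.year)
""").fetchall()

print(len(later), len(different))
```

With three same-titled movies from different years, the < version returns the title twice (the latest movie has no later namesake), while the <> version returns it three times.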
Aggregation in SQL
SQL provides five operators that apply to a column of a relation and produce some kind of summary
These operators are called aggregations
They are used by applying them to a scalar-valued expression, typically a column name, in a SELECT clause
Aggregation Operators
SUM
the sum of values in the column
AVG
the average of values in the column
MIN
the least value in the column
MAX
the greatest value in the column
COUNT
the number of values in the column, including the duplicates, unless the keyword DISTINCT is used explicitly
Example
Relation schema:
Exec (name, address, cert#, netWorth)
Query: Find the average net worth of all movie executives
Query in SQL:
SELECT AVG(netWorth)
FROM Exec;
The result is the sum of all values in the column netWorth divided by the number of these values
In general, if a tuple appears n times in a relation, it will be counted n times when computing the average
Example
Relation schema:
Exec (name, address, cert#, netWorth)
Query: How many tuples are there in the Exec relation?
Query in SQL:
SELECT COUNT(*)
FROM Exec;
The use of * as a parameter is unique to COUNT; using * does not make sense for other aggregation operations
Example
Relation schema:
Exec (name, address, cert#, netWorth)
Query: How many different names are there in the Exec relation?
Query in SQL:
SELECT COUNT(DISTINCT name)
FROM Exec;
At query processing time, the system first eliminates the duplicates from column name, and then counts the number of values left
Aggregation -- Grouping
Often we need to consider the tuples in an SQL query in groups, with regard to the value of some other column(s)
Example: suppose we want to compute the total length in minutes of movies produced by each studio:
Movie (title, year, length, filmType, studioName, producerC#)
We must group the tuples in the Movie relation according to their studio, and get the sum of the length values within each group; the result would be something like:
studio  SUM(length)
Disney  12345
MGM     54321
Aggregation - Grouping
Relation schema:
Movie (title, year, length, filmType, studioName, producerC#)
Query: What is the total length in minutes produced by each studio?
Query in SQL:
SELECT studioName, SUM(length)
FROM Movie
GROUP BY studioName;
Whatever aggregation is used in the SELECT clause will be applied only within groups
Only those attributes mentioned in the GROUP BY clause may appear unaggregated in the SELECT clause
Can we use GROUP BY without using aggregation? (Yes/No)
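A small experiment can help with the grouping query and with the closing question. The sketch below uses sqlite3 with made-up rows and a reduced Movie schema (both are assumptions for illustration).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Movie (title TEXT, year INTEGER, length INTEGER, studioName TEXT)"
)
# Hypothetical sample rows.
conn.executemany(
    "INSERT INTO Movie VALUES (?, ?, ?, ?)",
    [
        ("T1", 1990, 124, "Disney"),
        ("T2", 1991, 144, "MGM"),
        ("T3", 1992, 100, "Disney"),
    ],
)

# One output row per studio; SUM is applied within each group.
totals = conn.execute(
    "SELECT studioName, SUM(length) FROM Movie "
    "GROUP BY studioName ORDER BY studioName"
).fetchall()

# GROUP BY with no aggregate: one row per distinct studio name.
studios = conn.execute(
    "SELECT studioName FROM Movie GROUP BY studioName ORDER BY studioName"
).fetchall()

print(totals, studios)
```

In this sketch, GROUP BY without an aggregate is accepted and behaves like SELECT DISTINCT on the grouping column.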
Aggregation -- Grouping
Relation schemas:
Movie (title, year, length, filmType, studioName, producerC#)
Exec (name, address, cert#, netWorth)
Query: For each producer (name), list the total length of the films produced
Query in SQL:
SELECT Exec.name, SUM(Movie.length)
FROM Exec, Movie
WHERE Movie.producerC# = Exec.cert#
GROUP BY Exec.name;
Aggregation HAVING clause
We might be interested not in all groups of tuples, but only in those that satisfy certain conditions
We can follow a GROUP BY clause with a HAVING clause
HAVING is followed by some conditions about the group
HAVING is normally used together with GROUP BY; without GROUP BY, standard SQL treats the whole result as a single group
Aggregation HAVING clause
Relation schemas:
Movie (title, year, length, filmType, studioName, producerC#)
Exec (name, address, cert#, netWorth)
Query: For those producers who made at least one film prior to 1930, list the total length of the films produced
Query in SQL:
SELECT Exec.name, SUM(Movie.length)
FROM Exec, Movie
WHERE producerC# = cert#
GROUP BY Exec.name
HAVING MIN(Movie.year) < 1930;
Aggregation HAVING clause
This query chooses the groups based on a property of the group:
SELECT Exec.name, SUM(Movie.length)
FROM Exec, Movie
WHERE producerC# = cert#
GROUP BY Exec.name
HAVING MIN(Movie.year) < 1930;
This query chooses the movies based on a property of each movie tuple:
SELECT Exec.name, SUM(Movie.length)
FROM Exec, Movie
WHERE producerC# = cert# AND Movie.year < 1930
GROUP BY Exec.name;
Note the difference!
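The difference between the two queries shows up on even a tiny instance. In this sqlite3 sketch the rows are made up, and the columns cert# and producerC# are renamed cert and producerC (an assumption made only so the identifiers need no quoting in SQLite).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Exec (name TEXT, cert INTEGER);
CREATE TABLE Movie (title TEXT, year INTEGER, length INTEGER, producerC INTEGER);
INSERT INTO Exec VALUES ('P1', 1);
INSERT INTO Exec VALUES ('P2', 2);
INSERT INTO Movie VALUES ('Old', 1925, 60, 1);
INSERT INTO Movie VALUES ('New', 1950, 100, 1);
INSERT INTO Movie VALUES ('Other', 1960, 90, 2);
""")

# HAVING filters whole groups: P1 qualifies, and ALL of P1's films are summed.
by_group = conn.execute("""
    SELECT Exec.name, SUM(Movie.length) FROM Exec, Movie
    WHERE producerC = cert
    GROUP BY Exec.name
    HAVING MIN(Movie.year) < 1930
""").fetchall()

# WHERE filters tuples first: only P1's pre-1930 film is summed.
by_tuple = conn.execute("""
    SELECT Exec.name, SUM(Movie.length) FROM Exec, Movie
    WHERE producerC = cert AND Movie.year < 1930
    GROUP BY Exec.name
""").fetchall()

print(by_group, by_tuple)
```

Both queries report producer P1, but the totals differ: 160 minutes for the HAVING version and 60 minutes for the WHERE version.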
Order By
The SQL queries we have looked at so far return an unordered relation/bag; to get the result in a specific order, we use ORDER BY
Movie (title, year, length, filmType, studioName, producerC#)
SELECT Exec.name, SUM(Movie.length)
FROM Exec, Movie
WHERE producerC# = cert#
GROUP BY Exec.name
HAVING MIN(Movie.year) < 1930
ORDER BY Exec.name ASC;
In general:
ORDER BY A1 ASC, B DESC, C ASC;
Database Modifications
SQL & Database Modifications?
Next we will look at SQL statements that do not return something, but rather change the state of the database
There are three types of such SQL statements/transactions:
Insert tuples into a relation
Delete certain tuples from a relation
Update values of certain components of certain existing tuples
We refer to these types of operations collectively as database modifications, and refer to such requests as transactions
Insertion
The insertion statement consists of:
The keyword INSERT INTO
The name of a relation R
A parenthesized list of attributes of the relation R
The keyword VALUES
A tuple expression, that is, a parenthesized list of concrete values, one for each attribute in the attribute list
The form of an insert statement:
INSERT INTO R(A1, …, An) VALUES (v1, …, vn);
A tuple is created and added, where vi is the value of attribute Ai, for i = 1, 2, …, n
Insertion
Relation schema:
StarsIn (title, year, starName)
Update the database: Add Sydney Greenstreet to the list of stars of The Maltese Falcon
In SQL:
INSERT INTO StarsIn (title, year, starName)
VALUES ('The Maltese Falcon', 1942, 'Sydney Greenstreet');
Another formulation of this query:
INSERT INTO StarsIn
VALUES ('The Maltese Falcon', 1942, 'Sydney Greenstreet');
Insertion
The previous insertion statement was very simple: it added only one tuple into a relation
Instead of using explicit values for one tuple, we can compute a set of tuples to be inserted using a subquery
This subquery replaces the keyword VALUES and the tuple expression in the INSERT statement
Insertion
Database schema:
Studio (name, address, presC#)
Movie (title, year, length, filmType, studioName, producerC#)
Update the database: Add to Studio all studio names mentioned in the Movie relation
If the list of attributes does not include all attributes of relation R, then the tuple created has default values for the missing attributes
Since there is no way to determine an address or a president for such a studio, NULL will be used for the attributes address and presC#
Insertion
Database schema:
Studio (name, address, presC#)
Movie (title, year, length, filmType, studioName, producerC#)
Update the database: Add to Studio all studio names mentioned in the Movie relation
In SQL:
INSERT INTO Studio(name)
SELECT DISTINCT studioName
FROM Movie
WHERE studioName NOT IN (SELECT name
                         FROM Studio);
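The insertion with a subquery can be exercised as follows. This is a sketch with hypothetical rows; presC# is renamed presC and Movie is cut down to three columns so the example stays short in SQLite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Studio (name TEXT, address TEXT, presC INTEGER);
CREATE TABLE Movie (title TEXT, year INTEGER, studioName TEXT);
INSERT INTO Studio VALUES ('Disney', 'addr1', 1);
INSERT INTO Movie VALUES ('T1', 1990, 'Disney');
INSERT INTO Movie VALUES ('T2', 1991, 'MGM');
INSERT INTO Movie VALUES ('T3', 1992, 'MGM');
""")

# The subquery takes the place of VALUES (...); attributes that are not
# listed (address, presC) receive their default, which is NULL here.
conn.execute("""
    INSERT INTO Studio(name)
    SELECT DISTINCT studioName
    FROM Movie
    WHERE studioName NOT IN (SELECT name FROM Studio)
""")

studio_rows = conn.execute(
    "SELECT name, address, presC FROM Studio ORDER BY name"
).fetchall()
print(studio_rows)
```

MGM is added exactly once thanks to DISTINCT, with NULL (None in Python) for address and presC, while the existing Disney row is left alone.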
Deletion
A deletion statement consists of:
The keyword DELETE FROM
The name of a relation R
The keyword WHERE
A condition
The form of a delete statement:
DELETE FROM R WHERE condition;
The effect of executing this statement is that every tuple in relation R satisfying the condition will be deleted from R
Note: unlike INSERT, the tuples affected are selected by a WHERE clause; if the WHERE clause is omitted, every tuple in R is deleted
Deletion
Relation schema:
StarsIn (title, year, starName)
Update: Delete the fact that Sydney Greenstreet was a star in The Maltese Falcon
In SQL:
DELETE FROM StarsIn
WHERE title = 'The Maltese Falcon' AND starName = 'Sydney Greenstreet';
Deletion
Relation schema:
Exec (name, address, cert#, netWorth)
Update: Delete every movie executive whose net worth is < $10,000,000
In SQL:
DELETE FROM Exec
WHERE netWorth < 10,000,000;
Anything wrong here?!
Deletion
Relation schemas:
Studio (name, address, presC#)
Movie (title, year, length, filmType, studioName, producerC#)
Update: Delete from Studio all studios that are not mentioned in Movie (i.e., we don't want to have non-producing studios)
In SQL:
DELETE FROM Studio
WHERE name NOT IN (SELECT studioName
                   FROM Movie);
Defining Database Schema
To create a table in SQL:
CREATE TABLE name (list of elements);
Principal elements are attributes and their types, but key declarations and constraints may also appear
Example:
CREATE TABLE Star (
name CHAR(30),
address VARCHAR(255),
gender CHAR(1),
birthdate DATE
);
Defining Database Schema
To delete a table:
DROP TABLE name;
Example:
DROP TABLE Star;
Data types
INT or INTEGER
REAL or FLOAT
DECIMAL(n, d) or NUMERIC(n, d): n decimal digits, d of them after the decimal point; e.g., DECIMAL(6, 2) can hold 0123.45
CHAR(n) / BIT(n): fixed-length character/bit string
Unused part is padded with the "pad character"
VARCHAR(n) / BIT VARYING(n): variable-length strings of up to n characters
Data types (cont'd)
Time:
SQL2 format is TIME 'hh:mm:ss[.ss...]'
Date:
SQL2 format is DATE 'yyyy-mm-dd' (the first m is 0 or 1)
The default format of date in Oracle is 'dd-mon-yy'
Example:
CREATE TABLE Days (d DATE);
INSERT INTO Days VALUES ('08-aug-02');
The Oracle function to_date converts a string in a specified format into a date, e.g.,
INSERT INTO Days VALUES (to_date('2002-08-08', 'yyyy-mm-dd'));
Altering Relation Schemas
Adding Columns
Add an attribute to a relation R with:
ALTER TABLE R ADD column_declaration;
Example: Add attribute phone to table Star
ALTER TABLE Star ADD phone CHAR(16);
Removing Columns
Remove an attribute from a relation R using DROP:
ALTER TABLE R DROP COLUMN column_name;
Example: Remove column phone from Star
ALTER TABLE Star DROP COLUMN phone;
Note: Can't drop a column if it is the only column
Attribute Properties
We can assert that the value of an attribute is:
NOT NULL
every tuple must have a real (non-null) value for this attribute
DEFAULT value
NULL is the default value for every attribute A
The owner of the relation can define some other value as the default, instead of NULL
Attribute Properties
CREATE TABLE Star (
name CHAR(30),
address VARCHAR(255),
gender CHAR(1) DEFAULT '?',
birthdate DATE NOT NULL
);
Example: Add an attribute with a default value:
ALTER TABLE Star ADD phone CHAR(16) DEFAULT 'unlisted';
INSERT INTO Star (name, birthdate) VALUES ('Sally', '0000-00-00');
name   address  gender  birthdate   phone
Sally  NULL     ?       0000-00-00  unlisted
INSERT INTO Star (name, phone) VALUES ('Sally', '333-2255');
This insertion could not be performed, since the value for birthdate is not given and it is disallowed to use NULL as the default
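The two insertions can be replayed with sqlite3 to watch the defaults and the NOT NULL constraint at work. This is a sketch: SQLite is more relaxed about column types than other systems (the date is stored as plain text here), so only the DEFAULT and NOT NULL behaviour is being illustrated.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Star (
        name CHAR(30),
        address VARCHAR(255),
        gender CHAR(1) DEFAULT '?',
        birthdate DATE NOT NULL
    )
""")
conn.execute("ALTER TABLE Star ADD COLUMN phone CHAR(16) DEFAULT 'unlisted'")

# Missing attributes are filled with their defaults (NULL when none is declared).
conn.execute("INSERT INTO Star(name, birthdate) VALUES ('Sally', '0000-00-00')")
star_row = conn.execute(
    "SELECT name, address, gender, birthdate, phone FROM Star"
).fetchone()

# Leaving out birthdate would require NULL, which NOT NULL forbids.
try:
    conn.execute("INSERT INTO Star(name, phone) VALUES ('Sally', '333-2255')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

print(star_row, rejected)
```

The first insertion yields the row shown on the slide; the second raises an integrity error and leaves the table unchanged.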
Schema Refinement
Functional Dependencies: Essential to Normalization Theory
Functional Dependencies
Consider the relation:
Movie (title, year, length, filmType, studioName, starName)
What are the functional dependencies?
title, year → length
title, year → filmType
title, year → studioName
title, year → length, filmType, studioName
Note that the FD title, year → starName does not hold
Logical Implication: Reasoning with FDs
Consider relation R(A, B, C) with the set of FDs:
F = { A → B, B → C }
We can deduce from F that A → C also holds on R. How? Apply the definition
To detect possible redundancy, is it necessary to consider all the given FDs?
As shown above, there might be some additional hidden (nontrivial) FDs implied by a given set of FDs
Logical Implication (Contd)
Consider R(A1, A2, A3, A4, A5) with FDs:
F = { A1 → A2, A2 → A3, A2A3 → A4, A2A3A4 → A5 }
Prove that F ⊭ A5 → A1
Solution method: Provide a counter-example; give a relation instance r of R that satisfies every FD in F but not A5 → A1
     A1  A2  A3  A4  A5
t1:  0   1   1   1   1
t2:  1   1   1   1   1
A desired instance r of R.
Closure of a set of FDs
Defn: The closure of F, denoted F+, is the set of FDs that are logically implied by F
How can we compute F+? Definitely, F+ includes F, but possibly more FDs
We need to know how to reason about FDs
Equivalence
Defn: Suppose R is a relation schema, and S and T are sets of functional dependencies on R.
T and S are equivalent (S ≡ T) if S+ = T+
Example: Suppose R = {A, B, C}, and
S = { A → B, B → C, A → C }
T = { A → B, B → C }
Can show that S ≡ T
Armstrong's Axioms [1974]
R is a relation schema, and X, Y and Z are subsets of R.
Reflexivity
If Y ⊆ X, then X → Y (trivial FDs)
Augmentation
If X → Y, then XZ → YZ, for every Z
Transitivity
If X → Y and Y → Z, then X → Z
These are sound and complete inference rules for FDs
Additional rules / axioms
Other useful rules that follow from Armstrong's Axioms
Union (Combining) Rule
If X → Y and X → Z, then X → YZ
Decomposition (Splitting) Rule
If X → YZ, then X → Y and X → Z
Pseudotransitivity Rule
If X → Y and WY → Z, then XW → Z
NOTE: X, Y, Z, and W are sets of attributes
Example: Discovering hidden FDs
Consider a relation schema R = {A, B, C, G, H, I} with FDs F = { A → B, A → C, CG → H, CG → I, B → H }
Using these rules, we can derive the following FDs:
Since A → B and B → H, then A → H, by transitivity
Since CG → H and CG → I, then CG → HI, by union
Since A → C, then AG → CG, by augmentation
Now, since AG → CG and CG → I, then AG → I, by transitivity (Exercise: do the same for AG → H)
Many trivial dependencies can be derived(!) by augmentation
Computing the Closure of Attributes
Given a set F of FDs and a set X of attributes, how do we compute the closure of X w.r.t. F?
Starting with X, we repeatedly expand the set by adding the right hand side (RHS) of every FD whose left hand side (LHS) is already included in the set.
When the set cannot be expanded anymore, we have obtained the result, X+
An Algorithm to Compute X+ under F
X+ ← X (initialization step)
repeat
  for each FD W → Z in F do:
    if W ⊆ X+ then
      X+ ← X+ ∪ Z // include Z in the result
until X+ does not change anymore
Complexity question: In the worst case, how many times will the repeat statement be executed?
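The algorithm translates almost line for line into Python. In this sketch, FDs are represented as (LHS, RHS) pairs of strings with one character per attribute; that representation and the function name closure are illustrative choices, not part of the slides.

```python
def closure(attrs, fds):
    """Return the closure of the attribute set attrs under the FDs in fds.

    fds is a list of (lhs, rhs) pairs; each side is an iterable of
    single-character attribute names, e.g. ("AB", "C") for AB -> C.
    """
    result = set(attrs)              # X+ <- X
    changed = True
    while changed:                   # repeat ... until X+ stops changing
        changed = False
        for lhs, rhs in fds:
            # If W is contained in X+ and Z adds something new, include Z.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


F = [("AB", "C"), ("BC", "AD"), ("D", "E"), ("CF", "B")]
ab_plus = closure("AB", F)
print(sorted(ab_plus))
```

Each pass of the outer loop either adds at least one attribute or terminates, which is a useful hint for the complexity question.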
Example
Consider a relation schema R = { A, B, C, D, E, F } with the set of FDs { AB → C, BC → AD, D → E, CF → B }
Compute {A, B}+
Execution result at each iteration:
X+ = {A, B}
Using AB → C, we get X+ = {A, B, C}
Using BC → AD, we get X+ = {A, B, C, D}
Using D → E, we get X+ = {A, B, C, D, E}
No more change to X+ is possible.
X+ = {A, B}+ = {A, B, C, D, E}
Does the order in which FDs appear matter in this computation?
Implication Problem Revisited
Is a given FD X → Y implied by a set F of FDs?
That is to ask whether X → Y is in F+
How to answer this question? An algorithm for this:
Compute X+ under F, and
Check if Y is in X+ (that is, Y ⊆ X+)
If yes, then F ⊨ X → Y
Otherwise F ⊭ X → Y
Example
Consider a relation schema R = { A, B, C, D, E, F } with the FDs F = { AB → C, BC → AD, D → E, CF → B }
True/false: F ⊨ AB → D?
Two steps:
Compute X+ = {A, B}+ = {A, B, C, D, E}
Check if D ∈ X+
Yes, AB → D is implied by F
Example
Consider a relation schema R = { A, B, C, D, E, F } with FDs F = { AB → C, BC → AD, D → E, CF → B }
Is D → A implied by F?
Two steps:
Compute X+ = {D}+ = {D, E}
Check if A ∈ X+
Since A is not in {D, E}, we conclude that D → A is not implied by F
Closures and Keys
When will X+ include all attributes of a relation R?
Clearly, the answer is yes iff X is a (super)key of R
To check if X is a candidate key of R, we may check if:
1. X+ contains all attributes of R, i.e., X+ = R, and
2. No proper subset S of X has this property, i.e., ∀A ∈ X, (X − {A})+ ≠ R
Knowledge about keys is essential for Normal forms
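The two-part test for a candidate key can be sketched on top of the attribute-closure routine. The helper names (closure, is_superkey, is_candidate_key) and the string encoding of attribute sets are assumptions made for this example; the schema and FDs are the ones used in the preceding examples.

```python
def closure(attrs, fds):
    """Attribute closure: fds is a list of (lhs, rhs) pairs of attribute strings."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def is_superkey(attrs, schema, fds):
    # Condition 1: X+ contains all attributes of R.
    return closure(attrs, fds) == set(schema)


def is_candidate_key(attrs, schema, fds):
    attrs = set(attrs)
    if not is_superkey(attrs, schema, fds):
        return False
    # Condition 2: dropping any single attribute must break the superkey
    # property. Checking single removals suffices because closure is
    # monotone: a smaller subset can never determine more attributes.
    return all(not is_superkey(attrs - {a}, schema, fds) for a in attrs)


R = "ABCDEF"
F = [("AB", "C"), ("BC", "AD"), ("D", "E"), ("CF", "B")]
print(is_candidate_key("ABF", R, F), is_candidate_key("ABCF", R, F))
```

Here ABF is a candidate key, ABCF is only a superkey (BCF already determines everything), and AB is not a key at all because F is never reached.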
Canonical Cover
Number of iterations of the algorithm for computing the
closure of a set of attributes depends on the number of
FDs in F
The same will be observed for other algorithms that we will study
(such as the decomposition algorithms)
Can we minimize F?
Covers
FDs can be represented in several different ways without changing the set of legal/valid instances of the relation
Let F and G be sets of FDs. We say G follows from F if every relation instance that satisfies F also satisfies G. In symbols: F ⊨ G.
We may also say: G is implied by F, or G is covered by F.
If both F ⊨ G and G ⊨ F hold, then we say that G and F are equivalent and denote this by F ≡ G
Note that F ≡ G iff F+ = G+
If F ≡ G we may also say: G is a cover of F and vice versa
Canonical Cover
Let F be a set of FDs. A canonical / minimal cover of F is a set G of FDs that satisfies the following:
1. G is equivalent to F; that is, G ≡ F
2. G is minimal; that is, if we obtain a set H of FDs from G by deleting one or more of its FDs, or by deleting one or more attributes from some FD in G, then F ≢ H
3. Every FD in G is of the form X → A, where A is a single attribute
Canonical Cover
A canonical cover G is minimal in two respects:
1. Every FD in G is required in order for G to be equivalent to F
2. Every FD in G is as small as possible, that is, each attribute on the left hand side is necessary.
Recall: the RHS of every FD in G is a single attribute
Computing Canonical Cover
Given a set F of FDs, how to compute a canonical cover G of F?
Step 1: Put the FDs in the standard form
Initialize G := F
Replace each FD X → A1A2…Ak in G with X → A1, X → A2, …, X → Ak
Step 2: Minimize the left hand side of each FD
E.g., for each FD AB → C in G, check if A or B on the LHS is redundant, i.e., whether (G − {AB → C} ∪ {A → C})+ = F+
Step 3: Delete redundant FDs
For each FD X → A in G, check if it is redundant, i.e., whether (G − {X → A})+ = F+
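The three steps can be sketched in Python using attribute closures instead of comparing full closures F+, which is the usual practical shortcut. The function names and the (LHS, RHS) string encoding are illustrative assumptions. The order in which attributes and FDs are examined is arbitrary, so intermediate results may differ from a hand computation, and a set of FDs may have several canonical covers.

```python
def closure(attrs, fds):
    """Attribute closure: fds is a list of (lhs, rhs) pairs of attribute sets/strings."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def canonical_cover(fds):
    # Step 1: standard form, one attribute on each right-hand side.
    g = [(frozenset(lhs), a) for lhs, rhs in fds for a in rhs]

    # Step 2: drop LHS attribute b when the remaining LHS still determines a.
    # Closures are taken under g, which stays equivalent to F throughout.
    reduced = []
    for lhs, a in g:
        lhs = set(lhs)
        for b in sorted(lhs):
            if len(lhs) > 1 and a in closure(lhs - {b}, g):
                lhs.discard(b)
        reduced.append((frozenset(lhs), a))
    g = list(dict.fromkeys(reduced))  # remove duplicate FDs, keep order

    # Step 3: drop an FD X -> A when A is in X+ computed without that FD.
    for fd in list(g):
        rest = [other for other in g if other != fd]
        if fd[1] in closure(fd[0], rest):
            g = rest
    return g


F = [("A", "B"), ("DE", "A"), ("BC", "E"), ("AC", "E"), ("BCD", "A"), ("AED", "B")]
cover = canonical_cover(F)
for lhs, a in cover:
    print("".join(sorted(lhs)), "->", a)
```

For this F the sketch ends with three FDs: A → B, DE → A and BC → E.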
Computing Canonical Cover
R = { A, B, C, D, E, H }
F = { A → B, DE → A, BC → E, AC → E, BCD → A, AED → B }
Step one: put FDs in the standard form
All present FDs are in the standard form
G = { A → B, DE → A, BC → E, AC → E, BCD → A, AED → B }
Computing Canonical Cover
Step two: Check for left redundancy
For every FD X → A in G, check if the closure of a proper subset of X determines A. If so, remove the redundant attribute(s) from X
Computing Canonical Cover
G = { A → B, DE → A, BC → E, AC → E, BCD → A, AED → B }
A → B
obviously OK (no left redundancy)
DE → A
D+ = D
E+ = E
OK (no left redundancy)
Computing Canonical Cover
G = { A → B, DE → A, BC → E, AC → E, BCD → A, AED → B }
BC → E
B+ = B
C+ = C
OK (no left redundancy)
Computing Canonical Cover
G = { A → B, DE → A, BC → E, AC → E, BCD → A, AED → B }
AC → E
A+ = AB
C+ = C
OK (no left redundancy)
Computing Canonical Cover
G = { A → B, DE → A, BC → E, AC → E, BCD → A, AED → B }
BCD → A
B+ = B
C+ = C
D+ = D
BC+ = BCE
CD+ = CD
BD+ = BD
OK (no left redundancy)
Computing Canonical Cover
G = { A → B, DE → A, BC → E, AC → E, BCD → A, AED → B }
AED → B
A+ = AB
E and D are redundant, so we can remove them from AED → B
G = { A → B, DE → A, BC → E, AC → E, BCD → A, A → B }
After dropping the duplicate A → B:
G = { DE → A, BC → E, AC → E, BCD → A, A → B }
Computing Canonical Cover
Step 3: Check for redundant FDs
For every FD X → A in G:
Remove X → A from G; call the result G′
Compute X+ under G′
If A ∈ X+, then X → A is redundant and hence we remove the FD X → A from G (that is, we rename G′ to G)
Computing Canonical Cover
G = { DE → A, BC → E, AC → E, BCD → A, A → B }
Remove DE → A from G
G′ = { BC → E, AC → E, BCD → A, A → B }
Compute DE+ under G′
DE+ = DE (computed under G′)
Since A ∉ DE+, the FD DE → A is not redundant
G = { DE → A, BC → E, AC → E, BCD → A, A → B }
Computing Canonical Cover
G = { DE → A, BC → E, AC → E, BCD → A, A → B }
Remove BC → E from G
G′ = { DE → A, AC → E, BCD → A, A → B }
Compute BC+ under G′
BC+ = BC
Since E ∉ BC+, BC → E is not redundant
G = { DE → A, BC → E, AC → E, BCD → A, A → B }
Computing Canonical Cover
G = { DE → A, BC → E, AC → E, BCD → A, A → B }
Remove AC → E from G
G′ = { DE → A, BC → E, BCD → A, A → B }
Compute AC+ under G′
AC+ = ACBE
Since E ∈ ACBE, AC → E is redundant; remove it from G
G = { DE → A, BC → E, BCD → A, A → B }
Computing Canonical Cover
G = { DE → A, BC → E, BCD → A, A → B }
Remove BCD → A from G
G′ = { DE → A, BC → E, A → B }
Compute BCD+ under G′
BCD+ = BCDEA
Since A ∈ BCD+, this FD is redundant; remove it from G
G = { DE → A, BC → E, A → B }
Computing Canonical Cover
G = { DE → A, BC → E, A → B }
Remove A → B from G
G′ = { DE → A, BC → E }
Compute A+ under G′
A+ = A
This FD is not redundant (Another reason why this is true?)
G = { DE → A, BC → E, A → B }
G is a minimal cover for F
Several Canonical Covers Possible?
Relation R = {A, B, C} with F = { A → B, A → C, B → A, B → C, C → B, C → A }
Several canonical covers exist:
G = { A → B, B → A, B → C, C → B }
G = { A → B, B → C, C → A }
Can you find more?
How to Deal with Redundancy?
Relation Schema:
Star (name, address, representingFirm, spokesPerson)
F = { name → address, representingFirm, spokesPerson;
representingFirm → spokesPerson }
Relation Instance:
Name           Address       RepresentingFirm  SpokesPerson
Carrie Fisher  123 Maple     Star One          Joe Smith
Harrison Ford  789 Palm dr.  Star One          Joe Smith
Mark Hamill    456 Oak rd.   Movies & Co       Mary Johns
We can decompose this relation into two smaller relations
How to Deal with Redundancy?
Relation Schema:
Star(name, address, representingFirm, spokesperson)
Decompose this relation into the following relations:
Star (name, address, representingFirm) with F1 = { name → address, representingFirm }
and
Firm (representingFirm, spokesPerson) with F2 = { representingFirm → spokesPerson }
F = { representingFirm → spokesPerson }
Decomposition
A decomposition of a relation schema R consists of replacing R by two or more non-empty relation schemas such that each one is a subset of R and together they include all attributes of R. Formally,
R = {R1, …, Rm} is a decomposition if all conditions below hold:
(0) Ri ≠ ∅, for all i in {1, …, m}
(1) R1 ∪ … ∪ Rm = R
(2) Ri Rj, for different i and j in {1, …, m}
When m = 2, the decomposition R = { R1, R2 } is called binary
Not every decomposition of R is desirable
Properties of a decomposition?
(1) Lossless-join: this is a must
(2) Dependency-preserving: this is desirable
Explanation follows
Example
Relation Instance:
A B C
1 2 3
4 2 5
Decomposed into:
A B
1 2
4 2
B C
2 3
2 5
To recover information, we join the relations:
A B C
1 2 3
4 2 5
4 2 3
1 2 5
Why do we have new tuples?
Lossless-Join Decomposition
R is a relation schema and F is a set of FDs over R.
A binary decomposition of R into relation schemas R1 and R2 with attribute sets X and Y is said to be a lossless-join decomposition with respect to F if, for every instance r of R that satisfies F, we have π_X(r) ⋈ π_Y(r) = r
Thm: Let R be a relation schema and F a set of FDs on R.
A binary decomposition of R into R1 and R2 with attribute sets X and Y is lossless iff F implies X ∩ Y → X or X ∩ Y → Y, i.e., this binary decomposition is lossless if the common attributes of X and Y form a key of R1 or R2
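The theorem gives a mechanical test: compute the closure of the common attributes and see whether it covers X or Y. The sketch below implements that test and also replays the lossy decomposition of the small A, B, C instance shown earlier; function names and the string encoding of attribute sets are illustrative assumptions.

```python
def closure(attrs, fds):
    """Attribute closure: fds is a list of (lhs, rhs) pairs of attribute strings."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def is_lossless_binary(x, y, fds):
    """True iff (X ∩ Y) -> X or (X ∩ Y) -> Y follows from fds."""
    x, y = set(x), set(y)
    common_plus = closure(x & y, fds)
    return x <= common_plus or y <= common_plus


def project(rows, idx):
    """Projection of a set of tuples onto the column positions in idx."""
    return {tuple(row[i] for i in idx) for row in rows}


# Instance over (A, B, C) in which B does not determine C.
r = {(1, 2, 3), (4, 2, 5)}
ab, bc = project(r, (0, 1)), project(r, (1, 2))
# Natural join of the two projections on the shared attribute B.
joined = {(a, b, c) for (a, b) in ab for (b2, c) in bc if b == b2}

print(is_lossless_binary("AB", "BC", [("B", "C")]), len(joined))
```

With B → C the decomposition into AB and BC passes the test; with no FDs it fails, and on the instance above the join of the projections contains two spurious tuples.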
Example: Lossless-join
F = { B → C }
Relation Instance:
A B C
1 2 3
4 2 3
Decomposed into:
A B
1 2
4 2
B C
2 3
To recover the original relation r, we join the two relations:
A B C
1 2 3
4 2 3
No new tuples!
Example: Dependency Preservation
F = { B → C, B → D, A → D }
Relation Instance:
A B C D
1 2 5 7
4 3 6 8
Decomposed into:
A B
1 2
4 3
B C D
2 5 7
3 6 8
Can we enforce A → D? How?
Dependency-Preserving Decomposition
A dependency-preserving decomposition allows us to enforce every FD, on each insertion or modification of a tuple, by examining just one single relation instance
Let R be a relation schema that is decomposed into two schemas with attribute sets X and Y, and let F be a set of FDs over R. The projection of F on X (denoted by F_X) is the set of FDs in F+ that involve only attributes in X
Recall that an FD U → V in F+ is in F_X if all the attributes in U and V are in X; in this case we say this FD is relevant to X
The decomposition of <R, F> into two schemas with attribute sets X and Y is dependency-preserving if (F_X ∪ F_Y)+ = F+
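Computing F+ explicitly is expensive, so the definition is rarely checked literally. The sketch below uses a standard textbook shortcut that is not taken from these slides: for each FD U → V in F, grow a set Z from U using only closures restricted to one schema at a time, then test whether V ⊆ Z. Function names and the string encoding are assumptions made for the example.

```python
def closure(attrs, fds):
    """Attribute closure: fds is a list of (lhs, rhs) pairs of attribute strings."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def preserves_dependencies(schemas, fds):
    """Check every FD in fds against the union of the projections of fds."""
    for lhs, rhs in fds:
        z = set(lhs)
        changed = True
        while changed:
            changed = False
            for schema in schemas:
                s = set(schema)
                # Attributes derivable from Z using only FDs relevant to s.
                gained = (closure(z & s, fds) & s) - z
                if gained:
                    z |= gained
                    changed = True
        if not set(rhs) <= z:
            return False  # this FD cannot be enforced on the decomposition
    return True


F = [("B", "C"), ("B", "D"), ("A", "D")]
print(preserves_dependencies(["AB", "BCD"], F))
```

For the decomposition into AB and BCD the check fails because of A → D: neither schema contains both A and D, and no combination of projected FDs recovers it.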
Normal Forms
Given a relation schema R, we must be able to determine whether it is good or we need to decompose it into smaller relations, and if so, how?
To address these issues, we need to study normal forms
If a relation schema is in one of these normal forms, we know that it is in some good shape in the sense that certain kinds of problems (related to redundancy) cannot arise
Normal Forms
The normal forms based on FDs are:
First normal form (1NF)
Second normal form (2NF)
Third normal form (3NF)
Boyce-Codd normal form (BCNF)
These normal forms have increasingly restrictive requirements: 1NF ⊇ 2NF ⊇ 3NF ⊇ BCNF