7/23/2019 DBMS Caracterization
Databases
Ken Moody
Computer Laboratory, University of Cambridge, UK
Lecture notes by Timothy G. Griffin
Lent 2012
Ken Moody (cl.cam.ac.uk) Databases DB 2012 1 / 175
Lecture 01 : What is a DBMS?
DB vs. IR
Relational Databases
ACID properties
Two fundamental trade-offs
OLTP vs. OLAP
Course outline
Example Database Management Systems (DBMSs)
A few database examples
Banking : supporting customer accounts, deposits and withdrawals
University : students, past and present, marks, academic status
Business : products, sales, suppliers
Real Estate : properties, leases, owners, renters
Aviation : flights, seat reservations, passenger info, prices, payments
Aviation : aircraft, maintenance history, parts suppliers, parts orders
Some observations about these DBMSs ...
They contain highly structured data that has been engineered to model some restricted aspect of the real world
They support the activity of an organization in an essential way
They support concurrent access, both read and write
They often outlive their designers
Users need to know very little about the DBMS technology used
Well designed database systems are nearly transparent, just part of our infrastructure
Databases vs Information Retrieval
Always ask: What problem am I solving?

DBMS | IR system
exact query results | fuzzy query results
optimized for concurrent updates | optimized for concurrent reads
data models a narrow domain | domain often open-ended
generates documents (reports) | search existing documents
increase control over information | reduce information overload

And of course there are many systems that combine elements of DB and IR.
Still the dominant approach : Relational DBMSs
The problem : in 1970 you could not write a database application without knowing a great deal about the low-level physical implementation of the data.
Codd's radical idea [C1970]: give users a model of data and a language for manipulating that data which is completely independent of the details of its physical representation/implementation.
This decouples development of Database Management Systems (DBMSs) from the development of database applications (at least in an idealized world).
This is the kind of abstraction at the heart of Computer Science!
What services do applications expect from a DBMS?
Transactions : ACID properties (Concurrent Systems course)
Atomicity : Either all actions are carried out, or none are
    logs needed to undo operations, if needed
Consistency : If each transaction is consistent, and the database is initially consistent, then it is left consistent
    Application designers must exploit the DBMS's capabilities.
Isolation : Transactions are isolated, or protected, from the effects of other scheduled transactions
    Serializability, 2-phase commit protocol
Durability : If a transaction completes successfully, then its effects persist
    Logging and crash recovery
These concepts should be familiar from Concurrent Systems and Applications.
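Atomicity can be observed directly in a small experiment. The sketch below is illustrative only: it assumes SQLite driven from Python's standard `sqlite3` module rather than the MySQL used elsewhere in these notes, and the `Accounts` table and `transfer` function are hypothetical names introduced for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: a balance may never go negative.
conn.execute(
    "create table Accounts (owner text primary key, "
    "balance int check (balance >= 0))"
)
conn.executemany("insert into Accounts values (?, ?)",
                 [("eva", 50), ("james", 10)])
conn.commit()

def transfer(conn, src, dst, amount):
    """Move amount from src to dst as a single transaction."""
    try:
        # 'with conn' commits if the block succeeds and rolls back
        # if an exception escapes from it.
        with conn:
            conn.execute(
                "update Accounts set balance = balance + ? where owner = ?",
                (amount, dst))
            conn.execute(
                "update Accounts set balance = balance - ? where owner = ?",
                (amount, src))
        return True
    except sqlite3.IntegrityError:
        return False

# The credit to james succeeds, the debit from eva violates the check,
# so the whole transaction (including the credit) is undone.
ok = transfer(conn, "eva", "james", 80)
balances = dict(conn.execute("select owner, balance from Accounts"))
```

After the failed transfer both balances are unchanged: either all actions are carried out, or none are.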
What constitutes a good DBMS application design?
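[Diagram: a square. Top edge: Domain of Interest to Domain of Interest, labelled "real-world change". Bottom edge: Database to Database, labelled "database update(s)". The two vertical edges, each joining a Domain of Interest to its Database, are labelled "represent".]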
At the very least, this diagram should commute!
Does your database design support all required changes?
Can an update corrupt the database?
Relational Database Design
Our tools
Entity-Relationship (ER) modeling : high-level, diagram-based design
Relational modeling : formal model, normal forms based on Functional Dependencies (FDs)
SQL implementation : Where the rubber meets the road

The ER and FD approaches are complementary
ER facilitates design by allowing communication with domain experts who may know little about database technology.
FD allows us to formally explore general design trade-offs, such as A Fundamental Trade-off in Database Design: the more we reduce data redundancy, the harder it is to enforce some types of data integrity. (An example of this is made precise when we look at 3NF vs. BCNF.)
ER Demo Diagram (Notation follows SKS book) (1)
[ER diagram. Entities: Employee, with ISA specializations Mechanic and Salesman; RepairJob; Car; Client. Relationships: Does, Repairs, Buys, Sells, with Client taking part as buyer and seller. Attribute labels appearing in the figure: Name, Number; Number, Description, Cost, Parts, Work; License, Model, Year, Manufacturer; Price, Date, Value; Date, Value, Commission; ID, Name, Phone, Address.]
(1) By Pável Calado, http://www.texample.net/tikz/examples/entity-relationship-diagram
A Fundamental Trade-off in Database Implementation : Query response vs. update throughput
Redundancy is a Bad Thing.
One of the main goals of ER and FD modeling is to reduce data redundancy. We seek normalized designs.
A normalized database can support high update throughput and greatly facilitates the task of ensuring semantic consistency and data integrity.
Update throughput is increased because in a normalized database a typical transaction need only lock a few data items, perhaps just one field of one row in a very large table.
Redundancy is a Good Thing.
A de-normalized database can greatly improve the response time of read-only queries.
Selective and controlled de-normalization is often required in operational systems.
OLAP vs. OLTP
OLTP : Online Transaction Processing
OLAP : Online Analytical Processing
    Commonly associated with terms like Decision Support, Data Warehousing, etc.

 | OLAP | OLTP
Supports | analysis | day-to-day operations
Data is | historical | current
Transactions mostly | reads | updates
optimized for | query processing | updates
Normal Forms | not important | important
Example : Data Warehouse (Decision support)
[Diagram: an Operational Database (fast updates) feeds a Data Warehouse (business analysis queries) through an Extract step.]
Example : Embedded databases
FIDO = Fetch Intensive Data Organization
Example : Hinxton Bio-informatics
NoSQL Movement
Technologies
Key-value store
Directed Graph Databases
Main memory stores
Distributed hash tables

Applications
Facebook
Google
iMDB
...

Always remember to ask : What problem am I solving?
Term Outline
Lecture 02 The relational data model.
Lecture 03 Entity-Relationship (E/R) modelling
Lecture 04 Relational algebra and relational calculus
Lecture 05 SQL
Lecture 06 Case Study - Cancer registry for the NHS - challenges
Lecture 07 Schema refinement I
Lecture 08 Schema refinement II
Lecture 09 Schema refinement III and advanced design
Lecture 10 On-line Analytical Processing (OLAP)
Lecture 11 Case Study - Cancer registry for the NHS - experiences
Lecture 12 XML as a data exchange format
Recommended Reading
Textbooks
SKS Silberschatz, A., Korth, H.F. and Sudarshan, S. (2002). Database system concepts. McGraw-Hill (4th edition).
(Adjust accordingly for other editions)
Chapters 1 (DBMSs)
2 (Entity-Relationship Model)
3 (Relational Model)
4.1-4.7 (basic SQL)
6.1-6.4 (integrity constraints)
7 (functional dependencies and normal forms)
22 (OLAP)
UW Ullman, J. and Widom, J. (1997). A first course in database systems. Prentice Hall.
CJD Date, C.J. (2004). An introduction to database systems. Addison-Wesley (8th ed.).
Reading for the fun of it ...
Research Papers (Google for them)
C1970 E.F. Codd (1970). "A Relational Model of Data for Large Shared Data Banks". Communications of the ACM.
F1977 Ronald Fagin (1977). Multivalued dependencies and a new normal form for relational databases. TODS 2 (3).
L2003 L. Libkin. Expressive power of SQL. TCS, 296 (2003).
C+1996 L. Colby et al. Algorithms for deferred view maintenance. SIGMOD 199.
G+1997 J. Gray et al. Data cube: A relational aggregation operator generalizing group-by, cross-tab, and sub-totals (1997). Data Mining and Knowledge Discovery.
H2001 A. Halevy. Answering queries using views: A survey. VLDB Journal. December 2001.
Lecture 02 : The relational data model
Mathematical relations and relational schema
Using SQL to implement a relational schema
Keys
Database query languages
The Relational Algebra
The Relational Calculi (tuple and domain)
a bit of SQL
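Let's start with mathematical relations
Suppose that $S_1$ and $S_2$ are sets. The Cartesian product, $S_1 \times S_2$, is the set
$$S_1 \times S_2 = \{(s_1, s_2) \mid s_1 \in S_1,\ s_2 \in S_2\}$$
A (binary) relation over $S_1 \times S_2$ is any set $r$ with $r \subseteq S_1 \times S_2$.
In a similar way, if we have $n$ sets,
$$S_1, S_2, \ldots, S_n,$$
then an $n$-ary relation $r$ is a set
$$r \subseteq S_1 \times S_2 \times \cdots \times S_n = \{(s_1, s_2, \ldots, s_n) \mid s_i \in S_i\}$$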
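These definitions translate almost word for word into Python sets. The following is a small sketch; the concrete sets `S1`, `S2` and the relation `r` are made-up illustrations, not data from the course.

```python
from itertools import product

# Two small example sets (arbitrary values chosen for illustration).
S1 = {1, 2, 3}
S2 = {"a", "b"}

# The Cartesian product S1 x S2: every pair (s1, s2).
cartesian = set(product(S1, S2))

# A binary relation over S1 x S2 is simply any subset of the product.
r = {(1, "a"), (3, "b")}
is_relation = r <= cartesian   # subset test
```

The same construction extends to n sets by passing more arguments to `product`.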
Relational Schema
Let $X$ be a set of $k$ attribute names.
We will often ignore domains (types) and say that $R(X)$ denotes a relational schema.
When we write $R(Z, Y)$ we mean $R(Z \cup Y)$ and $Z \cap Y = \emptyset$.
$u.[X] = v.[X]$ abbreviates $u.A_1 = v.A_1 \wedge \cdots \wedge u.A_k = v.A_k$.
Here $X$ is taken in some (unspecified) ordering of the attribute names, $A_1, A_2, \ldots, A_k$.
Mathematical vs. database relations
Suppose we have an $n$-tuple $t \in S_1 \times S_2 \times \cdots \times S_n$. Extracting the $i$-th component of $t$, say as $\pi_i(t)$, feels a bit low-level.
Solution: (1) Associate a name, $A_i$ (called an attribute name), with each domain $S_i$. (2) Instead of tuples, use records: sets of pairs, each associating an attribute name $A_i$ with a value in domain $S_i$.
A database relation $R$ over the schema $A_1 : S_1 \times A_2 : S_2 \times \cdots \times A_n : S_n$ is a finite set
$$R \subseteq \{\{(A_1, s_1), (A_2, s_2), \ldots, (A_n, s_n)\} \mid s_i \in S_i\}$$
Example
A relational schema
Students(name: string, sid: string, age: integer)
A relational instance of this schema
Students = { {(name, Fatima), (sid, fm21), (age, 20)},
             {(name, Eva), (sid, ev77), (age, 18)},
             {(name, James), (sid, jj25), (age, 19)} }
A tabular presentation
name sid age
Fatima fm21 20
Eva ev77 18
James jj25 19
Key Concepts
Relational Key
Suppose $R(X)$ is a relational schema with $Z \subseteq X$. If for any records $u$ and $v$ in any instance of $R$ we have
$$u.[Z] = v.[Z] \implies u.[X] = v.[X],$$
then $Z$ is a superkey for $R$. If no proper subset of $Z$ is a superkey, then $Z$ is a key for $R$. We write $R(\underline{Z}, Y)$ to indicate that $Z$ is a key for $R(Z \cup Y)$.
Note that this is a semantic assertion, and that a relation can have multiple keys.
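Because a key is a semantic assertion about every possible instance, a single instance can refute a proposed key but never prove one. The sketch below makes that concrete; `violates_superkey` and `candidate_keys` are hypothetical helper names, and records are modelled as Python dicts.

```python
from itertools import combinations

students = [
    {"name": "Fatima", "sid": "fm21", "age": 20},
    {"name": "Eva", "sid": "ev77", "age": 18},
    {"name": "James", "sid": "jj25", "age": 19},
]

def violates_superkey(instance, Z):
    """True if two records agree on Z yet differ on some other attribute."""
    for u, v in combinations(instance, 2):
        if all(u[a] == v[a] for a in Z) and u != v:
            return True
    return False

def candidate_keys(instance):
    """Minimal attribute sets that THIS instance does not refute as keys."""
    attrs = sorted(instance[0])
    found = []
    for k in range(1, len(attrs) + 1):
        for Z in combinations(attrs, k):
            # Skip supersets of an already accepted set: they are not minimal.
            if any(set(f) <= set(Z) for f in found):
                continue
            if not violates_superkey(instance, Z):
                found.append(Z)
    return found

keys_here = candidate_keys(students)
```

On this three-row instance even `age` looks like a key, which is exactly why keys have to be declared by the designer rather than inferred from the data.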
Creating Tables in SQL
create table Students
(sid varchar(10),
name varchar(50),
age int);
-- insert record with attribute names
insert into Students set
name = 'Fatima', age = 20, sid = 'fm21';
-- or insert records with values in same order
-- as in create table
insert into Students values
('jj25', 'James', 19),
('ev77', 'Eva', 18);
Listing a Table in SQL
-- list by attribute order of create table
mysql> select * from Students;
+------+--------+------+
| sid  | name   | age  |
+------+--------+------+
| ev77 | Eva    |   18 |
| fm21 | Fatima |   20 |
| jj25 | James  |   19 |
+------+--------+------+
3 rows in set (0.00 sec)
Listing a Table in SQL
-- list by specified attribute order
mysql> select name, age, sid from Students;
+--------+------+------+
| name   | age  | sid  |
+--------+------+------+
| Eva    |   18 | ev77 |
| Fatima |   20 | fm21 |
| James  |   19 | jj25 |
+--------+------+------+
3 rows in set (0.00 sec)
Keys in SQL
A key is a set of attributes that will uniquely identify any record (row) in a table.
-- with this create table
create table Students
(sid varchar(10),
name varchar(50),
age int,
primary key (sid));
-- if we try to insert this (fourth) student ...
mysql> insert into Students set
name = 'Flavia', age = 23, sid = 'fm21';
ERROR 1062 (23000): Duplicate entry 'fm21' for key 'PRIMARY'
What is a (relational) database query language?
Input : a collection of relation instances
Output : a single relation instance
$$R_1, R_2, \ldots, R_k \Longrightarrow Q(R_1, R_2, \ldots, R_k)$$
How can we express $Q$?
In order to meet Codd's goals we want a query language that is high-level and independent of physical data representation.
There are many possibilities ...
The Relational Algebra (RA)
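$$\begin{array}{rcll}
Q & ::= & R & \text{base relation} \\
  & \mid & \sigma_p(Q) & \text{selection} \\
  & \mid & \pi_X(Q) & \text{projection} \\
  & \mid & Q \times Q & \text{product} \\
  & \mid & Q - Q & \text{difference} \\
  & \mid & Q \cup Q & \text{union} \\
  & \mid & Q \cap Q & \text{intersection} \\
  & \mid & \rho_M(Q) & \text{renaming}
\end{array}$$
$p$ is a simple boolean predicate over attribute values.
$X = \{A_1, A_2, \ldots, A_k\}$ is a set of attributes.
$M = \{A_1 \mapsto B_1, A_2 \mapsto B_2, \ldots, A_k \mapsto B_k\}$ is a renaming map.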
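To get a feel for the algebra, each operator can be written in a line or two over Python sets of records. This is only a sketch: the representation (a record as a `frozenset` of attribute/value pairs) and the helper names `rec`, `select`, `project`, `product` and `rename` are choices made for this illustration. Union, difference and intersection are Python's built-in `|`, `-` and `&` on sets.

```python
def rec(**fields):
    """A record: an immutable set of (attribute, value) pairs."""
    return frozenset(fields.items())

def select(p, Q):
    """sigma_p(Q): keep the records satisfying predicate p."""
    return {t for t in Q if p(dict(t))}

def project(X, Q):
    """pi_X(Q): keep only attributes in X (duplicates collapse in a set)."""
    return {frozenset((a, v) for a, v in t if a in X) for t in Q}

def product(Q1, Q2):
    """Q1 x Q2: assumes the two schemas share no attribute names."""
    return {t | u for t in Q1 for u in Q2}

def rename(M, Q):
    """rho_M(Q): rename attributes according to the map M."""
    return {frozenset((M.get(a, a), v) for a, v in t) for t in Q}

R = {rec(A=20, B=10, C=0, D=55), rec(A=11, B=10, C=0, D=7),
     rec(A=4, B=99, C=17, D=2), rec(A=77, B=25, C=4, D=0)}

selected = select(lambda t: t["A"] > 12, R)
projected = project({"B", "C"}, R)
renamed = rename({"B": "E", "D": "F"}, R)
```

Since every operator takes sets of records and returns a set of records, the operators compose freely, which is what the grammar for $Q$ expresses.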
Relational Calculi
The Tuple Relational Calculus (TRC)
$$Q = \{t \mid P(t)\}$$
The Domain Relational Calculus (DRC)
$$Q = \{(A_1 = v_1, A_2 = v_2, \ldots, A_k = v_k) \mid P(v_1, v_2, \ldots, v_k)\}$$
The SQL standard
Origins at IBM in early 1970s.
SQL has grown and grown through many rounds of standardization :
ANSI : SQL-86
ANSI and ISO : SQL-89, SQL-92, SQL:1999, SQL:2003, SQL:2006, SQL:2008
SQL is made up of many sub-languages :
Query Language
Data Definition Language
System Administration Language
...
Selection
R
A B C D
20 10 0 55
11 10 0 7
4 99 17 2
77 25 4 0

$\Longrightarrow$

Q(R)
A B C D
20 10 0 55
77 25 4 0

RA : $Q = \sigma_{A > 12}(R)$
TRC : $Q = \{t \mid t \in R \wedge t.A > 12\}$
DRC : $Q = \{\{(A, a), (B, b), (C, c), (D, d)\} \mid \{(A, a), (B, b), (C, c), (D, d)\} \in R \wedge a > 12\}$
SQL : select * from R where R.A > 12
Projection
R
A B C D
20 10 0 55
11 10 0 7
4 99 17 2
77 25 4 0

$\Longrightarrow$

Q(R)
B C
10 0
99 17
25 4

RA : $Q = \pi_{B,C}(R)$
TRC : $Q = \{t \mid \exists u \in R \wedge t.[B, C] = u.[B, C]\}$
DRC : $Q = \{\{(B, b), (C, c)\} \mid \{(A, a), (B, b), (C, c), (D, d)\} \in R\}$
SQL : select distinct B, C from R
Why the distinct in the SQL?
The SQL query
select B, C from R
will produce a bag (multiset)!

R
A B C D
20 10 0 55
11 10 0 7
4 99 17 2
77 25 4 0

$\Longrightarrow$

Q(R)
B C
10 0
10 0
99 17
25 4

SQL is actually based on multisets, not sets. We will look into this more in Lecture 05.
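The difference between the two readings of projection can be mimicked with Python's `Counter`, which behaves as a multiset. This is a sketch only: tuples stand in for the rows of `R`, and the variable names are made up for the illustration.

```python
from collections import Counter

# Rows of R as (A, B, C, D) tuples, copied from the table above.
R = [(20, 10, 0, 55), (11, 10, 0, 7), (4, 99, 17, 2), (77, 25, 4, 0)]

# "select B, C from R": a bag, so (10, 0) is counted twice.
bag = Counter((b, c) for (_, b, c, _) in R)

# "select distinct B, C from R": a set, duplicates collapse.
as_set = set(bag)
```

The bag holds four rows in total while the set holds three, matching the two result tables shown for the SQL query with and without `distinct`.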
Lecture 03 : Entity-Relationship (E/R) modelling
Outline
Entities
Relationships
Their relational implementations
n-ary relationships
Generalization
On the importance of SCOPE
Some real-world data ...
... from the Internet Movie Database (IMDb).
Title | Year | Actor
Austin Powers: International Man of Mystery | 1997 | Mike Myers
Austin Powers: The Spy Who Shagged Me | 1999 | Mike Myers
Dude, Where's My Car? | 2000 | Bill Chott
Dude, Where's My Car? | 2000 | Marc Lynn
Entity diagrams and Relational Schema
[Diagram: entity Movie with attributes MovieID, Title, Year; entity Person with attributes PersonID, FirstName, LastName.]
These diagrams represent relational schema
Movie(MovieID, Title, Year)
Person(PersonID, FirstName, LastName)
Yes, this ignores types ...
Entity sets (relational instances)
Movie
MovieID | Title | Year
55871 | Austin Powers: International Man of Mystery | 1997
55873 | Austin Powers: The Spy Who Shagged Me | 1999
171771 | Dude, Where's My Car? | 2000
(Tim used line number from IMDb raw file movies.list as MovieID.)

Person
PersonID | FirstName | LastName
6902836 | Mike | Myers
1757556 | Bill | Chott
5882058 | Marc | Lynn
(Tim used line number from IMDb raw file actors.list as PersonID.)
Relationships
[Diagram: entities Movie (MovieID, Title, Year) and Person (PersonID, FirstName, LastName) connected by the relationship ActsIn.]
Foreign Keys and Referential Integrity
Foreign Key
Suppose we have $R(\underline{Z}, Y)$. Furthermore, let $S(W)$ be a relational schema with $Z \subseteq W$. We say that $Z$ represents a Foreign Key in $S$ for $R$ if for any instance we have $\pi_Z(S) \subseteq \pi_Z(R)$. This is a semantic assertion.
Referential integrity
A database is said to have referential integrity when all foreign key constraints are satisfied.
A relational representation
A relational schema
ActsIn(MovieID, PersonID)
With referential integrity constraints
$$\pi_{MovieID}(ActsIn) \subseteq \pi_{MovieID}(Movie)$$
$$\pi_{PersonID}(ActsIn) \subseteq \pi_{PersonID}(Person)$$
ActsIn
PersonID MovieID
6902836 55871
6902836 55873
1757556 171771
5882058 171771
Foreign Keys in SQL
create table ActsIn
( MovieID int not NULL,
PersonID int not NULL,
primary key (MovieID, PersonID),
constraint actsin_movie
foreign key (MovieID)
references Movie(MovieID),
constraint actsin_person
foreign key (PersonID)
references Person(PersonID))
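A quick way to watch a foreign key constraint being enforced is to try inserting a dangling reference. The sketch below assumes SQLite through Python's standard `sqlite3` module instead of MySQL (SQLite only enforces foreign keys after `pragma foreign_keys = on`), and `try_insert` is a hypothetical helper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("pragma foreign_keys = on")  # enforcement is off by default in SQLite
conn.executescript("""
create table Movie  (MovieID int primary key, Title text, Year int);
create table Person (PersonID int primary key, FirstName text, LastName text);
create table ActsIn (
  MovieID int not null,
  PersonID int not null,
  primary key (MovieID, PersonID),
  foreign key (MovieID) references Movie(MovieID),
  foreign key (PersonID) references Person(PersonID));
insert into Movie values (171771, 'Dude, Where''s My Car?', 2000);
insert into Person values (1757556, 'Bill', 'Chott');
insert into ActsIn values (171771, 1757556);
""")

def try_insert(movie_id, person_id):
    """Attempt one ActsIn row; return 'ok' or the database's error text."""
    try:
        with conn:
            conn.execute("insert into ActsIn values (?, ?)",
                         (movie_id, person_id))
        return "ok"
    except sqlite3.IntegrityError as e:
        return str(e)

# MovieID 55871 has not been inserted into Movie, so this row would
# break referential integrity and is rejected.
dangling = try_insert(55871, 1757556)
rows = conn.execute("select count(*) from ActsIn").fetchone()[0]
```

The rejected row never reaches the table, so the inclusion $\pi_{MovieID}(ActsIn) \subseteq \pi_{MovieID}(Movie)$ continues to hold.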
Relational representation of relationships, in general?
That depends ...
Mapping Cardinalities for binary relations, $R \subseteq S \times T$
Relation $R$ is | meaning
many to many | no constraints
one to many | $\forall t \in T,\ s_1, s_2 \in S.\ (R(s_1, t) \wedge R(s_2, t)) \implies s_1 = s_2$
many to one | $\forall s \in S,\ t_1, t_2 \in T.\ (R(s, t_1) \wedge R(s, t_2)) \implies t_1 = t_2$
one to one | one to many and many to one
Note that the database terminology differs slightly from standard mathematical terminology.
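The four cases can be checked mechanically on any finite relation. In this sketch a relation $R \subseteq S \times T$ is a Python set of `(s, t)` pairs, and the function names are hypothetical. As with keys, such a check only describes one instance; the cardinality of a relationship is a design-time assertion.

```python
def is_one_to_many(R):
    """Each t is related to at most one s."""
    seen = {}
    for s, t in R:
        # setdefault records the first s seen for t and returns it thereafter.
        if seen.setdefault(t, s) != s:
            return False
    return True

def is_many_to_one(R):
    """Each s is related to at most one t (the mirror image)."""
    return is_one_to_many({(t, s) for s, t in R})

def cardinality(R):
    """Name the most restrictive mapping cardinality that R satisfies."""
    otm, mto = is_one_to_many(R), is_many_to_one(R)
    if otm and mto:
        return "one to one"
    if otm:
        return "one to many"
    if mto:
        return "many to one"
    return "many to many"
```

For example, `cardinality({("s1", "t1"), ("s1", "t2")})` reports "one to many": one `s` relates to many `t`, but no `t` has two different partners.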
Diagrams for Mapping Cardinalities
[Diagrams: in each of the four cases entity sets S and T are joined by the relationship R; the line ends drawn between them differ from case to case.]
Relation R is
many to many (M : N)
one to many (1 : M)
many to one (M : 1)
one to one (1 : 1)
Relationships to Relational Schema
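[Diagram: entity T with attributes X, Y and entity S with attributes Z, W, joined by relationship R with attribute U.]
Relation R is | Schema (key attributes underlined)
many to many (M : N) | $R(\underline{X}, \underline{Z}, U)$
one to many (1 : M) | $R(\underline{X}, Z, U)$
many to one (M : 1) | $R(X, \underline{Z}, U)$
one to one (1 : 1) | $R(\underline{X}, Z, U)$ and/or $R(X, \underline{Z}, U)$ (alternate keys)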
"one to one" does not mean a "1-to-1 correspondence"
[Diagram: entity T (X, Y) and entity S (Z, W) joined by relationship R (U), drawn as one to one.]
This database instance is OK

S
Z W
z1 w1
z2 w2
z3 w3

R
Z X U
z1 x2 u1

T
X Y
x1 y1
x2 y2
x3 y3
x4 y4
Some more real-world data ... (a slight change of SCOPE)
Title | Year | Actor | Role
Austin Powers: International Man of Mystery | 1997 | Mike Myers | Austin Powers
Austin Powers: International Man of Mystery | 1997 | Mike Myers | Dr. Evil
Austin Powers: The Spy Who Shagged Me | 1999 | Mike Myers | Austin Powers
Austin Powers: The Spy Who Shagged Me | 1999 | Mike Myers | Dr. Evil
Austin Powers: The Spy Who Shagged Me | 1999 | Mike Myers | Fat Bastard
Dude, Where's My Car? | 2000 | Bill Chott | Big Cult Guard 1
Dude, Where's My Car? | 2000 | Marc Lynn | Cop with Whips
How will this change our model?
Will ActsIn remain a binary Relationship?
[Diagram: entities Movie (MovieID, Title, Year) and Person (PersonID, FirstName, LastName) joined by the relationship ActsIn, which now carries the attribute Role.]
No! An actor can have many roles in the same movie!
Could ActsIn be modeled as a Ternary Relationship?
[Diagram: ActsIn drawn as a ternary relationship connecting Movie (MovieID, Title, Year), Person (PersonID, FirstName, LastName) and a new entity Role with attribute Description.]
Yes, this works!
Can a ternary relationship be modeled with multiple binary relationships?
[Diagram: entities Movie, Casting, Person and Role, with binary relationships HasCasting (Movie, Casting), ActsIn (Casting, Person) and RequiresRole (Casting, Role).]
The Casting entity seems artificial. What attributes would it have?
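Sometimes ternary to multiple binary makes more sense ...
[Diagram, first version: ternary relationship Works-On connecting Branch, Employee and Job. Second version: entities Branch, Project, Employee and Job with binary relationships Involves (Branch, Project), Assigned-To (Project, Employee) and Requires (Project, Job).]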
Generalization
[Diagram: Comedy and Drama as ISA specializations of Movie.]
Questions
Is every movie either a comedy or a drama?
Can a movie be a comedy and a drama?
But perhaps this isn't a good model ...
What attributes would distinguish Drama and Comedy entities?
What about Science Fiction?
Perhaps Genre would make a nice entity, which could have a relationship with Movie.
Would a ternary relationship be better?
Question: What is the right model?
Answer: The question doesn't make sense!
There is no right model ...
It depends on the intended use of the database.
What activity will the DBMS support?
What data is needed to support that activity?
The issue of SCOPE is missing from most textbooks
Suppose that all databases begin life with beautifully designed schemas.
Observe that many operational databases are in a sorry state.
Conclude that the scope and goals of a database continually change, and that schema evolution is a difficult problem to solve, in practice.
Another change of SCOPE ...
Movies with detailed release dates
Title | Country | Day | Month | Year
Austin Powers: International Man of Mystery | USA | 02 | 05 | 1997
Austin Powers: International Man of Mystery | Iceland | 24 | 10 | 1997
Austin Powers: International Man of Mystery | UK | 05 | 09 | 1997
Austin Powers: International Man of Mystery | Brazil | 13 | 02 | 1998
Austin Powers: The Spy Who Shagged Me | USA | 08 | 06 | 1999
Austin Powers: The Spy Who Shagged Me | Iceland | 02 | 07 | 1999
Austin Powers: The Spy Who Shagged Me | UK | 30 | 07 | 1999
Austin Powers: The Spy Who Shagged Me | Brazil | 08 | 10 | 1999
Dude, Where's My Car? | USA | 10 | 12 | 2000
Dude, Where's My Car? | Iceland | 9 | 02 | 2001
Dude, Where's My Car? | UK | 9 | 02 | 2001
Dude, Where's My Car? | Brazil | 9 | 03 | 2001
Dude, Where's My Car? | Russia | 18 | 09 | 2001
... and an attribute becomes an entity with a connecting relation.
[Diagram, before: entity Movie with attributes MovieID, Title, Year. After: Movie (MovieID, Title, Year) connected by the relationship Released to a new entity MovieRelease with attributes Country and Date (Year, Month, Day).]
Lecture 04 : Relational algebra and relational calculus
Outline
Constructing new tuples!
Joins
Limitations of Relational Algebra
Renaming
R
A B C D
20 10 0 55
11 10 0 7
4 99 17 2
77 25 4 0

$\Longrightarrow$

Q(R)
A E C F
20 10 0 55
11 10 0 7
4 99 17 2
77 25 4 0

RA : $Q = \rho_{\{B \mapsto E,\ D \mapsto F\}}(R)$
TRC : $Q = \{t \mid \exists u \in R \wedge t.A = u.A \wedge t.E = u.B \wedge t.C = u.C \wedge t.F = u.D\}$
DRC : $Q = \{\{(A, a), (E, b), (C, c), (F, d)\} \mid \{(A, a), (B, b), (C, c), (D, d)\} \in R\}$
SQL : select A, B as E, C, D as F from R
Union
R
A B
20 10
11 10
4 99

S
A B
20 10
77 1000

$\Longrightarrow$

Q(R, S)
A B
20 10
11 10
4 99
77 1000

RA : $Q = R \cup S$
TRC : $Q = \{t \mid t \in R \vee t \in S\}$
DRC : $Q = \{\{(A, a), (B, b)\} \mid \{(A, a), (B, b)\} \in R \vee \{(A, a), (B, b)\} \in S\}$
SQL : (select * from R) union (select * from S)
Intersection
R
A B
20 10
11 10
4 99

S
A B
20 10
77 1000

$\Longrightarrow$

Q(R, S)
A B
20 10

RA : $Q = R \cap S$
TRC : $Q = \{t \mid t \in R \wedge t \in S\}$
DRC : $Q = \{\{(A, a), (B, b)\} \mid \{(A, a), (B, b)\} \in R \wedge \{(A, a), (B, b)\} \in S\}$
SQL : (select * from R) intersect (select * from S)
Difference
R
A B
20 10
11 10
4 99

S
A B
20 10
77 1000

$\Longrightarrow$

Q(R, S)
A B
11 10
4 99

RA : $Q = R - S$
TRC : $Q = \{t \mid t \in R \wedge t \notin S\}$
DRC : $Q = \{\{(A, a), (B, b)\} \mid \{(A, a), (B, b)\} \in R \wedge \{(A, a), (B, b)\} \notin S\}$
SQL : (select * from R) except (select * from S)
Wait, are we missing something?
Suppose we want to add information about college membership to our Student database. We could add an additional attribute for the college.
StudentsWithCollege :
+--------+------+------+--------+
| name   | age  | sid  | college|
+--------+------+------+--------+
| Eva    |   18 | ev77 | King's |
| Fatima |   20 | fm21 | Clare  |
| James  |   19 | jj25 | Clare  |
+--------+------+------+--------+
Put logically independent data in distinct tables?
Students :
+--------+------+------+-----+
| name   | age  | sid  | cid |
+--------+------+------+-----+
| Eva    |   18 | ev77 | k   |
| Fatima |   20 | fm21 | cl  |
| James  |   19 | jj25 | cl  |
+--------+------+------+-----+
Colleges :
+-----+---------------+
| cid | college_name  |
+-----+---------------+
| k   | King's        |
| cl  | Clare         |
| sid | Sidney Sussex |
| q   | Queens'       |
... .....
But how do we put them back together again?
Product
R
A B
20 10
11 10
4 99

S
C D
14 99
77 100

$\Longrightarrow$

Q(R, S)
A B C D
20 10 14 99
20 10 77 100
11 10 14 99
11 10 77 100
4 99 14 99
4 99 77 100

Note the automatic flattening
RA : $Q = R \times S$
TRC : $Q = \{t \mid \exists u \in R, v \in S,\ t.[A, B] = u.[A, B] \wedge t.[C, D] = v.[C, D]\}$
DRC : $Q = \{\{(A, a), (B, b), (C, c), (D, d)\} \mid \{(A, a), (B, b)\} \in R \wedge \{(C, c), (D, d)\} \in S\}$
SQL : select A, B, C, D from R, S
Product is special!
R
A B
20 10
4 99

$\Longrightarrow$

$R \times \rho_{\{A \mapsto C,\ B \mapsto D\}}(R)$
A B C D
20 10 20 10
20 10 4 99
4 99 20 10
4 99 4 99

$\times$ is the only operation in the Relational Algebra that creates new records (ignoring renaming),
But $\times$ usually creates too many records!
Joins are the typical way of using products in a constrained manner.
Natural Join
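Given $R(X, Y)$ and $S(Y, Z)$, we define the natural join, denoted $R \bowtie S$, as a relation over attributes $X, Y, Z$ defined as
$$R \bowtie S \equiv \{t \mid \exists u \in R, v \in S,\ u.[Y] = v.[Y] \wedge t = u.[X] \cup u.[Y] \cup v.[Z]\}$$
In the Relational Algebra:
$$R \bowtie S = \pi_{X,Y,Z}(\sigma_{Y = Y'}(R \times \rho_{Y \mapsto Y'}(S)))$$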
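The definition can be read as a nested loop: pair every $u$ with every $v$ and keep the pairs that agree on the shared attributes $Y$. Below is a minimal sketch with records as Python dicts; `natural_join` is a hypothetical helper, and the sample rows and apostrophised college names are illustrative.

```python
def natural_join(R, S):
    """Join two lists of dict records on every attribute name they share."""
    out = []
    for u in R:
        for v in S:
            shared = u.keys() & v.keys()          # the attributes Y
            if all(u[a] == v[a] for a in shared):
                out.append({**u, **v})            # u.[X], u.[Y], v.[Z] merged
    return out

students = [
    {"name": "Fatima", "sid": "fm21", "age": 20, "cid": "cl"},
    {"name": "Eva", "sid": "ev77", "age": 18, "cid": "k"},
    {"name": "James", "sid": "jj25", "age": 19, "cid": "cl"},
]
colleges = [
    {"cid": "k", "cname": "King's"},
    {"cid": "cl", "cname": "Clare"},
    {"cid": "q", "cname": "Queens'"},
]

joined = natural_join(students, colleges)
# Project onto (name, cname), as a set so duplicates would collapse.
names = {(t["name"], t["cname"]) for t in joined}
```

A real DBMS would normally avoid the quadratic loop (for example by hashing on the join attributes), but the result is the same relation.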
Join example
Students
name sid age cid
Fatima fm21 20 cl
Eva ev77 18 k
James jj25 19 cl

Colleges
cid cname
k King's
cl Clare
q Queens'
... ...

$\Longrightarrow$

$\pi_{name, cname}(Students \bowtie Colleges)$
name cname
Fatima Clare
Eva King's
James Clare
The same in SQL
select name, cname
from Students, Colleges
where Students.cid = Colleges.cid
+--------+--------+
| name   | cname  |
+--------+--------+
| Eva    | King's |
| Fatima | Clare  |
| James  | Clare  |
+--------+--------+
Division
Given $R(X, Y)$ and $S(Y)$, the division of $R$ by $S$, denoted $R \div S$, is the relation over attributes $X$ defined as (in the TRC)
$$R \div S \equiv \{x \mid \forall s \in S,\ x \cup s \in R\}.$$

R
name award
Fatima writing
Fatima music
Eva music
Eva writing
Eva dance
James dance

$\div$

S
award
music
writing
dance

$=$

name
Eva
Division in the Relational Algebra?
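Clearly, $R \div S \subseteq \pi_X(R)$. So $R \div S = \pi_X(R) - C$, where $C$ represents counter examples to the division condition. That is, in the TRC,
$$C = \{x \mid \exists s \in S,\ x \cup s \notin R\}.$$
$U = \pi_X(R) \times S$ represents all possible $x \cup s$ for $x \in \pi_X(R)$ and $s \in S$,
so $T = U - R$ represents all those $x \cup s$ that are not in $R$,
so $C = \pi_X(T)$ represents those records $x$ that are counter examples.
Division in RA
$$R \div S \equiv \pi_X(R) - \pi_X((\pi_X(R) \times S) - R)$$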
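The derivation can be followed step by step in code, using the awards example. This is a sketch: records are `frozenset`s of attribute/value pairs, and `rec`, `project` and `divide` are helper names invented for the illustration.

```python
def rec(**fields):
    """A record: an immutable set of (attribute, value) pairs."""
    return frozenset(fields.items())

def project(X, R):
    """pi_X(R)."""
    return {frozenset((a, v) for a, v in t if a in X) for t in R}

def divide(R, S, X):
    """R / S computed as pi_X(R) - pi_X((pi_X(R) x S) - R)."""
    candidates = project(X, R)                        # pi_X(R)
    U = {x | s for x in candidates for s in S}        # all possible x u s
    T = U - R                                         # those missing from R
    C = project(X, T)                                 # the counter examples
    return candidates - C

R = {rec(name="Fatima", award="writing"), rec(name="Fatima", award="music"),
     rec(name="Eva", award="music"), rec(name="Eva", award="writing"),
     rec(name="Eva", award="dance"), rec(name="James", award="dance")}
S = {rec(award="music"), rec(award="writing"), rec(award="dance")}

result = divide(R, S, {"name"})
```

Only Eva holds every award in `S`; Fatima is missing dance and James is missing music and writing, so both end up in the set of counter examples.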
Query Safety
A query like $Q = \{t \mid t \in R \wedge t \notin S\}$ raises some interesting questions.
Should we allow the following query?
$$Q = \{t \mid t \notin S\}$$
We want our relations to be finite!
Safety
A (TRC) query $Q = \{t \mid P(t)\}$ is safe if it is always finite for any database instance.
Problem : query safety is not decidable!
Solution : define a restricted syntax that guarantees safety.
Safe queries can be represented in the Relational Algebra.
Limitations of simple relational query languages
The expressive power of RA, TRC, and DRC is essentially the same. None can express the transitive closure of a relation.
We could extend RA to more powerful languages (like Datalog).
SQL has been extended with many features beyond the Relational Algebra:
stored procedures
recursive queries
ability to embed SQL in standard procedural languages
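What the algebra lacks is iteration: any single RA expression follows paths of some fixed length, while transitive closure needs "repeat until nothing new appears". The sketch below wraps a join-and-project step in such a loop; binary relations are sets of pairs, and the function names and the `edges` data are made up for the example.

```python
def compose(R, S):
    """Join R(a, b) with S(b, c) on b, then project onto (a, c)."""
    return {(a, c) for (a, b) in R for (b2, c) in S if b == b2}

def transitive_closure(R):
    """Smallest relation containing R that is closed under composition with R."""
    closure = set(R)
    while True:
        bigger = closure | compose(closure, R)
        if bigger == closure:       # fixpoint reached: nothing new was added
            return closure
        closure = bigger

edges = {(1, 2), (2, 3), (3, 4)}
tc = transitive_closure(edges)
```

The number of loop iterations depends on the data (the length of the longest path), which is why no fixed RA expression can do the job; recursive queries in SQL and rules in Datalog supply exactly this kind of fixpoint.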
Lecture 05 : SQL and integrity constraints
Outline
NULL in SQL
three-valued logic
Multisets and aggregation in SQL
Views
General integrity constraints
What is NULL in SQL?
What if you don't know Kim's age?
mysql> select * from students;
+------+--------+------+
| sid  | name   | age  |
+------+--------+------+
| ev77 | Eva    |   18 |
| fm21 | Fatima |   20 |
| jj25 | James  |   19 |
| ks87 | Kim    | NULL |
+------+--------+------+
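What is NULL?
NULL is a place-holder, not a value!
NULL is not a member of any domain (type),
For records with NULL for age, an expression like age > 20 must be unknown!
This means we need (at least) three-valued logic.
Let $\bot$ represent "We don't know!"

$\wedge$ | T | F | $\bot$
T | T | F | $\bot$
F | F | F | F
$\bot$ | $\bot$ | F | $\bot$

$\vee$ | T | F | $\bot$
T | T | T | T
F | T | F | $\bot$
$\bot$ | T | $\bot$ | $\bot$

$v$ | $\neg v$
T | F
F | T
$\bot$ | $\bot$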
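The three truth tables are short enough to code directly. In this sketch Python's `None` plays the role of $\bot$, and the names `and3`, `or3`, `not3` and `gt` are chosen for the illustration; they are not part of any SQL system.

```python
T, F, U = True, False, None   # U (None) stands for "we don't know"

def and3(a, b):
    if a is F or b is F:      # one false operand settles a conjunction
        return F
    if a is U or b is U:
        return U
    return T

def or3(a, b):
    if a is T or b is T:      # one true operand settles a disjunction
        return T
    if a is U or b is U:
        return U
    return F

def not3(a):
    return U if a is U else (not a)

def gt(x, y):
    """x > y, but unknown when either side is NULL (None)."""
    return U if x is None or y is None else x > y

# Kim's age is NULL: "age > 20 or not (age > 20)" is unknown, not true.
kim_age = None
excluded_middle = or3(gt(kim_age, 20), not3(gt(kim_age, 20)))
```

This is why a record with a NULL age satisfies neither a condition nor its negation: a where clause keeps only the rows for which the predicate is true.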
NULL can lead to unexpected results
mysql> select * from students;
+------+--------+------+
| sid  | name   | age  |
+------+--------+------+
| ev77 | Eva    |   18 |
| fm21 | Fatima |   20 |
| jj25 | James  |   19 |
| ks87 | Kim    | NULL |
+------+--------+------+
mysql> select * from students where age <> 19;
+------+--------+------+
| sid  | name   | age  |
+------+--------+------+
| ev77 | Eva    |   18 |
| fm21 | Fatima |   20 |
+------+--------+------+
select ... where P
The select statement only returns those records where the where predicate evaluates to true.
The ambiguity of NULL
Possible interpretations of NULL
There is a value, but we don't know what it is.
No value is applicable.
The value is known, but you are not allowed to see it.
...
A great deal of semantic muddle is created by conflating all of these interpretations into one non-value.
On the other hand, introducing distinct NULLs for each possible interpretation leads to very complex logics ...
Not everyone approves of NULL
C. J. Date [D2004], Chapter 19
"Before we go any further, we should make it very clear that in our opinion (and in that of many other writers too, we hasten to add), NULLs and 3VL are and always were a serious mistake and have no place in the relational model."
age is not a good attribute ...
The age column is guaranteed to go out of date! Let's record dates of birth instead!
create table Students
( sid varchar(10) not NULL,
name varchar(50) not NULL,
birth_date date,
cid varchar(3) not NULL,
primary key (sid),
constraint student_college foreign key (cid)
references Colleges(cid) )
age is not a good attribute ...
mysql> select * from Students;
+------+---------+------------+-----+
| sid  | name    | birth_date | cid |
+------+---------+------------+-----+
| ev77 | Eva     | 1990-01-26 | k   |
| fm21 | Fatima  | 1988-07-20 | cl  |
| jj25 | James   | 1989-03-14 | cl  |
+------+---------+------------+-----+
Use aviewto recover original table(Note : the age calculation here is not correct!)
create view StudentsWithAge as
select sid, name,
(year(current_date()) - year(birth_date)) as age,
cid
from Students;
mysql> select * from StudentsWithAge;
+------+---------+------+-----+| sid | name | age | cid |
+------+---------+------+-----+
| ev77 | Eva | 19 | k |
| fm21 | Fatima | 21 | cl |
| jj25 | James | 20 | cl |
+------+---------+------+-----+
Views are simply identifiers that represent a query. The views name
can be used as if it were a stored table.Ken Moody (cl.cam.ac.uk) Databases DB 2012 82 / 175
But that calculation is not correct ...
Clearly the calculation of age does not take into account the day and month of year.
From 2010 Database Contest (winner : Sebastian Probst Eide)
SELECT year(CURRENT_DATE()) - year(birth_date) -
  CASE WHEN month(CURRENT_DATE()) < month(birth_date)
  THEN 1
  ELSE
    CASE WHEN month(CURRENT_DATE()) = month(birth_date)
    THEN
      CASE WHEN day(CURRENT_DATE()) < day(birth_date)
      THEN 1
      ELSE 0
      END
    ELSE 0
    END
  END
AS age FROM Students
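The same correction can be sketched outside SQL. The Python below is a minimal sketch, not the contest entry: age_on is a hypothetical helper name, and both arguments are assumed to be datetime.date values. It subtracts one year when this year's birthday has not yet been reached, which is the effect of the nested CASE expressions.

```python
from datetime import date


def age_on(birth_date: date, today: date) -> int:
    """Whole years elapsed between birth_date and today (hypothetical helper)."""
    years = today.year - birth_date.year
    # Subtract one if this year's birthday has not happened yet.
    if (today.month, today.day) < (birth_date.month, birth_date.day):
        years -= 1
    return years
```

Comparing (month, day) pairs as tuples replaces the three nested comparisons with a single lexicographic one.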
An Example ...
mysql> select * from marks;
+-------+-----------+------+
| sid   | course    | mark |
+-------+-----------+------+
| ev77  | databases |   92 |
| ev77  | spelling  |   99 |
| tgg22 | spelling  |    3 |
| tgg22 | databases |  100 |
| fm21  | databases |   92 |
| fm21  | spelling  |  100 |
| jj25  | databases |   88 |
| jj25  | spelling  |   92 |
+-------+-----------+------+
... of duplicates
mysql> select mark from marks;
+------+
| mark |
+------+
|   92 |
|   99 |
|    3 |
|  100 |
|   92 |
|  100 |
|   88 |
|   92 |
+------+
Why Multisets?
Duplicates are important for aggregate functions.
mysql> select min(mark),
max(mark),
sum(mark),
avg(mark)
from marks;
+-----------+-----------+-----------+-----------+
| min(mark) | max(mark) | sum(mark) | avg(mark) |
+-----------+-----------+-----------+-----------+
| 3 | 100 | 666 | 83.2500 |
+-----------+-----------+-----------+-----------+
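To see why the duplicates matter, compare the aggregates over the multiset of marks with the same aggregates after duplicates are dropped. This is a small illustrative sketch in Python; the list simply copies the mark column shown earlier, and the variable names are made up for the example.

```python
# The mark column as a multiset (bag): duplicates are kept.
marks = [92, 99, 3, 100, 92, 100, 88, 92]
# The same column under set semantics: duplicates are lost.
distinct_marks = set(marks)

bag_sum = sum(marks)
bag_avg = bag_sum / len(marks)
set_sum = sum(distinct_marks)
set_avg = set_sum / len(distinct_marks)

print(bag_sum, bag_avg)   # aggregates SQL reports for the table
print(set_sum, set_avg)   # what set semantics would report instead
```

Under set semantics both the sum and the average change, so a query language built purely on sets could not compute these aggregates correctly.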
The group by clause
mysql> select course,
min(mark),
max(mark),
avg(mark)
from marks
group by course;
+-----------+-----------+-----------+-----------+
| course | min(mark) | max(mark) | avg(mark) |
+-----------+-----------+-----------+-----------+
| databases | 88 | 100 | 93.0000 |
| spelling | 3 | 100 | 73.5000 |
+-----------+-----------+-----------+-----------+
Visualizing group by
sid    course     mark
ev77   databases  92
ev77   spelling   99
tgg22  spelling   3
tgg22  databases  100
fm21   databases  92
fm21   spelling   100
jj25   databases  88
jj25   spelling   92

group by ⇒

course    mark
spelling  99
spelling  3
spelling  100
spelling  92

course     mark
databases  92
databases  100
databases  92
databases  88
Visualizing group by
course    mark
spelling  99
spelling  3
spelling  100
spelling  92

course     mark
databases  92
databases  100
databases  92
databases  88

min(mark) ⇒

course     min(mark)
spelling   3
databases  88
The having clause
How can we select on the aggregated columns?
mysql> select course,
min(mark),
max(mark),
avg(mark)
from marks
group by course
having min(mark) > 60;
+-----------+-----------+-----------+-----------+
| course | min(mark) | max(mark) | avg(mark) |
+-----------+-----------+-----------+-----------+
| databases | 88 | 100 | 93.0000 |
+-----------+-----------+-----------+-----------+
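The group by and having queries can be tried without a MySQL server. The following is a minimal sketch using Python's built-in sqlite3 module with an in-memory database; the data is the marks table from the slides, and SQLite rather than MySQL is an assumption of this example, so output formatting differs from the mysql client.

```python
import sqlite3

rows = [
    ("ev77", "databases", 92), ("ev77", "spelling", 99),
    ("tgg22", "spelling", 3), ("tgg22", "databases", 100),
    ("fm21", "databases", 92), ("fm21", "spelling", 100),
    ("jj25", "databases", 88), ("jj25", "spelling", 92),
]

conn = sqlite3.connect(":memory:")
conn.execute("create table marks (sid text, course text, mark integer)")
conn.executemany("insert into marks values (?, ?, ?)", rows)

# One output row per course.
grouped = conn.execute(
    "select course, min(mark), max(mark), avg(mark) "
    "from marks group by course order by course"
).fetchall()

# having filters whole groups after aggregation; where filters rows before it.
filtered = conn.execute(
    "select course, min(mark), max(mark), avg(mark) "
    "from marks group by course having min(mark) > 60"
).fetchall()

print(grouped)
print(filtered)
```

The spelling group is dropped by the having clause because its minimum mark is 3.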
Use renaming to make things nicer ...
mysql> select course,
min(mark) as minimum,
max(mark) as maximum,
avg(mark) as average
from marks
group by course
having minimum > 60;
+-----------+---------+---------+---------+
| course | minimum | maximum | average |
+-----------+---------+---------+---------+
| databases | 88 | 100 | 93.0000 |
+-----------+---------+---------+---------+
Materialized Views
Suppose Q is a very expensive, and very frequent query.
Why not de-normalize some data to speed up the evaluation of Q?
  This might be a reasonable thing to do, or ...
  ... it might be the first step to destroying the integrity of your data design.
Why not store the value of Q in a table?
  This is called a materialized view.
  But now there is a problem: How often should this view be refreshed?
General integrity constraints
Suppose that $C$ is some constraint we would like to enforce on our database.
Let $Q_C$ be a query that captures all violations of $C$.
Enforce (somehow) the assertion that $Q_C$ is always empty.
Example
$C = Z \to W$, an FD that was not preserved for relation $R(X)$,
Let $Q_R$ be a join that reconstructs $R$,
Let $Q'_R$ be this query with $X \mapsto X'$, and
$$Q_C = \sigma_{W \neq W'}(\sigma_{Z = Z'}(Q_R \times Q'_R))$$
Assertions in SQL
create view C_violations as ....
create assertion check_C
check not (exists C_violations)
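As far as we know, most widely used systems do not implement create assertion, so the check is usually emulated by running the violations query. The sketch below uses Python's built-in sqlite3 module; the table Enrolled, its columns, and the helper assertion_holds are hypothetical names chosen for illustration, and the constraint is the FD sid → name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table Enrolled (sid text, name text, course text)")
conn.executemany("insert into Enrolled values (?, ?, ?)", [
    ("bb44", "Bin", "Databases"),
    ("bb44", "Bin", "Algorithms II"),
    ("zz70", "Zip", "Databases"),
])

# Pairs of rows that agree on sid but disagree on name violate sid -> name.
conn.execute("""
    create view C_violations as
    select r.sid, r.name as name1, s.name as name2
    from Enrolled r join Enrolled s on r.sid = s.sid
    where r.name <> s.name
""")


def assertion_holds(connection):
    """True when the violations view is empty (hypothetical helper)."""
    count = connection.execute("select count(*) from C_violations").fetchone()[0]
    return count == 0


ok_before = assertion_holds(conn)
conn.execute("insert into Enrolled values ('zz70', 'Zap', 'Algorithms II')")
ok_after = assertion_holds(conn)
print(ok_before, ok_after)
```

A real system would have to run such a check inside every transaction that touches the table, which hints at why general assertions are expensive to support.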
Lecture 06 : Case Study - Cancer registry for the NHS
ECRIC is a cancer registry, recording details about all tumours in people in the East of England. This data is particularly sensitive, and its use is strictly controlled. The lecture focusses on the challenges of scaling up the registration system to cover all cancer patients in England, while still maintaining the long term accuracy and continuity of the data set.
Lecture 07 : Schema refinement I
Outline
ER is for top-down and informal (but rigorous) design
FDs are used for bottom-up and formal design and analysis
update anomalies
Reasoning about Functional Dependencies
Heath's rule
Update anomalies
Big Table
sid   name  college   course         part  term_name
yy88  Yoni  New Hall  Algorithms I   IA    Easter
uu99  Uri   King's    Algorithms I   IA    Easter
bb44  Bin   New Hall  Databases      IB    Lent
bb44  Bin   New Hall  Algorithms II  IB    Michaelmas
zz70  Zip   Trinity   Databases      IB    Lent
zz70  Zip   Trinity   Algorithms II  IB    Michaelmas

How can we tell if an inserted record is consistent with current records?
Can we record data about a course before students enroll?
Will we wipe out information about a college when the last student associated with the college is deleted?
Redundancy implies more locking ...
... at least for correct transactions!
Big Table
sid   name  college   course         part  term_name
yy88  Yoni  New Hall  Algorithms I   IA    Easter
uu99  Uri   King's    Algorithms I   IA    Easter
bb44  Bin   New Hall  Databases      IB    Lent
bb44  Bin   New Hall  Algorithms II  IB    Michaelmas
zz70  Zip   Trinity   Databases      IB    Lent
zz70  Zip   Trinity   Algorithms II  IB    Michaelmas

Change New Hall to Murray Edwards College
  Conceptually simple update
  May require locking entire table.
Redundancy is the root of (almost) all database evils
It may not be obvious, but redundancy is also the cause of update anomalies.
By redundancy we do not mean that some values occur many times in the database!
  A foreign key value may have millions of copies!
But then, what do we mean?
Functional Dependency
Functional Dependency (FD)
Let $R(X)$ be a relational schema and $Y \subseteq X$, $Z \subseteq X$ be two attribute sets. We say $Y$ functionally determines $Z$, written $Y \to Z$, if for any two tuples $u$ and $v$ in an instance of $R(X)$ we have
$$u.Y = v.Y \implies u.Z = v.Z.$$
We call $Y \to Z$ a functional dependency.
A functional dependency is a semantic assertion. It represents a rule that should always hold in any instance of schema $R(X)$.
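The definition can be checked against one particular instance. Here is a minimal sketch in Python, assuming rows are represented as dictionaries; satisfies_fd is a hypothetical helper, and the data is the Big Table from the update anomalies slide. Note that passing the check on one instance never proves an FD, since an FD is a rule about all instances; a failure does refute it.

```python
def satisfies_fd(rows, lhs, rhs):
    """True if every pair of rows that agrees on lhs also agrees on rhs."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        val = tuple(row[a] for a in rhs)
        # The first row with this lhs value fixes the rhs value; later rows must match.
        if seen.setdefault(key, val) != val:
            return False
    return True


cols = ["sid", "name", "college", "course", "part", "term_name"]
data = [
    ("yy88", "Yoni", "New Hall", "Algorithms I", "IA", "Easter"),
    ("uu99", "Uri", "King's", "Algorithms I", "IA", "Easter"),
    ("bb44", "Bin", "New Hall", "Databases", "IB", "Lent"),
    ("bb44", "Bin", "New Hall", "Algorithms II", "IB", "Michaelmas"),
    ("zz70", "Zip", "Trinity", "Databases", "IB", "Lent"),
    ("zz70", "Zip", "Trinity", "Algorithms II", "IB", "Michaelmas"),
]
big_table = [dict(zip(cols, r)) for r in data]

print(satisfies_fd(big_table, ["sid"], ["name"]))
print(satisfies_fd(big_table, ["sid"], ["course"]))
```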
Example FDs
Big Table
sid   name  college   course         part  term_name
yy88  Yoni  New Hall  Algorithms I   IA    Easter
uu99  Uri   King's    Algorithms I   IA    Easter
bb44  Bin   New Hall  Databases      IB    Lent
bb44  Bin   New Hall  Algorithms II  IB    Michaelmas
zz70  Zip   Trinity   Databases      IB    Lent
zz70  Zip   Trinity   Algorithms II  IB    Michaelmas

$\text{sid} \to \text{name}$
$\text{sid} \to \text{college}$
$\text{course} \to \text{part}$
$\text{course} \to \text{term\_name}$
Keys, revisited
Candidate Key
Let $R(X)$ be a relational schema and $Y \subseteq X$. $Y$ is a candidate key if
1 The FD $Y \to X$ holds, and
2 for no proper subset $Z \subset Y$ does $Z \to X$ hold.
Prime and Non-prime attributes
An attribute $A$ is prime for $R(X)$ if it is a member of some candidate key for $R$. Otherwise, $A$ is non-prime.
Database redundancy roughly means the existence of non-key functional dependencies!
Semantic Closure
Notation
$$F \models Y \to Z$$
means that any database instance that satisfies every FD of $F$ must also satisfy $Y \to Z$.
The semantic closure of $F$, denoted $F^+$, is defined to be
$$F^+ = \{Y \to Z \mid Y \cup Z \subseteq \mathrm{atts}(F) \text{ and } F \models Y \to Z\}.$$
The membership problem is to determine if $Y \to Z \in F^+$.
Reasoning about Functional Dependencies
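We write $F \vdash Y \to Z$ when $Y \to Z$ can be derived from $F$ via the following rules.
Armstrong's Axioms
Reflexivity : If $Z \subseteq Y$, then $F \vdash Y \to Z$.
Augmentation : If $F \vdash Y \to Z$ then $F \vdash Y, W \to Z, W$.
Transitivity : If $F \vdash Y \to Z$ and $F \vdash Z \to W$, then $F \vdash Y \to W$.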
Logical Closure (of a set of attributes)
Notation
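$$\mathrm{closure}(F, X) = \{A \mid F \vdash X \to A\}$$
Claim 1
If $Y \to W \in F$ and $Y \subseteq \mathrm{closure}(F, X)$, then $W \subseteq \mathrm{closure}(F, X)$.
Claim 2
$Y \to W \in F^+$ if and only if $W \subseteq \mathrm{closure}(F, Y)$.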
Soundness and Completeness
Soundness
$$F \vdash f \implies f \in F^+$$
Completeness
$$f \in F^+ \implies F \vdash f$$
Proof of Completeness (soundness left as an exercise)
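Show $\neg(F \vdash f) \implies \neg(F \models f)$:
Suppose $\neg(F \vdash Y \to Z)$ for $R(X)$.
Let $Y^+ = \mathrm{closure}(F, Y)$.
$\exists B \in Z$, with $B \notin Y^+$.
Construct an instance of $R$ with just two records, $u$ and $v$, that agree on $Y^+$ but not on $X - Y^+$.
By construction, this instance does not satisfy $Y \to Z$.
But it does satisfy $F$! Why?
  Let $S \to T$ be any FD in $F$, with $u.[S] = v.[S]$.
  So $S \subseteq Y^+$, and so $T \subseteq Y^+$ by claim 1,
  and so $u.[T] = v.[T]$.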
Closure
By soundness and completeness
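$$\mathrm{closure}(F, X) = \{A \mid F \vdash X \to A\} = \{A \mid X \to A \in F^+\}$$
Claim 2 (from previous lecture)
$Y \to W \in F^+$ if and only if $W \subseteq \mathrm{closure}(F, Y)$.
If we had an algorithm for $\mathrm{closure}(F, X)$, then we would have a (brute force!) algorithm for enumerating $F^+$:
$F^+$ :
  for every subset $Y \subseteq \mathrm{atts}(F)$
    for every subset $Z \subseteq \mathrm{closure}(F, Y)$,
      output $Y \to Z$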
Attribute Closure Algorithm
Input : a set of FDs $F$ and a set of attributes $X$.
Output : $Y = \mathrm{closure}(F, X)$
1 $Y := X$
2 while there is some $S \to T \in F$ with $S \subseteq Y$ and $T \not\subseteq Y$, do $Y := Y \cup T$.
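The two steps translate almost line for line into code. Below is a minimal sketch in Python, assuming FDs are represented as pairs of frozensets of single-letter attribute names; closure and the convenience constructor fd are names chosen for this example. The two FD sets are the ones used in the worked examples of these notes.

```python
def closure(fds, attrs):
    """Attribute closure of attrs under fds, given as (lhs, rhs) pairs of frozensets."""
    result = set(attrs)          # step 1: Y := X
    changed = True
    while changed:               # step 2: repeat until no FD adds anything
        changed = False
        for lhs, rhs in fds:
            # Fire S -> T when S is already covered but T is not.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def fd(lhs, rhs):
    """Build an FD from two strings of single-letter attribute names."""
    return frozenset(lhs), frozenset(rhs)


F = [fd("AB", "C"), fd("C", "D"), fd("D", "A")]
G = [fd("AB", "C"), fd("BC", "D"), fd("D", "E"), fd("CF", "B")]

print(sorted(closure(F, "C")))
print(sorted(closure(G, "AB")))
```

The loop terminates because each pass either adds at least one attribute or stops, and there are only finitely many attributes.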
An Example (UW1997, Exercise 3.6.1)
$R(A, B, C, D)$ with $F$ made up of the FDs
$A, B \to C$
$C \to D$
$D \to A$
What is $F^+$?
Brute force!
Let's just consider all possible nonempty sets $X$; there are only 15...
Example (cont.)
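$F = \{A, B \to C,\; C \to D,\; D \to A\}$
For the single attributes we have
$\{A\}^+ = \{A\}$,
$\{B\}^+ = \{B\}$,
$\{C\}^+ = \{A, C, D\}$, since $\{C\} \overset{C \to D}{\Longrightarrow} \{C, D\} \overset{D \to A}{\Longrightarrow} \{A, C, D\}$,
$\{D\}^+ = \{A, D\}$, since $\{D\} \overset{D \to A}{\Longrightarrow} \{A, D\}$.
The only new dependency we get with a single attribute on the left is $C \to A$.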
Example (cont.)
$F = \{A, B \to C,\; C \to D,\; D \to A\}$
Now consider pairs of attributes.
$\{A, B\}^+ = \{A, B, C, D\}$, so $A, B \to D$ is a new dependency
$\{A, C\}^+ = \{A, C, D\}$, so $A, C \to D$ is a new dependency
$\{A, D\}^+ = \{A, D\}$, so nothing new.
$\{B, C\}^+ = \{A, B, C, D\}$, so $B, C \to A, D$ is a new dependency
$\{B, D\}^+ = \{A, B, C, D\}$, so $B, D \to A, C$ is a new dependency
$\{C, D\}^+ = \{A, C, D\}$, so $C, D \to A$ is a new dependency
Example (cont.)
$F = \{A, B \to C,\; C \to D,\; D \to A\}$
For the triples of attributes:
$\{A, C, D\}^+ = \{A, C, D\}$,
$\{A, B, D\}^+ = \{A, B, C, D\}$, so $A, B, D \to C$ is a new dependency
$\{A, B, C\}^+ = \{A, B, C, D\}$, so $A, B, C \to D$ is a new dependency
$\{B, C, D\}^+ = \{A, B, C, D\}$, so $B, C, D \to A$ is a new dependency
And since $\{A, B, C, D\}^+ = \{A, B, C, D\}$, we get no new dependencies with four attributes.
Example (cont.)
We generated 11 new FDs:
$C \to A$
$A, B \to D$
$A, C \to D$
$B, C \to A$
$B, C \to D$
$B, D \to A$
$B, D \to C$
$C, D \to A$
$A, B, C \to D$
$A, B, D \to C$
$B, C, D \to A$
Can you see the Key?
$\{A, B\}$, $\{B, C\}$, and $\{B, D\}$ are keys.
Note: this schema is already in 3NF! Why?
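The keys can be found mechanically by the same brute-force idea. The following Python is a sketch under the assumption that FDs are pairs of frozensets; candidate_keys is a hypothetical helper that tries attribute sets in order of size and keeps those whose closure is the whole schema and which contain no smaller key.

```python
from itertools import combinations


def closure(fds, attrs):
    """Attribute closure of attrs under fds, given as (lhs, rhs) pairs of frozensets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(attrs, fds):
    """All minimal attribute sets whose closure is the whole schema."""
    attrs = frozenset(attrs)
    keys = []
    # Smaller sets first, so any superset of a key found earlier can be skipped.
    for size in range(1, len(attrs) + 1):
        for combo in combinations(sorted(attrs), size):
            candidate = frozenset(combo)
            if any(k <= candidate for k in keys):
                continue  # contains a key already, so it is not minimal
            if closure(fds, candidate) == attrs:
                keys.append(candidate)
    return keys


F = [(frozenset("AB"), frozenset("C")),
     (frozenset("C"), frozenset("D")),
     (frozenset("D"), frozenset("A"))]

keys = candidate_keys("ABCD", F)
prime = set().union(*keys)
print([sorted(k) for k in keys], sorted(prime))
```

Every attribute turns out to be prime, which is one way to answer the question on the slide: with no non-prime attributes, no FD can violate the 3NF condition.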
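Consequences of Armstrong's Axioms
Union : If $F \models Y \to Z$ and $F \models Y \to W$, then $F \models Y \to W, Z$.
Pseudo-transitivity : If $F \models Y \to Z$ and $F \models U, Z \to W$, then $F \models Y, U \to W$.
Decomposition : If $F \models Y \to Z$ and $W \subseteq Z$, then $F \models Y \to W$.
Exercise : Prove these using Armstrong's axioms!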
Proof of the Union Rule
Suppose we have $F \models Y \to Z$, $F \models Y \to W$.
By augmentation we have
$$F \models Y, Y \to Y, Z,$$
that is, $F \models Y \to Y, Z$.
Also using augmentation we obtain
$$F \models Y, Z \to W, Z.$$
Therefore, by transitivity we obtain
$$F \models Y \to W, Z.$$
Example application of functional reasoning.
Heath's Rule
Suppose $R(A, B, C)$ is a relational schema with functional dependency $A \to B$, then
$$R = \pi_{A,B}(R) \bowtie_A \pi_{A,C}(R).$$
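Proof of Heath's Rule
We first show that $R \subseteq \pi_{A,B}(R) \bowtie_A \pi_{A,C}(R)$.
If $u = (a, b, c) \in R$, then $u_1 = (a, b) \in \pi_{A,B}(R)$ and $u_2 = (a, c) \in \pi_{A,C}(R)$.
Since $\{(a, b)\} \bowtie_A \{(a, c)\} = \{(a, b, c)\}$ we know $u \in \pi_{A,B}(R) \bowtie_A \pi_{A,C}(R)$.
In the other direction we must show $R' = \pi_{A,B}(R) \bowtie_A \pi_{A,C}(R) \subseteq R$.
If $u = (a, b, c) \in R'$, then there must exist tuples $u_1 = (a, b) \in \pi_{A,B}(R)$ and $u_2 = (a, c) \in \pi_{A,C}(R)$.
This means that there must exist a $u' = (a, b', c) \in R$ such that $\{u_2\} = \pi_{A,C}(\{(a, b', c)\})$.
However, the functional dependency tells us that $b = b'$, so $u = (a, b, c) \in R$.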
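The rule can be illustrated on tiny instances. This is a sketch in Python that models a relation over (A, B, C) as a set of triples; project and join_on_first are hypothetical helpers written for this example, and the two instances are made up: in R the dependency A → B holds, in S it does not.

```python
def project(rel, idxs):
    """Projection of a set of tuples onto the given column positions."""
    return {tuple(t[i] for i in idxs) for t in rel}


def join_on_first(r1, r2):
    """Natural join of two binary relations on their first column."""
    return {(a, b, c) for (a, b) in r1 for (a2, c) in r2 if a == a2}


R = {("a1", "b1", "c1"), ("a1", "b1", "c2"), ("a2", "b2", "c1")}  # A -> B holds
S = {("a1", "b1", "c1"), ("a1", "b2", "c2")}                      # A -> B fails

rejoin_R = join_on_first(project(R, (0, 1)), project(R, (0, 2)))
rejoin_S = join_on_first(project(S, (0, 1)), project(S, (0, 2)))

print(rejoin_R == R)          # the decomposition of R loses nothing
print(sorted(rejoin_S - S))   # spurious tuples created by decomposing S
```

The first half of the proof shows up as S being contained in its rejoin; the FD is what rules out the spurious tuples in the second half. Failure of the FD does not always produce spurious tuples, only in instances like S.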
Closure Example
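$R(A, B, C, D, E, F)$ with
$A, B \to C$
$B, C \to D$
$D \to E$
$C, F \to B$
What is the closure of $\{A, B\}$?
$$\{A, B\} \overset{A,B \to C}{\Longrightarrow} \{A, B, C\} \overset{B,C \to D}{\Longrightarrow} \{A, B, C, D\} \overset{D \to E}{\Longrightarrow} \{A, B, C, D, E\}$$
So $\{A, B\}^+ = \{A, B, C, D, E\}$ and $A, B \to C, D, E$.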
Lecture 08 : Normal Forms
Outline
First Normal Form (1NF)
Second Normal Form (2NF)
3NF and BCNF
Multi-valued dependencies (MVDs)
Fourth Normal Form
The Plan
Given a relational schema $R(X)$ with FDs $F$ :
Reason about FDs
  Is $F$ missing FDs that are logically implied by those in $F$?
Decompose each $R(X)$ into smaller $R_1(X_1), R_2(X_2), \cdots, R_k(X_k)$, where each $R_i(X_i)$ is in the desired Normal Form.
Are some decompositions better than others?
Desired properties of any decomposition
Lossless-join decomposition
A decomposition of schema $R(X)$ to $S(Y \cup Z)$ and $T(Y \cup (X - Z))$ is a lossless-join decomposition if for every database instance we have $R = S \bowtie T$.
Dependency preserving decomposition
A decomposition of schema $R(X)$ to $S(Y \cup Z)$ and $T(Y \cup (X - Z))$ is dependency preserving, if enforcing FDs on $S$ and $T$ individually has the same effect as enforcing all FDs on $S \bowtie T$.
We will see that it is not always possible to achieve both of these goals.
First Normal Form (1NF)
We will assume every schema is in 1NF.
1NF
A schema $R(A_1 : S_1, A_2 : S_2, \cdots, A_n : S_n)$ is in First Normal Form (1NF) if the domains $S_i$ are elementary; their values are atomic.

name
Timothy George Griffin

⇒

first_name  middle_name  last_name
Timothy     George       Griffin
Second Normal Form (2NF)
Second Normal Form (2NF)
A relational schema $R$ is in 2NF if for every functional dependency $X \to A$ either
  $A \in X$, or
  $X$ is a superkey for $R$, or
  $A$ is a member of some key, or
  $X$ is not a proper subset of any key.
3NF and BCNF
Third Normal Form (3NF)
A relational schema $R$ is in 3NF if for every functional dependency $X \to A$ either
  $A \in X$, or
  $X$ is a superkey for $R$, or
  $A$ is a member of some key.
Boyce-Codd Normal Form (BCNF)
A relational schema $R$ is in BCNF if for every functional dependency $X \to A$ either
  $A \in X$, or
  $X$ is a superkey for $R$.
Is something missing?
Another look at Heath's Rule
Given $R(Z, W, Y)$ with FDs $F$
If $Z \to W \in F^+$, then
$$R = \pi_{Z,W}(R) \bowtie \pi_{Z,Y}(R)$$
What about an implication in the other direction? That is, suppose we have
$$R = \pi_{Z,W}(R) \bowtie \pi_{Z,Y}(R).$$
Q Can we conclude anything about FDs on $R$? In particular, is it true that $Z \to W$ holds?
A No!
We just need one counter example ...
$$R = \pi_{A,B}(R) \bowtie \pi_{A,C}(R)$$
$R$ :
A  B   C
a  b1  c1
a  b2  c2
a  b1  c2
a  b2  c1

$\pi_{A,B}(R)$ :
A  B
a  b1
a  b2

$\pi_{A,C}(R)$ :
A  C
a  c1
a  c2

Clearly $A \to B$ is not an FD of $R$.
A concrete example
course_name  lecturer  text
Databases    Tim       Ullman and Widom
Databases    Fatima    Date
Databases    Tim       Date
Databases    Fatima    Ullman and Widom

Assuming that texts and lecturers are assigned to courses independently, then a better representation would be in two tables:

course_name  lecturer
Databases    Tim
Databases    Fatima

course_name  text
Databases    Ullman and Widom
Databases    Date
Time for a definition! MVDs
Multivalued Dependencies (MVDs)
Let $R(Z, W, Y)$ be a relational schema. A multivalued dependency, denoted $Z \twoheadrightarrow W$, holds if whenever $t$ and $u$ are two records that agree on the attributes of $Z$, then there must be some tuple $v$ such that
1 $v$ agrees with both $t$ and $u$ on the attributes of $Z$,
2 $v$ agrees with $t$ on the attributes of $W$,
3 $v$ agrees with $u$ on the attributes of $Y$.
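The three conditions can be checked directly on an instance. The following is a minimal Python sketch assuming rows are dictionaries and the attributes are split into the three disjoint groups Z, W and Y; satisfies_mvd is a hypothetical helper, and the data is the course, lecturer and text table from the concrete example.

```python
def satisfies_mvd(rows, z, w, y):
    """Check Z ->> W on an instance whose attributes are split into Z, W and Y."""
    def pick(row, attrs):
        return tuple(row[a] for a in attrs)

    present = {(pick(r, z), pick(r, w), pick(r, y)) for r in rows}
    for (tz, tw, _ty) in present:
        for (uz, _uw, uy) in present:
            # t and u agree on Z, so v = (Z, t.W, u.Y) must also be a row.
            if tz == uz and (tz, tw, uy) not in present:
                return False
    return True


cols = ["course_name", "lecturer", "text"]
data = [
    ("Databases", "Tim", "Ullman and Widom"),
    ("Databases", "Fatima", "Date"),
    ("Databases", "Tim", "Date"),
    ("Databases", "Fatima", "Ullman and Widom"),
]
rows = [dict(zip(cols, r)) for r in data]

print(satisfies_mvd(rows, ["course_name"], ["lecturer"], ["text"]))
print(satisfies_mvd(rows[:3], ["course_name"], ["lecturer"], ["text"]))
```

Dropping the last row breaks the dependency: Fatima would then be paired with only one of the two texts.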
A few observations
Note 1
Every functional dependency is a multivalued dependency,
$$(Z \to W) \implies (Z \twoheadrightarrow W).$$
To see this, just let $v = u$ in the above definition.
Note 2
Let $R(Z, W, Y)$ be a relational schema, then
$$(Z \twoheadrightarrow W) \iff (Z \twoheadrightarrow Y),$$
by symmetry of the definition.
MVDs and lossless-join decompositions
Fun Fun Fact
Let $R(Z, W, Y)$ be a relational schema. The decomposition $R_1(Z, W)$, $R_2(Z, Y)$ is a lossless-join decomposition of $R$ if and only if the MVD $Z \twoheadrightarrow W$ holds.
Proof of Fun Fun Fact
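Proof of $(Z \twoheadrightarrow W) \implies R = \pi_{Z,W}(R) \bowtie \pi_{Z,Y}(R)$
Suppose $Z \twoheadrightarrow W$.
We know (from proof of Heath's rule) that $R \subseteq \pi_{Z,W}(R) \bowtie \pi_{Z,Y}(R)$. So we only need to show $\pi_{Z,W}(R) \bowtie \pi_{Z,Y}(R) \subseteq R$.
Suppose $r \in \pi_{Z,W}(R) \bowtie \pi_{Z,Y}(R)$.
So there must be a $t \in R$ and $u \in R$ with
$$\{r\} = \pi_{Z,W}(\{t\}) \bowtie \pi_{Z,Y}(\{u\}).$$
In other words, there must be a $t \in R$ and $u \in R$ with $t.Z = u.Z$.
So the MVD tells us that then there must be some tuple $v \in R$ such that
1 $v$ agrees with both $t$ and $u$ on the attributes of $Z$,
2 $v$ agrees with $t$ on the attributes of $W$,
3 $v$ agrees with $u$ on the attributes of $Y$.
This $v$ must be the same as $r$, so $r \in R$.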
Proof of Fun Fun Fact (cont.)
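Proof of $R = \pi_{Z,W}(R) \bowtie \pi_{Z,Y}(R) \implies (Z \twoheadrightarrow W)$
Suppose $R = \pi_{Z,W}(R) \bowtie \pi_{Z,Y}(R)$.
Let $t$ and $u$ be any records in $R$ with $t.Z = u.Z$.
Let $v$ be defined by $\{v\} = \pi_{Z,W}(\{t\}) \bowtie \pi_{Z,Y}(\{u\})$ (and we know $v \in R$ by the assumption).
Note that by construction we have
1 $v.Z = t.Z = u.Z$,
2 $v.W = t.W$,
3 $v.Y = u.Y$.
Therefore, $Z \twoheadrightarrow W$ holds.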
Fourth Normal Form
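Trivial MVD
The MVD $Z \twoheadrightarrow W$ is trivial for relational schema $R(Z, W, Y)$ if
1 $Z \cap W \neq \{\}$, or
2 $Y = \{\}$.
4NF
A relational schema $R(Z, W, Y)$ is in 4NF if for every MVD $Z \twoheadrightarrow W$ either
  $Z \twoheadrightarrow W$ is a trivial MVD, or
  $Z$ is a superkey for $R$.
Note : $\text{4NF} \subset \text{BCNF} \subset \text{3NF} \subset \text{2NF}$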
Summary
We always want the lossless-join property. What are our options?
                           3NF    BCNF   4NF
Preserves FDs              Yes    Maybe  Maybe
Preserves MVDs             Maybe  Maybe  Maybe
Eliminates FD-redundancy   Maybe  Yes    Yes
Eliminates MVD-redundancy  No     No     Yes
Inclusions
Clearly $\text{BCNF} \subseteq \text{3NF} \subseteq \text{2NF}$. These are proper inclusions:
In 2NF, but not 3NF
$R(A, B, C)$, with $F = \{A \to B,\; B \to C\}$.
In 3NF, but not BCNF
$R(A, B, C)$, with $F = \{A, B \to C,\; C \to B\}$.
  This is in 3NF since $AB$ and $AC$ are keys, so there are no non-prime attributes
  But not in BCNF since $C$ is not a key and we have $C \to B$.
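Both examples can be verified by brute force over every non-trivial FD $X \to A$ in $F^+$, following the 2NF, 3NF and BCNF definitions literally. The Python below is a sketch for small schemas only (it enumerates all attribute subsets); FDs are assumed to be pairs of frozensets, and closure, candidate_keys and normal_forms are hypothetical helper names.

```python
from itertools import combinations


def closure(fds, attrs):
    """Attribute closure of attrs under fds, given as (lhs, rhs) pairs of frozensets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def subsets(attrs):
    """All non-empty subsets of attrs, smallest first."""
    for size in range(1, len(attrs) + 1):
        for combo in combinations(sorted(attrs), size):
            yield frozenset(combo)


def candidate_keys(attrs, fds):
    keys = []
    for x in subsets(attrs):
        if not any(k <= x for k in keys) and closure(fds, x) == attrs:
            keys.append(x)
    return keys


def normal_forms(attrs, fds):
    """Return (is_2nf, is_3nf, is_bcnf) for schema attrs with FDs fds."""
    attrs = frozenset(attrs)
    keys = candidate_keys(attrs, fds)
    prime = frozenset().union(*keys)
    is_2nf = is_3nf = is_bcnf = True
    for x in subsets(attrs):
        x_plus = closure(fds, x)
        if x_plus == attrs:
            continue                      # X is a superkey: allowed everywhere
        for a in x_plus - x:              # non-trivial X -> A in F+
            is_bcnf = False
            if a not in prime:
                is_3nf = False
                if any(x < k for k in keys):
                    is_2nf = False        # partial dependency on a key
    return is_2nf, is_3nf, is_bcnf


F1 = [(frozenset("A"), frozenset("B")), (frozenset("B"), frozenset("C"))]
F2 = [(frozenset("AB"), frozenset("C")), (frozenset("C"), frozenset("B"))]
print(normal_forms("ABC", F1))
print(normal_forms("ABC", F2))
```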
Schema refinement III and advanced design
Outline
General Decomposition Method (GDM)
The lossless-join condition is guaranteed by GDM
The GDM does not always preserve dependencies!
FDs vs ER models?
Weak entities
Using FDs and MVDs to refine ER models
Another look at ternary relationships
General Decomposition Method (GDM)
GDM
1 Understand your FDs $F$ (compute $F^+$),
2 find $R(X) = R(Z, W, Y)$ (sets $Z$, $W$ and $Y$ are disjoint) with FD $Z \to W \in F^+$ violating a condition of desired NF,
3 split $R$ into two tables $R_1(Z, W)$ and $R_2(Z, Y)$
4 wash, rinse, repeat
Reminder
For $Z \to W$, if we assume $Z \cap W = \{\}$, then the conditions are
1 $Z$ is a superkey for $R$ (2NF, 3NF, BCNF)
2 $W$ is a subset of some key (2NF, 3NF)
3 $Z$ is not a proper subset of any key (2NF)
The lossless-join condition is guaranteed by GDM
This method will produce a lossless-join decomposition because of (repeated applications of) Heath's Rule! That is, each time we replace an $S$ by $S_1$ and $S_2$, we will always be able to recover $S$ as $S_1 \bowtie S_2$.
Note that in GDM step 3, the FD $Z \to W$ may represent a key constraint for $R_1$.
But does the method always terminate? Please think about this ....
General Decomposition Method Revisited
GDM++
1 Understand your FDs and MVDs $F$ (compute $F^+$),
2 find $R(X) = R(Z, W, Y)$ (sets $Z$, $W$ and $Y$ are disjoint) with either FD $Z \to W \in F^+$ or MVD $Z \twoheadrightarrow W \in F^+$ violating a condition of desired NF,
3 split $R$ into two tables $R_1(Z, W)$ and $R_2(Z, Y)$
4 wash, rinse, repeat
Return to Example : Decompose to BCNF
$R(A, B, C, D)$
$F = \{A, B \to C,\; C \to D,\; D \to A\}$
Which FDs in $F^+$ violate BCNF?
$C \to A$
$C \to D$
$D \to A$
$A, C \to D$
$C, D \to A$
Return to Example : Decompose to BCNF
Decompose $R(A, B, C, D)$ to BCNF
Use $C \to D$ to obtain
  $R_1(C, D)$. This is in BCNF. Done.
  $R_2(A, B, C)$. This is not in BCNF. Why? $A, B$ and $B, C$ are the only keys, and $C \to A$ is a FD for $R_2$. So use $C \to A$ to obtain
    $R_{2.1}(A, C)$. This is in BCNF. Done.
    $R_{2.2}(B, C)$. This is in BCNF. Done.
Exercise : Try starting with any of the other BCNF violations and see where you end up.
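The decomposition loop can be sketched in a few lines. The Python below is one possible rendering of GDM for BCNF, not the only one: find_violation and decompose_bcnf are hypothetical helpers, FDs are assumed to be pairs of frozensets, and the violation picked is simply the first one found in alphabetical order with the largest possible right-hand side. It therefore starts from a different violation than the slide (one of the cases the exercise invites you to try) and ends in a different BCNF decomposition.

```python
from itertools import combinations


def closure(fds, attrs):
    """Attribute closure of attrs under fds, given as (lhs, rhs) pairs of frozensets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def find_violation(schema, fds):
    """Return (Z, W) with Z -> W non-trivial on schema and Z not a superkey, else None."""
    schema = frozenset(schema)
    for size in range(1, len(schema)):
        for combo in combinations(sorted(schema), size):
            z = frozenset(combo)
            # Restricting the closure to schema gives the FDs of F+ that live inside it.
            w = (closure(fds, z) & schema) - z
            if w and (z | w) != schema:
                return z, w
    return None


def decompose_bcnf(schema, fds):
    """Split schema repeatedly (GDM step 3) until no BCNF violation remains."""
    schema = frozenset(schema)
    violation = find_violation(schema, fds)
    if violation is None:
        return [schema]
    z, w = violation
    # R1(Z, W) and R2(Z, Y), where Y is everything outside Z and W.
    return decompose_bcnf(z | w, fds) + decompose_bcnf(schema - w, fds)


F = [(frozenset("AB"), frozenset("C")),
     (frozenset("C"), frozenset("D")),
     (frozenset("D"), frozenset("A"))]

result = decompose_bcnf("ABCD", F)
print([sorted(s) for s in result])
```

Each split produces two strictly smaller schemas, which is one way to see that this version of the method terminates.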
The GDM does not always preserve dependencies!
$R(A, B, C, D, E)$
$A, B \to C$
$D, E \to C$
$B \to D$
$\{A, B\}^+ = \{A, B, C, D\}$,
so $A, B \to C, D$,
and $\{A, B, E\}$ is a key.
$\{B, E\}^+ = \{B, C, D, E\}$,
so $B, E \to C, D$,
and $\{A, B, E\}$ is a key (again)
Let's try for a BCNF decomposition ...
Decomposition 1
Decompose $R(A, B, C, D, E)$ using $A, B \to C, D$ :
  $R_1(A, B, C, D)$. Decompose this using $B \to D$ :
    $R_{1.1}(B, D)$. Done.
    $R_{1.2}(A, B, C)$. Done.
  $R_2(A, B, E)$. Done.
But in this decomposition, how will we enforce this dependency?
$D, E \to C$
Decomposition 2
Decompose $R(A, B, C, D, E)$ using $B, E \to C, D$ :
  $R_3(B, C, D, E)$. Decompose this using $D, E \to C$ :
    $R_{3.1}(C, D, E)$. Done.
    $R_{3.2}(B, D, E)$. Decompose this using $B \to D$ :
      $R_{3.2.1}(B, D)$. Done.
      $R_{3.2.2}(B, E)$. Done.
  $R_4(A, B, E)$. Done.
But in this decomposition, how will we enforce this dependency?
$A, B \to C$
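Whether a dependency survives a decomposition can be tested without computing $F^+$. The sketch below uses a standard textbook test, which is not taken from these slides: grow the left-hand side using only closures restricted to one sub-schema at a time, and see whether the right-hand side is reached. preserved is a hypothetical helper name and FDs are assumed to be pairs of frozensets.

```python
def closure(fds, attrs):
    """Attribute closure of attrs under fds, given as (lhs, rhs) pairs of frozensets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def preserved(fd, schemas, fds):
    """True if lhs -> rhs follows from the FDs that are local to the given schemas."""
    lhs, rhs = fd
    reach = set(lhs)
    changed = True
    while changed:
        changed = False
        for schema in schemas:
            # Attributes derivable from reach using only FDs that fit inside this schema.
            gained = closure(fds, reach & schema) & schema
            if not gained <= reach:
                reach |= gained
                changed = True
    return rhs <= reach


AB_C = (frozenset("AB"), frozenset("C"))
DE_C = (frozenset("DE"), frozenset("C"))
B_D = (frozenset("B"), frozenset("D"))
F = [AB_C, DE_C, B_D]

decomposition_1 = [frozenset("BD"), frozenset("ABC"), frozenset("ABE")]
decomposition_2 = [frozenset("CDE"), frozenset("BD"), frozenset("BE"), frozenset("ABE")]

print(preserved(DE_C, decomposition_1, F), preserved(AB_C, decomposition_1, F))
print(preserved(DE_C, decomposition_2, F), preserved(AB_C, decomposition_2, F))
```

Each decomposition keeps one of the two dependencies on C and loses the other, matching the two questions asked on the slides.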
Summary
It is always possible to obtain BCNF that has the lossless-join property (using GDM)
  But the result may not preserve all dependencies.
It is always possible to obtain 3NF that preserves dependencies and has the lossless-join property.
  Using methods based on minimal covers (for example, see EN2000).
Recall : a small change of scope ...
... changed this entity
[ER diagram: entity Movie with attributes Title, Year, MovieID]
into two entities and a relationship :
[ER diagram: entity Movie (Title, MovieID), relationship Released, entity MovieRelease (Country, Date with components Year, Month, Day)]
But is there something odd about the MovieRelease entity?
MovieRelease represents a Weak entity set
[ER diagram: entity Movie (Title, MovieID), relationship Released, entity MovieRelease (Country, Date with components Year, Month, Day)]
Definition
Weak entity sets do not have a primary key.
The existence of a weak entity depends on an identifying entity set through an identifying relationship.
The primary key of the identifying entity together with the weak entity's discriminators (dashed underline in diagram) identify each weak entity element.
Can FDs help us think about implementation?
$R(I, T, D, C)$
$I \to T$
I = MovieID
T = Title
D = Date
C = Country
Turn the decomposition crank to obtain
$R_1(I, T)$   $R_2(I, D, C)$
$\pi_I(R_2) \subseteq \pi_I(R_1)$
Movie Ratings example
Scope = UK

Title                                        Year  Rating
Austin Powers: International Man of Mystery  1997  15
Austin Powers: The Spy Who Shagged Me        1999  12
Dude, Where's My Car?                        2000  15

Scope = Earth

Title                                        Year  Country   Rating
Austin Powers: International Man of Mystery  1997  UK        15
Austin Powers: International Man of Mystery  1997  Malaysia  18SX
Austin Powers: International Man of Mystery  1997  Portugal  M/12
Austin Powers: International Man of Mystery  1997  USA       PG-13
Austin Powers: The Spy Who Shagged Me        1999  UK        12
Austin Powers: The Spy Who Shagged Me        1999  Portugal  M/12
Austin Powers: The Spy Who Shagged Me        1999  USA       PG-13
Dude, Where's My Car?                        2000  UK        15
Dude, Where's My Car?                        2000  USA       PG-13
Dude, Where's My Car?                        2000  Malaysia  18PL
Oh, but the real world is such a bother!
from IMDb raw data file certificates.list
2 Fast 2 Furious (2003) Switzerland:14 (canton of Vaud)
2 Fast 2 Furious (2003) Switzerland:16 (canton of Zurich)
28 Days (2000) Canada:13+ (Quebec)
28 Days (2000) Canada:14 (Nova Scotia)
28 Days (2000) Canada:14A (Alberta)
28 Days (2000) Canada:AA (Ontario)
28 Days (2000) Canada:PA (Manitoba)
28 Days (2000) Canada:PG (British Columbia)
Ternary or multiple binary relationships?
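[ER diagrams: a ternary relationship R among entities T, S and U, compared with a new entity E linked to S, U and T by binary relationships R1, R2 and R3]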
Ternary or multiple binary relationships?
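[ER diagrams: a ternary relationship R among entities T, S and U, compared with two binary relationships, R2 between T and S and R1 between S and U]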
Look again at ER Demo Diagram [2]
How might this be refined using FDs or MVDs?
[ER diagram: Employee (Name, Number) with ISA subtypes Mechanic and Salesman; RepairJob (Number, Description, Cost, Parts, Work); Car (License, Model, Year, Manufacturer); Client (ID, Name, Phone, Address); relationships Does, Repairs, Buys (Price, Date, Value) and Sells (Date, Value, Commission), with Client in the roles buyer and seller]
[2] By Pável Calado, http://www.texample.net/tikz/examples/entity-relationship-diagram
Lecture 10 : On-line Analytical Processing (OLAP)
Outline
Limits of SQL aggregation
OLAP : Online Analytic Processing
Data cubes
Star schema
Limits of SQL aggregation
Flat tables are great for processing, but hard for people to read and understand.
Pivot tables and cross tabulations (spreadsheet terminology) are very useful for presenting data in ways that people can understand.
SQL does not handle pivot tables and cross tabulations well.
OLAP vs. OLTP
OLTP : Online Transaction Processing (traditional databases)
  Data is normalized for the sake of updates.
OLAP : Online Analytic Processing
  These are (almost) read-only databases.
  Data is de-normalized for the sake of queries!
  Multi-dimensional data cube emerging as common data model.
    This can be seen as a generalization of SQL's group by
OLAP Databases : Data Models and Design
The big question
Is the relational model and its associated query language (SQL) well suited for OLAP databases?
Aggregation (sums, averages, totals, ...) are very common in OLAP queries
  Problem : SQL aggregation quickly runs out of steam.
  Solution : Data Cube and associated operations (spreadsheets on steroids)
Relational design is obsessed with normalization
  Problem : Need to organize data well since all analysis queries cannot be anticipated in advance.
  Solution : Multi-dimensional fact tables, with hierarchy in dimensions, star-schema design.
A very influential paper [G+1997]
From aggregates to data cubes
The Data Cube
Data modeled as an n-dimensional (hyper-) cube
Each dimension is associated with a hierarchy
Each point records facts
Aggregation and cross-tabulation possible along all dimensions
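A very small cube can be computed by aggregating over every subset of the dimensions. The following Python is an illustrative sketch only: cube is a hypothetical helper, the marker value 'ALL' for a rolled-up dimension is a convention assumed here, and the data is the marks table from the earlier lectures with sid and course as the two dimensions and mark as the fact.

```python
from collections import defaultdict
from itertools import combinations


def cube(rows, dims, measure):
    """Sum measure for every subset of dims; rolled-up dimensions are marked 'ALL'."""
    result = defaultdict(int)
    for k in range(len(dims) + 1):
        for kept in combinations(dims, k):
            for row in rows:
                key = tuple(row[d] if d in kept else "ALL" for d in dims)
                result[key] += row[measure]
    return dict(result)


data = [
    ("ev77", "databases", 92), ("ev77", "spelling", 99),
    ("tgg22", "spelling", 3), ("tgg22", "databases", 100),
    ("fm21", "databases", 92), ("fm21", "spelling", 100),
    ("jj25", "databases", 88), ("jj25", "spelling", 92),
]
rows = [dict(zip(("sid", "course", "mark"), r)) for r in data]

marks_cube = cube(rows, ("sid", "course"), "mark")
print(marks_cube[("ALL", "ALL")], marks_cube[("ALL", "databases")])
```

With plain group by, each of these groupings would need its own query; the cube collects the grand total, the per-student totals, the per-course totals and the individual cells in one structure.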
Hierarchy for Location Dimension
Cube Operations
The Star Schema as a design tool
Lecture 11 : Case Study - Cancer registry for the NHS, Part II
The extension of ECRIC to cover all of England requires schema reconciliation, a problem that remains unresolved since it was first encountered in the 1980s. Jem Rashbass has a long track record in NHS IT, and is now CEO of ECRIC. Jem will explain what the NHS needs and why - some of the existing challenges and future opportunities. The session will close with an open forum in which the DBA of the now national level Cancer Registry DBMS will join Jem.
Lecture 12 : XML as a data exchange format
Outline
HTML vs. XML
Using XML to solve the data exchange problem
Domain-specific XML schema
Native XML databases
HTML vs XML
HTML
HTML = Content + (fixed) Schema + (fixed) presentation
Untangle these and generalize to
XML
XML = Content
XSL = defines presentations
DTD or XSchema = defines schema

HTML : Hypertext Markup Language
XML : eXtensible Markup Language
XSL : Extensible Stylesheet Language (similar to CSS)
CSS : Cascading Style Sheets
DTD : Document Type Definition
XML data is semi-structured Unicode text
Body of text, and possibly nested tags.
An XML schema defines
  tag names
  which associated values are optional or required
  types of associated values
  type of the associated body
What would Churchill say?
XML is the worst form of data representation, except for all those other forms that have been tried from time to time.
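To make "semi-structured" concrete, here is a small sketch using Python's standard xml.etree.ElementTree module. The document, its tag names and the sid attribute are invented for this example and follow no published schema; note that the second student has no college element, which a schema could either permit (optional) or forbid (required).

```python
import xml.etree.ElementTree as ET

doc = """
<students>
  <student sid="ev77"><name>Eva</name><college>k</college></student>
  <student sid="fm21"><name>Fatima</name></student>
</students>
"""

root = ET.fromstring(doc)
students = root.findall("student")

sids = [s.get("sid") for s in students]              # attribute values
names = [s.findtext("name") for s in students]       # text body of a nested tag
colleges = [s.findtext("college") for s in students] # None where the tag is missing

print(sids, names, colleges)
```

Unlike a relational row, a missing element is simply absent rather than NULL, and nothing stops another student element from nesting further tags.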
The data exchange problem
XML as a data exchange standard
Domain-specific schemas can become standards.
There are now thousands of domain-specific schemas
WML: Wireless markup language (WAP)
OFX: Open financial exchange
CML: Chemical markup language
AML: Astronomical markup language
MathML: Mathematics markup language
SMIL: Synchronized multimedia integration language
ThML: Theological markup language
.....
The public XML schema is in many ways dual to the many private SQL schemas involved in data exchange.
Two basic kinds of XML databases (hybrids possible)
XML-enabled databases          Native XML database
Relational (XML for exchange)  direct storage of XML data
Data-centric                   Document-centric
SQL                            XPath and XQuery
http://www.mysql.com/          http://basex.org
                               http://exist.sourceforge.net
The End
(http://xkcd.com/327)