+ All Categories
Home > Documents > Normal Forms and XML

Normal Forms and XML

Date post: 06-Jan-2016
Category:
Upload: chinara
View: 43 times
Download: 0 times
Share this document with a friend
Description:
Normal Forms and XML. Zachary G. Ives University of Pennsylvania CIS 550 – Database & Information Systems October 10, 2007. Some slide content courtesy of Susan Davidson & Raghu Ramakrishnan. Announcements. Homework 3 will be due Monday10/22 Midterm: Wednesday 10/24 - PowerPoint PPT Presentation
Popular Tags:
29
Normal Forms and XML Zachary G. Ives University of Pennsylvania CIS 550 – Database & Information Systems October 10, 2007 me slide content courtesy of Susan Davidson & Raghu Ramakrishnan
Transcript
Page 1: Normal Forms and XML

Normal Forms and XML

Zachary G. IvesUniversity of Pennsylvania

CIS 550 – Database & Information Systems

October 10, 2007

Some slide content courtesy of Susan Davidson & Raghu Ramakrishnan

Page 2: Normal Forms and XML

2

Announcements

Homework 3 will be due Monday10/22

Midterm: Wednesday 10/24

Please start forming into project groups of 4, with a list of names due 10/24

Page 3: Normal Forms and XML

3

Lossless Join Decomposition

R1, … Rk is a lossless join decomposition of R w.r.t. an FD set F if for every instance r of R that satisfies F,

R1(r) ⋈ ... ⋈ Rk(r) = r

Consider:

What if we decompose on (sid, name) and (serno, subj, cid, exp-grade)?

sid

name serno subj

cid exp-grade

1 Sam 570103

AI 570 B

23 Nitin 550103

DB 550 A

Page 4: Normal Forms and XML

4

Testing for Lossless Join

R1, R2 is a lossless join decomposition of R with respect to F iff at least one of the following dependencies is in F+

(R1 R2) R1 – R2

(R1 R2) R2 – R1

So for the FD set:sid nameserno cid, exp-gradecid subj

Is (sid, name) and (serno, subj, cid, exp-grade) a lossless decomposition?

Page 5: Normal Forms and XML

5

Dependency Preservation

Ensures we can “easily” check whether a FD X Y is violated during an update to a database:

The projection of an FD set F onto a set of attributes Z, FZ is

{X Y | X Y F +, X Y Z}i.e., it is those FDs local to Z’s attributes

A decomposition R1, …, Rk is dependency preserving if F + = (FR1 ... FRk)+

The decomposition hasn’t “lost” any essential FD’s, so we can check without doing a join

Page 6: Normal Forms and XML

6

Example of Lossless and Dependency-Preserving Decompositions

Given relation scheme R(name, street, city, st, zip, item, price)

And FD set name street, citystreet, city ststreet, city zipname, item price

Consider the decomposition R1(name, street, city, st, zip) and R2(name, item, price) Is it lossless? Is it dependency preserving?

What if we replaced the first FD by name, street city?

Page 7: Normal Forms and XML

7

Another Example

Given scheme: R(sid, fid, subj)and FD set: fid subj

sid, subj fidConsider the decomposition

R1(sid, fid) and R2(fid, subj)

Is it lossless? Is it dependency preserving?

Page 8: Normal Forms and XML

8

FD’s and Keys

Ideally, we want a design s.t. for each nontrivial dependency X Y, X is a superkey for some relation schema in R We just saw that this isn’t always possible

Hence we have two kinds of normal forms

Page 9: Normal Forms and XML

9

Two Important Normal Forms

Boyce-Codd Normal Form (BCNF). For every relation scheme R and for every X A that holds over R,

either A X (it is trivial) ,oror X is a superkey for R

Third Normal Form (3NF). For every relation scheme R and for every X A that holds over R,

either A X (it is trivial), or X is a superkey for R, or A is a member of some key for R

Page 10: Normal Forms and XML

10

Normal Forms Compared

BCNF is preferable, but sometimes in conflict with the goal of dependency preservation

It’s strictly stronger than 3NF

Let’s see algorithms to obtain: A BCNF lossless join decomposition

(nondeterministic) A 3NF lossless join, dependency preserving

decomposition

Page 11: Normal Forms and XML

11

BCNF Decomposition Algorithm(from Korth et al.; our book gives a recursive version)

result := {R}compute F+while there is a relation schema Ri in result that isn’t in BCNF{

let A B be a nontrivial FD on Ri

s.t. A Ri is not in F+ and A and B are disjoint

result:= (result – Ri) {(Ri - B), (A,B)}}

i.e., A doesn’t form a key

Page 12: Normal Forms and XML

12

An Example

Given the schema:Stuff(sid, name, serno, classroom, cid, fid, prof)

And FDs:sid name serno classroom, cid, fidfid prof

Find the Boyce-Codd Normal Form for this schema

What if instead: sid name classroom, cid serno fid prof serno cid

Page 13: Normal Forms and XML

13

3NF Decomposition AlgorithmLet F be a minimal coveri:=0for each FD A B in F { if none of the schemas Rj, 1 j i, contains AB { increment i Ri := (A, B) }}if no schema Rj, 1 j i contains a candidate key for R { increment i Ri := any candidate key for R}return (R1, …, Ri)

Build dep.-preservingdecomp.

Ensurelosslessdecomp.

Page 14: Normal Forms and XML

14

An Example

Given the schema:Stuff(sid, name, serno, classroom, cid, fid, prof)

And FDs:sid name serno classroom, cid, fidfid prof

Find the Third Normal Form for this schema

What if instead: sid name classroom, cid serno fid prof serno cid

Page 15: Normal Forms and XML

15

Summary of Normalization

We can always decompose into 3NF and get: Lossless join Dependency preservation

But with BCNF: We are only guaranteed lossless joins The algorithm is nondeterministic, so there is not

a unique decomposition for a given schema R

BCNF is stronger than 3NF: every BCNF schema is also in 3NF

Page 16: Normal Forms and XML

16

Normalization Is Good… Or Is It?

In some cases, we might not mind redundancy, if the data isn’t directly updated: Reports (people like to see breakdowns by

semester, department, course, etc.) Warehouses (archived copies of data for doing

complex analysis) Data sharing (sometimes we may export data

into object-oriented or hierarchical formats)

Page 17: Normal Forms and XML

17

XML: A Semi-Structured Data Model

Page 18: Normal Forms and XML

18

Why XML?

XML is the confluence of several factors: The Web needed a more declarative format for data Documents needed a mechanism for extended tags Database people needed a more flexible interchange

format “Lingua franca” of data It’s parsable even if we don’t know what it means!

Original expectation: The whole web would go to XML instead of HTML

Today’s reality: Not so… But XML is used all over “under the covers”

Page 19: Normal Forms and XML

19

Why DB People Like XML

Can get data from all sorts of sources Allows us to touch data we don’t own! This was actually a huge change in the DB community

Interesting relationships with DB techniques Useful to do relational-style operations Leverages ideas from object-oriented, semistructured

data

Blends schema and data into one format Unlike relational model, where we need schema first … But too little schema can be a drawback, too!

Page 20: Normal Forms and XML

20

XML Anatomy<?xml version="1.0" encoding="ISO-8859-1" ?> <dblp> <mastersthesis mdate="2002-01-03" key="ms/Brown92">  <author>Kurt P. Brown</author>   <title>PRPL: A Database Workload Specification Language</title>   <year>1992</year>   <school>Univ. of Wisconsin-Madison</school>   </mastersthesis> <article mdate="2002-01-03" key="tr/dec/SRC1997-018">  <editor>Paul R. McJones</editor>   <title>The 1995 SQL Reunion</title>   <journal>Digital System Research Center Report</journal>   <volume>SRC1997-018</volume>   <year>1997</year>   <ee>db/labs/dec/SRC1997-018.html</ee>   <ee>http://www.mcjones.org/System_R/SQL_Reunion_95/</ee>   </article>

Processing Instr.

Element

Attribute

Close-tag

Open-tag

Page 21: Normal Forms and XML

21

Well-Formed XML

A legal XML document – fully parsable by an XML parser All open-tags have matching close-tags (unlike

so many HTML documents!), or a special:<tag/> shortcut for empty tags (equivalent to

<tag></tag>

Attributes (which are unordered, in contrast to elements) only appear once in an element

There’s a single root element XML is case-sensitive

Page 22: Normal Forms and XML

22

XML as a Data Model

XML “information set” includes 7 types of nodes: Document (root) Element Attribute Processing instruction Text (content) Namespace Comment

XML data model includes this, plus typing info, plus order info and a few other things

Page 23: Normal Forms and XML

23

XML Data Model Visualized(and simplified!)

Root

?xml dblp

mastersthesis article

mdate key

author title year school editor title yearjournal volume eeee

mdatekey

2002…

ms/Brown92

Kurt P….

PRPL…

1992

Univ….

2002…

tr/dec/…

Paul R.

The…

Digital…

SRC…

1997

db/labs/dec

http://www.

attributeroot

p-i element

text

Page 24: Normal Forms and XML

24

What Does XML Do?

Serves as a document format (super-HTML) Allows custom tags (e.g., used by MS Word,

openoffice) Supplement it with stylesheets (XSL) to define

formatting

Data exchange format (must agree on terminology)

Marshalling and unmarshalling data in SOAP and Web Services

Page 25: Normal Forms and XML

25

XML as a Super-HTML(MS Word)

<h1 class="Section1"><a name="_top“ />CIS 550: Database and Information Systems</h1><h2 class="Section1">Fall 2004</h2><p class="MsoNormal">

<place>311 Towne</place>, Tuesday/Thursday<time Hour="13" Minute="30">1:30PM –

3:00PM</time></p>

Page 26: Normal Forms and XML

26

XML Easily Encodes Relations

sid

serno

exp-grade

1 570103

B

23 550103

A<student-course-grade><tuple><sid>1</sid><serno>570103</serno><exp-grade>B</exp-grade></tuple><tuple><sid>23</sid><serno>550103</serno><exp-grade>A</exp-grade></tuple>

</student-course-grade>

Student-course-grade

Page 27: Normal Forms and XML

27

But XML is More Flexible…“Non-First-Normal-Form” (NF2)

<parents> <parent name=“Jean” >

<son>John</son><daughter>Joan</daughter><daughter>Jill</daughter>

</parent> <parent name=“Feng”>

<daughter>Felicity</daughter> </parent>… Coincides with “semi-structured data”,

invented by DB people at Penn and Stanford

Page 28: Normal Forms and XML

28

XML and Code

Web Services (.NET, recent Java web service toolkits) are using XML to pass parameters and make function calls Why?

Easy to be forwards-compatible Easy to read over and validate (?) Generally firewall-compatible

Drawbacks? XML is a verbose and inefficient encoding!

XML is used to represent: SOAP: the “envelope” that data is marshalled into XML Schema: gives some typing info about structures being

passed WSDL: the IDL (interface def language) UDDI: provides an interface for querying about web services

Page 29: Normal Forms and XML

29

Integrating XML: What If We Have Multiple Sources with the Same Tags?

Namespaces allow us to specify a context for different tags

Two parts: Binding of namespace to URI Qualified names

<root xmlns=“http://www.first.com/aspace” xmlns:otherns=“…”>

<tag xmlns:myns=“http://www.fictitious.com/mypath”><thistag>is in the default namespace (aspace)</thistag><myns:thistag>is in myns</myns:thistag><otherns:thistag>is a different tag in otherns</otherns:thistag>

</tag></root>


Recommended