2/6/05 Salman Azhar: Database Systems
1
XML
Salman Azhar

Semi-structured Data
XML (Extensible Markup Language)
Well-formed and Valid XML
Document Type Definitions
IDs and IDREFs

These slides use some figures, definitions, and explanations from Elmasri-Navathe's Fundamentals of Database Systems and Garcia-Molina-Ullman-Widom's Database Systems

Framework
1. Information Integration: making databases from various places work as one.
2. Semi-structured Data: a new data model designed to cope with problems of information integration.
3. XML: a standard language for describing semi-structured data schemas and representing data.

1. Information Integration
Databases in an enterprise generally have:
  Several underlying database management systems
    Oracle, MS SQL Server, DB2, Informix, Sybase (SQL Server), MS Access, etc.
  Several underlying database schemas
    Information in an employee table can contain
      Employee Name, SSN, DOB, title, hrsPerWeek, modifiedTime, modifiedBy
      Employee Name, SSN, DOB, title, degree, createTime, createBy
      Employee Name, SSN, DOB, title, salary, modifiedTime, modifiedBy, createTime, createBy

2. Semi-structured Data
A new data model designed to cope with problems of information integration
  Accommodates different DBMSs
    Oracle, MS SQL Server, DB2, Informix, Sybase (SQL Server), MS Access, etc.
  Integrates different schemas
    Employee Name, SSN, DOB, title, hrsPerWeek, modifiedTime, modifiedBy
    Employee Name, SSN, DOB, title, degree, createTime, createBy
    Employee Name, SSN, DOB, title, salary, createTime, createBy, modifiedTime, modifiedBy
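
One way to picture how a semi-structured model integrates the three employee schemas is to let every record carry its own labels. The sketch below is only an illustration: the record values, the shortened label set (SSN and DOB are left out), and the helper names labels_in_use and lookup are all hypothetical.

```python
# Hypothetical employee records, one per source schema listed above.
# Values are made up for illustration only.
employees = [
    {"name": "Ann", "title": "Engineer", "hrsPerWeek": 40,
     "modifiedTime": "2005-02-01", "modifiedBy": "hr1"},
    {"name": "Bob", "title": "Analyst", "degree": "MS",
     "createTime": "2004-11-15", "createBy": "hr2"},
    {"name": "Cy", "title": "Manager", "salary": 90000,
     "createTime": "2003-06-30", "createBy": "hr1",
     "modifiedTime": "2005-01-10", "modifiedBy": "hr3"},
]


def labels_in_use(records):
    """Return every label that appears on at least one record."""
    return sorted({label for record in records for label in record})


def lookup(records, label):
    """Collect (name, value) pairs for the records that carry the label."""
    return [(r["name"], r[label]) for r in records if label in r]
```

No record is forced into a single shared schema: a query for salary simply skips the records that have no such label.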

3. XML
A standard language for describing semi-structured data schemas and representing data.

The Information-Integration Problem
Major bottleneck in enterprise application integration
For example:
  Hewlett Packard split into HP and Agilent
    Need to separate data into different destinations
  HP bought Compaq
    Need to integrate data from different sources

The Information-Integration Problem
Related data exists in many places and could, in principle, work together.
But different databases differ in:
1. Model
     relational, object-oriented?
2. Schema
     normalized/denormalized?
3. Terminology
     are consultants employees? Retirees? Subcontractors?
4. Conventions
     meters versus feet?

Example
Consider the merger of two stores in a mall
  there may be some overlap in the products sold
  but the databases are different

Example
Each company has a database
  One may use a relational DBMS; the other keeps the data in an MS-Word document
  One stores the phones of distributors; the other does not
  One distinguishes products by department; the other doesn't
  One counts inventory by number of items, the other by cases

Two Approaches to Integration
1. Warehousing
     Makes a copy of the data
     More developed of the two
2. Mediation
     Creates a view of the data
     Newer and less developed

Warehouse Diagram
[Figure: Source 1 and Source 2 each sit behind a Wrapper; the wrappers feed the Warehouse; the user query goes to the Warehouse, which returns the result.]

A Mediator
[Figure: the user query goes to the Mediator; the Mediator sends queries through a Wrapper to Source 1 and through a Wrapper to Source 2; results flow back through the wrappers to the Mediator, which returns the result to the user.]

Warehousing
Make copies of the data sources at a central site and transform them to a common schema
  Reconstruct data daily/weekly
  Do not try to keep it more up-to-date than that
Pro:
  very well-developed
  several commercial tools are available
Con:
  data can be old since updates are expensive
  24-hour availability threatened by large data updates

Mediation
Create a view of all sources, as if they were integrated
Answers a view query by translating it to the terminology of the sources and querying them
Pro:
  Current data
Con:
  Can be slow as it requires real-time merging of different data sources
  Lack of tools available
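
The contrast between the two approaches can be sketched in a few lines. This is a toy model under stated assumptions, not a real integration tool: the two sources, their field names, the case size of 24, and the names wrap_source1, wrap_source2, Warehouse, and mediate are all hypothetical. Each wrapper translates its source into one common schema (product, items).

```python
def wrap_source1():
    # Hypothetical source 1 already counts inventory in single items.
    return [{"product": "Pepsi", "items": 48}]


def wrap_source2():
    # Hypothetical source 2 counts inventory in cases (assumed 24 per case)
    # and uses a different field name, so the wrapper converts both.
    raw = [{"prod_name": "Sobe", "cases": 3}]
    return [{"product": r["prod_name"], "items": r["cases"] * 24} for r in raw]


WRAPPERS = [wrap_source1, wrap_source2]


class Warehouse:
    """Keeps a copy of the wrapped data; queries never touch the sources."""

    def __init__(self):
        self.copy = []

    def refresh(self):
        # The periodic (e.g. daily or weekly) rebuild of the central copy.
        self.copy = [record for wrap in WRAPPERS for record in wrap()]

    def query(self, predicate):
        return [record for record in self.copy if predicate(record)]


def mediate(predicate):
    """Answers a query by going to every source at query time."""
    return [record for wrap in WRAPPERS for record in wrap() if predicate(record)]
```

The warehouse answers from its copy, which is stale until refresh runs; mediate pays the cost of contacting every source on each query but always sees current data.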

Semi-structured: Motivation
Most effective approach to Information Integration:
  Semi-structured Data Model, or Semi-structured Objects

Semi-structured: Motivation
Main limitation of Object-Oriented Models:
  Object models are strongly typed
  Objects of a class have one structure only
Semi-structured approach solves this problem

Semi-structured Data
Purpose:
  Represent data from independent sources more flexibly than either relational or object-oriented models

Semi-structured Data
Each object has a class of its own, and its properties are defined by whatever labels are attached to that object
  Properties mean attributes, relationships, methods, etc.

Semi-structured Data
Think of objects, but with the type of each object being that object's own business, not that of its "class"
Labels indicate the meaning of substructures

Semi-structured Graphs
Easy to think of semi-structured data as graphs
  Nodes = objects
  Labels on arcs =
    attributes, leading to a leaf node
    relationships, leading to another node

Semi-structured Graphs
Atomic values at leaf nodes
  leaf nodes = nodes with no arcs out
Flexibility: no restriction on
  labels out of a node
  number of successors with a given label

Example: Data Graph
[Figure: data graph starting at a node labeled root. Arc labels shown: soda, soda, rest, manf, manf, sellsAt, name, name, name, addr, prize, year, award. Leaf values shown: Pepsi, Sobe, PepsiCo, KFC, Main St, BestSeller, 2003.]
Callouts:
  The soda object for Pepsi (arc in labeled soda; arc out labeled name to Pepsi)
  The restaurant object for KFC (arc in labeled rest; arc out labeled name to KFC)
  Notice a new kind of data.
  Root object represents the entire DB.
  Data graphs often look like trees, but are not.
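
A data graph of this kind can be held in memory as nodes with labeled outgoing arcs. The sketch below is an assumed reading of the figure, since the exact arcs cannot be recovered from the text: the Node class, the follow helper, and the particular edges (for example, which soda sellsAt KFC and which soda won the prize) are hypothetical.

```python
class Node:
    """An object in the data graph: a leaf holds a value, others hold arcs."""

    def __init__(self, value=None):
        self.value = value   # atomic value for a leaf, None otherwise
        self.arcs = {}       # label -> list of successor nodes

    def add(self, label, target):
        # No restriction on labels or on successors per label.
        self.arcs.setdefault(label, []).append(target)
        return target

    def is_leaf(self):
        return not self.arcs


def follow(node, *labels):
    """Return all nodes reached from node along the given path of labels."""
    frontier = [node]
    for label in labels:
        frontier = [nxt for cur in frontier for nxt in cur.arcs.get(label, [])]
    return frontier


# Assumed edges, loosely following the figure's labels.
root = Node()
pepsico = Node("PepsiCo")

pepsi = root.add("soda", Node())
pepsi.add("name", Node("Pepsi"))
pepsi.add("manf", pepsico)
prize = pepsi.add("prize", Node())
prize.add("year", Node("2003"))
prize.add("award", Node("BestSeller"))

sobe = root.add("soda", Node())
sobe.add("name", Node("Sobe"))
sobe.add("manf", pepsico)        # shared node: the graph is not a tree

kfc = root.add("rest", Node())
kfc.add("name", Node("KFC"))
kfc.add("addr", Node("Main St"))
pepsi.add("sellsAt", kfc)        # relationship arc to another object
```

Only one soda carries a prize arc here, and nothing in the structure objects to that; this is the flexibility the slides describe.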

Stage is Now Set for XML
A technology has application to different situations
  foundations remain the same
  applications change

Extensible Markup Language (XML)
XML uses tags for semantics (e.g., "this is an address")
HTML uses tags for formatting (e.g., "italic")
Key idea:
  create tag sets for a domain (e.g., genomics)
  translate all data into properly tagged XML docs

Well-Formed and Valid XML
Well-Formed XML
  allows you to invent your own tags
  similar to labels in a semi-structured data graph
Valid XML
  involves a DTD (Document Type Definition)
  DTD gives a grammar for the use of labels
    limits the set of labels out of a node
    limits the order and number of times a label occurs

Well-Formed XML
All XML documents have
  Header
  Body
Header
  defines version
  specifies that the document is in well-formed XML
Body can include
  root tag
  several properly matching tags

Well-Formed XML: Header
Start the document with a declaration surrounded by <? … ?>.
Normal declaration for well-formed XML is:
  <?xml version="1.0" standalone="yes"?>
version indicates the version number
standalone="yes" means no DTD
  no DTD means well-formed XML

Well-Formed XML: Body
Body of document is a root tag surrounding nested tags.
Body can include:
  several properly matching tags (as in HTML structure)
  a special tag called the root tag
    can have a special meaning, such as document type, or can be generic

Tags
Tags, as in HTML, are normally matched pairs, as <BLAH> … </BLAH>
  may be nested arbitrarily
  an element with no content may be written as a single empty-element tag, <BLAH/> (XML does not allow an unclosed tag such as <P> in HTML)
  however, we will not use these in examples

Example: Well-Formed XML
<?xml version="1.0" standalone="yes"?>
<RESTS>
  <REST>
    <NAME>Taco Bell</NAME>
    <SODA><NAME>Pepsi</NAME><PRICE>1.00</PRICE></SODA>
    <SODA><NAME>Sobe</NAME><PRICE>2.00</PRICE></SODA>
  </REST>
  <REST> … </REST>
  …
</RESTS>
Callouts:
  Root tag RESTS surrounds the entire document
  Each of the several nested REST tags represents information about a single REST
  <NAME> tag specifies the REST name
  <SODA> tags have the name and price of each soda, nested in <NAME> and <PRICE> tags
  Literal data items are contained at the atomic level
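
Well-formedness can be checked mechanically, because any XML parser rejects mismatched tags. The sketch below uses Python's standard xml.etree.ElementTree, which checks well-formedness only and never validates against a DTD. The helper names is_well_formed and menu are hypothetical, and the elided second REST of the slide is simply left out.

```python
import xml.etree.ElementTree as ET

DOC = """<?xml version="1.0" standalone="yes"?>
<RESTS>
  <REST>
    <NAME>Taco Bell</NAME>
    <SODA><NAME>Pepsi</NAME><PRICE>1.00</PRICE></SODA>
    <SODA><NAME>Sobe</NAME><PRICE>2.00</PRICE></SODA>
  </REST>
</RESTS>"""


def is_well_formed(text):
    """True if the parser accepts the text, i.e. all tags match and nest."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


def menu(text):
    """Map each REST name to a {soda name: price} dictionary."""
    root = ET.fromstring(text)
    return {
        rest.findtext("NAME"): {
            soda.findtext("NAME"): float(soda.findtext("PRICE"))
            for soda in rest.findall("SODA")
        }
        for rest in root.findall("REST")
    }
```

A fragment with crossed tags such as <A><B></A></B> fails the check, while the document above parses and yields one restaurant with two sodas.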

XML and Semi-structured Data
Consider this:
  Are well-formed XML documents with nested tags exactly the same idea as trees of semi-structured data?
    Tags are the labels on edges
    Nodes represent data between matching tags
    Parent-child relationship is immediate nesting in XML

XML and Semi-structured Data
Semi-structured approach allows for non-tree structures
We shall see that XML also enables non-tree structures
  in this it mimics the semi-structured data model

Group Exercise
Convert the following into a semi-structured representation:
<?xml version="1.0" standalone="yes"?>
<RESTS>
  <REST>
    <NAME>Taco Bell</NAME>
    <SODA><NAME>Pepsi</NAME><PRICE>1.00</PRICE></SODA>
    <SODA><NAME>Sobe</NAME><PRICE>2.00</PRICE></SODA>
  </REST>
  <REST> … </REST>
  …
</RESTS>
Note: Do not turn over to the next page before attempting this exercise yourself!

Solution: The semi-structured representation
[Figure: tree for the RESTS document of the exercise]
RESTS
  REST
    NAME: Taco Bell
    SODA
      NAME: Pepsi
      PRICE: 1.00
    SODA
      NAME: Sobe
      PRICE: 2.00
  REST . . .
  REST . . .
Note: Data is stored in leaf nodes and structure (tags) in internal nodes
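
The conversion from nested tags to a tree can also be done by a short recursive function. This is a minimal sketch using Python's standard xml.etree.ElementTree; the function name to_tree and the (tag, children-or-text) tuple shape are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET


def to_tree(elem):
    """Turn an element into (tag, text) for a leaf or (tag, [subtrees])."""
    children = list(elem)
    if not children:
        # Leaf: the data between the matching tags.
        return (elem.tag, (elem.text or "").strip())
    # Internal node: structure only, one subtree per immediately nested tag.
    return (elem.tag, [to_tree(child) for child in children])


soda = ET.fromstring("<SODA><NAME>Pepsi</NAME><PRICE>1.00</PRICE></SODA>")
soda_tree = to_tree(soda)
```

Here soda_tree is ("SODA", [("NAME", "Pepsi"), ("PRICE", "1.00")]): the data sits in the leaves and the tags label the internal structure.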

Valid XML
Switching gears: well-formed to valid XML
Valid XML is the most interesting use of XML
  Essentially a context-free grammar for describing XML tags and their nesting
  Specified by a DTD
Each domain of interest creates one DTD that describes all the documents this group will share
  For example, electronic components, the travel industry, etc., will have their own DTDs

DTD Structure
<!DOCTYPE <root tag> [
  <!ELEMENT <name> ( <components> )>
  <more elements>
]>
Note:
  !DOCTYPE is a keyword, with <root tag> being the name of the DOCTYPE
  Between [ … ] is a list of ELEMENT definitions
  Each !ELEMENT has a <name> with the allowed list of <components>, usually in the order listed

DTD Elements
Element definition consists of its name (tag) and a parenthesized description of any nested tags
  includes order of subtags and their multiplicity (0, 1, or many times)
Leaves (text elements) have #PCDATA in place of nested tags

Example: DTD
<!DOCTYPE RESTS [
  <!ELEMENT RESTS (REST*)>
  <!ELEMENT REST (NAME, SODA+)>
  <!ELEMENT NAME (#PCDATA)>
]>
Callouts:
  RESTS can have * (0 or more) REST
  REST has NAME and then + (1 or more) SODA. Order matters!
  SODA has NAME followed by PRICE
  NAME and PRICE are data (#PCDATA): no more tags, just text
GROUP EXERCISE: COMPLETE THE DTD
Note: Do not turn over to the next page before attempting this exercise yourself!

Example: DTD
<!DOCTYPE RESTS [
  <!ELEMENT RESTS (REST*)>
  <!ELEMENT REST (NAME, SODA+)>
  <!ELEMENT SODA (NAME, PRICE)>
  <!ELEMENT NAME (#PCDATA)>
  <!ELEMENT PRICE (#PCDATA)>
]>
Callouts:
  RESTS can have * (0 or more) REST
  REST has NAME and then + (1 or more) SODA. Order matters!
  SODA has NAME followed by PRICE
  NAME and PRICE are data (#PCDATA): no more tags, just text

Element Description Rules
Subtags must appear in the order shown
A tag may be followed by a symbol to indicate its multiplicity
  Identical to UNIX regular expressions
    * = zero or more
    + = one or more
    ? = zero or one
Alternative sequences of tags can be connected by the symbol |

Example: Element Description
A name is either an optional title (e.g., "Dr."), a first name, and a last name, in that order, or it is an IP address
<!ELEMENT NAME (
  (TITLE?, FIRST, LAST) | IPADDR
)>
Callout: | is the alternative symbol
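
Because element descriptions follow regular-expression notation, a description can be checked against a sequence of child tags with an ordinary regex engine. The sketch below is a simplification under stated assumptions: the helper names model_to_regex and matches are hypothetical, and only tag names, parentheses, the comma, and the symbols | ? * + are handled (not #PCDATA or mixed content).

```python
import re

# A tag name as used in the slides' DTDs (letters, digits, '_', '.', '-').
TAG_NAME = re.compile(r"[A-Za-z_][\w.-]*")


def model_to_regex(model):
    """Rewrite a DTD element description as a regex over 'TAG;' tokens."""
    # Wrap each tag name in a group that ends with ';' so that the
    # multiplicity symbols apply to whole names and names cannot run together.
    marked = TAG_NAME.sub(lambda m: "(?:" + m.group(0) + ";)", model)
    # A comma means "followed by", which is plain concatenation in a regex.
    return re.sub(r"[\s,]", "", marked)


def matches(model, child_tags):
    """True if the sequence of child tags is allowed by the description."""
    sequence = "".join(tag + ";" for tag in child_tags)
    return re.fullmatch("(?:" + model_to_regex(model) + ")", sequence) is not None


NAME_MODEL = "(TITLE?, FIRST, LAST) | IPADDR"
```

With NAME_MODEL, the sequences TITLE FIRST LAST, FIRST LAST, and IPADDR are accepted, while LAST FIRST is rejected because order matters.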

Use of DTDs
To specify that a document follows a particular DTD:
1. Set standalone="no", then either
   a) include the DTD as a preamble of the XML document, or
   b) follow DOCTYPE and the <root tag> by SYSTEM and a path to the file where the DTD is stored

Example (a)
DTD:
<?xml version="1.0" standalone="no"?>
<!DOCTYPE RESTS [
  <!ELEMENT RESTS (REST*)>
  <!ELEMENT REST (NAME, SODA+)>
  <!ELEMENT SODA (NAME, PRICE)>
  <!ELEMENT NAME (#PCDATA)>
  <!ELEMENT PRICE (#PCDATA)>
]>
Document:
<RESTS>
  <REST>
    <NAME>Taco Bell</NAME>
    <SODA><NAME>Pepsi</NAME><PRICE>1.00</PRICE></SODA>
    <SODA><NAME>Sobe</NAME><PRICE>2.00</PRICE></SODA>
  </REST>
  <REST> … </REST>
  …
</RESTS>
The document is the same as earlier, but this time it conforms to the above DTD

Example (b)
Assume the RESTS DTD is in file rest.dtd
<?xml version="1.0" standalone="no"?>
<!DOCTYPE RESTS SYSTEM "rest.dtd">
<RESTS>
  <REST>
    <NAME>Taco Bell</NAME>
    <SODA><NAME>Pepsi</NAME><PRICE>1.00</PRICE></SODA>
    <SODA><NAME>Sobe</NAME><PRICE>2.00</PRICE></SODA>
  </REST>
  <REST> … </REST>
  …
</RESTS>
Callouts:
  The DOCTYPE line gets the DTD from the file rest.dtd
  The document is the same as earlier, but this time it conforms to the DTD in rest.dtd

Attributes
Attributes are another important component of DTDs and XML docs
Opening tags in XML can have attributes
  like <A HREF = "…"> in HTML
In a DTD, <!ATTLIST <elementname> … > gives a list of attributes and their data types for this element

Example: Attributes
Rests can have an attribute kind, which is either qsr, family, or other.
The element definition is unchanged
However, we add an ATTLIST (#IMPLIED marks the attribute as optional):
<!ELEMENT REST (NAME, SODA*)>
<!ATTLIST REST kind (qsr | family | other) #IMPLIED>

Example: Attribute Use
In a document that allows REST tags, we might see:
<REST kind="qsr">
  <NAME>KFC</NAME>
  <SODA><NAME>Pepsi</NAME><PRICE>1.00</PRICE></SODA>
  ...
</REST>
Callout: New info: kind="qsr"
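
Reading an attribute back out of a parsed document is a single call. The sketch below uses Python's standard xml.etree.ElementTree, which does not validate against a DTD, so the check against the three allowed values is done by hand. The names ALLOWED_KINDS and rest_kind are hypothetical.

```python
import xml.etree.ElementTree as ET

# The three values the slides allow for the kind attribute.
ALLOWED_KINDS = {"qsr", "family", "other"}


def rest_kind(rest_xml):
    """Return the kind attribute of a REST element, or None if absent."""
    rest = ET.fromstring(rest_xml)
    kind = rest.get("kind")  # attributes live on the opening tag
    if kind is not None and kind not in ALLOWED_KINDS:
        # A validating parser would reject this using the ATTLIST.
        raise ValueError("kind must be one of " + ", ".join(sorted(ALLOWED_KINDS)))
    return kind
```

For <REST kind="qsr">…</REST> the function returns "qsr"; for a REST without the attribute it returns None.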

IDs and IDREFs
Introduce links from one object to another
Allow the structure of an XML document to be a general graph rather than just a tree
These are pointers from one object to another
  in analogy to HTML's NAME = "blah" and HREF = "#blah"

Creating IDs
We give an element Elephant an attribute Attention of type ID in the DTD
When using tag <Elephant> in an XML document, give its attribute Attention a unique value. For example,
  <Elephant Attention = "213">

Creating IDREFs
IDREFs are similar to IDs:
  To allow objects of type Fig to refer to another object with an ID attribute, give Fig an attribute of type IDREF (a single string of type ID)
  Or, let the attribute have type IDREFS, so the Fig object can refer to any number of other objects (any number of strings of type ID)

Example: IDs and IDREFs
Let us redesign our RESTS DTD to include both REST and SODA sub-elements
Both rests and sodas will have ID attributes called name
Rests have PRICE sub-objects, consisting of a number (the price of one soda) and an IDREF theSoda leading to that soda
Sodas have attribute soldBy, which is an IDREFS leading to all the rests that sell it

The DTD
<!DOCTYPE RESTS [
  <!ELEMENT RESTS (REST*, SODA*)>
  <!ELEMENT REST (PRICE+)>
  <!ATTLIST REST name ID #REQUIRED>
  <!ELEMENT PRICE (#PCDATA)>
  <!ATTLIST PRICE theSoda IDREF #REQUIRED>
  <!ELEMENT SODA EMPTY>
  <!ATTLIST SODA name ID #REQUIRED
                 soldBy IDREFS #IMPLIED>
]>
Callouts:
  RESTS has 0 or more REST and 0 or more SODA
  REST objects have name as an ID attribute and have one or more PRICE sub-objects
  PRICE objects have a number (the price) and one reference to a soda
  SODA objects have an ID attribute called name and a soldBy attribute that is a set of REST names
  Each attribute declaration ends with a default, here #REQUIRED or #IMPLIED

Example Document
<RESTS>
  <REST name="TacoBell">
    <PRICE theSoda="Pepsi">1.00</PRICE>
    <PRICE theSoda="Sobe">2.00</PRICE>
  </REST>
  …
  <SODA name="Pepsi" soldBy="KFC TacoBell …"></SODA>
  …
</RESTS>
Notes:
  This document follows the DTD of the previous slide
  ID values cannot contain spaces, so the rest is named TacoBell
  The names in an IDREFS value are separated by spaces
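
Following IDREF links by hand shows how they turn the document tree into a graph. The sketch below uses Python's standard xml.etree.ElementTree, which treats ID and IDREF values as ordinary attributes, so the links are resolved through a dictionary. The KFC entry, the Sobe SODA element, and the helper names index_by_id, sellers, and price_at are hypothetical additions that keep the sample self-contained.

```python
import xml.etree.ElementTree as ET

DOC = """<RESTS>
  <REST name="TacoBell">
    <PRICE theSoda="Pepsi">1.00</PRICE>
    <PRICE theSoda="Sobe">2.00</PRICE>
  </REST>
  <REST name="KFC">
    <PRICE theSoda="Pepsi">1.00</PRICE>
  </REST>
  <SODA name="Pepsi" soldBy="KFC TacoBell"/>
  <SODA name="Sobe" soldBy="TacoBell"/>
</RESTS>"""


def index_by_id(root):
    """Map each ID value (the name attribute here) to its element."""
    index = {}
    for elem in root.iter():
        key = elem.get("name")
        if key is None:
            continue
        if key in index:
            raise ValueError("duplicate ID: " + key)  # IDs must be unique
        index[key] = elem
    return index


def sellers(index, soda_id):
    """Follow the IDREFS attribute soldBy from a SODA to its REST elements."""
    refs = index[soda_id].get("soldBy", "").split()
    return [index[ref] for ref in refs]


def price_at(index, rest_id, soda_id):
    """Find the PRICE under a REST whose IDREF theSoda points at soda_id."""
    for price in index[rest_id].findall("PRICE"):
        if price.get("theSoda") == soda_id:
            return float(price.text)
    return None


root = ET.fromstring(DOC)
index = index_by_id(root)
```

Pepsi points at both rests through soldBy, and each rest points back at Pepsi through theSoda, so the structure is no longer a tree.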

Recap
Semi-structured Data
XML (Extensible Markup Language)
Well-formed and Valid XML
Document Type Definitions
IDs and IDREFs

Perspective
Here XML is used as an EDI medium
  EDI = electronic data interchange
There are many other uses for XML
  Each has its own way of utilizing it