Copyright © 2004 Pearson Addison-Wesley. All rights reserved. 27-2
Topics in this Chapter
• The Web and the Internet
• An Overview of XML
• XML Data Definition
• XML Data Manipulation
• XML and Databases
• SQL Facilities
The Web and the Internet
• Although often thought of as synonymous, the Web and the Internet are two different things
• The Web is a gigantic, amorphous database
• The Internet is a giant network
• URLs (Uniform Resource Locators/Identifiers) are used to locate resources on the network
• Markup languages are used to interact with the database
Hypertext
• Hypertext Markup Language (HTML) is a simple language for creating and displaying documents
• Hypertext Transfer Protocol (HTTP) is used to transfer these documents over the Internet
• At each server, data can be served from files in the file system or from databases
• The databases on web servers can be SQL databases
XML
• XML provides extensions that permit the markup language to interact with hypertext as well as many other languages, including SQL, and so is useful when implementing web databases
• An XML document normally begins with a header called the XML declaration, followed by an element consisting of a start tag, character data, and an end tag
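• Example:
    <?xml version="1.0"?>
    <greeting kind="succinct">Hello, World.</greeting>
• The first line is the XML declaration; the second is an XML element made up of a start tag, character data ("Hello, World."), and an end tag
• The element's tag name is greeting; kind="succinct" is an XML attribute whose name is "kind" and whose value is "succinct"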
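A minimal sketch, assuming Python's standard xml.etree.ElementTree module as the parser, of how the parts named above (tag name, attribute, character data) come back out of the example document. The function name describe is hypothetical:

```python
import xml.etree.ElementTree as ET

# The declaration and the element from the example, as one document string.
DOC = '<?xml version="1.0"?>\n<greeting kind="succinct">Hello, World.</greeting>'

def describe(doc: str) -> dict:
    """Split the single element of doc into its parts."""
    root = ET.fromstring(doc)  # the parser consumes the XML declaration itself
    return {
        "tag": root.tag,                   # name shared by start and end tag
        "attributes": dict(root.attrib),   # name/value pairs from the start tag
        "text": root.text,                 # character data between the tags
    }

if __name__ == "__main__":
    print(describe(DOC))
```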
XML History
• XML was created in 1996 to overcome limitations in SGML and HTML
• SGML is large and complicated
• HTML fails to separate structural, semantic, and formatting metadata, and is not always “well-formed”
• XML has not supplanted HTML in web browsers, but is used in other areas, especially data interchange
XML History
SGML is large and complicated. It allows users to define their own tags and give them meaning, for example:
    <Paragraph>
      <Sentence>
        <Subject>You</Subject> <verb>should specify</verb>
        <Object>the <adjective><emph1>first</emph1></adjective> parameter</Object>
      </Sentence>
      <Sentence>……..</Sentence>
    </Paragraph>
XML Applications
• Purchase orders, parts catalogues, and inventory records can be expressed in XML
• A database could consist of XML documents only, but it would NOT be relational
• XML can be used to represent relations, which could facilitate interchange between the Internet and relational databases
XML Applications
<?xml version="1.0"?>
<!-- This is an XML representation of the parts relation -->
<!-- This parts database has attributes P#, PNAME, COLOR, WEIGHT, CITY -->
<Partsrelation>
  <Partstuple>
    <PNUM>P1</PNUM>
    <PNAME>NUT</PNAME>
    <COLOR>RED</COLOR>
    <WEIGHT>12</WEIGHT>
    <CITY>LONDON</CITY>
  </Partstuple>
  <Partstuple>
    <PNUM>P2</PNUM>
    <PNAME>left-wing-part-10th-part-Bolt</PNAME>
    <COLOR>Green</COLOR>
    <WEIGHT>17</WEIGHT>
    <CITY>Paris</CITY>
  </Partstuple>
</Partsrelation>
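One way to see that this document represents a relation is to read it back into a set of tuples. The sketch below assumes Python's ElementTree and treats every value as a string, since the document itself carries no types; the names PARTS_XML, HEADING, and to_tuples are hypothetical:

```python
import xml.etree.ElementTree as ET

# Cleaned-up copy of the Partsrelation document.
PARTS_XML = """<?xml version="1.0"?>
<Partsrelation>
  <Partstuple>
    <PNUM>P1</PNUM><PNAME>NUT</PNAME><COLOR>RED</COLOR>
    <WEIGHT>12</WEIGHT><CITY>LONDON</CITY>
  </Partstuple>
  <Partstuple>
    <PNUM>P2</PNUM><PNAME>left-wing-part-10th-part-Bolt</PNAME><COLOR>Green</COLOR>
    <WEIGHT>17</WEIGHT><CITY>Paris</CITY>
  </Partstuple>
</Partsrelation>"""

# The relation heading, in the order the child elements appear.
HEADING = ("PNUM", "PNAME", "COLOR", "WEIGHT", "CITY")

def to_tuples(doc: str) -> set:
    """Read each Partstuple element back as a tuple of strings.

    A set is returned because the body of a relation is an unordered set of
    tuples, whereas the XML document keeps its elements in sequence."""
    root = ET.fromstring(doc)
    return {
        tuple(t.findtext(name).strip() for name in HEADING)
        for t in root.findall("Partstuple")
    }
```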
XML Applications
• An XML information set is a document hierarchy
XML Hierarchy
• The root node is at the top, and it has children
• Each child has exactly one parent
• Relations are structured; XML documents are said to be semi-structured, because their rules are looser
• An API to XML’s Document Object Model (DOM) supports retrieval, insertion, deletion, and update (p. 901)
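A small sketch of the four DOM operations using Python's xml.dom.minidom, a standard-library DOM implementation. The helper names part_element and find_part, and the trimmed-down Partstuple with only PNUM and WEIGHT children, are illustrative assumptions:

```python
from xml.dom import minidom

def part_element(doc, pnum, weight):
    """Build a <Partstuple> subtree with PNUM and WEIGHT children."""
    t = doc.createElement("Partstuple")
    for tag, value in (("PNUM", pnum), ("WEIGHT", weight)):
        child = doc.createElement(tag)
        child.appendChild(doc.createTextNode(value))
        t.appendChild(child)
    return t

def find_part(doc, pnum):
    """Retrieval: return the Partstuple whose PNUM text equals pnum, or None."""
    for t in doc.getElementsByTagName("Partstuple"):
        if t.getElementsByTagName("PNUM")[0].firstChild.data == pnum:
            return t
    return None

doc = minidom.parseString("<Partsrelation/>")
root = doc.documentElement  # root element; every other element has one parent

root.appendChild(part_element(doc, "P1", "12"))  # insertion
root.appendChild(part_element(doc, "P2", "17"))  # insertion

weight = find_part(doc, "P1").getElementsByTagName("WEIGHT")[0]
weight.firstChild.data = "14"                    # update of character data

root.removeChild(find_part(doc, "P2"))           # deletion of a whole subtree

print(root.toxml())
```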
DTDs
• Document Type Definitions (DTDs) can be constructed using the DTD definition language
• DTDs are part of the XML standard
• A DTD can mirror the structure of a relation and then be used to format the output from queries
• In turn, the XML document produced can be used to generate a relation at the other end
• Text objects must be well-formed and valid
XML Applications
<?xml version="1.0"?>
<!-- This is an XML representation of the parts relation -->
<!DOCTYPE . . . .>
<Partsrelation>
  <Note>Revised Version</Note>
  <Partstuple CITY="LONDON">
    <PNUM>P1</PNUM>
    <PNAME>NUT</PNAME>
    <WEIGHT>12</WEIGHT>
    <Note>Part COLOR is Red by Default</Note>
  </Partstuple>
  <Partstuple COLOR="GREEN" CITY="PARIS">
    <PNUM>P2</PNUM>
    <PNAME>left-wing-part-10th-part-Bolt</PNAME>
    <WEIGHT>17</WEIGHT>
  </Partstuple>
</Partsrelation>
XML Applications
<?xml version="1.0"?>
<!-- This is an XML representation of the parts relation -->
<!-- Marker meanings: ? zero or one, * zero or more, + one or more -->
<!DOCTYPE . . . .>
• DTD declarations:
  1. <!ELEMENT Partsrelation (Note, Partstuple*)>
  2. <!ELEMENT Note (#PCDATA)>
  3. <!ELEMENT Partstuple (PNUM, PNAME, WEIGHT, Note?)>
  4. <!ATTLIST Partstuple CITY (LONDON|OSLO|PARIS) #REQUIRED COLOR (RED|GREEN|BLUE) "RED">
  5. <!ELEMENT PNUM (#PCDATA)>
  6. <!ELEMENT PNAME (#PCDATA)>
  7. <!ELEMENT WEIGHT (#PCDATA)>
• Document:
<Partsrelation>
  <Note>Revised Version</Note>
  <Partstuple CITY="LONDON">
    <PNUM>P1</PNUM>
    <PNAME>NUT</PNAME>
    <WEIGHT>12</WEIGHT>
    <Note>Part COLOR is Red by Default</Note>
  </Partstuple>
  <Partstuple COLOR="GREEN" CITY="PARIS">
    <PNUM>P2</PNUM>
    <PNAME>left-wing-part-10th-part-Bolt</PNAME>
    <WEIGHT>17</WEIGHT>
  </Partstuple>
</Partsrelation>
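A validating parser checks each element against its declaration. Because a content model such as (PNUM, PNAME, WEIGHT, Note?) is a regular expression over child element names, the checks for declarations 3 and 4 can be sketched by hand. This assumes Python's ElementTree and re modules; check_partstuple and the sample elements are hypothetical, and this is not a general DTD validator:

```python
import re
import xml.etree.ElementTree as ET

# Declaration 3 as a regex over space-separated child tag names.
PARTSTUPLE_MODEL = re.compile(r"PNUM PNAME WEIGHT( Note)?")
# Enumerations from declaration 4 (hard-coded, not read from a DTD).
CITIES = {"LONDON", "OSLO", "PARIS"}
COLORS = {"RED", "GREEN", "BLUE"}

def check_partstuple(elem):
    """Return a list of problems found in one Partstuple element."""
    problems = []
    children = " ".join(child.tag for child in elem)
    if not PARTSTUPLE_MODEL.fullmatch(children):
        problems.append(f"content {children!r} does not match (PNUM, PNAME, WEIGHT, Note?)")
    if elem.get("CITY") not in CITIES:  # CITY is #REQUIRED and enumerated
        problems.append(f"CITY must be present and one of {sorted(CITIES)}")
    color = elem.get("COLOR", "RED")    # apply the declared default "RED"
    if color not in COLORS:
        problems.append(f"COLOR {color!r} not in {sorted(COLORS)}")
    return problems

GOOD = ET.fromstring('<Partstuple CITY="LONDON"><PNUM>P1</PNUM><PNAME>NUT</PNAME>'
                     '<WEIGHT>12</WEIGHT><Note>Part COLOR is Red by Default</Note></Partstuple>')
# Invalid on two counts: an undeclared COLOR child element, and no CITY attribute.
BAD = ET.fromstring('<Partstuple COLOR="GREEN"><PNUM>P2</PNUM><PNAME>Bolt</PNAME>'
                    '<COLOR>Green</COLOR><WEIGHT>17</WEIGHT></Partstuple>')
```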
Well-Formedness
• A textual object is well-formed if and only if:
  • It conforms to the grammar defined in the XML standard
  • Any textual object it references is well-formed
• Examples of fatal flaws:
  • Start and end tags don’t match, or are missing
  • More than one root element is included
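Python's ElementTree parser is non-validating: it rejects text that breaks the XML grammar but does not check a DTD, so it can serve as a rough well-formedness test. This sketch covers only the first condition above (it does not follow references to other textual objects), and is_well_formed is a hypothetical name:

```python
import xml.etree.ElementTree as ET

def is_well_formed(text: str) -> bool:
    """True if text parses as XML; only the grammar is checked, not a DTD."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True

# Each sample paired with the result the parser should give.
SAMPLES = {
    "<a><b>x</b></a>": True,    # tags nest and match
    "<a><b>x</c></a>": False,   # start and end tags don't match
    "<a><b>x</a>": False,       # end tag for <b> is missing
    "<a/><a/>": False,          # more than one root element
}

for text in SAMPLES:
    print(text, is_well_formed(text))
```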
Validity
• A textual object is valid if and only if it is well-formed and it conforms to a specified DTD
• DTDs can support uniqueness and referential constraints via ID and IDREF attribute types
• These constraints do not function as keys, but can be used to transmit information from one relvar to another
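The following sketch imitates what a validating parser does with ID and IDREF attributes. The document and the attribute names pid and part are hypothetical; a real parser would learn which attributes are IDs and IDREFs from the DTD. XML requires ID values to be unique across the whole document rather than per element type, which is one reason they do not behave like relational keys:

```python
import xml.etree.ElementTree as ET

# Hypothetical document: parts carry an ID-like attribute "pid";
# shipments refer to parts through an IDREF-like attribute "part".
DOC = """<db>
  <part pid="P1">NUT</part>
  <part pid="P2">BOLT</part>
  <shipment part="P1" qty="300"/>
  <shipment part="P7" qty="100"/>
</db>"""

def check_ids(root, id_attr="pid", idref_attr="part"):
    """Report duplicate ID values and IDREF values that match no ID."""
    ids, problems = set(), []
    for elem in root.iter():            # uniqueness: document-wide
        value = elem.get(id_attr)
        if value is not None:
            if value in ids:
                problems.append(f"duplicate ID {value}")
            ids.add(value)
    for elem in root.iter():            # referential check
        ref = elem.get(idref_attr)
        if ref is not None and ref not in ids:
            problems.append(f"dangling IDREF {ref}")
    return problems

print(check_ids(ET.fromstring(DOC)))
```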
Limitations of DTDs
• DTDs do not use XML syntax, so a DTD cannot itself be processed as an XML document
• Since everything in this arena is a character string, data type support is lacking
• They enforce an ordering of elements, which runs counter to the relational model (the attributes of a relation are unordered)
• They are still beneficial because they enforce a standard that is widely used
XML Schema
• XML Schema definitions are themselves written in XML, so they can be processed by XML parsers
• Schemas are written using a collection of names from a namespace (http://www.w3.org/2001/XMLSchema)
• The namespace specification: xmlns:xsd="http://www.w3.org/2001/XMLSchema"
• It is considerably more prolix than DTD syntax
• XML Schema can enforce primitive types and some derived types
• XML Schema types have essentially no operators, because the “types” are still character strings
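A sketch of both points, using Python's ElementTree: a schema is ordinary XML, so it parses like any other document; and a parser that is not schema-aware, such as ElementTree, hands back an instance value as a character string even when the schema declares xsd:decimal. The two-element schema and the names SCHEMA and declared_types are hypothetical:

```python
import xml.etree.ElementTree as ET

XSD_NS = "http://www.w3.org/2001/XMLSchema"

# A hypothetical two-element schema; note it is plain XML.
SCHEMA = f"""<xsd:schema xmlns:xsd="{XSD_NS}">
  <xsd:element name="WEIGHT" type="xsd:decimal"/>
  <xsd:element name="PNUM" type="xsd:string"/>
</xsd:schema>"""

def declared_types(schema_text):
    """Map each top-level element name to its declared type."""
    root = ET.fromstring(schema_text)
    # ElementTree expands the xsd: prefix on tags to "{namespace}localname";
    # attribute values such as "xsd:decimal" are left as plain strings.
    return {e.get("name"): e.get("type") for e in root.findall(f"{{{XSD_NS}}}element")}

# The instance value is a string; arithmetic needs an explicit conversion.
weight_text = ET.fromstring("<WEIGHT>12</WEIGHT>").text
heavier = float(weight_text) + 5
```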
XML Data Manipulation
• XQuery is based on XPath and, like XPath, is a read-only facility for traversing the hierarchical paths of XML documents
• Because XQuery can report horizontal and vertical subsets, and combine the results, it is said to support “select, project, and join”
• XUpdate is in the early planning stages, but presumably will support updates
• For now, updates are possible only through proprietary solutions
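XQuery itself is not in Python's standard library, so the sketch below substitutes the limited XPath subset supported by ElementTree's findall. It shows a horizontal subset (select), a vertical subset (project), and a join done by combining two path results in ordinary code. The data, including the Shipments document, and the name join_on_pnum are hypothetical:

```python
import xml.etree.ElementTree as ET

PARTS = ET.fromstring("""<Partsrelation>
  <Partstuple CITY="LONDON"><PNUM>P1</PNUM><PNAME>NUT</PNAME><WEIGHT>12</WEIGHT></Partstuple>
  <Partstuple CITY="PARIS"><PNUM>P2</PNUM><PNAME>BOLT</PNAME><WEIGHT>17</WEIGHT></Partstuple>
  <Partstuple CITY="LONDON"><PNUM>P4</PNUM><PNAME>SCREW</PNAME><WEIGHT>14</WEIGHT></Partstuple>
</Partsrelation>""")

# Second document, used only to illustrate a join on PNUM.
SHIPMENTS = ET.fromstring("""<Shipments>
  <Shipment><SNUM>S1</SNUM><PNUM>P1</PNUM></Shipment>
  <Shipment><SNUM>S2</SNUM><PNUM>P2</PNUM></Shipment>
</Shipments>""")

# "Select" (horizontal subset): a path with an attribute predicate.
london = PARTS.findall("Partstuple[@CITY='LONDON']")

# "Project" (vertical subset): keep only the PNAME child of each tuple.
london_names = [t.findtext("PNAME") for t in london]

def join_on_pnum(parts, shipments):
    """ElementTree's XPath subset cannot join two documents, so match
    shipments to parts on PNUM in ordinary code."""
    by_pnum = {t.findtext("PNUM"): t for t in parts.findall("Partstuple")}
    return [
        (s.findtext("SNUM"), by_pnum[s.findtext("PNUM")].findtext("PNAME"))
        for s in shipments.findall("Shipment")
        if s.findtext("PNUM") in by_pnum
    ]
```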
XML and Databases
• Three approaches:
  • Store XML documents as attributes
  • Shred documents into attributes
  • Store XML documents in “XML databases”
XML Documents as Attributes
• Define a new type, XMLDOC
• Like any type, XMLDOC should have operators defined for it: operators that retrieve data (as XQuery does), and operators that check for well-formedness and validity
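A minimal sketch of the idea, assuming Python's sqlite3 and ElementTree: a hypothetical XMLDoc class with a well-formedness operator and a retrieval operator, registered with sqlite3 so that a column declared as XMLDOC round-trips as that type. A validity operator is left out because the standard library has no DTD or schema validator; the orders table is also hypothetical:

```python
import sqlite3
import xml.etree.ElementTree as ET

class XMLDoc:
    """Hypothetical XMLDOC type: XML text plus operators defined on it."""

    def __init__(self, text: str):
        self.text = text

    def is_well_formed(self) -> bool:
        try:
            ET.fromstring(self.text)
            return True
        except ET.ParseError:
            return False

    def query(self, path: str) -> list:
        """Retrieval via ElementTree's XPath subset (a stand-in for XQuery)."""
        return [e.text for e in ET.fromstring(self.text).findall(path)]

# Store XMLDoc values as text; rebuild them when reading a column declared XMLDOC.
sqlite3.register_adapter(XMLDoc, lambda d: d.text)
sqlite3.register_converter("XMLDOC", lambda b: XMLDoc(b.decode("utf-8")))

conn = sqlite3.connect(":memory:", detect_types=sqlite3.PARSE_DECLTYPES)
conn.execute("CREATE TABLE orders (ono TEXT PRIMARY KEY, doc XMLDOC)")
conn.execute("INSERT INTO orders VALUES (?, ?)",
             ("O1", XMLDoc("<order><item>P1</item><item>P2</item></order>")))

(stored,) = conn.execute("SELECT doc FROM orders WHERE ono = ?", ("O1",)).fetchone()
print(stored.is_well_formed(), stored.query("item"))
```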
XML Documents: Shred and Publish
• An XML document may be shredded into its components, which are then stored as attributes
• Attributes can be recombined and published as XML Documents
• This is an effective way for SQL databases to interact with the web
• Relational databases do not store hierarchies, nor are they intrinsically ordered, so shred and publish may not be “nonloss”
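A sketch of shred and publish using Python's sqlite3 and ElementTree, with a hypothetical parts table and hypothetical function names shred and publish. It also shows why the round trip may not be nonloss: the Note element has no column to go into, and the original document order (P2 before P1) is replaced by whatever ORDER BY imposes:

```python
import sqlite3
import xml.etree.ElementTree as ET

SOURCE = """<Partsrelation>
  <Note>Revised Version</Note>
  <Partstuple><PNUM>P2</PNUM><PNAME>BOLT</PNAME><WEIGHT>17</WEIGHT></Partstuple>
  <Partstuple><PNUM>P1</PNUM><PNAME>NUT</PNAME><WEIGHT>12</WEIGHT></Partstuple>
</Partsrelation>"""

def shred(conn, text):
    """Shred: each Partstuple becomes a row; each child element an attribute value."""
    conn.execute("CREATE TABLE parts (pnum TEXT PRIMARY KEY, pname TEXT, weight INTEGER)")
    for t in ET.fromstring(text).findall("Partstuple"):
        conn.execute("INSERT INTO parts VALUES (?, ?, ?)",
                     (t.findtext("PNUM"), t.findtext("PNAME"), int(t.findtext("WEIGHT"))))

def publish(conn):
    """Publish: rebuild an XML document from the rows of the table."""
    root = ET.Element("Partsrelation")
    # A table has no intrinsic order, so one must be imposed explicitly.
    rows = conn.execute("SELECT pnum, pname, weight FROM parts ORDER BY pnum")
    for pnum, pname, weight in rows:
        t = ET.SubElement(root, "Partstuple")
        for tag, value in (("PNUM", pnum), ("PNAME", pname), ("WEIGHT", str(weight))):
            ET.SubElement(t, tag).text = value
    return ET.tostring(root, encoding="unicode")

conn = sqlite3.connect(":memory:")
shred(conn, SOURCE)
published = publish(conn)
print(published)
```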
SQL Facilities
• XML Collection will offer support for shred and publish, with the publish feature able to publish both the XML data and its schema
• XML Column will offer a new built-in type, XML, which will come with an XMLGEN operator for publishing XML documents
• Database vendors offer built-in functions that can read and write elements within XML attribute values, e.g., XMLFILETOCLOB