
Post on 29-Dec-2015


1

Chapter 10: XML

What is XML
Basic Components of XML
XPath
XQuery

2

What is XML?

Extensible Markup Language
Structured markup
Simplified SGML
Next-generation HTML
W3C (World Wide Web Consortium) Recommendation (spec)

3

Family Tree

GML (1969)
  SGML (1985)
    HTML (1993)
    XML (1998)

4

HTML Example

    <HTML>
    <HEAD><TITLE>HTML example</TITLE></HEAD>
    <BODY>
    <H1>HTML example</H1>
    <P>This is an example of HTML markup codes.</P>
    </BODY>
    </HTML>

5

HTML and XML

HTML:
  Content and presentation are mixed; structure is unclear
  Tags, e.g. <H1>, <li>, are fixed and specify presentation
XML:
  Content, presentation, and structure are separated
  Users can define new tags with meaningful annotation

6

Basic Syntax

Starts with the XML declaration
    <?xml version="1.0" standalone="yes"?>
Rest of the document is inside the "root element"
    <TEI.2>…</TEI.2>

    <state>
      <sname> Texas </sname>
      <scode> TX </scode>
    </state>

7

Two Kinds of XML

Standalone
    <?xml version="1.0" standalone="yes"?>
Using a Document Type Definition (DTD)
    <?xml version="1.0" standalone="no"?>
    <!DOCTYPE state SYSTEM "state.dtd">
  The DTD is the metadata that describes the available tags
    <!DOCTYPE state [
      <!ELEMENT state (sname, scode)>
      <!ELEMENT sname (#PCDATA)>
      <!ELEMENT scode (#PCDATA)>
    ]>

8

HTML is an application of SGML (XHTML is its XML reformulation)

Available tags, e.g. <P>, are used to describe presentation
Where is the DTD of HTML?

9

Well-formed vs. Valid

XML must be well-formed
  Correct syntax
  Tags match, tags nest, all characters legal
  Parser must reject the document if it is not well-formed
XML may be valid with respect to a DTD (Document Type Definition)
  Tags are used correctly
  Tags are all declared
  Attributes are declared
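The well-formedness rule can be tried out with a short Python sketch. It uses the standard-library xml.etree.ElementTree parser, which checks well-formedness only (it does not validate against a DTD). The helper name is_well_formed and the sample strings are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if the parser accepts `text`, False if it rejects it."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


good = "<state><sname>Texas</sname><scode>TX</scode></state>"
bad_nesting = "<state><sname>Texas</state></sname>"  # tags overlap instead of nesting
bad_char = "<state>a < b</state>"                    # raw '<' inside text

for sample in (good, bad_nesting, bad_char):
    print(is_well_formed(sample), sample)
```

Checking validity (tags and attributes declared and used correctly) needs a validating parser, which this sketch does not attempt.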

10

Validity Checking

Checks everything specified in a DTD
Can't check text (currency, spelling)
Checks against DTD: this is a valid memo, book, bibliography, ...

11

XML Syntax

The XML declaration
Elements
Entities
Text
Declarations and Notations
Processing Instructions
Comments

12

The XML Declaration

At the very beginning of the file
Officially optional, but always use it
Can declare version, encoding, standalone
  Must be in that order
  version is required when the declaration is present; encoding and standalone are optional
Encodings other than UTF-8 and UTF-16 must be declared
    <?xml version="1.0" encoding="Big5"?>
    <?xml version="1.0" encoding="ISO-8859-1"?>

13

Elements

Basic building block of XML
Start and end tag
    <person>Nico</person>
Attributes:
    <date format="iso8601"></date>
  An empty element may be abbreviated as:
    <date format="iso8601"/>
Elements can be arbitrarily nested to describe very rich information structures

14

Elements and Attributes

Attributes can parameterize an element
    <state region="Southern">
      <sname> Texas </sname>
      <scode> TX </scode>
    </state>
The same information can be represented by a sub-element
    <state>
      <region> Southern </region>
      <sname> Texas </sname>
      <scode> TX </scode>
    </state>

15

Attribute Syntax

Name may contain Unicode letters, digits, '.', '-', and '_' (it cannot start with a digit, '.', or '-')
Cannot repeat: the same attribute name cannot appear more than once in an element
Order doesn't matter
Values must be quoted (single or double)
Values may not contain "<"
Values may have defaults in the DTD

16

Attributes and Sub-elements

A matter of preference
Main differences:
  An attribute name cannot repeat in the same element
  A sub-element can repeat
  Attribute values are always string data
  Sub-elements can have further sub-elements

17

Special Attributes

id holds a unique identifier for the element
idref references an id

    <state id="texas">
      <sname> Texas </sname>
      <scode> TX </scode>
      <cityin idref="dallas"/>
    </state>

    <city id="dallas">
      <ccode> DAL </ccode>
      <cname> Dallas </cname>
      <stateof idref="texas"/>
    </city>
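How an application might follow idref links can be sketched in Python. The <geo> wrapper element and the deref helper are assumptions added for illustration. Without a DTD the parser does not know that id and idref have types ID and IDREF, so the sketch simply treats attributes with those names as identifiers.

```python
import xml.etree.ElementTree as ET

doc = """
<geo>
  <state id="texas">
    <sname>Texas</sname>
    <scode>TX</scode>
    <cityin idref="dallas"/>
  </state>
  <city id="dallas">
    <ccode>DAL</ccode>
    <cname>Dallas</cname>
    <stateof idref="texas"/>
  </city>
</geo>
"""

root = ET.fromstring(doc)

# Index every element that carries an id attribute.
by_id = {el.get("id"): el for el in root.iter() if el.get("id") is not None}


def deref(el):
    """Return the element that `el`'s idref attribute points to."""
    return by_id[el.get("idref")]


state = by_id["texas"]
city = deref(state.find("cityin"))   # follow state -> city
back = deref(city.find("stateof"))   # follow city -> state
print(city.findtext("cname"))
print(back.findtext("sname"))
```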

18

Entities

A unit of text
Five predefined entities
    &amp; (&)   &apos; (')   &lt; (<)   &gt; (>)   &quot; (")
Define your own in the DTD
    <!ENTITY euro "&#x20AC;">
Use numeric character references
    &#x20AC;   &#8364;
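The predefined entities and numeric character references can be exercised with the Python standard library. This is a small sketch: escape and unescape come from xml.sax.saxutils, and the sample string and the quote_entities mapping are assumptions for illustration.

```python
from xml.sax.saxutils import escape, unescape
import xml.etree.ElementTree as ET

raw = 'if a < b & c > "d" then it\'s fine'

# escape() handles &, < and > by default; the quotes are passed explicitly.
quote_entities = {'"': "&quot;", "'": "&apos;"}
escaped = escape(raw, quote_entities)
print(escaped)

# unescape() reverses the mapping.
restored = unescape(escaped, {"&quot;": '"', "&apos;": "'"})
print(restored == raw)

# Hexadecimal and decimal character references name the same character.
price = ET.fromstring("<price>&#x20AC; and &#8364;</price>")
print(price.text)
```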

19

Text

Character strings
Use predefined entities (&lt; &amp; …)
  Example: &gt; (>)  &amp; (&)  &lt; (<)
CDATA ("character data") section for raw text without using entities
    <![CDATA[ if a < b then print a is less than b ]]>

20

Declarations

Allow validity checking
Optional
May be internal (in document), external, or both
The DTD (Document Type Definition) is the set of all active declarations
Use existing DTDs when possible

21

External DTD

Most common
Use a DOCTYPE declaration before the root element
    <!DOCTYPE greeting SYSTEM "hello.dtd">
    <greeting>Hello, world!</greeting>

22

Internal (standalone) DTD

For custom documents
Also uses a DOCTYPE declaration
    <!DOCTYPE greeting [
      <!ELEMENT greeting (#PCDATA)>
    ]>
    <greeting>Hello, world!</greeting>
Specify in the XML declaration
    <?xml version="1.0" standalone="yes"?>

23

External plus Internal DTD

Usually to declare entities
Use a DOCTYPE declaration before the root element
    <!DOCTYPE greeting SYSTEM "hello.dtd" [
      <!ENTITY excl "&#x21;">
    ]>
    <greeting>Hello, world&excl;</greeting>

24

Element Type Declarations

Declare name
Declare allowed content
    <!ELEMENT a EMPTY>
    <!ELEMENT either (one | theother)>
    <!ELEMENT ordered (first, second)>
    <!ELEMENT list (item+)>
    <!ELEMENT dl ((dt?, dd?)*)>
    <!ELEMENT text (#PCDATA)>
    <!ELEMENT mixed (#PCDATA | b | i | em)*>

25

Attribute List Declarations

Declare attributes for an element
Declare value types
Declare defaults
    <!ATTLIST termdef
              id     ID     #REQUIRED
              name   CDATA  #IMPLIED>
    <!ATTLIST list
              type   (bullets|ordered|glossary)  "ordered">
    <!ATTLIST form
              method CDATA  #FIXED "POST">

26

Entity Declarations

    <!ENTITY copy "&#x00A9;">
    <!ENTITY copyright "&copy; Infoseek Corp. 1999, All rights reserved">

27

Processing Instructions

Instructions to applications
  fonts?
  security?
  correctness checks?
Linking to a style sheet
    <?xml-stylesheet href="mystyle.css" type="text/css"?>
Instructions to indexing robots
    <?robots index="no" follow="yes"?>

28

Comments

Like HTML and SGML
    <!-- a comment -->
Almost anything is OK inside a comment (only the string "--" is not allowed)
    <!-- <head> & <tail> are elements -->
    <!-- <?xml?> declaration goes here -->

29

What is a DTD?

"Document Type Definition"
Bunch of XML declarations
Usually external to the document
Designed for some purpose (use one that matches your needs)
Best left to experts

30

A Bug Report Document

    <?xml version="1.0"?>
    <bugreport>
      <product>xmltron</product>
      <version>1.1</version>
      <os>RTE</os>
      <osversion>4.0</osversion>
      <date scheme="ISO8601">1999-11-03</date>
      <report>
        <summary>doesn’t work</summary>
        <detail>at all</detail>
      </report>
      <solution>none yet</solution>
    </bugreport>

31

Make a Document Type

    <!DOCTYPE bugreport [
      <!-- declarations go here -->
    ]>
    <bugreport> ...

Doctype and root element must match

32

Declarations for Elements

    <!DOCTYPE bugreport [
      <!ELEMENT bugreport ...>   (content model: wait 'til next slide)
      <!ELEMENT product   (#PCDATA)>
      <!ELEMENT version   (#PCDATA)>
      <!ELEMENT os        (#PCDATA)>
      <!ELEMENT osversion (#PCDATA)>
      <!ELEMENT date      (#PCDATA)>
      <!ELEMENT report    (summary, detail)>
      <!ELEMENT summary   (#PCDATA)>
      <!ELEMENT detail    (#PCDATA)>
      <!ELEMENT solution  (#PCDATA)>
    ]>

33

Declaration for Root Element

    <!DOCTYPE bugreport [
      <!ELEMENT bugreport (product, version, os, osversion, date, report, solution?)>
      ...
    ]>

<solution> is optional; the others are required and must be in this order.
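A content model such as (product, version, os, osversion, date, report, solution?) behaves like a regular expression over the sequence of child element names. The Python sketch below makes that idea concrete. It is not a DTD validator: the regular expression is translated by hand, and the names ROOT_MODEL, children_match_model and make_report are assumptions for illustration.

```python
import re
import xml.etree.ElementTree as ET

# Hand translation of (product, version, os, osversion, date, report, solution?).
# Each child name is followed by a comma so that names cannot run together.
ROOT_MODEL = re.compile(r"product,version,os,osversion,date,report,(solution,)?")


def children_match_model(root: ET.Element) -> bool:
    """Check the order and presence of the root's direct children."""
    sequence = "".join(child.tag + "," for child in root)
    return ROOT_MODEL.fullmatch(sequence) is not None


def make_report(tags):
    """Build a bugreport document whose children are empty elements."""
    return "<bugreport>" + "".join(f"<{t}/>" for t in tags) + "</bugreport>"


required = ["product", "version", "os", "osversion", "date", "report"]
with_solution = ET.fromstring(make_report(required + ["solution"]))
without_solution = ET.fromstring(make_report(required))
wrong_order = ET.fromstring(make_report(["version", "product"] + required[2:]))

print(children_match_model(with_solution))
print(children_match_model(without_solution))
print(children_match_model(wrong_order))
```

A real validating parser also checks the content of each child and the declared attributes, which this sketch ignores.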


35

Declarations for Attributes

    <!ATTLIST date scheme CDATA #IMPLIED>

"CDATA" instead of "PCDATA" means the value is plain character data: it cannot contain markup (entity references in the value are still expanded)
#IMPLIED means optional (value implied by document)
Separate ATTLIST declarations for the same element are OK
Internal ATTLIST declarations override external

36

documents = contents + style

Extensible Stylesheet Language (XSL)
  Specifications still in draft
  But implementations keeping pace

37

    <?xml version="1.0"?>
    <?xml-stylesheet type="text/css" href="xmlpartstyle.css"?>
    <PARTS>
      <TITLE>Computer Parts</TITLE>
      <PART>
        <ITEM>Motherboard</ITEM>
        <MANUFACTURER>ASUS</MANUFACTURER>
        <MODEL>P3B-F</MODEL>
        <COST>123.00</COST>
      </PART>
      <PART>
        <ITEM>Video Card</ITEM>
        <MANUFACTURER>ATI</MANUFACTURER>
        <MODEL>All-in-Wonder Pro</MODEL>
        <COST>160.00</COST>
      </PART>
      <PART>
        <ITEM>Sound Card</ITEM>
        <MANUFACTURER>Creative Labs</MANUFACTURER>
        <MODEL>Sound Blaster Live</MODEL>
        <COST>80.00</COST>
      </PART>
      <PART>
        <ITEM> inch Monitor</ITEM>
        <MANUFACTURER>LG Electronics</MANUFACTURER>
        <MODEL>995E</MODEL>
        <COST>290.00</COST>
      </PART>
    </PARTS>

Using a cascading style sheet, we will see: [rendered output not preserved]

38

XPath

Used to access part of an XML document
Compact, non-XML syntax
Uses a pattern expression to identify nodes in an XML document
Has a library of standard functions
W3C Standard

39

XPath Example

Sample XML
The root element
    /STATES
The SCODE of all STATE elements of the STATES element
    /STATES/STATE/SCODE
All CAPITAL elements, under the STATE elements of the STATES element, that have a CNAME sub-element with value 'Atlanta'
    /STATES/STATE/CAPITAL[CNAME='Atlanta']
All CITIES elements in the XML document
    //CITIES
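These path expressions can be tried with Python's xml.etree.ElementTree, which supports a limited subset of XPath. The sample document below is an assumption (the original sample XML is not part of this transcript). ElementTree paths are relative to an element, so /STATES/... is written as ./... starting from the STATES root.

```python
import xml.etree.ElementTree as ET

# Assumed sample data, shaped to fit the paths used on the slide.
doc = """
<STATES>
  <STATE>
    <SNAME>Georgia</SNAME><SCODE>GA</SCODE>
    <CAPITAL><CNAME>Atlanta</CNAME></CAPITAL>
    <CITIES><CITY>Savannah</CITY></CITIES>
  </STATE>
  <STATE>
    <SNAME>Texas</SNAME><SCODE>TX</SCODE>
    <CAPITAL><CNAME>Austin</CNAME></CAPITAL>
    <CITIES><CITY>Dallas</CITY></CITIES>
  </STATE>
</STATES>
"""

root = ET.fromstring(doc)  # the STATES element, i.e. /STATES

# /STATES/STATE/SCODE
codes = [e.text for e in root.findall("./STATE/SCODE")]

# /STATES/STATE/CAPITAL[CNAME='Atlanta']
capitals = root.findall("./STATE/CAPITAL[CNAME='Atlanta']")

# //CITIES
cities = root.findall(".//CITIES")

print(codes)
print([c.findtext("CNAME") for c in capitals])
print(len(cities))
```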

40

More XPath Examples

Element AA with two ancestors
    /*/*/AA
First BB element of the AA element
    /AA/BB[1]
All the CC elements of the BB elements which have a sub-element A with value '3'
    /BB[A='3']/CC
Any AA elements, or CC elements of BB elements
    //AA | /BB/CC

41

Even More XPath Examples

Select all sub-elements of the AA elements of BB elements
    /BB/AA/*
  Useful when you do not know the sub-elements
  Different from /BB/AA
Select all attributes named 'aa'
    //@aa
Select all CITIES elements with an attribute named aa
    //CITIES[@aa]
Select all CITIES elements with an attribute named aa with value '123'
    //CITIES[@aa = '123']

42

Axis

Context node
  Evaluation of an XPath expression is from left to right
  The context node is the current node (set) being evaluated
Axis
  Specifies the relationship of the resulting nodes relative to the context node
  Example:
    /child::AA: AA children of the context node (here the root), abbreviated as /AA
    //AA/ancestor::BB: BB elements that are ancestors of any AA element
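The ancestor axis can be emulated in Python to see what it selects. ElementTree has no ancestor axis and elements do not store a parent pointer, so the sketch builds a child-to-parent map first. The sample document, the id attributes and the helper names are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

doc = '<root><BB id="b1"><CC><AA/></CC></BB><AA/><BB id="b2"/></root>'
root = ET.fromstring(doc)

# Child -> parent map for the whole tree.
parent_of = {child: parent for parent in root.iter() for child in parent}


def ancestors(node):
    """All ancestors of `node`, nearest first."""
    chain = []
    while node in parent_of:
        node = parent_of[node]
        chain.append(node)
    return chain


def ancestor_axis(root, start_tag, ancestor_tag):
    """Rough equivalent of //start_tag/ancestor::ancestor_tag, in document order."""
    wanted = set()
    for start in root.iter(start_tag):
        wanted.update(a for a in ancestors(start) if a.tag == ancestor_tag)
    return [el for el in root.iter() if el in wanted]


# //AA/ancestor::BB selects only the BB that contains an AA.
selected = ancestor_axis(root, "AA", "BB")
print([el.get("id") for el in selected])
```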

43

Axes

ancestor: //BBB/ancestor::*
    <AAA>
      <BBB/>
      <CCC/>
      <BBB/>
      <BBB/>
      <DDD>
        <BBB/>
      </DDD>
      <CCC/>
    </AAA>

44

Axes

ancestor: //BBB/ancestor::DDD
    <AAA>
      <BBB/>
      <CCC/>
      <BBB/>
      <BBB/>
      <DDD>
        <BBB/>
      </DDD>
      <CCC/>
    </AAA>

45

Axes

attribute: contains all attributes of the current node
    //BBB/attribute::*  (abbreviated as //BBB/@*)
    <AAA>
      <BBB aa='1'/>
      <CCC/>
      <BBB aa='2'/>
      <BBB aa='3'/>
      <DDD>
        <BBB bb='31'/>
      </DDD>
      <CCC/>
    </AAA>
    //BBB/attribute::bb

46

Axes

child
    /AAA/DDD/child::BBB  (child can be omitted for abbreviation)
    <AAA>
      <BBB/>
      <CCC/>
      <BBB/>
      <BBB/>
      <DDD>
        <BBB/>
      </DDD>
      <CCC/>
    </AAA>

47

Axes

descendant
    /AAA/descendant::*
    <AAA>
      <BBB/>
      <CCC/>
      <BBB/>
      <BBB/>
      <DDD>
        <BBB/>
      </DDD>
      <CCC/>
    </AAA>
    /AAA/descendant::CCC ?

48

Axes

parent
    //BBB/parent::*
    <AAA>
      <BBB/>
      <CCC/>
      <BBB/>
      <BBB/>
      <DDD>
        <BBB/>
      </DDD>
      <CCC/>
    </AAA>
    //BBB/parent::DDD ?

49

Axes

descendant-or-self
following
following-sibling
preceding
preceding-sibling
self

50

Predicates

Filters an element set
A predicate is placed inside square brackets ( [ ] )
Example: //BBB[position() mod 2 = 0]
    <AAA>
      <BBB/>
      <BBB/>
      <BBB/>
      <BBB/>
      <BBB/>
      <BBB/>
      <BBB/>
      <BBB/>
      <CCC/>
      <CCC/>
      <CCC/>
    </AAA>

51

Predicates

    //BBB[@aa='31']
    <AAA>
      <BBB aa='1'/>
      <CCC/>
      <BBB aa='2'/>
      <BBB aa='3'/>
      <DDD>
        <BBB bb='31'/>
      </DDD>
      <CCC/>
    </AAA>

Is it different from //BBB/attribute::bb?
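Both kinds of predicate can be explored in Python. ElementTree understands attribute predicates such as [@bb='31'] but has no position() function, so the position test is written by hand, counting positions among same-named siblings under each parent. The helper name at_even_positions is an assumption. The sketch also hints at the question above: a predicate returns elements, while the attribute axis returns the attributes themselves (shown here as plain values).

```python
import xml.etree.ElementTree as ET

doc = """
<AAA>
  <BBB aa="1"/>
  <CCC/>
  <BBB aa="2"/>
  <BBB aa="3"/>
  <DDD>
    <BBB bb="31"/>
  </DDD>
  <CCC/>
</AAA>
"""
root = ET.fromstring(doc)


def at_even_positions(root, tag):
    """Rough equivalent of //tag[position() mod 2 = 0]."""
    selected = []
    for parent in root.iter():
        for position, child in enumerate(parent.findall(tag), start=1):
            if position % 2 == 0:
                selected.append(child)
    return selected


even = at_even_positions(root, "BBB")
print([e.attrib for e in even])

# //BBB[@aa='31'] : no BBB carries aa='31' in this tree.
no_match = root.findall(".//BBB[@aa='31']")

# //BBB[@bb='31'] selects the BBB element itself ...
elements = root.findall(".//BBB[@bb='31']")

# ... while //BBB/attribute::bb selects the bb attributes.
values = [b.get("bb") for b in root.iter("BBB") if b.get("bb") is not None]

print(no_match, [e.tag for e in elements], values)
```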

52

XQuery

XQuery is a general-purpose query language for XML data
XQuery uses a for … let … where … result … syntax (the examples write result as return)
  for     <=> SQL from
  where   <=> SQL where
  result  <=> SQL select
  let allows temporary variables, and has no equivalent in SQL

53

FLWR Syntax in XQuery

Simple FLWR expression in XQuery: find all accounts with balance > 400, with each result enclosed in an <account-number> .. </account-number> tag
    for    $x in /bank-2/account
    let    $acctno := $x/@account-number
    where  $x/balance > 400
    return <account-number> $acctno </account-number>
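For readers without an XQuery engine at hand, the same for / let / where / return steps can be mirrored in Python. This is only an analogy, not XQuery itself. The bank-2 sample data (account numbers, branch names, balances) is an assumption for illustration.

```python
import xml.etree.ElementTree as ET

# Assumed sample data for the bank-2 document.
doc = """
<bank-2>
  <account account-number="A-401"><branch-name>Downtown</branch-name><balance>500</balance></account>
  <account account-number="A-402"><branch-name>Perryridge</branch-name><balance>300</balance></account>
  <account account-number="A-403"><branch-name>Brighton</branch-name><balance>900</balance></account>
</bank-2>
"""
root = ET.fromstring(doc)

results = []
for x in root.findall("account"):            # for   $x in /bank-2/account
    acctno = x.get("account-number")         # let   $acctno := $x/@account-number
    if float(x.findtext("balance")) > 400:   # where $x/balance > 400
        out = ET.Element("account-number")   # return <account-number> $acctno </account-number>
        out.text = acctno
        results.append(out)

serialized = [ET.tostring(r, encoding="unicode") for r in results]
print(serialized)
```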

54

Path Expressions and Functions

The function distinct( ) can be used to remove duplicates in path expression results
The function document(name) returns the root of the named document
  E.g. document("bank-2.xml")/bank-2/account
Aggregate functions such as sum( ) and count( ) can be applied to path expression results

55

Joins

Joins are specified in a manner very similar to SQL
    for $a in /bank/account,
        $c in /bank/customer,
        $d in /bank/depositor
    where $a/account-number = $d/account-number
      and $c/customer-name = $d/customer-name
    return <cust-acct> $c $a </cust-acct>
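The join above amounts to nested iteration with a filter. The Python sketch below spells that out over an assumed bank document (the customer names and account numbers are illustrative). It returns (customer-name, account-number) pairs rather than building <cust-acct> elements, to keep the example short.

```python
import xml.etree.ElementTree as ET

# Assumed sample data for the bank document.
doc = """
<bank>
  <account><account-number>A-101</account-number><balance>500</balance></account>
  <account><account-number>A-102</account-number><balance>400</balance></account>
  <customer><customer-name>Hayes</customer-name></customer>
  <customer><customer-name>Jones</customer-name></customer>
  <depositor><account-number>A-101</account-number><customer-name>Jones</customer-name></depositor>
  <depositor><account-number>A-102</account-number><customer-name>Hayes</customer-name></depositor>
</bank>
"""
root = ET.fromstring(doc)

pairs = []
for a in root.findall("account"):              # $a in /bank/account
    for c in root.findall("customer"):         # $c in /bank/customer
        for d in root.findall("depositor"):    # $d in /bank/depositor
            same_account = a.findtext("account-number") == d.findtext("account-number")
            same_customer = c.findtext("customer-name") == d.findtext("customer-name")
            if same_account and same_customer:  # where ... and ...
                pairs.append((c.findtext("customer-name"), a.findtext("account-number")))

print(pairs)
```

The triple loop is quadratic or worse in the input size; a query processor would normally pick a better join strategy, which is one reason to state the query declaratively.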

56

The same query can be expressed with the selections specified as XPath selections:
    for $a in /bank/account,
        $c in /bank/customer,
        $d in /bank/depositor[
               account-number = $a/account-number and
               customer-name = $c/customer-name]
    return <cust-acct> $c $a </cust-acct>

57

Changing Nesting Structure

    <bank-1>
      for $c in /bank/customer
      return
        <customer>
          $c/*
          for $d in /bank/depositor[customer-name = $c/customer-name],
              $a in /bank/account[account-number = $d/account-number]
          return $a
        </customer>
    </bank-1>

58

XQuery Path Expressions

$c/text() gives the text content of an element without any subelements/tags
XQuery path expressions support the "->" operator for dereferencing IDREFs
  Equivalent to the id( ) function of XPath, but simpler to use
  Can be applied to a set of IDREFs to get a set of results
  The June 2001 version of the standard has changed "->" to "=>"

59

Sorting in XQuery

A sortby clause can be used at the end of any expression. E.g. to return customers sorted by name:
    for $c in /bank/customer
    return <customer> $c/* </customer> sortby(name)

60

Can sort at multiple levels of nesting (sort by customer-name, and by account-number within each customer)

    <bank-1>
      for $c in /bank/customer
      return
        <customer>
          $c/*
          for $d in /bank/depositor[customer-name = $c/customer-name],
              $a in /bank/account[account-number = $d/account-number]
          return <account> $a/* </account> sortby(account-number)
        </customer> sortby(customer-name)
    </bank-1>

61

Application Program Interface

There are two standard application program interfaces to XML data:
SAX (Simple API for XML)
  Based on parser model; user provides event handlers for parsing events
    E.g. start of element, end of element
  Not suitable for database applications
DOM (Document Object Model)
  XML data is parsed into a tree representation
  Variety of functions provided for traversing the DOM tree
  E.g.: Java DOM API provides Node class with methods
    getParentNode( ), getFirstChild( ), getNextSibling( )
    getAttribute( ), getData( ) (for text node)
    getElementsByTagName( ), …
  Also provides functions for updating DOM tree
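The difference between the two interfaces shows up clearly in a small Python sketch, using the standard library's xml.sax and xml.dom.minidom as stand-ins for the Java APIs named above. The sample document and the TagCounter handler are assumptions for illustration.

```python
import xml.sax
from xml.dom import minidom

DOC = (b"<bank>"
       b"<account><account-number>A-101</account-number></account>"
       b"<account><account-number>A-102</account-number></account>"
       b"</bank>")


class TagCounter(xml.sax.ContentHandler):
    """SAX: the parser calls back for each event; no tree is built."""

    def __init__(self):
        super().__init__()
        self.starts = {}

    def startElement(self, name, attrs):
        self.starts[name] = self.starts.get(name, 0) + 1


handler = TagCounter()
xml.sax.parseString(DOC, handler)
print(handler.starts)

# DOM: the whole document becomes a tree that can be navigated freely.
dom = minidom.parseString(DOC)
first = dom.getElementsByTagName("account-number")[0]
first_text = first.firstChild.data               # text node under the element
parent_tag = first.parentNode.tagName            # counterpart of getParentNode()
next_tag = first.parentNode.nextSibling.tagName  # counterpart of getNextSibling()
print(first_text, parent_tag, next_tag)
```

SAX keeps memory use low because nothing is retained between events, while DOM holds the full tree in memory in exchange for random access and updates.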

62

Storage of XML Data

XML data can be stored in
Non-relational data stores
  Flat files
    Natural for storing XML
    But has all problems discussed in Chapter 1 (no concurrency, no recovery, …)
  XML database
    Database built specifically for storing XML data, supporting DOM model and declarative querying
    Currently no commercial-grade systems
Relational databases
  Data must be translated into relational form
  Advantage: mature database systems
  Disadvantages: overhead of translating data and queries

63

Storage of XML in Relational Databases

Alternatives:
  String Representation
  Tree Representation
  Map to relations

64

String Representation

Store each top-level element as a string field of a tuple in a relational database
  Use a single relation to store all elements, or
  Use a separate relation for each top-level element type
    E.g. account, customer, depositor relations
    Each with a string-valued attribute to store the element
Indexing:
  Store values of subelements/attributes to be indexed as extra fields of the relation, and build indices on these fields
    E.g. customer-name or account-number
  Oracle 9 supports function indices, which use the result of a function as the key value
    The function should return the value of the required subelement/attribute

65

String Representation (Cont.)

Benefits:
  Can store any XML data even without DTD
  As long as there are many top-level elements in a document, strings are small compared to the full document
    Allows fast access to individual elements
Drawback: need to parse strings to access values inside the elements
  Parsing is slow

66

Tree Representation

Tree representation: model XML data as a tree and store it using relations
    nodes(id, type, label, value)
    child(child-id, parent-id)
Each element/attribute is given a unique identifier
Type indicates element/attribute
Label specifies the tag name of the element/name of the attribute
Value is the text value of the element/attribute
The relation child notes the parent-child relationships in the tree
  Can add an extra attribute to child to record the ordering of children

Example tree (node ids as shown in the slide figure):
    bank (id: 1)
      customer (id: 2)
        customer-name (id: 3)
      account (id: 5)
        account-number (id: 7)
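The nodes/child scheme can be sketched with Python's built-in sqlite3 module. The column names use underscores (child_id, parent_id) because hyphens would need quoting in SQL. The position column, the store helper and the sample document are assumptions for illustration, and ids are assigned in document order, so they differ from the ids in the figure.

```python
import sqlite3
import xml.etree.ElementTree as ET
from itertools import count

# Assumed sample document.
doc = ("<bank>"
       "<customer><customer-name>Hayes</customer-name></customer>"
       "<account><account-number>A-102</account-number></account>"
       "</bank>")

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE nodes (id INTEGER PRIMARY KEY, type TEXT, label TEXT, value TEXT)")
conn.execute("CREATE TABLE child (child_id INTEGER, parent_id INTEGER, position INTEGER)")
ids = count(1)


def store(elem, parent_id=None, position=None):
    """Insert `elem`, its attributes and its subtree; return its node id."""
    node_id = next(ids)
    text = (elem.text or "").strip() or None
    conn.execute("INSERT INTO nodes VALUES (?, 'element', ?, ?)", (node_id, elem.tag, text))
    if parent_id is not None:
        conn.execute("INSERT INTO child VALUES (?, ?, ?)", (node_id, parent_id, position))
    for name, value in elem.attrib.items():
        attr_id = next(ids)
        conn.execute("INSERT INTO nodes VALUES (?, 'attribute', ?, ?)", (attr_id, name, value))
        conn.execute("INSERT INTO child VALUES (?, ?, NULL)", (attr_id, node_id))
    for pos, sub in enumerate(elem, start=1):
        store(sub, node_id, pos)
    return node_id


store(ET.fromstring(doc))

# Even a one-step path (customer/customer-name) already needs two joins.
rows = conn.execute("""
    SELECT n.value
    FROM nodes n
    JOIN child c ON c.child_id = n.id
    JOIN nodes p ON p.id = c.parent_id
    WHERE n.label = 'customer-name' AND p.label = 'customer'
""").fetchall()
node_count = conn.execute("SELECT COUNT(*) FROM nodes").fetchone()[0]
print(rows, node_count)
```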

67

Tree Representation (Cont.)

Benefit: can store any XML data, even without DTD
Drawbacks:
  Data is broken up into too many pieces, increasing space overheads
  Even simple queries require a large number of joins, which can be slow

68

Mapping XML Data to Relations

Map to relations
  If the DTD of the document is known, can map data to relations
  A relation is created for each element type
    Elements (of type #PCDATA) and attributes are mapped to attributes of relations
    More details on next slide …
Benefits:
  Efficient storage
  Can translate XML queries into SQL, execute efficiently, and then translate SQL results back to XML
Drawbacks: need to know DTD, translation overheads still present

69

Mapping XML Data to Relations (Cont.)

The relation created for each element type contains
  An id attribute to store a unique id for each element
  A relation attribute corresponding to each element attribute
  A parent-id attribute to keep track of the parent element
    As in the tree representation
    Position information (i-th child) can be stored too
All subelements that occur only once can become relation attributes
  For text-valued subelements, store the text as the attribute value
  For complex subelements, can store the id of the subelement
Subelements that can occur multiple times are represented in a separate table
  Similar to the handling of multivalued attributes when converting ER diagrams to tables

70

Mapping XML Data to Relations (Cont.)

E.g. for the bank-1 DTD with account elements nested within customer elements, create relations
  customer(id, parent-id, customer-name, customer-street, customer-city)
    parent-id can be dropped here since the parent is the sole root element
    All other attributes were subelements of type #PCDATA, and occur only once
  account(id, parent-id, account-number, branch-name, balance)
    parent-id keeps track of which customer an account occurs under
    The same account may be represented many times with different parents
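The customer and account relations can be populated from a bank-1 document with a short Python sketch using sqlite3. The sample data is an assumption, column names use underscores instead of hyphens, and parent-id is dropped from customer as noted above. Account A-102 appears under two customers to show the repeated-parent case.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Assumed sample data; A-102 is nested under both customers.
doc = """
<bank-1>
  <customer>
    <customer-name>Hayes</customer-name>
    <customer-street>Main</customer-street>
    <customer-city>Harrison</customer-city>
    <account><account-number>A-102</account-number><branch-name>Perryridge</branch-name><balance>400</balance></account>
  </customer>
  <customer>
    <customer-name>Jones</customer-name>
    <customer-street>Park</customer-street>
    <customer-city>Stamford</customer-city>
    <account><account-number>A-101</account-number><branch-name>Downtown</branch-name><balance>500</balance></account>
    <account><account-number>A-102</account-number><branch-name>Perryridge</branch-name><balance>400</balance></account>
  </customer>
</bank-1>
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, customer_name TEXT,"
             " customer_street TEXT, customer_city TEXT)")
conn.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, parent_id INTEGER,"
             " account_number TEXT, branch_name TEXT, balance REAL)")

next_id = 1
for cust in ET.fromstring(doc).findall("customer"):
    cust_id = next_id
    next_id += 1
    conn.execute("INSERT INTO customer VALUES (?, ?, ?, ?)",
                 (cust_id, cust.findtext("customer-name"),
                  cust.findtext("customer-street"), cust.findtext("customer-city")))
    for acct in cust.findall("account"):
        # parent_id records which customer this account element was nested under.
        conn.execute("INSERT INTO account VALUES (?, ?, ?, ?, ?)",
                     (next_id, cust_id, acct.findtext("account-number"),
                      acct.findtext("branch-name"), float(acct.findtext("balance"))))
        next_id += 1

rows = conn.execute("""
    SELECT c.customer_name, a.account_number
    FROM customer c JOIN account a ON a.parent_id = c.id
    ORDER BY c.customer_name, a.account_number
""").fetchall()
copies = conn.execute(
    "SELECT COUNT(*) FROM account WHERE account_number = 'A-102'").fetchone()[0]
print(rows, copies)
```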