
XML-to-Relational Schema Mapping Algorithm ODTDMap

Speaker: Artem Chebotko*

Email: [email protected]

Wayne State University

Joint work with Mustafa Atay, Shiyong Lu and Farshad Fotouhi

2

Introduction

• XML has emerged as the standard for representing and exchanging data on the World Wide Web.

• The growing number of XML documents creates a need to store and query them efficiently.

3

Current approaches to storing and querying XML documents

• Native XML repositories, e.g., Software AG’s Tamino, eXcelon’s XIS.

• XML-enabled commercial database systems such as SQL Server, Oracle, and DB2

• Using RDBMS/ODBMS to store and query XML documents.

4

Issues of the relational approach

• Schema Mapping

– The XML data model needs to be mapped into the relational model

• Data Mapping

– XML documents need to be shredded and composed into tuples to be inserted into the relational database

• Query Mapping

– XML queries need to be translated into SQL queries

• Reverse Data Mapping

– Query results need to be tagged to XML format.

5

Our contributions

• We propose a schema mapping algorithm, ODTDMap, which generates a relational schema from an XML DTD for storing and querying ordered XML documents.

• Improvements over the existing algorithms

– Losslessness

– Efficient support for XML queries

– Completeness (recursion, set-valued attributes, DTD operators)

6

Outline of the talk

• Introduction to XML DTDs

• Mapping DTDs to relational schemas

– Simplifying DTDs

– Creating and inlining DTD graphs

– Generating relational schemas

• An example

• Conclusions and future work

7

An overview of DTDs

A DTD example

<!DOCTYPE memo [
<!ELEMENT memo (to, from, date, subject?, body)>
<!ATTLIST memo security CDATA #IMPLIED>
<!ATTLIST memo lang CDATA #IMPLIED>
<!ELEMENT to (#PCDATA)>
<!ELEMENT from (#PCDATA)>
<!ELEMENT date (#PCDATA)>
<!ELEMENT subject (#PCDATA)>
<!ELEMENT body (para+)>
<!ELEMENT para (#PCDATA)>
]>

8

DTD: Document Type Definition

• <!DOCTYPE root-element [ doctype-declaration...

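• <!ELEMENT element-name content-model>, where the content model combines element names with the operators “|” (choice), “,” (sequence), “*” (zero or more), “+” (one or more), and “?” (optional)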

• <!ATTLIST element-name attr-name attr-type attr-default ...>

9

DTD: Document Type Definition (cont’d)

• <!ATTLIST element-name attr-name attr-type attr-default ...> declares which attributes are allowed or required in which elements

• attribute types:

– CDATA: any value is allowed (the default)

– (value|...): enumeration of allowed values

– ID, IDREF, IDREFS: ID attribute values must be unique (contain "element identity"), IDREF attribute values must match some ID (reference to an element)

– ENTITY, ENTITIES, NMTOKEN, NMTOKENS, NOTATION: just forget these... (consider them deprecated)

• attribute defaults:

– #REQUIRED: the attribute must be explicitly provided

– #IMPLIED: attribute is optional, no default provided

– "value": if not explicitly provided, this value is inserted by default

– #FIXED "value": as above, but only this value is allowed

10

Mapping DTDs to relational schemas

• Simplifying DTDs

• Creating and inlining DTD graphs

• Generating relational schemas

11

Simplifying DTDs

• A DTD might be very complex due to nesting, e.g.,

<!ELEMENT a ((b+, c*, d?)?, (e?, f, (g*, h?)+)?)>

• An XML query language is concerned with:

– The parent-child relationships between XML elements

– The relative order relationships between siblings (add an ordinal attribute to each relation)
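
To illustrate why an ordinal attribute is enough to recover sibling order, here is a minimal sketch using Python's built-in sqlite3 module. The table name, column names, and sample rows are hypothetical and are not taken from the talk; the point is only that tuples can be stored in any order and still be read back in document order.

```python
import sqlite3

# Hypothetical relation for <para> elements; the column names are assumptions.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE para ("
    "id INTEGER PRIMARY KEY, parent_id INTEGER, ordinal INTEGER, value TEXT)"
)

# Rows are inserted out of document order on purpose.
rows = [
    (1, 10, 2, "second paragraph"),
    (2, 10, 1, "first paragraph"),
    (3, 10, 3, "third paragraph"),
]
conn.executemany("INSERT INTO para VALUES (?, ?, ?, ?)", rows)

# Sorting the children of one parent by ordinal restores sibling order.
ordered = [
    value
    for (value,) in conn.execute(
        "SELECT value FROM para WHERE parent_id = ? ORDER BY ordinal", (10,)
    )
]
print(ordered)
```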

12

DTD simplification rules

1. e+ → e*

2. e? → e

3. (e1 | … | en) → (e1, … , en)

4. (a) (e1, … , en)* → (e1*, … , en*)
   (b) e** → e*

5. (a) …, e, …, e, … → …, e*, …, …
   (b) …, e, …, e*, … → …, e*, …, …
   (c) …, e*, …, e, … → …, e*, …, …
   (d) …, e*, …, e*, … → …, e*, …, …

13

Example of simplifying a DTD

<!ELEMENT a ((b+, c*, d?)?, (e?, f, (g*, h?)+)?)>

simplified to

<!ELEMENT a (b*, c*, d, e, f, g*, h*)>
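
The five simplification rules can be sketched as one recursive pass over a content model. The nested-tuple encoding and the function names below are assumptions made for this illustration, not the authors' implementation; the sketch only reproduces the effect of the rules on the example above.

```python
# Hypothetical encoding of a content model as nested tuples:
#   ("elem", name) | ("seq", [models]) | ("choice", [models])
#   ("star", model) | ("plus", model) | ("opt", model)


def simplify(model, starred=False):
    """Flatten a content model into ordered (element name, is_starred) pairs."""
    kind, arg = model
    if kind == "elem":
        return [(arg, starred)]
    if kind in ("star", "plus"):
        # Rules 1, 4(a), 4(b): e+ becomes e*, and * is pushed down to the leaves.
        return simplify(arg, True)
    if kind == "opt":
        # Rule 2: e? becomes e.
        return simplify(arg, starred)
    if kind in ("seq", "choice"):
        # Rule 3: a choice is treated like a sequence.
        merged = {}
        for child in arg:
            for name, star in simplify(child, starred):
                # Rule 5: a repeated name collapses into one starred occurrence,
                # kept at the position of its first occurrence.
                merged[name] = True if name in merged else star
        return list(merged.items())
    raise ValueError(f"unknown node kind: {kind}")


def render(pairs):
    """Print the simplified model in DTD-like notation."""
    return "(" + ", ".join(name + ("*" if star else "") for name, star in pairs) + ")"


def elem(name):
    return ("elem", name)


# ((b+, c*, d?)?, (e?, f, (g*, h?)+)?)
model = ("seq", [
    ("opt", ("seq", [("plus", elem("b")), ("star", elem("c")), ("opt", elem("d"))])),
    ("opt", ("seq", [
        ("opt", elem("e")),
        elem("f"),
        ("plus", ("seq", [("star", elem("g")), ("opt", elem("h"))])),
    ])),
])

print(render(simplify(model)))  # (b*, c*, d, e, f, g*, h*)
```

Carrying a single `starred` flag down the tree avoids rewriting the model repeatedly: once any enclosing operator is `*` or `+`, every element below it ends up starred.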

14

Creating and inlining DTD graphs

• We create a DTD graph based on the simplified DTD.

• Definition 3.2 (DTD graph) The structure of a DTD can be represented by a labeled graph, in which nodes represent elements and attributes, and edges represent their parent-child relationships. The edges are labeled by either `*' (star edge) or `,' (normal edge), where the label `,' is not shown for simplicity.

• Idea: inline a child c into its parent p if p can contain at most one occurrence of c.

• Rationale: a node and the children inlined into it together produce a single relation.

15

Inlinable node and subtree, shared node

• Definition 3.3 (Inlinable node) Given a DTD graph, a node is inlinable if and only if it has exactly one incoming edge and that edge is a normal edge.

• Definition 3.4 (Inlinable subtree) Given a DTD graph and a node e in the graph, e and all other inlinable nodes that are reachable from e by normal edges constitute a subtree. This subtree is called the inlinable subtree for the node e (it is rooted at e).

• Definition 3.5 (Shared node) Given a DTD graph, a node is called a shared node if it has more than one incoming edge.
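
Definitions 3.3 to 3.5 translate directly into small predicates over an edge list. The sketch below uses the simplified memo DTD from the earlier slide; the edge-triple encoding, the `memo.security` naming for attribute nodes, and the choice to grow the subtree only through inlinable nodes are assumptions of this illustration.

```python
# Edges are (parent, child, label) triples; "," is a normal edge, "*" a star edge.
# Graph of the simplified memo DTD (para+ has been simplified to para*).
EDGES = [
    ("memo", "memo.security", ","), ("memo", "memo.lang", ","),
    ("memo", "to", ","), ("memo", "from", ","), ("memo", "date", ","),
    ("memo", "subject", ","), ("memo", "body", ","),
    ("body", "para", "*"),
]


def incoming(edges, node):
    return [edge for edge in edges if edge[1] == node]


def is_inlinable(edges, node):
    """Definition 3.3: exactly one incoming edge, and it is a normal edge."""
    inc = incoming(edges, node)
    return len(inc) == 1 and inc[0][2] == ","


def is_shared(edges, node):
    """Definition 3.5: more than one incoming edge."""
    return len(incoming(edges, node)) > 1


def inlinable_subtree(edges, root):
    """Definition 3.4: root plus the inlinable nodes reachable by normal edges."""
    subtree, stack = {root}, [root]
    while stack:
        parent = stack.pop()
        for p, child, label in edges:
            if (p == parent and label == "," and child not in subtree
                    and is_inlinable(edges, child)):
                subtree.add(child)
                stack.append(child)
    return subtree


print(sorted(inlinable_subtree(EDGES, "memo")))
print(is_inlinable(EDGES, "para"))  # False: its only incoming edge is a star edge
```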

16

Inlining

• Case 1: Node a is connected to b by a normal edge and b has no other incoming edges: b is inlined into a.

• Case 2: Node a is connected to b by a normal edge but b has other incoming edges: b is a shared node and is not inlined.

• Case 3: Node a is connected to b by a star edge: b is not inlined.
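
The three cases can be checked per edge once the in-degree of every node is known. The following sketch is one possible reading, not the procedure from the talk; the book/article graph is a hypothetical example chosen so that all three cases occur.

```python
from collections import Counter


def classify_edges(edges):
    """Map each (parent, child) edge to the case number 1, 2 or 3."""
    in_degree = Counter(child for _, child, _ in edges)
    cases = {}
    for parent, child, label in edges:
        if label == "*":
            cases[(parent, child)] = 3  # star edge: no inlining
        elif in_degree[child] > 1:
            cases[(parent, child)] = 2  # normal edge into a shared node: no inlining
        else:
            cases[(parent, child)] = 1  # sole normal edge: inline child into parent
    return cases


# Hypothetical DTD graph: "author" is shared by "book" and "article".
edges = [
    ("book", "title", ","),
    ("book", "author", ","),
    ("article", "author", ","),
    ("book", "chapter", "*"),
]

for edge, case in classify_edges(edges).items():
    print(edge, "-> case", case)
```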

17

Inlining (cont’d)

18

Inlining DTD graphs

19

Complexity of inlining

• Theorem 3.7 (Time Complexity)

The time complexity of our inlining algorithm is O(n) where n is the number of elements in the input DTD.

20

The inlining procedure

21

The inlining procedure (cont’d): INCORRECT

22

The inlining procedure (cont’d): CORRECT

23

Generating relational schema

24

Generating schema mapping info.

• Definition 3.8 (σ-Mapping) σ is a mapping from X to R, where X is the set of XML element and attribute types in the input XML DTD, and R is the set of relations in the relational database. Given an XML element type e, σ(e) will return the corresponding relation that is used to store e. Similarly, given an XML attribute type a of element type e, σ(e.a) will return the corresponding relation that is used to store a of e.
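
As a rough illustration of such a mapping, the sketch below sends every node of the memo DTD graph to the nearest ancestor that is not inlinable and uses that ancestor's name as the relation name. Both the naming scheme and the "nearest non-inlinable ancestor" rule are assumptions of this sketch; the transcript does not show how ODTDMap actually names or groups relations.

```python
# Edges are (parent, child, label) triples; "," is a normal edge, "*" a star edge.
EDGES = [
    ("memo", "memo.security", ","), ("memo", "memo.lang", ","),
    ("memo", "to", ","), ("memo", "from", ","), ("memo", "date", ","),
    ("memo", "subject", ","), ("memo", "body", ","),
    ("body", "para", "*"),
]


def build_mapping(edges, root):
    """Return a dict from element/attribute type to a (hypothetical) relation name."""
    in_edges = {}
    for parent, child, label in edges:
        in_edges.setdefault(child, []).append((parent, label))

    def relation_of(node):
        inc = in_edges.get(node, [])
        # Climb while the node is inlinable; stop at the root to stay safe on cycles.
        while node != root and len(inc) == 1 and inc[0][1] == ",":
            node = inc[0][0]
            inc = in_edges.get(node, [])
        return node

    nodes = {root} | {n for parent, child, _ in edges for n in (parent, child)}
    return {node: relation_of(node) for node in nodes}


mapping = build_mapping(EDGES, "memo")
print(mapping["memo.security"])  # memo
print(mapping["body"])           # memo
print(mapping["para"])           # para
```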

25

A complete example

26

DTD graph

Inlined DTD graph

27

Generated relational schema

28

Conclusions

• We defined the schema mapping algorithm ODTDMap, which has several improvements over the existing ones.

• It is lossless in the sense that one can reconstruct the original XML document in the given document order from the data stored under the relational schema generated by ODTDMap.

• It has efficient support for recursive queries and schemas.

• It defines how to map set-valued XML attributes.

• Experimental results showed good performance and scalability of the algorithm.

29

Future work

• Extending our work to XML Schema to support data types other than strings.

• Maintaining the ID/IDREF/IDREFS semantics in terms of key and foreign key constraints.

