1 Introduction to XML Algebra Based on talk prepared for CS561 by Wan Liu and Bintou Kane.

Date post: 22-Dec-2015

Introduction to XML Algebra

Based on talk prepared for CS561 by Wan Liu and Bintou Kane

Data Model

A data model is the set of core data structures and data types supported by a DBMS.

A relational database uses a table (set-oriented) data model.

The XML format is a tree-structured hierarchical model.

Why XML Algebra?

It is common to translate a query language into an algebra.

First, the algebra is used to give a semantics for the query language.

Second, the algebra is used to support query optimization.

NIAGARA

Title: Following the Paths of XML Data: An Algebraic Framework for XML Query Evaluation

By: Leonidas Galanis, Efstratios Viglas, David J. DeWitt, Jeffrey F. Naughton, and David Maier

Univ. of Wisconsin

Outline

Concepts of Niagara Algebra

Operations

Optimization

Goals of Niagara Algebra

Be independent of schema information

Query on both structure and content

Generate simple, flexible, yet powerful algebraic expressions

Allow re-use of traditional optimization techniques

Example: XML Source Documents

Invoice.xml

<Invoice_Document>
  <invoice No="1">
    <account_number>2 </account_number>
    <carrier>AT&amp;T</carrier>
    <total>$0.25</total>
  </invoice>
  <invoice>
    <account_number>1 </account_number>
    <carrier>Sprint</carrier>
    <total>$1.20</total>
  </invoice>
  <invoice>
    <account_number>1 </account_number>
    <carrier>AT&amp;T</carrier>
    <total>$0.75</total>
  </invoice>
</Invoice_Document>

Customer.xml

<Customer_Document>
  <customer>
    <account>1 </account>
    <name>Tom </name>
  </customer>
  <customer>
    <account>2 </account>
    <name>George </name>
  </customer>
</Customer_Document>

XML Data Model and Tree Graph

Example: Invoice_Document

[Figure: ordered tree rooted at Invoice_Document. Each invoice child has number, carrier, and total children; the leaf values are (2, AT&T, $0.25) and (1, Sprint, $1.20).]

<Invoice_Document>
  <invoice>
    <number>2</number>
    <carrier>AT&amp;T</carrier>
    <total>$0.25</total>
  </invoice>
  <invoice>
    <number>1</number>
    <carrier>Sprint</carrier>
    <total>$1.20</total>
  </invoice>
</Invoice_Document>

Ordered tree graph, semi-structured data

XML Data Model [GVDNM01]

Collection of bags of vertices.

Vertices in a bag have no order.

Example: [Root "invoice.xml", invoice, invoice.account_number]

[Figure: the bag holds three vertices: the root of invoice.xml, an <invoice> element with its content, and the <account_number> element with its content.]

Data Model

Bag elements are reachable by path expressions.

A path expression consists of two parts: an entry point and a relative forward part.

Example: account_number:invoice
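
As a minimal sketch of the two-part path expression, the snippet below splits the textual form into its parts. It assumes, based on the examples in these slides, that the notation is `forward:entry_point` and that the reached vertex is named `entry_point.forward` in the bag; `PathExpr`, `parse_path`, and `result_name` are hypothetical helper names, not part of Niagara.

```python
from typing import NamedTuple


class PathExpr(NamedTuple):
    forward: str      # relative forward part, e.g. "account_number"
    entry_point: str  # vertex already present in the bag, e.g. "invoice"


def parse_path(text: str) -> PathExpr:
    """Split 'forward:entry_point' into its two parts (assumed notation)."""
    forward, sep, entry = text.partition(":")
    if not sep or not forward.strip() or not entry.strip():
        raise ValueError(f"expected 'forward:entry_point', got {text!r}")
    return PathExpr(forward.strip(), entry.strip())


def result_name(path: PathExpr) -> str:
    """Name under which the reached vertex would be stored in the bag."""
    return f"{path.entry_point}.{path.forward}"


p = parse_path("account_number:invoice")
print(p.entry_point, p.forward, result_name(p))
```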

Operators

Source (S), Follow, Select, Join, Rename, Expose, Vertex, Group, Union, Intersection, Difference (-), Cartesian Product.

Source Operator S

Input: a list of documents

Output: a collection of singleton bags

Examples:

S (*): all known XML documents

S (invoice*.xml): all XML documents whose filename matches "invoice*.xml"

S (*, schema.dtd): all known XML documents that conform to schema.dtd
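
A rough sketch of the Source operator follows, assuming a bag is a Python dict and the set of known documents is an in-memory mapping from filename to XML text. The document names in `KNOWN` are made up for illustration, and the DTD-conformance variant is left out.

```python
import xml.etree.ElementTree as ET
from fnmatch import fnmatchcase

# Hypothetical "known documents": filename -> XML text.
KNOWN = {
    "invoice1.xml": "<Invoice_Document><invoice/></Invoice_Document>",
    "invoice2.xml": "<Invoice_Document/>",
    "customer.xml": "<Customer_Document/>",
}


def source(pattern, documents=KNOWN):
    """Return one singleton bag per document whose filename matches pattern."""
    bags = []
    for name in sorted(documents):
        if fnmatchcase(name, pattern):
            # Each bag holds exactly one vertex: the root of the document.
            bags.append({f"Root {name}": ET.fromstring(documents[name])})
    return bags


print([list(bag) for bag in source("invoice*.xml")])
```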

Follow operator

Input: a path expression in entry point notation

Functionality: extracts the vertices reachable by the path expression

Output: a new bag that consists of the extracted vertex plus all contents of the original bag (in the case of unnesting follow)

Follow operator (Example*)

Input: {[Root invoice.xml, invoice]}

Operation: Φμ(carrier:invoice)

Output: {[Root invoice.xml, invoice, invoice.carrier]}

[Figure: the input bag holds the root of invoice.xml and an <invoice> element; the output bag additionally holds the <carrier> element reached from the invoice.]

* Unnesting follow
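
The unnesting follow can be sketched as below, assuming bags are dicts from vertex names to `xml.etree.ElementTree` elements and that the forward part is a single child step. The function name `follow`, its `name` parameter, and the trimmed sample document are illustrative assumptions rather than Niagara's implementation.

```python
import xml.etree.ElementTree as ET

DOC = """<Invoice_Document>
  <invoice><account_number>2</account_number><carrier>AT&amp;T</carrier></invoice>
  <invoice><account_number>1</account_number><carrier>Sprint</carrier></invoice>
  <invoice><account_number>1</account_number></invoice>
</Invoice_Document>"""


def follow(collection, forward, entry, name=None):
    """Unnesting follow: emit one bag per vertex reached from bag[entry]."""
    name = name or f"{entry}.{forward}"
    out = []
    for bag in collection:
        for vertex in bag[entry].findall(forward):  # direct children only
            new_bag = dict(bag)      # keep all contents of the original bag
            new_bag[name] = vertex   # add the extracted vertex
            out.append(new_bag)
    return out


root_bags = [{"Root invoice.xml": ET.fromstring(DOC)}]
invoices = follow(root_bags, "invoice", "Root invoice.xml", name="invoice")
carriers = follow(invoices, "carrier", "invoice")
print(len(invoices), len(carriers))
```

In this sketch a bag whose entry point has no matching child produces no output bag, so the invoice without a carrier disappears after the second follow.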

Select operator

Input: a set of bags

Functionality: filters the bags of a collection using a predicate

Output: a set of bags that conform to the predicate

Predicate: logical operators (∧, ∨, ¬) or simple qualifications (<, ≤, =, ≠, ≥, >)

Select operator (Example)

Predicate: invoice.carrier = Sprint

Input: {[Root invoice.xml, invoice], [Root invoice.xml, invoice], ...}

Output: {[Root invoice.xml, invoice], ...}

[Figure: each input bag holds the root of invoice.xml and one <invoice> element; only the bags whose invoice satisfies the predicate appear in the output.]
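
A small sketch of Select over a collection of bags is given below. It assumes the carrier vertex is already present in each bag under the name `invoice.carrier`, and that predicates are plain Python callables; `select` and `text_of` are hypothetical helpers.

```python
import xml.etree.ElementTree as ET


def text_of(bag, name):
    """Whitespace-trimmed text content of the named vertex."""
    return (bag[name].text or "").strip()


def select(collection, predicate):
    """Keep only the bags that satisfy the predicate."""
    return [bag for bag in collection if predicate(bag)]


bags = [
    {"invoice.carrier": ET.fromstring("<carrier>AT&amp;T</carrier>")},
    {"invoice.carrier": ET.fromstring("<carrier>Sprint</carrier>")},
    {"invoice.carrier": ET.fromstring("<carrier> Sprint </carrier>")},
]
sprint = select(bags, lambda b: text_of(b, "invoice.carrier") == "Sprint")
print(len(sprint))
```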

Join operator

Input: two collections of bags

Functionality: joins the two collections based on a predicate

Output: the concatenation of pairs of bags that satisfy the predicate

Join operator (Example)

Inputs: {[Root invoice.xml, invoice]} and {[Root customer.xml, customer]}

Predicate: account_number:invoice = account:customer

Output: {[Root invoice.xml, invoice, Root customer.xml, customer]}

[Figure: the output bag concatenates the invoice bag and the customer bag, so it holds both roots, the <invoice> element, and the <customer> element.]
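
The join can be sketched as a nested loop over two collections of bags, concatenating each pair that satisfies the predicate. This assumes dict-based bags with distinct vertex names on each side; `join` and `child_text` are hypothetical helpers, and a real engine would likely use a more efficient join algorithm.

```python
import xml.etree.ElementTree as ET


def join(left, right, predicate):
    """Nested-loop join: concatenate every pair of bags satisfying predicate."""
    return [{**l, **r} for l in left for r in right if predicate(l, r)]


def child_text(vertex, tag):
    """Whitespace-trimmed text of the first child with the given tag."""
    return (vertex.findtext(tag) or "").strip()


invoices = [{"invoice": ET.fromstring(x)} for x in (
    "<invoice><account_number>2 </account_number></invoice>",
    "<invoice><account_number>1 </account_number></invoice>",
)]
customers = [{"customer": ET.fromstring(x)} for x in (
    "<customer><account>1 </account><name>Tom </name></customer>",
    "<customer><account>2 </account><name>George </name></customer>",
)]

pairs = join(
    invoices,
    customers,
    lambda l, r: child_text(l["invoice"], "account_number")
    == child_text(r["customer"], "account"),
)
for bag in pairs:
    print(child_text(bag["invoice"], "account_number"),
          child_text(bag["customer"], "name"))
```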

Expose operator

Input: a list of path expressions of vertices to be exposed

Output: a set of bags that contain the vertices named in the parameter list, in the same order

Expose operator (Example)

Input: {[Root invoice.xml, invoice, invoice.carrier, invoice.bill_period]}

Operation: Expose (bill_period, carrier)

Output: {[Root invoice.xml, invoice.bill_period, invoice.carrier]}

[Figure: the input bag holds the <invoice>, <carrier>, and bill_period elements; the output bag keeps only the exposed bill_period and carrier vertices, in parameter-list order.]

Vertex operator

Creates the actual XML vertex that will encompass everything created by an expose operator

Example :

(Customer_invoice)[((account)[invoice.account_number], (inv_total)[invoice.total])]
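
How Expose and Vertex might cooperate to build the result element can be sketched as follows. The sketch assumes dict-based bags (which keep insertion order in Python 3.7+); `expose` and `vertex` are hypothetical function names, and the `renames` mapping stands in for the nested (account)[...] and (inv_total)[...] constructors of the example.

```python
import xml.etree.ElementTree as ET


def expose(collection, names):
    """Keep only the listed vertices, in the listed order."""
    return [{n: bag[n] for n in names} for bag in collection]


def vertex(tag, bag, renames):
    """Wrap exposed vertices in a new element; renames maps new tag -> bag entry."""
    out = ET.Element(tag)
    for new_tag, name in renames.items():
        child = ET.SubElement(out, new_tag)
        child.text = (bag[name].text or "").strip()
    return out


bag = {
    "invoice": ET.fromstring("<invoice/>"),
    "invoice.account_number": ET.fromstring("<account_number>1 </account_number>"),
    "invoice.total": ET.fromstring("<total>$1.20</total>"),
}
exposed = expose([bag], ["invoice.account_number", "invoice.total"])[0]
elem = vertex(
    "Customer_invoice",
    exposed,
    {"account": "invoice.account_number", "inv_total": "invoice.total"},
)
print(ET.tostring(elem, encoding="unicode"))
```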

Other operators

Group: used for arbitrary grouping of elements based on their values. Aggregate functions (e.g. average) can be used with the group operator.

Rename: changes the entry point annotation of elements of a bag. Example: (invoice.bill_period, date)
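
Group and Rename can be illustrated with a simplified sketch in which bags hold plain Python values instead of XML vertices. The function names, the numeric totals, and the bill period value are assumptions made for the example only.

```python
from collections import defaultdict


def rename(collection, old, new):
    """Change the annotation `old` to `new` in every bag."""
    return [{(new if k == old else k): v for k, v in bag.items()}
            for bag in collection]


def group(collection, key, value, aggregate):
    """Group bags by bag[key] and aggregate the bag[value] entries per group."""
    groups = defaultdict(list)
    for bag in collection:
        groups[bag[key]].append(bag[value])
    return {k: aggregate(vs) for k, vs in groups.items()}


bags = [
    {"invoice.account_number": "2", "invoice.total": 0.25},
    {"invoice.account_number": "1", "invoice.total": 1.20},
    {"invoice.account_number": "1", "invoice.total": 0.75},
]
average = group(bags, "invoice.account_number", "invoice.total",
                lambda vs: sum(vs) / len(vs))
renamed = rename([{"invoice.bill_period": "2001-05"}],
                 "invoice.bill_period", "date")
print(average, renamed)
```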

Example: XML Source Documents

Invoice.xml

<Invoice_Document>
  <invoice>
    <account_number>2 </account_number>
    <carrier>AT&amp;T</carrier>
    <total>$0.25</total>
  </invoice>
  <invoice>
    <account_number>1 </account_number>
    <carrier>Sprint</carrier>
    <total>$1.20</total>
  </invoice>
  <invoice>
    <account_number>1 </account_number>
    <total>$0.75</total>
  </invoice>
  <auditor> maria </auditor>
</Invoice_Document>

Customer.xml

<Customer_Document>
  <customer>
    <account>1 </account>
    <name>Tom </name>
  </customer>
  <customer>
    <account>2 </account>
    <name>George </name>
  </customer>
</Customer_Document>

XQuery Example

List account number, customer name, and invoice total for all invoices that have carrier = "Sprint".

FOR $i in (invoices.xml)//invoice,
    $c in (customers.xml)//customer
WHERE $i/carrier = "Sprint" and
      $i/account_number = $c/account
RETURN
  <Sprint_invoices>
    $i/account_number,
    $c/name,
    $i/total
  </Sprint_invoices>

Example: XQuery output

<Sprint_invoices>
  <account_number>1 </account_number>
  <name>Tom </name>
  <total>$1.20</total>
</Sprint_invoices>

Algebra Tree Execution

[Figure: algebra tree, evaluated bottom-up; the intermediate result is shown after each operator.]

Source (invoices.xml) and Source (customers.xml)

Follow (*.invoice) produces invoice (1), invoice (2), invoice (3); Follow (*.customer) produces customer (1), customer (2)

Select (carrier = "Sprint") keeps invoice (2)

Join (*.invoice.account_number = *.customer.account) produces invoice (2) customer (1)

Expose (*.account_number, *.name, *.total) produces account_number, name, total
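
The whole plan can be traced end to end with a short sketch over the two sample documents. It inlines each operator as a list comprehension and assumes dict-based bags; it is meant to show the data flow of the tree, not how Niagara executes it.

```python
import xml.etree.ElementTree as ET

INVOICES = """<Invoice_Document>
  <invoice><account_number>2 </account_number><carrier>AT&amp;T</carrier><total>$0.25</total></invoice>
  <invoice><account_number>1 </account_number><carrier>Sprint</carrier><total>$1.20</total></invoice>
  <invoice><account_number>1 </account_number><total>$0.75</total></invoice>
  <auditor> maria </auditor>
</Invoice_Document>"""
CUSTOMERS = """<Customer_Document>
  <customer><account>1 </account><name>Tom </name></customer>
  <customer><account>2 </account><name>George </name></customer>
</Customer_Document>"""


def text(vertex, tag):
    """Whitespace-trimmed text of a child element, or '' if it is absent."""
    return (vertex.findtext(tag) or "").strip()


# Source + Follow: one bag per invoice / customer element.
invoices = [{"invoice": v} for v in ET.fromstring(INVOICES).findall("invoice")]
customers = [{"customer": v} for v in ET.fromstring(CUSTOMERS).findall("customer")]

# Select: keep invoices whose carrier is Sprint.
sprint = [b for b in invoices if text(b["invoice"], "carrier") == "Sprint"]

# Join: match invoice account_number with customer account.
joined = [{**i, **c} for i in sprint for c in customers
          if text(i["invoice"], "account_number") == text(c["customer"], "account")]

# Expose: project account_number, name, total.
rows = [(text(b["invoice"], "account_number"),
         text(b["customer"], "name"),
         text(b["invoice"], "total")) for b in joined]
print(rows)
```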

Optimization with Niagara

Optimizer based on Niagara algebra:

Use the operations more efficiently

Produce simpler expressions by combining operations

Language Convention

A and B are path expressions

A < B: path expression A is a prefix of B

AnB: common prefix of paths A and B

AńB: greatest common prefix of paths A and B

┴: null path expression

Heuristics using Rewrite Rules

Allow optimization based on path selectivity

When applying the unnesting follow operation Φμ

Interchangeability of Follow operation

Φμ(A)[Φμ(B)] = Φμ(B)[Φμ(A)]

TRUE when there exists C such that C < A and C < B and C = AńB,

or when AnB = ┴
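
The side condition of the rule can be sketched as a check on two paths. The sketch assumes paths are tuples of steps starting at the entry point, and it reads "C < A" as "C is a proper prefix of A"; `as_path`, `greatest_common_prefix`, and `interchangeable` are hypothetical names, and this reading of the rule is an interpretation of the slide rather than a definition taken from the paper.

```python
def as_path(expr):
    """Turn 'forward:entry_point' into a tuple of steps from the entry point."""
    forward, _, entry = expr.partition(":")
    return tuple(entry.split(".")) + tuple(forward.split("."))


def greatest_common_prefix(a, b):
    """Longest sequence of leading steps shared by both paths."""
    prefix = []
    for x, y in zip(a, b):
        if x != y:
            break
        prefix.append(x)
    return tuple(prefix)


def interchangeable(a, b):
    """True when the two follows may be swapped under the assumed reading."""
    c = greatest_common_prefix(a, b)
    if not c:
        return True  # no common prefix at all: the null-path case
    # Otherwise C must be a proper prefix of both A and B.
    return len(c) < len(a) and len(c) < len(b)


print(interchangeable(as_path("acc_Num:invoice"), as_path("carrier:invoice")))
print(interchangeable(as_path("acc_Num:invoice"), as_path("acc_Num:customer")))
print(interchangeable(("invoice",), as_path("carrier:invoice")))
```

Under this reading, two follows stop being interchangeable when one path is a prefix of the other, because the longer path needs the result of the shorter one.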

Application of Rule on Invoice

Φμ(acc_Num:invoice)[Φμ(carrier:invoice)] *

=?=Φμ(carrier:invoice)[Φμ(acc_Num:invoice)] **

Application of Rule on Invoice

Φμ(acc_Num:invoice)[Φμ(carrier:invoice)]

=?=Φμ(carrier:invoice)[Φμ(acc_Num:invoice)]

Equivalent because both share the common prefix “invoice”.

Case AńB = invoice

Benefit of Rule Application

NOTE: let us assume that acc_Num is required for each invoice element, while carrier is not required for an invoice element.

THEN:

Φμ(acc_Num:invoice)[Φμ(carrier:invoice)] *

=?=Φμ(carrier:invoice)[Φμ(acc_Num:invoice)] **

Then which algebra tree do we prefer?

Φμ(acc_Num:invoice)[Φμ(carrier:invoice)] (*) makes more sense than **. Why?

Discussion

Reduction of input size by the first sub-operation:

Φμ(carrier:invoice)
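
The size reduction can be made concrete with a small sketch that runs both orders on three invoices, one of which has no carrier. It assumes that in Φμ(A)[Φμ(B)] the bracketed follow Φμ(B) is applied first, and that bags are dicts; `follow` and `summary` are hypothetical helpers and the trimmed document is illustrative.

```python
import xml.etree.ElementTree as ET

DOC = """<Invoice_Document>
  <invoice><account_number>2</account_number><carrier>AT&amp;T</carrier></invoice>
  <invoice><account_number>1</account_number><carrier>Sprint</carrier></invoice>
  <invoice><account_number>1</account_number></invoice>
</Invoice_Document>"""


def follow(collection, forward, entry):
    """Unnesting follow: one output bag per child `forward` of bag[entry]."""
    out = []
    for bag in collection:
        for vertex in bag[entry].findall(forward):
            out.append({**bag, f"{entry}.{forward}": vertex})
    return out


def summary(bags):
    """Order-independent view of the final result."""
    return sorted((b["invoice.account_number"].text, b["invoice.carrier"].text)
                  for b in bags)


invoices = [{"invoice": v} for v in ET.fromstring(DOC).findall("invoice")]

# Plan *: follow the optional carrier first, then the required account number.
carrier_first = follow(invoices, "carrier", "invoice")
plan_star = follow(carrier_first, "account_number", "invoice")

# Plan **: follow the required account number first, then the carrier.
acc_first = follow(invoices, "account_number", "invoice")
plan_star_star = follow(acc_first, "carrier", "invoice")

print(len(carrier_first), len(acc_first))
print(summary(plan_star) == summary(plan_star_star))
```

Both plans return the same two bags, but the intermediate collection is smaller when the optional carrier is followed first.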

Should we/can we apply the rule below?

Φμ(acc_Num:invoice)[Φμ(acc_Num:customer)]

“acc_Num:invoice” and “acc_Num:customer” are two totally different paths.

The case is: AnB = ┴

So yes, the rule is valid.

