Date posted: 15-Jan-2016
1
XML Algebra
Comparison between XPERANTO and NIAGARA
2
Part I: NIAGARA
- XML Query Optimization
- XML Algebra: Data Model, Operators, Query Plan, Equivalent Rules
Part II: XPERANTO
- XML Query to SQL
- XML Algebra: Data Model, Operators, Query Plan, Composition Rules
- Translation Example
3
<?xml version="1.0" encoding="US-ASCII" ?>
<!DOCTYPE invoice [
  <!ELEMENT invoice (account_number, bill_period, carrier+, itemized_call*, total)>
  <!ELEMENT account_number (#PCDATA)>
  <!ELEMENT bill_period (#PCDATA)>
  <!ELEMENT carrier (#PCDATA)>
  <!ELEMENT itemized_call EMPTY>
  <!ATTLIST itemized_call
    no            ID          #REQUIRED
    date          CDATA       #REQUIRED
    number_called CDATA       #REQUIRED
    time          CDATA       #REQUIRED
    rate          (NIGHT|DAY) #REQUIRED
    min           CDATA       #REQUIRED
    amount        CDATA       #REQUIRED>
  <!ELEMENT total (#PCDATA)>
]>
<invoice>
  <account_number>555 777-3158 573 234 3</account_number>
  <bill_period>Jun 9 - Jul 8, 2000</bill_period>
  <carrier>Sprint</carrier>
  <itemized_call no="1" date="JUN 10" number_called="973 555-8888" time="10:17pm" rate="NIGHT" min="1" amount="0.05" />
  <itemized_call no="2" date="JUN 13" number_called="973 650-2222" time="10:19pm" rate="DAY" min="1" amount="0.15" />
  <itemized_call no="3" date="JUN 15" number_called="206 365-9999" time="10:25pm" rate="NIGHT" min="3" amount="0.15" />
  <total>$0.35</total>
</invoice>
Example of Telephone Bill
4
Example XQuery
User XQuery:
<summary>{
  FOR $rate IN distinct(document("invoice")/invoice/itemized_call/@rate)
  LET $itemized_call := document("invoice")/invoice/itemized_call[@rate=$rate]
  WHERE $itemized_call/@number_called LIKE '973%'
  RETURN <rate>$rate</rate><number_of_calls>count($itemized_call)</number_of_calls>
}</summary>
Count the number of itemized calls to area code 973, grouped by calling rate.
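As an executable reference for what the query plans in this deck should compute, the Python sketch below evaluates the intended query over the sample invoice using the standard-library xml.etree.ElementTree. It filters each call on the 973 prefix and then counts per rate, which is what the prose description asks for. Under standard XQuery semantics the WHERE clause tests the LET-bound sequence existentially, so a literal reading of the query would count both NIGHT calls; the sketch follows the per-call reading that the algebraic plans implement. The summarize helper is hypothetical, and rates are sorted only to make the output deterministic.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# The sample invoice from the telephone bill example (DTD omitted).
INVOICE = """<invoice>
  <account_number>555 777-3158 573 234 3</account_number>
  <bill_period>Jun 9 - Jul 8, 2000</bill_period>
  <carrier>Sprint</carrier>
  <itemized_call no="1" date="JUN 10" number_called="973 555-8888" time="10:17pm" rate="NIGHT" min="1" amount="0.05"/>
  <itemized_call no="2" date="JUN 13" number_called="973 650-2222" time="10:19pm" rate="DAY" min="1" amount="0.15"/>
  <itemized_call no="3" date="JUN 15" number_called="206 365-9999" time="10:25pm" rate="NIGHT" min="3" amount="0.15"/>
  <total>$0.35</total>
</invoice>"""


def summarize(xml_text, prefix="973"):
    """Count calls whose number starts with prefix, grouped by rate (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    counts = Counter(
        call.get("rate")
        for call in root.findall("itemized_call")
        if call.get("number_called", "").startswith(prefix)  # per-call filter
    )
    summary = ET.Element("summary")
    for rate in sorted(counts):  # sorted only to make the output deterministic
        ET.SubElement(summary, "rate").text = rate
        ET.SubElement(summary, "number_of_calls").text = str(counts[rate])
    return summary


print(ET.tostring(summarize(INVOICE), encoding="unicode"))
# <summary><rate>DAY</rate><number_of_calls>1</number_of_calls><rate>NIGHT</rate><number_of_calls>1</number_of_calls></summary>
```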
5
NIAGARA
Title: Following the Paths of XML Data: An Algebraic Framework for XML Query Evaluation
By: Leonidas Galanis, Efstratios Viglas, David J. DeWitt, Jeffrey F. Naughton, and David Maier
6
Goals
- Be independent of schema information
- Query on both structure and content
- Generate simple, flexible, yet powerful algebraic expressions
- Allow re-use of traditional optimization techniques
7
Data Model
A collection of bags of vertices. The vertices in a bag have no order. Example bag:
[Root "invoice.xml", invoice, invoice.account_number]
where the vertex invoice is <invoice>invoice-element-content</invoice> and the vertex invoice.account_number is <account_number>account_number-element-content</account_number>.
8
Data Model
Bag elements are reachable by path expressions. A path expression consists of two parts:
- an entry point
- a relative forward part
Example: account_number:invoice (relative forward part account_number, entry point invoice)
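To make the entry-point notation concrete, here is a minimal Python sketch that resolves a path such as account_number:invoice against a single bag. It assumes a bag can be modelled as a dict from entry-point labels to vertices (ElementTree elements) and that the relative part is a child path understood by ElementTree's findall; resolve is a hypothetical helper, not part of NIAGARA.

```python
import xml.etree.ElementTree as ET


def resolve(bag, path):
    """Resolve 'relative:entry' against a bag (hypothetical dict: label -> vertex)."""
    relative, entry = path.split(":", 1)
    # Start at the vertex named by the entry point, then follow the relative part.
    return bag[entry].findall(relative)


invoice = ET.fromstring(
    "<invoice><account_number>555 777-3158 573 234 3</account_number>"
    "<carrier>Sprint</carrier></invoice>")
bag = {"Root": "invoice.xml", "invoice": invoice}
print([e.text for e in resolve(bag, "account_number:invoice")])
# ['555 777-3158 573 234 3']
```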
9
Operators
Source S, Follow Φ, Select σ, Join ⋈, Rename ρ, Expose ε, Vertex υ, Group γ, Union ∪, Intersection ∩, Difference −, Cartesian Product ×.
10
Source Operator S
Input: a list of documents
Output: a collection of singleton bags
Examples:
- S(*): all known XML documents
- S(invoice*.xml): all XML documents whose filename matches "invoice*.xml"
- S(*, schema.dtd): all known XML documents that conform to schema.dtd
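A small Python sketch of the Source operator, assuming a hypothetical catalogue of known documents that records which DTD (if any) each one conforms to. Filename patterns use shell-style wildcards via fnmatch.fnmatchcase, and each matching document becomes a singleton bag holding only its Root vertex.

```python
from fnmatch import fnmatchcase

# Hypothetical catalogue of "known" documents: filename -> DTD it conforms to (or None).
KNOWN_DOCUMENTS = {
    "invoice.xml": "schema.dtd",
    "invoice2001.xml": "schema.dtd",
    "customer.xml": None,
}


def source(pattern="*", dtd=None):
    """S(pattern[, dtd]): one singleton bag [Root "file"] per matching document."""
    return [
        [("Root", name)]
        for name, doc_dtd in sorted(KNOWN_DOCUMENTS.items())
        if fnmatchcase(name, pattern) and (dtd is None or doc_dtd == dtd)
    ]


print(source("invoice*.xml"))
# [[('Root', 'invoice.xml')], [('Root', 'invoice2001.xml')]]
```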
11
Follow operator
Input: a path expression in entry-point notation
Functionality: extracts the vertices reachable by the path expression
Output: a new bag that consists of the extracted vertex plus all the contents of the original bag (in the case of an unnesting follow)
12
Follow operator (Example)
Unnesting follow (carrier:invoice)*
Input: {[Root invoice.xml, invoice]}, where invoice is <invoice>invoice-element-content</invoice>
Output: {[Root invoice.xml, invoice, invoice.carrier]}, where invoice.carrier is <carrier>carrier-element-content</carrier>
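The unnesting follow can be sketched in Python as below, assuming bags are dicts from entry-point labels to vertices. The invoice with two carriers is hypothetical (the sample bill has only one) and is used to show that each reachable vertex yields its own output bag, which keeps all contents of the original bag. How NIAGARA treats a bag with no reachable vertex is not stated on the slide; this sketch simply emits nothing for it.

```python
import xml.etree.ElementTree as ET


def follow_unnest(bags, relative, entry):
    """Unnesting follow (relative:entry)*: one output bag per reachable vertex."""
    out = []
    for bag in bags:
        for vertex in bag[entry].findall(relative):
            new_bag = dict(bag)                      # copy the original bag contents
            new_bag[f"{entry}.{relative}"] = vertex  # add the extracted vertex
            out.append(new_bag)
    return out


invoice = ET.fromstring(
    "<invoice><carrier>Sprint</carrier><carrier>MCI</carrier></invoice>")
bags = [{"Root": "invoice.xml", "invoice": invoice}]
result = follow_unnest(bags, "carrier", "invoice")
print([b["invoice.carrier"].text for b in result])  # ['Sprint', 'MCI']
```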
13
Select operator
Input: a set of bags
Functionality: filters the bags of a collection using a predicate
Output: the set of bags that satisfy the predicate
Predicate: logical operators (∧, ∨, ¬) or simple qualifications (comparisons such as =, <, >)
14
Select operator (Example)
σ invoice.carrier = "Sprint"
Input: {[Root invoice.xml, invoice], [Root invoice.xml, invoice], ...}, where each invoice vertex is <invoice>invoice-element-content</invoice>
Output: {[Root invoice.xml, invoice], ...}, the bags whose invoice carrier is Sprint
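A minimal Python sketch of Select, assuming bags are dicts and vertices are represented by their string values for brevity. The document names are hypothetical. The second call shows a predicate built from a simple qualification combined with logical operators.

```python
def select(bags, predicate):
    """sigma[predicate]: keep only the bags for which the predicate holds."""
    return [bag for bag in bags if predicate(bag)]


def carrier_is_sprint(bag):
    # Simple qualification: invoice.carrier = "Sprint"
    return bag["invoice.carrier"] == "Sprint"


bags = [
    {"Root": "invoice1.xml", "invoice.carrier": "Sprint"},
    {"Root": "invoice2.xml", "invoice.carrier": "MCI"},
    {"Root": "invoice3.xml", "invoice.carrier": "Sprint"},
]
print([b["Root"] for b in select(bags, carrier_is_sprint)])
# ['invoice1.xml', 'invoice3.xml']

# Logical combination: carrier = "Sprint" AND NOT Root ends with "3.xml"
print([b["Root"] for b in select(
    bags, lambda b: carrier_is_sprint(b) and not b["Root"].endswith("3.xml"))])
# ['invoice1.xml']
```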
15
Join operator
Input: two collections of bags
Functionality: joins the two collections based on a predicate
Output: the concatenation of the pairs of bags that satisfy the predicate
16
Join operator (Example)
Predicate: account_number:invoice = number:customer
Left input: {[Root invoice.xml, invoice]}, where invoice is <invoice>invoice-element-content</invoice>
Right input: {[Root customer.xml, customer]}, where customer is <customer>customer-element-content</customer>
Output: {[Root invoice.xml, invoice, Root customer.xml, customer]}
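Join as bag concatenation can be sketched in Python as a nested loop. Here a bag is a list of (label, value) pairs, because the joined bag on the slide contains two Root vertices; vertex_value is a hypothetical lookup helper, and the customer data is invented for the example.

```python
def vertex_value(bag, label):
    """Return the value of the first vertex in the bag annotated with label."""
    return next(v for k, v in bag if k == label)


def join(left, right, predicate):
    """Join: concatenate each (left, right) pair of bags that satisfies the predicate."""
    return [lb + rb for lb in left for rb in right if predicate(lb, rb)]


def same_account(lb, rb):
    # account_number:invoice = number:customer
    return vertex_value(lb, "invoice.account_number") == vertex_value(rb, "customer.number")


invoices = [[("Root", "invoice.xml"), ("invoice.account_number", "555 777-3158 573 234 3")]]
customers = [[("Root", "customer.xml"), ("customer.number", "555 777-3158 573 234 3")],
             [("Root", "customer.xml"), ("customer.number", "206 365-9999")]]
joined = join(invoices, customers, same_account)
print(len(joined), [label for label, _ in joined[0]])
# 1 ['Root', 'invoice.account_number', 'Root', 'customer.number']
```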
17
Expose operator
Input: a list of path expressions of the vertices to be exposed
Output: a set of bags that contain the vertices in the parameter list, in the order given
18
Expose operator (Example)
ε(bill_period, carrier)
Input: {[Root invoice.xml, invoice, invoice.carrier, invoice.bill_period]}
Output: {[Root invoice.xml, invoice.bill_period, invoice.carrier]}
19
Vertex operator
Creates the actual XML vertex that encompasses everything created by an expose operator.
Example: υ(Customer_invoice)[ε(υ(account)[invoice.account_number], υ(inv_total)[invoice.total])]
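The expose and vertex steps of the example can be sketched together in Python. Bags are lists of (label, value) pairs with string values, the bill_period vertex is included only to show that unlisted vertices are dropped, and unlike the slide's expose example this sketch also drops Root. The helper names expose and vertex are illustrative.

```python
import xml.etree.ElementTree as ET


def expose(bag, labels):
    """epsilon(labels): keep the listed vertices, in the listed order."""
    lookup = dict(bag)
    return [(label, lookup[label]) for label in labels]


def vertex(tag, children):
    """upsilon(tag)[...]: wrap each (child_tag, text) pair in a new element named tag."""
    root = ET.Element(tag)
    for child_tag, text in children:
        ET.SubElement(root, child_tag).text = text
    return root


bag = [("Root", "invoice.xml"), ("invoice.account_number", "555 777-3158 573 234 3"),
       ("invoice.bill_period", "Jun 9 - Jul 8, 2000"), ("invoice.total", "$0.35")]
exposed = expose(bag, ["invoice.account_number", "invoice.total"])
# upsilon(Customer_invoice)[epsilon(upsilon(account)[...], upsilon(inv_total)[...])]
result = vertex("Customer_invoice", [("account", exposed[0][1]), ("inv_total", exposed[1][1])])
print(ET.tostring(result, encoding="unicode"))
# <Customer_invoice><account>555 777-3158 573 234 3</account><inv_total>$0.35</inv_total></Customer_invoice>
```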
20
Other operators
- Group γ: used for arbitrary grouping of elements based on their values. Aggregate functions (e.g. average) can be used with the group operator.
- Rename ρ: changes the entry-point annotation of the elements of a bag. Example: ρ(invoice.bill_period, date)
22
Query Plan: Algebra
υ(summary)[
  ε(υ(rate)[rate], υ(number_of_calls)[number])
  [
    ρ(rate:invoice.itemized_call, rate),
    ρ(count(invoice.itemized_call), number)
    [γ(rate:invoice.itemized_call, count(invoice.itemized_call))
      [σ number_called:invoice.itemized_call ► "973%"
        [Φμ(invoice.itemized_call)
          [S(invoice.xml)]]]]]]
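Read bottom-up, the plan is a pipeline. The Python sketch below walks through it one operator at a time, assuming bags are dicts and reading ► as a prefix match. The invoice is trimmed to the attributes the plan touches, and run_plan is a hypothetical name; it is meant to show the order of the operators, not NIAGARA's implementation.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

INVOICE = """<invoice>
  <itemized_call no="1" number_called="973 555-8888" rate="NIGHT"/>
  <itemized_call no="2" number_called="973 650-2222" rate="DAY"/>
  <itemized_call no="3" number_called="206 365-9999" rate="NIGHT"/>
</invoice>"""


def run_plan(xml_text):
    # S(invoice.xml): a collection holding one singleton bag
    bags = [{"Root": "invoice.xml", "invoice": ET.fromstring(xml_text)}]
    # Phi-mu(invoice.itemized_call): unnesting follow, one bag per call
    bags = [{**b, "invoice.itemized_call": call}
            for b in bags for call in b["invoice"].findall("itemized_call")]
    # sigma number_called:invoice.itemized_call > "973%" (read as a prefix match)
    bags = [b for b in bags
            if b["invoice.itemized_call"].get("number_called", "").startswith("973")]
    # gamma(rate:invoice.itemized_call, count(invoice.itemized_call))
    groups = defaultdict(int)
    for b in bags:
        groups[b["invoice.itemized_call"].get("rate")] += 1
    # rho(..., rate), rho(count(...), number): rename group key and aggregate
    renamed = [{"rate": r, "number": n} for r, n in sorted(groups.items())]
    # upsilon(summary)[epsilon(upsilon(rate)[rate], upsilon(number_of_calls)[number])]
    summary = ET.Element("summary")
    for g in renamed:
        ET.SubElement(summary, "rate").text = g["rate"]
        ET.SubElement(summary, "number_of_calls").text = str(g["number"])
    return summary


print(ET.tostring(run_plan(INVOICE), encoding="unicode"))
# <summary><rate>DAY</rate><number_of_calls>1</number_of_calls><rate>NIGHT</rate><number_of_calls>1</number_of_calls></summary>
```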
23
Equivalent Rules
14 equivalent rules so far.
Auxiliary operators used in the rules:
- A > B: path expression A is a prefix of B
- ┴: the null path expression
- A∏B: the greatest common prefix of path expressions A and B
- A∏B: the common prefix of path expressions A and B
24
Equivalent Rules: Examples of Rule Applications
Follow ordering: Φμ(A)[Φμ(B)] = Φμ(B)[Φμ(A)]
iff C < A and C < B, where C = A∏B, or A∏B = ┴.
[Figure: paths A and B branching below their common prefix C]
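The auxiliary operators can be sketched in Python over dotted path expressions. The function names are hypothetical, the empty string stands for the null path ┴, and follows_commute encodes one assumed reading of the follow-ordering condition: two unnesting follows may be swapped when their greatest common prefix is a proper prefix of both paths (which includes the case where they share nothing), i.e. neither path extends the other.

```python
def steps(path):
    """Split a dotted path expression such as 'invoice.carrier' into its steps."""
    return tuple(path.split(".")) if path else ()


def is_prefix(a, b):
    """A > B in the slide's notation: path A is a prefix of path B."""
    sa, sb = steps(a), steps(b)
    return sb[:len(sa)] == sa


def greatest_common_prefix(a, b):
    """A ∏ B; the empty string plays the role of the null path ┴."""
    common = []
    for x, y in zip(steps(a), steps(b)):
        if x != y:
            break
        common.append(x)
    return ".".join(common)


def follows_commute(a, b):
    """Assumed reading of the follow-ordering rule: A ∏ B is a proper prefix of both."""
    c = len(steps(greatest_common_prefix(a, b)))
    return c < len(steps(a)) and c < len(steps(b))


print(follows_commute("invoice.carrier", "invoice.itemized_call"))  # True
print(follows_commute("invoice.carrier", "invoice.carrier.name"))   # False
```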
25
Equivalent Rules: Examples of Rule Applications
Join commutativity and associativity: (A ⋈ B) ⋈ C = (C ⋈ B) ⋈ A
26
Equivalent Rules: Examples of Rule Applications
Selection distribution and interchangeability:
σc[A ⋈ B] = σc1[A] ⋈ σc2[B], where c is the conjunction of the conditions c1 and c2, each of which refers to only one of the join inputs.
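The selection-distribution rule can be checked on a toy instance in Python: selecting with c = c1 ∧ c2 after the join gives the same result as pushing c1 and c2 down to their respective inputs. The data and predicate names are hypothetical, and bags are modelled as dicts merged on join.

```python
from itertools import product


def join(a, b, pred):
    """Join two collections of bags (dicts here), merging each matching pair."""
    return [{**x, **y} for x, y in product(a, b) if pred(x, y)]


def select(bags, cond):
    return [bag for bag in bags if cond(bag)]


def same_account(x, y):
    return x["invoice.account"] == y["customer.number"]


def c1(bag):  # refers only to the invoice input
    return bag["invoice.carrier"] == "Sprint"


def c2(bag):  # refers only to the customer input
    return bag["customer.state"] == "NJ"


def c(bag):  # c = c1 AND c2
    return c1(bag) and c2(bag)


invoices = [{"invoice.account": "555", "invoice.carrier": "Sprint"},
            {"invoice.account": "206", "invoice.carrier": "MCI"}]
customers = [{"customer.number": "555", "customer.state": "NJ"},
             {"customer.number": "206", "customer.state": "WA"}]

lhs = select(join(invoices, customers, same_account), c)             # sigma_c[A join B]
rhs = join(select(invoices, c1), select(customers, c2), same_account)  # sigma_c1[A] join sigma_c2[B]
print(lhs == rhs)  # True
```

Pushing the selections down shrinks both join inputs before the join runs, which is the usual reason an optimizer applies this rule.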
27
Equivalent Rules: Examples of Rule Applications
Elimination of unused bag elements: ε(P)[J[A]] = J[ε(P)[A]]
iff J uses only elements exposed by P.
28
XPERANTO
Goal: XQuery → SQL
References:
- J. Shanmugasundaram, et al. Querying XML Views of Relational Data. VLDB 2001.
- J. Shanmugasundaram, et al. Efficiently Publishing Relational Data as XML Documents. VLDB 2000.
- J. Shanmugasundaram. Ph.D. Dissertation, July 2001.
29
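Query Processing Architecture
The XPERANTO Query Engine sits between the user and the RDBMS:
1. XQuery Parser: parses the user's XQuery into XQGM.
2. Query Rewrite & View Composition: composes the query with the user's XML view, producing a new XQGM.
3. Computation Pushdown: splits the XQGM into an SQL query and a Tagger Graph.
4. The RDBMS evaluates the SQL query over the relational database and returns tuples.
5. Tagger Runtime: uses the Tagger Graph to turn the tuples into the XQuery results returned to the user.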
30
Data Model: tables of lists of XML fragments
Example (operators listed bottom-up, output columns in parentheses):
- Table: Carrier ($invoice_id, $carrier)
- Select: $invoice_id = $id ($invoice_id, $carrier)
- Project: $carrier_entry = <carrier>$carrier</carrier> ($carrier_entry)
- Groupby: $carriers = aggXMLFrags($carrier_entry) ($carriers)
The resulting $carriers value is a single list of XML fragments: <carrier>$carrier</carrier><carrier>$carrier</carrier>...
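The same pipeline can be sketched in Python with rows as dicts and an XML fragment list as a Python list of strings. The Carrier rows and the function name carriers_column are hypothetical. The Groupby has no grouping columns, so, as with an SQL aggregate, it returns exactly one row whose single column holds the whole fragment list.

```python
# Hypothetical contents of the Carrier table: one dict per tuple.
CARRIER = [
    {"invoice_id": 1, "carrier": "Sprint"},
    {"invoice_id": 1, "carrier": "MCI"},
    {"invoice_id": 2, "carrier": "Verizon"},
]


def carriers_column(carrier_rows, invoice_id):
    # Select: $invoice_id = $id
    selected = [r for r in carrier_rows if r["invoice_id"] == invoice_id]
    # Project: $carrier_entry = <carrier>$carrier</carrier>
    projected = [{"carrier_entry": f"<carrier>{r['carrier']}</carrier>"} for r in selected]
    # Groupby (no grouping columns): $carriers = aggXMLFrags($carrier_entry)
    return [{"carriers": [r["carrier_entry"] for r in projected]}]


print(carriers_column(CARRIER, 1))
# [{'carriers': ['<carrier>Sprint</carrier>', '<carrier>MCI</carrier>']}]
```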
31
Operators
Table, Project, Select, Join, Groupby, Orderby, Union, Unnest, View, Function
- Select, Project, Join, Groupby, Orderby, and Union have the same semantics as their relational counterparts.
- Project: also invokes the XML functions defined for XQGM
- Table/View: refers to a relational table or an XML view
- Unnest: unnests an XML fragment list
- Function: invokes XQuery-valued functions
- Groupby: also creates XML fragment lists (via aggXMLFrags)
32
XML Functions & Operators
(function: description [XQGM operators that use it])
1. cr8Elem(Tag, Atts, Clist): creates an element with tag name Tag, attribute list Atts, and contents Clist [Project]
2. cr8AttList(A1, ..., An): creates a list of attributes from the attributes passed as parameters [Project]
3. cr8Att(Name, Val): creates an attribute with name Name and value Val [Project]
4. cr8XMLFragList(C1, ..., Cn): creates an XML fragment list from the content parameters [Project]
5. aggXMLFrags(C): aggregate XML function that creates an XML fragment list [Groupby]
6. getTagName(Elem): returns the element name of Elem [Project, Select]
7. getAttributes(Elem): returns the list of attributes of Elem [Project, Select]
8. getContents(Elem): returns the XML fragment list of the contents of Elem [Project, Select]
9. getAttName(Att): returns the name of attribute Att [Project, Select]
10. getAttValue(Att): returns the value of attribute Att [Project, Select]
11. isElement(E): returns true if E is an element, false otherwise [Select]
12. isText(T): returns true if T is text, false otherwise [Select]
13. unnest(List): superscalar function that unnests a list [Unnest]
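To make the table concrete, here is a Python sketch of the thirteen functions over a tiny value model: frozen dataclasses for elements and attributes, tuples for attribute and fragment lists, and plain strings for text. The snake_case names mirror the table and are illustrative only. aggXMLFrags is assumed to concatenate the fragment lists of one group, which matches the Groupby example where two two-element entries become one four-element list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Att:
    name: str
    value: str


@dataclass(frozen=True)
class Elem:
    tag: str
    atts: tuple      # attribute list (tuple of Att)
    contents: tuple  # XML fragment list (Elem objects and text strings)


def cr8_elem(tag, atts, clist):       # cr8Elem(Tag, Atts, Clist)
    return Elem(tag, tuple(atts), tuple(clist))


def cr8_att_list(*atts):              # cr8AttList(A1, ..., An)
    return tuple(atts)


def cr8_att(name, val):               # cr8Att(Name, Val)
    return Att(name, val)


def cr8_xml_frag_list(*contents):     # cr8XMLFragList(C1, ..., Cn)
    return tuple(contents)


def agg_xml_frags(column_values):     # aggXMLFrags(C), over one group's values
    result = []
    for value in column_values:       # concatenate lists; wrap single fragments
        result.extend(value if isinstance(value, tuple) else (value,))
    return tuple(result)


def get_tag_name(elem):               # getTagName(Elem)
    return elem.tag


def get_attributes(elem):             # getAttributes(Elem)
    return elem.atts


def get_contents(elem):               # getContents(Elem)
    return elem.contents


def get_att_name(att):                # getAttName(Att)
    return att.name


def get_att_value(att):               # getAttValue(Att)
    return att.value


def is_element(e):                    # isElement(E)
    return isinstance(e, Elem)


def is_text(t):                       # isText(T)
    return isinstance(t, str)


def unnest(frag_list):                # unnest(List): one output row per item
    return list(frag_list)


call = cr8_elem("itemized_call",
                cr8_att_list(cr8_att("no", "1"), cr8_att("rate", "NIGHT")),
                cr8_xml_frag_list())
carrier = cr8_elem("carrier", cr8_att_list(), cr8_xml_frag_list("Sprint"))
invoice = cr8_elem("invoice", cr8_att_list(), cr8_xml_frag_list(carrier, call))
print([get_tag_name(e) for e in unnest(get_contents(invoice)) if is_element(e)])
# ['carrier', 'itemized_call']
```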
33
Operators - Examples
Project: $elems = getContents($invoice)
- Input $invoice: <invoice> <account_number>508-753-2352</account_number> <bill_period>24 July – 23 August, 2001</bill_period> ... </invoice>
- Output $elems: <account_number>508-753-2352</account_number> <bill_period>24 July – 23 August, 2001</bill_period> ...
Groupby: $count = count($itemized_call)
- Input $itemized_call (three rows): <itemized_call></itemized_call>, <itemized_call></itemized_call>, <itemized_call></itemized_call>
- Output $count: 3
34
Operators - Examples
Groupby: $entries = aggXMLFrags($entry)
- Input $entry (two rows): <rate> DAY </rate> <number_of_calls> 20 </number_of_calls> and <rate> NIGHT </rate> <number_of_calls> 23 </number_of_calls>
- Output $entries (one row): <rate> DAY </rate><number_of_calls> 20 </number_of_calls><rate> NIGHT </rate><number_of_calls> 23 </number_of_calls>
Project: $result = cr8Elem(summary, Att, $entries)
- Output $result: <summary> <rate> DAY </rate> <number_of_calls> 20 </number_of_calls> <rate> NIGHT </rate> <number_of_calls> 23 </number_of_calls></summary>
35
Operators - Examples
Unnest: $elem = unnest($elems)
- Input $elems (one row): <rate> DAY </rate><number_of_calls> 20 </number_of_calls><rate> NIGHT </rate><number_of_calls> 23 </number_of_calls>
- Output $elem (one row per fragment): <rate> DAY </rate>; <number_of_calls> 20 </number_of_calls>; <rate> NIGHT </rate>; <number_of_calls> 23 </number_of_calls>
36
XML Query
XQGM for the example user XQuery (operators listed bottom-up, output columns in parentheses):
1. View: document("invoice.xml") ($doc)
2. Navigate: $doc/invoice/itemized_call/@rate ($rate)
3. Select: distinct($rate) ($rate)
4. Navigate: $irate = $doc/invoice/itemized_call/@rate, $number = $doc/invoice/itemized_call/@number_called ($itemized_call, $irate, $number)
5. Select: $rate = $irate ($itemized_call)
6. Select: $number LIKE '973%' ($itemized_call)
7. Groupby: $count = count($itemized_call) ($count)
8. Join (Correlated) of steps 3 and 7 ($rate, $count)
9. Project: $entry = <rate> $rate </rate> <number_of_calls> $count </number_of_calls> ($entry)
10. Groupby: $entries = aggXMLFrags($entry) ($entries)
11. Project: $result = <summary> $entries </summary> ($result)
Steps 4 to 7 form the correlated inner branch: step 5 refers to the outer $rate.
37
Navigation in XQGM
The navigation step Navigate: $invoice/account_number is expanded into lower-level XQGM operators (bottom-up):
- Project: $elems = getContents($invoice)
- Unnest: $elem = unnest($elems)
- Select: getTagName($elem) = "account_number" (output: $account_number)
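The expansion can be sketched in Python with ElementTree, treating each intermediate result as a list of row dicts. navigate is a hypothetical name, and this sketch keeps only child elements as the contents of $invoice (ElementTree stores text separately), which is enough for a tag-name test.

```python
import xml.etree.ElementTree as ET


def navigate(rows, child_tag):
    """Navigate: $invoice/child_tag, expanded into Project, Unnest and Select."""
    # Project: $elems = getContents($invoice) (child elements only in this sketch)
    projected = [{"elems": list(row["invoice"])} for row in rows]
    # Unnest: $elem = unnest($elems), one output row per fragment
    unnested = [{"elem": e} for row in projected for e in row["elems"]]
    # Select: getTagName($elem) = child_tag
    return [row["elem"] for row in unnested if row["elem"].tag == child_tag]


invoice = ET.fromstring("<invoice><account_number>555 777-3158 573 234 3</account_number>"
                        "<bill_period>Jun 9 - Jul 8, 2000</bill_period></invoice>")
print([e.text for e in navigate([{"invoice": invoice}], "account_number")])
# ['555 777-3158 573 234 3']
```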
38
Default XML View
Each relational table is exposed as an XML element that contains one <row> element per tuple:
<invoice>
  <row><id>1</id><account_number>555 777-3158 573 234 3</account_number><bill_period>Jun 9 – Jul 8, 2000</bill_period><total>$0.35</total></row>
</invoice>
<carrier>
  <row><invoice_id>1</invoice_id><carrier>Sprint</carrier></row>
</carrier>
...
Underlying tables:
invoice
id | account_number | bill_period | total
1 | 555 777-3158 573 234 3 | Jun 9 – Jul 8, 2000 | $0.35
carrier
invoice_id | carrier
1 | Sprint
itemized_call
invoice_id | no | date | number_called | time | rate | min | amount
1 | 1 | JUN 10 | 973 555-8888 | 10:17pm | NIGHT | 1 | 0.05
1 | 2 | JUN 13 | 973 650-2222 | 10:19am | DAY | 1 | 0.15
1 | 3 | JUN 15 | 206 365-9999 | 10:25pm | NIGHT | 3 | 0.15
39
User Defined XML View
Over the same invoice, carrier, and itemized_call tables, the user-defined view presents each invoice in the format of the telephone bill example:
<invoice>
  <account_number>555 777-3158 573 234 3</account_number>
  <bill_period>Jun 9 - Jul 8, 2000</bill_period>
  <carrier>Sprint</carrier>
  <itemized_call no="1" date="JUN 10" number_called="973 555-8888" time="10:17pm" rate="NIGHT" min="1" amount="0.05" />
  <itemized_call no="2" date="JUN 13" number_called="973 650-2222" time="10:19pm" rate="DAY" min="1" amount="0.15" />
  <itemized_call no="3" date="JUN 15" number_called="206 365-9999" time="10:25pm" rate="NIGHT" min="3" amount="0.15" />
  <total>$0.35</total>
</invoice>
40
User Defined XML View Cont.
Create view invoice as (
  FOR $invoice IN view("default")/invoice/row
  RETURN <invoice>
    <account_number>$invoice/account_number</account_number>
    <bill_period>$invoice/bill_period</bill_period>
    FOR $carrier IN view("default")/carrier/row
    WHERE $carrier/invoice_id = $invoice/id
    RETURN <carrier>$carrier</carrier>
    FOR $itemized_call IN view("default")/itemized_call/row
    WHERE $itemized_call/invoice_id = $invoice/id
    RETURN <itemized_call no=$itemized_call/no date=$itemized_call/date number_called=$itemized_call/number_called time=$itemized_call/time rate=$itemized_call/rate min=$itemized_call/min amount=$itemized_call/amount />
    SORTBY (@no)
    <total>$invoice/total</total>
  </invoice>
)
41
XML View XQGM
XQGM for the user-defined view (operators listed bottom-up, output columns in parentheses):
- Table: Invoice ($id, $account_number, $bill_period, $total)
- Carrier subquery, correlated on $id:
  - Table: Carrier ($invoice_id, $carrier)
  - Select: $invoice_id = $id
  - Project: $carrier_entry = <carrier>$carrier</carrier>
  - Groupby: $carriers = aggXMLFrags($carrier_entry) ($carriers)
- Subquery over the itemized_call table, correlated on $id ($items)
- Join (Correlated) of Invoice with both subqueries ($account_number, $bill_period, $total, $carriers, $items)
- Project: $doc = <invoice> <account_number>$account_number</account_number> <bill_period>$bill_period</bill_period> $carriers $items <total>$total</total> </invoice>
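For intuition about what the view computes, the Python sketch below loads the default-view tables into an in-memory SQLite database (standard-library sqlite3) and evaluates the view's nested FOR/WHERE clauses naively, issuing one correlated query per invoice for carriers and calls. This nested-loop evaluation is only an illustration of the view's result; it is not XPERANTO's strategy, which decorrelates the query and uses a sorted outer union. Table and column names follow the default view; column names that clash with SQL function names are quoted defensively.

```python
import sqlite3
import xml.etree.ElementTree as ET

CALL_COLUMNS = ("no", "date", "number_called", "time", "rate", "min", "amount")


def load_sample(conn):
    """Load the example rows into the three default-view tables."""
    conn.executescript("""
        CREATE TABLE invoice (id INTEGER, account_number TEXT, bill_period TEXT, total TEXT);
        CREATE TABLE carrier (invoice_id INTEGER, carrier TEXT);
        CREATE TABLE itemized_call (invoice_id INTEGER, "no" INTEGER, "date" TEXT,
            number_called TEXT, "time" TEXT, rate TEXT, "min" INTEGER, amount TEXT);
        INSERT INTO invoice VALUES (1, '555 777-3158 573 234 3', 'Jun 9 - Jul 8, 2000', '$0.35');
        INSERT INTO carrier VALUES (1, 'Sprint');
        INSERT INTO itemized_call VALUES
            (1, 1, 'JUN 10', '973 555-8888', '10:17pm', 'NIGHT', 1, '0.05'),
            (1, 2, 'JUN 13', '973 650-2222', '10:19am', 'DAY', 1, '0.15'),
            (1, 3, 'JUN 15', '206 365-9999', '10:25pm', 'NIGHT', 3, '0.15');
    """)


def invoice_view(conn):
    """Evaluate the view naively: one correlated query per invoice and child type."""
    invoices = []
    for inv_id, account, period, total in conn.execute(
            "SELECT id, account_number, bill_period, total FROM invoice ORDER BY id"):
        inv = ET.Element("invoice")
        ET.SubElement(inv, "account_number").text = account
        ET.SubElement(inv, "bill_period").text = period
        for (carrier,) in conn.execute(
                "SELECT carrier FROM carrier WHERE invoice_id = ?", (inv_id,)):
            ET.SubElement(inv, "carrier").text = carrier
        for row in conn.execute(
                'SELECT "no", "date", number_called, "time", rate, "min", amount '
                'FROM itemized_call WHERE invoice_id = ? ORDER BY "no"', (inv_id,)):
            # SORTBY (@no) becomes ORDER BY; every column becomes an attribute
            ET.SubElement(inv, "itemized_call",
                          {name: str(value) for name, value in zip(CALL_COLUMNS, row)})
        ET.SubElement(inv, "total").text = total
        invoices.append(inv)
    return invoices


conn = sqlite3.connect(":memory:")
load_sample(conn)
print(ET.tostring(invoice_view(conn)[0], encoding="unicode"))
```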
42
View Composition
User query XQGM + user view XQGM: the navigation operators of the query are cancelled out by applying the composition rules.
The view's Project operator builds $invoice as:
cr8Elem(invoice, cr8AttList(), cr8XMLFragList(
  cr8Elem(account_number, cr8AttList(), cr8XMLFragList($account_number)),
  cr8Elem(bill_period, cr8AttList(), cr8XMLFragList($bill_period)),
  $carriers, $items,
  cr8Elem(total, cr8AttList(), cr8XMLFragList($total))))
The user query navigates it with (bottom-up):
- Project: $elems = getContents($invoice)
- Unnest: $elem = unnest($elems)
- Select: getTagName($elem) = "account_number" (output: $account_number)
43
12 Composition Rules
Function | Composes with | Reduction
1. getTagName | cr8Elem(Tag, Atts, Clist) | Tag
2. getAttributes | cr8Elem(Tag, Atts, Clist) | Atts
3. getContents | cr8Elem(Tag, Atts, Clist) | Clist
4. getAttName | cr8Att(Name, Val) | Name
5. getAttValue | cr8Att(Name, Val) | Val
6. isElement | cr8Elem(Tag, Atts, Clist) | True
7. isElement | anything other than cr8Elem | False
8. isText | PCDATA | True
9. isText | anything other than PCDATA | False
10. unnest | aggXMLFrags(C) | C
11. unnest | cr8XMLFragList(C1, ..., Cn) | C1 ∪ ... ∪ Cn
12. unnest | cr8AttList(A1, ..., An) | A1 ∪ ... ∪ An
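A toy Python rewriter shows how rules 1, 3, 10, and 11 cancel the navigation from the view composition slide. Terms are frozen dataclasses (Var for an opaque column such as $account_number, Elem for cr8Elem, AggFrags for aggXMLFrags), and tuples stand for cr8XMLFragList. All names are hypothetical, and $carriers and $items are modelled as aggXMLFrags over the per-row elements built below their Groupby operators. Reducing the query's Project/Unnest/Select over the view's constructor leaves exactly the account_number element built from $account_number.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Var:
    """An opaque column reference such as $account_number."""
    name: str


@dataclass(frozen=True)
class Elem:
    """cr8Elem(Tag, Atts, Clist); contents stands for cr8XMLFragList(C1, ..., Cn)."""
    tag: str
    atts: tuple
    contents: tuple


@dataclass(frozen=True)
class AggFrags:
    """aggXMLFrags(C), where C is the per-row fragment built below the Groupby."""
    item: object


def get_tag_name(term):
    # Rule 1: getTagName(cr8Elem(Tag, Atts, Clist)) -> Tag
    if isinstance(term, Elem):
        return term.tag
    raise ValueError(f"no composition rule for getTagName({term!r})")


def get_contents(term):
    # Rule 3: getContents(cr8Elem(Tag, Atts, Clist)) -> Clist
    if isinstance(term, Elem):
        return term.contents
    raise ValueError(f"no composition rule for getContents({term!r})")


def unnest(term):
    # Rule 11: unnest(cr8XMLFragList(C1, ..., Cn)) -> C1 U ... U Cn
    if isinstance(term, tuple):
        return [item for c in term for item in unnest(c)]
    # Rule 10: unnest(aggXMLFrags(C)) -> C
    if isinstance(term, AggFrags):
        return [term.item]
    # A single constructed element is treated as a one-item list (assumption)
    return [term]


def compose_navigation(view_term, tag):
    """Reduce Select[getTagName($elem) = tag](Unnest(Project getContents(view)))."""
    return [e for e in unnest(get_contents(view_term)) if get_tag_name(e) == tag]


invoice_view = Elem("invoice", (), (
    Elem("account_number", (), (Var("$account_number"),)),
    Elem("bill_period", (), (Var("$bill_period"),)),
    AggFrags(Elem("carrier", (), (Var("$carrier"),))),
    AggFrags(Elem("itemized_call", (), ())),
    Elem("total", (), (Var("$total"),)),
))
print(compose_navigation(invoice_view, "account_number"))
# [Elem(tag='account_number', atts=(), contents=(Var(name='$account_number'),))]
```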
44
View Composition Example
Before composition, the user query's navigation operators (bottom-up: Project: $elems = getContents($invoice); Unnest: $elem = unnest($elems); Select: getTagName($elem) = "account_number") sit on top of the view's
Project: $invoice = <invoice> <account_number> $account_number </account_number> <bill_period> $bill_period </bill_period> $carriers $items <total> $total </total> </invoice>,
which is fed by a Join (Correlated) that produces $account_number, $bill_period, $total, $items, and $carriers.
After composition, the navigation operators cancel out and $account_number is taken directly from the Join (Correlated).
45
Computation Pushdown
Goal: XQGM → SQL queries + Tagger Graph
Step 1: Query decorrelation
- Correlated joins → outer unions
- Reference: P. Seshadri, et al. "Complex Query Decorrelation", ICDE 1996.
Step 2: Tagger pull-up
- XQGM → Tagger run-time graph, using the "sorted outer union"
- Reference: J. Shanmugasundaram, et al. "Efficiently Publishing Relational Data as XML Documents", VLDB 2000.
- Separation of SQL and Tagger operations: semantically equivalent fragments are identified by pattern.
46
Comparison | XPERANTO | NIAGARA
Goal | XQuery → SQL | XQuery → Algebra
Algebra | XQGM and Tagger Graph | XML Algebra
Data Model | Tables of lists of XML fragments | A collection of bags of vertices
Operators* | 10 operators with 13 functions | 12 operators
Variable Binding | Many temporary variables | No variables
Order | Sensitive | Semi-sensitive (missing orderby)
Regular Expressions | No support at operator level | Support at operator level
Text-in-context | No support | Support
Level of Abstraction | Function level (lower) | Logical level (higher)
Transition Rules | Composition rules and one (ad-hoc) semantically equivalent pattern | (ad-hoc) Equivalent rules
Operation History | Not maintained | Maintained
47
Conclusions and Future Work
WE NEED OUR OWN ALGEBRA.
More Reading:
- David Beech, et al. A Formal Data Model and Algebra for XML.
- Mary Fernandez, et al. An Algebra for XML Query.