2003. DSRG, Worcester Polytechnic Institute
Beyond the Rainbow: A Pot of Gold à la XML Database Projects
Faculty: Elke A. Rundensteiner
Students: Xin Zhang, Li Chen, Hong Su, Bradford Pielech, Brian Murphy, Mukesh Mulchandani, Luping Ding, Ling Wang, Kurt Deschler, Maged El Sayed, Katica Dimitrova, Song Wang, Bintou Kane, …
Motivation
XML is new, and here to stay:
– Universal, flexible representation of data
– De facto standard for information exchange
XQuery is useful, and here to stay:
– Powerful query language for XML
– De facto standard for XML querying
Plenitude of relevant new issues …
XML Paradigm
[Figure: XML documents (XML1, XML2, …, XMLn) and relational databases (RDB) connected over the Internet]
WWW: global-scale distributed information system for sharing data
XML queries and updates:
– searching
– querying
– integrating
– restructuring
– updating
What We Aim For…
[Figure: XML documents (XML1, XML2, …, XMLn) and relational databases (RDB) connected over the Internet]
XML data management middleware technology:
– efficient
– flexible
– scalable
– lightweight
– resource-sensitive
– adaptive
WPI Project Directions
– RAINBOW: exploiting RDB for XML management; algebraic XQuery processing
– XCube: flexible XML mapping tool; flexible loading/extracting of XML to/from RDB via XQuery
– Updating Virtual XML Views: update decomposition and trigger propagation
– MASS: native XML query engine; multi-axis compressed, order-preserving XML storage
WPI Project Directions (cont.)
– XCache: XML query caching; cache containment and query rewriting
– Materialized XML View Maintenance: incremental algebraic maintenance strategy
– SAXE: XML incremental updating and evolution; lightweight updating by update query rewriting
– RAINDROP: XQuery-based stream processing; adaptive on-the-fly multi-subscription optimization
THE RAINBOW PROJECT
XML meets Relational DBs
XML:
1) Emerging web standard
2) Flexible data representation
3) Powerful query language
Relational database:
1) Widely used to store business data
2) Efficient, reliable, secure DBMS
3) Mature query processing techniques
XML + RDB: the look and feel of an XML query system with the maturity and technology support of an RDB
Running Example
Relational tables:
book
Bid | Title
001 | TCP/IP Illustrated
002 | Data on the Web
prices
Bid | Price
001 | 65.95
002 | 34.95
Default XML view (dxv.xml):
<dxv>
  <book>
    <row><bid>001</bid><title>TCP/IP Illustrated</title></row>
    <row><bid>002</bid><title>Data on the Web</title></row>
  </book>
  <prices>
    <row><bid>001</bid><price>65.95</price></row>
    <row><bid>002</bid><price>34.95</price></row>
  </prices>
</dxv>
View query (defines prices.xml):
<prices>FOR $book IN document("dxv.xml")/book/row,
            $prices IN document("dxv.xml")/prices/row
        WHERE $book/bid = $prices/bid
        RETURN <book>$book/title, $prices/price</book>
</prices>
View result (prices.xml):
<prices>
  <book><title>TCP/IP Illustrated</title><price>65.95</price></book>
  <book><title>Data on the Web</title><price>34.95</price></book>
</prices>
User query:
<result>FOR $t IN document("prices.xml")/book/title
        RETURN $t</result>
Result:
<result><title>TCP/IP Illustrated</title><title>Data on the Web</title></result>
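The semantics of the running example can be mimicked in a few lines of Python. This is a minimal sketch, assuming the `book` and `prices` relations are held as lists of dicts (the names `BOOK`, `PRICES`, `view_prices`, and `user_query` are hypothetical): the view query is evaluated as a join on `bid`, and the user query as a navigation to `title`. Rainbow itself does not materialize the view this way; it composes the two queries algebraically.

```python
# Relational tables from the running example (bid kept as a string).
BOOK = [
    {"bid": "001", "title": "TCP/IP Illustrated"},
    {"bid": "002", "title": "Data on the Web"},
]
PRICES = [
    {"bid": "001", "price": "65.95"},
    {"bid": "002", "price": "34.95"},
]


def view_prices(book, prices):
    """View query: join book and prices rows on bid, keep title and price."""
    return [
        {"title": b["title"], "price": p["price"]}
        for b in book
        for p in prices
        if b["bid"] == p["bid"]
    ]


def user_query(view):
    """User query: FOR $t IN .../book/title RETURN $t."""
    return [entry["title"] for entry in view]


titles = user_query(view_prices(BOOK, PRICES))
print(titles)  # ['TCP/IP Illustrated', 'Data on the Web']
```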
XML Default View
Fixed and straightforward mapping scheme.
Books
Cover | Title | Authors
Paperback | Texas Holdem' | David Sklansky, Straight Flush
Paperback | Dracula | Bram Stoker
XML Default View
<DB>
  <BOOKS>
    <ROW>
      <Cover>Paperback</Cover>
      <TITLE>Texas Holdem'</TITLE>
      <AUTHORS>David Sklansky, Straight Flush</AUTHORS>
    </ROW>
    <ROW>
      <Cover>Paperback</Cover>
      <TITLE>Dracula</TITLE>
      <AUTHORS>Bram Stoker</AUTHORS>
    </ROW>
  </BOOKS>
  <…> …
</DB>
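A minimal sketch of such a fixed default-view mapping, using Python's standard `xml.etree.ElementTree`: one element per table, one `ROW` per tuple, and one child element per column, as in the `<DB>` example above. The function name `default_view` and the dict-based table representation are assumptions for illustration, not Rainbow's code.

```python
import xml.etree.ElementTree as ET


def default_view(db_tag, tables):
    """Fixed mapping: one element per table, one ROW per tuple, one child per column."""
    db = ET.Element(db_tag)
    for table_name, rows in tables.items():
        table_el = ET.SubElement(db, table_name)
        for row in rows:
            row_el = ET.SubElement(table_el, "ROW")
            for column, value in row.items():
                ET.SubElement(row_el, column).text = value
    return db


books = [{"Cover": "Paperback", "TITLE": "Dracula", "AUTHORS": "Bram Stoker"}]
xml_text = ET.tostring(default_view("DB", {"BOOKS": books}), encoding="unicode")
print(xml_text)
# <DB><BOOKS><ROW><Cover>Paperback</Cover><TITLE>Dracula</TITLE><AUTHORS>Bram Stoker</AUTHORS></ROW></BOOKS></DB>
```

Because the mapping ignores the data's meaning, it is trivially reversible; the price is that queries must navigate through the generic `ROW` level.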
Generic Loading
FUNCTION Q1($root) {
  LET $maintag := gettag($root)
  RETURN
    <$maintag $root/@*>
      FOR $actual IN $root/*
      LET $innertag := gettag($actual)
      RETURN
        IF ($actual/element())
        THEN Q1($actual)
        ELSE
          <$innertag $actual/@*>
            IF ($actual/text())
            THEN <PCDATA value=$actual/text()/>
            ELSE ""
          </$innertag>
    </$maintag>
}
Knowing the schema of the XML document to be loaded helps eliminate unnecessary parts of this query.
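The recursive generic loading query can be approximated in Python with `xml.etree.ElementTree`. This sketch (the name `generic_load` is hypothetical) copies each element and its attributes, recurses into elements that have element children, and wraps leaf text in `<PCDATA value=...>` as Q1 does. Treating whitespace-only text as absent is an assumption; the slide does not specify it.

```python
import xml.etree.ElementTree as ET


def generic_load(root):
    """Mirror of Q1: copy elements and attributes; leaf text becomes <PCDATA value=...>."""
    out = ET.Element(root.tag, dict(root.attrib))  # <$maintag $root/@*>
    for actual in root:  # FOR $actual IN $root/*
        if len(actual) > 0:  # $actual/element(): has element children, so recurse
            out.append(generic_load(actual))
        else:
            inner = ET.SubElement(out, actual.tag, dict(actual.attrib))
            # $actual/text(); whitespace-only text is treated as absent (assumption)
            if actual.text and actual.text.strip():
                ET.SubElement(inner, "PCDATA", {"value": actual.text})
    return out


doc = ET.fromstring('<BOOK cover="Paperback"><TITLE>Dracula</TITLE><NOTE/></BOOK>')
result = ET.tostring(generic_load(doc), encoding="unicode")
print(result)
# <BOOK cover="Paperback"><TITLE><PCDATA value="Dracula" /></TITLE><NOTE /></BOOK>
```

The sketch works for any input document, which is exactly its cost: every element goes through the same generic test-and-recurse path.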
Instantiation
[Figure: the Instantiator takes an XML Schema and the generic (recursive) XQuery expression and produces a flat, schema-specific XQuery expression]
The generic loading XQuery expression is recursive:
+ It works for every XML document.
– Many recursive calls return no value.
– Unnecessary FOR loops, IF clauses, and getName() function calls.
Instantiation (Example)
FUNCTION Q1($root) {
  <BOOKLIST>
    FOR $book IN $root/BOOK
    RETURN
      <BOOK $book/@cover>
        <TITLE><PCDATA value=$book/TITLE/text()/></TITLE>
        <AUTHOR>
          FOR $name IN $book/AUTHOR/NAME
          RETURN <NAME><PCDATA value=$book/A…/></NAME>
        </AUTHOR>
      </BOOK>
  </BOOKLIST>
}
Instantiated loading query (first step of the CLOCK mapping scheme): short, non-recursive, and more efficient. But: it depends on the XML schema!
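For comparison, a schema-specific loader in the same style needs no recursion and no generic tag lookups. This is a sketch assuming documents follow a BOOKLIST/BOOK(@cover)/{TITLE, AUTHOR/NAME} schema; the slide elides the NAME value expression, so reading the NAME element's own text here is a guess, and `load_booklist` is a hypothetical name.

```python
import xml.etree.ElementTree as ET


def load_booklist(root):
    """Flat loader for an assumed BOOKLIST/BOOK/{TITLE, AUTHOR/NAME} schema."""
    out = ET.Element("BOOKLIST")
    for book in root.findall("BOOK"):  # FOR $book IN $root/BOOK
        book_el = ET.SubElement(out, "BOOK")
        if "cover" in book.attrib:  # $book/@cover
            book_el.set("cover", book.get("cover"))
        title_el = ET.SubElement(book_el, "TITLE")
        ET.SubElement(title_el, "PCDATA", {"value": book.findtext("TITLE", "")})
        author_el = ET.SubElement(book_el, "AUTHOR")
        for name in book.findall("AUTHOR/NAME"):  # FOR $name IN $book/AUTHOR/NAME
            name_el = ET.SubElement(author_el, "NAME")
            # Guess for the elided value expression: the NAME element's own text.
            ET.SubElement(name_el, "PCDATA", {"value": name.text or ""})
    return out


doc = ET.fromstring(
    '<BOOKLIST><BOOK cover="Paperback"><TITLE>Dracula</TITLE>'
    "<AUTHOR><NAME>Bram Stoker</NAME></AUTHOR></BOOK></BOOKLIST>"
)
loaded = ET.tostring(load_booklist(doc), encoding="unicode")
print(loaded)
```

For conforming documents it yields the same PCDATA-wrapped shape as the generic loading query, but it breaks as soon as the schema changes, which is the trade-off the slide points out.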
Flexible Mapping Management
[Figure: mapping between XML and an RDB through the RDB default view; components: XQuery (Load), XQuery (Extract), Reverser; data: XML, XML', Relation, Relation'; mappings labeled F, f, G, g, H]
XCube in a Nutshell
– Easy to use (no new transformation language)
– Flexible (interchangeable XQuery expressions)
– Adaptable (to workload, data specifics, …)
– General (schema independent)
– Extendable (with new mapping schemes)
– Tunable (loading manager)
1. Generic XQuery loading expressions
2. XQuery load expression instantiation
Architecture
[Figure: Rainbow architecture. The XAT Generator turns the User XQuery and the View XQuery into a User XAT and a View XAT; the XAT Merger, XAT Decorrelator, and XAT Optimizer process the XAT; the SQL Generator sends SQL to the RDBMS, and the XAT Executor turns the returned tuples into the user query results in XML. Queries are posed over virtual XML documents as well as XML documents. XAT: XML Algebra Tree.]
XQuery-Level Optimization
– XAT: XML Algebra Tree model; XAT algebraic query plan
– XAT query plan optimization
– XAT query plan reduction
User Query
<result>FOR $t IN document("prices.xml")/book/title
        RETURN $t</result>
User XML Algebra Tree (XAT), from the root down:
1: col3
2: T <result>$t</result> → col3
3: Agg
6: Navigate R0, book/title → $t
7: S "prices.xml" → R0
View Query
<prices>FOR $book IN document("dxv.xml")/book/row,
            $prices IN document("dxv.xml")/prices/row
        WHERE $book/bid = $prices/bid
        RETURN <book>$book/title, $prices/price</book>
</prices>
View XML Algebra Tree (XAT), operators by node number:
11: T <prices>col5</prices> → col4
12: Agg
22: T <book>[col10][col12]</book> → col5
23: Navigate $book, title → col10
25: Navigate $prices, price → col12
26: Select col6 = col7
31: combines the $book and $prices branches
27: Navigate $book, bid → col6
14: Navigate R1, /book/row → $book
15: S "dxv.xml" → R1
28: Navigate $prices, bid → col7
20: Navigate R3, /prices/row → $prices
21: S "dxv.xml" → R3
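Merged XML Algebra Tree (XAT): the user query on top of the view query
User query part:
1: col3
2: T <result>$t</result> → col3
3: Agg
6: Navigate R0, book/title → $t
7: col4 → R0 (the view's output takes the place of the "prices.xml" source)
View query part:
11: T <prices>col5</prices> → col4
12: Agg
22: T <book>[col10][col12]</book> → col5
23: Navigate $book, title → col10
25: Navigate $prices, price → col12
26: Select col6 = col7
31: combines the $book and $prices branches
27: Navigate $book, bid → col6
14: Navigate R1, /book/row → $book
15: S "dxv.xml" → R1
28: Navigate $prices, bid → col7
20: Navigate R3, /prices/row → $prices
21: S "dxv.xml" → R3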
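View composition can be pictured as tree substitution. In this minimal sketch, the `Node` class, the operator names, and the `Rename` node are hypothetical, not Rainbow's classes. `merge` replaces the user XAT's `Source("prices.xml")` with the view XAT and renames the view's output column `col4` to `R0`, so the user operators above it stay unchanged, mirroring node 7 of the merged tree.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical XAT node: operator name, output column, arguments, children."""
    op: str
    out: str = ""
    args: tuple = ()
    children: list = field(default_factory=list)


def merge(user, view_doc, view_root):
    """Return a copy of the user XAT with every Source(view_doc) replaced by the view XAT.

    The view's output column is renamed to the column the Source produced,
    so the operators above it need no change.
    """
    if user.op == "Source" and user.args == (view_doc,):
        return Node("Rename", user.out, (view_root.out,), [view_root])
    return Node(user.op, user.out, user.args,
                [merge(child, view_doc, view_root) for child in user.children])


# Heavily abbreviated view XAT: only its root tagger (node 11) and one child.
view_root = Node("Tagger", "col4", ("<prices>col5</prices>",), [Node("Agg")])
# User XAT of the running example (nodes 1, 2, 3, 6, 7).
user_root = Node("Output", "", ("col3",), [
    Node("Tagger", "col3", ("<result>$t</result>",), [
        Node("Agg", "", (), [
            Node("Navigate", "$t", ("R0", "book/title"), [
                Node("Source", "R0", ("prices.xml",)),
            ]),
        ]),
    ]),
])
merged = merge(user_root, "prices.xml", view_root)
```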
XQuery-Level Optimization
– XML algebra representation: XAT
– XAT query plan rewriting
– XAT query plan reduction
XAT Rewrite
Query optimization at the logical algebra level.
Goals:
– Redundancy elimination
– Computation pushdown
Technique: equivalence rewrite rules.
Heuristics:
– Push down navigations
– Remove construction of intermediate results
– Combine multiple operators
Before Navigation Pushdown
[Figure: the merged XAT (user query over view query) before any rewriting]
After Navigation Pushdown
User query part:
1: col3
2: T <result>$t</result> → col3
3: Agg
6: Navigate R0, book/title → $t
View query part:
11: T <prices>col5</prices> → R0
12: Agg
22: T <book>[col10][col12]</book> → col5
26: Select col6 = col7
31: combines the $book and $prices branches
23: Navigate $book, title → col10
27: Navigate $book, bid → col6
14: Navigate R1, /book/row → $book
15: S "dxv.xml" → R1
25: Navigate $prices, price → col12
28: Navigate $prices, bid → col7
20: Navigate R3, /prices/row → $prices
21: S "dxv.xml" → R3
Remove any Taggers?
[Figure: the XAT after navigation pushdown]
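After Tagger Cancel Out
User query part:
1: col3
2: T <result>$t</result> → col3
3: Agg
View query part:
26: Select col6 = col7
31: combines the $book and $prices branches
23: Navigate $book, title → $t
27: Navigate $book, bid → col6
14: Navigate R1, /book/row → $book
15: S "dxv.xml" → R1
25: Navigate $prices, price → col12
28: Navigate $prices, bid → col7
20: Navigate R3, /prices/row → $prices
21: S "dxv.xml" → R3
Taggers 11 and 22, Agg 12, and Navigate 6 cancel out; node 23 now produces $t directly.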
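The cancel-out step rests on a simple equivalence: navigating into an element that a tagger has just constructed is the same as reading the column the tagger put there. Below is a minimal sketch of that rule for a single child step, assuming each content column remembers the navigation that produced it; `cancel_out` and its arguments are hypothetical names, and a full rule would also handle longer paths and text content.

```python
def cancel_out(outer_tag, content_cols, col_source, nav_path):
    """Rewrite a navigation over a tagger's output into one over the tagger's input.

    outer_tag:    element built by the tagger, e.g. "book" for <book>[col10][col12]</book>
    content_cols: columns placed inside that element, e.g. ["col10", "col12"]
    col_source:   navigation that produced each column, e.g. {"col10": ("$book", "title")}
    nav_path:     path applied to the tagged output, e.g. "book/title"
    Returns the equivalent (input_variable, path), or None if no column matches.
    """
    first, _, rest = nav_path.partition("/")
    if first != outer_tag:
        return None
    for col in content_cols:
        var, path = col_source[col]
        if path.split("/")[-1] == rest:  # the child element this column contributes
            return var, path
    return None


sources = {"col10": ("$book", "title"), "col12": ("$prices", "price")}
print(cancel_out("book", ["col10", "col12"], sources, "book/title"))  # ('$book', 'title')
```

Applied to the running example, the user's `book/title` navigation resolves to `$book, title`, which is why node 23 can produce `$t` directly.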
After Making Join
User query part:
1: col3
2: T <result>$t</result> → col3
3: Agg
View query part:
31: JOIN col6 = col7
23: Navigate $book, title → $t
27: Navigate $book, bid → col6
14: Navigate R1, /book/row → $book
15: S "dxv.xml" → R1
25: Navigate $prices, price → col12
28: Navigate $prices, bid → col7
20: Navigate R3, /prices/row → $prices
21: S "dxv.xml" → R3
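Making the join can be expressed as a local rewrite: a select whose input is a product of two branches becomes one join with the same condition. This sketch assumes node 31 combined the branches as a cartesian product before this step; `Op` and `make_join` are hypothetical names, and a real optimizer would also check which columns the condition references.

```python
from dataclasses import dataclass


@dataclass
class Op:
    """Hypothetical plan node: kind ("select", "product", "join", or a leaf label)."""
    kind: str
    cond: str = ""
    children: tuple = ()


def make_join(op):
    """Bottom-up: fold select(cond) directly over a product into join(cond)."""
    children = tuple(make_join(child) for child in op.children)
    if op.kind == "select" and len(children) == 1 and children[0].kind == "product":
        return Op("join", op.cond, children[0].children)
    return Op(op.kind, op.cond, children)


plan = Op("select", "col6=col7",
          (Op("product", "", (Op("book-branch"), Op("prices-branch"))),))
joined = make_join(plan)
print(joined.kind, joined.cond)  # join col6=col7
```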
XAT Cleanup
Why: the SQL engine cannot reduce redundancy in XQuery.
How:
– Data redundancy, by schema cleanup: each operator produces, consumes, and modifies some columns; the minimum schema is then computed.
– Tree redundancy, by unused operator cutting: cutting matrix generation, required columns analysis, operator cutting.
XAT Operator Properties
– Produced: new column generated by the operator (examples: …, S, T)
– Consumed: columns required by the operator (e.g., the join col6 = col7 consumes col6 and col7)
– Modified: columns modified by the operator (e.g., Agg modifies $t)
Schema Computation
Node | Parent | Consumed | Produced | Old schema
21 | 20 | {} | {R3} | {R3}
20 | 28 | {R3} | {$prices} | {R3, $prices}
28 | 25 | {$prices} | {col7} | {R3, $prices, col7}
25 | 31 | {$prices} | {col12} | {R3, $prices, col7, col12}
15 | 14 | {} | {R1} | {R1}
14 | 27 | {R1} | {$book} | {R1, $book}
27 | 23 | {$book} | {col6} | {R1, $book, col6}
23 | 31 | {$book} | {$t} | {R1, $book, col6, $t}
31 | 3 | {col6, col7} | {} | {R1, $book, col6, $t, R3, $prices, col7, col12}
3 | 2 | {} | {} | {R1, $book, col6, $t, R3, $prices, col7, col12}
2 | 1 | {$t} | {col3} | {col3, R1, $book, col6, $t, R3, $prices, col7, col12}
1 | | {col3} | {} | {col3}
[Figure: the XAT after making the join: nodes 1, 2, 3, 31, 23, 27, 14, 15, 25, 28, 20, 21]
Schema Computation (P = produced, C = consumed)
#   | Parent | col3 | $t | col6 | col7 | $book | R1 | col12 | $prices | R3 | New schema
1   |        | C    |    |      |      |       |    |       |         |    | {col3}
2   | 1      | P    | C  |      |      |       |    |       |         |    | {col3}
3   | 2      |      |    |      |      |       |    |       |         |    | {$t}
31* | 3      |      |    | C    | C    |       |    |       |         |    | {$t}
23  | 31     |      | P  |      |      | C     |    |       |         |    | {col6, $t}
27  | 23     |      |    | P    |      | C     |    |       |         |    | {$book, col6}
14  | 27     |      |    |      |      | P     | C  |       |         |    | {$book}
15  | 14     |      |    |      |      |       | P  |       |         |    | {R1}
25  | 31     |      |    |      |      |       |    | P     | C       |    | {col7, col12}
28  | 25     |      |    |      | P    |       |    |       | C       |    | {$prices, col7}
20  | 28     |      |    |      |      |       |    |       | P       | C  | {$prices}
21  | 20     |      |    |      |      |       |    |       |         | P  | {R3}
*We assume the join didn’t modify $t. Otherwise, only node 25 will be deleted.
Intuition: Don’t keep anything that’s not used later.
Schema Cleanup Result
Node | Original schema | Minimum schema
1 | {col3, R1, $book, col6, $t, R3, $prices, col7, col12} | {col3}
2 | {col3, R1, $book, col6, $t, R3, $prices, col7, col12} | {col3}
3 | {R1, $book, col6, $t, R3, $prices, col7, col12} | {$t}
31 | {R1, $book, col6, $t, R3, $prices, col7, col12} | {$t}
23 | {R1, $book, col6, $t} | {col6, $t}
27 | {R1, $book, col6} | {$book, col6}
14 | {R1, $book} | {$book}
15 | {R1} | {R1}
25 | {R3, $prices, col7, col12} | {col7, col12}
28 | {R3, $prices, col7} | {$prices, col7}
20 | {R3, $prices} | {$prices}
21 | {R3} | {R3}
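The slides do not spell out the schema equations, but one formulation reproduces the minimum schemas in this table. Each node keeps the columns its ancestors still need, restricted to its original schema, plus the columns it produces. A node's ancestors need what its parent consumes, plus whatever the parent's own ancestors need that the parent does not produce itself. The sketch encodes the running example; `needed` and `minimum_schema` are hypothetical helpers, and this is an assumed reconstruction, not the authors' definition.

```python
from functools import lru_cache

# XAT after making the join (node -> parent; None marks the root).
PARENT = {1: None, 2: 1, 3: 2, 31: 3, 23: 31, 27: 23, 14: 27, 15: 14,
          25: 31, 28: 25, 20: 28, 21: 20}
CONSUMED = {1: {"col3"}, 2: {"$t"}, 3: set(), 31: {"col6", "col7"},
            23: {"$book"}, 27: {"$book"}, 14: {"R1"}, 15: set(),
            25: {"$prices"}, 28: {"$prices"}, 20: {"R3"}, 21: set()}
PRODUCED = {1: set(), 2: {"col3"}, 3: set(), 31: set(),
            23: {"$t"}, 27: {"col6"}, 14: {"$book"}, 15: {"R1"},
            25: {"col12"}, 28: {"col7"}, 20: {"$prices"}, 21: {"R3"}}
ALL = {"col3", "R1", "$book", "col6", "$t", "R3", "$prices", "col7", "col12"}
OLD = {1: ALL, 2: ALL, 3: ALL - {"col3"}, 31: ALL - {"col3"},
       23: {"R1", "$book", "col6", "$t"}, 27: {"R1", "$book", "col6"},
       14: {"R1", "$book"}, 15: {"R1"},
       25: {"R3", "$prices", "col7", "col12"}, 28: {"R3", "$prices", "col7"},
       20: {"R3", "$prices"}, 21: {"R3"}}


@lru_cache(maxsize=None)
def needed(node):
    """Columns the operators above `node` still need (assumed formulation)."""
    parent = PARENT[node]
    if parent is None:
        return frozenset(CONSUMED[node])  # the root exposes what it consumes
    return frozenset(CONSUMED[parent] | (needed(parent) - PRODUCED[parent]))


def minimum_schema(node):
    """Needed columns the node already carries, plus the columns it produces."""
    return (needed(node) & OLD[node]) | PRODUCED[node]


for node in PARENT:
    print(node, sorted(minimum_schema(node)))
```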
XAT Cleanup
– Schema cleanup: each operator produces, consumes, and modifies some columns; the minimum schema is then computed.
– Unused operator cutting: cutting matrix generation, required columns analysis, operator cutting.
Cutting Matrix
Purpose: get rid of unused operators.
Equations:
– Propagation of modified columns
– Propagation of required columns
– Identification of cuttable nodes
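A minimal sketch of the cutting idea, assuming the produced and consumed sets from the schema computation and a single top-down pass. Required columns start at the root; a node is kept only if it produces or modifies a required column, and only kept nodes add what they consume. The names are hypothetical, and the single global `required` set is a simplification that works for this tree. With the footnote's alternative assumption (the join modifies `$t`), the same pass cuts only node 25.

```python
# XAT after making the join (node -> parent; None marks the root).
PARENT = {1: None, 2: 1, 3: 2, 31: 3, 23: 31, 27: 23, 14: 27, 15: 14,
          25: 31, 28: 25, 20: 28, 21: 20}
CONSUMED = {1: {"col3"}, 2: {"$t"}, 3: set(), 31: {"col6", "col7"},
            23: {"$book"}, 27: {"$book"}, 14: {"R1"}, 15: set(),
            25: {"$prices"}, 28: {"$prices"}, 20: {"R3"}, 21: set()}
PRODUCED = {1: set(), 2: {"col3"}, 3: set(), 31: set(),
            23: {"$t"}, 27: {"col6"}, 14: {"$book"}, 15: {"R1"},
            25: {"col12"}, 28: {"col7"}, 20: {"$prices"}, 21: {"R3"}}


def cut_operators(parent, consumed, produced, modified):
    """Return (required columns, cuttable nodes) from one top-down pass."""
    children = {node: [] for node in parent}
    root = None
    for node, par in parent.items():
        if par is None:
            root = node
        else:
            children[par].append(node)
    required = set(consumed[root])
    cuttable = set()
    stack = [root]  # depth-first, so each node is visited after all its ancestors
    while stack:
        node = stack.pop()
        touches = (produced[node] | modified.get(node, set())) & required
        if node == root or touches:
            required |= consumed[node]  # a kept node needs its inputs
        else:
            cuttable.add(node)
        stack.extend(children[node])
    return required, cuttable


# Agg (node 3) modifies $t; the join (node 31) is assumed not to modify $t.
required, cuttable = cut_operators(PARENT, CONSUMED, PRODUCED, {3: {"$t"}})
print(sorted(required), sorted(cuttable))
```

The sketch only identifies cuttable nodes; removing them means reattaching each surviving child (here node 23) to the nearest kept ancestor.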
Matrix Computation (P = produced, C = consumed, M = modified, R = required, X = cut)
#   | Parent | col3 | $t | col6 | col7 | $book | R1 | col12 | $prices | R3 | Cut?
1   |        | C    |    |      |      |       |    |       |         |    |
2   | 1      | P    | C  |      |      |       |    |       |         |    |
3   | 2      | -    | -  | -    | -    | -     | -  | -     | -       | -  |
31* | 3      |      |    | C    | C    |       |    |       |         |    |
23  | 31     |      | P  |      |      | C     |    |       |         |    |
27  | 23     |      |    | P    |      | C     |    |       |         |    |
14  | 27     |      |    |      |      | P     | C  |       |         |    |
15  | 14     |      |    |      |      |       | P  |       |         |    |
25  | 31     |      |    |      |      |       |    | P     | C       |    |
28  | 25     |      |    |      | P    |       |    |       | C       |    |
20  | 28     |      |    |      |      |       |    |       | P       | C  |
21  | 20     |      |    |      |      |       |    |       |         | P  |
Matrix Computation (Cont. 1)
#   | Parent | col3 | $t | col6 | col7 | $book | R1 | col12 | $prices | R3 | Cut?
1   |        | R    | R  |      |      | R     | R  |       |         |    |
2   | 1      | P    | C  |      |      |       |    |       |         |    |
3   | 2      | -    | M  | -    | -    | -     | -  | -     | -       | -  |
31* | 3      |      |    | C    | C    |       |    |       |         |    |
23  | 31     |      | P  |      |      | C     |    |       |         |    |
27  | 23     |      |    | P    |      | C     |    |       |         |    |
14  | 27     |      |    |      |      | P     | C  |       |         |    |
15  | 14     |      |    |      |      |       | P  |       |         |    |
25  | 31     |      |    |      |      |       |    | P     | C       |    |
28  | 25     |      |    |      | P    |       |    |       | C       |    |
20  | 28     |      |    |      |      |       |    |       | P       | C  |
21  | 20     |      |    |      |      |       |    |       |         | P  |
Intuition: keep only the columns required to produce the final result.
Matrix Computation (Cont. 2)
#   | Parent | col3 | $t | col6 | col7 | $book | R1 | col12 | $prices | R3 | Cut?
1   |        | R    | R  |      |      | R     | R  |       |         |    |
2   | 1      | P    | C  |      |      |       |    |       |         |    |
3   | 2      | -    | M  | -    | -    | -     | -  | -     | -       | -  |
31* | 3      |      |    | C    | C    |       |    |       |         |    | X
23  | 31     |      | P  |      |      | C     |    |       |         |    |
27  | 23     |      |    | P    |      | C     |    |       |         |    | X
14  | 27     |      |    |      |      | P     | C  |       |         |    |
15  | 14     |      |    |      |      |       | P  |       |         |    |
25  | 31     |      |    |      |      |       |    | P     | C       |    | X
28  | 25     |      |    |      | P    |       |    |       | C       |    | X
20  | 28     |      |    |      |      |       |    |       | P       | C  | X
21  | 20     |      |    |      |      |       |    |       |         | P  | X
XAT after Cutting
The XAT after making the join is reduced to:
1: col3
2: T <result>$t</result> → col3
3: Agg
23: Navigate $book, title → $t
14: Navigate R1, /book/row → $book
15: S "dxv.xml" → R1
SQL Generated
[Figure: the XAT after cutting (nodes 1, 2, 3, 23, 14, 15) beside the XAT before cutting]
SQL generated from the XAT before cutting:
SELECT "$book".title AS "$t", "$book".bid AS "col6", "$prices".price AS "col12", "$prices".bid AS "col7"
FROM book "$book", prices "$prices"
WHERE "$book".bid = "$prices".bid
SQL generated from the XAT after cutting:
SELECT "$book".title AS "$t"
FROM book "$book"
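The two generated queries can be checked against an in-memory SQLite copy of the running example using Python's standard `sqlite3` module. This is a sketch, not Rainbow's test setup. The two queries return the same titles here because every book row has exactly one matching prices row; dropping the join is only safe under an assumption of that kind.

```python
import sqlite3

# In-memory copy of the running example's relations.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE book (bid TEXT, title TEXT);
    CREATE TABLE prices (bid TEXT, price REAL);
    INSERT INTO book VALUES ('001', 'TCP/IP Illustrated'), ('002', 'Data on the Web');
    INSERT INTO prices VALUES ('001', 65.95), ('002', 34.95);
""")

# SQL for the XAT before cutting (join kept).
before = """
    SELECT "$book".title AS "$t", "$book".bid AS "col6",
           "$prices".price AS "col12", "$prices".bid AS "col7"
    FROM book "$book", prices "$prices"
    WHERE "$book".bid = "$prices".bid
"""
# SQL for the XAT after cutting (single table scan).
after = 'SELECT "$book".title AS "$t" FROM book "$book"'

titles_before = sorted(row[0] for row in conn.execute(before))
titles_after = sorted(row[0] for row in conn.execute(after))
print(titles_before == titles_after)  # True
```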
Performance Gain in Execution
[Figure: execution time (ms, 0 to 50,000) versus number of elements in the XML dataset (10 to 2,510) for four settings: None, Rewrite, Cleanup, Rewrite+Cleanup]
Rainbow Engine Overhead
[Figure: pie chart of Rainbow engine overhead across XAT generation, XAT rewrite, decorrelation, and XAT cleanup (ms); slice values 1%, 42%, 2%, and 55%; total: 32,522 ms]
Acknowledgment: XQuery parsing uses the Kweelt parser.
http://davis.wpi.edu/dsrg/rainbow
https://sourceforge.net/projects/rainbow-engine/
Related Work
– XPERANTO [VLDBJ 2000]: XQGM vs. XAT; XQuery views over RDB, extension by UDFs for XML features
– SilkRoute [IEEE 2001 (24:2)]: XQuery views over RDB, efficient SQL generation
– AGORA [VLDB 2000]: syntax-level rewriting
Summary
– Efficient XQuery processing
– XML Algebra Tree (XAT)
– XAT optimization:
  – rewrite using equivalence rules
  – cleanup: schema cleanup and operator cutting
– Prototype system implementation
What’s Next in Rainbow?
– RAINBOW I: Rainstore and more. Go “physical”: XQuery processing and optimization
– RAINBOW II: optimization; multi-query optimization using materialized views
– Updating Virtual XML Views: update decomposition and trigger propagation
– Materialized XML View Maintenance: incremental algebraic maintenance strategy
– Distributed Integration Engine: distributed query processing across remote servers
What’s Next: Raindrop?
– On-the-fly stream processing (automata)
– Constraint-driven query optimization
– Multi-query optimization (subscriptions)
– On-the-fly query plan migration
– Resource-sensitive rescheduling