XQuery Reloaded
Roger Bamford, Vinayak Borkar, Matthias Brantner, Peter M. Fischer, Daniela Florescu, David Graf, Donald Kossmann, Tim Kraska, Dan Muresan, Sorin Nasoi, Markos Zacharioudakis
Systems Group/ETH Zurich 27.09.2009
[Figure: Google searches for “XQuery” (normalized, 0 to 100), 2004-01-04 to 2009-07-05]
[Figure: Google searches for “SQL” (normalized, 0 to 100), 2004-01-04 to 2009-07-26]
XQuery folklore
XML and XQuery are slow
XQuery is complicated
Legacy of XML, Namespaces, Schema, XPath
Bad products
No people
...
XQuery folklore
XML and XQuery are slow
  partly true, but products are catching up
  highly optimizable (like SQL; better than Java) [Boncz06]
XQuery is complicated
  is skiing more complicated than snowboarding?
  try to process Web pages and RSS feeds with Java!
Legacy of XML, Namespaces, Schema, XPath
  yes, but there is no alternative (relevance!)
Bad products
  huge investments: research projects, big players
No people
  courses at all top places (ETH, Stanford, etc.)
...
[http://www.ibm.com/developerworks/xml/library/x-xqmyth.html]
This talk
Clear your mind and get rid of prejudices
Explain XQuery in a nutshell
XQuery today and tomorrow
Introduce Zorba, the MySQL for XQuery
Show some fancy usages of Zorba
Not in this talk: How to design an XQuery processor. If you are interested in that, talk to one of us after the talk.
The roots of XQuery…
Origins go back to the QL workshop 1998
Standardized by the W3C (recommendation since 2007)
Started out as a query language
Closed data model, composition
  results of expressions can be input for expressions
Compliant with other standards
  XML, XML Schema, XPath, Web Services, ...
Example query:
  for $empl in //employees
  let $name := $empl/name
  where $empl/salary > 5000
  order by $name
  return $name
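Read step by step, the example query iterates over all employees elements, binds each name, keeps the entries with a salary above 5000, and returns the names in sorted order. A minimal Python sketch of the same logic, assuming a hypothetical sample document (the `company` wrapper, the names, and the salaries are made up for illustration) and returning name strings rather than XML nodes as XQuery would:

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; only the element names come from the example query.
SAMPLE = """
<company>
  <employees><name>Carol</name><salary>7000</salary></employees>
  <employees><name>Alice</name><salary>6000</salary></employees>
  <employees><name>Bob</name><salary>4000</salary></employees>
</company>
"""

def high_earner_names(xml_text, threshold=5000):
    """Return the sorted names of all entries whose salary exceeds threshold."""
    root = ET.fromstring(xml_text)
    names = []
    # for ... in //employees: visit every employees element in the document
    for empl in root.iter("employees"):
        # let: bind the name of the current entry
        name = empl.findtext("name")
        # where: keep only entries with salary > threshold
        salary = float(empl.findtext("salary", default="0"))
        if salary > threshold:
            names.append(name)
    # order by / return: hand back the names in sorted order
    return sorted(names)

print(high_earner_names(SAMPLE))  # ['Alice', 'Carol']
```

The explicit parsing and type conversion is what the FLWOR expression leaves implicit.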
Today, XQuery is much more….
XQuery = Query + Update + Fulltext + Scripting + Streaming + Libraries
X
  XQuery is the only language for XML, but that does not mean that XML is all it can do
    CSV, JSON, HTML, …
    spectrum: structured data to unstructured text
Query
  XQuery has joins, group-by, sorting, etc., but that does not mean that it is only good for the DB
    by now, full-fledged programming language
    modules for structured programming
The name “XQuery” is a misnomer!!!
XQuery is alive
Most successful in the middle-tier
  Data integration
  Configuration
  Reporting
but also in the database world
(Oracle has 8000+ customers reporting XQuery bugs)
…why else would so many people from different companies care to have their names on the paper?
XQuery in the future: Gartner’s Top Ten disruptive technologies for 2008 to 2012
  Cloud computing and cloud/web platforms
  Multi-core and hybrid processors
  Virtualisation and fabric computing
  Social networks and social software
  Web mashups
  User interface
  Ubiquitous computing
  Contextual computing
  Augmented reality
  Semantics
What is needed: A programming language for the Web
Machine-to-Machine Communication
  between/inside distributed systems
  across company boundaries
Machine-to-Human Communication
  over the browser, event-based interaction
Variety of workloads: Updates, OLTP/OLAP/Streaming queries, structured and semi-structured data, varying and evolving over time
IMHO XQuery is the best starting point
XQuery is not perfect, but solves many of the problems
XQuery mashups
“Gartner predicts that web mashups, which mix content from publicly available sources, will be the dominant model (80 percent) for the creation of new enterprise applications.”
Requires integrating different sources
XQuery is made for the web
  Works natively with XML and JSON
  Has native support for REST and Web Services
XQuery is a programming language
XQuery and cloud computing/Web platforms
[Figure: conventional multi-tier Web platform between an incoming and an outgoing XML message]
communication with the world (XML)
  XML Protocol (SOAP), XML Schema validation, XSLT/XQuery evaluation
conversion: XML to Java/C# and back to XML
application logic (Java/C#, JavaScript)
  application logic, data validation, error handling, caching, replication and distribution
conversion: Java/C# to SQL and back to Java/C#
data persistence and integrity; transactions (SQL)
  SQL queries and updates, integrity constraints, triggers, transactions, etc.
XQuery and cloud computing/Web platforms
[Figure: the same platform as a single XQuery tier between an incoming and an outgoing XML message]
communication with the world, application logic, data persistence and integrity; transactions
  XQuery (XML Schema, XML protocols, …)
No impedance mismatch
Reduce the number of “hops”
XQuery is made for the web standards
XQuery and new hardware / multi-cores
Requires a highly parallelizable programming language
Declarative (functional) programming model
XQuery is highly optimizable and parallelizable (just as SQL is)
Made for bulk processing
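Why a declarative, side-effect-free model helps with parallelism can be sketched in a few lines: when the function applied to each item touches no shared state, the items can be processed in any order or concurrently, and the result is the same. The sketch below is Python, not XQuery; the data and the function names are hypothetical, and it only shows that the results agree, not any speedup.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical input: one record per department.
DEPARTMENTS = [
    {"name": "sales", "salaries": [4000, 5500, 6100]},
    {"name": "research", "salaries": [7000, 7200]},
    {"name": "support", "salaries": [3800]},
]

def total_salary(department):
    # Pure function: reads only its argument and modifies nothing.
    return department["name"], sum(department["salaries"])

def run_sequential(items):
    # Bulk operation expressed as a plain loop over all items.
    return [total_salary(item) for item in items]

def run_parallel(items, workers=4):
    # Executor.map yields results in input order, whichever worker finishes first.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(total_salary, items))

print(run_sequential(DEPARTMENTS) == run_parallel(DEPARTMENTS))  # True
```

Because `total_salary` has no side effects, a runtime is free to choose either strategy; a query optimizer has the same freedom with a declarative query.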
If you do not buy all that: Still, XML is out there and it will not disappear
XML is the best choice for communication data
  general: web services (SOAP, WSDL), REST, RSS
  specific: XBRL, HL7, ebXML, RosettaNet, ...
XML is the best choice for meta-data and code
  configuration files, XMI (Eclipse), XForms (apps), XMP (photography), XAML / InfoPath (UIs), ...
XML is the best choice for documents
  XHTML, SVG, OpenXML (Office), UBL (business)
XQuery is the cheapest way to process XML (and JSON)
What is missing is a MySQL for XQuery
Zorba
Intended to be the MySQL for XQuery
Developed by the FLWOR Foundation
A team of 5 full-time programmers and several volunteers
Contributing companies/organisations
http://www.zorba-xquery.com/
What makes Zorba different?
Not a “research” project!!!
Open source (Apache License)
Zorba is a query processor like MySQL
Exchangeable store
  comes with an in-memory store
  feel free to implement other stores (like 28msec)
Zorba is designed for compliance first!
Feature complete (except for full-text)
  full standard and not just the researchy features
C++, interfaces to all main languages (Java, PHP, …)
Zorba Architecture
Zorba libraries/features
XQuery 1.1, Update, Scripting
Debugger
Eval
JSON
REST/HTTP
XQDoc
PDF
E-Mail
Collections
Tidy
Events
...
Cool stuff with Zorba: XQuery in the browser
[Figure: Zorba as a plug-in in the Web browser; data access and manipulation between the browser DOM and the Zorba store]
http://www.xqib.org
Cool stuff with Zorba: 28msec WebServer
[Figure: three Zorba instances]
http://www.28msec.com
Cool stuff with Zorba: Xadoop
Hadoop + Zorba
Position XQuery as an alternative to Pig/Hive/etc.
Well-suited for semi-structured data
Use cases
  pre-processing financial data
  analysis of blogs/Wikipedia etc.
  …
Pre-packaged on EC2
So far, just a toy project with an open end...
XQuery Benchmarking Service
http://xqbench.org
Supports several benchmarks
  XMark
  TPC-W like scenario
Automatic test infrastructure
Browse results
Lets you post your own queries and documents
Main goal: put pressure on the vendors and give use cases for the open-source community
Conclusion
Zorba as the MySQL for XQuery
XQuery combines a database and a programming language
XQuery is ready for the next step
FLWOR Foundation: http://www.flworfound.org
Zorba XQuery Processor: http://www.zorba-xquery.com
XQuery in the Browser: http://www.xqib.org
XQuery Benchmarking Service: http://www.xqbench.org
XQuery Eclipse Plug-In: http://www.xqdt.org
Xadoop (not yet online): http://www.xadoop.org/