XML Processing Moves Forward: XSLT 2.0 and XQuery 1.0

Michael Kay

Prague 2005

2

About me

• Database background

• Started using XML in 1998 for content management applications

• Author of XSLT Programmer’s Reference

• Developer of Saxon XSLT processor

• Member of W3C XSL and XQuery Working Groups

• Founded SAXONICA March 2004

3

Contents

• A tour of the new specs

• What’s significant about XSLT 2.0

• A quick demo

• Why XQuery?

4

The QT Specification Family

[Diagram of the specification stack: XSLT 2.0, XQuery 1.0, XPath 2.0, Data Model, Functions and Operators, XML Schema]

5

Standards maturity

[Chart: maturity against time, with CR and REC levels marked, plotting XML, XML Schema, XSLT 1.0 / XPath 1.0, and XQuery / XSLT 2.0 / XPath 2.0]

6

A family of standards

[Diagram relating XML Schema, XPath 1.0, XPath 2.0, XQuery 1.0, XSLT 1.0 and XSLT 2.0]

7

XSLT and XQuery

[Diagram: XSLT and XQuery placed against an axis running from Documents to Data]

8

What’s new in XSLT 2.0

• New Processing Model

• Major Features
  – grouping
  – regular expressions
  – functions
  – schema support

• Many “minor” features

9

Some “minor” features

XSLT 2.0

• Temporary trees

• Multiple Output Files

• Format date/time

• Tunnel parameters

• Declared variable types

• Multi-mode templates

• xsl:next-match

• conditional compilation

• XHTML serialization

• xsl:namespace

• separator=","

• character maps

XPath 2.0

• Sequences

• if..then..else

• for $x in X return f($x)

• some/every

• except/intersect

• $n is $m

Function library

• String functions

• Regex functions

• Date/time arithmetic

• URI handling

• min(), max(), avg()
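Several of the XPath 2.0 items listed above can be tried together in one small stylesheet. This is a minimal sketch, not taken from the talk: the element names (demo, item and so on) and the sample values are invented for illustration, and it assumes an XSLT 2.0 processor that can start either from any source document or from the named template main.

```xslt
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="xs">

  <xsl:output method="xml" indent="yes"/>

  <xsl:template match="/" name="main">
    <!-- A sequence of atomic values with a declared type (sample data) -->
    <xsl:variable name="prices" as="xs:decimal*" select="(12.50, 8.00, 30.00)"/>

    <!-- A sequence of parentless elements (sample data) -->
    <xsl:variable name="items" as="element(item)*">
      <item id="a"/>
      <item id="b"/>
      <item id="c"/>
    </xsl:variable>

    <demo>
      <!-- for ... return maps over a sequence; separator joins the results -->
      <doubled>
        <xsl:value-of select="for $p in $prices return $p * 2" separator=", "/>
      </doubled>

      <!-- if..then..else is an expression, so it can sit inside select -->
      <size>
        <xsl:value-of select="if (count($prices) gt 2) then 'many' else 'few'"/>
      </size>

      <!-- some / every: quantified expressions -->
      <any-under-10>
        <xsl:value-of select="some $p in $prices satisfies $p lt 10"/>
      </any-under-10>
      <all-under-10>
        <xsl:value-of select="every $p in $prices satisfies $p lt 10"/>
      </all-under-10>

      <!-- except works on node identity: all items other than "b" -->
      <others>
        <xsl:value-of select="($items except $items[@id = 'b'])/@id" separator=" "/>
      </others>

      <!-- "is" tests whether two expressions yield the same node -->
      <same-node>
        <xsl:value-of select="$items[1] is $items[@id = 'a']"/>
      </same-node>

      <!-- aggregate functions from the function library -->
      <stats min="{min($prices)}" max="{max($prices)}" avg="{avg($prices)}"/>
    </demo>
  </xsl:template>

</xsl:stylesheet>
```

Because the result of each expression is a sequence, the same select attribute can return atomic values, nodes, or a mixture, which is what makes most of the other features in the list composable.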

10

Handling unstructured text

• unparsed-text() function
  – reads a text file into a string

• tokenize() function
  – splits a string into substrings

• xsl:analyze-string
  – parses a string and generates markup
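The three facilities above combine naturally. The following is a sketch under stated assumptions: the file name notes.txt, the output element names, and the ISO-style date pattern are all hypothetical, and the stylesheet is started from the named template main rather than from a source document.

```xslt
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="xml"/>

  <!-- Hypothetical input file; a relative URI is resolved against the stylesheet -->
  <xsl:param name="input-uri" select="'notes.txt'"/>

  <xsl:template name="main">
    <notes>
      <!-- unparsed-text() reads the whole file as one string;
           tokenize() splits it into lines; the predicate drops blank lines -->
      <xsl:for-each
          select="tokenize(unparsed-text($input-uri), '\r?\n')[normalize-space()]">
        <line>
          <!-- The regex attribute is an attribute value template,
               so literal curly braces have to be doubled -->
          <xsl:analyze-string select="." regex="[0-9]{{4}}-[0-9]{{2}}-[0-9]{{2}}">
            <xsl:matching-substring>
              <!-- Text that looks like a date is wrapped in markup -->
              <date><xsl:value-of select="."/></date>
            </xsl:matching-substring>
            <xsl:non-matching-substring>
              <!-- Everything else is copied through as text -->
              <xsl:value-of select="."/>
            </xsl:non-matching-substring>
          </xsl:analyze-string>
        </line>
      </xsl:for-each>
    </notes>
  </xsl:template>

</xsl:stylesheet>
```

The doubled braces are the detail most easily missed: in select attributes a regex quantifier is written {4}, but in the regex attribute of xsl:analyze-string it must be written {{4}}.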

11

Regular expression functions

• matches()
  test if a string matches a regex
  if (matches($in, '[A-Z]{3}[0-9]{3}')) ...

• tokenize()
  split a string into substrings
  regex matches the separator
  for $s in tokenize($in, ',\s?') ...

• replace()
  replace every occurrence of a match
  replace($in, '\s', '%20')
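The three calls can be exercised on one sample string. This is an illustrative sketch: the parameter value and the codes/code element names are invented. One point worth knowing is that matches() succeeds if the pattern matches anywhere in the string, so the sketch anchors the pattern with ^ and $ to validate the whole value.

```xslt
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="xml" indent="yes"/>

  <!-- Sample input (hypothetical): a comma-separated list of codes -->
  <xsl:param name="in" select="'ABC123, XYZ 789,bad'"/>

  <xsl:template match="/" name="main">
    <codes>
      <!-- The regex describes the separator: a comma and an optional space -->
      <xsl:for-each select="tokenize($in, ',\s?')">
        <code>
          <!-- xsl:attribute with select avoids writing regex braces
               inside an attribute value template -->
          <xsl:attribute name="valid"
              select="matches(., '^[A-Z]{3}[0-9]{3}$')"/>
          <!-- Every whitespace character is replaced, not just the first -->
          <xsl:attribute name="escaped"
              select="replace(., '\s', '%20')"/>
          <xsl:value-of select="."/>
        </code>
      </xsl:for-each>
    </codes>
  </xsl:template>

</xsl:stylesheet>
```

With the sample value, only the first token has the three-letters-three-digits shape, so only that code element is marked valid.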

12

Grouping

• Takes any sequence as input

• Divides the items into groups

• Applies processing to each group

group-by: items with a common value for a grouping key

group-adjacent: adjacent items with a common grouping key

group-starting-with: pattern to match first item in each group

group-ending-with: pattern to match last item in each group
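As an illustration of the positional variants, here is a sketch of group-starting-with applied to a flat document. The input shape (a body element holding a flat run of h2 and p elements) and the section/title output names are assumptions made for this example.

```xslt
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="xml" indent="yes"/>

  <!-- Assumed input: <body> with a flat sequence of <h2> and <p> children -->
  <xsl:template match="body">
    <body>
      <!-- Each h2 starts a new group; the elements that follow it,
           up to the next h2, belong to the same group -->
      <xsl:for-each-group select="*" group-starting-with="h2">
        <section>
          <title>
            <xsl:value-of select="current-group()[self::h2]"/>
          </title>
          <!-- Copy the rest of the group under the new section -->
          <xsl:copy-of select="current-group()[not(self::h2)]"/>
        </section>
      </xsl:for-each-group>
    </body>
  </xsl:template>

</xsl:stylesheet>
```

If the input has elements before its first h2, they form an initial group with no heading, so the first section would have an empty title.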

13

Grouping by Value

<xsl:for-each-group select="book" group-by="publisher">
  <xsl:sort select="current-grouping-key()"/>
  <h2>Publisher: <xsl:value-of select="current-grouping-key()"/></h2>
  <xsl:for-each select="current-group()">
    <xsl:sort select="title"/>
    <p>author: <xsl:value-of select="author"/></p>
    <p>title: <xsl:value-of select="title"/></p>
  </xsl:for-each>
</xsl:for-each-group>

14

User-defined Functions

• Written like named templates

• Called from XPath

• Return a result

<xsl:function name="ged:date-to-ISO" as="xs:date">
  <xsl:param name="in" as="ged:date"/>
  <xsl:sequence select="xs:date(concat(
      substring($in, 8, 4), '-',
      format-number(index-of(('JAN', 'FEB', ...), substring($in, 4, 3)), '00'), '-',
      substring($in, 1, 2)))"/>
</xsl:function>

<xsl:sort select="ged:date-to-ISO(@birth-date)"/>

15

XQuery 1.0

• Designed to query XML databases

• Also handles in-memory transformations

• Well supported by database vendors

16

XQuery Example: Join two tables

xquery version "1.0";

<results> {
  for $p in doc("auction.xml")/site/people/person
  let $a := for $t in doc("auction.xml")/site/closed_auctions/closed_auction
            where $t/buyer/@person = $p/@id
            return $t
  return <item person="{$p/name}"> {count($a)} </item>
} </results>

XMark Q8

17

XSLT Equivalent

<result xsl:version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:for-each select="/site/people/person">
    <xsl:variable name="a"
        select="/site/closed_auctions/closed_auction[buyer/@person = current()/@id]"/>
    <item person="{name}">
      <xsl:value-of select="count($a)"/>
    </item>
  </xsl:for-each>
</result>

XMark Q8

18

Optimization

• With multi-GB databases, using indexes is essential

• XQuery does not have template rules

• This makes it possible to do static analysis and join optimization
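To see what an index buys for the XMark Q8 join, the XSLT author can build one by hand with xsl:key. The sketch below is an illustration only: the key name auctions-by-buyer is invented, and nothing here is a claim about how any particular processor optimizes the query internally.

```xslt
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Hand-declared index: closed auctions looked up by the buyer's person id -->
  <xsl:key name="auctions-by-buyer"
           match="/site/closed_auctions/closed_auction"
           use="buyer/@person"/>

  <xsl:template match="/">
    <result>
      <xsl:for-each select="/site/people/person">
        <item person="{name}">
          <!-- key() replaces the filter over every closed_auction
               with a lookup on this person's id -->
          <xsl:value-of select="count(key('auctions-by-buyer', @id))"/>
        </item>
      </xsl:for-each>
    </result>
  </xsl:template>

</xsl:stylesheet>
```

The filter-based version examines every closed auction for every person, while the keyed version does one lookup per person. The point of an optimizer doing static analysis is to get that effect without the author having to declare the key.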

19

XMark Q8 results (ms)

Processor          1 MB     4 MB    10 MB

XSLT
  Xalan            1503    11006    65855
  xt                160     2253    16414
  MSXML              33      519     4248
  Saxon 8.4          90     1340    11126

XQuery
  Saxon 8.4         136     1575    11947
  Qizx              351      711     1813
  Galax            1870     6672    16625

Scaling annotations on the slide: O(n²) and O(n)

20

Two can play at that game!

XMark Q8 results (ms)

Processor          1 MB     4 MB    10 MB

XSLT
  Xalan            1503    11006    65855
  xt                160     2253    16414
  MSXML              33      519     4248
  Saxon 8.5          27       26       45

XQuery
  Saxon 8.5          16       16       31
  Qizx              351      711     1813
  Galax            1870     6672    16625

Scaling annotations on the slide: O(n²) and O(n)

Caveat: this is one query only!

21

Conclusions

• XSLT 2.0 and XQuery 1.0 are nearly ready

• XSLT 2.0 has many powerful new features, making new applications possible

• XQuery 1.0 is designed for optimization against very large databases