Introduction to XSLT

TEI@Oxford

July 2011

Summer School 2011 1/59

Publishing XML files using XSLT

Our work will be divided into four parts:

1. Basic XSLT. Target: make HTML from TEI documents
2. More complex XSLT. Target: make more complex HTML, sorting and summarizing
3. Using the TEI XSL family. Target: customize an existing library of stylesheets
4. TEI, ODT, DOCX, and ePub. Target: transform TEI to and from word-processor and ePub formats

… depending on how fast or slow we go, and what the class wants to talk about …

It is assumed that we are working on TEI XML documents.

What is the XSL family?

XPath: a language for expressing paths through XML trees
XSLT: a programming language for transforming XML
XSL FO: an XML vocabulary for describing formatted pages

XSLT

The XSLT language:

- is expressed in XML
- uses namespaces to distinguish output from instructions
- is purely functional
- reads and writes XML trees

It was designed to generate XSL FO, but is now widely used to generate HTML.

What is a transformation?

Take this:

<persName><forename>Milo</forename><surname>Casagrande</surname></persName>
<persName><forename>Corey</forename><surname>Burger</surname></persName>
<persName><forename>Naaman</forename><surname>Campbell</surname></persName>

and make this:

<item n="1"><name>Burger</name></item>
<item n="2"><name>Campbell</name></item>
<item n="3"><name>Casagrande</name></item>
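A minimal Python sketch of the same transformation, using only the standard library's xml.etree.ElementTree rather than XSLT. It assumes the input fragments are wrapped in a <listPerson> element and the output in a <list> element (the slide shows neither); transform is a hypothetical name.

```python
import xml.etree.ElementTree as ET

# The slide shows bare fragments; the <listPerson> and <list> wrappers are assumed.
SOURCE = """<listPerson>
  <persName><forename>Milo</forename><surname>Casagrande</surname></persName>
  <persName><forename>Corey</forename><surname>Burger</surname></persName>
  <persName><forename>Naaman</forename><surname>Campbell</surname></persName>
</listPerson>"""


def transform(xml_text: str) -> str:
    """Sort the people by surname and number them, as in the slide's output."""
    root = ET.fromstring(xml_text)
    surnames = sorted(p.findtext("surname") for p in root.findall("persName"))
    result = ET.Element("list")
    for position, surname in enumerate(surnames, start=1):
        item = ET.SubElement(result, "item", n=str(position))
        ET.SubElement(item, "name").text = surname
    return ET.tostring(result, encoding="unicode")


print(transform(SOURCE))
```

In XSLT itself, the sorting step would normally be done with xsl:sort.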

A text example

Take this:

<div type="recipe" n="34">
  <head>Pasta for beginners</head>
  <list>
    <item>Pasta</item>
    <item>Grated cheese</item>
  </list>
  <p>Cook the pasta and mix with the cheese</p>
</div>

and make this:

<html>
  <h1>34: Pasta for beginners</h1>
  <p>Ingredients: Pasta Grated cheese</p>
  <p>Cook the pasta and mix with the cheese</p>
</html>

How do you express that in XSL?

<xsl:stylesheet xpath-default-namespace="http://www.tei-c.org/ns/1.0" version="2.0">
  <xsl:template match="div">
    <html>
      <h1><xsl:value-of select="@n"/>: <xsl:value-of select="head"/></h1>
      <p>Ingredients: <xsl:apply-templates select="list/item"/></p>
      <p><xsl:value-of select="p"/></p>
    </html>
  </xsl:template>
</xsl:stylesheet>

Note: the namespace declaration linking xsl: to http://www.w3.org/1999/XSL/Transform is not shown in these examples.

Structure of an XSL file

<xsl:stylesheet xpath-default-namespace="http://www.tei-c.org/ns/1.0" version="2.0">
  <xsl:template match="div">
    <!-- .... do something with div elements.... -->
  </xsl:template>
  <xsl:template match="p">
    <!-- .... do something with p elements.... -->
  </xsl:template>
</xsl:stylesheet>

The values div and p are XPath patterns, which specify which part of the document each template matches.

Any element in a template body whose name does not start with xsl: is copied into the output.

The Golden Rules of XSLT

1. If there is no template matching an element, we go on and process the elements inside it.
2. If there are no elements to process by Rule 1, any text inside the element is output.
3. Child elements are not processed by a template unless you explicitly say so.
4. xsl:apply-templates select="XX" looks for templates which match element "XX"; xsl:value-of select="XX" simply gets any text from that element.
5. The order of templates in your program file is immaterial.
6. You can process any part of the document from any template.
7. Everything is well-formed XML. Everything!
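To make the first four rules concrete, here is a toy model of template dispatch in Python (standard library only). It is a simplification, not how an XSLT processor is built: templates are looked up by element name only, output is a list of unescaped strings, and the names Processor, apply_templates and value_of are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Dict, List

# A "template" gets the matched element plus the processor, and returns output text.
Template = Callable[[ET.Element, "Processor"], List[str]]


class Processor:
    """Toy template dispatch: a template 'matches' by element name only."""

    def __init__(self, templates: Dict[str, Template]) -> None:
        self.templates = templates

    def process(self, elem: ET.Element) -> List[str]:
        template = self.templates.get(elem.tag)
        if template is not None:
            return template(elem, self)     # Rule 3: the template decides about children
        return self.apply_templates(elem)   # Rule 1: no template, so look inside

    def apply_templates(self, elem: ET.Element) -> List[str]:
        """Process the element's content in document order."""
        out: List[str] = []
        if elem.text:
            out.append(elem.text)           # Rule 2: text is copied to the output
        for child in elem:
            out.extend(self.process(child))
            if child.tail:
                out.append(child.tail)
        return out


def value_of(elem: ET.Element) -> str:
    """Rule 4: value-of just collects the text; no templates are consulted."""
    return "".join(elem.itertext())


templates: Dict[str, Template] = {
    "p": lambda e, proc: ["<p>"] + proc.apply_templates(e) + ["</p>"],
    "head": lambda e, proc: [],             # empty template: <head> produces nothing
}

doc = ET.fromstring("<div><head>Intro</head><p>Some <hi>sane</hi> words</p></div>")
print("".join(Processor(templates).process(doc)))   # <p>Some sane words</p>
print(value_of(doc))                                 # IntroSome sane words
```

The <hi> element has no template, so (Rule 1) its text still reaches the output inside <p>; the empty "head" template shows Rule 3 suppressing a subtree.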

Important magic

Our examples and exercises all start with two important attributes on <xsl:stylesheet>:

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xpath-default-namespace="http://www.tei-c.org/ns/1.0"
    version="2.0">
  ....

which indicate that:

1. In our XPath expressions, any element name without a namespace is assumed to be in the TEI namespace.
2. We want to use version 2.0 of the XSLT specification. This means that we must use an XSLT 2.0 processor; for our work, that is Saxon.
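Python's standard-library ElementTree runs into the same namespace issue, which makes the effect of xpath-default-namespace easy to see. This sketch assumes Python 3.8 or later, where find/findall accept an empty-string prefix as a default namespace; the tiny TEI document is made up.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"

# A tiny, hypothetical TEI document: every element is in the TEI namespace.
doc = ET.fromstring(
    f'<TEI xmlns="{TEI_NS}"><text><body><p>Hello</p></body></text></TEI>'
)

# Unprefixed names in the path mean "no namespace", so nothing matches.
print(doc.findall("text/body/p"))  # []

# Mapping the empty prefix to the TEI namespace does the job that
# xpath-default-namespace does in the stylesheet.
print([p.text for p in doc.findall("text/body/p", {"": TEI_NS})])  # ['Hello']
```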

A simple test file

<text>
  <front>
    <div>
      <p>Material up front</p>
    </div>
  </front>
  <body>
    <div>
      <head>Introduction</head>
      <p rend="it">Some sane words</p>
      <p>Rather more surprising words</p>
    </div>
  </body>
  <back>
    <div>
      <p>Material in the back</p>
    </div>
  </back>
</text>

Feature: apply-templates

<xsl:stylesheet version="2.0" xpath-default-namespace="http://www.tei-c.org/ns/1.0">
  <xsl:template match="/">
    <html>
      <xsl:apply-templates/>
    </html>
  </xsl:template>
</xsl:stylesheet>

<xsl:template match="TEI">
  <xsl:apply-templates select="text"/>
</xsl:template>

<xsl:template match="text">
  <h1>FRONT MATTER</h1>
  <xsl:apply-templates select="front"/>
  <h1>BODY MATTER</h1>
  <xsl:apply-templates select="body"/>
</xsl:template>

Feature: value-of

Templates for paragraphs and headings:

<xsl:template match="p">
  <p><xsl:apply-templates/></p>
</xsl:template>
<xsl:template match="div">
  <h2><xsl:value-of select="head"/></h2>
  <xsl:apply-templates/>
</xsl:template>
<xsl:template match="div/head"/>

Notice how we avoid getting the heading text twice. Why did we need to qualify it to deal with just <head> inside <div>?

More complex patterns

The select attribute can point to any part of the document. Using XPath expressions, we can find:

/        the root of the document (outside the root element)
*        any element
text()   any text node
name     an element called ‘name’
@name    an attribute called ‘name’

Example of a complete path in <xsl:value-of>:

<xsl:value-of select="/TEI/teiHeader/fileDesc/titleStmt/title"/>
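For comparison, a minimal Python sketch of some of these selections with xml.etree.ElementTree, which supports only a small subset of XPath. ElementTree paths start from an element rather than from the document root /, so the absolute path is written relative to the <TEI> element. The header content is hypothetical, and the default-namespace mapping assumes Python 3.8+.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
NS = {"": TEI_NS}   # plays the part of xpath-default-namespace

# Hypothetical document: the header content is made up for the example.
SOURCE = f"""<TEI xmlns="{TEI_NS}">
  <teiHeader><fileDesc><titleStmt>
    <title>The Sick Rose</title>
  </titleStmt></fileDesc></teiHeader>
  <text><body>
    <p rend="it">Some sane words</p>
    <p>Rather more surprising words</p>
  </body></text>
</TEI>"""

root = ET.fromstring(SOURCE)   # the <TEI> element, i.e. /TEI

# /TEI/teiHeader/fileDesc/titleStmt/title, written relative to <TEI>
title = root.findtext("teiHeader/fileDesc/titleStmt/title", namespaces=NS)

# * = any element (here: the children of <TEI>), with the namespace part stripped
children = [child.tag.split("}")[1] for child in root.findall("*", NS)]

# @rend = an attribute; [@rend='it'] keeps only the matching <p> elements
italic = [p.text for p in root.findall(".//p[@rend='it']", NS)]

print(title, children, italic)
# The Sick Rose ['teiHeader', 'text'] ['Some sane words']
```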

XPath

XPath is the basis of most other XML querying and transformation languages:

- It is a syntax for accessing parts of an XML document
- It uses a path structure to identify XML elements
- It has a library of standard functions
- It is a W3C Standard and one of the main components of XQuery and XSLT

Example text

<body n="anthology">
  <div type="poem">
    <head>The SICK ROSE </head>
    <lg type="stanza">
      <l n="1">O Rose thou art sick.</l>
      <l n="2">The invisible worm,</l>
      <l n="3">That flies in the night </l>
      <l n="4">In the howling storm:</l>
    </lg>
    <lg type="stanza">
      <l n="5">Has found out thy bed </l>
      <l n="6">Of crimson joy:</l>
      <l n="7">And his dark secret love </l>
      <l n="8">Does thy life destroy.</l>
    </lg>
  </div>
</body>

XML Structure

[Diagram: the example anthology drawn as a tree. The root <body type="anthology"> contains <div type="poem"> and <div type="shortpoem">; the two <div> elements each have a <head> and <lg> line groups (type="stanza" or type="couplet"), which hold <l> elements numbered by @n.]

Really attributes (and text) are separate nodes!

/body/div/head

[Diagram: the same tree, with the nodes matched by /body/div/head highlighted.]

XPath locates any matching nodes

/body/div/lg ?

[Diagram: the same tree, shown first unmarked and then with the nodes matched by /body/div/lg highlighted.]

/body/div/@type ?

[Diagram: the same tree, shown first unmarked and then with the nodes matched by /body/div/@type highlighted; the type attributes of the two <div> elements (type="poem" and type="shortpoem") are drawn as separate nodes.]

@ = attributes

/body/div/lg/l ?

[Diagram: the same tree, shown first unmarked and then with the nodes matched by /body/div/lg/l highlighted.]

/body/div/lg/l[@n="2"] ?

[Diagram: the same tree, shown first unmarked and then with the nodes matched by /body/div/lg/l[@n="2"] highlighted.]

Square brackets filter the selection
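These child paths can be tried on the poem from the example text with a rough Python sketch using xml.etree.ElementTree. Two assumptions to note: that file has a single <div type="poem">, unlike the two-<div> tree in the diagrams, so the counts differ from the slides; and ElementTree cannot return attribute nodes, so @type is read with .get().

```python
import xml.etree.ElementTree as ET

# The example text from earlier (trailing spaces inside a few elements trimmed).
POEM = """<body n="anthology">
  <div type="poem">
    <head>The SICK ROSE</head>
    <lg type="stanza">
      <l n="1">O Rose thou art sick.</l>
      <l n="2">The invisible worm,</l>
      <l n="3">That flies in the night</l>
      <l n="4">In the howling storm:</l>
    </lg>
    <lg type="stanza">
      <l n="5">Has found out thy bed</l>
      <l n="6">Of crimson joy:</l>
      <l n="7">And his dark secret love</l>
      <l n="8">Does thy life destroy.</l>
    </lg>
  </div>
</body>"""

body = ET.fromstring(POEM)   # context = <body>, so "/body/" becomes a relative path

heads = body.findall("div/head")                        # /body/div/head
groups = body.findall("div/lg")                         # /body/div/lg
types = [d.get("type") for d in body.findall("div")]    # /body/div/@type
lines = body.findall("div/lg/l")                        # /body/div/lg/l
second = body.findall("div/lg/l[@n='2']")               # /body/div/lg/l[@n="2"]

print(len(heads), len(groups), types, len(lines), [l.text for l in second])
# 1 2 ['poem'] 8 ['The invisible worm,']
```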

/body/div[@type="poem"]/head ?

[Diagram: the same tree, shown first unmarked and then with the nodes matched by /body/div[@type="poem"]/head highlighted.]

//lg[@type="stanza"] ?

[Diagram: the same tree, shown first unmarked and then with the nodes matched by //lg[@type="stanza"] highlighted.]

// = any descendant

//div[@type="poem"]//l ?

[Diagram: the same tree, shown first unmarked and then with the nodes matched by //div[@type="poem"]//l highlighted.]

//l[5] ?

[Diagram: the same tree, shown first unmarked and then with the nodes matched by //l[5] highlighted.]

Square brackets can also filter by counting. Note that //l[5] selects each <l> that is the fifth <l> child of its parent; to get the fifth <l> in the whole document, write (//l)[5].
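A rough ElementTree sketch of the descendant and counting expressions, run against the poem from the example text (one <div type="poem">, so results differ from the two-<div> tree in the diagrams). ElementTree's .// stands in for XPath's //, and its numeric predicate counts among same-named siblings, just as XPath's //l[5] does.

```python
import xml.etree.ElementTree as ET

# The example text from earlier (trailing spaces inside a few elements trimmed).
POEM = """<body n="anthology">
  <div type="poem">
    <head>The SICK ROSE</head>
    <lg type="stanza">
      <l n="1">O Rose thou art sick.</l>
      <l n="2">The invisible worm,</l>
      <l n="3">That flies in the night</l>
      <l n="4">In the howling storm:</l>
    </lg>
    <lg type="stanza">
      <l n="5">Has found out thy bed</l>
      <l n="6">Of crimson joy:</l>
      <l n="7">And his dark secret love</l>
      <l n="8">Does thy life destroy.</l>
    </lg>
  </div>
</body>"""

body = ET.fromstring(POEM)

poem_heads = body.findall("div[@type='poem']/head")   # /body/div[@type="poem"]/head
stanzas = body.findall(".//lg[@type='stanza']")       # //lg[@type="stanza"]
poem_lines = body.findall(".//div[@type='poem']//l")  # //div[@type="poem"]//l
fifth_child = body.findall(".//l[5]")                 # //l[5]: 5th <l> within its parent
fifth_overall = body.findall(".//l")[4]               # (//l)[5]: 5th <l> in the document

print(len(poem_heads), len(stanzas), len(poem_lines), fifth_child, fifth_overall.text)
# 1 2 8 [] Has found out thy bed
```

Since no <lg> here has five lines, //l[5] matches nothing, while (//l)[5] finds line 5.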

//lg/../@type ?

[Diagram: the same tree, shown first unmarked and then with the nodes matched by //lg/../@type highlighted; the matching type attributes (type="poem" and type="shortpoem") are drawn as separate nodes.]

Paths are relative: .. = parent

//l[@n > 5] ?

[Diagram: the same tree, shown first unmarked and then with the nodes matched by //l[@n > 5] highlighted.]

Numerical operations can be useful.

//div[head]/lg/l[@n="2"] ?

[Diagram: the same tree with one <head> removed, shown first unmarked and then with the nodes matched by //div[head]/lg/l[@n="2"] highlighted.]

Notice the deleted <head>!

//l[ancestor::div/@type="shortpoem"] ?

[Diagram: the same tree, shown first unmarked and then with the nodes matched by //l[ancestor::div/@type="shortpoem"] highlighted.]

ancestor:: is an unabbreviated axis name
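ElementTree supports .. and [head] directly, but not numeric comparisons or the ancestor:: axis, so this rough sketch emulates those with int() and a child-to-parent dictionary (has_div_ancestor is a hypothetical helper). It runs on the poem from the example text, which has only a "poem" <div>, so "poem" stands in for the slide's "shortpoem".

```python
import xml.etree.ElementTree as ET

# The example text from earlier (trailing spaces inside a few elements trimmed).
POEM = """<body n="anthology">
  <div type="poem">
    <head>The SICK ROSE</head>
    <lg type="stanza">
      <l n="1">O Rose thou art sick.</l>
      <l n="2">The invisible worm,</l>
      <l n="3">That flies in the night</l>
      <l n="4">In the howling storm:</l>
    </lg>
    <lg type="stanza">
      <l n="5">Has found out thy bed</l>
      <l n="6">Of crimson joy:</l>
      <l n="7">And his dark secret love</l>
      <l n="8">Does thy life destroy.</l>
    </lg>
  </div>
</body>"""

body = ET.fromstring(POEM)
parents = {child: parent for parent in body.iter() for child in parent}

lg_parent_types = [e.get("type") for e in body.findall(".//lg/..")]  # //lg/../@type
late = [l.get("n") for l in body.iter("l") if int(l.get("n")) > 5]  # //l[@n > 5]
headed = body.findall(".//div[head]/lg/l[@n='2']")                  # //div[head]/lg/l[@n="2"]


def has_div_ancestor(elem: ET.Element, div_type: str) -> bool:
    """ancestor::div/@type = div_type, walked one parent at a time."""
    while elem in parents:
        elem = parents[elem]
        if elem.tag == "div" and elem.get("type") == div_type:
            return True
    return False


in_poem = [l for l in body.iter("l") if has_div_ancestor(l, "poem")]

print(lg_parent_types, late, [l.text for l in headed], len(in_poem))
# ['poem'] ['6', '7', '8'] ['The invisible worm,'] 8
```

Both <lg> elements share one parent, and ElementTree's .. step returns each parent only once, so a single "poem" comes back.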

XPath: More About Paths

- A location path results in a node-set
- Paths can be absolute (/div/lg[1]/l)
- Paths can be relative (l/../../head)
- Formal syntax: axisname::nodetest[predicate]
- For example: child::div[contains(head, 'ROSE')]

XPath: Axes

ancestor:: Contains all ancestors (parent, grandparent, etc.) of the current node
ancestor-or-self:: Contains the current node plus all its ancestors (parent, grandparent, etc.)
attribute:: Contains all attributes of the current node
child:: Contains all children of the current node
descendant:: Contains all descendants (children, grandchildren, etc.) of the current node
descendant-or-self:: Contains the current node plus all its descendants (children, grandchildren, etc.)

XPath: Axes (2)

following:: Contains everything in the document after the closing tag of the current node
following-sibling:: Contains all siblings after the current node
parent:: Contains the parent of the current node
preceding:: Contains everything in the document that is before the starting tag of the current node, except its ancestors
preceding-sibling:: Contains all siblings before the current node
self:: Contains the current node

Axis examples

ancestor::lg = all <lg> ancestors
ancestor-or-self::div = all <div> ancestors, plus the current node if it is a <div>
attribute::n = n attribute of the current node
child::l = <l> elements directly under the current node
descendant::l = <l> elements anywhere under the current node
descendant-or-self::div = all <div> descendants, plus the current node if it is a <div>
following-sibling::l[1] = next <l> element at this level
preceding-sibling::l[1] = previous <l> element at this level
self::head = the current node, if it is a <head> element
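ElementTree elements do not know their parents, so this rough Python sketch emulates three of these axes with a child-to-parent dictionary, a common ElementTree idiom. The helper names are hypothetical. ancestor:: and preceding-sibling:: are reverse axes, so the helpers return nearest-first; that is why XPath's preceding-sibling::l[1] corresponds to index 0 here.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Iterator, List

STANZA = """<body><div type="poem"><lg type="stanza">
  <l n="1">O Rose thou art sick.</l>
  <l n="2">The invisible worm,</l>
  <l n="3">That flies in the night</l>
  <l n="4">In the howling storm:</l>
</lg></div></body>"""

Parents = Dict[ET.Element, ET.Element]


def parent_map(root: ET.Element) -> Parents:
    """Child -> parent lookup for every element under root."""
    return {child: parent for parent in root.iter() for child in parent}


def ancestor(elem: ET.Element, parents: Parents) -> Iterator[ET.Element]:
    """ancestor:: axis, nearest first."""
    while elem in parents:
        elem = parents[elem]
        yield elem


def following_sibling(elem: ET.Element, parents: Parents) -> List[ET.Element]:
    """following-sibling:: axis, in document order."""
    siblings = list(parents[elem])
    return siblings[siblings.index(elem) + 1:]


def preceding_sibling(elem: ET.Element, parents: Parents) -> List[ET.Element]:
    """preceding-sibling:: axis, nearest first."""
    siblings = list(parents[elem])
    return siblings[:siblings.index(elem)][::-1]


root = ET.fromstring(STANZA)
parents = parent_map(root)
line3 = root.find(".//l[@n='3']")

next_l = [e for e in following_sibling(line3, parents) if e.tag == "l"][0]  # following-sibling::l[1]
prev_l = [e for e in preceding_sibling(line3, parents) if e.tag == "l"][0]  # preceding-sibling::l[1]
up = [a.tag for a in ancestor(line3, parents)]                              # ancestor::*

print(next_l.get("n"), prev_l.get("n"), up)   # 4 2 ['lg', 'div', 'body']
```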

XPath: Predicates

child::lg[attribute::type='stanza']
child::l[@n='4']
child::div[position()=3]
child::div[4]
child::l[last()]
child::lg[last()-1]
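ElementTree's limited XPath happens to support most of these predicate forms ([@attr='value'], [n], [last()] and [last()-1]), though not position(), so [3] stands in for [position()=3]. A rough sketch with the poem's <div> as the context node; the lines are used in place of the slide's div examples because the sample has no numbered divs.

```python
import xml.etree.ElementTree as ET

# The <div> from the example text, used as the context node.
DIV = """<div type="poem">
  <head>The SICK ROSE</head>
  <lg type="stanza">
    <l n="1">O Rose thou art sick.</l>
    <l n="2">The invisible worm,</l>
    <l n="3">That flies in the night</l>
    <l n="4">In the howling storm:</l>
  </lg>
  <lg type="stanza">
    <l n="5">Has found out thy bed</l>
    <l n="6">Of crimson joy:</l>
    <l n="7">And his dark secret love</l>
    <l n="8">Does thy life destroy.</l>
  </lg>
</div>"""

div = ET.fromstring(DIV)
first_lg = div.find("lg")

stanzas = div.findall("lg[@type='stanza']")     # child::lg[attribute::type='stanza']
line_4 = first_lg.findall("l[@n='4']")          # child::l[@n='4']
third = first_lg.findall("l[3]")                # child::l[position()=3], i.e. child::l[3]
last_line = first_lg.findall("l[last()]")       # child::l[last()]
one_before_last = div.findall("lg[last()-1]")   # child::lg[last()-1]

print(len(stanzas), line_4[0].text, third[0].get("n"),
      last_line[0].get("n"), one_before_last[0] is first_lg)
# 2 In the howling storm: 3 4 True
```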

XPath: Abbreviated Syntax

nothing is the same as child::, so lg is short for child::lg
@ is the same as attribute::, so @type is short for attribute::type
. is the same as self::node(), so ./head is short for self::node()/child::head
.. is the same as parent::node(), so ../lg is short for parent::node()/child::lg
// is short for /descendant-or-self::node()/, so div//l is short for child::div/descendant-or-self::node()/child::l

Example of context-dependent matches

Compare

<xsl:template match="head"> .... </xsl:template>

with

<xsl:template match="div/head"> ... </xsl:template>
<xsl:template match="figure/head"> .... </xsl:template>

Priorities when templates conflict

It is not always obvious which template will be used:

<xsl:template match="person/name">…</xsl:template>
<xsl:template match="name">…</xsl:template>

When the processor meets a <name>, which template is used?

Solving priorities

There is a priority attribute on <xsl:template>; when several templates match the same node, the one with the highest priority is used:

<xsl:template match="name" priority="1">
  <xsl:apply-templates/>
</xsl:template>
<xsl:template match="person/name" priority="2"> A name</xsl:template>

Template priority generally

The more normal rule is that the most specific template wins. Without a priority attribute, each pattern gets a default priority, shown in the comments below (XSLT 2.0 rules; the tei: prefix must be declared on the stylesheet for tei:* to work):

<xsl:template match="*"><!-- -0.5 --></xsl:template>
<xsl:template match="tei:*"><!-- -0.25 --></xsl:template>
<xsl:template match="p"><!-- 0 --></xsl:template>
<xsl:template match="div/p"><!-- 0.5 --></xsl:template>
<xsl:template match="div/p/@n"><!-- 0.5 --></xsl:template>
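A simplified Python sketch of how default priorities settle the person/name versus name question. It covers only single, non-union patterns and approximates the XSLT 2.0 rules with regular expressions; default_priority and choose_template are hypothetical names, not part of any processor's API.

```python
import re
from typing import List, Optional, Tuple

_AXIS = re.compile(r"^(?:@|child::|attribute::)")


def default_priority(pattern: str) -> float:
    """Approximate XSLT 2.0 default priority for a single (non-union) pattern."""
    if pattern == "/":
        return -0.5
    if "/" in pattern or "[" in pattern:
        return 0.5                        # several steps, or a predicate
    step = _AXIS.sub("", pattern, count=1)
    if re.fullmatch(r"[\w.-]+:\*|\*:[\w.-]+", step):
        return -0.25                      # prefix:* or *:name
    if re.fullmatch(r"[\w.-]+(?::[\w.-]+)?", step):
        return 0.0                        # a (possibly prefixed) name
    return -0.5                           # any other node test: *, node(), text()


Rule = Tuple[str, Optional[float]]        # (match pattern, explicit priority or None)


def choose_template(matching: List[Rule]) -> Rule:
    """Pick the winner among templates that all match the same node."""
    def effective(rule: Rule) -> float:
        pattern, explicit = rule
        return explicit if explicit is not None else default_priority(pattern)
    # On a tie XSLT 2.0 reports an error, or the processor may take the last
    # template; reversed() makes max() behave like the latter.
    return max(reversed(matching), key=effective)


print(choose_template([("person/name", None), ("name", None)]))  # ('person/name', None)
```

With no priority attributes, person/name (0.5) beats name (0); an explicit priority overrides the defaults.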

Pushing and pulling

XSLT stylesheets can be characterized as being of two types:

push: In this type of stylesheet, there is a different template for every element, communication is via <xsl:apply-templates>, and the overall result is assembled from bits in each template. It is sometimes hard to visualize the final design. Common for document-oriented processing where the input document structure varies.

pull: In this type, there is a master template (usually matching /) with the main structure of the output, and specific <xsl:for-each> or <xsl:value-of> commands to grab what is needed for each part. The templates tend to get large and unwieldy. Common for data-oriented processing where the structure is fixed.

Attribute value template

What if we want to turn this:

<ref target="http://www.oucs.ox.ac.uk/">OUCS</ref>

into this?

<a href="http://www.oucs.ox.ac.uk/">OUCS</a>

What we cannot do is:

<xsl:template match="ref">
  <a href="@target"><xsl:apply-templates/></a>
</xsl:template>

This would give the @href attribute the literal value ‘@target’.

For example

Instead we use {} to indicate that the expression must be evaluated:

<xsl:template match="ref">
  <a href="{@target}"><xsl:apply-templates/></a>
</xsl:template>

This would give the @href attribute whatever value the attribute @target has!
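To show the difference between a literal attribute value and an evaluated one, here is a tiny Python sketch of attribute value template expansion. It is deliberately limited and hypothetical: only {@name}, {{ and }} are recognised, whereas real attribute value templates accept any XPath expression inside the braces.

```python
import re
import xml.etree.ElementTree as ET

# Only three forms are recognised: {@name}, {{ and }}.
_AVT_PART = re.compile(r"\{\{|\}\}|\{@([\w.-]+)\}")


def expand_avt(template: str, context: ET.Element) -> str:
    """Replace {@name} with the context element's attribute value."""
    def replace(match):
        if match.group(0) == "{{":
            return "{"
        if match.group(0) == "}}":
            return "}"
        return context.get(match.group(1), "")   # a missing attribute gives ""
    return _AVT_PART.sub(replace, template)


def ref_to_link(ref: ET.Element) -> ET.Element:
    """Sketch of <a href="{@target}"><xsl:apply-templates/></a> for a <ref>."""
    a = ET.Element("a", href=expand_avt("{@target}", ref))
    a.text = "".join(ref.itertext())
    return a


ref = ET.fromstring('<ref target="http://www.oucs.ox.ac.uk/">OUCS</ref>')
print(ET.tostring(ref_to_link(ref), encoding="unicode"))
# <a href="http://www.oucs.ox.ac.uk/">OUCS</a>
print(expand_avt("@target", ref))   # @target  (no braces, so taken literally)
```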

Feature: for-each

If we want to avoid lots of templates, we can do in-line looping over a set of elements. For example:

<xsl:template match="listPerson">
  <ul>
    <xsl:for-each select="person">
      <li><xsl:value-of select="persName"/></li>
    </xsl:for-each>
  </ul>
</xsl:template>

contrast with

<xsl:template match="listPerson">
  <ul>
    <xsl:apply-templates select="person"/>
  </ul>
</xsl:template>
<xsl:template match="person">
  <li><xsl:value-of select="persName"/></li>
</xsl:template>
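The same contrast in a rough Python sketch with xml.etree.ElementTree: one function that loops in place (like xsl:for-each), and one that hands each <person> to a separate function (like xsl:apply-templates with a person template). All function names are hypothetical; the sample names come from the transformation example earlier.

```python
import xml.etree.ElementTree as ET

LIST_PERSON = ET.fromstring("""<listPerson>
  <person><persName><forename>Milo</forename> <surname>Casagrande</surname></persName></person>
  <person><persName><forename>Corey</forename> <surname>Burger</surname></persName></person>
</listPerson>""")


def string_value(elem: ET.Element) -> str:
    """Roughly what xsl:value-of gives for one element: all its text, concatenated."""
    return "".join(elem.itertext())


def list_with_for_each(list_person: ET.Element) -> ET.Element:
    """Pull style: a single function loops over the <person> children."""
    ul = ET.Element("ul")
    for person in list_person.findall("person"):
        ET.SubElement(ul, "li").text = string_value(person.find("persName"))
    return ul


def person_template(person: ET.Element) -> ET.Element:
    """The separate 'template' for <person> used by the push version."""
    li = ET.Element("li")
    li.text = string_value(person.find("persName"))
    return li


def list_with_apply_templates(list_person: ET.Element) -> ET.Element:
    """Push style: hand each <person> to its own template."""
    ul = ET.Element("ul")
    ul.extend([person_template(p) for p in list_person.findall("person")])
    return ul


print(ET.tostring(list_with_for_each(LIST_PERSON), encoding="unicode"))
# <ul><li>Milo Casagrande</li><li>Corey Burger</li></ul>
```

Both versions produce the same markup; the difference is only in where the per-person logic lives.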

Feature: if

We can make code conditional on a test being passed:

<xsl:template match="person">
  <xsl:if test="@sex='1'">
    <li><xsl:value-of select="persName"/></li>
  </xsl:if>
</xsl:template>

contrast with

<xsl:template match="person[@sex='1']">
  <li><xsl:value-of select="persName"/></li>
</xsl:template>
<xsl:template match="person"/>

The @test attribute can use any XPath facilities.
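A rough Python analogue of the two approaches: a test inside the loop (xsl:if) versus selecting only the matching elements up front (the person[@sex='1'] pattern). The people and their sex values are hypothetical sample data, and the function names are made up for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data; sex="1" follows the coding used in the slide.
PEOPLE = ET.fromstring("""<listPerson>
  <person sex="1"><persName>Alex</persName></person>
  <person sex="2"><persName>Kim</persName></person>
  <person><persName>Sam</persName></person>
</listPerson>""")


def with_if(list_person):
    """Like a person template whose body is wrapped in xsl:if test="@sex='1'"."""
    items = []
    for person in list_person.findall("person"):
        if person.get("sex") == "1":     # a missing @sex is simply not equal to '1'
            items.append("".join(person.find("persName").itertext()))
    return items


def with_pattern(list_person):
    """Like match="person[@sex='1']" plus an empty template for other persons."""
    return ["".join(p.find("persName").itertext())
            for p in list_person.findall("person[@sex='1']")]


print(with_if(PEOPLE), with_pattern(PEOPLE))   # ['Alex'] ['Alex']
```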

Feature: choose

We can make a multi-way choice depending on what we find in the text:

<xsl:template match="person">
  <xsl:apply-templates/>
  <xsl:choose>
    <xsl:when test="@sex='1'">(male)</xsl:when>
    <xsl:when test="@sex='2'">(female)</xsl:when>
    <xsl:when test="not(@sex)">(no sex specified)</xsl:when>
    <xsl:otherwise>(unknown sex)</xsl:otherwise>
  </xsl:choose>
</xsl:template>
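In Python the xsl:choose above maps naturally onto an if/elif/else chain, where the first matching branch wins just as the first matching xsl:when does. A rough sketch; the person elements are hypothetical sample data and the function names are made up.

```python
import xml.etree.ElementTree as ET


def sex_label(person: ET.Element) -> str:
    """Mirror of the xsl:choose: the first matching branch wins."""
    sex = person.get("sex")
    if sex == "1":
        return "(male)"
    elif sex == "2":
        return "(female)"
    elif sex is None:                    # not(@sex)
        return "(no sex specified)"
    else:                                # xsl:otherwise
        return "(unknown sex)"


def person_template(person: ET.Element) -> str:
    # xsl:apply-templates with only the built-in rules outputs the text content
    return "".join(person.itertext()) + sex_label(person)


# Hypothetical sample persons
for snippet in ('<person sex="1"><persName>Alex</persName></person>',
                '<person><persName>Sam</persName></person>',
                '<person sex="9"><persName>Jo</persName></person>'):
    print(person_template(ET.fromstring(snippet)))
```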

Summary

Now you can:

1. Write templates which match any element or attribute
2. Pick out text from anywhere
3. Write code conditional on something in the text