
XML and XSL

A report on the workshop given by Shaoping Moss on October 16, 2004

Presented by ASIS&T members Caryn Anderson, Prairie Clayton & Kara Schwartz

At Simmons College, November 1, 2004

...with additional examples from a real-life project

Topics discussed

• SGML, XML, and HTML
• XML and XSL Basics
• XML in Libraries and Academics
• XML in Future Web Development

Slide content courtesy of Shaoping Moss.

Markup Languages

• Address the structure of a document.
• Identify different components of the document.
• Convey information to software that will allow it to:
– Index the data for searching.
– Render the data.
– Transform the data.

SGML, XML, and HTML are all markup languages.

Document, Structure, and Format

• A document is:
– “A record which contains information, originally an inscribed or written record but now considered to include any format in which information might be held (e.g. map, manuscript, tape, video, software).” (International Encyclopedia of Information and Library Science)
– A collection of small elements, which can be headings, subheadings, paragraphs, quotations, etc.

• Structure vs. Format
– Structure is about the content of the document.
– Format is about the way a document looks.

What is SGML?

• Stands for Standard Generalized Markup Language.
• Initiated by Charles Goldfarb at IBM in the 1960s.
• Adopted as a standard of the International Organization for Standardization (ISO 8879) in 1986.

SGML and Its Subdivisions

• SGML is composed of tag-set building rules.
• SGML has given birth to other sets of subdivisions:
– HTML and XML.
– CALS for defense.
– BOEING for commercial airlines.
– C-H for publishing.
– OED for the Oxford English Dictionary.
– TEI guidelines for the Text Encoding Initiative.
– EAD for Encoded Archival Descriptions.

HTML Development

• HTML stands for Hypertext Markup Language.
• HTML was developed by Tim Berners-Lee at CERN, a physics lab near Geneva, Switzerland, in the early 1990s.
• Its simplicity contributed to the rapid growth of the World Wide Web in the 1990s.
• HTML version 4 came out in 1997.
• XHTML 1.0 is the latest HTML standard.

HTML Problems

• Easy HTML coding has made it harder for browsers to handle.
• Tags are predefined in HTML.
• Format and content are mixed, and content is hard to reuse.

What is XML?

• XML is a new Web standard developed by the World Wide Web Consortium in 1998.
• XML stands for eXtensible Markup Language.
• XML was designed to describe data.
• XML tags are not predefined in XML.
• XML separates format from content and semantic structure.
• Data encoded in XML can function much like a traditional database.
• XML content can be output in many formats, such as XHTML, text, Word documents, PDF, etc.

The Display of the Document

My First XML

Chapter 1: Introduction to XML
What is HTML?
What is XML?

Chapter 2: XML Syntax
Elements must have a closing tag
Elements must be properly nested

An HTML Document

An HTML document describes the book:

…
<h1>My First XML</h1>

<h2>Introduction to XML</h2>
<p>What is HTML?</p>
<p>What is XML?</p>

<h2>XML Syntax</h2>
<p>Elements must have a closing tag.</p>
<p>Elements must be properly nested.</p>
…

An XML Document

An XML document describes the book:

…
<book>
  <title>My First XML</title>
  <chapter>Introduction to XML
    <para>What is HTML?</para>
    <para>What is XML?</para>
  </chapter>
  <chapter>XML Syntax
    <para>Elements must have a closing tag.</para>
    <para>Elements must be properly nested.</para>
  </chapter>
</book>
…

HTML Elements/Tags

An HTML document describes the book:

…
<h1>My First XML</h1>

<h2>Introduction to XML</h2>
<p>What is HTML?</p>
<p>What is XML?</p>

<h2>XML Syntax</h2>
<p>Elements must have a closing tag.</p>
<p>Elements must be properly nested.</p>
…

HTML elements/tags are:

• defined by the HTML standard
• always the same
• usable in any order

Original slide content courtesy of Shaoping Moss.

XML Elements/Tags

An XML document describes the book:

…
<book>
  <title>My First XML</title>
  <chapter>Introduction to XML
    <para>What is HTML?</para>
    <para>What is XML?</para>
  </chapter>
  <chapter>XML Syntax
    <para>Elements must have a closing tag.</para>
    <para>Elements must be properly nested.</para>
  </chapter>
</book>
…

XML elements/tags are:

• defined by users/groups (DTD/Schema)
• different for each DTD/Schema
• hierarchical (tree structure)
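To make the "hierarchical (tree structure)" point concrete, here is a minimal sketch in Python, using only the standard library's xml.etree.ElementTree. It is not part of the workshop material; the helper name outline is made up for illustration. It walks the book document and prints one indented line per element.

```python
import xml.etree.ElementTree as ET

# The book document from the slide, copied as a string.
BOOK = """<book>
  <title>My First XML</title>
  <chapter>Introduction to XML
    <para>What is HTML?</para>
    <para>What is XML?</para>
  </chapter>
  <chapter>XML Syntax
    <para>Elements must have a closing tag.</para>
    <para>Elements must be properly nested.</para>
  </chapter>
</book>"""


def outline(element, depth=0):
    """Return one indented line per element, showing the tree structure.

    'outline' is a hypothetical helper written for this example.
    """
    text = (element.text or "").strip()
    lines = ["  " * depth + element.tag + (": " + text if text else "")]
    for child in element:  # iterating an element yields its direct children
        lines.extend(outline(child, depth + 1))
    return lines


print("\n".join(outline(ET.fromstring(BOOK))))
```

Each level of indentation in the output corresponds to one level of nesting in the XML, which is the kind of structure the flat sequence of HTML headings and paragraphs does not record.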

XML is flexible and extensible

An XML document describes the book for a different user group:

…
<manuscript>
  <name>My First XML</name>
  <part>Introduction to XML
    <section>What is HTML?</section>
    <section>What is XML?</section>
  </part>
  <part>XML Syntax
    <section>Element Rules</section>
    <para>Elements must have a closing tag.</para>
    <para>Elements must be properly nested.</para>
  </part>
</manuscript>
…

• “manuscript” is used instead of “book”.
• The structure is extended to accommodate the greater detail of “part”, “section”, AND “paragraph”.

Differences between HTML and XML

XML is not a replacement for HTML.

XML and HTML were designed with different goals.

- XML was designed to describe data and to focus on what data is.

- HTML was designed to display data and to focus on how data looks.

HTML structure and tags are very loose while XML structure and tags are strict:

- XML documents must be well-formed.

- XML elements must be properly nested.

- All XML elements must be closed.

- Tag names are case-sensitive: the opening and closing tags must match in case.
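The strictness rules above can be tried out with any XML parser. The following is a small sketch, not from the workshop, that uses Python's standard xml.etree.ElementTree; the function name is_well_formed and the sample strings are assumptions made for this example.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# Hypothetical samples, one per rule listed on the slide.
samples = {
    "properly nested and closed": "<book><title>My First XML</title></book>",
    "element never closed": "<book><title>My First XML</book>",
    "elements overlap": "<book><title>My First XML</book></title>",
    "case mismatch": "<book><Title>My First XML</title></book>",
}

for label, text in samples.items():
    print(label, "->", is_well_formed(text))
```

Only the first sample is accepted. A web browser would typically display the HTML equivalent of all four without complaint, which is the "loose vs. strict" difference the slide describes.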

Differences: HTML vs. XML

Content
• HTML: Held in generic containers (<h1>, <p>, etc.)
• XML: Held in specific containers that describe what the data is (<book>, <chapter>, etc.)

Format
• HTML:
– In the default format of the content tag, OR
– As defined by a Cascading Style Sheet (internal or external)
• XML:
– XSLT files define the formats of each section (i.e. font, color, size, etc.)
– Multiple XSLTs for the same XML

Selection & Organization
• HTML:
– All content always included (no option to easily select or suppress content; you must manually change the document)
– Content only displayed in the order written (to change the order you must manually change the document)
• XML:
– XSLT selects and determines the order of display of content
– Multiple XSLTs for the same XML (one to produce just a book title list, one to display full text, one for citations, etc.)

Differences: HTML vs. XML

Analogy
• HTML: Address list in a plain WORD document
• XML: Address list in a database or MAIL MERGE data file

What you can get
• HTML: One document of your list of contacts with all the information that you have for each person in the order you typed it.
• XML (all from the SAME XML document):
– Friends & Family with full addresses for Holiday cards
– E-mail list of just Professional contacts for announcing a new product
– Special formatting of the whole list for better display on a PDA
– Etc. etc. etc.

How to Build an XML file family

1. Establish the Document Type Definition (DTD) or Schema

2. Write a well-formed XML document that holds your data in the containers established by your DTD/Schema

3. Validate your XML document to make sure you conformed to your DTD/Schema

4. Build as many different XSL documents as you need to select data from your XML file, organize it the way you want it to appear, and format it so it looks the way you want.

Now you can link your XML file to whatever XSL you want to get the kind of display you want at any given time.

The XML family unit of files and languages

XML (file type: .xml)
• Where the data is held

DTD or Schema (file types: .dtd, .xml for schemas)
• The organizational chart for the data
• For validation during creation

XSL (file type: .xsl)
• Instructions for using XML data and displaying it
• Languages used in XSL documents during creation:
– XSLT to select data from the .xml file and format it
– XPath to access certain spots in the .xml file
– XSL-FO for specifying formatting semantics
– HTML for formatting

WEB PAGE (http://www.mysite.org/myfile.xml)
1. Calls the .xml file
2. Calls the .xsl for display instructions
3. Looks in the .xml for content
4. Returns content to the .xsl
5. Displays content to the browser

The DTD or Schema

<!ELEMENT booklist (book+)>
<!ELEMENT book (booktitle,author+,country,publisher,price,year)>
<!ELEMENT booktitle (#PCDATA)>
<!ELEMENT author (#PCDATA)>
<!ELEMENT country (#PCDATA)>
<!ELEMENT publisher (#PCDATA)>
<!ELEMENT price (#PCDATA)>
<!ELEMENT year (#PCDATA)>

+ means one or more of this element: there must be at least one, and there can be as many as you want.

The DTD establishes the hierarchy of elements/tags.

Original file content courtesy of Shaoping Moss.
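To see what the DTD's rules amount to in practice, here is a rough sketch in Python. It is not a real DTD validator (Python's standard library does not include one); it only re-expresses, by hand, the two content models above as checks on the element order. The names BOOK_MODEL and check_booklist are invented for this example.

```python
import re
import xml.etree.ElementTree as ET

# Hand-translated from:
#   <!ELEMENT book (booktitle,author+,country,publisher,price,year)>
# Each child tag is written with a trailing comma; "(author,)+" is "author+".
BOOK_MODEL = re.compile(r"booktitle,(author,)+country,publisher,price,year,")


def check_booklist(xml_text):
    """Return a list of problems; an empty list means the structure matches."""
    root = ET.fromstring(xml_text)
    problems = []
    if root.tag != "booklist":
        problems.append("root element must be <booklist>")
    books = list(root)
    if not books:
        # <!ELEMENT booklist (book+)> requires at least one book
        problems.append("<booklist> needs at least one <book>")
    for number, book in enumerate(books, start=1):
        if book.tag != "book":
            problems.append(f"child {number} is <{book.tag}>, expected <book>")
            continue
        sequence = "".join(child.tag + "," for child in book)
        if not BOOK_MODEL.fullmatch(sequence):
            problems.append(f"book {number} has unexpected children: {sequence}")
    return problems


good = """<booklist><book>
  <booktitle>XHTML 1.0 Language Sourcebook</booktitle>
  <author>Ian S. Graham</author>
  <country>USA</country><publisher>John Wiley and Sons</publisher>
  <price>30.00</price><year>2000</year>
</book></booklist>"""

# Same record with the required <author> left out.
bad = good.replace("<author>Ian S. Graham</author>", "")

print(check_booklist(good))
print(check_booklist(bad))
```

A validating parser does this job for every element declared in the DTD, which is why step 3 of the build process ("validate your XML document") normally relies on such a tool rather than on hand-written checks like these.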

The XML document

<?xml version="1.0" encoding="UTF-8" ?>
<!DOCTYPE booklist SYSTEM "dtdforbooklist.dtd">
<?xml-stylesheet type="text/xsl" href="xslforbooklist.xsl"?>
<booklist>
  <book>
    <booktitle>HTML and XHTML: the Definitive Guide</booktitle>
    <author>Chuck Musciano</author>
    <author>Bill Kennedy</author>
    <country>USA</country>
    <publisher>O'Reilly</publisher>
    <price>19.95</price>
    <year>2000</year>
  </book>
  <book>
    <booktitle>XHTML 1.0 Language Sourcebook</booktitle>
    <author>Ian S. Graham</author>
    <country>USA</country>
    <publisher>John Wiley and Sons</publisher>
    <price>30.00</price>
    <year>2000</year>
  </book>
</booklist>

• The DOCTYPE line says which DTD is being used. The name after DOCTYPE must match the root element (“booklist”).
• The xml-stylesheet line says which XSL is being used.
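The two linking lines at the top of the XML document can also be read by a program. This is a minimal sketch, not from the workshop, using Python's standard xml.dom.minidom; the function name linked_files and the shortened sample document are assumptions for illustration.

```python
import re
from xml.dom import minidom

# A shortened copy of the booklist document, keeping the two linking lines.
DOC = """<!DOCTYPE booklist SYSTEM "dtdforbooklist.dtd">
<?xml-stylesheet type="text/xsl" href="xslforbooklist.xsl"?>
<booklist>
  <book><booktitle>XHTML 1.0 Language Sourcebook</booktitle></book>
</booklist>"""


def linked_files(xml_text):
    """Return (dtd_file, stylesheet_file) as named at the top of the document."""
    doc = minidom.parseString(xml_text)
    dtd = doc.doctype.systemId if doc.doctype is not None else None
    stylesheet = None
    for node in doc.childNodes:
        # The xml-stylesheet line is a "processing instruction" node.
        if (node.nodeType == node.PROCESSING_INSTRUCTION_NODE
                and node.target == "xml-stylesheet"):
            found = re.search(r'href="([^"]*)"', node.data)
            if found:
                stylesheet = found.group(1)
    return dtd, stylesheet


print(linked_files(DOC))
```

Changing the href in the xml-stylesheet line is all it takes to point the same data at a different XSL file.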

The XSL document

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:template match="/">
<html>
<body>
<h1>My Book Collection</h1>
<table border="1">
  <tr bgcolor="#9acd32">
    <th>Title</th>
    <th>Author</th>
    <th>Publisher</th>
    <th>Country</th>
    <th>Price</th>
  </tr>
  <xsl:for-each select="booklist/book">
    <xsl:sort select="publisher"/>
    <xsl:if test="year &gt; 1995">
      <tr>
        <td><xsl:value-of select="booktitle"/></td>
        <td><xsl:value-of select="author"/></td>
        <td><xsl:value-of select="publisher"/></td>
        <td><xsl:value-of select="country"/></td>
        <td><xsl:value-of select="price"/></td>
      </tr>
    </xsl:if>
  </xsl:for-each>
</table>
</body>
</html>
</xsl:template>
</xsl:stylesheet>

• “xsl:template” is XSLT for “use the template below”.
• “match” takes an XPath pattern that says where the template applies (“link to” or “start with”). “/” means the root of the document, the point just above the root element (“booklist” in this case), which is why the later select starts with “booklist/book”.
• The <html>, <body>, <table> and similar tags are basic HTML for the template.
• “xsl:for-each” with the “select” instruction is XSLT for “select from each of the books in the booklist”.
• “xsl:sort” with the “select” instruction is XSLT for “sort by publisher”.
• “xsl:if” with the “test” instruction is XSLT for “only those books where the year is later than 1995”.
• “xsl:value-of” with the “select” instruction is XSLT for “use the data from this element”.
• You must close your XSLT commands.
• You must close the HTML tags of your template.
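For readers who find the XSLT instructions easier to follow next to ordinary program code, here is a rough Python equivalent of the select, sort, and test steps. It is only an illustration, not how a browser or XSLT processor works internally. It uses the standard xml.etree.ElementTree; the function name rows is invented, and the third book (year 1990) is a made-up placeholder record added so that the year test has something to exclude.

```python
import xml.etree.ElementTree as ET

BOOKLIST = """<booklist>
  <book>
    <booktitle>HTML and XHTML: the Definitive Guide</booktitle>
    <author>Chuck Musciano</author><author>Bill Kennedy</author>
    <country>USA</country><publisher>O'Reilly</publisher>
    <price>19.95</price><year>2000</year>
  </book>
  <book>
    <booktitle>XHTML 1.0 Language Sourcebook</booktitle>
    <author>Ian S. Graham</author>
    <country>USA</country><publisher>John Wiley and Sons</publisher>
    <price>30.00</price><year>2000</year>
  </book>
  <book>
    <!-- Placeholder record, not from the workshop file -->
    <booktitle>Placeholder Older Title</booktitle>
    <author>Example Author</author>
    <country>USA</country><publisher>Example Press</publisher>
    <price>10.00</price><year>1990</year>
  </book>
</booklist>"""

# Same column order as the table header in the XSL document.
FIELDS = ("booktitle", "author", "publisher", "country", "price")


def rows(xml_text, after_year=1995):
    """Return one tuple per table row, mimicking the stylesheet's logic."""
    root = ET.fromstring(xml_text)
    books = root.findall("book")                            # xsl:for-each select="booklist/book"
    books.sort(key=lambda b: b.findtext("publisher", ""))   # xsl:sort select="publisher"
    result = []
    for book in books:
        if int(book.findtext("year", "0")) > after_year:    # xsl:if test="year > 1995"
            # findtext returns the first matching child, so only the
            # first <author> is used, as with xsl:value-of in XSLT 1.0.
            result.append(tuple(book.findtext(f, "") for f in FIELDS))
    return result


for row in rows(BOOKLIST):
    print(row)
```

The placeholder book is dropped by the year test, and the remaining two come out ordered by publisher. The XSLT version expresses the same three decisions declaratively, in a file that can be swapped out without touching the data.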

The Web Page

[Screenshot: the resulting web page, showing the “My Book Collection” table in the browser.]

Done! – not so hard

•Logical

•Flexible

•Extensible

•Interoperable!!

XML in Libraries

• Use XML to map MARC to MARC XML, HTML, or MODS formats.
– MARC XML Conversion Stylesheets
• Use XML to improve searching of archival finding aids and to catalog Web sites: Five College Archives & Manuscript Collections.
http://asteria.fivecolleges.edu/index.html
• XML-based eScholarship.
http://escholarship.cdlib.org/
• Use XML for interlibrary loan.
• XML-based database systems.

XML in Academics

Text Encoding Initiative (TEI)
http://www.tei-c.org/

• Initially launched in 1987, TEI is an international and interdisciplinary standard for encoding, keeping, and analyzing the textual content and structure of digital texts.
• This standard is designed for use with a broad range of text types, especially in the humanities. It is widely used in libraries, archives, and by publishers and researchers for online research and teaching and for the storage and exchange of large and small text collections.
• Since 1987, TEI projects have mushroomed in all humanities disciplines, including language, literature, history, classics, social science and computer science.

TEI projects

• Women Writers Project.
http://www.wwp.brown.edu
• Perseus Digital Library.
http://www.perseus.tufts.edu/
• Early American Fiction Collection.
http://etext.lib.virginia.edu/eaf/pubindex.html
• American Memory Project: Historical Collections for the National Digital Library.
http://lcweb2.loc.gov/ammem/ammemhome.html
• The Newton Papers Project.
http://www.newtonproject.ic.ac.uk

XML is Going to Be Everywhere

• TEI guidelines for the Text Encoding Initiative
http://www.tei-c.org/Guidelines2/index.html
• EAD for Encoded Archival Descriptions
http://www.loc.gov/ead/
• The Dublin Core Metadata Initiative (DCMI)
http://dublincore.org/
• MARC XML: MARC 21 XML Schema
http://www.loc.gov/standards/marcxml/
• MODS XML: Metadata Object Description Schema
http://www.loc.gov/standards/mods

XML is Going to Be Everywhere

• Resource Description Framework (RDF)
• Information and Content Exchange (ICE)
• Online Information Exchange (ONIX)
• Metadata for Images in XML (MIX)
• XML/EDI (Electronic Data Interchange)
• Bioinformatic Sequence Markup Language (BSML)
• Mathematical Markup Language (MathML)

XML in Future Web Development

• XML is a cross-platform, software- and hardware-independent tool for transmitting information.
• XML will be as important to the future of the Web as HTML has been to the foundation of the Web.
• XML will become the most common tool for all data manipulation and data transmission.
• Every serious Web technology is now expected to define its relationship to XML.

XML in Future Web Development

“Every serious Web technology is now expected to define its relationship to XML.”

- Catherine Ebenezer in Trends in Integrated Library Systems.

Shaoping Moss

Information Technology Consultant

Research and Instructional Support

Mount Holyoke College

Email: [email protected]

Phone: 413.538.3034

Fax: 413.538.3112

We are grateful to Shaoping Moss for being such an excellent instructor and giving us permission to use her slides and materials in this presentation.

So this XML stuff is rad and all but could I see why I’d want to learn it and not just an encoding set like EAD?

Well, suppose you’ve got a batch of metadata on your hands. Not just any metadata, but some weird set of information that can’t really be shoehorned into your pal MARC 21. You need some way of organizing the metadata. It would be nice if you could make the metadata look all pretty and whatnot, while you’re at it.

Here’s where XML comes in!

1. Get your metadata together, having done all the sexy stuff like data dictionary creation first

2. Define labels for everything

3. Match related terms, including subordinates

4. Define your rules (Y can only appear after X, and if you have X and Y, you must have Z, but Q is optional, etc)

5. You’ve pretty much just made up a schema right there

6. Wait, what was that about making it pretty?

Oh, right, it should be attractive. Well, then you just start playing with XSL.

<LINK REL="STYLESHEET" TYPE="text/css" HREF="./games.css" TITLE="MASTER"/>

Specifically, you tell the XSL to go look at the plain ol’ stylesheet you’ve adapted from a thousand other HTML pages.

So then you’ve got this.

Hey, wait. I thought you said this was all cross-platform and cross-browser. How come this isn’t parsing in my browser? And how do I search individual records? You mean I have to hand encode every record?

Well, yes. You can write your own parser, export encoded records from a database, or create a search engine if you like. You’ll just need more than a semester’s worth of practice to do it.

