
UFCEKG-20-2 Data, Schemas & Applications Lecture 3 Data Representation, XML & RSS.

Date post: 23-Dec-2015
Upload: pauline-cameron

UFCEKG-20-2 Data, Schemas & Applications

Lecture 3: Data Representation, XML & RSS

Last week:

o introduction to the web
o URI schemes & encoding
o HTTP protocol
o media types
o request / response cycle
o GET, POST, PUT and DELETE
o introduction to mashups
o simple mashup example with forms

WWW : definition

The World Wide Web (abbreviated as WWW or W3, commonly known as the Web), is a system of interlinked hypertext documents accessed via the Internet. With a web browser, one can view web pages that may contain text, images, videos, and other multimedia, and navigate between them via hyperlinks.

Wikipedia : World Wide Web

Concept originally proposed by Sir Tim Berners-Lee (1989) based on earlier hypertext systems. Berners-Lee and Belgian computer scientist Robert Cailliau proposed in 1990 to use hypertext "to link and access information of various kinds as a web of nodes in which the user can browse at will", and they publicly introduced the project in December of the same year.

Problem : How to encode data for communication

Bank of America Market Data Mirrors

Competing constraints
o Data must be serialised into a character stream
o Communicate the meaning of the data as well as the data
o Error-free
o Minimal size
o Handle multi-lingual text

Solutions

o Card file based
o csv
o xls - Excel file format
o XML
o SQL export
o JSON - JavaScript Object Notation

The Medabar in Asmara, Eritrea Google Map

Card-based

Examples
o ATCO-CIF for timetables
o IGES for Computer-Aided Design

Characteristics
o Based on old 80-column punched cards
o Multiple record types
o Fixed field widths
o No formal language to define the format
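As a minimal sketch of how a card-style record is read, the Python below slices one line into fields by fixed column positions. The layout, field names and sample line are hypothetical, invented for illustration; they are not taken from ATCO-CIF or IGES.

```python
# Hypothetical fixed-width layout: (field name, start column, end column).
# Columns are 0-based and the end column is exclusive.
LAYOUT = [
    ("record_type", 0, 2),
    ("stop_code", 2, 10),
    ("departure", 10, 14),
]


def parse_card(line):
    """Split one fixed-width record into a dict of stripped field values."""
    return {name: line[start:end].strip() for name, start, end in LAYOUT}


# A made-up record: type "QS", stop "BRS001" padded to 8 characters, time 0915.
record = parse_card("QSBRS001  0915")
print(record)
```

Nothing in the line itself names the fields, so sender and receiver have to agree the layout table separately. That is the "no formal language" weakness listed above.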

CSV

Examples
o Alveston (Bristol) weather data
o World Health Organization (WHO) - generated estimates of TB mortality, prevalence, incidence (including incidence of HIV+TB) and case detection rate
o 1000 Songs - Google Spreadsheet

Characteristics
o Data values separated by a common separator character - space, comma or tab
o Column position is significant
o Lines separated by newlines - coding depends on OS: line feed (x0A) on Unix; carriage-return (x0D) followed by line feed on Windows; carriage-return alone on old Macs
o Separator must not occur in data values, or some other convention is needed - quotes around the value, or an escape character
o Column headings may be the first line
o Only tables - all lines the same
o All columns required - problem for space-separated data
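The quoting convention and the OS-dependent line endings can be illustrated with Python's standard csv module. This is a sketch on made-up sample data, not on any of the example files listed above.

```python
import csv
import io

# Made-up sample data: the first value of the data row contains the
# separator, so it is wrapped in quotes. Lines end with CR LF (Windows style).
text = 'title,artist,year\r\n"Hello, Goodbye",The Beatles,1967\r\n'

# newline="" stops Python translating line endings, so the csv module
# can handle CR LF, LF or CR itself.
reader = csv.DictReader(io.StringIO(text, newline=""))
rows = list(reader)
print(rows[0]["title"])
```

The first line supplies the column headings, so each later line becomes a dictionary keyed by those headings. All values arrive as strings; the file carries no information about types or units.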

Tagged record structures

Data with optional and repeated fields needs more complex structures. Many have been developed for specific domains:

o MARC library catalogue records
o EDIFACT for commercial Electronic Data Interchange (EDI)
o EDIF - LISP-based nested data
o EXIF data embedded in a JPEG image

XML

A generic data format based on tagged elements in a tree structure.

Developed from GML, via SGML.

GML, a document markup language, was developed by Charles Goldfarb and colleagues at IBM in 1969.

Examples
o Alveston WDL config file
o UWE news RSS feed

Tree with Buddhist prayer flags

XML domain vocabularies

XML defines only the rules for a well-formed document. The allowable tags, their structure and order in a document, the range of allowable values and the meaning of those tags depend on the XML application - called a vocabulary.

There are now hundreds of XML vocabularies designed for every sort of data:
o XHTML - the version of HTML which conforms to XML
o SVG - graphics
o TransXChange for timetables
o RSS and Atom for news

XML processing vocabularies

There are also vocabularies for languages that process XML:
o XSLT - for transforming XML documents
o XSL-FO - for formatting XML for output as PDF documents
o XML Schema - for defining XML vocabularies
o XProc - for defining XML pipelines

Problem: News dissemination

I want to disseminate news about my project/company, and allow interested people to read it, e.g. the university wants to spread the news about successful staff.

Solution 1 : HTML page
Publish a page of news on the website in HTML.

Problems
o how do visitors know when it's changed?
o news from different universities cannot be easily combined - (why?)

Solution : email

Encourage interested users to subscribe to your company newsletter.

Problems
o Subscription is a barrier
o Clutters up email boxes
o Can look like spam
o List management and emailing overhead

Solution : Create XML document for news

UWE makes up its own set of additional tags

<newsItem date='2007-10-2'>
  <newsTitle>UWE best in West</newsTitle>
  <newsBody>UWE wins tiddlywinks again</newsBody>
  <contact>[email protected]</contact>
</newsItem>

Problems
o someone has to design this language
o has to be translated to HTML to display
o a reader has to understand multiple new tags from different sources
o needs to be distinguished from standard HTML
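The "has to be translated to HTML" step can be sketched in a few lines of Python using the standard library. The mapping from news tags to HTML tags is an assumption made for illustration, and the contact address is a placeholder.

```python
import xml.etree.ElementTree as ET
from html import escape

# A news item like the one above; the contact address is a placeholder.
NEWS_XML = """<newsItem date='2007-10-2'>
  <newsTitle>UWE best in West</newsTitle>
  <newsBody>UWE wins tiddlywinks again</newsBody>
  <contact>news@example.org</contact>
</newsItem>"""


def news_to_html(xml_text):
    """Map the home-made news vocabulary onto ordinary HTML tags."""
    item = ET.fromstring(xml_text)
    title = escape(item.findtext("newsTitle", default=""))
    body = escape(item.findtext("newsBody", default=""))
    date = escape(item.get("date", ""))
    return f"<h2>{title}</h2>\n<p>{body} ({date})</p>"


html_fragment = news_to_html(NEWS_XML)
print(html_fragment)
```

Every site that invents its own tags needs its own version of this function, which is the cost that a shared vocabulary avoids.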

Aside: Namespaces

Problem
How to distinguish, within one document, XML tags from different vocabularies?

Solution
o define a (globally) unique URI for the vocabulary
o use an arbitrary prefix - e.g. news: - for all tags in the same vocabulary - unique within a document
o link the prefix to the vocabulary in the document

<h1>UWE news</h1>
<p>
  <news:item xmlns:news="http://www.uwe.ac.uk/news" date="2007-10-2">
    <news:Title>UWE best in West</news:Title>
    <news:Body>UWE wins tiddlywinks again</news:Body>
    <news:Contact>[email protected]</news:Contact>
  </news:item>
</p>
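A short Python sketch shows what a parser does with a namespace prefix, assuming the prefix is bound with xmlns:news. The wrapping div element is an assumption added so that the fragment has a single root, and the contact element is omitted.

```python
import xml.etree.ElementTree as ET

DOC = """<div>
  <h1>UWE news</h1>
  <news:item xmlns:news="http://www.uwe.ac.uk/news" date="2007-10-2">
    <news:Title>UWE best in West</news:Title>
    <news:Body>UWE wins tiddlywinks again</news:Body>
  </news:item>
</div>"""

# The prefix used in the query is local to this code; only the URI must match.
NS = {"n": "http://www.uwe.ac.uk/news"}

root = ET.fromstring(DOC)
item = root.find("n:item", NS)
title = item.find("n:Title", NS).text
print(item.tag)
print(title)
```

The parser replaces the prefix with the full URI, so the element is stored as {http://www.uwe.ac.uk/news}item. A search for a plain item element finds nothing, which is how the news tags stay distinct from the HTML around them.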

Solution : RSS

o Standardize on one (or several!) standard sets of tags
o Tags are machine-readable to identify news items in a list of web sites
o RSS 2.0
  o Really Simple Syndication
  o Rich Site Summary
o Atom - a more recent format
o Differences - dates (RFC 822 v RFC 3339 timestamps), multi-lingual content

Characteristics
o Structure: rss / channel / item tree
o Items in reverse chronological order
o Few mandatory tags
o Namespaces allow additional vocabularies to be added
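The rss / channel / item structure can be walked with Python's standard library. The sketch below uses a cut-down, made-up feed (the example.org links and titles are placeholders) and converts each RFC 822 style pubDate into a datetime.

```python
import xml.etree.ElementTree as ET
from email.utils import parsedate_to_datetime

# A cut-down, made-up feed in the rss / channel / item shape.
FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example News</title>
    <link>http://example.org/</link>
    <description>Sample feed</description>
    <item>
      <title>Second story</title>
      <link>http://example.org/2</link>
      <pubDate>Tue, 14 Oct 2008 09:00:00 GMT</pubDate>
    </item>
    <item>
      <title>First story</title>
      <link>http://example.org/1</link>
      <pubDate>Mon, 13 Oct 2008 15:15:44 GMT</pubDate>
    </item>
  </channel>
</rss>"""


def read_items(feed_text):
    """Return (title, link, published) tuples for every item in the channel."""
    channel = ET.fromstring(feed_text).find("channel")
    items = []
    for item in channel.findall("item"):
        published = parsedate_to_datetime(item.findtext("pubDate"))
        items.append((item.findtext("title"), item.findtext("link"), published))
    return items


items = read_items(FEED)
for title, link, published in items:
    # isoformat() prints the date in the RFC 3339 style that Atom uses.
    print(published.isoformat(), title, link)
```

Because the tag names are standard, the same function works on any RSS 2.0 feed whose items carry a pubDate, with no per-site knowledge.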

Example RSS - UWE news

<?xml version="1.0" encoding="iso-8859-1"?>
<rss version="2.0">
<channel>
  <title>UWE News</title>
  <link>http://www.uwe.ac.uk</link>
  <description>Latest UWE press releases</description>
  <image>
    <url>http://info.uwe.ac.uk/common/assets/2004Design/logoNoBorder.gif</url>
    <title>University of the West of England</title>
    <link>http://www.uwe.ac.uk</link>
  </image>
  <pubDate>Fri, 13 Oct 2008 15:15:44 GMT</pubDate>
  <item>
    <title>New research looks to transport users for solutions</title>
    <link>http://info.uwe.ac.uk/news/uwenews/article.asp?item=1363</link>
    <description>'Ideas in Transit' is a new initiative which will look to transport users' experiences and creativity as a source of innovation to tackle the UK's transport problems.... </description>
  </item>

Example RSS - BBC Finance News

<?xml version="1.0" encoding="ISO-8859-1" ?>
<?xml-stylesheet title="XSL_formatting" type="text/xsl" href="/shared/bsp/xsl/rss/nolsol.xsl"?>
<rss version="2.0" xmlns:media="http://search.yahoo.com/mrss">
<channel>
  <title>BBC News | Business | UK Edition</title>
  <link>http://news.bbc.co.uk/go/rss/-/1/hi/business/default.stm</link>
  <description>Visit BBC News for up-to-the-minute news, breaking news, video, audio and feature stories. BBC News provides trusted World and UK news as well as local and regional perspectives. Also entertainment, business, science, technology and health news. </description>
  <language>en-gb</language>
  <lastBuildDate>Mon, 13 Oct 2008 14:28:30 GMT</lastBuildDate>
  <copyright>Copyright: (C) British Broadcasting Corporation, see http://news.bbc.co.uk/1/hi/help/rss/4498287.stm for terms and conditions of reuse </copyright>
  <docs>http://www.bbc.co.uk/syndication/</docs>
  <ttl>15</ttl>
  <image>
    <title>BBC News</title>
    <url>http://news.bbc.co.uk/nol/shared/img/bbc_news_120x60.gif</url>
    <link>http://news.bbc.co.uk/go/rss/-/1/hi/business/default.stm</link>
  </image>
  <item>
    <title>UK banks receive &#163;37bn bail-out</title>
    <description>The UK government says it is to inject a total of up to &#163;37bn into Royal …..
  </item>

RSS aggregation

Problem
How to keep track of multiple feeds

Solution
http://www.youtube.com/watch?v=0klgLsSxGsU&feature=player_embedded#t=0s

o Application needed which is stateful - remembers what items you have read
o Integrates multiple feeds into one 'magazine'
o Polls RSS providers on a regular basis

Feed integrators such as Bloglines and Google Reader reduce the load on the provider and provide some filtering. There is an RSS reader integrated into MyUWE.
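The two core jobs of an aggregator, merging feeds and remembering what has been read, can be sketched as follows. This is an illustrative Python outline, not how Bloglines or Google Reader work internally. The Aggregator class, the make_feed helper and the example.org links are all hypothetical, and the polling step is replaced by feed text built in memory.

```python
import xml.etree.ElementTree as ET
from email.utils import parsedate_to_datetime


def parse_feed(feed_text):
    """Yield one dict per item in an RSS 2.0 document."""
    for item in ET.fromstring(feed_text).iter("item"):
        yield {
            "title": item.findtext("title"),
            "link": item.findtext("link"),
            "published": parsedate_to_datetime(item.findtext("pubDate")),
        }


class Aggregator:
    """Merge several feeds and remember which links have been read."""

    def __init__(self):
        self.read_links = set()  # the 'state': links already read

    def unread(self, feed_texts):
        """Return unread items from all feeds, newest first."""
        items = [i for text in feed_texts for i in parse_feed(text)]
        fresh = [i for i in items if i["link"] not in self.read_links]
        return sorted(fresh, key=lambda i: i["published"], reverse=True)

    def mark_read(self, link):
        self.read_links.add(link)


def make_feed(title, link, date):
    """Build a tiny one-item feed; stands in for text fetched by polling."""
    return (
        '<rss version="2.0"><channel><item>'
        f"<title>{title}</title><link>{link}</link><pubDate>{date}</pubDate>"
        "</item></channel></rss>"
    )


feed_a = make_feed("Story A", "http://example.org/a", "Mon, 13 Oct 2008 14:28:30 GMT")
feed_b = make_feed("Story B", "http://example.org/b", "Mon, 13 Oct 2008 15:15:44 GMT")

reader = Aggregator()
first = [i["title"] for i in reader.unread([feed_a, feed_b])]
reader.mark_read("http://example.org/b")
second = [i["title"] for i in reader.unread([feed_a, feed_b])]
print(first, second)
```

A real aggregator would keep the set of read links in persistent storage and fetch each feed over HTTP on a timer; both are left out here to keep the sketch self-contained.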

RSS Aggregation with Bloglines

XML Characteristics

o strings enclosed in tags which provide a human-readable name for the element - so-called self-describing
o elements may be nested to create hierarchical data structures
o element tags may be repeated
o element names can be relative to their parent
o element structure can be formally defined

Aside: Self-describing

o Element names provide a clue about the meaning of the data, but not enough
  o names are ambiguous
  o names may be misleading
  o what units?
  o what accuracy?
  o what origin?
o leads to need for meta-data
  o who created
  o when
  o what license to use
  o why

XML terminology

XML documents are tree structures, with each node bounded by an opening and a closing tag
o Element: the opening tag, attributes, the body of the element and the closing tag. Elements are not elemental!
o Tag name: the name in angle brackets - must conform to rules, may have a prefix
o Attribute: a name="value" pair attached to an element. Names follow the same rules as tag names.
o Parent: all elements except the root have one parent
o Child: an element nested in another parent element
o Root: every document has a single root element with no parent
o Mixed content: an element may contain a mixture of text and other elements
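These terms map directly onto what a parser exposes. The sketch below uses Python's ElementTree on a made-up one-line document; the element and attribute names are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Made-up document with an attribute, nested children and mixed content.
DOC = '<news date="2008-10-14"><title>UWE</title><body>Wins <b>again</b> today</body></news>'

root = ET.fromstring(DOC)           # Root: the single top-level element
children = [c.tag for c in root]    # Child elements of the root
date = root.get("date")             # Attribute: name="value" on an element

body = root.find("body")
# Mixed content: text before the first child is the parent's .text,
# text that follows a child element is that child's .tail.
parts = (body.text, body[0].text, body[0].tail)
print(root.tag, children, date, parts)
```

ElementTree elements do not store a reference to their parent, so code that needs the parent usually finds it by walking down from the root.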

Basic XML rules

o A single root element
o Tags must be properly nested
o An element must be closed:
  o Opening and closing tag <p>... </p>
  o Empty element <br /> or <hr size="3"/>

Other formatting rules
o XML names are case sensitive, no spaces, restricted character set
o Attribute values must be single- or double-quoted
o Special characters coded as references: &#10; (a line feed), &gt; (>)
o Some characters have special meaning within XML data, e.g. < is the start of a tag and & is the first character of an entity reference. In XML data these have to be encoded as &lt; and &amp; or enclosed in <![CDATA[ ... ]]>
o Preferably use standard formats for representing values e.g. 2008-10-14 for a date
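Most of these rules can be checked by handing a string to a parser and seeing whether it is accepted. The Python sketch below does that for a handful of made-up fragments, then shows the standard-library escape function encoding the special characters.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def is_well_formed(text):
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


checks = {
    "nested": is_well_formed("<a><b>ok</b></a>"),
    "badly nested": is_well_formed("<a><b>ok</a></b>"),
    "two roots": is_well_formed("<a/><b/>"),
    "case mismatch": is_well_formed("<contact>x</Contact>"),
    "unquoted attribute": is_well_formed("<hr size=3/>"),
    "raw ampersand": is_well_formed("<p>Fish & chips</p>"),
}

# escape() encodes &, < and > so that arbitrary text is safe as element content.
safe = "<p>" + escape("Fish & chips < 5") + "</p>"
checks["escaped"] = is_well_formed(safe)
print(checks)
print(safe)
```

Only the first and last fragments are accepted. Being well-formed says nothing about whether the tags belong to a vocabulary; that needs a schema.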

