Culture Geeks Feb talk: Adventures in Linked Data Land


Adventures in Linked Data Land

Richard Light Consultancy

Culture Geeks, 25 February 2009

Discovering Linked Data

Four principles of Linked Data (Tim B-L):

● Use URIs to identify resources

● Use HTTP URIs so that people can look them up

● Provide useful information about the resource

● Include links to other URIs in your data

Discovering dbPedia

● Extraction of Linked Data from Wikipedia

● Statements in info boxes (mainly) become RDF triples:

<rdf:Description rdf:about="http://dbpedia.org/resource/Berlin_Marathon">

<dbpprop:location rdf:resource="http://dbpedia.org/resource/Berlin"/>

</rdf:Description>

Note the URLs
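As a rough illustration of how that fragment reads as a triple, here is a minimal Python sketch using only the standard library. The rdf:RDF wrapper and the binding of the dbpprop prefix to http://dbpedia.org/property/ are assumptions added so that the fragment parses on its own. The function name `triples` is hypothetical.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# The slide's fragment, wrapped in an rdf:RDF root that declares the two
# prefixes (the wrapper and the dbpprop binding are assumptions).
RDF_XML = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:dbpprop="http://dbpedia.org/property/">
  <rdf:Description rdf:about="http://dbpedia.org/resource/Berlin_Marathon">
    <dbpprop:location rdf:resource="http://dbpedia.org/resource/Berlin"/>
  </rdf:Description>
</rdf:RDF>"""


def triples(rdf_xml):
    """Yield (subject, predicate, object) for properties that point at a URI."""
    root = ET.fromstring(rdf_xml)
    for desc in root.findall(f"{{{RDF_NS}}}Description"):
        subject = desc.get(f"{{{RDF_NS}}}about")
        for prop in desc:
            obj = prop.get(f"{{{RDF_NS}}}resource")
            if obj is not None:
                # ElementTree tags look like "{namespace}local"; join the two
                # parts back into the full predicate URI.
                namespace, local = prop.tag[1:].split("}")
                yield subject, namespace + local, obj


for s, p, o in triples(RDF_XML):
    print(s, p, o)
```

Subject, predicate and object all come out as HTTP URIs, which is the point of the slide.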

Browsing Linked Data

● View RDF as a web page: http://dbpedia.org/page/Berlin

● Navigate from one data source to another

● Specialist Linked Data browsers/plugins:
– DISCO
– Marbles
– OpenLink Data Explorer
– Tabulator

dbPedia page for Berlin

OpenLink Data Explorer: What

OpenLink Data Explorer: Where

Querying Linked Data

● SPARQL query language: http://www.w3.org/TR/2008/REC-rdf-sparql-query-20080115/

● And SPARQL XML results format: http://www.w3.org/TR/rdf-sparql-XMLres/

● “SPARQL end-points”:
– http://dbpedia.org/sparql
– http://dbtune.org/bbc/peel/sparql
– http://data.linkedmdb.org/sparql

dbPedia SPARQL endpoint page

Asking interesting questions

● German musicians born in Berlin:
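The query shown on the slide is not preserved in this transcript. The Python sketch below shows what a query of this kind might look like and how it could be turned into a GET request URL for the dbPedia end-point. The properties used (dbo:birthPlace, dct:subject) and the category resource are assumptions, not necessarily those used in the talk. The helper name `sparql_get_url` is hypothetical.

```python
from urllib.parse import urlencode

ENDPOINT = "http://dbpedia.org/sparql"

# Assumed vocabulary: dbo:birthPlace for the place of birth and dct:subject
# for Wikipedia category membership. Neither is confirmed by the slides.
QUERY = """
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dct: <http://purl.org/dc/terms/>
SELECT ?person WHERE {
  ?person dbo:birthPlace <http://dbpedia.org/resource/Berlin> .
  ?person dct:subject <http://dbpedia.org/resource/Category:German_musicians> .
}
"""


def sparql_get_url(endpoint, query):
    """Build a SPARQL protocol GET URL: the query travels in the 'query' parameter."""
    return endpoint + "?" + urlencode({"query": query})


url = sparql_get_url(ENDPOINT, QUERY)
print(url)
```

Fetching that URL (for example with urllib.request) needs network access, so the sketch stops at building it.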

So what do we have here?

● An initiative to generate lots of Linked Data

● A Linked Data Cloud, containing a growing number of RDF datasets

● A hard-to-use query language capable of very precise and powerful querying

Where do museums come into this picture?

The Wordsworth Trust

● Typical museum collection: about 60,000 objects

● Major collection of manuscripts (notebooks, letters, etc.)

● Objects published to the Web from a ModesXML database

● Unwise enough to allow me Remote Desktop access ...

Typical collections object

GRMDC.C104.2

Same object represented as RDF

Same object represented as XTM

One identifier; three “views”

● This object has a single persistent identifier: http://collections.wordsworth.org.uk/object/GRMDC.C104.2

● This maps to different views depending on the “Accept” header in the HTTP request:
– application/rdf+xml >> RDF
– application/xtm+xml >> XTM Topic Map
– Otherwise >> HTML (human-readable)

● Achieved through a custom 404 “page not found” handler
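The Accept-header mapping above can be sketched in a few lines of Python. This is a simplified illustration, not the handler's actual code: it ignores q-values and simply takes the first supported type listed. The names `VIEWS` and `choose_view` are hypothetical.

```python
# Media types with a dedicated view, as listed on the slide.
VIEWS = {
    "application/rdf+xml": "rdf",
    "application/xtm+xml": "xtm",
}


def choose_view(accept_header):
    """Return 'rdf', 'xtm' or 'html' for a given Accept header value."""
    for item in (accept_header or "").split(","):
        # Drop parameters such as ";q=0.9" and normalise case.
        media_type = item.split(";")[0].strip().lower()
        if media_type in VIEWS:
            return VIEWS[media_type]
    # Anything else (including a missing header) gets the human-readable page.
    return "html"


print(choose_view("application/rdf+xml"))
print(choose_view("text/html,application/xhtml+xml;q=0.9"))
```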

“Page not found” handler (1)

● All URLs are fictitious, so they generate a 404

● Modified a generic smart 404 handler from: http://evolvedcode.net/content/code_smart404/

● Added support for “303 See other” redirects

● Added wild card matching to re-format URLs

● Generic URL, plus requested Accept format, determine initial “303 See other” mapping, e.g.:

http://collections.wordsworth.org.uk/object/GRMDC.C104.2
+ Accept: application/rdf+xml
= http://collections.wordsworth.org.uk/object/rdf/GRMDC.C104.2

● When this is passed back in, the 404 handler has to generate the required RDF directly

● Can't just keep redirecting requests!

● Redirect rules declare mappings:

● Generic URL plus a supported Accept type generates a “303 See other” redirect

● If it comes back as a page request, it is further redirected with a “301 Moved permanently” to the object's web page

● If it comes back as an RDF or XTM request, the record is fetched as XML and subjected to an XSLT transform by the handler
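The two-hop behaviour described in these rules can be sketched as a single Python function. This is an illustration under stated assumptions, not the actual handler. Only the /object/rdf/ URL pattern appears in the slides, so the /object/xtm/ and /object/html/ patterns, the web-page address used for the 301, and all function names are hypothetical. The XSLT step is only named, not performed.

```python
import re

BASE = "http://collections.wordsworth.org.uk"

# Accept types the handler gives special treatment (from the slides).
ACCEPT_TO_VIEW = {"application/rdf+xml": "rdf", "application/xtm+xml": "xtm"}

GENERIC = re.compile(r"^/object/([^/]+)$")
SPECIFIC = re.compile(r"^/object/(rdf|xtm|html)/([^/]+)$")


def view_for(accept):
    """Pick a view name from an Accept header, ignoring q-values."""
    for item in (accept or "").split(","):
        media_type = item.split(";")[0].strip().lower()
        if media_type in ACCEPT_TO_VIEW:
            return ACCEPT_TO_VIEW[media_type]
    return "html"


def handle_404(path, accept=""):
    """Return (status, detail) for a request that fell through to the 404 handler."""
    specific = SPECIFIC.match(path)
    if specific:
        view, object_id = specific.groups()
        if view == "html":
            # Second hop for page requests: permanent redirect to the real
            # web page. This target path is a hypothetical placeholder.
            return 301, f"{BASE}/pages/{object_id}.html"
        # Second hop for data requests: the real handler fetches the record
        # as XML and applies an XSLT transform; here we only name the step.
        return 200, f"transform record {object_id} to {view}"
    generic = GENERIC.match(path)
    if generic:
        # First hop: 303 See Other to the view-specific URL.
        return 303, f"{BASE}/object/{view_for(accept)}/{generic.group(1)}"
    return 404, "not found"


print(handle_404("/object/GRMDC.C104.2", "application/rdf+xml"))
print(handle_404("/object/rdf/GRMDC.C104.2"))
```

Checking the view-specific pattern first is what stops the handler from redirecting a request that has already been redirected.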

What has been learnt?

● The Linked Data paradigm encourages simple RDF triples: no “blank nodes”

● For an object, this becomes a simple metadata set, closely analogous to the PNDS DCAP format

● The properties involved need to encapsulate the whole relation between object and data, e.g.

<p:title>Ulswater from Pooley Bridge</p:title>
<p:technique>drawn</p:technique>
<p:maker>Farington, Joseph (1747-1821)</p:maker>
<p:technique>engraved</p:technique>
<p:maker>Middiman, Samuel (1750-1831)</p:maker>
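A flat property set like this is straightforward to serialise. The Python sketch below builds a single rdf:Description from a list of (property, value) pairs, with no blank nodes. It assumes the p: prefix stands for http://dbpedia.org/property/ (the slides do not show the namespace declaration). The function name `object_to_rdf` is hypothetical.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
P = "http://dbpedia.org/property/"  # assumed binding for the p: prefix

ET.register_namespace("rdf", RDF)
ET.register_namespace("p", P)


def object_to_rdf(uri, statements):
    """Serialise (property, literal) pairs as one flat rdf:Description."""
    root = ET.Element(f"{{{RDF}}}RDF")
    desc = ET.SubElement(root, f"{{{RDF}}}Description", {f"{{{RDF}}}about": uri})
    for prop, value in statements:
        # Each statement becomes a direct child element holding a plain string.
        ET.SubElement(desc, f"{{{P}}}{prop}").text = value
    return ET.tostring(root, encoding="unicode")


doc = object_to_rdf(
    "http://collections.wordsworth.org.uk/object/GRMDC.C104.2",
    [
        ("title", "Ulswater from Pooley Bridge"),
        ("technique", "drawn"),
        ("maker", "Farington, Joseph (1747-1821)"),
        ("technique", "engraved"),
        ("maker", "Middiman, Samuel (1750-1831)"),
    ],
)
print(doc)
```

The output links technique and maker only by the order of the elements, which RDF itself does not treat as significant. That is the cost of keeping the triples this simple.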

Properties: which framework?

● I have used dbPedia properties (for Linked Data compatibility):
http://dbpedia.org/property/title
http://dbpedia.org/property/maker

● A viable alternative would be PNDS DCAP:
http://purl.org/dc/elements/1.1/title
http://purl.org/dc/elements/1.1/creator

● One framework which doesn't fit is the CIDOC CRM:
E21 Physical Thing – E12 Production – E39 Actor = “creator”

The problem of URIs

● Good Linked Data requires URIs everywhere

● Most of my museum RDF resolves to strings

● One exception is Geonames lookup: “Ullswater” becomes http://www.geonames.org/2635191/

● In the absence of a central “people” registry, I should be minting URIs for people myself

Does it work? - yes, sort of

Data Explorer place view

Implementation details

● HTML needed a “back link” to RDF to keep OpenLink Explorer happy:

<link rel="alternate" type="application/rdf+xml" href="http://collections.wordsworth.org.uk/object/data/GRMDC.C104.2" title="RDF" />

● Result is totally unfindable: need a search or harvesting mechanism:
– OAI support (possible)
– SPARQL end-point (harder)
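To show how a client might use the “back link”, here is a small Python sketch that scans an HTML page for link elements with rel="alternate" and an RDF media type. It uses only the standard library's html.parser. The class name and the sample page are hypothetical, and this is not a description of how OpenLink Data Explorer itself works.

```python
from html.parser import HTMLParser


class AlternateLinkFinder(HTMLParser):
    """Collect href values of <link rel="alternate"> tags of one media type."""

    def __init__(self, media_type):
        super().__init__()
        self.media_type = media_type
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        rel_values = (attributes.get("rel") or "").lower().split()
        if (tag == "link" and "alternate" in rel_values
                and attributes.get("type") == self.media_type):
            self.hrefs.append(attributes.get("href"))


# Hypothetical page containing the link element from the slide.
PAGE = """<html><head><title>GRMDC.C104.2</title>
<link rel="alternate" type="application/rdf+xml"
      href="http://collections.wordsworth.org.uk/object/data/GRMDC.C104.2" title="RDF" />
</head><body></body></html>"""

finder = AlternateLinkFinder("application/rdf+xml")
finder.feed(PAGE)
print(finder.hrefs)
```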

Conclusions

● Implementing an RDF Linked Data front-end to a museum database is feasible if:
– You can generate multiple outputs from your database (XML is sufficient)
– You can implement a suitable URL rewriter or 404 handler

● It's easy (and a good idea) to mint and publish URIs for your collection objects

● It's less clear where all the other URIs we'll need will come from

LD: foothills of the Semantic Web

● Linked Data is a very modest start

● It's not obvious how this will scale

● Full Semantic Web will involve machine-driven processes

● Judging by where we are today, that will be a while coming ...

Ask Multimap where Lancaster is

Get a Netbook delivered ...
