Adventures in Linked Data Land
Richard Light Consultancy
Culture Geeks, 25 February 2009
Discovering Linked Data
Four principles of Linked Data (Tim Berners-Lee):
● Use URIs to identify resources
● Use HTTP URIs so that people can look them up
● Provide useful information about the resource
● Include links to other URIs in your data
Discovering dbPedia
● Extraction of Linked Data from Wikipedia
● Statements in infoboxes (mainly) become RDF triples:
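<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dbpprop="http://dbpedia.org/property/">
  <rdf:Description rdf:about="http://dbpedia.org/resource/Berlin_Marathon">
    <dbpprop:location rdf:resource="http://dbpedia.org/resource/Berlin"/>
  </rdf:Description>
</rdf:RDF>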
Note the URLs
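To make the subject/property/object structure of that statement explicit, here is a minimal Python sketch that reads the fragment with the standard library's ElementTree and prints it as a triple. It only understands the simple pattern on this slide (rdf:Description with an rdf:about subject and properties whose values are rdf:resource URIs or plain text); the namespace declarations wrapping the fragment are assumptions, and real applications would use a full RDF/XML parser.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# The slide's statement, wrapped in rdf:RDF with (assumed) namespace declarations
DOC = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dbpprop="http://dbpedia.org/property/">
  <rdf:Description rdf:about="http://dbpedia.org/resource/Berlin_Marathon">
    <dbpprop:location rdf:resource="http://dbpedia.org/resource/Berlin"/>
  </rdf:Description>
</rdf:RDF>"""


def triples(xml_text):
    """Yield (subject, property, object) for simple rdf:Description elements.

    Only rdf:about subjects and rdf:resource or plain-text objects are handled;
    blank nodes and the other RDF/XML abbreviations are not.
    """
    root = ET.fromstring(xml_text)
    for desc in root.findall(f"{{{RDF_NS}}}Description"):
        subject = desc.get(f"{{{RDF_NS}}}about")
        for prop in desc:
            # ElementTree tags look like "{namespace}local": join them into one URI
            ns, local = prop.tag[1:].split("}")
            obj = prop.get(f"{{{RDF_NS}}}resource", (prop.text or "").strip())
            yield subject, ns + local, obj


for t in triples(DOC):
    print(t)
```

All three parts of the resulting triple are HTTP URIs, which is what makes the statement linkable.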
Browsing Linked Data
● View RDF as a web page: http://dbpedia.org/page/Berlin
● Navigate from one data source to another
● Specialist Linked Data browsers/plugins:
– DISCO
– Marbles
– OpenLink Data Explorer
– Tabulator
dbPedia page for Berlin
OpenLink Data Explorer: What
OpenLink Data Explorer: Where
Querying Linked Data
● SPARQL query language: http://www.w3.org/TR/2008/REC-rdf-sparql-query-20080115/
● And the SPARQL XML results format: http://www.w3.org/TR/rdf-sparql-XMLres/
● “SPARQL end-points”:
– http://dbpedia.org/sparql
– http://dbtune.org/bbc/peel/sparql
– http://data.linkedmdb.org/sparql
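An end-point answers a SELECT query with a document in the SPARQL XML results format: a head listing the variables, then one result element per row, each holding bindings whose value is a uri, literal or bnode element. A minimal Python sketch of turning such a document into rows, using only ElementTree; the sample response is hand-written and its values are made up for illustration.

```python
import xml.etree.ElementTree as ET

SR = "{http://www.w3.org/2005/sparql-results#}"

# Hand-written sample response (illustrative values, not real end-point output)
SAMPLE = """<sparql xmlns="http://www.w3.org/2005/sparql-results#">
  <head><variable name="person"/><variable name="label"/></head>
  <results>
    <result>
      <binding name="person"><uri>http://dbpedia.org/resource/Example_Person</uri></binding>
      <binding name="label"><literal xml:lang="en">Example Person</literal></binding>
    </result>
  </results>
</sparql>"""


def rows(xml_text):
    """Turn a SPARQL XML results document into a list of {variable: value} dicts."""
    root = ET.fromstring(xml_text)
    out = []
    for result in root.iter(f"{SR}result"):
        row = {}
        for binding in result.findall(f"{SR}binding"):
            # Each binding wraps exactly one <uri>, <literal> or <bnode> element
            row[binding.get("name")] = binding[0].text
        out.append(row)
    return out


print(rows(SAMPLE))
```

Unbound variables simply have no binding element in a result, so they are absent from the corresponding dict.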
dbPedia SPARQL endpoint page
Asking interesting questions
● German musicians born in Berlin:
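The query from the original slide is not reproduced here. As a hedged illustration of how a question of this kind reaches an end-point, the SPARQL protocol lets a client send the query URL-encoded in a `query` parameter of an HTTP GET. The query text below is hypothetical: the `dbpedia.org/ontology` class and property URIs are assumptions (DBpedia's vocabulary has changed since 2009), and it does not test nationality.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen

ENDPOINT = "http://dbpedia.org/sparql"

# Hypothetical query: class and property URIs are assumptions, not the slide's query
QUERY = """
SELECT DISTINCT ?person WHERE {
  ?person a <http://dbpedia.org/ontology/MusicalArtist> ;
          <http://dbpedia.org/ontology/birthPlace> <http://dbpedia.org/resource/Berlin> .
}
LIMIT 50
"""


def request_url(endpoint, query):
    """Build a SPARQL protocol GET URL: the query travels URL-encoded in 'query'."""
    return endpoint + "?" + urlencode({"query": query})


def fetch(url):
    """Ask for results in the SPARQL XML results format (needs network access)."""
    req = Request(url, headers={"Accept": "application/sparql-results+xml"})
    with urlopen(req) as resp:
        return resp.read()


url = request_url(ENDPOINT, QUERY)
print(url)
```

Calling `fetch(url)` would return the raw XML results document; it is kept separate so the URL construction can be inspected without a network connection.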
So what do we have here?
● An initiative to generate lots of Linked Data
● A Linked Data Cloud, containing a growing number of RDF datasets
● A hard-to-use query language capable of very precise and powerful querying
Where do museums come into this picture?
The Wordsworth Trust
● Typical museum collection: about 60,000 objects
● Major collection of manuscripts (notebooks, letters, etc.)
● Objects published to the Web from a ModesXML database
● Unwise enough to allow me Remote Desktop access ...
Typical collections object
GRMDC.C104.2
Same object represented as RDF
Same object represented as XTM
One identifier; three “views”
● This object has a single persistent identifier: http://collections.wordsworth.org.uk/object/GRMDC.C104.2
● This maps to different views depending on the “Accept” header in the HTTP request:
– application/rdf+xml >> RDF
– application/xtm+xml >> XTM Topic Map
– Otherwise >> HTML (human-readable)
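The mapping above can be sketched as a small Python function. This is a hypothetical illustration, not the handler's actual code: it splits the Accept header into media types, prefers RDF, then XTM, and falls back to HTML. It deliberately ignores q-values, which a full implementation of HTTP content negotiation would honour.

```python
def choose_view(accept_header):
    """Pick a representation from an HTTP Accept header.

    Simplified sketch: looks for the RDF and XTM media types (RDF first)
    and ignores q-values; anything else falls back to HTML.
    """
    media_types = [part.split(";")[0].strip().lower()
                   for part in accept_header.split(",")]
    if "application/rdf+xml" in media_types:
        return "rdf"
    if "application/xtm+xml" in media_types:
        return "xtm"
    return "html"


print(choose_view("application/rdf+xml"))                         # rdf
print(choose_view("text/html,application/xhtml+xml,*/*;q=0.8"))   # html
```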
● Achieved through a custom 404 “page not found” handler
“Page not found” handler (1)
● All the object URLs are virtual (no real file exists at them), so requests for them generate a 404
● Modified a generic “smart 404” handler from: http://evolvedcode.net/content/code_smart404/
● Added support for “303 See other” redirects
● Added wildcard matching to reformat URLs
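Wildcard matching of this kind can be sketched by turning each `*` in a pattern into a regular-expression capture group and substituting the captured pieces into a target template. The pattern and template syntax here are hypothetical, chosen for illustration; they are not the syntax used by the smart 404 handler.

```python
import re


def wildcard_to_regex(pattern):
    """Compile a pattern with '*' wildcards into a regex with one group per '*'."""
    parts = [re.escape(p) for p in pattern.split("*")]
    return re.compile("^" + "(.+)".join(parts) + "$")


def rewrite(url, pattern, template):
    """If url matches pattern, fill template's {0}, {1}, ... with the captured parts."""
    m = wildcard_to_regex(pattern).match(url)
    return template.format(*m.groups()) if m else None


print(rewrite("http://collections.wordsworth.org.uk/object/GRMDC.C104.2",
              "http://collections.wordsworth.org.uk/object/*",
              "http://collections.wordsworth.org.uk/object/rdf/{0}"))
```

Note that a pattern ending in `*` also matches already-rewritten URLs such as `.../object/rdf/...`, so a handler needs some way to recognise those and stop redirecting.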
“Page not found” handler (2)
● The generic URL, plus the requested Accept format, determines the initial “303 See other” mapping, e.g.:
http://collections.wordsworth.org.uk/object/GRMDC.C104.2
+ Accept: application/rdf+xml
= http://collections.wordsworth.org.uk/object/rdf/GRMDC.C104.2
● When the client requests this new URL, the 404 handler has to generate the required RDF directly
● Can't just keep redirecting requests!
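The rule that a format-specific URL must be answered rather than redirected again can be sketched as one decision function. It is a hypothetical illustration: the `rdf/` segment comes from the example above, while the `xtm/` segment is an assumption, and HTML requests are left out of scope.

```python
BASE = "http://collections.wordsworth.org.uk/object/"
# Path segment per machine-readable format; "xtm/" is an assumed segment
FORMAT_PATHS = {"application/rdf+xml": "rdf/", "application/xtm+xml": "xtm/"}


def respond(url, accept):
    """Decide what the 404 handler does with one request.

    Returns (303, location) for a generic object URL with a supported Accept
    type, or (200, segment) when the URL is already format-specific and the
    document must be generated here instead of redirecting again.
    """
    if not url.startswith(BASE):
        return 404, None
    rest = url[len(BASE):]
    for segment in FORMAT_PATHS.values():
        if rest.startswith(segment):
            # Already redirected once: serve content now, never redirect again
            return 200, segment
    segment = FORMAT_PATHS.get(accept)
    if segment is None:
        return None, None  # HTML requests are outside the scope of this sketch
    return 303, BASE + segment + rest


print(respond(BASE + "GRMDC.C104.2", "application/rdf+xml"))
print(respond(BASE + "rdf/GRMDC.C104.2", "application/rdf+xml"))
```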
“Page not found” handler (3)
● Redirect rules declare mappings:
“Page not found” handler (4)
● Generic URL plus a supported Accept type generates a “303 See other” redirect
● If it comes back as a page request, it is further redirected with a “301 Moved permanently” to the object's web page
● If it comes back as an RDF or XTM request, the record is fetched as XML and subjected to an XSLT transform by the handler
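The handler applies an XSLT transform to the record fetched as XML. Python's standard library has no XSLT processor, so this hedged sketch swaps in ElementTree to show the same kind of mapping: each field of a record becomes a dbpprop: property with a literal value on a resource named by the object's URI. The record structure is hypothetical and far simpler than real ModesXML; technique fields are left out.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DBPPROP = "http://dbpedia.org/property/"
ET.register_namespace("rdf", RDF)
ET.register_namespace("dbpprop", DBPPROP)

# Hypothetical, simplified record: NOT the real ModesXML structure
RECORD = """<record>
  <id>GRMDC.C104.2</id>
  <title>Ulswater from Pooley Bridge</title>
  <maker>Farington, Joseph (1747-1821)</maker>
  <maker>Middiman, Samuel (1750-1831)</maker>
</record>"""


def record_to_rdf(record_xml, base="http://collections.wordsworth.org.uk/object/"):
    """Map each non-id child of the record onto a dbpprop: literal property."""
    record = ET.fromstring(record_xml)
    rdf = ET.Element(f"{{{RDF}}}RDF")
    desc = ET.SubElement(rdf, f"{{{RDF}}}Description",
                         {f"{{{RDF}}}about": base + record.findtext("id")})
    for field in record:
        if field.tag != "id":
            ET.SubElement(desc, f"{{{DBPPROP}}}{field.tag}").text = field.text
    return ET.tostring(rdf, encoding="unicode")


print(record_to_rdf(RECORD))
```

In an XSLT-based setup the same field-to-property mapping would live in a stylesheet, which keeps it editable without touching the handler code.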
What has been learnt?
● The Linked Data paradigm encourages simple RDF triples: no “blank nodes”
● For an object, this becomes a simple metadata set, closely analogous to the PNDS DCAP format
● The properties involved need to encapsulate the whole relation between object and data, e.g.:
<p:title>Ulswater from Pooley Bridge</p:title>
<p:technique>drawn</p:technique>
<p:maker>Farington, Joseph (1747-1821)</p:maker>
<p:technique>engraved</p:technique>
<p:maker>Middiman, Samuel (1750-1831)</p:maker>
Properties: which framework?
● I have used dbPedia properties (for Linked Data compatibility):
– http://dbpedia.org/property/title
– http://dbpedia.org/property/maker
● A viable alternative would be PNDS DCAP:
– http://purl.org/dc/elements/1.1/title
– http://purl.org/dc/elements/1.1/creator
● One framework which doesn't fit is the CIDOC CRM:
E18 Physical Thing – E12 Production – E39 Actor = “creator”
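Part of the mismatch is that the CRM places a production event between the object and its maker, so a flat “creator” property is a shortcut across two links. A minimal Python sketch of computing that shortcut over in-memory triples; the prefixes and identifiers are hypothetical, although P108i “was produced by” and P14 “carried out by” are the CRM properties normally used for this path.

```python
# Illustrative triples following the CRM path object -> Production -> Actor.
# All identifiers and prefixes here are hypothetical.
TRIPLES = [
    ("obj:GRMDC.C104.2", "crm:P108i_was_produced_by", "evt:production1"),
    ("evt:production1", "crm:P14_carried_out_by", "person:farington"),
]


def creators(triples, obj):
    """Collapse object -> Production -> Actor into direct 'creator' values."""
    produced_by = [o for s, p, o in triples
                   if s == obj and p == "crm:P108i_was_produced_by"]
    return [o for s, p, o in triples
            if s in produced_by and p == "crm:P14_carried_out_by"]


print(creators(TRIPLES, "obj:GRMDC.C104.2"))  # ['person:farington']
```

The shortcut loses information the event carried (for instance which technique each actor used), which is why the flat and event-based models are hard to reconcile.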
The problem of URIs
● Good Linked Data requires URIs everywhere
● Most of the values in my museum RDF are plain strings (literals), not URIs
● One exception is a Geonames lookup: “Ullswater” becomes http://www.geonames.org/2635191/
● In the absence of a central “people” registry, I should be minting URIs for people myself
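One way to picture the two cases is a sketch that looks place names up in a table of known URIs and mints person URIs locally by slugifying the name string. The person base URI and the slug scheme are hypothetical assumptions; only the Ullswater mapping comes from the slide.

```python
import re

# Hypothetical base for locally minted person URIs
PERSON_BASE = "http://collections.wordsworth.org.uk/person/"
# Known place URIs; the Ullswater entry is the Geonames example from the slide
PLACES = {"Ullswater": "http://www.geonames.org/2635191/"}


def mint_person_uri(name):
    """Mint a URI from a name string: lowercase, non-alphanumeric runs become '-'."""
    slug = re.sub(r"[^a-z0-9]+", "-", name.lower()).strip("-")
    return PERSON_BASE + slug


def place_uri(name):
    """Return the known URI for a place, or None so the caller keeps a literal."""
    return PLACES.get(name)


print(mint_person_uri("Farington, Joseph (1747-1821)"))
print(place_uri("Ullswater"))
```

Deriving URIs from names is simple but fragile: a corrected spelling would change the URI, so in practice a minted URI should be stored and reused rather than recomputed.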
Does it work? - yes, sort of
Data Explorer place view
Implementation details
● HTML needed a “back link” to RDF to keep OpenLink Explorer happy:
<link rel="alternate" type="application/rdf+xml" href="http://collections.wordsworth.org.uk/object/data/GRMDC.C104.2" title="RDF" />
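A browser or plugin finds that back link by scanning the page head for an alternate link with the RDF media type. A minimal Python sketch of that discovery step using the standard library's HTMLParser; the sample page is made up around the link shown above, and the rel check is simplified (it expects rel to be exactly "alternate").

```python
from html.parser import HTMLParser


class AlternateLinkFinder(HTMLParser):
    """Collect href values of <link rel="alternate" type="application/rdf+xml">."""

    def __init__(self):
        super().__init__()
        self.rdf_links = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if (tag == "link" and a.get("rel") == "alternate"
                and a.get("type") == "application/rdf+xml"):
            self.rdf_links.append(a.get("href"))


# Illustrative page containing the back link
PAGE = '''<html><head><title>GRMDC.C104.2</title>
<link rel="alternate" type="application/rdf+xml"
 href="http://collections.wordsworth.org.uk/object/data/GRMDC.C104.2" title="RDF" />
</head><body></body></html>'''

finder = AlternateLinkFinder()
finder.feed(PAGE)
print(finder.rdf_links)
```

Self-closing tags such as `<link ... />` are passed to `handle_starttag` by HTMLParser's default `handle_startendtag`, so no extra handler is needed.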
● Result is totally unfindable: need a search or harvesting mechanism:
– OAI support (possible)
– SPARQL end-point (harder)
Conclusions
● Implementing an RDF Linked Data front-end to a museum database is feasible if:
– You can generate multiple outputs from your database (XML is sufficient)
– You can implement a suitable URL rewriter or 404 handler
● It's easy (and a good idea) to mint and publish URIs for your collection objects
● It's less clear where all the other URIs we'll need will come from
LD: foothills of the Semantic Web
● Linked Data is a very modest start
● It's not obvious how this will scale
● Full Semantic Web will involve machine-driven processes
● Judging by where we are today, that will be a while coming ...
Ask Multimap where Lancaster is
Get a Netbook delivered ...