Publishing the British National Bibliography
as Linked Open Data
Corine Deliot, Metadata Standards Analyst
British Library
CIG Event, Birmingham, 25 November 2013
© The British Library Board 2013
www.bl.uk 2
Overview
• Motivations and approach
• The modelling process and the data model
• Technical process: from MARC 21 to RDF
• Linking to external datasets
• Outcomes – datasets/platform/access
• Plans for future developments
• Use of the BNB data
• Benefits
• Challenges
Motivations
• Publishing our data for others to re-use
• Looking beyond library audiences
• Taking part in the Linked Data conversation
How?
• Pragmatic, bottom-up approach
• Using existing staff
• Building on existing skills
• Using existing tools as much as possible
Why BNB?
• General bibliography: not a unique institutional catalogue
• Consistent format over 60 years
• Size and range of content: 3 million records on all subjects in many languages
• Control of metadata: publishable as CC0
Image: © Waldir / Wikimedia Commons / CC BY-SA 3.0. Usage terms: http://creativecommons.org/licenses/by-sa/3.0/
The modelling process (I)
• Identify our objects of interest, i.e. what the MARC record says about “things in the world”
  e.g. bibliographic resources, people, organizations, places, subjects, etc.
• Assign URIs to identify these objects of interest
URIs: Things to think about
• Create our own URIs or use existing ones? e.g.
  http://viaf.org/viaf/96994048
  http://id.loc.gov/authorities/names/n78095332
• Create opaque or transparent URIs? e.g. http://viaf.org/viaf/96994048 (opaque) or http://dbpedia.org/resource/William_Shakespeare (transparent)
• What pattern? URI pattern guidance from the UK Cabinet Office: “Designing URI Sets for the UK Public Sector”
• Create valid (i.e. syntax-conformant) URIs
URI patterns
• http://bnb.data.bl.uk/id/resource/{control-number}
• http://bnb.data.bl.uk/id/resource/{BNB-number}
• http://bnb.data.bl.uk/id/person/{person-name}
• http://bnb.data.bl.uk/id/organization/{organization-name}
• http://bnb.data.bl.uk/id/concept/lcsh/{topic}
• http://bnb.data.bl.uk/id/concept/ddc/{edition-number}/{dewey-number}
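The patterns above can be sketched as simple string templates. This is a minimal illustration, not the BL's conversion code: the `slug` normalisation (keep letters, digits and hyphens, then percent-encode) is a hypothetical choice, since the slides do not say how names and topics are turned into URI segments, and the edition value passed to `ddc_uri` is likewise only an example.

```python
from urllib.parse import quote

BASE = "http://bnb.data.bl.uk/id"  # base taken from the URI patterns above


def slug(label: str) -> str:
    """Hypothetical normalisation: keep letters, digits and hyphens only,
    then percent-encode any non-ASCII characters that remain."""
    kept = "".join(ch for ch in label if ch.isalnum() or ch == "-")
    return quote(kept)


def resource_uri(number: str) -> str:
    # {control-number} or {BNB-number}
    return f"{BASE}/resource/{number}"


def person_uri(name: str) -> str:
    return f"{BASE}/person/{slug(name)}"


def organization_uri(name: str) -> str:
    return f"{BASE}/organization/{slug(name)}"


def lcsh_topic_uri(topic: str) -> str:
    return f"{BASE}/concept/lcsh/{slug(topic)}"


def ddc_uri(edition: str, dewey_number: str) -> str:
    # quote() leaves "." unencoded, so "823.33" stays readable
    return f"{BASE}/concept/ddc/{quote(edition)}/{quote(dewey_number)}"


print(resource_uri("008043929"))
# http://bnb.data.bl.uk/id/resource/008043929
print(person_uri("Shakespeare, William, 1564-1616"))
# http://bnb.data.bl.uk/id/person/ShakespeareWilliam1564-1616
```

Because the name-based segments are derived from strings, the quality of these URIs depends directly on how consistent the source headings are.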
URI patterns
• http://bnb.data.bl.uk/id/resource/008043929
• http://bnb.data.bl.uk/doc/resource/008043929
• http://bnb.data.bl.uk/doc/resource/008043929.rdf
• http://bnb.data.bl.uk/doc/resource/008043929.ttl
• http://bnb.data.bl.uk/doc/resource/008043929.json
• http://bnb.data.bl.uk/doc/resource/008043929.html
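The examples above pair an `/id/` URI for the thing itself with `/doc/` URIs for documents describing it, one per serialization chosen by file extension. A minimal sketch of that mapping, assuming the rewrite is always a simple swap of the `/id/` path segment (an inference from these examples, not a documented rule):

```python
ID_PREFIX = "http://bnb.data.bl.uk/id/"
DOC_PREFIX = "http://bnb.data.bl.uk/doc/"
EXTENSIONS = ("rdf", "ttl", "json", "html")  # as listed on the slide


def document_uris(thing_uri: str) -> dict[str, str]:
    """Map a thing URI to its generic document URI and per-format variants."""
    if not thing_uri.startswith(ID_PREFIX):
        raise ValueError(f"not a BNB thing URI: {thing_uri}")
    doc = DOC_PREFIX + thing_uri[len(ID_PREFIX):]
    uris = {"generic": doc}
    for ext in EXTENSIONS:
        uris[ext] = f"{doc}.{ext}"
    return uris


for fmt, uri in document_uris("http://bnb.data.bl.uk/id/resource/008043929").items():
    print(fmt, uri)
```

Keeping the thing and the document about it on separate URIs is the usual Linked Data practice; in a typical setup a request for the `/id/` URI is redirected (HTTP 303) to a `/doc/` URI, though the slides do not say exactly how the BNB platform handles this.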
The modelling process (II)
• Describe these objects of interest (i.e. use classes) and how they relate to each other (i.e. use properties)
• Use classes and properties from existing RDF vocabularies
• Define our own classes and properties when required; these are documented in the British Library Terms RDF schema
RDF Vocabularies
• Bibliographic Ontology
• Bio: a Vocabulary for Biographical Information
• British Library Terms
• Dublin Core
• Event Ontology
• FOAF: Friend of a Friend
• ISBD
• Org: an Organisation Ontology
• OWL
• RDA
• RDF
• RDF Schema
• SKOS
• WGS84 Geo Positioning
RDF Vocabularies
• Bibliographic Resource: Dublin Core, Bibliographic Ontology, ISBD, British Library Terms
• Event: Event Ontology, British Library Terms
• Person/Organization: FOAF: Friend of a Friend, Bio: a Vocabulary for Biographical Information, Org: an Organisation Ontology, RDA
• Place: WGS84 Geo Positioning
• Concept: SKOS, British Library Terms
• RDF, RDF Schema, OWL
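Most of these vocabularies are referred to by prefixed names such as `foaf:focus` or `blt:bnb`. A small sketch of expanding those names to full URIs and emitting Turtle `@prefix` lines: the `blt`, `event` and `dc` namespaces come from this presentation, the others are the vocabularies' published namespace URIs, and ISBD and RDA are simply omitted from this sketch.

```python
PREFIXES = {
    "bibo": "http://purl.org/ontology/bibo/",
    "bio": "http://purl.org/vocab/bio/0.1/",
    "blt": "http://www.bl.uk/schemas/bibliographic/blterms#",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
    "event": "http://purl.org/NET/c4dm/event.owl#",
    "foaf": "http://xmlns.com/foaf/0.1/",
    "geo": "http://www.w3.org/2003/01/geo/wgs84_pos#",
    "org": "http://www.w3.org/ns/org#",
    "owl": "http://www.w3.org/2002/07/owl#",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "skos": "http://www.w3.org/2004/02/skos/core#",
}


def expand(curie: str) -> str:
    """Expand a prefixed name such as 'foaf:focus' to a full URI."""
    prefix, sep, local = curie.partition(":")
    if not sep or prefix not in PREFIXES:
        raise KeyError(f"unknown prefix in {curie!r}")
    return PREFIXES[prefix] + local


def turtle_header() -> str:
    """Emit @prefix lines (sorted by prefix) for the start of a Turtle file."""
    return "\n".join(f"@prefix {p}: <{ns}> ." for p, ns in sorted(PREFIXES.items()))


print(expand("foaf:focus"))  # http://xmlns.com/foaf/0.1/focus
```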
The British Library Terms RDF Schema
@prefix blt:<http://www.bl.uk/schemas/bibliographic/blterms#> .
• Where an existing property is not quite right (e.g. not granular enough)
  e.g. dcterms:identifier vs blt:bnb
• Where a property or class is required by a specific feature of the model
  e.g. blt:publication and blt:PublicationEvent (rdfs:subClassOf event:Event)
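Declaring blt:PublicationEvent a subclass of event:Event means that software which only knows the Event Ontology can still recognise BNB publication events as events. A minimal sketch of the RDFS subclass inference involved, working over prefixed-name strings rather than a real RDF store; the only hierarchy entry is the one stated on this slide.

```python
# Class hierarchy stated on the slide: blt:PublicationEvent rdfs:subClassOf event:Event
SUBCLASS_OF: dict[str, set[str]] = {
    "blt:PublicationEvent": {"event:Event"},
}


def inferred_types(asserted: set[str]) -> set[str]:
    """Return the asserted classes plus every superclass reachable
    through rdfs:subClassOf (the transitive closure)."""
    types = set(asserted)
    frontier = list(asserted)
    while frontier:
        cls = frontier.pop()
        for parent in SUBCLASS_OF.get(cls, set()):
            if parent not in types:
                types.add(parent)
                frontier.append(parent)
    return types


print(sorted(inferred_types({"blt:PublicationEvent"})))
# ['blt:PublicationEvent', 'event:Event']
```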
• For pragmatic reasons, e.g. to facilitate searching and navigating through the graph
  e.g. blt:TopicLCSH and blt:TopicDDC
  e.g. blt:hasCreated owl:inverseOf dcterms:creator
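Because blt:hasCreated is declared owl:inverseOf dcterms:creator, every statement "book dcterms:creator person" implies "person blt:hasCreated book", which lets a client navigate from a person to their works. A sketch of materialising those inverse triples, using hypothetical URIs; whether the BNB data stores both directions or leaves this to a reasoner is not stated here.

```python
Triple = tuple[str, str, str]

# owl:inverseOf is symmetric, so record both directions.
INVERSE = {
    "dcterms:creator": "blt:hasCreated",
    "blt:hasCreated": "dcterms:creator",
}


def with_inverses(triples: list[Triple]) -> list[Triple]:
    """Return the input triples plus the inverse of each one whose
    predicate has a declared owl:inverseOf partner (no duplicates)."""
    result = list(triples)
    seen = set(result)
    for s, p, o in triples:
        inverse_p = INVERSE.get(p)
        if inverse_p is not None:
            t = (o, inverse_p, s)
            if t not in seen:
                seen.add(t)
                result.append(t)
    return result


data = [("<book/1>", "dcterms:creator", "<person/A>")]  # hypothetical URIs
for t in with_inverses(data):
    print(*t)
# <book/1> dcterms:creator <person/A>
# <person/A> blt:hasCreated <book/1>
```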
Data Model Features (II): Publication as an event
Usual approach:
@prefix dc: <http://purl.org/dc/elements/1.1/> .
@prefix dcterms: <http://purl.org/dc/terms/> .
<BibResource> dc:publisher "Publisher" ;
    dcterms:issued "Date" ;
    ?:placeOfPublication "Place" .
Event-based approach:
@prefix blt: <http://www.bl.uk/schemas/bibliographic/blterms#> .
@prefix event: <http://purl.org/NET/c4dm/event.owl#> .
<BibResource> blt:publication <PublicationEvent> .
<PublicationEvent> event:place <Place> ;
    event:agent <Publisher> ;
    event:time <Year> .
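The event-based approach turns the flat publisher, date and place statements into a single PublicationEvent node linked to an agent, a place and a time. A minimal sketch of that restructuring: the event URI pattern (`{resource}/publicationevent`) and the example.org URIs are hypothetical, and the rdf:type triple is added for clarity rather than taken from the slide.

```python
Triple = tuple[str, str, str]


def publication_event(resource: str, publisher: str, place: str, time: str) -> list[Triple]:
    """Describe publication as a blt:PublicationEvent instead of flat
    dc:publisher / dcterms:issued / place-of-publication statements."""
    event = resource + "/publicationevent"  # hypothetical URI pattern
    return [
        (resource, "blt:publication", event),
        (event, "rdf:type", "blt:PublicationEvent"),  # typing added for clarity
        (event, "event:agent", publisher),
        (event, "event:place", place),
        (event, "event:time", time),
    ]


def term(value: str) -> str:
    # full URIs go in angle brackets; prefixed names are written as-is
    return f"<{value}>" if "://" in value else value


triples = publication_event(
    "http://bnb.data.bl.uk/id/resource/008043929",
    "http://example.org/agent/publisher",  # hypothetical
    "http://example.org/place/london",     # hypothetical
    "http://example.org/time/2012",        # hypothetical
)
for s, p, o in triples:
    print(term(s), p, term(o), ".")
```

The gain is that publisher, place and date become linkable resources rather than three unrelated strings.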
Data model features (III)
• Birth and death are modelled as biographical events
• Extensive use of foaf:focus to relate “things in the world” (e.g. people, organizations, places) to their SKOS concepts
  e.g. “London”, the capital of England and of the UK, is a single “thing in the world” that may be the “focus” of multiple concepts belonging to different concept schemes, e.g. thesauri (LCSH, Rameau, etc.)
  <Thing-as-Concept> foaf:focus <Thing-in-the-World> .
• Further reading, by Pete Johnston: http://efoundations.typepad.com/efoundations/2011/09/things-their-conceptualisations-skos-foaffocus-modelling-choices.html
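With foaf:focus, concepts from different schemes that are about the same thing can be gathered through that thing. A minimal sketch of grouping concepts by their focus; the identifiers below are hypothetical placeholders, not real LCSH, Rameau or BNB URIs.

```python
from collections import defaultdict

Triple = tuple[str, str, str]


def concepts_by_focus(triples: list[Triple]) -> dict[str, set[str]]:
    """Group SKOS concepts by the real-world thing they are about
    (concept foaf:focus thing)."""
    groups: dict[str, set[str]] = defaultdict(set)
    for concept, predicate, thing in triples:
        if predicate == "foaf:focus":
            groups[thing].add(concept)
    return dict(groups)


data = [  # hypothetical identifiers: two concepts for one place, one for another
    ("lcsh:London_(England)", "foaf:focus", "place:London"),
    ("rameau:Londres", "foaf:focus", "place:London"),
    ("lcsh:Paris_(France)", "foaf:focus", "place:Paris"),
]
print(sorted(concepts_by_focus(data)["place:London"]))
# ['lcsh:London_(England)', 'rameau:Londres']
```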
MARC to RDF Conversion Workflow
• Full BNB MARC 21 file
• MARC pre-processing:
  • Select records
  • Convert to pre-composed UTF-8
  • Normalise for improved matching & transforms
  • Create BL URIs and add external URIs by matching
• Transform to RDF/XML using XSLT, producing the BNB RDF/XML file
• Generate RDF triple dump
• Load to the Linked Data Platform and to the BL Downloads page
Process:
• Selection
• Character set conversion
• Pre-processing
• URI generation
• Data transformation
• Create & load triples
• Produce VoID descriptions
Tools:
• Catalogue Bridge Utilities
• MARC Global/MARC Report: http://www.marcofquality.com/
• Jena Eyeball: http://jena.sourceforge.net/Eyeball/
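MARC records can carry a diacritic as a base letter followed by a separate combining mark, so the same name may appear in two different encodings. For text already in Unicode, the "pre-composed" step corresponds to Unicode normalisation form NFC; conversion from MARC-8 is a separate step not shown. The sketch below also builds a matching key; its rules (case-folding, collapsing whitespace, dropping trailing punctuation) are hypothetical, not the BL's actual normalisation.

```python
import re
import unicodedata


def to_precomposed(text: str) -> str:
    """Compose base letters and combining marks (Unicode NFC),
    e.g. 'e' + U+0308 COMBINING DIAERESIS -> U+00EB."""
    return unicodedata.normalize("NFC", text)


def match_key(text: str) -> str:
    """Hypothetical normalisation for matching: NFC, case-folding,
    collapsed whitespace and no trailing ISBD-style punctuation."""
    key = to_precomposed(text).casefold()
    key = re.sub(r"\s+", " ", key).strip()
    return key.rstrip(" .,;:/")


decomposed = "Bronte\u0308, Charlotte."
print(len(decomposed), len(to_precomposed(decomposed)))  # 19 18
print(match_key(decomposed) == match_key("BRONT\u00cb, Charlotte"))  # True
```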
Linking to external sources (I)
To give our data broader context we linked to:
• General resources: GeoNames, Lexvo, RDF Book Mashup
• Library resources: LCSH, VIAF, Dewey.info, MARC language and country codes
Linking to external sources (II)
Techniques included:
• Automatic generation from record data
• Auto text match with linked data dumps
• Crosswalk matching for coded data
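Crosswalk matching for coded data can be as simple as a lookup table. The sketch below links a MARC language code (as found in 008/35-37) to a Library of Congress language URI and a Lexvo URI. Lexvo's `iso639-3` URIs use ISO 639-3 codes, which differ from some MARC codes (e.g. MARC `fre` vs ISO 639-3 `fra`), hence the crosswalk. The table is a tiny illustrative excerpt and both URI patterns are assumptions to verify against each service.

```python
# Illustrative excerpt: MARC language code -> ISO 639-3 code
MARC_TO_ISO639_3 = {
    "eng": "eng",
    "fre": "fra",
    "ger": "deu",
    "spa": "spa",
    "wel": "cym",
}


def language_links(marc_code: str) -> list[str]:
    """Return external URIs for a MARC language code, or [] if the
    code is not in the crosswalk (leave unlinked rather than guess)."""
    code = marc_code.strip().lower()
    iso = MARC_TO_ISO639_3.get(code)
    if iso is None:
        return []
    return [
        f"http://id.loc.gov/vocabulary/languages/{code}",  # assumed pattern
        f"http://lexvo.org/id/iso639-3/{iso}",              # assumed pattern
    ]


print(language_links("fre"))
```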
Image: © Silverspoon / Wikimedia Commons / CC BY-SA 3.0. Usage terms: http://creativecommons.org/licenses/by-sa/3.0/
Outcomes
• Two datasets (Books and Serials) and their VoID descriptions, accessible at:
  • BNB Linked Data platform: http://bnb.data.bl.uk
  • SPARQL endpoint: http://bnb.data.bl.uk/sparql
  • SPARQL editor: http://bnb.data.bl.uk/flint
  • Bulk downloads: http://www.bl.uk/bibliographic/download.html
• Updated monthly
• Serializations available: RDF/XML, N-Triples
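A SPARQL endpoint like the one listed above can be queried with a plain HTTP GET, passing the URL-encoded query in a `query` parameter as the SPARQL 1.1 Protocol specifies. The sketch only builds the request URL; the query itself is hypothetical, and whether the 2013 endpoint still responds is not assumed.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

ENDPOINT = "http://bnb.data.bl.uk/sparql"  # endpoint as published in 2013

# Hypothetical query: ten resources and their titles
QUERY = """PREFIX dcterms: <http://purl.org/dc/terms/>
SELECT ?resource ?title
WHERE { ?resource dcterms:title ?title }
LIMIT 10"""


def sparql_get_url(endpoint: str, query: str) -> str:
    """Build a SPARQL 1.1 Protocol query-via-GET URL
    (the query goes in the 'query' parameter, URL-encoded)."""
    return f"{endpoint}?{urlencode({'query': query})}"


url = sparql_get_url(ENDPOINT, QUERY)
# Round-trip check: decoding the URL gives back the original query text
assert parse_qs(urlsplit(url).query)["query"][0] == QUERY
print(url)
```

To run it, the URL could be fetched with `urllib.request` and an `Accept: application/sparql-results+json` header.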
Image: “Linking Open Data cloud diagram”, by Richard Cyganiak and Anja Jentzsch, http://lod-cloud.net/. Usage terms: http://creativecommons.org/licenses/by-sa/3.0/
Platform change
• 2011: initial Talis platform
• 2013: data migration to the TSO platform (http://www.tso.co.uk/our-expertise/technology/openup-platform)
  Tendering process; migration of data and services over a couple of months
Plans for Future Developments
• Refine and extend the model
• Investigate FRBRization
• Link to other external sources:
  • GeoNames at city level
  • ISNI, LC/NACO, DBpedia
  • DNB bibliographic resources
• Expand scope beyond the current BNB
• Improve developer support
Use of the BNB data
• Statistics, e.g. number of hits on the SPARQL endpoint and number of downloads from the BL webpage
• BNB data used in pilot projects, e.g. Linked Open BNB data used as test data for a semantic search demonstrator
• Anecdotal evidence
• Use is difficult to assess: this is part and parcel of the data being open and available for all to use
Benefits of Linked Open Data
• We have learnt a lot about the practical aspects of working with linked data.
• The data model attracted attention:
  • Re-used by the Danish Bibliographic Centre (DBC)
  • Stanford Linked Data Workshop Technology Plan: “…ensure resulting model retains the BL’s high-level focus and its web derived, transparent structure for representing facts about people, organizations, places, events, and topics”
• LOD raised the Library’s profile internally and externally
• LOD helped us focus our legacy data enhancement activities
Challenges
Converting MARC data into RDF!
• Publication event approach: transforming transcribed text into data
• URI creation from strings may result in duplication; changes over time may also produce duplication
• Legacy data issues, e.g. inconsistent data resulting from cataloguers using inadequate input tools for diacritics
• This is (relatively) new; nobody has all the answers
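The duplication problem can be reproduced with a naive string-to-URI rule. In this sketch (hypothetical rules and example headings, not the BL's), two encodings of the same heading, one with a precomposed ë and one with e plus a combining diaeresis as legacy input tools may produce, give two different URIs. Unicode normalisation merges them, but a heading that changes over time, such as one gaining a death date, still yields a new URI.

```python
import unicodedata
from urllib.parse import quote

BASE = "http://bnb.data.bl.uk/id/person/"  # pattern from the URI patterns slide


def naive_uri(heading: str) -> str:
    # hypothetical rule: drop spaces, percent-encode everything else
    return BASE + quote(heading.replace(" ", ""))


def normalised_uri(heading: str) -> str:
    # hypothetical rule: NFC first, then keep letters, digits and hyphens
    nfc = unicodedata.normalize("NFC", heading)
    return BASE + quote("".join(ch for ch in nfc if ch.isalnum() or ch == "-"))


precomposed = "Bront\u00eb, Charlotte, 1816-1855"
decomposed = "Bronte\u0308, Charlotte, 1816-1855"

print(naive_uri(precomposed) == naive_uri(decomposed))            # False
print(normalised_uri(precomposed) == normalised_uri(decomposed))  # True

# A change over time still produces a second URI for the same person:
print(normalised_uri("Smith, John, 1950-") == normalised_uri("Smith, John, 1950-2015"))  # False
```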
For further information
http://bnb.data.bl.uk
http://www.bl.uk/bibliographic/datafree.html
Thank you.
Questions?
http://twitter.com/#!/BLMetadata