Establishing the Connection: Creating a Linked Data Version of the BNB
Neil Wilson
Head of Metadata Services
Changing Expectations: Public Sector Metadata
The Web has accelerated development of a collaboration culture & fostered expectations that information & content should be as freely available as the Internet itself
Many arguments about the wider benefits have been advanced for public bodies to make their data freely available
2009 saw an increasing Government commitment to the principle of opening up public data for wider re-use.
The “Putting the Frontline First: Smarter Government” report required “the majority of government-published information to be reusable, linked data by June 2011”
Developing an Open Metadata Strategy: Choices and Challenges
When developing an open metadata strategy we wanted to:
Break away from library-specific formats (e.g. MARC) and use more cross-domain, XML-based standards (e.g. DC, RDF)
Develop the new formats with communities using the metadata
Get some form of attribution while also adopting a licensing model appropriate to the widest re-use of the metadata
Adopt a multi-track approach addressing the needs of: traditional libraries; researchers wanting to ‘data mine’ catalogues; and new linked data developers & users
…And deliver the above with decreasing resources
First Steps Toward an Open Metadata Strategy: During 2010 We…
Developed a capability to supply metadata using RDF/XML standards used in the wider web community
Conducted trials with a range of new users including: the UK Intellectual Property Office & UNESCO
Developed a free Z39.50 MARC record download service for libraries to assist with derived cataloguing etc
Hosted a linked data workshop with 40 representatives from key international organisations
Current Status: Since August 2010 We Have:
Created a new email enquiry point for BL metadata issues: [email protected]
Signed up nearly 400 organisations worldwide to the free MARC21 Z39.50 service
Worked with JISC, Talis & other linked data implementers on technical challenges, standards & licensing issues
Begun to offer sets of RDF/XML metadata under a Creative Commons 0 (CC0) license
Supplied multi-million record sets to organisations including: the Open Bibliography Project, the Open Library & Wikimedia Commons
Library Metadata & The Promise of Linked Data
Traditional library metadata uses a self-contained, proprietary, document-based model
The Semantic Web uses a more dynamic, data-based model to establish relationships between data elements via links
By migrating from traditional models libraries could begin to:
Integrate their resources in the web, increasing visibility & reaching new users
Offer users a richer resource discovery experience
Transition from costly specialist technologies & suppliers & widen their choice of options
Traditional library metadata properties:
Proprietary, library-specific standards
Passive
Self-contained
Linear text: ‘read’ by users as the result of a database query
Offers end result
‘Semantic’ metadata properties:
Open standards
Dynamic/reactive
Links to external resources
Micro portal: interacts with users & systems in response to queries
Offers options for further inquiry
Our Linked Data Journey…What to Offer?
Wanted to offer data allowing useful experimentation & advancing discussions from theory to practice
Why BNB?
General database of published output and not an institutional catalogue of unique items
Mass produced works on all subjects, many with internationally recognised identifiers e.g. ISBN
Reasonably uniform format across 60 years of publication
Significant amount of data – 3 million records in various languages
Our Linked Data Journey…What do we need to get there?
Wanted to undertake the work as an extension of existing activities and as an opportunity to develop expertise using:
Existing staff – librarians rather than IT experts
As many pre-existing tools or technologies as possible
Standard PC hardware for conversion
Library MARC21 data as a starting point
Established linked data resources to connect to
A proven platform that would enable us to concentrate on the data issues
Our Linked Data Journey…First stage: How To Migrate the Metadata?
From a flat catalogue card model to something more appropriate…
Preliminaries:
Staff training in linked data modelling concepts & increased familiarisation with RDF & XML concepts
Experience of working with: JISC Open Bibliography Project & others
Feedback on initial MARC to XML conversion work
Incremental approach adopted: Open Data License; RDF/XML Format; Add External Links; Re-model; Create Linked Data
Our Linked Data Journey… Second stage: Selecting trusted resources to link to
To begin placing library data in a wider context & supplement or replace literal values in records
Looked for library sites: Dewey Info, LCSH SKOS, VIAF
Plus more general sites: GeoNames, Lexvo, RDF Book Mashup
Our Linked Data Journey…Third Stage: Matching and Generating Links
Three main approaches used:
Automatic Generation of URIs from elements in records e.g. DDC
Matching of text in records with linked data dumps e.g. personal names to VIAF & subjects to LCSH to identify URIs
Two stage crosswalk/matching process for some coded information e.g. MARC country & language codes for GeoNames
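The three approaches above can be sketched in a few lines of Python. This is only an illustration, not the tooling used for the BNB conversion: the base URIs, the one-entry "dump" and the crosswalk tables are all hypothetical placeholders, and the function names are invented for the example.

```python
import unicodedata

# Placeholder namespace, NOT the real Dewey one (assumption for illustration).
DDC_BASE = "http://example.org/ddc/"


def ddc_uri(ddc_number):
    """Approach 1: generate a URI directly from a value in the record."""
    return DDC_BASE + ddc_number.strip()


def match_key(text):
    """Build a comparison key: NFC, lower case, no trailing punctuation, single spaces."""
    text = unicodedata.normalize("NFC", text).lower().rstrip(" .,;:/")
    return " ".join(text.split())


# Approach 2: match record text against a linked data dump.
# A one-entry, invented stand-in for a real authority file extract.
NAME_DUMP = {
    match_key("Austen, Jane, 1775-1817"): "http://example.org/viaf/0001",
}


def name_uri(heading):
    """Return the URI for a name heading, or None if there is no match."""
    return NAME_DUMP.get(match_key(heading))


# Approach 3: two-stage crosswalk, MARC country code -> ISO code -> URI.
MARC_TO_ISO = {"xxk": "GB", "fr": "FR", "gw": "DE"}
ISO_TO_URI = {
    "GB": "http://example.org/geo/GB",  # placeholder URIs, not real GeoNames ones
    "FR": "http://example.org/geo/FR",
    "DE": "http://example.org/geo/DE",
}


def country_uri(marc_code):
    """Return the URI for a MARC country code, or None if either stage fails."""
    iso = MARC_TO_ISO.get(marc_code.strip().lower())
    return ISO_TO_URI.get(iso) if iso else None
```

Returning None for unmatched values keeps the literal in the record rather than guessing a link, which fits the advice elsewhere in the talk to select stable targets.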
Our Linked Data Journey…MARC to RDF Conversion Workflow
MARC to RDF conversion consists of multiple automated steps using a range of tools:
1) Selection. Tools: in-house utilities / MARC Report. Exclusions (CIP; multiparts; serials)
2) Pre-processing. Tool: MARC Global. Normalise data values; remove trailing punctuation; move/copy data values to improve machine matching/transformation
3) Character set conversion. Tools: in-house utilities. Decomposed UTF-8 converted to precomposed for conformance with W3C recommendations
4) URI creation. Tools: in-house utilities. Create BL URIs in MARC fields; harvest URIs from external sources
5) Data transformation. Tools: MARC Report & MARC 21/RDF XSLT. Convert to RDF & insert URI prefixes
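As a rough illustration of the pre-processing and character set steps, the sketch below strips trailing punctuation and converts decomposed characters to their precomposed (NFC) form using Python's standard library. It is a stand-in for MARC Global and the in-house utilities, whose actual rules are not described in the slides; the function names and the punctuation set are assumptions.

```python
import unicodedata

# Characters treated as trailing cataloguing punctuation (an assumed set).
TRAILING = " /:;,."


def strip_trailing_punctuation(value):
    """Collapse whitespace and drop trailing punctuation.

    Note: this blunt rule would also strip the full stop from an
    abbreviation such as "Ltd.", so real data would need exceptions.
    """
    return " ".join(value.split()).rstrip(TRAILING)


def to_precomposed(value):
    """Convert decomposed sequences (e + combining acute) to precomposed (é)."""
    return unicodedata.normalize("NFC", value)


def preprocess(value):
    """Apply both steps to one field value."""
    return to_precomposed(strip_trailing_punctuation(value))
```

For example, `preprocess("Cafe\u0301 society /")` yields the string "Café society" with a single precomposed é.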
Our Linked Data Journey…MARC to RDF Conversion Workflow
[Workflow diagram. Elements shown: Full BNB MARC21 file; MARC pre-processing (select single volume published books only; normalise for improved matching & transforms; convert to pre-composed UTF-8; create BL URIs and add external URIs by matching); transform to RDF/XML using XSLT; BNB RDF/XML file; load to Linked Data Platform; generate RDF triple dump]
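To make the final transformation step concrete, here is a minimal sketch that turns one pre-processed record into RDF triples. It is not the MARC 21/RDF XSLT used in the workflow: it uses Python instead of XSLT, writes N-Triples rather than RDF/XML, and assumes a hypothetical record layout, a placeholder base URI and Dublin Core terms as the vocabulary.

```python
DCT = "http://purl.org/dc/terms/"          # Dublin Core terms namespace
BASE = "http://example.org/bnb/resource/"  # placeholder base URI (assumption)


def literal(value):
    """Quote a string as an N-Triples literal, escaping special characters."""
    escaped = (value.replace("\\", "\\\\")
                    .replace('"', '\\"')
                    .replace("\n", "\\n")
                    .replace("\r", "\\r"))
    return '"' + escaped + '"'


def record_to_ntriples(record):
    """Return one N-Triples line per statement for a record dict.

    Expected keys (hypothetical): "id", "title", and optionally
    "creator_uris" and "subject_uris" holding URIs found by matching.
    """
    subject = "<" + BASE + record["id"] + ">"
    lines = [f"{subject} <{DCT}title> {literal(record['title'])} ."]
    for uri in record.get("creator_uris", []):
        lines.append(f"{subject} <{DCT}creator> <{uri}> .")
    for uri in record.get("subject_uris", []):
        lines.append(f"{subject} <{DCT}subject> <{uri}> .")
    return lines
```

The title stays a literal while creators and subjects become links, mirroring the aim of replacing literal values with URIs where a trusted target exists.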
Our Linked Data Journey…Which took us from here...
Our Linked Data Journey…Via here...
Our Linked Data Journey…To here...
bnb.data.bl.uk: Preview Options
bnb.data.bl.uk/sparql bnb.data.bl.uk/describe bnb.data.bl.uk/search
Includes: BNB Books 2005-11; 485,000 records; 18,000,000 RDF triples
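A query against the SPARQL preview option might be prepared as sketched below. The host and path come from the slide; the `http://` scheme, the use of the standard `query` request parameter, and the assumption that titles are expressed with `dct:title` are not confirmed by the slides. The function only builds the request URL and does not contact the service.

```python
from urllib.parse import urlencode

# Host and path from the slide; the scheme is an assumption.
ENDPOINT = "http://bnb.data.bl.uk/sparql"


def build_query_url(limit=10):
    """Build a GET URL asking for a few resources and their titles."""
    query = (
        "PREFIX dct: <http://purl.org/dc/terms/>\n"
        "SELECT ?book ?title WHERE { ?book dct:title ?title }\n"
        f"LIMIT {int(limit)}"
    )
    # The SPARQL protocol passes the query text in a 'query' parameter.
    return ENDPOINT + "?" + urlencode({"query": query})
```

Calling `build_query_url(5)` gives a URL that could be pasted into a browser or fetched with any HTTP client, if the preview service is reachable.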
bnb.data.bl.uk: Sample ‘Labelled Concise Bound Description’
Our Linked Data Journey…Journey’s End…Point?
Preview Details at:
http://www.bl.uk/bibliographic/datafree.html
Roadmap for next steps includes:
Staged release over coming months for: books, serials, multi-parts etc
Aiming to update on a monthly basis once complete
Documentation & further refinement of data model
Looking at RDF triple dump option
What else might be offered?
Lessons Learned on the Journey: General
It is a new way of thinking
Legacy data wasn’t designed for this purpose so starting can be problematic
There are many opinions…but few real certainties
Everyone is learning & multiple solutions exist so you may be the best judge
Don’t reinvent the wheel...there are often tools or experience you can use
Start simple & develop in line with evolving staff expertise
Give careful thought to data modelling & sustainability issues e.g.
Where possible use cross domain standards e.g. ISO codes in data
Select relevant & stable targets when providing links if you are doing so
Lessons Learned on the Journey: Data Issues
Reality check by offering samples for feedback to wider groups
Be prepared for some technical criticism in addition to positive feedback & try to continually improve in response
Conversion inevitably identifies hidden data issues…& creates new ones!
…But it’s often better to release an imperfect something than a perfect nothing!
Lessons Learned Along the Way: Staff and Resource Issues
It can be a steep learning curve so:
Look for training opportunities to develop staff skills to support new open metadata standards
Cultivate a culture of enquiry & innovation among staff to widen perspectives on new possibilities
Look into collaborative pilot projects with peer organisations to share resources & expertise
See what tools are already out there that can save you development time or assist in checking data
Final Thoughts…For Others Contemplating a Similar Journey
It’s never going to be perfect first time
We expect to make mistakes
We aim to learn from them
We hope others will learn something too
… and that everyone benefits from the experience
So if anyone is thinking of undertaking a similar journey…..
Just do it!
Any Questions…?
bnb.data.bl.uk/sparql bnb.data.bl.uk/describe bnb.data.bl.uk/search