Canadian Cohort: Moving Towards Linked Data Implementation at UAL
Ian Bigelow, University of Alberta Libraries
With thanks to Danoosh Davoodi, Sharon Farnel and Abigail Sparling for the
reuse of several slides from previous presentations
Playmobil (2019). Roman Warriors’ Ship. Retrieved from: https://www.playmobil.ca/en/roman-warriors-ship/5390.html
At the Annual Meeting of the American
Library Association (ALA) in June 2018, the Library of Congress (LC)
confirmed that BIBFRAME will be its
replacement for MARC.
Alea iacta est (“The die is cast”)
Moving Forward with Linked Data at the UAL
Linked data implementation as a strategic priority
“In order to reap the benefits of full participation in the linked open data environment,
UAL should continue to take steps towards complete conversion of existing library
data to linked open data. This would involve a full transition of workflows for resource
description/metadata creation to linked open data, transitioning all library systems for
resource discovery so they work with linked open data formats, and developing new
workflows, both internal and with associated vendors and partners, to support these
steps.”¹
What might this mean for you? … What do you think?
¹ Moving Forward with Linked Data at UAL
What is Linked Data
The Semantic Web
“The Semantic Web will bring structure to the
meaningful content of web pages. It is not a
separate Web but an extension of the existing one,
in which information is given well-defined
meaning, better enabling computers and people to
work in cooperation”
Berners-Lee, T., Hendler, J., Lassila, O. (2001). The Semantic Web.
ScientificAmerican.com
lod-cloud.net: https://creativecommons.org/licenses/by/4.0/
What is linked data?
“The collection of interrelated datasets on
the Web can be referred to as Linked
Data”.
https://www.w3.org/standards/semanticweb/data.html
Jakov M. Vežić (https://www.facebook.com/photo.php?fbid=10214499563312325&set=gm.1820667984616319&type=3&theater)
Principles of linked data
1. Use URIs (Uniform Resource Identifiers) to
name things
2. Use HTTP URIs so that people and machines
can look up those names
3. When a person or a machine looks up a URI,
provide useful information using Web
standards such as RDF, SPARQL, JSON
4. Include links to other URIs so that a person
or machine can discover other things
5. Use an open license*
* appropriate openness; open data ≠ linked data
Berners-Lee, T. (2006). https://www.w3.org/DesignIssues/LinkedData.html
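The principles above can be illustrated with a small sketch. It is not taken from the presentation: every example.org URI and property name is hypothetical, and a real implementation would serve RDF over HTTP rather than read from an in-memory list. It only shows how naming things with HTTP URIs, and linking those URIs to each other, lets a person or machine start at one thing and discover others.

```python
# A tiny in-memory "web of data". Every thing is named with an HTTP URI
# (principles 1 and 2). All example.org URIs are hypothetical.
BOOK = "http://example.org/book/meditations"
AUTHOR = "http://example.org/person/marcus-aurelius"
PLACE = "http://example.org/place/rome"

# Hypothetical property URIs, also minted under example.org.
NAME = "http://example.org/vocab/name"
CREATOR = "http://example.org/vocab/creator"
BORN_IN = "http://example.org/vocab/bornIn"

# Each statement is a (subject, predicate, object) triple, as in RDF.
TRIPLES = [
    (BOOK, NAME, "Meditations"),
    (BOOK, CREATOR, AUTHOR),
    (AUTHOR, NAME, "Marcus Aurelius"),
    (AUTHOR, BORN_IN, PLACE),
    (PLACE, NAME, "Rome"),
]


def describe(uri):
    """Principle 3: looking up a URI returns useful statements about it."""
    return [(p, o) for s, p, o in TRIPLES if s == uri]


def linked_uris(uri):
    """Principle 4: the other URIs that this description links to."""
    return [o for _, o in describe(uri) if o.startswith("http://")]


def crawl(start):
    """Follow links outward from one URI to discover related things."""
    seen, queue = [], [start]
    while queue:
        uri = queue.pop(0)
        if uri in seen:
            continue
        seen.append(uri)
        queue.extend(linked_uris(uri))
    return seen


if __name__ == "__main__":
    for uri in crawl(BOOK):
        print(uri, describe(uri))
```

Starting from the book's URI, the crawl reaches the author and then the place, which is the "discover other things" behaviour that principle 4 describes.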
Google Knowledge Graph
An intelligent model of
entities and relationships
Designed to enhance
search and discovery in
three ways:
1. find the right thing
2. get the best summary
3. go deeper and
broader
Linked Data for Libraries
Linked Data and Libraries
● Many examples of linked data outside of libraries
● Uptake in libraries has been slow and uneven
○ paradigm shift
○ lack of skills and expertise
○ lack of practical starter projects
○ challenge of data conversion
○ changes in workflows
○ lack of system support
● But things are shifting
○ viable alternatives to MARC exist and are being implemented
○ standards and workflows are being rebuilt to facilitate this shift
○ moves in repository communities to linked data for interoperability
○ linked data potential for managing knowledge production lifecycle
What is MARC?
Machine-Readable Cataloguing
● A transition from the catalogue card to working in an online catalogue
● Developed in the 1960s by Henriette Avram
● MARC has been around for a very long time and it is still the primary encoding
format for bibliographic metadata for libraries worldwide … for now
Whither MARC?
● “MARC must die!” - Roy Tennant, October 2002
○ “I wanted librarianship to wake up to the fact that our foundational standard was no
longer serving us like it should”
● What is the future of MARC in an increasingly digital, interconnected
information environment?
○ functions to a point but leaves library records siloed
○ focuses on records that are independently understandable
○ data not easily parsed
“So what has happened over the last 15 years? For starters, no one seems to
think it’s controversial anymore. The Library of Congress has not only
admitted that MARC’s days are numbered, they are actively working to
develop a linked data replacement”. (Roy Tennant, “MARC Must Die” 15 Years On -
http://hangingtogether.org/?p=6221)
Why linked data? Why for libraries?
● facilitate data integration and enable
interconnection of previously disconnected
datasets
● addition of each new dataset increases value
of existing datasets (the network effect)
● browsing through data is easier with URIs
● increased use and pressure to improve data
quality
● data as a service increases usability
● use of flexible and extensible data models
● compatibility with existing standards
● encourages openness, sharing, and reuse
● enhanced discovery experiences
● make rich library data actionable
● enhanced integration across collections
based on flexible and extensible data models
and shared principles
● opportunities for modeling different
worldviews to enable more contextually
appropriate descriptions
● enable enhanced collaborations across
libraries, archives, museums
● enhanced capabilities for researcher identity
and research output management
● streamlined workflows for metadata creation
and enhancement
What is BIBFRAME
BIBFRAME
● Initiated by the Library of Congress with
community partners and collaborators in
2011
● Provides a foundation for the future of
bibliographic description on and of the web
● Based on linked data principles and
standards
● Goes beyond “replacing” MARC
○ different model for expressing and
connecting bibliographic data
● https://www.loc.gov/bibframe
● http://bibframe.org/
● Three core levels of abstraction
○ Work
○ Instance
○ Item
● Additional key concepts
○ Agents
○ Subjects
○ Events
● Consists of RDF classes and properties
○ members of a class share certain
characteristics and may have subclasses
○ properties describe characteristics of
resources as well as relationships among
resources
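As a rough illustration of the three levels of abstraction, the sketch below writes one Work, one Instance, and one Item as plain triples and walks from the Item back to its Work. The class and property names (bf:Work, bf:Instance, bf:Item, bf:instanceOf, bf:itemOf) come from the BIBFRAME 2.0 vocabulary. The example.org resource URIs and the helper functions are hypothetical, and a real project would more likely use an RDF library and a triplestore.

```python
# BIBFRAME 2.0 namespace and the standard rdf:type property.
BF = "http://id.loc.gov/ontologies/bibframe/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Hypothetical local identifiers for one resource at each level.
WORK = "http://example.org/works/w1"
INSTANCE = "http://example.org/instances/i1"
ITEM = "http://example.org/items/t1"

TRIPLES = [
    (WORK, RDF_TYPE, BF + "Work"),            # the conceptual essence
    (INSTANCE, RDF_TYPE, BF + "Instance"),    # a published embodiment
    (INSTANCE, BF + "instanceOf", WORK),      # relationship: Instance -> Work
    (ITEM, RDF_TYPE, BF + "Item"),            # a physical or electronic copy
    (ITEM, BF + "itemOf", INSTANCE),          # relationship: Item -> Instance
]


def objects(subject, predicate, triples):
    """Return all objects for a given subject and predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def work_for_item(item, triples):
    """Walk Item -> Instance -> Work; return None if the chain is broken."""
    for instance in objects(item, BF + "itemOf", triples):
        for work in objects(instance, BF + "instanceOf", triples):
            return work
    return None


def to_ntriples(triples):
    """Serialize URI-only triples in N-Triples syntax."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in triples)


if __name__ == "__main__":
    print(to_ntriples(TRIPLES))
    print(work_for_item(ITEM, TRIPLES))
```

Because the relationships are themselves properties, a copy on the shelf can be connected to its Work without repeating the Work's description in every record.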
BIBFRAME 2.0
BIBFRAME Ontology
● List view
○ entire vocab on a single page
○ lists classes and properties
○ http://id.loc.gov/ontologies/bibframe.html
● Category view
○ all properties sorted into several broad categories such as identifiers, relationships, etc.
○ http://id.loc.gov/ontologies/bibframe-category.html
● RDF
○ full OWL ontology
○ http://id.loc.gov/ontologies/bibframe.rdf
LD4P Cohort
Linked Data for Production (LD4P)
For the past two years, Linked Data for Production has been focusing on:
● developing standards, guidelines, and infrastructure to communally produce
metadata as linked open data
● developing end-to-end workflows to create linked open data in a technical services
production environment
● extending the BIBFRAME ontology to describe library resources in specialized
domains and formats
● engaging the broader library community to ensure a sustainable and extensible
environment
LD4P Phase 2 and the LD4P Cohort
A collaborative project among four institutions (Cornell, Harvard, Stanford, and the University of Iowa) and the
Program for Cooperative Cataloging (PCC), this phase of LD4P will have seven broad goals:
1. The creation of a continuously fed pool of linked data expressed in BIBFRAME-based application profiles.
2. The development of an expanded cohort of libraries (the LD4P Cohort) capable of the creation and reuse
of linked data through the creation of a cloud-based sandbox editing environment.
3. The development of policies, techniques and workflows for the automated enhancement of MARC data
with identifiers to make its conversion to linked data as clean as possible.
4. The development of policies, techniques, and workflows for the creation and reuse of linked data and its
supporting identifiers as libraries’ core metadata.
5. Better integration of library metadata and identifiers with the Web through collaboration with Wikidata.
6. The enhancement of a widely-adopted library discovery environment (Blacklight) with linked-data based
discovery techniques.
7. The orchestration of continued community collaboration through the development of an organizational
framework called LD4.
LD4P Cohort Membership
University of Alberta
University of California, Davis
University of California, San Diego
Casalini Libri
University of Chicago
University of Colorado
Cornell University
Duke University
Frick Art Reference Library
Harry Ransom Center
Harvard University
University of Iowa
Library of Congress
University of Michigan
University of Minnesota
National Library of Medicine
Northwestern University
PCC
University of Pennsylvania
Princeton University
Stanford University
Texas A&M University
University of Washington
Yale University
UAL LD4P Cohort Project Summary
1. Enhancement of conversion, reconciliation and enrichment processes for MARC to BIBFRAME
2. Exploration of new forms of authority control based on URIs, utilizing MARC and BIBFRAME data
enriched with URIs
3. Conversion of Monographs Team operations: to make optimal use of current staffing and the
current level of development of BIBFRAME, we plan to work on original creation of data in the shared
RDF pool for monographs. Treating this as a starting point for fuller implementation across other
teams, we aim to convert the operations and workflows of our Monographs Team.
4. Community building:
a. To help foster a wider community of linked data experimentation and implementation in Canada,
UAL will work with other Canadian participants to liaise with the cataloguing community and
standards organizations in Canada (CFLA, CCC, CCM, CLDI)
b. As a member of the NEOS consortium, which includes a shared catalogue and services related to
cataloguing, UAL will engage NEOS members in aspects of this work to transition towards linked
data, so that we can move forward together.
Sinopia Exercise … not yet
Lookups and profiles are in place, but Sinopia is currently in a transitional state with a new version,
and local profiles still need to be set up.
Timelines for a public launch are to be discussed at ALA Annual in Washington.
LC is currently providing training for the LD4P Cohort. Once this is complete and Sinopia has
launched, the PCC will work with Cohort members to broaden the reach of
Sinopia.
Key takeaways:
1. The software is open
2. Sinopia will be made available for wider testing
3. PCC will be working on a wider training strategy
4. Learn more!
SVDE
SHARE-VDE is a community-driven initiative to implement linked
data. While the broader aim is to transition traditional
GLAM institution data, thus far the project has focused on moving
from MARC to BIBFRAME.
The process enriches library data with additional information and
relationships, previously unexpressed in MARC, and converts
bibliographic and authority data into linked data.
A virtual discovery platform with a four-layered adaptation of the
BIBFRAME data model was developed to provide a linked data
discovery option.
SHARE Virtual Discovery Environment (SVDE)
The main areas of the SHARE-VDE project:
● Enrichment of MARC records with URIs
● Conversion from MARC to RDF using the BIBFRAME vocabulary (and other additional ontologies as needed)
● Data publication according to the BIBFRAME data model
● Batch/automated data updating procedures
● Batch/automated data dissemination to libraries
● Progressive implementation of further use cases in the priority order defined by the community
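The first of these areas, enrichment of MARC records with URIs, can be sketched roughly as follows. This is not the SHARE-VDE implementation: the field structure, the lookup table, and the example.org URI are all hypothetical, and a real service would reconcile headings against authority sources rather than a hard-coded dictionary. The sketch writes the URI to subfield $1 (Real World Object URI); subfield $0 is the usual alternative when the URI identifies an authority record.

```python
# Hypothetical reconciliation table: normalized heading -> URI.
# A real enrichment service would query authority data instead.
KNOWN_URIS = {
    "marcus aurelius": "http://example.org/agent/marcus-aurelius",
}


def normalize(heading):
    """Lowercase, collapse whitespace, and drop trailing MARC punctuation."""
    return " ".join(heading.lower().split()).strip(" ,.")


def enrich_field(field, known=KNOWN_URIS):
    """Return a copy of a MARC-like field with a $1 URI added when the
    $a heading is found in the lookup table. The input is not modified."""
    subfields = list(field["subfields"])
    already_has_uri = any(code == "1" for code, _ in subfields)
    heading = next((value for code, value in subfields if code == "a"), None)
    if not already_has_uri and heading is not None:
        uri = known.get(normalize(heading))
        if uri:
            subfields.append(("1", uri))
    return {"tag": field["tag"], "subfields": subfields}


if __name__ == "__main__":
    # A simplified 100 (main entry, personal name) field as (code, value) pairs.
    field = {
        "tag": "100",
        "subfields": [
            ("a", "Marcus Aurelius,"),
            ("c", "Emperor of Rome,"),
            ("d", "121-180."),
        ],
    }
    print(enrich_field(field))
```

Headings that cannot be matched are returned unchanged, so enrichment can be re-run as the lookup source improves without disturbing existing data.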
SHARE-VDE is a collaborative endeavour, based on the requirements and perceptions of libraries, developed by:
- Casalini Libri, provider of bibliographic and authority data as a member of the Program for Cooperative Cataloguing
- @Cult, provider of ILS, Discovery tools and Semantic web solutions for the cultural heritage sector
- with input and active participation from an international group of 22 Research Libraries and influenced by the vision of the LD4P initiative
The collaborative initiative is steered by the library community
Casalini SHARE VDE (SVDE) Project: Vendor Supported and Community Driven Development
Involvement in phase 1 and/or 2 has included:
● Stanford University
● University of California, Berkeley
● Yale University
● Library of Congress
● University of Chicago
● University of Michigan Ann Arbor
● Harvard University
● Massachusetts Institute of Technology
● Duke University
● Cornell University
● Columbia University
● University of Pennsylvania
● Pennsylvania State University
● Texas A&M University
● University of Alberta
● University of Toronto
Work thus far has culminated in the creation of an
experimental linked data discovery environment, as
well as the return of the NEOS catalogue both as MARC
enriched with URIs and as BIBFRAME.
Phase 3a will see the implementation of the SVDE
platform with the full UAL/NEOS catalogue with
ongoing updates. This will allow us to continue
with data experimentation and analysis, provide a
training tool to familiarize ourselves with this kind
of data/work, and continue progression towards
linked data implementation.
Participating Institutions
SVDE Full Members
Duke University
New York University
Stanford University
University of Alberta – NEOS consortium
University of Chicago
University of Michigan at Ann Arbor
University of Pennsylvania
Yale University
National Libraries
Library of Congress
National Library of Medicine
National Library of Norway
LD4P Cohort
Cornell University
Frick Art Reference Library
Harry Ransom Center
Harvard University
Northwestern University
Princeton University
UC Davis
UC San Diego
University of Colorado at Boulder
University of Minnesota
Texas A&M University
University of Washington
SVDE Transformation Council
«The SHARE-VDE Transformation Council's role is to provide insight and analysis of the MARC to BIBFRAME transformation to make recommendations for improvements based on member library data analysis, and project documentation. Initial recommendations are based on Phase 2 deliverables, but the work of the team will be ongoing into the foreseeable future.»
There are 4 sub-committees focusing on specific areas:
● Work Identification Working Group
● Authority/Identifier Management Services Working Group
● Cluster Knowledge Base Interaction/Editor Working Group
● User experience/User Interface Working Group
SHARE-VDE Process Overview
Casalini bf:2.0 Data
The Super Work Entity Model
Possemato, T. (2019). “Share Virtual Discovery Environment in Linked Data (SHARE-VDE) Highlight on Data Modeling” 2019 LD4 Conference, Boston, MA
Discovery Overview
1. Blacklight, the LD4P2 Cohort, and the UAL Discovery Review
2. SHARE VDE
SHARE VDE Portal Exercise
UAL Participation and Next Steps
SVDE is a vendor-supported, community-driven project. Its transformation tool and
enrichment service will be used by SVDE members and all LD4P2/LD4P Cohort
members for consistency.
UAL has been active in the development of the project through participation in
steering meetings, analysis of conversion processes, and now work on the SVDE
Transformation Council and Work Identification Working Group.
How does NEOS fit?
Further Context
Relating this to current practice: NEOS Standards and BIBFRAME
1. Overview of key cataloguing standards in relation to BIBFRAME
a. BIBCO to BIBFRAME
b. CONSER to BIBFRAME
c. Review of LC conversion specifications
Program for Cooperative Cataloguing (PCC)
“It is time to move beyond knowledge and skills related to linked data at a theoretical
level and into implementation. Building on the PCC’s strong tradition of providing
training for metadata creators, active experimentation and piloting of linked data
practices will help inform policy decisions, training, and operationalizing such
practices. As we move to a culture of greater data sharing, it is crucial to extend our
community, both by engaging a more diverse range of members in the work of the
PCC and by collaborating with vendors, open source communities, and others.”
(Program for Cooperative Cataloguing, 2018)
Program for Cooperative Cataloguing (2018). PCC (Program for Cooperative Cataloging) Strategic Directions, January 2018-December 2021. Retrieved from https://www.loc.gov/aba/pcc/about/PCC-Strategic-Directions-2018-2021.pdf
BIBFRAME Project Overlap
Where would you like your BIBFRAME?
Linked Data for UAL
UAL Project Work
UAL - Institutional support
Strategic priorities 2019-2020
Vision: Moving Forward with Linked Data at UAL
Bibliographic Services Unit
1. Staffing
a. New Monographs Cataloguing Specialist tied to the LD4P Cohort project
b. New Linked Data Librarian Resident due to start in Bibliographic Services in July
c. Updates to expectations and job fact sheets
2. Training
a. LC provided training sessions for LD4P Cohort members, with further sessions pending for Sinopia
b. Ongoing review of webinar options
c. Common training through LibraryJuice XML and RDF Based Systems Course
d. Linked data lab time sessions to collaboratively work through ideas
3. Infrastructure
a. Testing of NEOS data in a test triplestore (GraphDB) with support from ComputeCanada
b. Testing of triplestore database work via the Stardog triplestore for SVDE
c. SVDE provides a discovery tool for testing
d. Sinopia provides a cataloguing module for working with BIBFRAME
e. ITS is working on the setup of a local production triplestore for NEOS data
4. Community
a. International BIBFRAME Community
b. Canadian BIBFRAME Readiness Task Force
c. Vendors
d. NEOS
Linked Data for NEOS
Implications for NEOS: Discussion
1. What does this mean for the shared database?
2. What work is ahead for NEOS-Tech? How do we ensure our standards are
mapped to this environment?
3. What are the timelines?
a. When will BIBFRAME be here?
b. How long can we continue using MARC?
4. Support for discovery
5. Training
“Never let the future disturb you. You will meet it, if you
have to, with the same weapons of reason which today arm
you against the present.”
Marcus Aurelius, Emperor of Rome, 121-180. (2002). Meditations. London: The Folio Society.