Task Group on URIs in MARC
Year One Report
Date: October 6, 2016

Members: Robert Bremer, Steven Folsom, Paul Frank, Jean Godby, Les Hawkins, Reinhold Heuvelmann, Chew Chiat Naun, Adam Schiff, Jackie Shieh, Gary Strawn

Contributing consultants: Nancy Fallgren, Nancy Lorimer, Melanie Wacker, Terry Reese, Corine Deliot, Thurstan Young

OUTLINE:
I. CHARGE
II. EXECUTIVE SUMMARY
III. PROCESSES
III.1. Define and understand HTTP URI
III.2. Identify issues/problems with adding URIs; whether it was actually doable
IV. PROBLEM STATEMENT
IV.1. Where to place URIs in the MARC structure ($0, $4)?
IV.2. What difficulties are evidenced?
IV.3. What did we learn?
IV.4. Outcomes
IV.5. Next steps and in-depth analyses in year 2
V. RECOMMENDATIONS TO STAKEHOLDERS
VI. REFERENCES

I. CHARGE:

Charge 1: Identify and address any immediate policy issues surrounding the use of identifiers in MARC records that should be resolved before implementation proceeds on a large scale. These issues may include:

1. whether to use alphanumerical identifiers or URIs;
2. the use of multiple identifiers for the same entity;
3. where to put work and expression identifiers.

Charge 2: In collaboration with the PCC Standing Committees, develop guidelines for including identifiers in MARC bibliographic and authority records.

Charge 3: Develop a work plan for the implementation of identifiers in $0 and other fields/subfields in member catalogs and in PCC-affiliated utilities. Tasks may include:

1. determine the entities for which identifiers should be provided in an initial implementation;
2. identify source vocabularies that will need to be accommodated;
3. identify automated methods for populating and maintaining new and existing records with identifiers;
4. develop requirements for tools that will allow catalogers to work accurately and efficiently with linked data vocabularies;
5. identify functionality that will be required for library systems (including ILSs and utilities) to exchange, control, protect, and update data based on identifiers;
6. develop a pilot project and identify partners.

Charge 4: In consultation with the MARC Advisory Committee, technologists versed in linked data best practices, and other stakeholders, identify and prioritize any remaining issues concerning support for identifiers in the MARC format, and initiate MARC proposals as appropriate. Prioritization of issues should take into account impact, feasibility, and the late stage of MARC’s life cycle. Issues may include:

1. accommodating entities and relationships not currently well provisioned for identifiers in MARC;
2. consistency of provisions across MARC fields;
3. addressing the distinction between URIs pointing to real-world objects vs. URIs pointing to documents/authorities.

The Task Group should give priority to actions that will lead to tangible results during the lifetime of the PCC Strategic Directions 2015-2017. The Task Group should feel free to form subgroups and call on additional expertise as needed.

II. EXECUTIVE SUMMARY

In the first year since the inception of the URI in MARC Task Group (TG), despite extremely challenging schedules and demands, all members and most consultants devoted a great deal of their time to working together through many issues. It has been a great privilege to be part of a team in which everyone has his or her eyes on the goals.

The deliberations and recommended solutions were based on two driving principles. First, the recommended solution should be across-the-board and straightforward: the implementation must have the broadest impact with the least disruption to workflows in the MARC environment. Second, the TG observed, and kept in sight throughout its discussions, an important fact: although many libraries are anxious and in a position to move forward with linked data experimentation and implementation, the majority of libraries remain ambivalent and hesitant. In such a dual environment, the TG’s recommendations must accommodate dual operations for a period of time. The TG must provide ways for libraries to decide their own pace and needs when transitioning from MARC to linked data.

Early on, everyone was keenly aware of the syntactic and semantic complexity of identifiers in the form of dereferenceable uniform resource identifiers (HTTP URIs). After months of discussion, the TG firmly believed that following agile principles, specifically the scrum approach, would give the process the best chance of success in addressing the URI in MARC issue:

• Figure out how to do the work
• Do the work
• Identify what’s getting in its way
• Take responsibility to resolve all the difficulties within its scope
• Work with other parts of the organization to resolve concerns outside their control


1) Recognized that no simple, across-the-board solution was possible for MARC fields concerning $0. Therefore the TG pushed forward the fields that could benefit from $0 without complications. See MAC Paper 2016-DP19 in the REFERENCES section.

2) Agreed upon a universal definition of $0 for a URI that describes a THING (URI/Concept). In keeping with the overall principle of least disruption and most coverage, the TG recommended the use of an HTTP URI in $0 as the default URI for libraries that opt to adopt URIs in $0. Alternatively, a text string identifier in $0 remains in force for libraries that are not ready to move forward. See MAC Paper 2016-DP18 in the REFERENCES section.

3) Agreed that the relationship entity of an RDF statement should be represented in MARC. Potential candidates for expressing relationships were $4, $i, $j, and $e. The consensus was to focus on $4, because that existing subfield is already defined in all the fields where relationships can currently be expressed in MARC. Rescoping $4 to hold a URI/property (predicate) does not prevent the library community’s continued application of 3-letter relator codes; it provides an opportunity for libraries that are ready to deploy HTTP URIs for relationships (property/predicate). Consensus was that $4 alone should be redefined to carry relationship URIs; this was considered a consistent, across-the-board solution that would not require further amendments to the MARC formats by rescoping or defining additional subfields.

4) Identified a need to host an identifier for a real-world object. The TG hoped to propose setting aside $1 for an identifier that points to a THING (URI resource/RWO).

5) TG members who work closely with other standards communities, such as ISNI/VIAF, have vested interests in the 024 field in the Authority format. The 024 field appears to have the potential to address the relationships of an entity across vocabularies/ontologies. [1]

The TG hopes to address items no. 4-6 above in discussion papers for the MARC Advisory Committee (MAC) to consider.

The Pilot Test that the TG conducted in February-March 2016 revealed that provisioning for URIs in MARC presents additional layers of complexity that require further consideration, i.e., repeatability, pairing, ambiguous relationships, and the significance of the ordinal sequence. Additionally, the TG is working to identify a potential field and/or indicator/subfield to record an identifier representing a Work, a resource object. These are described in greater detail in the sections below.

III. PROCESSES

The TG had in mind processes that would be the least disruptive yet yield the most promising results. In order to ensure cohesive and broad approaches, the TG set forth two tasks: a) define and understand the uniform resource identifier and the deployment of the Web-service protocol scheme HTTP; b) identify issues and problems with adding URIs in MARC: is it actually doable in the current systems that host MARC data?

III.1. Define and understand HTTP URI [Charge 1.1, 1.2; Charge 4.3]

According to a MARBI position paper published in 2009:

“The use of a URI instead of plain text is particularly applicable to situations where the value of the … element comes from a controlled vocabulary, which could be an authority list or formal thesaurus (e.g., a name from the LC Name Authority File or a topic for an LCSH heading), or any other list of controlled codes or terms (e.g., the MARC Code List for Languages).”

However, the goal of facilitating the transition from MARC to linked data now requires a more precise machine understanding of the data accessible from the URIs that have been added to MARC records.

The issue can be illustrated with an excerpt from the Library of Congress Name Authority record for Hillary Clinton, accessible at https://lccn.loc.gov/n93010903. Of particular interest is the list of 024 fields, which identify “standard number[s] or code[s] associated with the entity named in the 1xx field which cannot be accommodated in another field,” according to the MARC Authority definition. All of the 024 fields copied below contain URIs pertaining to Hillary Clinton.

024 7_ |a http://www.wikidata.org/entity/Q6294 |2 uri
024 7_ |a http://dbpedia.org/resource/Hillary_Rodham_Clinton |2 uri
024 7_ |a http://viaf.org/viaf/54950123 |2 uri
024 7_ |a http://isni.org/isni/0000000122802598 |2 uri
024 7_ |a http://d-nb.info/gnd/119082101 |2 uri
024 7_ |a http://id.ndl.go.jp/auth/ndlna/00552567 |2 uri
024 7_ |a http://aut.nkp.cz/jn20000700317 |2 uri
024 7_ |a http://catalogue.bnf.fr/ark:/12148/cb12543158f |2 uri
024 7_ |a http://www.idref.fr/034705171 |2 uri
024 7_ |a http://datos.bne.es/resource/XX1725857 |2 uri
024 7_ |a http://id.sbn.it/af/IT%5CICCU%5CUBOV%5C804461 |2 uri
024 7_ |a http://cantic.bnc.cat/registres/CUCId/a11695705 |2 uri
024 7_ |a https://musicbrainz.org/artist/858a3d95-e1b2-4aac-8427-a99e391ce8c5 |2 uri
024 7_ |a http://www.imdb.com/name/nm0166921 |2 uri
024 7_ |a http://bioguide.congress.gov/scripts/biodisplay.pl?index=C001041 |2 uri
024 7_ |a http://www.nndb.com/people/022/000025944 |2 uri
024 7_ |a https://ballotpedia.org/Hillary_Clinton |2 uri
024 7_ |a https://www.freebase.com/m/0d06m5 |2 uri

The rows in the table can be partitioned into three categories:

• Near the bottom, the 024 fields with the peach-colored background are human-readable documents about Hillary Clinton. These are pages from popular resources maintained outside the library community, such as IMDB and BioGuide, which have been deemed authoritative by library catalogers and authority experts. In shorthand, these URIs are standard URLs for Web pages.

• The rows with the blue background are records derived from library authority files and more modern registries designed for similar purposes. They may be pages from library authority files published on the Web, human-readable views of machine-understandable RDF data, or raw RDF. But in one form or another, all of the URIs resolve to library authorities (or simply ‘Authorities’) that are about Hillary Clinton. The TG refers to these URIs as Authority URIs.

• The rows with the green background contain URIs that refer to Hillary Clinton directly, in a way that is technically distinct from documents about her. These URIs conform to linked data conventions described in standard Web documents such as “Cool URIs for the Semantic Web” [https://www.w3.org/TR/cooluris/]. The data accessible from these URIs has been published by third parties as well as the library community, and encodes a rich domain model designed expressly for machine understanding. The TG refers to these URIs as Entity or Thing URIs.

According to linked data conventions, machine processes designed to construct meaningful statements, and to draw inferences from them, require Thing URIs. When Thing URIs are defined for people and creative works, one desirable outcome would be a machine-understandable statement such as ‘Hillary Clinton is the author of the book It Takes a Village’. With the technology available in 2016, data accessible from Web page URIs may not be machine-understandable at all, and Authority URIs may be only partially understandable. The ambiguity of URIs illustrated by the 024 fields in the MARC Authority record is also present in MARC bibliographic records.
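
As a rough illustration of the kind of machine-understandable statement described here, the sketch below serializes a single statement as one line of N-Triples. The Wikidata URI for Hillary Clinton is taken from the 024 list; the URI for the book is a hypothetical placeholder, and the choice of the Dublin Core Terms `creator` property as the predicate is an assumption made only for illustration, not something the TG prescribes.

```python
# Thing URI for Hillary Clinton, taken from the 024 fields in the authority record.
CLINTON = "http://www.wikidata.org/entity/Q6294"
# Hypothetical URI for the work "It Takes a Village" (not a real identifier).
BOOK = "http://example.org/work/it-takes-a-village"
# Assumed predicate for this sketch: Dublin Core Terms "creator".
CREATOR = "http://purl.org/dc/terms/creator"


def n_triple(subject: str, predicate: str, obj: str) -> str:
    """Serialize a statement whose three parts are all URIs as one N-Triples line."""
    return f"<{subject}> <{predicate}> <{obj}> ."


statement = n_triple(BOOK, CREATOR, CLINTON)
print(statement)
```

The statement is only meaningful to a machine because both the subject and the object are Thing URIs; substituting a Web page URL for either one would assert that a document, rather than a person or a work, is the creator or the thing created.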

III.2. Identify issues/problems with adding URIs; whether it was actually doable [Charge 1, Charge 3]

A pilot test of inserting HTTP URIs in $0 in bibliographic and authority data emerged as one logical first step for the TG. It helped the TG understand which issues could easily be resolved in the near term, and the feasibility of inserting URIs in $0 in the MARC environment.

The Pilot Test began in February 2016. Members prepared sets of input data and worked with tool creators (MarcEdit and Authority Toolkit) to refine lookup algorithms for URI insertion in $0.

The enhanced data, with HTTP URIs embedded, were to be ingested into several integrated library systems for evaluation. This exercise helped the TG gain a cohesive understanding of the role of an identifier in the form of a dereferenceable URI deployed in $0 in the MARC environment.

Throughout the process, the TG began to frame the questions that might assist the effort of transitioning MARC data to linked data, including reaching possible resolutions where potential problems may reside, such as planning for MAC proposals in its first year.

Other issues were more long-term and may require in-depth discussion with broader community involvement; for instance, already-defined subfields such as $4 might have the potential to hold HTTP URIs. The repeatability, ambiguity, and significance of the ordinal sequence are less trivial and more complex.

With regard to bulk insertion, system performance, and scalability, the Pilot Test also helped address SPARQL query adjustments on the server side. Although adding URIs by hand was the least desirable exercise, it may be inevitable, so the TG also began documenting resources that would assist such an endeavor.

The overall strategies that the TG adopted were carefully thought out in order to achieve iterative successes that will build confidence throughout the phases of implementation.

IV. PROBLEM STATEMENT

To encode data suitable for transformation into RDF triples, it is necessary to be able to identify in MARC the data elements corresponding to the subject, predicate, and object in each statement, and/or to provide URIs for them. It quickly became apparent that the task is not simply to add subfields to allow URIs to be given (itself a non-trivial problem, given the limited number of unused subfields still available in MARC) but also to negotiate the often ambiguous semantics of MARC. The TG has sought to do this through a judicious combination of redefinition proposals, clarification of existing semantics, and best practice recommendations.

Best practices for incorporating HTTP URIs in MARC BIB and Authority records without making major renovations to the MARC format (taking into consideration cost/benefit analysis for an ‘end of life’ technology).

IV.1. Where to place URIs in the MARC structure ($0, $4)? [Charge 3]

The TG developed a pilot to examine the issues surrounding the addition of identifiers to MARC 21 data. The work included identifying actionable source vocabularies and creating test record sets with dereferenceable URIs embedded in the data. A variety of formats were represented in the test data, and ILS vendors, programmers, system engineers, and discovery designers were consulted throughout the pilot to comment on the retrieval of actionable URIs and on the appropriate policies for ensuring that the data are actionable in MARC 21.

The TG also inventoried the MARC bibliographic and authority formats to identify MARC 21 fields that contain subfields capable of accommodating URIs. In the bibliographic format, subfields $0 and $4 were identified as existing candidates for containing URIs; subfields $0 and $4 were likewise candidates in the authority format. MARC 21 fields that might usefully contain a subfield for a URI, but which do not have one defined, were also noted.

The TG focused on subfields $0 and $4 for its first three MARC Discussion Papers, submitted to MAC at ALA Annual 2016.


IV.2. What difficulties are evidenced?

IV.2.1. Adding multiple $0 [Charge 1.2]

The nature and use of subfield $0 has evolved in MARC since the subfield was first implemented in 2007. In 2010 it was redefined and came to include standard numbers, including URIs, in addition to its original use for authority record control numbers.

However, MARC is not specific as to which parts of a controlled heading string correspond to the $0. Nothing in the MARC specification rules out one $0 subfield applying to one set of subfields in a heading while a different $0 applies to others. (To ameliorate this problem we formed a MARC object/URI reconciliation subgroup to enumerate the subfields naming the object in each MARC field; see IV.2.2 below.) And because $0 is repeatable, it is possible to find multiple $0 values corresponding to the same heading subfields naming the same entity. Indeed, the latter practice is adopted by design in some implementations, notably that at the German National Library.

The existence of different use cases and practices for relating headings to $0 has emerged as an issue that will need to be considered as the TG’s work proceeds. In the case of OCLC’s heading control functionality related to LC names and LCSH, subfield $0 data is included as an XML tag attribute in each subfield XML tag covered by a particular authority record, and is repeated as many times as needed depending on the number of subfields used to represent the name or subject. In the subsequent development of controlling for other authority files, the same approach has been taken, but instead retaining the same or different authority record control numbers in multiple $0 subfields. [See examples at end of document]

This repeated use of $0 subfields, containing the same authority record number or different authority record numbers for different parts of a heading, runs contrary to the need that exists in an OCLC context for a single URI corresponding to the entire named entity given in the field. Extraneous $0 subfields are automatically deleted in WorldCat records in fields that are otherwise controlled to a particular authority file. However, this leaves unresolved the question of controlling via multiple source vocabularies within the same language of cataloging, which many see as a desirable medium-to-long-term objective. Given the investment in its development and the number of controlled headings in WorldCat, completely changing the heading control functionality within WorldCat is not feasible, so the TG and OCLC staff have considered other alternatives allowing for output of the needed URIs in the format that libraries would prefer in the future.
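
The repeatability problem can be made concrete with a small sketch. The Python below splits a field written in a simple `$`-delimited display form into subfields and separates the $0 values that are HTTP URIs from those that are other identifiers. The sample field is an illustrative assumption (a heading carrying both a prefixed control number and a URI for the same entity, using the GND identifier from the 024 list), and the display syntax and function names are hypothetical rather than any system's actual behavior.

```python
import re


def parse_subfields(field_body: str) -> list[tuple[str, str]]:
    """Split a '$'-delimited field body into (code, value) pairs, preserving order."""
    return [
        (match.group(1), match.group(2).strip())
        for match in re.finditer(r"\$(\w)\s*([^$]*)", field_body)
    ]


def split_identifiers(field_body: str) -> tuple[list[str], list[str]]:
    """Return ($0 values that are HTTP URIs, all other $0 values)."""
    uris, others = [], []
    for code, value in parse_subfields(field_body):
        if code == "0":
            target = uris if value.startswith(("http://", "https://")) else others
            target.append(value)
    return uris, others


# Illustrative field body: one heading, two $0 subfields naming the same entity.
sample = "$a Clinton, Hillary Rodham, $d 1947- $0 (DE-588)119082101 $0 http://d-nb.info/gnd/119082101"
uris, others = split_identifiers(sample)
print(uris)
print(others)
```

Note that nothing in the field itself says whether the two $0 values refer to the whole heading or to different parts of it; the sketch can only sort them by form, which is exactly the ambiguity described above.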

IV.2.2. How to identify an RDF object in a MARC datafield [Charge 4.3]

This emerged as an important need because the ability to identify a URI with its corresponding label is necessary to support both reconciliation of existing data and updates to those labels based on their association with an identifier. The only realistic way to make this identification was to document the correspondences on a field-by-field basis. Fortunately, this was very achievable for the majority of fields in widespread use. [Link to recommendations] The investigation revealed a number of issues relating to the identification of single entities vs. larger sets (series, conferences) and the alignment of MARC and RDA vocabularies.

IV.2.3. What did we find in identifying relationships/multiple relationships? [Charge 4.1]

IV.2.3.1. Relationships are expressed in MARC by a variety of means, including:
    IV.2.3.1.1. Field tagging, either alone (e.g., 830) or in combination with indicators (e.g., 780/785)
    IV.2.3.1.2. Subfield codes, e.g., 041
    IV.2.3.1.3. Codes given in subfields, e.g., 700 $4
    IV.2.3.1.4. Controlled or natural language text given in subfields, e.g., 700 $i

IV.2.3.2. Some of these fields are very tightly bound to legacy MARC definitions, structures, and data. Redesigning 041, for example, to be hospitable to URIs would require a complete reconception of that field.

IV.2.3.3. There is the greatest value in provisioning for URIs following a 7XX $4 $0 $1 pattern, with $4 repurposed to house URIs much as $0 now does. This approach seems to present a relatively low barrier to implementation while having widespread application in MARC.

IV.2.3.4. Multiple relationships can cause ambiguity where they are associated with multiple objects or multiple labels. In such cases we recommend the expedient of simply repeating the field in order to make the associations unambiguous.
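
A minimal sketch of the 7XX $4 $0 pattern, and of the "repeat the field" expedient, follows. It emits one display-form field per relationship so that each relationship URI in $4 is paired with exactly one object URI in $0. The function names and display syntax are hypothetical, emitting a separate field for every relationship is a simplifying assumption of this sketch, and the two relator URIs are used purely as sample data.

```python
def format_field(tag: str, indicators: str, subfields: list[tuple[str, str]]) -> str:
    """Render a field in a simple display form: tag, indicators, then $code value pairs."""
    body = " ".join(f"${code} {value}" for code, value in subfields)
    return f"{tag} {indicators} {body}"


def fields_for_relationships(
    tag: str, indicators: str, label: str, object_uri: str, relationship_uris: list[str]
) -> list[str]:
    """Repeat the field once per relationship so each $4 pairs unambiguously with its $0."""
    return [
        format_field(tag, indicators, [("a", label), ("4", rel), ("0", object_uri)])
        for rel in relationship_uris
    ]


fields = fields_for_relationships(
    "700",
    "1_",
    "Clinton, Hillary Rodham.",
    "http://id.loc.gov/authorities/names/n93010903",
    [
        "http://id.loc.gov/vocabulary/relators/aut",  # author
        "http://id.loc.gov/vocabulary/relators/nrt",  # narrator
    ],
)
for line in fields:
    print(line)
```

Because every emitted field carries a single $4 and a single $0, a converter can read each one as one predicate and one object without having to interpret subfield order.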

IV.2.4. How one obtains URIs for various data sources depends on the linked data source (different data sources make their URIs available differently) and on interoperability between the data source and the cataloging tools being used

To help support obtaining the right URIs for their purposes in MARC, the TG has begun a document currently referred to as Formulating and Obtaining URIs: A Guide to Commonly Used Vocabularies and Reference Sources. For commonly used vocabularies in MARC, we want to document where in the data source UI one can find the canonical URIs that, when dereferenced, provide data. Going forward, for each entry in the document, we want to explain whether a data source publishes its data as Authorities, Real World Objects, or both. We also want to document the methods available for machine access to the data: is the data published as Linked Data available through HTTP, available through a SPARQL endpoint, data dumps, etc.?

IV.2.4.1. MarcEdit [Charge 3]

In the summer of 2014, MarcEdit introduced a suite of tools designed to begin testing the feasibility of embedding linked data concepts into MARC records. Initially, the scope of the suite was limited to embedding HTTP URIs in the $0 in MARC fields 1xx, 6xx, and 7xx in bibliographic records. This initial work focused on integration with the US Library of Congress’s id.loc.gov service as well as OCLC’s VIAF services for resolution. However, over the past 2 years, and in response to many of the questions and issues surfaced through the TG, the Linking services have been expanded and revised to potentially support all use cases identified by this Task Force, as well as providing support for non-MARC21 users to configure the Linking tool for use with other MARC formats.


The MarcEdit Linking toolkit currently supports the generation of URIs for all fields identified by this Task Force, for authority and bibliographic records. The application utilizes a rules file that documents field processing and service configuration values. This allows MarcEdit to quickly make changes to the rules governing field processing, as well as adding support for new collections and linked data endpoints. As of this report (9/21/2016), the MarcEdit Linked Data tool supports resolution against the following linked data services:

1. US Library of Congress NAF
2. US Library of Congress LCSH
3. US Library of Congress Children’s Subject Headings
4. US Library of Congress Demographic Group Terms
5. Thesaurus for Graphic Materials
6. US Library of Congress Genre/Form Terms
7. US Library of Congress Medium of Performance Thesaurus for Music
8. RDA Carrier Types
9. RDA Media Types
10. RDA Content Types
11. Getty Arts and Architecture Thesaurus
12. Getty ULAN
13. National Library of Medicine MESH
14. OCLC FAST Headings
15. OCLC VIAF
16. German National Library (GND)
17. [15 national library name indexes via VIAF]
18. Japanese Diet Library

Additionally, users have the ability to configure their own linked data endpoints for use with MarcEdit, so long as the service in question supports SPARQL and JSON. There is presently a knowledge-base article at http://marcedit.reeset.net/editing-marcedits-linked-data-rules-file documenting how users can both add new collections and modify the rules used when processing a particular field.

Essentially, MarcEdit utilizes its rules file to configure MarcEdit’s linked data platform to identify the proper index, service, normalization (for data query purposes), and subfields to utilize as part of any lookup process. Additionally, each rules block identifies when a field should be processed (i.e., only when used in a bibliographic record, only when used in an authority record, or both). For example, here is the definition for the 650 field:

    <field type="bibliographic">
        <tag>650</tag>
        <subfields>abvxyz</subfields>
        <ind2 value="0" vocab="lcsh"/>
        <ind2 value="1" vocab="lcshac"/>
        <ind2 value="2" vocab="mesh"/>
        <ind2 value="7" vocab="none"/>
        <index>2</index>
        <uri>0</uri>
        <special_instructions>subject</special_instructions>
    </field>

Each MarcEdit rules block is a small segment of XML that profiles field usage within a record. This is why MarcEdit’s linking tool can be used with other flavors of MARC (like UNIMARC): the Linking service has no concept of MARC21, only of the ISO 2709 format; the rules file provides that context.
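
To show how a rules block of this shape could drive processing, the sketch below parses the 650 definition with Python's standard library and looks up the vocabulary that corresponds to a given second indicator. This is not MarcEdit's implementation; the function names are hypothetical, and the reading of `<uri>` as the subfield that receives the URI is an assumption based on the report's description of $0.

```python
import xml.etree.ElementTree as ET

# The 650 rules block, with the XML punctuation restored.
RULE_650 = """
<field type="bibliographic">
  <tag>650</tag>
  <subfields>abvxyz</subfields>
  <ind2 value="0" vocab="lcsh"/>
  <ind2 value="1" vocab="lcshac"/>
  <ind2 value="2" vocab="mesh"/>
  <ind2 value="7" vocab="none"/>
  <index>2</index>
  <uri>0</uri>
  <special_instructions>subject</special_instructions>
</field>
"""


def load_rule(xml_text: str) -> dict:
    """Parse one rules block into a plain dictionary."""
    field = ET.fromstring(xml_text)
    return {
        "record_type": field.get("type"),
        "tag": field.findtext("tag"),
        "subfields": field.findtext("subfields"),
        # Assumed meaning: the subfield code into which the URI is written.
        "uri_subfield": field.findtext("uri"),
        "vocab_by_ind2": {e.get("value"): e.get("vocab") for e in field.findall("ind2")},
    }


def vocabulary_for(rule: dict, second_indicator: str):
    """Return the vocabulary code for a second indicator, or None if the rule has no entry."""
    return rule["vocab_by_ind2"].get(second_indicator)


rule = load_rule(RULE_650)
print(rule["tag"], vocabulary_for(rule, "2"))
```

Keeping the field profile in data rather than in code is what lets the same lookup logic serve other MARC flavors: only the rules file needs to change.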

This approach has allowed MarcEdit to quickly profile and examine the implications of developing URIs for linking fields like the 880 field, which presents some unique challenges but can be accommodated via the current rules file format.

Utilizing the current process, MarcEdit’s linking tool can accommodate a wide range of linking scenarios. For example, in an authority record:

[Figure not reproduced: MarcEdit linking in an authority record]

Within a bibliographic record:

[Figure not reproduced: MarcEdit linking within a bibliographic record]

Across diverse vocabularies:

[Figure not reproduced: MarcEdit linking across diverse vocabularies]

Current development on the tool will continue to focus on the inclusion and support of additional vocabularies, on continuing to work with linked data providers around scalability issues (and ways in which MarcEdit [or services like it] can reduce impacts on their services), and on profiling this service to work with other flavors of MARC, like UNIMARC, to encourage further experimentation.

IV.2.4.2. Authority Toolkit [Charge 3]

The authority toolkit is a program for the construction and modification of authority records. One version is designed for use within OCLC’s Connexion program, for records in the LC/NACO authority file, but another version can work with records in files, and so with records from other sources. Both versions of the toolkit have the same capabilities. At an early stage, the toolkit acquired the ability to test terms used in authority fields such as the 370 and 372 against vocabularies available at id.loc.gov (at present LCMPT, LCSH, LCDGT, AFSET, geographic area codes, RDA content terms, and the LC/NACO Authority File). Somewhat later, it added the ability to verify terms against the MeSH vocabulary. (Additional vocabularies may be added in the future based on user requests.) To perform this verification, the program needs to know which vocabularies are used to control terms in which parts of which authority fields, how to query the source to determine whether or not a term is defined, and how to react to the information returned by the source. The toolkit’s actions are controlled above all by the subfield $2 code appearing in the same field as the term, but in the absence of a subfield $2 code, operator preferences come into play as well. (For example, an operator may prefer that an unlabeled term be tested against MeSH first and, if not found, tested against LCSH, or perhaps tested only against LCDGT.) A detailed description of the toolkit’s process for verifying the content of authority fields can be found in the program’s documentation at httpfileslibrarynorthwesternedupublicoclcdocumentationverifymenu

If the toolkit’s search for an entire term is successful, the toolkit could easily supply the corresponding URI and add it to the authority record in subfield $0. This URI may be contained in the data provided by the source, or it could be constructed mechanically once the toolkit has extracted the appropriate identifier. As part of experimentation encouraged by the TG, on January 15, 2016 the toolkit acquired an option to add subfield $0 to fields that could be verified. (This option is described at httpfileslibrarynorthwesternedupublicoclcdocumentationoptionsverification0) If a field contains more than one term, the toolkit must divide the field into multiple fields (one for each term) before it can add subfield $0.
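
Mechanical construction of a URI from an extracted identifier can be sketched as follows. This is not the toolkit's code: the function name is hypothetical, the validation pattern is a deliberately simplified assumption about the shape of name authority control numbers, and only the `http://id.loc.gov/authorities/names/` base is taken as given.

```python
import re

ID_LOC_GOV_NAMES = "http://id.loc.gov/authorities/names/"


def name_authority_uri(control_number: str) -> str:
    """Build an id.loc.gov name authority URI from a control number such as 'n  93010903'."""
    # Control numbers are often displayed with internal spaces; the URI form has none.
    normalized = re.sub(r"\s+", "", control_number).lower()
    # Simplified shape check (an assumption): 'n', an optional second letter, then digits.
    if not re.fullmatch(r"n[a-z]?\d+", normalized):
        raise ValueError(f"not a recognized name authority number: {control_number!r}")
    return ID_LOC_GOV_NAMES + normalized


print(name_authority_uri("n  93010903"))
```

A URI built this way is only as good as the identifier it starts from, so in practice it would be constructed after the term has been verified against the source, as described above.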

The following illustration shows an authority record as verified by the authority toolkit with the option to add subfield $0 during verification turned on (For13 this experiment subfield $0 was locally defined for13 some fields)

Although13 the toolkit13 can often discover13 information about13 compound terms (such as some corporate bodies with13 subordinate units and13 some LCSH headings) for which13 an13 authority record13 exists for some parts but not all the toolkit cannot supply subfield13 $0 (There is no authority record and so no URI that represents the entire term)13 The toolkit13 also cannot13 add subfield $0 to fields that13 contain multiple terms if the field contains an aggregation of terms rather than a collection of independent items13 (Example13 the toolkit cannot add13 subfield13 $0 to13 the 382 field)

The task of discovering that a term given in an authority record is defined in an external vocabulary is made more difficult because the searching mechanisms available do not always compensate appropriately for operator variations in punctuation, capitalization, and the use of combining diacritics. In addition, the response time experienced by the toolkit can vary widely, even for the same term searched repeatedly within a brief time, and some services are unavailable over the weekend. If the potential of linked data is to be enjoyed, services providing data must ensure that their entry mechanisms are robust, flexible, and available at all times.
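One way a client could compensate for such variation before comparing terms is sketched below. It is an assumption-laden illustration (the function name and the exact normalization rules are not taken from any of the tools discussed): it decomposes characters, drops combining marks, treats punctuation as spacing, and ignores case.

```python
import unicodedata


def normalize_term(term: str) -> str:
    """Reduce a term to a comparison key that ignores case, punctuation,
    and combining diacritics."""
    decomposed = unicodedata.normalize("NFD", term)
    kept = []
    for ch in decomposed:
        category = unicodedata.category(ch)
        if category.startswith("M"):      # combining marks (diacritics)
            continue
        if category.startswith("P"):      # punctuation becomes a space
            kept.append(" ")
        else:
            kept.append(ch)
    # casefold() handles case; split/join collapses runs of whitespace.
    return " ".join("".join(kept).casefold().split())


# Precomposed and decomposed forms of the same name yield the same key.
print(normalize_term("Brontë, Charlotte") == normalize_term("Bronte\u0308 charlotte"))
```

Normalizing on the client side does not remove the need for robust services, but it avoids some false "not found" results caused purely by keying differences.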

IV.2.4.3. Lookup online (e.g., VIAF, Getty ULAN, Geonames, Wikidata)

Online lookup requires manual operation. Users must be well versed in the SPARQL queries that individual services provide. Getty ULAN works differently from Geonames and Wikidata. The URI returned from a query may not be an RDF URI, but one that may land the user on a Web page or document.
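To give a sense of what such a lookup involves, here is a small sketch that only builds a label query and request URL for the Wikidata Query Service; it does not send the request. The function names are invented for this example, and other services (such as Getty ULAN) would need differently shaped queries.

```python
from urllib.parse import urlencode

WIKIDATA_ENDPOINT = "https://query.wikidata.org/sparql"


def build_label_query(label: str, language: str = "en", limit: int = 5) -> str:
    """Build a SPARQL query that finds items with an exact label match."""
    # Escape backslashes and double quotes so the label stays a valid literal.
    escaped = label.replace("\\", "\\\\").replace('"', '\\"')
    return (
        f'SELECT ?item WHERE {{ ?item rdfs:label "{escaped}"@{language} . }} '
        f"LIMIT {limit}"
    )


def build_request_url(label: str) -> str:
    """Build the GET URL for the query, asking for JSON results."""
    params = {"query": build_label_query(label), "format": "json"}
    return WIKIDATA_ENDPOINT + "?" + urlencode(params)


print(build_label_query("Hillary Clinton"))
```

An exact label match is a deliberately simple strategy; in practice a lookup would probably need fuzzier matching and a human check of the candidates returned.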

IV.3. What did we learn? [Charge 1.3, Charge 3]

IV.3.1. Tackle low-hanging fruit: what can we do in 1 year?

The TG's activities during Year 1 were designed to position the MARC community to take tangible steps toward incorporating linked data URIs into its processes within an achievable timeframe. Therefore the TG put aside some tasks, such as overhaul of certain legacy MARC data elements, that would have delayed progress with the TG's practical objectives. The tool development undertaken by Terry Reese and Gary Strawn was designed to advance these objectives, but so were the Formulating URIs document and the MARC object/URI reconciliation work, both of which document information that will be needed by other stakeholders, and the work IDs in MARC proposal, which seeks to remove one of the main barriers to routine incorporation of work identifiers in MARC records.

IV.3.2. Add $0 where it's not defined (not simple)

One of the TG's goals was also to identify fields that currently do not have subfield $0 defined and to add it. The TG found that the following MARC fields needed $0 defined:

Bibliographic: 046, 257, 260/264, 375, 753
Authority: 046, 360, 375, 377, 663, 680, 681

These fields do not lend themselves to easy resolution when considering $0, which reflects the resource object for the entity described. The TG conducted thorough analyses and concluded that only 257 and 377 could contain a URI with an unambiguous correspondence between the field and the object it represents, leaving out more complicated cases, e.g., fields 264 (Production, Publication, Distribution, Manufacture, and Copyright Notice) and 382 (Medium of Performance).

One of the issues confronted in drafting Discussion Paper 2016-DP19 was the extent of effort needed to individually propose subfield $0 for MARC 21 fields that do not contain it. MAC accepted the paper as a proposal, and there was agreement "that similar changes such as those recommended this paper might in the future be considered as part of a MARC Fast-Track process." Being able to fast-track proposals for defining subfield $0 in fields which do not contain it will considerably streamline the process in the future.

IV.3.3. Strategies in light of the limited life cycle of the MARC environment

Though many may see MARC as "dead," it remains a viable tool that delivers metadata for data discovery. It is also, however, a legacy format that reflects in its somewhat baroque structure a long history of accretion to meet varied and changing needs. In pursuing its goals, the TG has adopted a strategy of pursuing changes that can be applied coherently across MARC and maximize return on the library community's investment of effort. There are economical and sensible approaches to determining what to do. The TG always kept in mind that its recommendations must cause the least disruption for data transition from MARC to linked data. A wholesale insertion of HTTP URIs is unlikely to be possible; URIs can probably be accommodated in most, but not all, MARC fields and/or subfields.

The TG is committed to working through a list of tasks and identifying viable solutions. While $0, after one year's deliberation, seemed a straightforward solution for a URI representing a resource object, more discussion is needed with regard to the predicate that denotes a relationship. MARC data have not been consistent in expressing relationships. The combination of fields, indicators, and subfields raises the complexity of the process.

IV.3.4. ILS analysis results

Some ILSs would not load the processed records because of the presence of $0. Others loaded but did nothing with the data.

The TG members mocked up files of bibliographic and authority data, adding various URIs in subfield $0 wherever subfield $0 is currently defined in MARC. These files were uploaded into a number of ILS systems to see if the addition of subfield $0 with URIs caused problems. No significant problems were found. These files included URIs in subfield $0 which were not prefixed with the "(uri)" identifier.

In OCLC the same $0 subfields were also not problematic. OCLC's validation of subfield $0 does not check the structure of subfield $0 in the same way as it does for control numbers in 760-787 subfield $w or URLs in $u subfields. Use of URIs in subfield $4 to express relationship information would require a change to OCLC's validation of $4 subfields, but that may be readily changed without extensive effort.

IV.3.5. Tools needed: MarcNext, Authority Toolkit

Currently the TG has tested and continues to work with MarcNext and the Authority Toolkit. TG members continue collecting and recording additional tools and resources that help practitioners identify and validate an RDF URI.


IV.3.6. Need to be able to easily report duplicates found in VIAF, etc., and need a way to know which URI to use when duplicates are found

Throughout the first year of investigation and deliberation, the TG learned that, though vocabularies and ontologies are structured per standards and published for adoption, some are more domain-specific than others. Often there is more than one method to structure a body of data. Duplication can be expected across various datasets. The reconciliation of URIs is one of the tasks that the TG has recognized, yet it is not in a position to recommend a solution in the near term.

IV.4. Outcomes

IV.4.1. MAC Discussion Papers [Charge 4]

The TG was aware that some aspects of its intended goals were not yet accommodated by the MARC format. Following the defined workflows of MARC governance and standardization, the TG submitted several discussion papers to the MARC Advisory Committee (MAC). As an initial preparation, an informal discussion paper entitled URIs in MARC: A Call for Best Practices, by Steven Folsom, had been discussed during the June 2015 MAC meeting. It focused on subfield $0 (Authority record control number or standard number): its current usage and its capability for URIs, and it addressed some aspects of best practice. The paper generated extensive discussion, and there was broad agreement that the time was right for the library community to begin using URIs consistently. Steven Folsom was asked to cooperate with the PCC to develop a formal MAC Discussion Paper.

In fall 2015 the British Library (BL), independently of the TG, submitted two papers to MAC for the January 2016 meeting, covering title-to-title relationships via subfield $w and specific relationship information, then discussed using subfield $0. The approaches taken by the BL in its papers, coupled with the approach taken by the TG, resulted in MAC suggesting that the British Library and the PCC should collaborate on submitting a paper for June 2016.

During the MAC meetings at the ALA Annual Conference in Orlando in June 2016, three papers were presented by or in cooperation with the TG. Discussion Paper No. 2016-DP18, entitled Redefining Subfield $0 to Remove the Use of Parenthetical Prefix "(uri)" in the MARC 21 Authority, Bibliographic, and Holdings Formats, described the syntactical improvement that a subfield $0 containing a URI without the parenthetical prefix "(uri)" would allow, so that automated processes could use the content of these $0s without having to strip away a prefix. The discussion paper was discussed at the MAC meeting and the recommendation was made that it be upgraded to proposal status; it was approved at the meeting as a proposal. From now on, a $0 containing an identifier in the form of a web retrieval protocol, e.g., an HTTP URI, should not be given a parenthetical prefix.
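For legacy data that still carries the prefix, the stripping step mentioned above might look like the following sketch. The function names are invented for illustration, and the pattern assumes the prefix appears literally as "(uri)" at the start of the subfield.

```python
import re

# Matches a leading "(uri)" prefix, with optional trailing whitespace.
_URI_PREFIX = re.compile(r"^\(uri\)\s*", re.IGNORECASE)


def strip_uri_prefix(subfield_0: str) -> str:
    """Remove a leading "(uri)" prefix; other values are returned unchanged
    apart from surrounding whitespace."""
    return _URI_PREFIX.sub("", subfield_0.strip())


def is_http_uri(value: str) -> bool:
    """True when the value looks like an HTTP or HTTPS URI."""
    return value.startswith(("http://", "https://"))


value = strip_uri_prefix("(uri)http://id.loc.gov/authorities/names/n93010903")
print(value, is_http_uri(value))
```

Other parenthetical prefixes, such as an organization code before a control number, are left untouched, since they identify the source of a non-URI identifier.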

A second paper was presented to the MAC: Discussion Paper No. 2016-DP19, Adding Subfield $0 to Fields 257 and 377 in the MARC 21 Bibliographic Format and Field 377 in the MARC 21 Authority Format. It resulted from extensive analyses of the MARC Bibliographic and Authority formats by the TG, selecting fields which are to be controlled by an identifier. Only those fields where an identifier can be applied with clear correspondence between the field and one entity were included in the paper. The discussion paper was discussed at the MAC meeting and the recommendation was made that it be upgraded to proposal status; it was also approved at the meeting as a proposal. Both changes will be included in Update 23 to the MARC documentation, expected in fall 2016.


The third paper, Discussion Paper No. 2016-DP17, Redefining Subfield $4 to Encompass URIs for Relationships in the MARC 21 Authority and Bibliographic Formats, was presented by the British Library in consultation with the TG. This paper generated lively discussion. It was acknowledged that the approach to recording URIs for relationships using subfield $4 was preferable to any of the other alternatives outlined by the paper. The distinction between relator codes and relationship codes in the MARC format was questioned. As of now, an across-the-board solution for recording URIs for any data element in a MARC subfield or field seems to be preferred by NDMSO over what it regards as an ad hoc solution for single elements. This discussion will be continued; this paper should not be considered in isolation, but rather in the context of the other papers which the TG is in the process of submitting. Taken as a whole, it is hoped that they will achieve the comprehensive solution which is sought throughout the MARC formats.

IV.4.2. Formulating & Obtaining URIs document [Charge 3.2]

A draft document was prepared for commonly used sources for authorities and identifiers. For each source, screen captures were made showing where a URI could be found for a particular entity, or how to formulate a URI once the identifier for the entity is known. Before making this document widely available, it must be determined how best to organize it. Some resources provide URIs that directly represent a thing, and others provide URIs that reference an authority (e.g., controlled or standard vocabularies, which may or may not have underlying metadata about the thing) or a resource describing a thing. The document needs to be able to distinguish these and inform catalogers which URIs are for real world objects and which are not. In order to be helpful to developers building tools, the document is also intended to include descriptions of how data sources provide machine access to the data: Is the data published as Linked Data? Is it available through HTTP, through a SPARQL endpoint, as data dumps, etc.? Another issue that must be determined is where to put the final document and how it will be maintained. Should it be cooperatively maintained by the community (such as on a wiki), or should some group within PCC take responsibility for keeping it up to date and adding to it?

IV.4.3. Revisions to OCLC handling of HTTP URIs [Charge 3.1]

The question arises as to whether it would be better for catalogers to enter all needed URIs directly into the shared bibliographic record in WorldCat, or whether OCLC should provide options for output of URIs based on data present in particular MARC fields and profiled library preferences. Clearly some libraries will embrace use of URIs for their web-based catalogs, while others may find them problematic in local displays of bibliographic information. OCLC staff have looked into the issue and believe that the use of output options would likely produce more consistent results as well as meet the varying needs of libraries.

The TG members are drafting a spreadsheet outlining the subfields that together name an entity for which a corresponding URI could be added in subfield $0. That spreadsheet will be useful as the basis for future specifications for use by OCLC system developers. It will allow for a comparison of what is desired by the PCC cataloging community, in terms of URIs corresponding to the entire named entity, versus the existing use of subfield $0 and subfield-$0-like information in OCLC heading controlling functionality. That heading control functionality allows for control numbers in multiple $0 subfields corresponding to different parts of a named entity, i.e., corporate name hierarchies, names and titles, subjects and separately controlled subdivisions, etc. These are cases where output of multiple URIs corresponding only to part of the named entity would not be preferred.

OCLC cataloging policies in this area are expected to evolve as this TG makes recommendations and OCLC development work moves ahead on the proposed output options for URIs.

IV.5. Next steps and in-depth analyses in year 2 [Charge 3, Charge 4]

In 2016-2017 the TG will continue an agenda focused on practical outcomes. Work is already well advanced on several of the following items:

IV.5.1. In collaboration with OCLC, develop a specification for outputting URIs based on internal linkages present in WorldCat data
IV.5.2. Complete the MARC object/URI reconciliation document and seek to incorporate the information into formal MARC documentation
IV.5.3. Produce work ID recommendation and use it in pilot implementation
IV.5.4. Produce discussion paper or proposal for handling relationships in MARC
IV.5.5. Consider additional targeted reconciliation projects
IV.5.6. In consultation with stakeholders, evaluate need for additional MARC proposals or best practices
IV.5.7. RWO recommendations
IV.5.8. Identify "homes" in PCC or elsewhere for aspects of the TG's work that will need further exploration or continuing upkeep
IV.5.9. Outreach, advocacy, training
IV.5.10. Etc.

V. RECOMMENDATIONS TO STAKEHOLDERS

During its first year the TG was very much focused on the needs and interests of the many different stakeholders. This is reflected both in the outcomes of the work completed so far (see Sec. IV.4, Outcomes) and in the plans laid out for year 2 (see Sec. IV.5, Next steps and in-depth analyses in year 2). After careful consideration, the TG proposes the implementation of URIs in MARC for the near term. The sooner this process can begin, the sooner data providers, e.g., libraries, can produce data that can be more easily transformed into linked data. In order to facilitate progress towards this goal, the TG developed the recommendations already outlined in the report above, such as the spreadsheet identifying the phase 1 entities for identifiers, i.e., the subfields that together name an entity in each MARC field (see Sec. IV.4.3, Revisions to OCLC handling of HTTP URIs), and the draft document Formulating and Obtaining URIs: A Guide to Commonly Used Vocabularies and Reference Sources. The TG hopes that this document could be used as a starting point to develop an official list of PCC-sanctioned initial source vocabularies for embedding URIs.

For the sake of consistency, expediency, and accuracy, it is advisable to use automated processes for populating MARC records with URIs. Having individual catalogers do this work manually is not a desirable practice and could be less efficient. Several possible ways to accomplish this goal have been outlined in this report (see Secs. IV.2.4.1, MarcEdit; IV.2.4.2, Authority Toolkit; and IV.4.3, Revisions to OCLC handling of HTTP URIs).

Outreach, advocacy, and training will be a core goal of phase 2. The TG is planning on working closely with stakeholders, such as other PCC committees, to influence cataloging policies and best practices that have been identified as problematic for the implementation of URIs in MARC.

Training needs related to implementation (for example, how to obtain URIs, or the difference between authorities and real world objects) will be communicated to the PCC Standing Committee on Training so that appropriate training can be either identified or developed.

Though MARC is the most prominently used schema for library metadata, it is frequently used alongside many others that may or may not allow for the inclusion of URIs. In addition to that concern are the maintenance of identifiers, recommendations in relation to reconciliation, and possible ILS functional requirements. The TG on URIs in MARC is recommending that new TGs be formed concerning URIs for non-MARC metadata.

VI. REFERENCES

1. The subgroup Work IDs in MARC has identified potential fields and scenarios to accommodate a work identifier (or multiple work identifiers). Consideration has been given to legacy data, to whether or not a work identifier (ID) is already established in an authority format (7XX $t, 1XX/240), to an unambiguous relationship of a work ID among various vocabularies (024), to relationships among variants of a work, etc. The subgroup will present recommendations to the community in 2017.

Links: Meetings of the MARC Advisory Committee, Agendas and Minutes

2015-06 MAC meeting: http://www.loc.gov/marc/mac/an2015_age.html, http://www.loc.gov/marc/mac/minutes/an-15.html

2016-01 MAC meeting: http://www.loc.gov/marc/mac/mw2016_age.html, http://www.loc.gov/marc/mac/minutes/mw-16.html

2016-06 MAC meeting: http://www.loc.gov/marc/mac/an2016_age.html, http://www.loc.gov/marc/mac/minutes/an-16.html

Papers:

Informal discussion paper: URIs in MARC: A Call for Best Practices (Steven Folsom, Discovery Metadata Librarian, Cornell University): https://docs.google.com/document/d/1fuHvF8bXH7hldY_xJ7f_xn2rP2Dj8o-Ca9jhHghIeUg/edit?pli=1

Discussion Paper No. 2016-DP04: Extending the Use of Subfield $0 to Encompass Linking Fields in the MARC 21 Bibliographic Format (British Library): http://www.loc.gov/marc/mac/2016/2016-dp04.html

Discussion Paper No. 2016-DP05: Expanding the Definition of Subfield $w to Encompass Standard Numbers in the MARC 21 Bibliographic and Authority Formats (British Library): http://www.loc.gov/marc/mac/2016/2016-dp05.html

Discussion Paper No. 2016-DP17: Redefining Subfield $4 to Encompass URIs for Relationships in the MARC 21 Authority and Bibliographic Formats (British Library, in consultation with the PCC Task Group on URIs in MARC): http://www.loc.gov/marc/mac/2016/2016-dp17.html

Discussion Paper No. 2016-DP18: Redefining Subfield $0 to Remove the Use of Parenthetical Prefix "(uri)" in the MARC 21 Authority, Bibliographic, and Holdings Formats (PCC Task Group on URIs in MARC, in consultation with the British Library): http://www.loc.gov/marc/mac/2016/2016-dp18.html

Discussion Paper No. 2016-DP19: Adding Subfield $0 to Fields 257 and 377 in the MARC 21 Bibliographic Format and Field 377 in the MARC 21 Authority Format (PCC URI in MARC Task Group): http://www.loc.gov/marc/mac/2016/2016-dp19.html

MARC Format Overview, Status Information: http://www.loc.gov/marc/status.html

Examples for Sec. IV.2.1

This LC subject heading string is linked to three different authority records. The links are OCLC's ARNs. No single $0 could be output for this subject access point.

650 0 ǂa Neurologists<Link2068890> ǂz New Zealand<Link255121> ǂv Biography<Link4933801>

This medical subject string is linked to one authority record, although the controlling process links individual subfields. It is a candidate for output of a single $0 with a URI because the links all refer to the single authority record. In the case of MeSH, unlike LCSH, the $0 subfield displays in Connexion. See OCLC record 957132118.

650 12 ǂa Neurology<Link(DNLM)D009462Q000266> ǂx history<Link(DNLM)D009462Q000266>

Displays as:
650 12 Neurology ǂx history ǂ0 (DNLM)D009462Q000266

So it could be output with a single $0 containing the corresponding URI for the MeSH heading.
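A rough sketch of that output step is given below. It assumes that MeSH URIs follow the pattern http://id.nlm.nih.gov/mesh/ plus the descriptor (or descriptor-qualifier) identifier; that pattern and the function name are assumptions for illustration and are not taken from OCLC's specification.

```python
import re
from typing import Optional

# "(DNLM)" prefix, then a descriptor ID such as D009462, optionally followed
# by a qualifier ID such as Q000266.
_DNLM_CONTROL_NUMBER = re.compile(r"^\(DNLM\)\s*([A-Z]\d+(?:Q\d+)?)$")

# Assumed base for MeSH linked data URIs.
MESH_BASE = "http://id.nlm.nih.gov/mesh/"


def mesh_uri(control_number: str) -> Optional[str]:
    """Turn a "(DNLM)..." control number into a candidate MeSH URI,
    or return None when the value does not have the expected shape."""
    match = _DNLM_CONTROL_NUMBER.match(control_number.strip())
    if match is None:
        return None
    return MESH_BASE + match.group(1)


print(mesh_uri("(DNLM)D009462Q000266"))
```

Because the whole string maps to one identifier, a single URI results, which is what distinguishes this case from the LCSH example linked to three separate authority records.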


4. develop requirements for tools that will allow catalogers to work accurately and efficiently with linked data vocabularies;

5. identify functionality that will be required for library systems (including ILSs and utilities) to exchange, control, protect, and update data based on identifiers;

6. develop a pilot project and identify partners.

Charge 4: In consultation with the MARC Advisory Committee, technologists versed in linked data best practices, and other stakeholders, identify and prioritize any remaining issues concerning support for identifiers in the MARC format, and initiate MARC proposals as appropriate. Prioritization of issues should take into account impact, feasibility, and the late stage of MARC's life cycle. Issues may include:

1. accommodating entities and relationships not currently well provisioned for identifiers in MARC;
2. consistency of provisions across MARC fields;
3. addressing distinction of URIs pointing to real world objects vs. URIs pointing to documents/authorities.

The Task Group should give priority to actions that will lead to tangible results during the lifetime of the PCC Strategic Directions 2015-2017. The Task Group should feel free to form subgroups and call on additional expertise as needed.

II. EXECUTIVE SUMMARY

In the first year since the inception of the URIs in MARC Task Group (TG), despite extremely challenging schedules and demands, all members and most consultants devoted a great deal of their time to working together through many issues. It has been a great privilege to be part of a team in which everyone has his or her eyes on the goals.

The deliberations and recommended solutions were based on two driving principles. First, the recommended solution should be across-the-board and straightforward: the implementation must have the broadest impact with the least disruption to workflows and to the MARC environment. Second, the TG observed, and kept in sight throughout its discussions, an important fact: though many libraries have been eager and in a position to move forward with linked data experimentation and implementation, the majority of libraries remain ambivalent and hesitant. In such a dual environment, the TG's recommendations must accommodate dual operations for a period of time. The TG must provide ways for libraries to decide their own pace and needs when transitioning from MARC to linked data.

Early on, everyone was keenly aware of the syntactic and semantic complexity of identifiers in the form of a dereferenceable uniform resource identifier (HTTP URI). After months of discussion, the TG firmly believed that following agile principles, specifically the scrum approach, would give the process the best chance of success in addressing the URIs in MARC issue:

• Figure out how to do the work
• Do the work
• Identify what's getting in its way
• Take responsibility to resolve all the difficulties within its scope
• Work with other parts of the organization to resolve concerns outside their control

1) Recognized that there were no simple across-the-board solutions for MARC fields concerning $0. Therefore, the TG pushed forward the fields that could benefit from $0 without complications. See MAC Paper 2016-DP19 in the REFERENCES section.

2) Agreed upon the universal definition of $0 for a URI that describes a THING (URI/Concept). Keeping in line with the overall principle of least disruption and most coverage, the TG recommended the use of an HTTP URI in $0 as the default for libraries which opt to adopt URIs in $0. Alternatively, a text string identifier in $0 remains in force for libraries which are not ready to move forward. See MAC Paper 2016-DP18 in the REFERENCES section.

3) Agreed that the relationship entity of an RDF statement be represented in MARC. Potential candidates for expressing relationships were $4, $i, $j, and $e. The consensus was to focus on $4, because the existing subfield has been defined in all those fields where relationships can currently be expressed in MARC. The rescoping of $4 to hold a URI property (predicate) does not prevent the library community's continued application of 3-letter relator codes. It provides an opportunity for libraries which are ready to deploy HTTP URIs for relationships (property/predicate). Consensus was that $4 alone should be redefined to carry relationship URIs; this was considered a consistent, across-the-board solution which would not require further amendments to the MARC formats by rescoping or defining additional subfields.

4) Identified a need to host an identifier for a real world object. The TG hoped to propose setting aside $1 for an identifier that points to a THING (URI resource/RWO).

5) TG members who work closely with other standards communities, such as ISNI/VIAF, have vested interests in the 024 field in Authority records. The 024 field appears to have the potential to address the relationship of an entity across vocabularies/ontologies [1].

The TG hopes to address items no. 4-6 above in discussion papers for the MARC Advisory Committee (MAC) to consider.

The Pilot Test that the TG conducted in February-March 2016 revealed that provisioning for URIs in MARC presents additional layers of complexity that require further consideration, i.e., repeatability, pairing, ambiguous relationships, and the significance of ordinal sequence. Additionally, the TG is working hard to further identify potential fields and/or indicators/subfields to record an identifier representing a Work, a resource object. These are described in greater detail in the sections below.
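Because a rescoped $4 would carry either a 3-letter relator code or a relationship URI (item 3 above), consuming software would need to tell the two apart. The sketch below is one hypothetical way to do so; the function name and the returned labels are invented for this example.

```python
import re

# MARC relator codes are three lowercase letters, e.g. "aut" for author.
_RELATOR_CODE = re.compile(r"^[a-z]{3}$")


def classify_subfield_4(value: str) -> str:
    """Classify a $4 value as "uri", "code", or "unknown"."""
    cleaned = value.strip()
    if cleaned.startswith(("http://", "https://")):
        return "uri"
    if _RELATOR_CODE.match(cleaned):
        return "code"
    return "unknown"


print(classify_subfield_4("aut"))
print(classify_subfield_4("http://id.loc.gov/vocabulary/relators/aut"))
```

Keeping both forms valid in the same subfield is what allows libraries to adopt relationship URIs at their own pace.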

III. PROCESSES

The TG had in mind processes that would be the least disruptive yet yield the most promising results. In order to ensure cohesive and broad approaches, the TG set forth two tasks: a) define and understand the uniform resource identifier and the deployment of the Web-service protocol scheme HTTP; b) identify issues and problems with adding URIs in MARC: is it actually doable in the current systems that host MARC data?

III.1. Define and understand HTTP URI [Charge 1.1, 1.2, Charge 4.3]

According to a MARBI position paper published in 2009:

The use of a URI instead of plain text is particularly applicable to situations where the value of the … element comes from a controlled vocabulary, which could be an authority list or formal thesaurus (e.g., a name from the LC Name Authority File or a topic for an LCSH heading), or any other list of controlled codes or terms (e.g., the MARC Code List for Languages).

However, the goal of facilitating the transition from MARC to linked data now requires a more precise machine understanding of the data accessible from the URIs that have been added to MARC records.

The issue can be illustrated with an excerpt from the Library of Congress Name Authority record for Hillary Clinton, accessible at https://lccn.loc.gov/n93010903. Of particular interest is the list of 024 fields, which identify "standard number[s] or code[s] associated with the entity named in the 1XX field which cannot be accommodated in another field," according to the MARC Authority definition. All of the 024 fields copied below contain URIs pertaining to Hillary Clinton.

0247_ |a http://www.wikidata.org/entity/Q6294 |2 uri
0247_ |a http://dbpedia.org/resource/Hillary_Rodham_Clinton |2 uri
0247_ |a http://viaf.org/viaf/54950123 |2 uri
0247_ |a http://isni.org/isni/0000000122802598 |2 uri
0247_ |a http://d-nb.info/gnd/119082101 |2 uri
0247_ |a http://id.ndl.go.jp/auth/ndlna/00552567 |2 uri
0247_ |a http://aut.nkp.cz/jn20000700317 |2 uri
0247_ |a http://catalogue.bnf.fr/ark:/12148/cb12543158f |2 uri
0247_ |a http://www.idref.fr/034705171 |2 uri
0247_ |a http://datos.bne.es/resource/XX1725857 |2 uri
0247_ |a http://id.sbn.it/af/IT%5CICCU%5CUBOV%5C804461 |2 uri
0247_ |a http://cantic.bnc.cat/registres/CUCId/a11695705 |2 uri
0247_ |a https://musicbrainz.org/artist/858a3d95-e1b2-4aac-8427-a99e391ce8c5 |2 uri
0247_ |a http://www.imdb.com/name/nm0166921 |2 uri
0247_ |a http://bioguide.congress.gov/scripts/biodisplay.pl?index=C001041 |2 uri
0247_ |a http://www.nndb.com/people/022/000025944 |2 uri
0247_ |a https://ballotpedia.org/Hillary_Clinton |2 uri
0247_ |a https://www.freebase.com/m/0d06m5 |2 uri

The rows in the table can be partitioned into three categories:

• Near the bottom, the 024 fields with the peach-colored background are human-readable documents about Hillary Clinton. These are pages from popular resources maintained outside the library community, such as IMDB and BioGuide, which have been deemed authoritative by library catalogers and authority experts. In shorthand, these URIs are standard URLs for Web pages.

• The rows with the blue background are records derived from library authority files and more modern registries designed for similar purposes. They may be pages from library authority files published on the Web, human-readable views of machine-understandable RDF data, or raw RDF. But in one form or another, all of the URIs resolve to library authorities (or simply 'Authorities') that are about Hillary Clinton. The TG refers to these URIs as Authority URIs.

• The rows with the green background contain URIs that refer to Hillary Clinton directly, in a way that is technically distinct from documents about her. These URIs conform to linked data conventions described in standard Web documents such as "Cool URIs for the Semantic Web" [https://www.w3.org/TR/cooluris/]. The data accessible from these URIs has been published by third parties as well as the library community, and encodes a rich domain model designed expressly for machine understanding. The TG refers to these URIs as Entity or Thing URIs.

According to linked data conventions, machine processes designed to construct meaningful statements and draw inferences from them require Thing URIs. When Thing URIs are defined for people and creative works, one desirable outcome would be a machine-understandable statement such as 'Hillary Clinton is the author of the book It Takes a Village'. With technology available in 2016, data accessible from Web page URIs may not be machine-understandable at all, and Authority URIs may only be partially understandable. The ambiguity of URIs illustrated by the 024 fields in the MARC Authority records is also present in MARC bibliographic records.

III2 Identify issuesproblems with adding URIs whether it was actually doable [Charge1 Charge 3]

A pilot test of inserting HTTP URI in13 $0 in13 bibliographic and13 authority data13 emerged as one13 logical first step for the TG It helped the TG understand issues that could easily resolve in the near term and the dob ability of inserting URI in $0 in MARC environment

The Pilot Test began in February 2016 Members prepared13 sets of input data and13 worked13 with13 tool creators13 (MarcEdit and Authority13 Toolkit) to refine lookup algorithms13 for URI insertion in $0

The enhanced data with HTTP URIs embedded were to be ingested into several integrated library systems for evaluation. This exercise helped the TG gain a cohesive understanding of the role of an identifier in the form of a dereferenceable URI deployed in $0 in a MARC environment.

Throughout the process the TG began to frame the questions that might assist in the effort of transitioning MARC data to linked data, and to reach possible resolutions where potential problems may reside, such as planning for MAC proposals in its first year.

Other issues were more long-term and may require in-depth discussion with broader community involvement. For instance, subfields such as $4, which have already been defined, might have the potential to hold HTTP URIs. The repeatability, ambiguity, and significance of the ordinal sequence are less trivial and more complex.

With regard to bulk insertion, system performance and scalability, the Pilot Test also helped address SPARQL query adjustment on the server side. Though adding URIs by hand was the least desirable exercise, it could be inevitable, so the TG also began documenting resources that would assist such an endeavor.

The overall strategies that the TG adopted were carefully thought out in order to achieve iterative successes that will build confidence throughout the phases of implementation.

IV. PROBLEM STATEMENT

To encode data suitable for transformation into RDF triples, it is necessary to be able to identify in MARC the data elements corresponding to the subject, predicate and object in each statement, and/or to provide URIs for them. It quickly became apparent that the task is not simply to add subfields to allow URIs to be given (itself a non-trivial problem, given the limited number of unused subfields still available in MARC) but also to negotiate the often ambiguous semantics of MARC. The TG has sought to do this through a judicious combination of redefinition proposals, clarification of existing semantics, and best practice recommendations.

The aim is best practices for incorporating HTTP URIs in MARC bibliographic and authority records without making major renovations to the MARC format (taking into consideration cost/benefit analysis for an ‘end of life’ technology).

IV.1. Where to place URIs in the MARC structure ($0, $4)? [Charge 3]

The TG developed a pilot to examine the issues surrounding the addition of identifiers to MARC 21 data. The work included identifying actionable source vocabularies and creating test record sets with dereferenceable URIs embedded in the data. A variety of formats were represented in the test data, and ILS vendors, programmers, system engineers and discovery designers were consulted throughout the pilot to comment on the retrieval of actionable URIs and the appropriate policies for ensuring that the data are actionable in MARC 21.

The TG also inventoried the MARC bibliographic and authority formats to identify MARC 21 fields that contain subfields capable of accommodating URIs. In the bibliographic format, subfields $0 and $4 were identified as existing candidates for containing URIs; subfields $0 and $4 were also candidates in the authority format. MARC 21 fields that might usefully contain a subfield for a URI, but which do not have one defined, were also noted.

The TG focused on subfields $0 and $4 for its first three MARC Discussion Papers, submitted to MAC at ALA Annual 2016.

IV.2. What difficulties are evidenced?

IV.2.1. Adding multiple $0 [Charge 1.2]

The nature and use of subfield $0 has evolved in MARC since the subfield was first implemented in 2007. In 2010 it was redefined and came to include standard numbers, including URIs, in addition to its original use for authority record control numbers.

However, MARC is not specific as to which parts of a controlled heading string correspond to the $0. Nothing in the MARC specification rules out one $0 subfield applying to one set of subfields in a heading while a different $0 applies to others. (To ameliorate this problem we formed a MARC object/URI reconciliation subgroup to enumerate the subfields naming the object in each MARC field; see IV.2.2 below.) And because $0 is repeatable, it is possible to find multiple $0 values corresponding to the same heading subfields naming the same entity. Indeed, the latter practice is adopted by design in some implementations, notably that at the German National Library.

The existence of different use cases and practices for relating headings to $0 has emerged as an issue that will need to be considered as the TG’s work proceeds. In the case of OCLC’s heading control functionality related to LC names and LCSH, subfield $0 data is included as an XML tag attribute in each subfield XML tag covered by a particular authority record, and is repeated as many times as needed depending on the number of subfields used to represent the name or subject. In the subsequent development of controlling for other authority files, the same approach has been taken, but instead retaining the same or different authority record control numbers in multiple $0 subfields. [See examples at end of document.]

This repeated use of $0 subfields containing the same authority record number, or different authority record numbers for different parts of a heading, runs contrary to the need that exists in an OCLC context for a single URI corresponding to the entire named entity given in the field. Extraneous $0 subfields are automatically deleted in WorldCat records in fields that are otherwise controlled to a particular authority file. However, this leaves unresolved the question of controlling via multiple source vocabularies within the same language of cataloging, which many see as a desirable medium- to long-term objective. Given the investment in its development and the number of controlled headings in WorldCat, completely changing the heading control functionality within WorldCat is not feasible, so the TG and OCLC staff have considered other alternatives allowing for output of needed URIs in the format which libraries would prefer in the future.

IV.2.2. How to identify an RDF object in a MARC datafield [Charge 4.3]

This emerged as an important need because the ability to identify a URI with its corresponding label is necessary to support both reconciliation of existing data and updates to those labels based on their association with an identifier. The only realistic way to make this identification was to document the correspondences on a field-by-field basis. Fortunately this was very achievable for the majority of fields in widespread use. [Link to recommendations] The investigation revealed a number of issues relating to the identification of single entities vs. larger sets (series, conferences) and alignment of MARC and RDA vocabularies.
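A field-by-field correspondence of this kind can be pictured as a lookup table from MARC tag to the subfield codes that together name the object. The sketch below is illustrative only: the table entries, the function names and the sample field are assumptions made for the example, not the subgroup's actual findings.

```python
# Hypothetical table: MARC tag -> subfield codes that together name the object.
# These entries are assumptions for illustration, not the TG's documented list.
NAMING_SUBFIELDS = {
    "100": "abcdq",
    "650": "abvxyz",
}


def object_label(tag, subfields):
    """Join the values of the subfields that name the object for this tag.

    `subfields` is a list of (code, value) pairs in field order.
    Returns None when the tag has no documented correspondence.
    """
    codes = NAMING_SUBFIELDS.get(tag)
    if codes is None:
        return None
    return " ".join(value for code, value in subfields if code in codes)


def object_uris(subfields):
    """Collect every $0 value attached to the field."""
    return [value for code, value in subfields if code == "0"]


# Sample 100 field with a relator term ($e) and a placeholder URI in $0.
field = [
    ("a", "Clinton, Hillary Rodham,"),
    ("e", "author."),
    ("0", "http://example.org/authority/12345"),
]
print(object_label("100", field), object_uris(field))
```

With such a table, a reconciliation process can tell that the $0 in the sample field goes with the $a label and not with the relator term in $e.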

IV.2.3. What did we find in identifying relationships/multiple relationships? [Charge 4.1]

IV.2.3.1. Relationships are expressed in MARC by a variety of means, including:
    IV.2.3.1.1. Field tagging, either alone (e.g. 830) or in combination with indicators (e.g. 780/785)
    IV.2.3.1.2. Subfield codes, e.g. 041
    IV.2.3.1.3. Codes given in subfields, e.g. 700 $4
    IV.2.3.1.4. Controlled or natural language text given in subfields, e.g. 700 $i

IV.2.3.2. Some of these fields are very tightly bound to legacy MARC definitions, structures and data. Redesigning 041, for example, to be hospitable to URIs would require a complete reconception of that field.

IV.2.3.3. There is the greatest value in provisioning for URIs following a 7XX $4 $0 $1 pattern, with $4 repurposed to house URIs much as $0 now does. This approach seems to present a relatively low barrier to implementation while having widespread application in MARC.

IV.2.3.4. Multiple relationships can cause ambiguity where they are associated with multiple objects or multiple labels. In such cases we recommend the expedient of simply repeating the field in order to make the associations unambiguous.
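To see why repeating the field removes the ambiguity, consider a sketch in which a 7XX field carries a relationship URI in $4 and an object URI in $0. Pairing them is trivial when the field holds one relationship, whereas a field holding several $4 and several $0 values gives no way to tell which goes with which. The function name and all URIs here are hypothetical.

```python
def field_to_triples(resource_uri, subfields):
    """Turn one 7XX field into (subject, predicate, object) triples.

    `subfields` is a list of (code, value) pairs. The pairing is treated as
    ambiguous when the field holds both several relationship URIs ($4) and
    several object URIs ($0); a ValueError then asks for the field to be
    repeated instead.
    """
    predicates = [v for c, v in subfields if c == "4"]
    objects = [v for c, v in subfields if c == "0"]
    if len(predicates) > 1 and len(objects) > 1:
        raise ValueError("ambiguous: repeat the field, one relationship per object")
    return [(resource_uri, p, o) for p in predicates for o in objects]


RESOURCE = "http://example.org/work/1"  # placeholder URI for the described resource
field = [
    ("a", "Clinton, Hillary Rodham."),
    ("4", "http://example.org/rel/author"),
    ("0", "http://example.org/thing/hillary-clinton"),
]
triples = field_to_triples(RESOURCE, field)
print(triples)
```

A record with two relationships to two different objects would then be encoded as two 7XX fields, each of which passes through this function cleanly.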

IV.2.4. How one obtains URIs for various data sources depends on the linked data source (different data sources make their URIs available differently) and on interoperability between the data source and the cataloging tools being used

To help support obtaining the right URIs for their purposes in MARC, the TG has begun a document currently referred to as Formulating and Obtaining URIs: A Guide to Commonly Used Vocabularies and Reference Sources. For commonly used vocabularies in MARC, we want to document where in the data source UI one can find the canonical URIs that, when dereferenced, provide data. Going forward, for each entry in the document we want to explain whether a data source publishes its data as Authorities, Real World Objects, or both. Also, we want to document the methods available for machine access to the data: is the data published as Linked Data available through HTTP, available through a SPARQL endpoint, as data dumps, etc.?
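One of the machine access methods mentioned above is plain HTTP with content negotiation, where the client asks for an RDF serialization of whatever the URI identifies. The sketch below only builds the request and does not send it. Whether a given data source honors the Accept header, and which media types it offers, varies by source and is exactly the kind of detail the guide is meant to record. The URI is a placeholder.

```python
import urllib.request

# Media types for common RDF serializations, in order of preference.
RDF_MEDIA_TYPES = "application/rdf+xml, text/turtle;q=0.9, application/ld+json;q=0.8"


def build_rdf_request(uri):
    """Build (but do not send) a GET request that asks for RDF rather than HTML."""
    return urllib.request.Request(uri, headers={"Accept": RDF_MEDIA_TYPES})


request = build_rdf_request("http://example.org/thing/hillary-clinton")
print(request.get_method(), request.full_url, request.get_header("Accept"))
# Sending it would be: urllib.request.urlopen(request, timeout=10)
```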

IV.2.4.1. MarcEdit [Charge 3]

In the summer of 2014, MarcEdit introduced a suite of tools designed to begin testing the feasibility of embedding linked data concepts into MARC records. Initially, the scope of the suite was limited to embedding HTTP URIs in the $0 in MARC fields 1xx, 6xx and 7xx in bibliographic records. This initial work focused on integration with the US Library of Congress’s id.loc.gov service as well as OCLC’s VIAF services for resolution. However, over the past two years, and in response to many of the questions and issues surfaced through the TG, the Linking services have been expanded and revised to potentially support all use cases identified by this Task Group, as well as providing support for non-MARC21 users to configure the Linking tool for use with other MARC formats.

The MarcEdit Linking toolkit currently supports the generation of URIs for all fields identified by this Task Group for authority and bibliographic records. The application utilizes a rules file that documents field processing and service configuration values. This allows MarcEdit to quickly make changes to the rules governing field processing, as well as to add support for new collections and linked data endpoints. As of this report (9/21/2016), the MarcEdit Linked Data tool supports resolution against the following linked data services:

1. US Library of Congress NAF
2. US Library of Congress LCSH
3. US Library of Congress Children’s Subject Headings
4. US Library of Congress Demographic Group Terms
5. Thesaurus for Graphic Materials
6. US Library of Congress Genre/Form Terms
7. US Library of Congress Medium of Performance Thesaurus for Music
8. RDA Carrier Types
9. RDA Media Types
10. RDA Content Types
11. Getty Arts and Architecture Thesaurus
12. Getty ULAN
13. National Library of Medicine MeSH
14. OCLC FAST Headings
15. OCLC VIAF
16. German National Library (GND)
17. [15 national library name indexes via VIAF]
18. Japanese Diet Library

Additionally, users have the ability to configure their own linked data endpoints for use with MarcEdit, so long as the service in question supports SPARQL and JSON. There is presently a knowledge-base article at http://marcedit.reeset.net/editing-marcedits-linked-data-rules-file documenting how users can add new collections or modify the rules used when processing a particular field.

Essentially, MarcEdit utilizes its rules file to configure MarcEdit’s linked data platform to identify the proper index, service, normalization (for data query purposes), and subfields to utilize as part of any lookup process. Additionally, each rules block identifies when a field should be processed (i.e. only when used in a bibliographic record, used in an authority record, or both). For example, here is the definition for the 650 field:

    <field type="bibliographic">
        <tag>650</tag>
        <subfields>abvxyz</subfields>
        <ind2 value="0" vocab="lcsh" />
        <ind2 value="1" vocab="lcshac" />
        <ind2 value="2" vocab="mesh" />
        <ind2 value="7" vocab="none" />
        <index>2</index>
        <uri>0</uri>
        <special_instructions>subject</special_instructions>
    </field>

Each MarcEdit rules block is a small segment of XML that profiles field usage within a record. This is why MarcEdit’s linking tool can be used with other flavors of MARC (like UNIMARC): the Linking service has no concept of MARC21, only of the ISO 2709 format, and the rules file provides that context.
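As a rough illustration of how a rules block like the one for the 650 field can drive processing, the sketch below parses it with Python's standard XML library. This is not MarcEdit's own code. The XML is a reconstruction (the quoting and the self-closing ind2 elements are assumptions about the original markup), and the dictionary key names are this sketch's own reading of what each element means.

```python
import xml.etree.ElementTree as ET

# The 650 rules block, reconstructed; quoting and the self-closing <ind2/>
# elements are assumptions about the original markup.
RULE_XML = """
<field type="bibliographic">
  <tag>650</tag>
  <subfields>abvxyz</subfields>
  <ind2 value="0" vocab="lcsh" />
  <ind2 value="1" vocab="lcshac" />
  <ind2 value="2" vocab="mesh" />
  <ind2 value="7" vocab="none" />
  <index>2</index>
  <uri>0</uri>
  <special_instructions>subject</special_instructions>
</field>
"""


def load_rule(xml_text):
    """Parse one rules block into a plain dictionary."""
    field = ET.fromstring(xml_text)
    return {
        "record_type": field.get("type"),
        "tag": field.findtext("tag"),
        "subfields": field.findtext("subfields"),
        "uri_subfield": field.findtext("uri"),
        # Map second-indicator value -> vocabulary key.
        "vocab_by_ind2": {e.get("value"): e.get("vocab") for e in field.findall("ind2")},
    }


rule = load_rule(RULE_XML)
print(rule["tag"], rule["vocab_by_ind2"].get("2"))
```

Because nothing in the parsing step knows about MARC21 itself, the same function would read a block written for a UNIMARC tag.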

This approach has allowed MarcEdit to quickly profile and examine the implications of developing URIs for linking fields like the 880 field, which present some unique challenges but can be accommodated via the current rules file format.

Utilizing the current process, MarcEdit’s linking tool can accommodate a wide range of linking scenarios. For example:

[Figure: In an authority record]

[Figure: Within a bibliographic record]

[Figure: Across diverse vocabularies]

Current development on the tool will continue to focus on the inclusion and support of additional vocabularies, on continuing to work with linked data providers around scalability issues (and ways in which MarcEdit, or services like it, can reduce impacts on their services), and on profiling this service to work with other flavors of MARC, like UNIMARC, to encourage further experimentation.

IV.2.4.2. Authority Toolkit [Charge 3]

The authority toolkit is a program for the construction and modification of authority records. One version is designed for use within OCLC’s Connexion program for records in the LC/NACO authority file, but another version can work with records in files, and so with records from other sources. Both versions of the toolkit have the same capabilities. At an early stage the toolkit acquired the ability to test terms used in authority fields such as the 370 and 372 against vocabularies available at id.loc.gov (at present LCMPT, LCSH, LCDGT, AFSET, geographic area codes, RDA content terms and the LC/NACO Authority File). Somewhat later it added the ability to verify terms against the MeSH vocabulary. (Additional vocabularies may be added in the future based on user requests.) To perform this verification, the program needs to know which vocabularies are used to control terms in which parts of which authority fields, how to query the source to determine whether or not a term is defined, and how to react to the information returned by the source. The toolkit’s actions are controlled above all by the subfield $2 code appearing in the same field as the term, but in the absence of a subfield $2 code, operator preferences come into play as well. (For example, an operator may prefer that an unlabeled term be tested against MeSH first and, if not found, tested against LCSH; or perhaps tested only against LCDGT.) A detailed description of the toolkit’s process for verifying the content of authority fields can be found in the program’s documentation at http://files.library.northwestern.edu/public/oclc/documentation/#verifymenu

If the toolkit’s search for an entire term is successful, the toolkit could easily supply the corresponding URI and add it to the authority record in subfield $0. This URI may be contained in the data provided by the source, or it could be constructed mechanically once the toolkit has extracted the appropriate identifier. As part of experimentation encouraged by the TG, on January 15, 2016, the toolkit acquired an option to add subfield $0 to fields which could be verified. (This option is described at http://files.library.northwestern.edu/public/oclc/documentation/#optionsverification0.) If a field contains more than one term, the toolkit must divide the field into multiple fields (one for each term) before it can add subfield $0.
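Constructing a URI mechanically from an extracted identifier can be as simple as normalizing the identifier and appending it to a vocabulary prefix. The sketch below assumes the id.loc.gov URI patterns for LC names and subjects. The function name is hypothetical, the control number is a made-up placeholder, and this is not the toolkit's own code.

```python
# id.loc.gov URI prefixes for two LC vocabularies.
BASE_URIS = {
    "names": "http://id.loc.gov/authorities/names/",
    "subjects": "http://id.loc.gov/authorities/subjects/",
}


def build_uri(vocabulary, control_number):
    """Construct a URI from a control number by removing spaces and adding the prefix."""
    identifier = "".join(control_number.split())
    return BASE_URIS[vocabulary] + identifier


# "n 00000000" is a made-up control number used only to show the transformation.
print(build_uri("names", "n 00000000"))
```

Other sources need different prefixes and different normalization, which is why the per-source guidance described in IV.2.4 matters.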

The following illustration shows an authority record as verified by the authority toolkit, with the option to add subfield $0 during verification turned on. (For this experiment, subfield $0 was locally defined for some fields.)

Although the toolkit can often discover information about compound terms (such as some corporate bodies with subordinate units and some LCSH headings) for which an authority record exists for some parts but not all, the toolkit cannot supply subfield $0. (There is no authority record, and so no URI, that represents the entire term.) The toolkit also cannot add subfield $0 to fields that contain multiple terms if the field contains an aggregation of terms rather than a collection of independent items. (Example: the toolkit cannot add subfield $0 to the 382 field.)

The task of discovering that a term given in an authority record is defined in an external vocabulary is made more difficult because the searching mechanisms available do not always compensate appropriately for operator variations in punctuation, capitalization and the use of combining diacritics. In addition, the response time experienced by the toolkit can vary widely, even for the same term searched repeatedly within a brief time, and some services are unavailable over the weekend. If the potential of linked data is to be enjoyed, services providing data must ensure that their entry mechanisms are robust, flexible and available at all times.
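One way a search mechanism could compensate for these variations is to compare normalized keys rather than raw strings. The sketch below shows one possible normalization using only the Python standard library. It is an illustration of the idea, not a description of how any of the services named here works, and the function name is hypothetical.

```python
import string
import unicodedata


def match_key(term):
    """Reduce a term to a comparison key that ignores case, punctuation and diacritics."""
    # NFD splits precomposed letters into base letter + combining marks.
    decomposed = unicodedata.normalize("NFD", term)
    # Drop the combining marks (Unicode category Mn).
    stripped = "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")
    # Replace ASCII punctuation with spaces, then fold case and collapse whitespace.
    table = str.maketrans(string.punctuation, " " * len(string.punctuation))
    return " ".join(stripped.translate(table).casefold().split())


print(match_key("Brontë, Charlotte,") == match_key("bronte charlotte"))
```

Stripping diacritics is a lossy choice that can merge distinct terms, so a key like this is better suited to finding candidate matches than to confirming them.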

IV.2.4.3. Lookup online (e.g. VIAF, Getty ULAN, GeoNames, Wikidata)

Online lookup requires manual operation. Users must be well versed in the SPARQL queries that individual services provide. Getty ULAN works differently from GeoNames and Wikidata. The URI returned from a query may not be an RDF URI, but one that lands the user on a Web page or document.
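For readers unfamiliar with what such a lookup involves, the sketch below builds (without sending) a SPARQL request that searches for items by exact label. It uses Wikidata's public query endpoint as the example. The query shape, the function names, and the use of a format=json parameter are assumptions that would need to be checked against each service, since the services differ in data model and interface.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

WIKIDATA_ENDPOINT = "https://query.wikidata.org/sparql"


def label_query(label, language="en", limit=5):
    """Return a SPARQL query that finds items whose label matches exactly."""
    # Escape backslashes and double quotes so the label is a valid string literal.
    escaped = label.replace("\\", "\\\\").replace('"', '\\"')
    return (
        "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
        f'SELECT ?item WHERE {{ ?item rdfs:label "{escaped}"@{language} }} LIMIT {limit}'
    )


def query_url(endpoint, sparql):
    """Encode the query as a GET URL asking for JSON results."""
    return endpoint + "?" + urlencode({"query": sparql, "format": "json"})


url = query_url(WIKIDATA_ENDPOINT, label_query("Hillary Clinton"))
print(url)
```

The ?item values returned by such a query would still need to be checked to see whether they are Thing URIs or addresses of Web pages about the thing.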

IV.3. What did we learn? [Charge 1.3, Charge 3]

IV.3.1. Tackle low-hanging fruit/what can we do in 1 year?

The TG’s activities during Year One were designed to position the MARC community to take tangible steps toward incorporating linked data URIs into its processes within an achievable timeframe. Therefore the TG put aside some tasks, such as the overhaul of certain legacy MARC data elements, that would have delayed progress with the TG’s practical objectives. The tool development undertaken by Terry Reese and Gary Strawn was designed to advance these objectives, but so were the Formulating URIs document and the MARC object/URI reconciliation work, both of which document information that will be needed by other stakeholders, and the work IDs in MARC proposal, which seeks to remove one of the main barriers to routine incorporation of work identifiers in MARC records.

IV.3.2. Add $0 where it’s not defined (not simple)

One of the TG’s goals was also to identify and add $0 to fields that currently do not have one defined. The TG found the following MARC fields that needed $0 defined:

bibliographic: 046, 257, 260/264, 375, 753
authority: 046, 360, 375, 377, 663, 680, 681

These fields do not offer an easy resolution when considering $0, which reflects the resource object for an entity described. The TG conducted thorough analyses and concluded that only 257 and 377 could contain a URI with an unambiguous correspondence between the field and the object it represents, leaving out more complicated cases, e.g. fields 264 (Production, Publication, Distribution, Manufacture, and Copyright Notice) and 382 (Medium of Performance).

One of the issues confronted in drafting Discussion Paper 2016-DP19 was the extent of effort needed to individually propose subfield $0 for MARC 21 fields that do not contain it. MAC accepted the paper as a proposal, and there was agreement “that similar changes such as those recommended in this paper might in the future be considered as part of a MARC Fast-Track process”. Being able to fast-track proposals for defining subfield $0 in fields which do not contain it will considerably streamline the process in the future.

IV.3.3. Strategies in light of the limited life cycle of the MARC environment

Though many may see MARC as “dead”, the system remains a viable tool that delivers metadata for data discovery. It is also, however, a legacy format that reflects in its somewhat baroque structure a long history of accretion to meet varied and changing needs. In pursuing its goals the TG has adopted a strategy of pursuing changes that can be applied coherently across MARC and maximize return on the library community’s investment of effort. There are economical and sensible approaches in determining what to do. The TG always kept in mind that its recommendations must cause the least disruption for data transition from MARC to linked data. A wholesale insertion of HTTP URIs is unlikely to be possible, though URIs can probably be accommodated in most, but not all, MARC fields and/or subfields.

The TG is committed to working through a list of tasks and identifying viable solutions. While $0, after one year’s deliberation, seemed a straightforward solution for a URI representing the resource object, more discussion is needed with regard to the predicate that denotes the relationship. MARC data have not been consistent in expressing relationships. The combination of fields, indicators and subfields adds complexity to the process.

IV.3.4. ILS analysis results

Some ILSs would not load the processed records because of the presence of $0. Others loaded but did nothing with the data.

The TG members mocked up files of bibliographic and authority data, adding various URIs in subfield $0 wherever subfield $0 is currently defined in MARC. These files were uploaded into a number of ILS systems to see if the addition of subfield $0 with URIs caused problems. No significant problems were found. These files included URIs in subfield $0 which were not prefixed with the (uri) identifier.

In OCLC, the same $0 subfields were also not problematic. OCLC’s validation of subfield $0 does not check the structure of subfield $0 in the same way as it does for control numbers in 760-787 subfield $w or URLs in $u subfields. Use of URIs in subfield $4 to express relationship information would require a change to OCLC’s validation of $4 subfields, but that may be readily changed without extensive effort.

IV.3.5. Tools needed: MarcNext, Authority Toolkit

Currently the TG has tested and continues to work with MarcNext and Authority Toolkit. The TG members continue collecting and recording additional tools and resources that help practitioners identify and validate an RDF URI.

IV.3.6. Need to be able to easily report duplicates found in VIAF etc., and need a way to know which URI to use when duplicates are found

Throughout the first year of investigation and deliberation, the TG learned that although vocabularies and ontologies are structured per standards and published for adoption, some are more domain-specific than others. Often there is more than one method to structure a body of data. Duplications can be expected across various datasets. The reconciliation of URIs is one of the tasks that the TG has recognized, yet it is not in a position to recommend a solution in the near term.

IV.4. Outcomes

IV.4.1. MAC Discussion Papers [Charge 4]

The TG was aware that some aspects of its intended goals were not yet accommodated by the MARC format. Following the defined workflows of MARC governance and standardization, the TG submitted several discussion papers to the MARC Advisory Committee (MAC). As an initial preparation, an informal discussion paper entitled URIs in MARC: A Call for Best Practices, by Steven Folsom, had been discussed during the June 2015 MAC meeting. It focused on subfield $0 (Authority record control number or standard number): its current usage and its capability for URIs, and it addressed some aspects of best practice. The paper generated extensive discussion, and there was broad agreement that the time was right for the library community to begin using URIs consistently. Steven Folsom was asked to cooperate with the PCC to develop a formal MAC Discussion Paper.

In fall 2015 the British Library (BL) submitted two papers to MAC for the January 2016 meeting, independently of the TG, covering title-to-title relationships via subfield $w and specific relationship information, then discussed using subfield $0. The approaches taken by the BL in its papers, coupled with the approach taken by the TG, resulted in MAC suggesting that the British Library and the PCC should collaborate on submitting a paper for June 2016.

During the MAC meetings at the ALA Annual Conference in Orlando in June 2016, three papers were presented by or in cooperation with the TG. Discussion Paper No. 2016-DP18, entitled Redefining Subfield $0 to Remove the Use of Parenthetical Prefix “(uri)” in the MARC 21 Authority, Bibliographic, and Holdings Formats, described the syntactical improvement that a subfield $0 containing a URI without the parenthetical prefix (uri) would allow, so that automated processes could use the content of these $0s without having to strip away a prefix. The discussion paper was discussed at the MAC meeting and the recommendation was made that it be upgraded to proposal status; it was approved at the meeting as a proposal. From now on, a $0 containing an identifier in the form of a web retrieval protocol, e.g. an HTTP URI, should not be given a parenthetical prefix.
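For legacy data that still carries the prefix, the stripping step that automated processes would otherwise need can be sketched in a few lines of Python. The function name is hypothetical, and the sample values are placeholders rather than real identifiers.

```python
import re

# Matches a leading "(uri)" prefix, case-insensitively, with optional spaces after it.
URI_PREFIX = re.compile(r"^\(uri\)\s*", re.IGNORECASE)


def clean_subfield_0(value):
    """Remove a legacy "(uri)" prefix from a $0 value; leave other values alone."""
    return URI_PREFIX.sub("", value)


print(clean_subfield_0("(uri) http://example.org/authority/12345"))
# A control number with an organization-code prefix is left unchanged.
print(clean_subfield_0("(DLC)n 00000000"))
```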

A second paper was presented to the MAC: Discussion Paper No. 2016-DP19, Adding Subfield $0 to Fields 257 and 377 in the MARC 21 Bibliographic Format and Field 377 in the MARC 21 Authority Format. It resulted from extensive analyses of the MARC Bibliographic and Authority formats by the TG, selecting fields which are to be controlled by an identifier. Only those fields where an identifier can be applied with clear correspondence between the field and one entity were included in the paper. The discussion paper was discussed at the MAC meeting and the recommendation was made that it be upgraded to proposal status; it was also approved at the meeting as a proposal. Both changes will be included in Update 23 to the MARC documentation, expected in fall 2016.

The third paper, Discussion Paper No. 2016-DP17, Redefining Subfield $4 to Encompass URIs for Relationships in the MARC 21 Authority and Bibliographic Formats, was presented by the British Library in consultation with the TG. This paper generated lively discussion. It was acknowledged that the approach to recording URIs for relationships using subfield $4 was preferable to any of the other alternatives outlined by the paper. The distinction between relator codes and relationship codes in the MARC format was questioned. As of now, an across-the-board solution for recording URIs for any data element in MARC, subfield or field, seems to be preferred by NDMSO over what it regards as an ad hoc solution for single elements. This discussion will be continued; this paper should not be considered in isolation but rather in the context of the other papers which the TG is in the process of submitting. Taken as a whole, it is hoped that they will achieve the comprehensive solution which is sought throughout the MARC formats.

IV.4.2. Formulating & Obtaining URIs document [Charge 3.2]

A draft document was prepared for commonly used sources for authorities and identifiers. For each source, screen captures were made showing where a URI could be found for a particular entity, or how to formulate a URI once the identifier for the entity is known. Before making this document available widely, it must be determined how best to organize it. Some resources provide URIs that directly represent a thing, and others provide URIs that reference an authority (e.g. controlled or standard vocabularies, which may or may not have underlying metadata about the thing) or a resource describing a thing. The document needs to be able to distinguish these and inform catalogers which URIs are for real world objects and which are not. In order to be helpful to developers building tools, the document also intends to include descriptions of how data sources provide machine access to the data: is the data published as Linked Data available through HTTP, available through a SPARQL endpoint, as data dumps, etc.? Another issue that must be determined is where to put the final document and how it will be maintained. Should it be cooperatively maintained by the community (such as on a wiki), or should some group within PCC take responsibility for keeping it up to date and adding to it?

IV.4.3. Revisions to OCLC handling of HTTP URIs [Charge 3.1]

The question arises as to whether it would be better for catalogers to enter all needed URIs directly into the shared bibliographic record in WorldCat, or whether OCLC should provide options for output of URIs based on data present in particular MARC fields and profiled library preferences. Clearly some libraries will embrace use of URIs for their web-based catalogs, while others may find them problematic in local displays of bibliographic information. OCLC staff have looked into the issue and believe that the use of output options would likely produce more consistent results as well as meet the varying needs of libraries.

The TG members are drafting a spreadsheet outlining the subfields that together name an entity for which a corresponding URI could be added in subfield $0. That spreadsheet will be useful as the basis for future specifications for use by OCLC system developers. It will allow for a comparison of what is desired by the PCC cataloging community, in terms of URIs corresponding to the entire named entity, versus the existing use of subfield $0 and subfield-$0-like information used in OCLC heading controlling functionality. That heading control functionality allows for control numbers in multiple $0 subfields corresponding to different parts of a named entity, i.e. corporate name hierarchies, names and titles, subjects and separately controlled subdivisions, etc. These are cases where output of multiple URIs corresponding only to part of the named entity would not be preferred.

OCLC cataloging policies in this area are expected to evolve as this TG makes recommendations and OCLC development work moves ahead on the proposed output options for URIs.

IV.5. Next steps and in-depth analyses in year 2 [Charge 3, Charge 4]

In 2016-2017 the TG will continue an agenda focused on practical outcomes. Work is already well advanced on several of the following items:

IV.5.1. In collaboration with OCLC, develop a specification for outputting URIs based on internal linkages present in WorldCat data
IV.5.2. Complete the MARC object/URI reconciliation document and seek to incorporate the information into formal MARC documentation
IV.5.3. Produce work ID recommendation and use it in pilot implementation
IV.5.4. Produce discussion paper or proposal for handling relationships in MARC
IV.5.5. Consider additional targeted reconciliation projects
IV.5.6. In consultation with stakeholders, evaluate need for additional MARC proposals or best practices
IV.5.7. RWO recommendations
IV.5.8. Identify “homes” in PCC or elsewhere for aspects of the TG’s work that will need further exploration or continuing upkeep
IV.5.9. Outreach, advocacy, training
IV.5.10. Etc.

V. RECOMMENDATIONS TO STAKEHOLDERS

During its first year the TG was very much focused on the needs and interests of the many different stakeholders. This is reflected both in the outcomes of the work completed so far (see Sec. IV.4 Outcomes) and in the plans laid out for year 2 (see Sec. IV.5 Next steps and in-depth analyses in year 2). After careful consideration, the TG proposes the implementation of URIs in MARC for the near term. The sooner this process can begin, the sooner the data providers, e.g. libraries, can produce data that can be more easily transformed into linked data. In order to facilitate progress towards this goal, the TG developed the recommendations already outlined in the report above, such as the spreadsheet identifying the phase 1 entities for identities, i.e. the subfields that together name an entity in each MARC field (see Sec. IV.4.3 Revisions to OCLC handling of HTTP URIs), and the draft document Formulating and Obtaining URIs: A Guide to Commonly Used Vocabularies and Reference Sources. The TG hopes that this document could be used as a starting point to develop an official list of PCC-sanctioned initial source vocabularies for embedding URIs.

For the sake of consistency, expediency, and accuracy, it is advisable to use automated processes for populating MARC records with URIs. Having individual catalogers do this work manually is not a desirable practice and could be less efficient. Several possible ways to accomplish this goal have been outlined in this report (see Secs. IV.2.4.1, MarcEdit; IV.2.4.2, Authority Toolkit; and IV.4.3, Revisions to OCLC handling of HTTP URIs).

Outreach, advocacy, and training will be a core goal of phase 2. The TG is planning on working closely with stakeholders, such as other PCC committees, to influence cataloging policies and best practices that have been identified as problematic for the implementation of URIs in MARC.

Training needs related to implementation (for example, how to obtain URIs, or the difference between authorities and real world objects) will be communicated to the PCC Standing Committee on Training so that appropriate training can be either identified or developed.

Though MARC is the most prominently used schema for library metadata, it is frequently used alongside many others that may or may not allow for the inclusion of URIs. In addition to that concern are the maintenance of identifiers, recommendations in relation to reconciliation, and possible ILS functional requirements. The TG on URIs in MARC is recommending that new TGs be formed concerning URIs for non-MARC metadata.

VI. REFERENCES

1. The subgroup Work IDs in MARC has identified potential fields and scenarios to accommodate a work identifier (or multiple work identifiers). Consideration has been given to legacy data; to whether a work identifier (ID) is already established in an authority format or not (7XX $t, 1XX/240); to an unambiguous relationship of a work ID among various vocabularies (024); and to relationships among variants of a work, etc. The subgroup will present recommendations to the community in 2017.

Links: Meetings of the MARC Advisory Committee, Agendas and Minutes

2015-06 MAC meeting: http://www.loc.gov/marc/mac/an2015_age.html, http://www.loc.gov/marc/mac/minutes/an-15.html

2016-01 MAC meeting: http://www.loc.gov/marc/mac/mw2016_age.html, http://www.loc.gov/marc/mac/minutes/mw-16.html

2016-06 MAC meeting: http://www.loc.gov/marc/mac/an2016_age.html, http://www.loc.gov/marc/mac/minutes/an-16.html

Papers

Informal discussion paper: URIs in MARC: A Call for Best Practices (Steven Folsom, Discovery Metadata Librarian, Cornell University) https://docs.google.com/document/d/1fuHvF8bXH7hldY_xJ7f_xn2rP2Dj8o-Ca9jhHghIeUg/edit?pli=1

Discussion Paper No. 2016-DP04: Extending the Use of Subfield $0 to Encompass Linking Fields in the MARC 21 Bibliographic Format (British Library) http://www.loc.gov/marc/mac/2016/2016-dp04.html

Discussion Paper No. 2016-DP05: Expanding the Definition of Subfield $w to Encompass Standard Numbers in the MARC 21 Bibliographic and Authority Formats (British Library) http://www.loc.gov/marc/mac/2016/2016-dp05.html

Discussion Paper No. 2016-DP17: Redefining Subfield $4 to Encompass URIs for Relationships in the MARC 21 Authority and Bibliographic Formats (British Library, in consultation with the PCC Task Group on URIs in MARC) http://www.loc.gov/marc/mac/2016/2016-dp17.html

Discussion Paper No. 2016-DP18: Redefining Subfield $0 to Remove the Use of Parenthetical Prefix "(uri)" in the MARC 21 Authority, Bibliographic, and Holdings Formats (PCC Task Group on URIs in MARC, in consultation with the British Library) http://www.loc.gov/marc/mac/2016/2016-dp18.html

Discussion Paper No. 2016-DP19: Adding Subfield $0 to Fields 257 and 377 in the MARC 21 Bibliographic Format and Field 377 in the MARC 21 Authority Format (PCC URI in MARC Task Group) http://www.loc.gov/marc/mac/2016/2016-dp19.html

MARC Format Overview: Status Information http://www.loc.gov/marc/status.html

Examples for Sec. IV.2.1

This LC subject heading string is linked to three different authority records. The links are OCLC's ARNs. No single $0 could be output for this subject access point.

650 0 ǂa Neurologists<Link2068890> ǂz New Zealand<Link255121> ǂv Biography<Link4933801>

This medical subject string is linked to one authority record, although the controlling process links individual subfields. It is a candidate for output of a single $0 with a URI because the links all refer to the single authority record. In the case of MeSH, unlike LCSH, the $0 subfield displays in Connexion. See OCLC record 957132118.

650 12 ǂa Neurology<Link(DNLM)D009462Q000266> ǂx history<Link(DNLM)D009462Q000266>

Displays as:

650 12 Neurology ǂx history ǂ0 (DNLM)D009462Q000266

So it could be output with a single $0 containing the corresponding URI for the MeSH heading.
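The test applied in these two examples (a single $0 is possible only when every controlled subfield links to the same authority record) can be sketched in a few lines of Python. This is an illustration only, not OCLC's implementation: the tuple layout and the function name `single_authority_id` are assumptions made for the sketch, and the link values are copied from the examples above.

```python
from typing import List, Optional, Tuple

# Assumed layout for the sketch: (subfield code, heading text, linked authority record id).
LinkedSubfield = Tuple[str, str, str]


def single_authority_id(subfields: List[LinkedSubfield]) -> Optional[str]:
    """Return the shared authority id if all subfields link to one record, else None."""
    targets = {link for _, _, link in subfields}
    return targets.pop() if len(targets) == 1 else None


# LCSH example: three subfields, three different authority records.
lcsh = [
    ("a", "Neurologists", "2068890"),
    ("z", "New Zealand", "255121"),
    ("v", "Biography", "4933801"),
]
# MeSH example: both subfields link to the same authority record.
mesh = [
    ("a", "Neurology", "(DNLM)D009462Q000266"),
    ("x", "history", "(DNLM)D009462Q000266"),
]

print(single_authority_id(lcsh))
print(single_authority_id(mesh))
```

Under these assumptions the LCSH string yields no single identifier, while the MeSH string yields one identifier from which a single $0 could be derived.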


1) Recognized that there were no simple, across-the-board solutions for MARC fields concerning $0. Therefore, the TG pushed forward the fields that could benefit from $0 without complications. See MAC Paper 2016-DP19 in the REFERENCES section.

2) Agreed upon the universal definition of $0 for a URI that describes a THING (URI/Concept). Keeping in line with the overall principle of least disruption and most coverage, the TG recommended the use of an HTTP URI in $0 as the default URI for libraries which opt to adopt URIs in $0. Alternatively, a text string identifier in $0 remains in force for libraries which are not ready to move forward. See MAC Paper 2016-DP18 in the REFERENCES section.

3) Agreed that the relationship entity of an RDF statement be represented in MARC. Potential candidates for expressing relationships were $4, $i, $j, and $e. The consensus was to focus on $4, because that subfield is already defined in all those fields where relationships can currently be expressed in MARC. The rescoping of $4 to hold a URI property (predicate) does not prevent the library community's continued application of 3-letter relator codes. It provides an opportunity for libraries which are ready to deploy HTTP URIs for relationships (property/predicate). Consensus was that $4 alone should be redefined to carry relationship URIs; this was considered a consistent, across-the-board solution which would not require further amendments to the MARC formats by rescoping or defining additional subfields.

4) Identified a need to host an identifier for a real world object. The TG hoped to propose setting aside $1 for an identifier that points to a THING (URI resource/RWO).

5) TG members who work closely with other standards communities, such as ISNI/VIAF, have vested interests in the 024 in Authority. The 024 field appears to possess the potential of addressing the relationship of an entity across vocabularies/ontologies. [1]

The TG hopes to address items no. 4-6 above in discussion papers for the MARC Advisory Committee (MAC) to consider.

The Pilot Test that the TG conducted in February-March 2016 revealed that provisioning for URIs in MARC presents additional layers of complexity that require further consideration, i.e. repeatability, pairing, ambiguous relationships, and significance of the ordinal sequence. Additionally, the TG is working to further identify a potential field and/or indicator/subfield to record an identifier representing a Work, a resource object. These are described in greater detail in the sections below.

III. PROCESSES

The TG had in mind processes that would be the least disruptive yet yield the most promising results. In order to ensure cohesive and broad approaches, the TG set forth two tasks: a) define and understand the uniform resource identifier and the deployment of the Web-service protocol scheme HTTP; b) identify issues and problems with adding URIs in MARC, i.e. whether it is actually doable in the current systems that host MARC data.

III.1. Define and understand HTTP URI [Charge 1.1, 1.2; Charge 4.3]

According to a MARBI position paper published in 2009:

"The use of a URI instead of plain text is particularly applicable to situations where the value of the … element comes from a controlled vocabulary, which could be an authority list or formal thesaurus (e.g., a name from the LC Name Authority File or a topic for an LCSH heading), or any other list of controlled codes or terms (e.g., the MARC Code List for Languages)."

However, the goal of facilitating the transition from MARC to linked data now requires a more precise machine understanding of the data accessible from the URIs that have been added to MARC records.

The issue can be illustrated with an excerpt from the Library of Congress Name Authority record for Hillary Clinton, accessible at https://lccn.loc.gov/n93010903. Of particular interest is the list of 024 fields, which identify "standard number[s] or code[s] associated with the entity named in the 1xx field which cannot be accommodated in another field," according to the MARC Authority definition. All of the 024 fields copied below contain URIs pertaining to Hillary Clinton.

0247_ |a http://www.wikidata.org/entity/Q6294 |2 uri
0247_ |a http://dbpedia.org/resource/Hillary_Rodham_Clinton |2 uri
0247_ |a http://viaf.org/viaf/54950123 |2 uri
0247_ |a http://isni.org/isni/0000000122802598 |2 uri
0247_ |a http://d-nb.info/gnd/119082101 |2 uri
0247_ |a http://id.ndl.go.jp/auth/ndlna/00552567 |2 uri
0247_ |a http://aut.nkp.cz/jn20000700317 |2 uri
0247_ |a http://catalogue.bnf.fr/ark:/12148/cb12543158f |2 uri
0247_ |a http://www.idref.fr/034705171 |2 uri
0247_ |a http://datos.bne.es/resource/XX1725857 |2 uri
0247_ |a http://id.sbn.it/af/IT%5CICCU%5CUBOV%5C804461 |2 uri
0247_ |a http://cantic.bnc.cat/registres/CUCId/a11695705 |2 uri
0247_ |a https://musicbrainz.org/artist/858a3d95-e1b2-4aac-8427-a99e391ce8c5 |2 uri
0247_ |a http://www.imdb.com/name/nm0166921 |2 uri
0247_ |a http://bioguide.congress.gov/scripts/biodisplay.pl?index=C001041 |2 uri
0247_ |a http://www.nndb.com/people/022/000025944 |2 uri
0247_ |a https://ballotpedia.org/Hillary_Clinton |2 uri
0247_ |a https://www.freebase.com/m/0d06m5 |2 uri

The rows in the table can be partitioned into three categories:

• Near the bottom, the 024 fields with the peach-colored background are human-readable documents about Hillary Clinton. These are pages from popular resources maintained outside the library community, such as IMDB and BioGuide, which have been deemed authoritative by library catalogers and authority experts. In shorthand, these URIs are standard URLs for Web pages.

• The rows with the blue background are records derived from library authority files and more modern registries designed for similar purposes. They may be pages from library authority files published on the Web, human-readable views of machine-understandable RDF data, or raw RDF. But in one form or another, all of the URIs resolve to library authorities (or simply 'Authorities') that are about Hillary Clinton. The TG refers to these URIs as Authority URIs.

• The rows with the green background contain URIs that refer to Hillary Clinton directly, in a way that is technically distinct from documents about her. These URIs conform to linked data conventions described in standard Web documents such as "Cool URIs for the Semantic Web" [https://www.w3.org/TR/cooluris/]. The data accessible from these URIs has been published by third parties as well as the library community, and encodes a rich domain model designed expressly for machine understanding. The TG refers to these URIs as Entity or Thing URIs.

According to linked data conventions, machine processes designed to construct meaningful statements and inferences from them require Thing URIs. When Thing URIs are defined for people and creative works, one desirable outcome would be a machine-understandable statement such as 'Hillary Clinton is the author of the book It Takes a Village.' With technology available in 2016, data accessible from Web page URIs may not be machine-understandable at all, and Authority URIs may only be partially understandable. The ambiguity of URIs illustrated by the 024 fields in the MARC Authority records is also present in MARC bibliographic records.
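As a rough illustration of such a statement, the Python sketch below writes one triple in N-Triples syntax. It is a sketch under stated assumptions, not part of the TG's work: the Wikidata entity URI is taken from the 024 list and is assumed here to be a Thing URI, the URI for the book is hypothetical, and the Dublin Core Terms `creator` property is chosen only as one plausible predicate.

```python
# Thing URI for the person, taken from the 024 list (assumed to be a Thing URI).
PERSON = "http://www.wikidata.org/entity/Q6294"
# Hypothetical URI for the work; a real one would come from a work-level vocabulary.
WORK = "http://example.org/work/it-takes-a-village"
# Dublin Core Terms "creator" property, used here as the predicate (an assumption).
CREATOR = "http://purl.org/dc/terms/creator"


def to_ntriples(subject: str, predicate: str, obj: str) -> str:
    """Serialize one triple of URIs as a single N-Triples line."""
    return f"<{subject}> <{predicate}> <{obj}> ."


statement = to_ntriples(WORK, CREATOR, PERSON)
print(statement)
```

The statement is only as meaningful as its URIs: substituting a Web page URL for the person would assert that a Web page, not a person, created the book.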

III.2. Identify issues/problems with adding URIs: whether it was actually doable [Charge 1, Charge 3]

A pilot test of inserting HTTP URIs in $0 in bibliographic and authority data emerged as one logical first step for the TG. It helped the TG understand which issues could easily be resolved in the near term, and the do-ability of inserting URIs in $0 in the MARC environment.

The Pilot Test began in February 2016. Members prepared sets of input data and worked with tool creators (MarcEdit and Authority Toolkit) to refine lookup algorithms for URI insertion in $0.

The enhanced data with HTTP URIs embedded were to be ingested into several integrated library systems for evaluation. This exercise helped the TG gain a cohesive understanding of the role of an identifier in the form of a dereferenceable URI deployed in $0 in the MARC environment.

Throughout the process, the TG began to frame the questions that might assist in the effort of transitioning MARC data to linked data. This included reaching possible resolutions where potential problems may reside, such as planning for MAC proposals in its first year.

Other issues were more long-term and may require in-depth discussion with broader community involvement. For instance, subfields such as $4, which have already been defined, might have the potential to hold HTTP URIs. The repeatability, ambiguity, and significance of the ordinal sequence are less trivial and more complex.

In regard to bulk processing of insertions, system performance, and scalability, the Pilot Test also helped address SPARQL query adjustments on the server side. Though adding URIs by hand was the least desirable exercise, it could be inevitable, so the TG also began documenting resources that would assist such an endeavor.

The overall strategies that the TG adopted were carefully thought out in order to achieve iterative success that will build confidence throughout the phases of implementation.

IV. PROBLEM STATEMENT

To encode data suitable for transformation into RDF triples, it is necessary to be able to identify in MARC the data elements corresponding to the subject, predicate, and object in each statement and/or to provide URIs for them. It quickly became apparent that the task is not simply to add subfields to allow URIs to be given (itself a non-trivial problem, given the limited number of unused subfields still available in MARC) but also to negotiate the often ambiguous semantics of MARC. The TG has sought to do this through a judicious combination of redefinition proposals, clarification of existing semantics, and best practice recommendations.

Best practices for incorporating HTTP URIs in MARC BIB and Authority records without making major renovations to the MARC format (taking into consideration cost/benefit analysis for an 'end of life' technology)

IV.1. Where to place URIs in the MARC structure ($0, $4)? [Charge 3]

The TG developed a pilot to examine the issues surrounding the addition of identifiers to MARC 21 data. The work included identifying actionable source vocabularies and creating test record sets with dereferenceable URIs embedded in the data. A variety of formats were represented in the test data, and ILS vendors, programmers, system engineers, and discovery designers were consulted throughout the pilot to comment on the retrieval of actionable URIs and the appropriate policies ensuring the data are actionable in MARC 21 data.

The TG also inventoried the MARC bibliographic and authority formats to identify MARC 21 fields that contain subfields capable of accommodating URIs. In the bibliographic formats, subfields $0 and $4 were identified as existing candidates for containing URIs; subfields $0 and $4 were candidates in the authority format. MARC 21 fields that might usefully contain a subfield for a URI, but which do not have one defined, were also noted.

The TG focused on subfields $0 and $4 for its first three MARC Discussion Papers, submitted to MAC at ALA Annual 2016.

IV.2. What difficulties are evidenced?

IV.2.1. Adding multiple $0 [Charge 1.2]

The nature and use of subfield $0 has evolved in MARC since the subfield was first implemented in 2007. In 2010, it was redefined and came to include standard numbers, including URIs, in addition to its original use for authority record control numbers.

However, MARC is not specific as to which parts of a controlled heading string correspond to the $0. Nothing in the MARC specification rules out one $0 subfield applying to one set of subfields in a heading while a different $0 applies to others. (To ameliorate this problem, we formed a MARC object/URI reconciliation subgroup to enumerate the subfields naming the object in each MARC field; see IV.2.2 below.) And because $0 is repeatable, it is possible to find multiple $0 values corresponding to the same heading subfields naming the same entity. Indeed, the latter practice is adopted by design in some implementations, notably that at the German National Library.

The existence of different use cases and practices for relating headings to $0 has emerged as an issue that will need to be considered as the TG's work proceeds. In the case of OCLC's heading control functionality related to LC names and LCSH, subfield $0 data is included as an XML tag attribute in each subfield XML tag covered by a particular authority record, and is repeated as many times as needed depending on the number of subfields used to represent the name or subject. In the subsequent development of controlling for other authority files, the same approach has been taken, but instead retaining the same or different authority record control numbers in multiple $0 subfields. [See examples at end of document.]

This repeated use of $0 subfields containing the same authority record number, or different authority record numbers for different parts of a heading, runs contrary to the need that exists in an OCLC context for a single URI corresponding to the entire named entity given in the field. Extraneous $0 subfields are automatically deleted in WorldCat records in fields that are otherwise controlled to a particular authority file. However, this leaves unresolved the question of controlling via multiple source vocabularies within the same language of cataloging, which many see as a desirable medium-to-long-term objective. Given the investment in its development and the number of controlled headings in WorldCat, completely changing the heading control functionality within WorldCat is not feasible, so the TG and OCLC staff have considered other alternatives allowing for output of needed URIs in the format which libraries would prefer in the future.

IV.2.2. How to identify an RDF object in a MARC datafield [Charge 4.3]

This emerged as an important need because the ability to identify a URI with its corresponding label is necessary to support both reconciliation of existing data and updates to those labels based on their association with an identifier. The only realistic way to make this identification was to document the correspondences on a field-by-field basis. Fortunately, this was very achievable for the majority of fields in widespread use. [Link to recommendations] The investigation revealed a number of issues relating to the identification of single entities vs. larger sets (series, conferences) and the alignment of MARC and RDA vocabularies.
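A field-by-field table of this kind can be pictured as a simple lookup from MARC tag to the subfields that together name the object. The Python sketch below is an assumption-laden illustration, not the subgroup's reconciliation document: the table `LABEL_SUBFIELDS` is hypothetical (the 650 entry borrows the subfield list from the MarcEdit rules example in this report, and the 700 entry is invented for the sketch), and the MeSH URI is shown only as a plausible value for $0.

```python
from typing import Dict, List, Optional, Tuple

Subfield = Tuple[str, str]  # (subfield code, value)

# Hypothetical field-by-field table: which subfields together name the object.
LABEL_SUBFIELDS: Dict[str, str] = {
    "650": "abvxyz",
    "700": "abcdq",
}


def label_and_uri(tag: str, subfields: List[Subfield]) -> Tuple[str, Optional[str]]:
    """Pair the label built from the naming subfields with the first $0 value, if any."""
    naming_codes = set(LABEL_SUBFIELDS.get(tag, ""))
    label = " ".join(value for code, value in subfields if code in naming_codes)
    uri = next((value for code, value in subfields if code == "0"), None)
    return label, uri


field_650 = [
    ("a", "Neurology"),
    ("x", "history"),
    ("0", "http://id.nlm.nih.gov/mesh/D009462Q000266"),  # assumed URI form
]
print(label_and_uri("650", field_650))
```

With such a pairing in hand, a label can be refreshed from the data behind its URI, or a missing URI can be reconciled from the label.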

IV.2.3. What did we find in identifying relationships/multiple relationships? [Charge 4.1]

IV.2.3.1. Relationships are expressed in MARC by a variety of means, including:
IV.2.3.1.1. Field tagging, either alone, e.g. 830, or in combination with indicators, e.g. 780/785
IV.2.3.1.2. Subfield codes, e.g. 041
IV.2.3.1.3. Codes given in subfields, e.g. 700 $4
IV.2.3.1.4. Controlled or natural language text given in subfields, e.g. 700 $i

IV.2.3.2. Some of these fields are very tightly bound to legacy MARC definitions, structures, and data. Redesigning 041, for example, to be hospitable to URIs would require a complete reconception of that field.

IV.2.3.3. There is the greatest value in provisioning for URIs following a 7XX $4 $0 $1 pattern, with $4 repurposed to house URIs much as $0 now does. This approach seems to present a relatively low barrier to implementation while having widespread application in MARC.

IV.2.3.4. Multiple relationships can cause ambiguity where they are associated with multiple objects or multiple labels. In such cases we recommend the expedient of simply repeating the field in order to make the associations unambiguous.
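The expedient of repeating the field can be sketched as follows. This is a minimal Python illustration, assuming a field is held as a list of (code, value) pairs; the function name `split_by_relationship` is hypothetical, and the sample values (two LC relator URIs and an id.loc.gov name URI built from the LCCN cited in this report) are assumptions chosen for the example.

```python
from typing import List, Tuple

Subfield = Tuple[str, str]  # (subfield code, value)


def split_by_relationship(subfields: List[Subfield]) -> List[List[Subfield]]:
    """Return one copy of the field per $4, each carrying a single relationship."""
    rel_positions = [i for i, (code, _) in enumerate(subfields) if code == "4"]
    if len(rel_positions) <= 1:
        return [list(subfields)]
    return [
        [sf for i, sf in enumerate(subfields) if sf[0] != "4" or i == keep]
        for keep in rel_positions
    ]


AUT = "http://id.loc.gov/vocabulary/relators/aut"
EDT = "http://id.loc.gov/vocabulary/relators/edt"
field_700 = [
    ("a", "Clinton, Hillary Rodham"),
    ("4", AUT),
    ("4", EDT),
    ("0", "http://id.loc.gov/authorities/names/n93010903"),
]

for repeated in split_by_relationship(field_700):
    print(repeated)
```

Each output field then pairs exactly one relationship URI with one object, at the cost of repeating the label and its $0.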

IV.2.4. How one obtains URIs for various data sources depends on the linked data source (different data sources make their URIs available differently) and on interoperability between the data source and the cataloging tools being used.

To help support obtaining the right URIs for their purposes in MARC, the TG has begun a document currently referred to as Formulating and Obtaining URIs: A Guide to Commonly Used Vocabularies and Reference Sources. For commonly used vocabularies in MARC, we want to document where in the data source UI one can find the canonical URIs that, when dereferenced, provide data. Going forward, for each entry in the document, we want to explain whether a data source publishes its data as Authorities, Real World Objects, or both. We also want to document the methods available for machine access to the data: is the data published as Linked Data available through HTTP, through a SPARQL endpoint, as data dumps, etc.?

IV.2.4.1. MarcEdit [Charge 3]

In the summer of 2014, MarcEdit introduced a suite of tools designed to begin testing the feasibility of embedding linked data concepts into MARC records. Initially, the scope of the suite was limited to embedding HTTP URIs in the $0 in MARC fields 1xx, 6xx, and 7xx in bibliographic records. This initial work focused on integration with the US Library of Congress's id.loc.gov service as well as OCLC's VIAF services for resolution. However, over the past 2 years, and in response to many of the questions and issues surfaced through the TG, the Linking services have been expanded and revised to potentially support all use cases identified by this Task Force, as well as providing support for non-MARC21 users to configure the Linking tool for use with other MARC formats.

The MarcEdit Linking toolkit currently supports the generation of URIs for all fields identified by this Task Force for authority and bibliographic records. The application utilizes a rules file that documents field processing and service configuration values. This allows MarcEdit to quickly make changes to the rules governing field processing, as well as to add support for new collections and linked data endpoints. As of this report (9/21/2016), the MarcEdit Linked Data tool supports resolution against the following linked data services:

1. US Library of Congress NAF
2. US Library of Congress LCSH
3. US Library of Congress Children's Subject Headings
4. US Library of Congress Demographic Group Terms
5. Thesaurus for Graphic Materials
6. US Library of Congress Genre/Form Terms
7. US Library of Congress Medium of Performance Thesaurus for Music
8. RDA Carrier Types
9. RDA Media Types
10. RDA Content Types
11. Getty Arts and Architecture Thesaurus
12. Getty ULAN
13. National Library of Medicine MeSH
14. OCLC FAST Headings
15. OCLC VIAF
16. German National Library (GND)
17. [15 national library name indexes via VIAF]
18. Japanese Diet Library

Additionally, users have the ability to configure their own linked data endpoints for use with MarcEdit, so long as the service in question supports SPARQL and JSON. There is presently a knowledge-base article at http://marcedit.reeset.net/editing-marcedits-linked-data-rules-file documenting how users can both add new collections and modify the rules used when processing a particular field.

Essentially, MarcEdit utilizes its rules file to configure MarcEdit's linked data platform to identify the proper index/service, normalization (for data query purposes), and subfields to utilize as part of any lookup process. Additionally, each rules block identifies when a field should be processed (i.e., only when used in a bibliographic record, used in an authority record, or both). For example, here is the definition for the 650 field:

<field type="bibliographic">
    <tag>650</tag>
    <subfields>abvxyz</subfields>
    <ind2 value="0" vocab="lcsh"/>
    <ind2 value="1" vocab="lcshac"/>
    <ind2 value="2" vocab="mesh"/>
    <ind2 value="7" vocab="none"/>
    <index>2</index>
    <uri>0</uri>
    <special_instructions>subject</special_instructions>
</field>

Each MarcEdit rules block is a small segment of XML that profiles field usage within a record. This is why MarcEdit's linking tool can be used with other flavors of MARC (like UNIMARC): the Linking service has no concept of MARC21, only of the ISO 2709 format, and the rules file provides that context.
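To show how such a rules block might be consumed, the Python sketch below parses the 650 definition and selects a vocabulary from the field's second indicator. It is not MarcEdit's code: the self-closing form of the `ind2` elements is an assumption made when reconstructing the XML, the function name `vocabulary_for` is hypothetical, and the interpretation of `uri` as the target subfield code is a guess based on the surrounding text.

```python
import xml.etree.ElementTree as ET

# Rules block reconstructed from the 650 definition in this report
# (the self-closing <ind2/> form is an assumption).
RULES_XML = """
<field type="bibliographic">
  <tag>650</tag>
  <subfields>abvxyz</subfields>
  <ind2 value="0" vocab="lcsh"/>
  <ind2 value="1" vocab="lcshac"/>
  <ind2 value="2" vocab="mesh"/>
  <ind2 value="7" vocab="none"/>
  <index>2</index>
  <uri>0</uri>
  <special_instructions>subject</special_instructions>
</field>
"""


def vocabulary_for(rule: ET.Element, second_indicator: str):
    """Return the vocab attribute matching the second indicator, or None if unlisted."""
    for ind in rule.findall("ind2"):
        if ind.get("value") == second_indicator:
            return ind.get("vocab")
    return None


rule = ET.fromstring(RULES_XML)
# Tag, record type, vocabulary for indicator 2, and (presumably) the subfield that receives the URI.
print(rule.findtext("tag"), rule.get("type"), vocabulary_for(rule, "2"), rule.findtext("uri"))
```

Because everything format-specific lives in the XML, profiling another MARC flavor would mean editing the rules rather than the lookup code.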

This approach has allowed MarcEdit to quickly profile and examine the implications of developing URIs for linking fields like the 880 field, which provide some unique challenges but can be accommodated via the current rules file format.

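Utilizing the current process, MarcEdit's linking tool can accommodate a wide range of linking scenarios. For example, in an authority record:

[Figure: MarcEdit linking output in an authority record]

Within a bibliographic record:

[Figure: MarcEdit linking output within a bibliographic record]

Across diverse vocabularies:

[Figure: MarcEdit linking output across diverse vocabularies]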

Current development on the tool will continue to focus on the inclusion and support of additional vocabularies, on continuing to work with linked data providers around scalability issues (and ways in which MarcEdit [or services like it] can reduce impacts on their services), and on profiling this service to work with other flavors of MARC, like UNIMARC, to encourage further experimentation.

IV.2.4.2. Authority Toolkit [Charge 3]

The authority toolkit is a program for the construction and modification of authority records. One version is designed for use within OCLC's Connexion program, for records in the LC/NACO authority file, but another version can work with records in files, and so with records from other sources. Both versions of the toolkit have the same capabilities. At an early stage, the toolkit acquired the ability to test terms used in authority fields such as the 370 and 372 against vocabularies available at id.loc.gov (at present LCMPT, LCSH, LCDGT, AFSET, geographic area codes, RDA content terms, and the LC/NACO Authority File). Somewhat later it added the ability to verify terms against the MeSH vocabulary. (Additional vocabularies may be added in the future, based on user requests.) To perform this verification, the program needs to know which vocabularies are used to control terms in which parts of which authority fields, how to query the source to determine whether or not a term is defined, and how to react to the information returned by the source. The toolkit's actions are controlled above all by the subfield $2 code appearing in the same field as the term, but in the absence of a subfield $2 code, operator preferences come into play as well. (For example, an operator may prefer that an unlabeled term be tested against MeSH first and, if not found, tested against LCSH; or perhaps tested only against LCDGT.) A detailed description of the toolkit's process for verifying the content of authority fields can be found in the program's documentation at http://files.library.northwestern.edu/public/oclc/documentation/ (section "verifymenu").

If the toolkit's search for an entire term is successful, the toolkit could easily supply the corresponding URI and add it to the authority record in subfield $0. This URI may be contained in the data provided by the source, or it could be constructed mechanically once the toolkit has extracted the appropriate identifier. As part of experimentation encouraged by the TG, on January 15, 2016, the toolkit acquired an option to add subfield $0 to fields which could be verified. (This option is described at http://files.library.northwestern.edu/public/oclc/documentation/, section "optionsverification0".) If a field contains more than one term, the toolkit must divide the field into multiple fields (one for each term) before it can add subfield $0.
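The two mechanical steps described here, constructing a URI from an extracted identifier and dividing a multi-term field so that each term carries its own $0, can be sketched in Python. This is not the toolkit's code: the prefix table, the source keys, and both function names are assumptions for illustration, and the identifiers are the ones that appear elsewhere in this report.

```python
from typing import Dict, List, Optional, Tuple

Subfield = Tuple[str, str]  # (subfield code, value)

# Assumed URI prefixes per source, for illustration only.
URI_PREFIXES: Dict[str, str] = {
    "naf": "http://id.loc.gov/authorities/names/",
    "mesh": "http://id.nlm.nih.gov/mesh/",
}


def build_uri(source: str, identifier: str) -> Optional[str]:
    """Construct a URI mechanically from a source key and an extracted identifier."""
    prefix = URI_PREFIXES.get(source)
    if prefix is None:
        return None
    # Identifiers are sometimes recorded with internal spaces; remove them.
    return prefix + identifier.replace(" ", "")


def one_field_per_term(terms: List[Tuple[str, str, str]]) -> List[List[Subfield]]:
    """Divide a multi-term field: each (label, source, identifier) gets its own $a and $0."""
    fields = []
    for label, source, identifier in terms:
        field = [("a", label)]
        uri = build_uri(source, identifier)
        if uri is not None:
            field.append(("0", uri))
        fields.append(field)
    return fields


terms = [
    ("Clinton, Hillary Rodham", "naf", "n 93010903"),
    ("Neurology", "mesh", "D009462"),
]
for field in one_field_per_term(terms):
    print(field)
```

A term from an unlisted source is kept without a $0, which mirrors the case where no URI can be supplied.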

The following illustration shows an authority record as verified by the authority toolkit, with the option to add subfield $0 during verification turned on. (For this experiment, subfield $0 was locally defined for some fields.)

Although the toolkit can often discover information about compound terms (such as some corporate bodies with subordinate units and some LCSH headings) for which an authority record exists for some parts but not all, the toolkit cannot supply subfield $0. (There is no authority record, and so no URI, that represents the entire term.) The toolkit also cannot add subfield $0 to fields that contain multiple terms if the field contains an aggregation of terms rather than a collection of independent items. (Example: the toolkit cannot add subfield $0 to the 382 field.)

The task of discovering that a term given in an authority record is defined in an external vocabulary is made more difficult because the searching mechanisms available do not always compensate appropriately for operator variations in punctuation, capitalization, and the use of combining diacritics. In addition, the response time experienced by the toolkit can vary widely, even for the same term searched repeatedly within a brief time, and some services are unavailable over the weekend. If the potential of linked data is to be enjoyed, services providing data must ensure that their entry mechanisms are robust, flexible, and available at all times.
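One way a client or a service could compensate for these variations is to compare terms on a normalized match key. The Python sketch below is a generic illustration using the standard `unicodedata` module; it is not the toolkit's algorithm, the function name `match_key` is hypothetical, and real vocabularies may need gentler rules (for example, keeping diacritics that distinguish terms).

```python
import unicodedata


def match_key(term: str) -> str:
    """Build a comparison key that ignores diacritics, punctuation, and case."""
    # Decompose so base letters and combining marks become separate code points.
    decomposed = unicodedata.normalize("NFD", term)
    # Drop the combining marks (diacritics).
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Replace punctuation with spaces, then fold case and collapse whitespace.
    cleaned = "".join(ch if ch.isalnum() or ch.isspace() else " " for ch in stripped)
    return " ".join(cleaned.casefold().split())


print(match_key("Dvořák, Antonín"))
print(match_key("DVORAK  ANTONIN."))
```

Both inputs reduce to the same key, so a lookup keyed this way would treat them as the same term.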

IV.2.4.3. Lookup online (e.g. VIAF, Getty ULAN, Geonames, Wikidata)

Online lookup requires manual operation. Users must be well versed in the SPARQL queries that individual services provide. Getty ULAN works differently from Geonames and Wikidata. The URI returned from a query may not be an RDF URI, but one that lands the user on a Web page or document.

IV.3. What did we learn? [Charge 1.3, Charge 3]

IV.3.1. Tackle low-hanging fruit: what can we do in 1 year?

The TG's activities during Year 1 were designed to position the MARC community to take tangible steps toward incorporating linked data URIs into its processes within an achievable timeframe. Therefore, the TG put aside some tasks, such as the overhaul of certain legacy MARC data elements, that would have delayed progress with the TG's practical objectives. The tool development undertaken by Terry Reese and Gary Strawn was designed to advance these objectives, but so were the Formulating URIs document and the MARC object/URI reconciliation work, both of which document information that will be needed by other stakeholders, and the work IDs in MARC proposal, which seeks to remove one of the main barriers to routine incorporation of work identifiers in MARC records.

IV.3.2. Add $0 where it's not defined (not simple)

One of the TG's goals was also to identify and add $0 to fields that currently do not have one defined. The TG found the following MARC fields that needed $0 defined:

bibliographic: 046, 257, 260/264, 375, 753
authority: 046, 360, 375, 377, 663, 680, 681

These fields do not offer an easy resolution when considering $0, which reflects the resource object for an entity described. The TG conducted thorough analyses and concluded that only 257 and 377 could contain a URI that is unambiguous between the field and the object it represents, leaving out more complicated cases, e.g. fields 264 (Production, Publication, Distribution, Manufacture, and Copyright Notice) and 382 (Medium of Performance).

One of the issues confronted in drafting Discussion Paper 2016-DP19 was the extent of effort needed to individually propose subfield $0 for MARC 21 fields that do not contain it. MAC accepted the paper as a proposal, and there was agreement "that similar changes, such as those recommended [in] this paper, might in the future be considered as part of a MARC Fast-Track process." Being able to fast-track proposals for defining subfield $0 in fields which do not contain it will considerably streamline the process in the future.

IV.3.3. Strategies in light of the limited life cycle of the MARC environment

Though many may see MARC as "dead," it remains a viable tool that delivers metadata for data discovery. It is also, however, a legacy format that reflects in its somewhat baroque structure a long history of accretion to meet varied and changing needs. In pursuing its goals, the TG has adopted a strategy of pursuing changes that can be applied coherently across MARC and that maximize the return on the library community's investment of effort. There are economical and sensible approaches to determining what to do. The TG always kept in mind that its recommendations must cause the least disruption for the transition of data from MARC to linked data. A wholesale insertion of HTTP URIs is unlikely to be possible; URIs can probably be accommodated in most, but not all, MARC fields and/or subfields.

The TG is committed to working through a list of tasks and identifying viable solutions. While $0, after one year's deliberation, seemed a straightforward solution for a URI representing the resource object, more discussion is needed with regard to the predicate that denotes the relationship. MARC data have not been consistent in expressing relationships. The combination of fields, indicators, and subfields adds complexity to the process.

IV.3.4. ILS analysis results

Some ILSs would not load the processed records because of the presence of $0. Others loaded the records but did nothing with the data.

TG members mocked up files of bibliographic and authority data, adding various URIs in subfield $0 wherever subfield $0 is currently defined in MARC. These files were uploaded into a number of ILS systems to see whether the addition of subfield $0 with URIs caused problems. No significant problems were found. These files included URIs in subfield $0 that were not prefixed with the (uri) identifier.

In OCLC, the same $0 subfields were also not problematic. OCLC's validation of subfield $0 does not check the structure of subfield $0 in the same way as it does for control numbers in 760-787 subfield $w or URLs in $u subfields. Use of URIs in subfield $4 to express relationship information would require a change to OCLC's validation of $4 subfields, but that could be readily changed without extensive effort.

IV.3.5. Tools needed: MarcNext, Authority Toolkit

Currently, the TG has tested and continues to work with MarcNext and Authority Toolkit. TG members continue to collect and record additional tools and resources that help practitioners identify and validate an RDF URI.

IV.3.6. Need to be able to easily report duplicates found in VIAF, etc., and need a way to know which URI to use when duplicates are found

Throughout the first year of investigation and deliberation, the TG learned that, although vocabularies and ontologies are structured per standards and published for adoption, some are more domain-specific than others. Often there is more than one method to structure a body of data. Duplication can be expected across various datasets. The reconciliation of URIs is one of the tasks that the TG has recognized, but it is not yet in a position to recommend a solution in the near term.

IV.4. Outcomes

IV.4.1. MAC Discussion Papers [Charge 4]

The TG was aware that some aspects of its intended goals were not yet accommodated by the MARC format. Following the defined workflows of MARC governance and standardization, the TG submitted several discussion papers to the MARC Advisory Committee (MAC). As an initial preparation, an informal discussion paper entitled URIs in MARC: A Call for Best Practices, by Steven Folsom, had been discussed during the June 2015 MAC meeting. It focused on subfield $0 (Authority record control number or standard number): its current usage, its capability for URIs, and some aspects of best practice. The paper generated extensive discussion, and there was broad agreement that the time was right for the library community to begin using URIs consistently. Steven Folsom was asked to cooperate with the PCC to develop a formal MAC Discussion Paper.

In fall 2015, the British Library (BL) submitted two papers to MAC for the January 2016 meeting, independently of the TG, covering title-to-title relationships via subfield $w and specific relationship information, then discussed using subfield $0. The approaches taken by the BL in its papers, coupled with the approach taken by the TG, resulted in MAC suggesting that the British Library and the PCC should collaborate on submitting a paper for June 2016.

During the MAC meetings at the ALA Annual Conference in Orlando in June 2016, three papers were presented by or in cooperation with the TG. Discussion Paper No. 2016-DP18, entitled Redefining Subfield $0 to Remove the Use of Parenthetical Prefix "(uri)" in the MARC 21 Authority, Bibliographic, and Holdings Formats, described the syntactical improvement that a subfield $0 containing a URI without the parenthetical prefix "(uri)" would allow: automated processes could use the content of these $0s without having to strip away a prefix. The discussion paper was discussed at the MAC meeting, and the recommendation was made that it be upgraded to proposal status; it was approved at the meeting as a proposal. From now on, a $0 containing an identifier in the form of a web retrieval protocol, e.g., an HTTP URI, should not be given a parenthetical prefix.
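The practical effect of this change can be sketched in a few lines. The helper below is a hypothetical illustration, not part of any MARC tool: it removes a leading "(uri)" from a legacy $0 value and leaves other parenthetical prefixes, such as organization codes on control numbers, untouched. The id.loc.gov URI is an assumed example value built from the LCCN cited elsewhere in this report.

```python
import re

# Matches a leading "(uri)" prefix, case-insensitively, plus optional whitespace.
_URI_PREFIX = re.compile(r"^\(uri\)\s*", re.IGNORECASE)


def normalize_subfield_0(value: str) -> str:
    """Drop a legacy "(uri)" prefix from a $0 value; other values pass through."""
    return _URI_PREFIX.sub("", value.strip())


# Assumed example values, for illustration only.
legacy = "(uri)http://id.loc.gov/authorities/names/n93010903"
current = normalize_subfield_0(legacy)
control_number = normalize_subfield_0("(DNLM)D009462Q000266")
```

Under the redefinition, consumers of $0 no longer need this step for newly created data, though legacy records may still carry the prefix.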

A second paper was presented to the MAC: Discussion Paper No. 2016-DP19, Adding Subfield $0 to Fields 257 and 377 in the MARC 21 Bibliographic Format and Field 377 in the MARC 21 Authority Format. It resulted from extensive analyses of the MARC Bibliographic and Authority formats by the TG, selecting fields which are to be controlled by an identifier. Only those fields where an identifier can be applied with clear correspondence between the field and one entity were included in the paper. The discussion paper was discussed at the MAC meeting, and the recommendation was made that it be upgraded to proposal status; it was also approved at the meeting as a proposal. Both changes will be included in Update 23 to the MARC documentation, expected in fall 2016.

The third paper, Discussion Paper No. 2016-DP17, Redefining Subfield $4 to Encompass URIs for Relationships in the MARC 21 Authority and Bibliographic Formats, was presented by the British Library in consultation with the TG. This paper generated lively discussion. It was acknowledged that the approach to recording URIs for relationships using subfield $4 was preferable to any of the other alternatives outlined by the paper. The distinction between relator codes and relationship codes in the MARC format was questioned. As of now, an across-the-board solution for recording URIs for any data element in MARC, subfield or field, seems to be preferred by NDMSO over what it regards as an ad hoc solution for single elements. This discussion will be continued; this paper should not be considered in isolation, but rather in the context of the other papers which the TG is in the process of submitting. Taken as a whole, it is hoped that they will achieve the comprehensive solution which is sought throughout the MARC formats.

IV.4.2. Formulating & Obtaining URIs document [Charge 3.2]

A draft document was prepared covering commonly used sources for authorities and identifiers. For each source, screen captures were made showing where a URI could be found for a particular entity, or how to formulate a URI once the identifier for the entity is known. Before making this document widely available, it must be determined how best to organize it. Some resources provide URIs that directly represent a thing, and others provide URIs that reference an authority (e.g., controlled or standard vocabularies, which may or may not have underlying metadata about the thing) or a resource describing a thing. The document needs to distinguish these and inform catalogers which URIs are for real world objects and which are not. To be helpful to developers building tools, the document is also intended to include descriptions of how data sources provide machine access to the data: is the data published as Linked Data, available through HTTP, available through a SPARQL endpoint, as data dumps, etc.? Another issue that must be determined is where to put the final document and how it will be maintained. Should it be cooperatively maintained by the community (such as on a wiki), or should some group within PCC take responsibility for keeping it up to date and adding to it?

IV.4.3. Revisions to OCLC handling of HTTP URIs [Charge 3.1]

The question arises as to whether it would be better for catalogers to enter all needed URIs directly into the shared bibliographic record in WorldCat, or whether OCLC should provide options for output of URIs based on data present in particular MARC fields and profiled library preferences. Clearly some libraries will embrace use of URIs for their web-based catalogs, while others may find them problematic in local displays of bibliographic information. OCLC staff have looked into the issue and believe that the use of output options would likely produce more consistent results as well as meet the varying needs of libraries.

TG members are drafting a spreadsheet outlining the subfields that together name an entity for which a corresponding URI could be added in subfield $0. That spreadsheet will be useful as the basis for future specifications for use by OCLC system developers. It will allow for a comparison of what is desired by the PCC cataloging community in terms of URIs corresponding to the entire named entity versus the existing use of subfield $0 and subfield-$0-like information used in OCLC heading controlling functionality. That heading control functionality allows for control numbers in multiple $0 subfields corresponding to different parts of a named entity, i.e., corporate name hierarchies, names and titles, subjects and separately controlled subdivisions, etc. These are cases where output of multiple URIs corresponding only to part of the named entity would not be preferred.

OCLC cataloging policies in this area are expected to evolve as this TG makes recommendations and OCLC development work moves ahead on the proposed output options for URIs.

IV.5. Next steps and in-depth analyses in year 2 [Charge 3, Charge 4]

In 2016-2017, the TG will continue an agenda focused on practical outcomes. Work is already well advanced on several of the following items:

IV.5.1. In collaboration with OCLC, develop a specification for outputting URIs based on internal linkages present in WorldCat data
IV.5.2. Complete the MARC object/URI reconciliation document and seek to incorporate the information into formal MARC documentation
IV.5.3. Produce work ID recommendation and use it in pilot implementation
IV.5.4. Produce discussion paper or proposal for handling relationships in MARC
IV.5.5. Consider additional targeted reconciliation projects
IV.5.6. In consultation with stakeholders, evaluate need for additional MARC proposals or best practices
IV.5.7. RWO recommendations
IV.5.8. Identify "homes" in PCC or elsewhere for aspects of the TG's work that will need further exploration or continuing upkeep
IV.5.9. Outreach, advocacy, training
IV.5.10. Etc.

V. RECOMMENDATIONS TO STAKEHOLDERS

During its first year, the TG was very much focused on the needs and interests of the many different stakeholders. This is reflected both in the outcomes of the work completed so far (see Sec. IV.4, Outcomes) and in the plans laid out for year 2 (see Sec. IV.5, Next steps and in-depth analyses in year 2). After careful consideration, the TG proposes the implementation of URIs in MARC in the near term. The sooner this process can begin, the sooner data providers, e.g., libraries, can produce data that can be more easily transformed into linked data. To facilitate progress toward this goal, the TG developed the recommendations already outlined in the report above, such as the spreadsheet identifying the phase 1 entities for identifiers, i.e., the subfields that together name an entity in each MARC field (see Sec. IV.4.3, Revisions to OCLC handling of HTTP URIs), and the draft document Formulating and Obtaining URIs: A Guide to Commonly Used Vocabularies and Reference Sources. The TG hopes that this document could be used as a starting point to develop an official list of PCC-sanctioned initial source vocabularies for embedding URIs.

For the sake of consistency, expediency, and accuracy, it is advisable to use automated processes for populating MARC records with URIs. Having individual catalogers do this work manually is not a desirable practice and could be less efficient. Several possible ways to accomplish this goal have been outlined in this report (see Secs. IV.2.4.1, MarcEdit; IV.2.4.2, Authority Toolkit; and IV.4.3, Revisions to OCLC handling of HTTP URIs).

Outreach, advocacy, and training will be a core goal of phase 2. The TG is planning to work closely with stakeholders, such as other PCC committees, to influence cataloging policies and best practices that have been identified as problematic for the implementation of URIs in MARC.

Training needs related to implementation (for example, how to obtain URIs, or the difference between authorities and real world objects) will be communicated to the PCC Standing Committee on Training so that appropriate training can be either identified or developed.

Though MARC is the most prominently used schema for library metadata, it is frequently used alongside many others that may or may not allow for the inclusion of URIs. Additional concerns include the maintenance of identifiers, recommendations relating to reconciliation, and possible ILS functional requirements. The TG on URIs in MARC recommends that new TGs be formed concerning URIs for non-MARC metadata.

VI. REFERENCES

1. The subgroup Work IDs in MARC has identified potential fields and scenarios to accommodate a work identifier (or multiple work identifiers). Consideration has been given to legacy data; to whether a work identifier (ID) is already established in an authority format or not (7XX $t, 1XX/240); to an unambiguous relationship of a work ID among various vocabularies (024); and to relationships among variants of a work, etc. The subgroup will present recommendations to the community in 2017.

Links: Meetings of the MARC Advisory Committee, Agendas and Minutes

2015-06 MAC meeting: http://www.loc.gov/marc/mac/an/2015_age.html ; http://www.loc.gov/marc/mac/minutes/an-15.html

2016-01 MAC meeting: http://www.loc.gov/marc/mac/mw/2016_age.html ; http://www.loc.gov/marc/mac/minutes/mw-16.html

2016-06 MAC meeting: http://www.loc.gov/marc/mac/an/2016_age.html ; http://www.loc.gov/marc/mac/minutes/an-16.html

Papers

Informal discussion paper: URIs in MARC: A Call for Best Practices (Steven Folsom, Discovery Metadata Librarian, Cornell University). https://docs.google.com/document/d/1fuHvF8bXH7hldY_xJ7f_xn2rP2Dj8o-Ca9jhHghIeUg/edit?pli=1

Discussion Paper No. 2016-DP04: Extending the Use of Subfield $0 to Encompass Linking Fields in the MARC 21 Bibliographic Format (British Library). http://www.loc.gov/marc/mac/2016/2016-dp04.html

Discussion Paper No. 2016-DP05: Expanding the Definition of Subfield $w to Encompass Standard Numbers in the MARC 21 Bibliographic and Authority Formats (British Library). http://www.loc.gov/marc/mac/2016/2016-dp05.html

Discussion Paper No. 2016-DP17: Redefining Subfield $4 to Encompass URIs for Relationships in the MARC 21 Authority and Bibliographic Formats (British Library, in consultation with the PCC Task Group on URIs in MARC). http://www.loc.gov/marc/mac/2016/2016-dp17.html

Discussion Paper No. 2016-DP18: Redefining Subfield $0 to Remove the Use of Parenthetical Prefix "(uri)" in the MARC 21 Authority, Bibliographic, and Holdings Formats (PCC Task Group on URIs in MARC, in consultation with the British Library). http://www.loc.gov/marc/mac/2016/2016-dp18.html

Discussion Paper No. 2016-DP19: Adding Subfield $0 to Fields 257 and 377 in the MARC 21 Bibliographic Format and Field 377 in the MARC 21 Authority Format (PCC URI in MARC Task Group). http://www.loc.gov/marc/mac/2016/2016-dp19.html

MARC Format Overview, Status Information: http://www.loc.gov/marc/status.html

Examples for Sec. IV.2.1

This LC subject heading string is linked to three different authority records. The links are OCLC's ARNs. No single $0 could be output for this subject access point.

650 0 ǂa Neurologists<Link2068890> ǂz New Zealand<Link255121> ǂv Biography<Link4933801>

This medical subject string is linked to one authority record, although the controlling process links individual subfields. It is a candidate for output of a single $0 with a URI, because the links all refer to the single authority record. In the case of MeSH, unlike LCSH, the $0 subfield displays in Connexion. See OCLC record 957132118.

650 12 ǂa Neurology<Link(DNLM)D009462Q000266> ǂx history<Link(DNLM)D009462Q000266>

Displays as:
650 12 Neurology ǂx history ǂ0 (DNLM)D009462Q000266

So it could be output with a single $0 containing the corresponding URI for the MeSH heading.
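The distinction drawn by these two examples can be expressed as a simple rule: a single $0 can be output only when every controlled subfield links to the same authority record. The sketch below assumes a simplified in-memory representation of a controlled heading (a list of subfield code, text, and link values); it does not reflect OCLC's actual data structures, and the function name is hypothetical.

```python
from typing import List, Optional, Tuple

# (subfield code, text, authority link): a simplified stand-in for a controlled heading.
Subfield = Tuple[str, str, str]


def single_authority_link(subfields: List[Subfield]) -> Optional[str]:
    """Return the shared link if all subfields point to one authority record, else None."""
    links = {link for _code, _text, link in subfields}
    return links.pop() if len(links) == 1 else None


lcsh_heading = [
    ("a", "Neurologists", "2068890"),
    ("z", "New Zealand", "255121"),
    ("v", "Biography", "4933801"),
]
mesh_heading = [
    ("a", "Neurology", "(DNLM)D009462Q000266"),
    ("x", "history", "(DNLM)D009462Q000266"),
]

lcsh_link = single_authority_link(lcsh_heading)  # three different records
mesh_link = single_authority_link(mesh_heading)  # one shared record
```

Applied to the two headings above, the LCSH string yields no single link, while the MeSH string yields one control number from which a URI could be formulated.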


issues and problems with adding URIs in MARC: is it actually doable in the current systems that host MARC data?

III.1. Define and understand HTTP URI [Charge 1.1, 1.2; Charge 4.3]

According to a MARBI position paper published in 2009:

"The use of a URI instead of plain text is particularly applicable to situations where the value of the … element comes from a controlled vocabulary, which could be an authority list or formal thesaurus (e.g., a name from the LC Name Authority File or a topic for an LCSH heading), or any other list of controlled codes or terms (e.g., the MARC Code List for Languages)."

However, the goal of facilitating the transition from MARC to linked data now requires a more precise machine understanding of the data accessible from the URIs that have been added to MARC records.

The issue can be illustrated with an excerpt from the Library of Congress Name Authority record for Hillary Clinton, accessible at https://lccn.loc.gov/n93010903. Of particular interest is the list of 024 fields, which identify "standard number[s] or code[s] associated with the entity named in the 1xx field which cannot be accommodated in another field," according to the MARC Authority definition. All of the 024 fields copied below contain URIs pertaining to Hillary Clinton.

024 7_ |a http://www.wikidata.org/entity/Q6294 |2 uri
024 7_ |a http://dbpedia.org/resource/Hillary_Rodham_Clinton |2 uri
024 7_ |a http://viaf.org/viaf/54950123 |2 uri
024 7_ |a http://isni.org/isni/0000000122802598 |2 uri
024 7_ |a http://d-nb.info/gnd/119082101 |2 uri
024 7_ |a http://id.ndl.go.jp/auth/ndlna/00552567 |2 uri
024 7_ |a http://aut.nkp.cz/jn20000700317 |2 uri
024 7_ |a http://catalogue.bnf.fr/ark:/12148/cb12543158f |2 uri
024 7_ |a http://www.idref.fr/034705171 |2 uri
024 7_ |a http://datos.bne.es/resource/XX1725857 |2 uri
024 7_ |a http://id.sbn.it/af/IT%5CICCU%5CUBOV%5C804461 |2 uri
024 7_ |a http://cantic.bnc.cat/registres/CUCId/a11695705 |2 uri
024 7_ |a https://musicbrainz.org/artist/858a3d95-e1b2-4aac-8427-a99e391ce8c5 |2 uri
024 7_ |a http://www.imdb.com/name/nm0166921 |2 uri
024 7_ |a http://bioguide.congress.gov/scripts/biodisplay.pl?index=C001041 |2 uri
024 7_ |a http://www.nndb.com/people/022/000025944 |2 uri
024 7_ |a https://ballotpedia.org/Hillary_Clinton |2 uri
024 7_ |a https://www.freebase.com/m/0d06m5 |2 uri

The rows in the table can be partitioned into three categories:

• Near the bottom, the 024 fields with the peach-colored background are human-readable documents about Hillary Clinton. These are pages from popular resources maintained outside the library community, such as IMDB and BioGuide, which have been deemed authoritative by library catalogers and authority experts. In shorthand, these URIs are standard URLs for Web pages.

• The rows with the blue background are records derived from library authority files and more modern registries designed for similar purposes. They may be pages from library authority files published on the Web, human-readable views of machine-understandable RDF data, or raw RDF. But in one form or another, all of the URIs resolve to library authorities (or simply 'Authorities') that are about Hillary Clinton. The TG refers to these URIs as Authority URIs.

• The rows with the green background contain URIs that refer to Hillary Clinton directly, in a way that is technically distinct from documents about her. These URIs conform to linked data conventions described in standard Web documents such as "Cool URIs for the Semantic Web" [https://www.w3.org/TR/cooluris]. The data accessible from these URIs has been published by third parties as well as the library community, and encodes a rich domain model designed expressly for machine understanding. The TG refers to these URIs as Entity or Thing URIs.

According to linked data conventions, machine processes designed to construct meaningful statements and draw inferences from them require Thing URIs. When Thing URIs are defined for people and creative works, one desirable outcome would be a machine-understandable statement such as 'Hillary Clinton is the author of the book It Takes a Village.' With technology available in 2016, data accessible from Web page URIs may not be machine-understandable at all, and Authority URIs may be only partially understandable. The ambiguity of URIs illustrated by the 024 fields in the MARC Authority record is also present in MARC bibliographic records.

III.2. Identify issues/problems with adding URIs; whether it was actually doable [Charge 1, Charge 3]

A pilot test of inserting HTTP URIs in $0 in bibliographic and authority data emerged as one logical first step for the TG. It helped the TG understand which issues could easily be resolved in the near term, and the feasibility of inserting URIs in $0 in the MARC environment.

The Pilot Test began in February 2016. Members prepared sets of input data and worked with tool creators (MarcEdit and Authority Toolkit) to refine lookup algorithms for URI insertion in $0.

The enhanced data, with HTTP URIs embedded, were to be ingested into several integrated library systems for evaluation. This exercise helped the TG gain a cohesive understanding of the role of an identifier in the form of a dereferenceable URI deployed in $0 in the MARC environment.

Throughout the process, the TG began to frame the questions that might assist in the effort of transitioning MARC data to linked data, and reached possible resolutions where potential problems may reside, such as planning for MAC proposals in its first year.

Other issues were more long-term and may require in-depth discussion with broader community involvement: for instance, already-defined subfields such as $4 might have the potential to hold HTTP URIs. The repeatability, ambiguity, and significance of the ordinal sequence of subfields are less trivial and more complex.

In regard to bulk processing of insertions, system performance, and scalability, the Pilot Test also helped address SPARQL query adjustments on the server side. Though adding URIs by hand was the least desirable exercise, it could be inevitable, so the TG also began documenting resources that would assist such an endeavor.

The overall strategies that the TG adopted were carefully thought out in order to achieve iterative successes that will build confidence throughout the phases of implementation.

IV. PROBLEM STATEMENT

To encode data suitable for transformation into RDF triples, it is necessary to be able to identify in MARC the data elements corresponding to the subject, predicate, and object in each statement and/or to provide URIs for them. It quickly became apparent that the task is not simply to add subfields to allow URIs to be given (itself a non-trivial problem, given the limited number of unused subfields still available in MARC) but also to negotiate the often ambiguous semantics of MARC. The TG has sought to do this through a judicious combination of redefinition proposals, clarification of existing semantics, and best practice recommendations.
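As a minimal sketch of what this means in practice, the example below reads one 700 field, represented as a plain list of subfields, and emits a single N-Triples line. It assumes that $4 carries a predicate URI and $0 an object URI, as discussed in this report; the work URI used as the subject is a placeholder, the id.loc.gov URIs are illustrative values rather than data from a real record, and the function names are hypothetical.

```python
from typing import List, Optional, Tuple

Subfield = Tuple[str, str]  # (subfield code, value)


def first_http_uri(subfields: List[Subfield], code: str) -> Optional[str]:
    """Return the first value of the given subfield that looks like an HTTP(S) URI."""
    for sub_code, value in subfields:
        if sub_code == code and value.startswith(("http://", "https://")):
            return value
    return None


def field_to_ntriple(subject_uri: str, subfields: List[Subfield]) -> Optional[str]:
    """Build one N-Triples statement, or None if the predicate or object URI is missing."""
    predicate = first_http_uri(subfields, "4")
    obj = first_http_uri(subfields, "0")
    if predicate is None or obj is None:
        return None
    return f"<{subject_uri}> <{predicate}> <{obj}> ."


# Hypothetical 700 field: label in $a, relationship URI in $4, entity URI in $0.
field_700 = [
    ("a", "Clinton, Hillary Rodham"),
    ("4", "http://id.loc.gov/vocabulary/relators/aut"),
    ("0", "http://id.loc.gov/authorities/names/n93010903"),
]
triple = field_to_ntriple("http://example.org/work/it-takes-a-village", field_700)
```

The sketch makes the dependency visible: without a URI for the predicate as well as the object, no statement can be produced, and the subject has to come from somewhere outside the field itself.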

The aim is to arrive at best practices for incorporating HTTP URIs in MARC bibliographic and authority records without making major renovations to the MARC format (taking into consideration a cost/benefit analysis for an 'end of life' technology).

IV.1. Where to place URIs in the MARC structure ($0, $4)? [Charge 3]

The TG developed a pilot to examine the issues surrounding the addition of identifiers to MARC 21 data. The work included the identification of actionable source vocabularies and the creation of test record sets with dereferenceable URIs embedded in the data. A variety of formats were represented in the test data, and ILS vendors, programmers, system engineers, and discovery designers were consulted throughout the pilot to comment on the retrieval of actionable URIs and the appropriate policies for ensuring the data are actionable in MARC 21 data.

The TG also inventoried the MARC bibliographic and authority formats to identify MARC 21 fields that contain subfields capable of accommodating URIs. In the bibliographic format, subfields $0 and $4 were identified as existing candidates for containing URIs; subfields $0 and $4 were likewise candidates in the authority format. MARC 21 fields that might usefully contain a subfield for a URI, but which do not have one defined, were also noted.

The TG focused on subfields $0 and $4 for its first three MARC Discussion Papers, submitted to MAC at ALA Annual 2016.

IV.2. What difficulties are evidenced?

IV.2.1. Adding multiple $0 [Charge 1.2]

The nature and use of subfield $0 has evolved in MARC since the subfield was first implemented in 2007. In 2010 it was redefined and came to include standard numbers, including URIs, in addition to its original use for authority record control numbers.

However, MARC is not specific as to which parts of a controlled heading string correspond to the $0. Nothing in the MARC specification rules out one $0 subfield applying to one set of subfields in a heading while a different $0 applies to others. (To ameliorate this problem, we formed a MARC object/URI reconciliation subgroup to enumerate the subfields naming the object in each MARC field; see IV.2.2 below.) And because $0 is repeatable, it is possible to find multiple $0 values corresponding to the same heading subfields naming the same entity. Indeed, the latter practice is adopted by design in some implementations, notably that at the German National Library.

The existence of different use cases and practices for relating headings to $0 has emerged as an issue that will need to be considered as the TG's work proceeds. In the case of OCLC's heading control functionality related to LC names and LCSH, subfield $0 data is included as an XML tag attribute in each subfield XML tag covered by a particular authority record, and is repeated as many times as needed, depending on the number of subfields used to represent the name or subject. In the subsequent development of controlling for other authority files, the same approach has been taken, but instead retaining the same or different authority record control numbers in multiple $0 subfields. [See examples at end of document.]

This repeated use of $0 subfields containing the same authority record number, or different authority record numbers for different parts of a heading, runs contrary to the need that exists in an OCLC context for a single URI corresponding to the entire named entity given in the field. Extraneous $0 subfields are automatically deleted in WorldCat records in fields that are otherwise controlled to a particular authority file. However, this leaves unresolved the question of controlling via multiple source vocabularies within the same language of cataloging, which many see as a desirable medium-to-long-term objective. Given the investment in its development and the number of controlled headings in WorldCat, completely changing the heading control functionality within WorldCat is not feasible, so the TG and OCLC staff have considered other alternatives allowing for output of needed URIs in the format which libraries would prefer in the future.

IV.2.2. How to identify an RDF object in a MARC data field [Charge 4.3]

This emerged as an important need because the ability to identify a URI with its corresponding label is necessary to support both reconciliation of existing data and updates to those labels based on their association with an identifier. The only realistic way to make this identification was to document the correspondences on a field-by-field basis. Fortunately, this was very achievable for the majority of fields in widespread use. [Link to recommendations] The investigation revealed a number of issues relating to the identification of single entities vs. larger sets (series, conferences) and the alignment of MARC and RDA vocabularies.

IV.2.3. What did we find in identifying relationships/multiple relationships? [Charge 4.1]

IV.2.3.1. Relationships are expressed in MARC by a variety of means, including:
IV.2.3.1.1. Field tagging, either alone (e.g., 830) or in combination with indicators (e.g., 780/785)
IV.2.3.1.2. Subfield codes, e.g., 041
IV.2.3.1.3. Codes given in subfields, e.g., 700 $4
IV.2.3.1.4. Controlled or natural language text given in subfields, e.g., 700 $i

IV.2.3.2. Some of these fields are very tightly bound to legacy MARC definitions, structures, and data. Redesigning 041, for example, to be hospitable to URIs would require a complete reconception of that field.

IV.2.3.3. There is the greatest value in provisioning for URIs following a 7XX $4 $0 $1 pattern, with $4 repurposed to house URIs much as $0 now does. This approach seems to present a relatively low barrier to implementation while having widespread application in MARC.

IV.2.3.4. Multiple relationships can cause ambiguity where they are associated with multiple objects or multiple labels. In such cases we recommend the expedient of simply repeating the field in order to make the associations unambiguous.
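The recommendation to repeat the field can be sketched as follows. The example assumes that the pairing of each relationship with its object is already known (for instance, supplied by a cataloger) and is passed in explicitly; it then emits one field per pairing so that each field carries exactly one $4 and one $0. The field representation, the subfield order, the function name, and the example.org URIs are all assumptions made for illustration.

```python
from typing import List, Tuple

Subfield = Tuple[str, str]          # (subfield code, value)
Field = Tuple[str, List[Subfield]]  # (tag, subfields)
# One unambiguous association: (label, relationship URI, object URI).
Association = Tuple[str, str, str]


def one_field_per_association(tag: str, associations: List[Association]) -> List[Field]:
    """Emit a separate field for each association so $4 and $0 pair up unambiguously."""
    return [
        (tag, [("a", label), ("4", relationship), ("0", obj)])
        for label, relationship, obj in associations
    ]


# Hypothetical data: the same agent related to a resource in two ways.
fields = one_field_per_association("700", [
    ("Example, Agent", "http://id.loc.gov/vocabulary/relators/aut", "http://example.org/agent/1"),
    ("Example, Agent", "http://id.loc.gov/vocabulary/relators/ill", "http://example.org/agent/1"),
])
```

The cost of this approach is some redundancy in the record; the benefit is that a downstream process never has to guess which $4 belongs with which $0.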

IV.2.4. How one obtains URIs for various data sources depends on the linked data source (different data sources make their URIs available differently) and on interoperability between the data source and the cataloging tools being used

To help support obtaining the right URIs for its purposes in MARC, the TG has begun a document currently referred to as "Formulating and Obtaining URIs: A Guide to Commonly Used Vocabularies and Reference Sources". For commonly used vocabularies in MARC, we want to document where in the data source UI one can find the canonical URIs that, when dereferenced, provide data. Going forward, for each entry in the document, we want to explain whether a data source publishes its data as Authorities, Real World Objects, or both. We also want to document the methods available for machine access to the data: is the data published as Linked Data available through HTTP, through a SPARQL endpoint, as data dumps, etc.?

IV.2.4.1. MarcEdit [Charge 3]

In the summer of 2014, MarcEdit introduced a suite of tools designed to begin testing the feasibility of embedding linked data concepts into MARC records. Initially, the scope of the suite was limited to embedding HTTP URIs in the $0 in MARC fields 1xx, 6xx, and 7xx in bibliographic records. This initial work focused on integration with the US Library of Congress's id.loc.gov service, as well as OCLC's VIAF services, for resolution. However, over the past 2 years, and in response to many of the questions and issues surfaced through the TG, the Linking services have been expanded and revised to potentially support all use cases identified by this Task Force, as well as providing support for non-MARC21 users to configure the Linking tool for use with other MARC formats.

The MarcEdit Linking toolkit currently supports the generation of URIs for all fields identified by this Task Force, for authority and bibliographic records. The application utilizes a rules file that documents field processing and service configuration values. This allows MarcEdit to quickly make changes to the rules governing field processing, as well as adding support for new collections and linked data endpoints. As of this report (9/21/2016), the MarcEdit Linked Data tool supports resolution against the following linked data services:

1. US Library of Congress NAF
2. US Library of Congress LCSH
3. US Library of Congress Children's Subject Headings
4. US Library of Congress Demographic Group Terms
5. Thesaurus for Graphic Materials
6. US Library of Congress Genre/Form Terms
7. US Library of Congress Medium of Performance Thesaurus for Music
8. RDA Carrier Types
9. RDA Media Types
10. RDA Content Types
11. Getty Arts and Architecture Thesaurus
12. Getty ULAN
13. National Library of Medicine MeSH
14. OCLC FAST Headings
15. OCLC VIAF
16. German National Library (GND)
17. [15 national library name indexes via VIAF]
18. Japanese Diet Library

Additionally, users have the ability to configure their own linked data endpoints for use with MarcEdit, so long as the service in question supports SPARQL and JSON. There is presently a knowledge-base article at http://marcedit.reeset.net/editing-marcedits-linked-data-rules-file documenting how users can either add new collections or modify the rules used when processing a particular field.

Essentially, MarcEdit utilizes its rules file to configure MarcEdit's linked data platform to identify the proper index/service, normalization (for data query purposes), and subfields to utilize as part of any lookup process. Additionally, each rules block identifies when a field should be processed (i.e. only when used in a bibliographic record, used in an authority record, or both). For example, here is the definition for the 650 field:

    <field type="bibliographic">
      <tag>650</tag>
      <subfields>abvxyz</subfields>
      <ind2 value="0" vocab="lcsh" />
      <ind2 value="1" vocab="lcshac" />
      <ind2 value="2" vocab="mesh" />
      <ind2 value="7" vocab="none" />
      <index>2</index>
      <uri>0</uri>
      <special_instructions>subject</special_instructions>
    </field>

Each MarcEdit rules block is a small segment of XML that profiles field usage within a record. This is why MarcEdit's linking tool can be used with other flavors of MARC (like UNIMARC): the Linking service has no concept of MARC21, only of the ISO 2709 format, and the rules file provides that context.
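To show how such a rules block can drive processing, here is a minimal sketch in Python that reads the 650 definition and looks up the vocabulary for a given second indicator. It is not MarcEdit's implementation. The names `parse_rule` and `vocabulary_for` are hypothetical, the self-closing form of the `ind2` elements is assumed, and the `index` value is kept as an uninterpreted string because the report does not define it.

```python
import xml.etree.ElementTree as ET

# The 650 rules block from the report; self-closing <ind2/> elements are assumed.
RULE_650 = """
<field type="bibliographic">
  <tag>650</tag>
  <subfields>abvxyz</subfields>
  <ind2 value="0" vocab="lcsh" />
  <ind2 value="1" vocab="lcshac" />
  <ind2 value="2" vocab="mesh" />
  <ind2 value="7" vocab="none" />
  <index>2</index>
  <uri>0</uri>
  <special_instructions>subject</special_instructions>
</field>
"""


def parse_rule(xml_text):
    """Turn one rules block into a plain dictionary."""
    root = ET.fromstring(xml_text.strip())
    return {
        "record_type": root.get("type"),
        "tag": root.findtext("tag"),
        "subfields": list(root.findtext("subfields") or ""),
        "vocab_by_ind2": {e.get("value"): e.get("vocab") for e in root.findall("ind2")},
        "index": root.findtext("index"),  # meaning not interpreted here
        "uri_subfield": root.findtext("uri"),
    }


def vocabulary_for(rule, second_indicator):
    """Return the vocabulary code for a second indicator, or None if unmapped."""
    return rule["vocab_by_ind2"].get(second_indicator)


rule = parse_rule(RULE_650)
print(rule["tag"], vocabulary_for(rule, "2"))
```

Because everything MARC-specific lives in the data rather than in the code, the same parser would work unchanged for a rules block written for another MARC flavor.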

This approach has allowed MarcEdit to quickly profile and examine the implications of developing URIs for linking fields like the 880 field, which provide some unique challenges but can be accommodated via the current rules file format.

Utilizing the current process, MarcEdit's linking tool can accommodate a wide range of linking scenarios, for example:

[Figure: in an authority record]

[Figure: within a bibliographic record]

[Figure: across diverse vocabularies]

Current development on the tool will continue to focus on the inclusion and support of additional vocabularies, on continuing to work with linked data providers around scalability issues (and ways in which MarcEdit, or services like it, can reduce impacts on their services), and on profiling this service to work with other flavors of MARC, like UNIMARC, to encourage further experimentation.

IV.2.4.2. Authority Toolkit [Charge 3]

The authority toolkit is a program for the construction and modification of authority records. One version is designed for use within OCLC's Connexion program for records in the LC/NACO authority file, but another version can work with records in files, and so with records from other sources. Both versions of the toolkit have the same capabilities. At an early stage the toolkit acquired the ability to test terms used in authority fields such as the 370 and 372 against vocabularies available at id.loc.gov (at present LCMPT, LCSH, LCDGT, AFSET, geographic area codes, RDA content terms, and the LC/NACO Authority File). Somewhat later it added the ability to verify terms against the MeSH vocabulary. (Additional vocabularies may be added in the future based on user requests.)

To perform this verification, the program needs to know which vocabularies are used to control terms in which parts of which authority fields, how to query the source to determine whether or not a term is defined, and how to react to the information returned by the source. The toolkit's actions are controlled above all by the subfield $2 code appearing in the same field as the term, but in the absence of a subfield $2 code, operator preferences come into play as well. (For example, an operator may prefer that an unlabeled term be tested against MeSH first and, if not found, tested against LCSH, or perhaps tested only against LCDGT.) A detailed description of the toolkit's process for verifying the content of authority fields can be found in the program's documentation at http://files.library.northwestern.edu/public/oclc/documentation/ (section "verifymenu").
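The precedence described above, where a subfield $2 code wins and operator preferences apply only when it is absent, can be sketched in a few lines of Python. This is an illustration only and not the toolkit's code: `vocabularies_to_try`, `verify_term`, and the in-memory `SAMPLE_TERMS` lookup table are hypothetical stand-ins for real vocabulary services.

```python
def vocabularies_to_try(source_code, operator_preferences):
    """A $2 code, when present, is the only vocabulary tested;
    otherwise the operator's ordered preference list is used."""
    if source_code and source_code.strip():
        return [source_code.strip().lower()]
    return list(operator_preferences)


def verify_term(term, source_code, operator_preferences, lookup):
    """Return the first vocabulary in which the term is found, else None."""
    for vocab in vocabularies_to_try(source_code, operator_preferences):
        if lookup(vocab, term):
            return vocab
    return None


# Hypothetical stand-in for remote vocabulary services.
SAMPLE_TERMS = {
    "mesh": {"Neurology"},
    "lcsh": {"Neurologists", "Neurology"},
}


def sample_lookup(vocab, term):
    return term in SAMPLE_TERMS.get(vocab, set())


# Unlabeled term: try MeSH first, then LCSH.
print(verify_term("Neurologists", None, ["mesh", "lcsh"], sample_lookup))
```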

If the toolkit's search for an entire term is successful, the toolkit can easily supply the corresponding URI and add it to the authority record in subfield $0. This URI may be contained in the data provided by the source, or it can be constructed mechanically once the toolkit has extracted the appropriate identifier. As part of experimentation encouraged by the TG, on January 15, 2016, the toolkit acquired an option to add subfield $0 to fields which could be verified. (This option is described in the documentation at http://files.library.northwestern.edu/public/oclc/documentation/, section "optionsverification0".) If a field contains more than one term, the toolkit must divide the field into multiple fields (one for each term) before it can add subfield $0.
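Constructing a URI mechanically from an extracted identifier amounts to filling in a per-vocabulary template. The Python sketch below illustrates the idea and is not the toolkit's own logic: `build_uri` and `URI_TEMPLATES` are hypothetical names, the templates reflect the public URI patterns of id.loc.gov and id.nlm.nih.gov as an assumption, and the LCSH identifier is only a sample value.

```python
# Assumed URI patterns; verify against each service before relying on them.
URI_TEMPLATES = {
    "lcnaf": "http://id.loc.gov/authorities/names/{id}",
    "lcsh": "http://id.loc.gov/authorities/subjects/{id}",
    "mesh": "http://id.nlm.nih.gov/mesh/{id}",
}


def build_uri(vocab, identifier):
    """Build a URI from a vocabulary code and a record identifier.
    Internal whitespace (as found in some control numbers) is removed."""
    template = URI_TEMPLATES.get(vocab)
    if template is None:
        raise KeyError(f"no URI template configured for vocabulary {vocab!r}")
    normalized = "".join(identifier.split())
    return template.format(id=normalized)


print(build_uri("mesh", "D009462"))
print(build_uri("lcsh", "sh 85090969"))  # sample identifier
```

A constructed URI is only as good as the template, so it is worth dereferencing a sample of results before writing them into records.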

The following illustration shows an authority record as verified by the authority toolkit, with the option to add subfield $0 during verification turned on. (For this experiment, subfield $0 was locally defined for some fields.)

[Figure: authority record verified by the authority toolkit, with subfield $0 added]

Although the toolkit can often discover information about compound terms (such as some corporate bodies with subordinate units, and some LCSH headings) for which an authority record exists for some parts but not all, the toolkit cannot supply subfield $0. (There is no authority record, and so no URI, that represents the entire term.) The toolkit also cannot add subfield $0 to fields that contain multiple terms if the field contains an aggregation of terms rather than a collection of independent items. (Example: the toolkit cannot add subfield $0 to the 382 field.)

The task of discovering that a term given in an authority record is defined in an external vocabulary is made more difficult because the searching mechanisms available do not always compensate appropriately for operator variations in punctuation, capitalization, and the use of combining diacritics. In addition, the response time experienced by the toolkit can vary widely, even for the same term searched repeatedly within a brief time, and some services are unavailable over the weekend. If the potential of linked data is to be enjoyed, services providing data must ensure that their entry mechanisms are robust, flexible, and available at all times.
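One way a client can compensate for these variations is to reduce both the operator's term and the candidate labels to a common comparison key before matching. The following Python sketch shows one possible normalization. It is not the normalization used by the toolkit or by any of the services named here, and `normalize_term` is a hypothetical name.

```python
import unicodedata


def normalize_term(term):
    """Build a comparison key that ignores diacritics, case, punctuation,
    and runs of whitespace."""
    # Decompose so that precomposed and combining forms become identical.
    decomposed = unicodedata.normalize("NFKD", term)
    # Drop combining marks (diacritics).
    without_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    folded = without_marks.casefold()
    # Replace punctuation with spaces, then collapse whitespace.
    spaced = "".join(ch if ch.isalnum() or ch.isspace() else " " for ch in folded)
    return " ".join(spaced.split())


print(normalize_term("Brontë, Charlotte,"))
```

Such a key is suitable only for finding candidates. The label stored in the record should still be taken from the source vocabulary, since stripping diacritics can conflate distinct terms.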

IV.2.4.3. Lookup online (e.g. VIAF, Getty ULAN, Geonames, Wikidata)

Online lookup requires manual operation. Users must be well versed in the SPARQL queries that individual services provide. Getty ULAN works differently from Geonames and Wikidata. The URI returned from a query may not be an RDF URI, but one that lands the user on a Web page or document.

IV.3. What did we learn [Charge 1, Charge 3]

IV.3.1. Tackle low-hanging fruit/what can we do in 1 year

The TG's activities during Year 1 were designed to position the MARC community to take tangible steps toward incorporating linked data URIs into its processes within an achievable timeframe. Therefore the TG put aside some tasks, such as the overhaul of certain legacy MARC data elements, that would have delayed progress with the TG's practical objectives. The tool development undertaken by Terry Reese and Gary Strawn was designed to advance these objectives, but so were the Formulating URIs document and the MARC object/URI reconciliation work, both of which document information that will be needed by other stakeholders, and the work IDs in MARC proposal, which seeks to remove one of the main barriers to routine incorporation of work identifiers in MARC records.

IV.3.2. Add $0 where it's not defined (not simple)

One of the TG's goals was also to identify fields that currently do not have $0 defined and to add it. The TG found the following MARC fields that needed $0 defined:

bibliographic: 046, 257, 260/264, 375, 753
authority: 046, 360, 375, 377, 663, 680, 681

These fields do not offer an easy resolution when considering $0, which reflects the resource object for an entity described. The TG conducted thorough analyses and concluded that only 257 and 377 could contain a URI with an unambiguous correspondence between the field and the object it represents, leaving out more complicated cases, e.g. fields 264 (Production, Publication, Distribution, Manufacture, and Copyright Notice) and 382 (Medium of Performance).

One of the issues confronted in drafting Discussion Paper 2016-DP19 was the extent of effort needed to individually propose subfield $0 for MARC 21 fields that do not contain it. MAC accepted the paper as a proposal, and there was agreement "that similar changes such as those recommended in this paper might in the future be considered as part of a MARC Fast-Track process." Being able to fast-track proposals for defining subfield $0 in fields which do not contain it will considerably streamline the process in the future.

IV.3.3. Strategies in light of the limited life cycle of the MARC environment

Though many may see MARC as "dead", the system remains a viable tool that delivers metadata for data discovery. It is also, however, a legacy format that reflects in its somewhat baroque structure a long history of accretion to meet varied and changing needs. In pursuing its goals, the TG has adopted a strategy of pursuing changes that can be applied coherently across MARC and that maximize return on the library community's investment of effort. There are economical and sensible approaches to determining what to do. The TG always kept in mind that its recommendations must cause the least disruption for data transition from MARC to linked data. A wholesale insertion of HTTP URIs is unlikely to be possible, though insertion is possible in most, but not all, MARC fields and/or subfields.

The TG is committed to working through a list of tasks and identifying viable solutions. While $0, after one year's deliberation, seemed a straightforward solution for a URI representing a resource object, more discussion is needed with regard to the predicate that denotes a relationship. MARC data have not been consistent in expressing relationships. The combination of fields, indicators, and subfields adds complexity to the process.

IV.3.4. ILS analysis results

Some ILSs would not load the processed records because of the presence of $0. Others loaded but did nothing with the data.

The TG members mocked up files of bibliographic and authority data, adding various URIs in subfield $0 wherever subfield $0 is currently defined in MARC. These files were uploaded into a number of ILSs to see if the addition of subfield $0 with URIs caused problems. No significant problems were found. These files included URIs in subfield $0 which were not prefixed with the (uri) identifier.

In OCLC, the same $0 subfields were also not problematic. OCLC's validation of subfield $0 does not check the structure of subfield $0 in the same way as it does for control numbers in 760-787 subfield $w or URLs in $u subfields. Use of URIs in subfield $4 to express relationship information would require a change to OCLC's validation of $4 subfields, but that may be readily changed without extensive effort.

IV.3.5. Tools needed: MarcNext, Authority Toolkit

Currently, the TG has tested and continues to work with MarcNext and the Authority Toolkit. The TG members continue collecting and recording additional tools and resources that help practitioners identify and validate an RDF URI.

IV.3.6. Need to be able to easily report duplicates found in VIAF etc., and need a way to know which URI to use when duplicates are found

Throughout the first year of investigation and deliberation, the TG learned that, though vocabularies and ontologies are structured per standards and published for adoption, some are more domain-specific than others. Often there is more than one method to structure a body of data. Duplications can be expected across various datasets. The reconciliation of URIs is one of the tasks that the TG has recognized, yet it is not in a position to recommend a solution in the near term.

IV.4. Outcomes

IV.4.1. MAC Discussion Papers [Charge 4]

The TG was aware that some aspects of its intended goals were not yet accommodated by the MARC format. Following the defined workflows of MARC governance and standardization, the TG submitted several discussion papers to the MARC Advisory Committee (MAC). As an initial preparation, an informal discussion paper entitled "URIs in MARC: A Call for Best Practices" by Steven Folsom had been discussed during the June 2015 MAC meeting. It focused on subfield $0 (Authority record control number or standard number), its current usage, and its capability for URIs, and addressed some aspects of best practice. The paper generated extensive discussion, and there was broad agreement that the time was right for the library community to begin using URIs consistently. Steven Folsom was asked to cooperate with the PCC to develop a formal MAC Discussion Paper.

In fall 2015, the British Library (BL) submitted two papers to MAC for the January 2016 meeting, independently of the TG, covering title-to-title relationships via subfield $w and specific relationship information, then discussed using subfield $0. The approaches taken by the BL in its papers, coupled with the approach taken by the TG, resulted in MAC suggesting that the British Library and the PCC should collaborate on submitting a paper for June 2016.

During the MAC meetings at the ALA Annual Conference in Orlando in June 2016, three papers were presented by or in cooperation with the TG. Discussion Paper No. 2016-DP18, entitled "Redefining Subfield $0 to Remove the Use of Parenthetical Prefix "(uri)" in the MARC 21 Authority, Bibliographic, and Holdings Formats", described the syntactical improvement that a subfield $0 containing a URI without the parenthetical prefix "(uri)" would allow, so that automated processes could use the content of these $0s without having to strip away a prefix. The discussion paper was discussed at the MAC meeting, and the recommendation was made that it be upgraded to proposal status; it was approved at the meeting as a proposal. From now on, a $0 containing an identifier in the form of a web retrieval protocol, e.g. an HTTP URI, should not be given a parenthetical prefix.
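For legacy data that still carries the prefix, the stripping step that the proposal makes unnecessary for new data can be sketched as below. This is a minimal Python illustration with hypothetical function names. It removes only the literal "(uri)" prefix and deliberately leaves other parenthetical source codes, such as "(DNLM)", untouched, since those mark control numbers rather than URIs.

```python
import re

# Matches only a leading "(uri)" marker, case-insensitively.
_URI_PREFIX = re.compile(r"^\(uri\)\s*", re.IGNORECASE)


def strip_uri_prefix(value):
    """Remove a leading "(uri)" from a $0 value; return other values unchanged."""
    return _URI_PREFIX.sub("", value, count=1)


def is_http_uri(value):
    """True when the value starts with an HTTP or HTTPS scheme."""
    return value.startswith(("http://", "https://"))


legacy = "(uri)http://id.nlm.nih.gov/mesh/D009462"
cleaned = strip_uri_prefix(legacy)
print(cleaned, is_http_uri(cleaned))
```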

A second paper was presented to the MAC: Discussion Paper No. 2016-DP19, "Adding Subfield $0 to Fields 257 and 377 in the MARC 21 Bibliographic Format and Field 377 in the MARC 21 Authority Format". It resulted from extensive analyses of the MARC Bibliographic and Authority formats by the TG, selecting fields which are to be controlled by an identifier. Only those fields where an identifier can be applied with clear correspondence between the field and one entity were included in the paper. The discussion paper was discussed at the MAC meeting, and the recommendation was made that it be upgraded to proposal status; it was also approved at the meeting as a proposal. Both changes will be included in Update 23 to the MARC documentation, expected in fall 2016.

The third paper, Discussion Paper No. 2016-DP17, "Redefining Subfield $4 to Encompass URIs for Relationships in the MARC 21 Authority and Bibliographic Formats", was presented by the British Library in consultation with the TG. This paper generated lively discussion. It was acknowledged that the approach to recording URIs for relationships using subfield $4 was preferable to any of the other alternatives outlined by the paper. The distinction between relator codes and relationship codes in the MARC format was questioned. As of now, an across-the-board solution for recording URIs for any data element in MARC, subfield or field, seems to be preferred by NDMSO over what it regards as an ad hoc solution for single elements. This discussion will be continued. This paper should not be considered in isolation, but rather in the context of the other papers which the TG is in the process of submitting. Taken as a whole, it is hoped that they will achieve the comprehensive solution which is sought throughout the MARC formats.

IV.4.2. Formulating & Obtaining URIs document [Charge 3.2]

A draft document was prepared for commonly used sources for authorities and identifiers. For each source, screen captures were made showing where a URI could be found for a particular entity, or how to formulate a URI once the identifier for the entity is known. Before making this document available widely, it must be determined how best to organize it. Some resources provide URIs that directly represent a thing, and others provide URIs that reference an authority (e.g. controlled or standard vocabularies, which may or may not have underlying metadata about the thing) or a resource describing a thing. The document needs to be able to make this distinction and inform catalogers which URIs are for real world objects and which are not. In order to be helpful to developers building tools, the document is also intended to include descriptions of how data sources provide machine access to the data: is the data published as Linked Data available through HTTP, through a SPARQL endpoint, as data dumps, etc.? Another issue that must be determined is where to put the final document and how it will be maintained. Should it be cooperatively maintained by the community (such as on a wiki), or should some group within PCC take responsibility for keeping it up to date and adding to it?

IV.4.3. Revisions to OCLC handling of HTTP URIs [Charge 3.1]

The question arises as to whether it would be better for catalogers to enter all needed URIs directly into the shared bibliographic record in WorldCat, or whether OCLC should provide options for output of URIs based on data present in particular MARC fields and profiled library preferences. Clearly, some libraries will embrace use of URIs for their web-based catalogs, while others may find them problematic in local displays of bibliographic information. OCLC staff have looked into the issue and believe that the use of output options would likely produce more consistent results, as well as meet the varying needs of libraries.

The TG members are drafting a spreadsheet outlining the subfields that together name an entity for which a corresponding URI could be added in subfield $0. That spreadsheet will be useful as the basis for future specifications for use by OCLC system developers. It will allow for a comparison of what is desired by the PCC cataloging community, in terms of URIs corresponding to the entire named entity, versus the existing use of subfield $0 and subfield-$0-like information used in OCLC heading controlling functionality. That heading control functionality allows for control numbers in multiple $0 subfields corresponding to different parts of a named entity, i.e. corporate name hierarchies, names and titles, subjects and separately controlled subdivisions, etc. These are cases where output of multiple URIs corresponding only to part of the named entity would not be preferred.

OCLC cataloging policies in this area are expected to evolve as this TG makes recommendations and OCLC development work moves ahead on the proposed output options for URIs.

IV.5. Next steps and in-depth analyses in year 2 [Charge 3, Charge 4]

In 2016-2017 the TG will continue an agenda focused on practical outcomes. Work is already well advanced on several of the following items:

IV.5.1. In collaboration with OCLC, develop a specification for outputting URIs based on internal linkages present in WorldCat data
IV.5.2. Complete the MARC object/URI reconciliation document and seek to incorporate the information into formal MARC documentation
IV.5.3. Produce work ID recommendation and use it in pilot implementation
IV.5.4. Produce discussion paper or proposal for handling relationships in MARC
IV.5.5. Consider additional targeted reconciliation projects
IV.5.6. In consultation with stakeholders, evaluate need for additional MARC proposals or best practices
IV.5.7. RWO recommendations
IV.5.8. Identify "homes" in PCC or elsewhere for aspects of the TG's work that will need further exploration or continuing upkeep
IV.5.9. Outreach, advocacy, training
IV.5.10. Etc.

V. RECOMMENDATIONS TO STAKEHOLDERS

During its first year, the TG was very much focused on the needs and interests of the many different stakeholders. This is reflected both in the outcomes of the work completed so far (see Sec. IV.4, Outcomes) and in the plans laid out for year 2 (see Sec. IV.5, Next steps and in-depth analyses in year 2). After careful consideration, the TG proposes the implementation of URIs in MARC for the near term. The sooner this process can begin, the sooner the data providers, e.g. libraries, can produce data that can be more easily transformed into linked data. In order to facilitate progress towards this goal, the TG developed the recommendations already outlined in the report above, such as the spreadsheet identifying the phase 1 entities for identities, i.e. the subfields that together name an entity in each MARC field (see Sec. IV.4.3, Revisions to OCLC handling of HTTP URIs), and the draft document "Formulating and Obtaining URIs: A Guide to Commonly Used Vocabularies and Reference Sources". The TG hopes that this document could be used as a starting point to develop an official list of PCC-sanctioned initial source vocabularies for embedding URIs.

For the sake of consistency, expediency, and accuracy, it is advisable to use automated processes for populating MARC records with URIs. Having individual catalogers do this work manually is not a desirable practice and could be less efficient. Several possible ways to accomplish this goal have been outlined in this report (see Secs. IV.2.4.1, MarcEdit; IV.2.4.2, Authority Toolkit; and IV.4.3, Revisions to OCLC handling of HTTP URIs).

Outreach, advocacy, and training will be a core goal of phase 2. The TG is planning on working closely with stakeholders, such as other PCC committees, to influence cataloging policies and best practices that have been identified as problematic for the implementation of URIs in MARC.

Training needs related to implementation (for example, how to obtain URIs, or the difference between authorities and real world objects) will be communicated to the PCC Standing Committee on Training so that appropriate training can be either identified or developed.

Though MARC is the most prominently used schema for library metadata, it is frequently used alongside many others that may or may not allow for the inclusion of URIs. Additional concerns are the maintenance of identifiers, recommendations in relation to reconciliation, and possible ILS functional requirements. The TG on URIs in MARC recommends that new TGs be formed concerning URIs for non-MARC metadata.

VI. REFERENCES

1. The subgroup Work IDs in MARC has identified potential fields and scenarios to accommodate a work identifier (or multiple work identifiers). Consideration has been given to legacy data; to whether or not a work identifier (ID) is already established in an authority format (7XX $t, 1XX/240); to an unambiguous relationship of a work ID among various vocabularies (024); to relationships among variants of a work; etc. The subgroup will present recommendations to the community in 2017.

Links: Meetings of the MARC Advisory Committee, Agendas and Minutes

2015-06 MAC meeting:
http://www.loc.gov/marc/mac/an/2015_age.html
http://www.loc.gov/marc/mac/minutes/an-15.html

2016-01 MAC meeting:
http://www.loc.gov/marc/mac/mw/2016_age.html
http://www.loc.gov/marc/mac/minutes/mw-16.html

2016-06 MAC meeting:
http://www.loc.gov/marc/mac/an/2016_age.html
http://www.loc.gov/marc/mac/minutes/an-16.html

Papers

Informal discussion paper: URIs in MARC: A Call for Best Practices (Steven Folsom, Discovery Metadata Librarian, Cornell University). https://docs.google.com/document/d/1fuHvF8bXH7hldY_xJ7f_xn2rP2Dj8o-Ca9jhHghIeUg/edit?pli=1

Discussion Paper No. 2016-DP04: Extending the Use of Subfield $0 to Encompass Linking Fields in the MARC 21 Bibliographic Format (British Library). http://www.loc.gov/marc/mac/2016/2016-dp04.html

Discussion Paper No. 2016-DP05: Expanding the Definition of Subfield $w to Encompass Standard Numbers in the MARC 21 Bibliographic and Authority Formats (British Library). http://www.loc.gov/marc/mac/2016/2016-dp05.html

Discussion Paper No. 2016-DP17: Redefining Subfield $4 to Encompass URIs for Relationships in the MARC 21 Authority and Bibliographic Formats (British Library, in consultation with the PCC Task Group on URIs in MARC). http://www.loc.gov/marc/mac/2016/2016-dp17.html

Discussion Paper No. 2016-DP18: Redefining Subfield $0 to Remove the Use of Parenthetical Prefix "(uri)" in the MARC 21 Authority, Bibliographic, and Holdings Formats (PCC Task Group on URIs in MARC, in consultation with the British Library). http://www.loc.gov/marc/mac/2016/2016-dp18.html

Discussion Paper No. 2016-DP19: Adding Subfield $0 to Fields 257 and 377 in the MARC 21 Bibliographic Format and Field 377 in the MARC 21 Authority Format (PCC URI in MARC Task Group). http://www.loc.gov/marc/mac/2016/2016-dp19.html

MARC Format Overview: Status Information. http://www.loc.gov/marc/status.html

Examples for Sec. IV.2.1

This LC subject heading string is linked to three different authority records. The links are OCLC's ARNs. No single $0 could be output for this subject access point.

650 0 ǂa Neurologists<Link2068890> ǂz New Zealand<Link255121> ǂv Biography<Link4933801>

This medical subject string is linked to one authority record, although the controlling process links individual subfields. It is a candidate for output of a single $0 with a URI, because the links all refer to the single authority record. In the case of MeSH, unlike LCSH, the $0 subfield displays in Connexion. See OCLC record 957132118.

650 12 ǂa Neurology<Link(DNLM)D009462Q000266> ǂx history<Link(DNLM)D009462Q000266>

Displays as:

650 12 Neurology ǂx history ǂ0 (DNLM)D009462Q000266

So it could be output with a single $0 containing the corresponding URI for the MeSH heading.
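The decision illustrated by these two examples, namely that a single $0 can be output only when every controlled subfield links to the same authority record, can be sketched as a small Python function. The tuple layout and the name `single_link` are assumptions for illustration and do not describe OCLC's internal data model. The link values are those shown in the examples above.

```python
def single_link(controlled_subfields):
    """Given (subfield_code, text, link) tuples for one heading, return the
    shared link if all subfields point to the same record, else None."""
    links = {link for _code, _text, link in controlled_subfields}
    return links.pop() if len(links) == 1 else None


lcsh_heading = [
    ("a", "Neurologists", "2068890"),
    ("z", "New Zealand", "255121"),
    ("v", "Biography", "4933801"),
]
mesh_heading = [
    ("a", "Neurology", "(DNLM)D009462Q000266"),
    ("x", "history", "(DNLM)D009462Q000266"),
]

print(single_link(lcsh_heading))  # no single record covers the whole string
print(single_link(mesh_heading))
```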


authoritative by library catalogers and authority experts. In shorthand, these URIs are standard URLs for Web pages.

• The rows with the blue background are records derived from library authority files and more modern registries designed for similar purposes. They may be pages from library authority files published on the Web, human-readable views of machine-understandable RDF data, or raw RDF. But in one form or another, all of the URIs resolve to library authorities (or simply 'Authorities') that are about Hillary Clinton. The TG refers to these URIs as Authority URIs.

• The rows with the green background contain URIs that refer to Hillary Clinton directly, in a way that is technically distinct from documents about her. These URIs conform to linked data conventions described in standard Web documents such as "Cool URIs for the Semantic Web" [https://www.w3.org/TR/cooluris/]. The data accessible from these URIs has been published by third parties as well as the library community, and encodes a rich domain model designed expressly for machine understanding. The TG refers to these URIs as Entity or Thing URIs.

According to linked data conventions, machine processes designed to construct meaningful statements and inferences from them require Thing URIs. When Thing URIs are defined for people and creative works, one desirable outcome would be a machine-understandable statement such as 'Hillary Clinton is the author of the book It Takes a Village.' With technology available in 2016, data accessible from Web page URIs may not be machine-understandable at all, and Authority URIs may be only partially understandable. The ambiguity of URIs illustrated by the 024 fields in the MARC Authority records is also present in MARC bibliographic records.

III.2. Identify issues/problems with adding URIs: whether it was actually doable [Charge 1, Charge 3]

A pilot test of inserting HTTP URIs in $0 in bibliographic and authority data emerged as one logical first step for the TG. It helped the TG understand which issues could easily be resolved in the near term, and the doability of inserting URIs in $0 in the MARC environment.

The Pilot Test began in February 2016. Members prepared sets of input data and worked with tool creators (MarcEdit and Authority Toolkit) to refine lookup algorithms for URI insertion in $0.

The enhanced data with HTTP URIs embedded were to be ingested into several integrated library systems for evaluation. This exercise helped the TG gain a cohesive understanding of the role of an identifier in the form of a dereferenceable URI deployed in $0 in the MARC environment.

Throughout the process, the TG began to frame the questions that might assist in the effort of transitioning MARC data to linked data, including reaching possible resolutions where potential problems may reside, such as planning for MAC proposals in its first year.

Other issues were more long-term and may require in-depth discussion with broader community involvement. For instance, subfields such as $4, which have already been defined, might have the potential to hold HTTP URIs. The repeatability, ambiguity, and significance of the ordinal sequence are less trivial and more complex.

In regard to bulk insertion, system performance, and scalability, the Pilot Test also helped address SPARQL query adjustments on the server side. Though adding URIs by hand was the least desirable exercise, it could be inevitable, so the TG also began documenting resources that would assist such an endeavor.

The overall strategies that the TG adopted were carefully thought out in order to achieve iterative successes that will build confidence throughout the phases of implementation.

IV. PROBLEM STATEMENT

To encode data suitable for transformation into RDF triples, it is necessary to be able to identify in MARC the data elements corresponding to the subject, predicate, and object in each statement and/or to provide URIs for them. It quickly became apparent that the task is not simply to add subfields to allow URIs to be given (itself a non-trivial problem given the limited number of unused subfields still available in MARC) but also to negotiate the often ambiguous semantics of MARC. The TG has sought to do this through a judicious combination of redefinition proposals, clarification of existing semantics, and best practice recommendations.

The aim is a set of best practices for incorporating HTTP URIs in MARC bibliographic and authority records without making major renovations to the MARC format (taking into consideration cost/benefit analysis for an 'end of life' technology).

IV.1. Where to place URIs in the MARC structure ($0, $4)? [Charge 3]

The TG developed a pilot to examine the issues surrounding the addition of identifiers to MARC 21 data. The work included identifying actionable source vocabularies and creating test record sets with dereferenceable URIs embedded in the data. A variety of formats were represented in the test data, and ILS vendors, programmers, system engineers, and discovery designers were consulted throughout the pilot to comment on the retrieval of actionable URIs and on appropriate policies for ensuring that the data are actionable in MARC 21.

The TG also inventoried the MARC bibliographic and authority formats to identify MARC 21 fields that contain subfields capable of accommodating URIs. In the bibliographic format, subfields $0 and $4 were identified as existing candidates for containing URIs; subfields $0 and $4 were also candidates in the authority format. MARC 21 fields that might usefully contain a subfield for a URI but which do not have one defined were also noted.

The TG focused on subfields $0 and $4 for its first three MARC Discussion Papers, submitted to MAC at ALA Annual 2016.

IV.2. What difficulties are evidenced?

IV.2.1. Adding multiple $0 [Charge 1.2]

The nature and use of subfield $0 have evolved in MARC since the subfield was first implemented in 2007. In 2010 it was redefined and came to include standard numbers, including URIs, in addition to its original use for authority record control numbers.

However, MARC is not specific as to which parts of a controlled heading string correspond to the $0. Nothing in the MARC specification rules out one $0 subfield applying to one set of subfields in a heading while a different $0 applies to others. (To ameliorate this problem, we formed a MARC object/URI reconciliation subgroup to enumerate the subfields naming the object in each MARC field; see IV.2.2 below.) And because $0 is repeatable, it is possible to find multiple $0 values corresponding to the same heading subfields, naming the same entity. Indeed, the latter practice is adopted by design in some implementations, notably that of the German National Library.

The existence of different use cases and practices for relating headings to $0 has emerged as an issue that will need to be considered as the TG's work proceeds. In the case of OCLC's heading control functionality related to LC names and LCSH, subfield $0 data is included as an XML tag attribute in each subfield XML tag covered by a particular authority record, and is repeated as many times as needed depending on the number of subfields used to represent the name or subject. In the subsequent development of controlling for other authority files, the same approach has been taken, but instead retaining the same or different authority record control numbers in multiple $0 subfields. [See examples at end of document.]

This repeated use of $0 subfields, containing the same authority record number or different authority record numbers for different parts of a heading, runs contrary to the need that exists in an OCLC context for a single URI corresponding to the entire named entity given in the field. Extraneous $0 subfields are automatically deleted in WorldCat records in fields that are otherwise controlled to a particular authority file. However, this leaves unresolved the question of controlling via multiple source vocabularies within the same language of cataloging, which many see as a desirable medium- to long-term objective. Given the investment in its development and the number of controlled headings in WorldCat, completely changing the heading control functionality within WorldCat is not feasible, so the TG and OCLC staff have considered other alternatives allowing for output of needed URIs in the format which libraries would prefer in the future.

IV.2.2. How to identify an RDF object in a MARC datafield [Charge 4.3]

This emerged as an important need because the ability to identify a URI with its corresponding label is necessary to support both reconciliation of existing data and updates to those labels based on their association with an identifier. The only realistic way to make this identification was to document the correspondences on a field-by-field basis. Fortunately, this was very achievable for the majority of fields in widespread use. [Link to recommendations] The investigation revealed a number of issues relating to the identification of single entities vs. larger sets (series, conferences) and the alignment of MARC and RDA vocabularies.

IV.2.3. What did we find in identifying relationships/multiple relationships? [Charge 4.1]

IV.2.3.1. Relationships are expressed in MARC by a variety of means, including:
IV.2.3.1.1. Field tagging, either alone (e.g. 830) or in combination with indicators (e.g. 780/785)
IV.2.3.1.2. Subfield codes, e.g. 041
IV.2.3.1.3. Codes given in subfields, e.g. 700 $4
IV.2.3.1.4. Controlled or natural language text given in subfields, e.g. 700 $i

IV.2.3.2. Some of these fields are very tightly bound to legacy MARC definitions, structures, and data. Redesigning 041, for example, to be hospitable to URIs would require a complete reconception of that field.

IV.2.3.3. There is the greatest value in provisioning for URIs following a 7XX$4$0$1 pattern, with $4 repurposed to house URIs much as $0 now does. This approach seems to present a relatively low barrier to implementation while having widespread application in MARC.

IV.2.3.4. Multiple relationships can cause ambiguity where they are associated with multiple objects or multiple labels. In such cases we recommend the expedient of simply repeating the field in order to make the associations unambiguous.
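
The pairing of a relationship URI in $4 with an object URI in $0 can be illustrated with a minimal sketch. It assumes a field is represented as an ordered list of (subfield code, value) pairs; the function name, the example.org URIs, and the rule used to flag ambiguity are hypothetical and are not taken from any MARC tool or from the TG's recommendations.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]


def field_to_triples(subject_uri: str, subfields: List[Tuple[str, str]]) -> List[Triple]:
    """Read $4 as predicate URIs and $0 as object URIs for one 7XX-style field.

    Assumption for this sketch: several relationships combined with several
    objects in one field cannot be paired reliably, so the field is rejected
    and should be repeated, one object per field.
    """
    predicates = [v for code, v in subfields if code == "4" and v.startswith("http")]
    objects = [v for code, v in subfields if code == "0" and v.startswith("http")]
    if len(predicates) > 1 and len(objects) > 1:
        raise ValueError("ambiguous field: repeat the field, one object per field")
    return [(subject_uri, p, o) for p in predicates for o in objects]


# Hypothetical work and person URIs; the relator URI is the id.loc.gov term for "author".
triples = field_to_triples(
    "http://example.org/work/1",
    [
        ("a", "Clinton, Hillary Rodham"),
        ("4", "http://id.loc.gov/vocabulary/relators/aut"),
        ("0", "http://example.org/person/1"),
    ],
)
print(triples)
```

A field carrying two $4 URIs and two $0 URIs raises an error in this sketch, which mirrors the recommendation to repeat the field rather than guess which relationship belongs to which object.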

IV.2.4. How one obtains URIs for various data sources depends on the linked data source (different data sources make their URIs available differently) and on interoperability between the data source and the cataloging tools being used.

To help support obtaining the right URIs for its purposes in MARC, the TG has begun a document currently referred to as Formulating and Obtaining URIs: A Guide to Commonly Used Vocabularies and Reference Sources. For commonly used vocabularies in MARC, we want to document where in the data source UI one can find the canonical URIs that, when dereferenced, provide data. Going forward, for each entry in the document we want to explain whether a data source publishes its data as Authorities, Real World Objects, or both. We also want to document the methods available for machine access to the data: is the data published as Linked Data, available through HTTP, available through a SPARQL endpoint, as data dumps, etc.?

IV.2.4.1. MarcEdit [Charge 3]

In the summer of 2014, MarcEdit introduced a suite of tools designed to begin testing the feasibility of embedding linked data concepts into MARC records. Initially, the scope of the suite was limited to embedding HTTP URIs in the $0 of MARC fields 1xx, 6xx, and 7xx in bibliographic records. This initial work focused on integration with the US Library of Congress's id.loc.gov service as well as OCLC's VIAF services for resolution. However, over the past two years, and in response to many of the questions and issues surfaced through the TG, the Linking services have been expanded and revised to potentially support all use cases identified by this Task Force, as well as to allow non-MARC21 users to configure the Linking tool for use with other MARC formats.

The MarcEdit Linking toolkit currently supports the generation of URIs for all fields identified by this Task Force for authority and bibliographic records. The application utilizes a rules file that documents field processing and service configuration values. This allows MarcEdit to quickly make changes to the rules governing field processing, as well as to add support for new collections and linked data endpoints. As of this report (9/21/2016), the MarcEdit Linked Data tool supports resolution against the following linked data services:

1. US Library of Congress NAF
2. US Library of Congress LCSH
3. US Library of Congress Children's Subject Headings
4. US Library of Congress Demographic Group Terms
5. Thesaurus for Graphic Materials
6. US Library of Congress Genre/Form Terms
7. US Library of Congress Medium of Performance Thesaurus for Music
8. RDA Carrier Types
9. RDA Media Types
10. RDA Content Types
11. Getty Art & Architecture Thesaurus
12. Getty ULAN
13. National Library of Medicine MeSH
14. OCLC FAST Headings
15. OCLC VIAF
16. German National Library (GND)
17. [15 national library name indexes via VIAF]
18. Japanese Diet Library

Additionally, users have the ability to configure their own linked data endpoints for use with MarcEdit, so long as the service in question supports SPARQL and JSON. There is presently a knowledge-base article at http://marcedit.reeset.net/editing-marcedits-linked-data-rules-file documenting how users can add new collections or modify the rules used when processing a particular field.

Essentially, MarcEdit uses its rules file to configure its linked data platform to identify the proper index/service, normalization (for data query purposes), and subfields to use as part of any lookup process. Additionally, each rules block identifies when a field should be processed (i.e., only when used in a bibliographic record, only when used in an authority record, or both). For example, here is the definition for the 650 field:

    <field type="bibliographic">
      <tag>650</tag>
      <subfields>abvxyz</subfields>
      <ind2 value="0" vocab="lcsh" />
      <ind2 value="1" vocab="lcshac" />
      <ind2 value="2" vocab="mesh" />
      <ind2 value="7" vocab="none" />
      <index>2</index>
      <uri>0</uri>
      <special_instructions>subject</special_instructions>
    </field>
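
To make the role of a rules block concrete, the following minimal sketch reads a block like the 650 definition and answers the question "which vocabulary should be queried for this second indicator value?". It is an illustration only and is not MarcEdit's code: the XML is a reconstruction of the example in this report, the function names are hypothetical, and the readings of `uri` as the target subfield and of `vocab="none"` as "do not look up" are assumptions.

```python
import xml.etree.ElementTree as ET

# Reconstruction of the 650 rules block shown in the report.
RULES_BLOCK = """
<field type="bibliographic">
  <tag>650</tag>
  <subfields>abvxyz</subfields>
  <ind2 value="0" vocab="lcsh" />
  <ind2 value="1" vocab="lcshac" />
  <ind2 value="2" vocab="mesh" />
  <ind2 value="7" vocab="none" />
  <index>2</index>
  <uri>0</uri>
  <special_instructions>subject</special_instructions>
</field>
"""


def parse_rule(xml_text):
    """Turn one rules block into a plain dictionary."""
    root = ET.fromstring(xml_text.strip())
    return {
        "record_type": root.get("type"),
        "tag": root.findtext("tag"),
        # Subfields whose text is combined to form the lookup string.
        "subfields": list(root.findtext("subfields")),
        # Second-indicator value -> vocabulary code.
        "vocab_by_ind2": {e.get("value"): e.get("vocab") for e in root.findall("ind2")},
        # Assumed meaning: the subfield that receives the URI.
        "uri_subfield": root.findtext("uri"),
    }


def vocab_for(rule, ind2):
    """Return the vocabulary to query, or None when no lookup applies."""
    vocab = rule["vocab_by_ind2"].get(ind2)
    return None if vocab in (None, "none") else vocab


rule = parse_rule(RULES_BLOCK)
print(rule["tag"], vocab_for(rule, "0"), vocab_for(rule, "2"), vocab_for(rule, "7"))
```

Because the field tag, the subfields, and the indicator-to-vocabulary mapping all come from the rules file, nothing in this kind of lookup depends on MARC 21 itself.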

Each MarcEdit rules block is a small segment of XML that profiles field usage within a record. This is why MarcEdit's linking tool can be used with other flavors of MARC (like UNIMARC): the Linking service has no concept of MARC21, only of the ISO 2709 format, and the rules file provides that context.

This approach has allowed MarcEdit to quickly profile and examine the implications of developing URIs for linking fields like the 880 field, which pose some unique challenges but can be accommodated via the current rules file format.

Utilizing the current process, MarcEdit's linking tool can accommodate a wide range of linking scenarios. For example:

[Figure: linking in an authority record]

[Figure: linking within a bibliographic record]

[Figure: linking across diverse vocabularies]

Current development on the tool will continue to focus on the inclusion and support of additional vocabularies, on continuing to work with linked data providers around scalability issues (and ways in which MarcEdit, or services like it, can reduce impacts on their services), and on profiling this service to work with other flavors of MARC, like UNIMARC, to encourage further experimentation.

IV.2.4.2. Authority Toolkit [Charge 3]

The authority toolkit is a program for the construction and modification of authority records. One version is designed for use within OCLC's Connexion program for records in the LC/NACO authority file, but another version can work with records in files, and so with records from other sources. Both versions of the toolkit have the same capabilities. At an early stage, the toolkit acquired the ability to test terms used in authority fields such as the 370 and 372 against vocabularies available at id.loc.gov (at present LCMPT, LCSH, LCDGT, AFSET, geographic area codes, RDA content terms, and the LC/NACO Authority File). Somewhat later, it added the ability to verify terms against the MeSH vocabulary.

(Additional vocabularies may be added in the future based on user requests.) To perform this verification, the program needs to know which vocabularies are used to control terms in which parts of which authority fields, how to query the source to determine whether or not a term is defined, and how to react to the information returned by the source. The toolkit's actions are controlled above all by the subfield $2 code appearing in the same field as the term, but in the absence of a subfield $2 code, operator preferences come into play as well. (For example, an operator may prefer that an unlabeled term be tested against MeSH first and, if not found, tested against LCSH; or perhaps tested only against LCDGT.) A detailed description of the toolkit's process for verifying the content of authority fields can be found in the program's documentation at http://files.library.northwestern.edu/public/oclc/documentation/ (under "verifymenu").

If the toolkit's search for an entire term is successful, the toolkit could easily supply the corresponding URI and add it to the authority record in subfield $0. This URI may be contained in the data provided by the source, or it could be constructed mechanically once the toolkit has extracted the appropriate identifier. As part of experimentation encouraged by the TG, on January 15, 2016, the toolkit acquired an option to add subfield $0 to fields which could be verified. (This option is described at http://files.library.northwestern.edu/public/oclc/documentation/ under "optionsverification0".) If a field contains more than one term, the toolkit must divide the field into multiple fields (one for each term) before it can add subfield $0.
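
The two steps described above, constructing a URI mechanically from an identifier and dividing a multi-term field so that each term can carry its own $0, might be sketched as follows. This is not the Authority Toolkit's code. The base URIs follow the pattern that id.loc.gov uses for name and subject authorities; the function names, the dictionary keys, the whitespace-only normalization, and the sample identifier are assumptions made for illustration.

```python
from typing import Callable, List, Optional, Tuple

Field = List[Tuple[str, str]]  # ordered (subfield code, value) pairs

# Base URIs for two id.loc.gov vocabularies; the keys are labels chosen for this sketch.
BASE_URIS = {
    "names": "http://id.loc.gov/authorities/names/",
    "subjects": "http://id.loc.gov/authorities/subjects/",
}


def build_uri(vocabulary: str, identifier: str) -> str:
    """Construct a URI from an identifier, dropping the spaces used in control numbers."""
    return BASE_URIS[vocabulary] + "".join(identifier.split())


def split_and_link(terms: List[str], vocabulary: str,
                   lookup: Callable[[str], Optional[str]]) -> List[Field]:
    """Emit one field per term; add $0 only when the lookup finds an identifier."""
    fields: List[Field] = []
    for term in terms:
        field: Field = [("a", term)]
        identifier = lookup(term)
        if identifier is not None:
            field.append(("0", build_uri(vocabulary, identifier)))
        fields.append(field)
    return fields


# A stand-in for a real vocabulary query; the identifier is made up.
fake_index = {"Poetry": "sh 00000001"}
for field in split_and_link(["Poetry", "Unverified term"], "subjects", fake_index.get):
    print(field)
```

A term that cannot be verified keeps its field but receives no $0, which matches the behavior described for terms with no authority record.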

The following illustration shows an authority record as verified by the authority toolkit, with the option to add subfield $0 during verification turned on. (For this experiment, subfield $0 was locally defined for some fields.)

[Figure: authority record verified by the authority toolkit, with subfield $0 added]

Although the toolkit can often discover information about compound terms (such as some corporate bodies with subordinate units and some LCSH headings) for which an authority record exists for some parts but not all, the toolkit cannot supply subfield $0 in such cases. (There is no authority record, and so no URI, that represents the entire term.) The toolkit also cannot add subfield $0 to fields that contain multiple terms if the field contains an aggregation of terms rather than a collection of independent items. (Example: the toolkit cannot add subfield $0 to the 382 field.)

The task of discovering that a term given in an authority record is defined in an external vocabulary is made more difficult because the searching mechanisms available do not always compensate appropriately for operator variations in punctuation, capitalization, and the use of combining diacritics. In addition, the response time experienced by the toolkit can vary widely, even for the same term searched repeatedly within a brief time, and some services are unavailable over the weekend. If the potential of linked data is to be enjoyed, services providing data must ensure that their entry mechanisms are robust, flexible, and available at all times.
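
One way a client or a service could compensate for such variations is to compare terms through a loose matching key. The sketch below is an assumption about how that might be done with the Python standard library; it does not describe any service's actual search behavior. The key is suitable only for matching and is not a replacement for the authorized form of a heading.

```python
import string
import unicodedata


def matching_key(term: str) -> str:
    """Build a loose comparison key for a heading or term.

    Steps: decompose characters (NFD), drop combining marks, replace ASCII
    punctuation with spaces, fold case, and collapse runs of whitespace.
    """
    decomposed = unicodedata.normalize("NFD", term)
    without_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    without_punct = "".join(" " if ch in string.punctuation else ch for ch in without_marks)
    return " ".join(without_punct.casefold().split())


# Precomposed and combining forms of the same name produce the same key.
print(matching_key("Bront\u00eb, Charlotte."))
print(matching_key("BRONTE\u0308 CHARLOTTE"))
```

Removing diacritics entirely is a deliberately aggressive choice; a stricter variant could apply NFC normalization instead, so that precomposed and combining forms match while the diacritics are kept.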

IV.2.4.3. Lookup online (e.g. VIAF, Getty ULAN, GeoNames, Wikidata)

Online lookup requires manual operation. Users must be well versed in the SPARQL queries that individual services provide. Getty ULAN works differently from GeoNames and Wikidata. The URI returned from a query may not be an RDF URI, but one that lands the user on a Web page or document.

IV.3. What did we learn? [Charge 1.3, Charge 3]

IV.3.1. Tackle low-hanging fruit: what can we do in 1 year?

The TG's activities during Year 1 were designed to position the MARC community to take tangible steps toward incorporating linked data URIs into its processes within an achievable timeframe. Therefore, the TG put aside some tasks, such as the overhaul of certain legacy MARC data elements, that would have delayed progress with the TG's practical objectives. The tool development undertaken by Terry Reese and Gary Strawn was designed to advance these objectives, but so were the Formulating URIs document and the MARC object/URI reconciliation work, both of which document information that will be needed by other stakeholders, and the work IDs in MARC proposal, which seeks to remove one of the main barriers to routine incorporation of work identifiers in MARC records.

IV.3.2. Add $0 where it's not defined (not simple)

One of the TG's goals was also to identify fields that currently do not have $0 defined and to add it. The TG found that the following MARC fields needed $0 defined:

bibliographic: 046, 257, 260/264, 375, 753
authority: 046, 360, 375, 377, 663, 680, 681

These fields do not lend themselves to easy resolution when considering a $0 that reflects the resource object for the entity described. The TG conducted thorough analyses and concluded that only 257 and 377 could contain a URI with an unambiguous correspondence between the field and the object it represents, leaving out more complicated cases, e.g. fields 264 (Production, Publication, Distribution, Manufacture, and Copyright Notice) and 382 (Medium of Performance).

One of the issues confronted in drafting Discussion Paper 2016-DP19 was the extent of effort needed to individually propose subfield $0 for MARC 21 fields that do not contain it. MAC accepted the paper as a proposal, and there was agreement "that similar changes, such as those recommended in this paper, might in the future be considered as part of a MARC Fast-Track process." Being able to fast-track proposals for defining subfield $0 in fields which do not contain it will considerably streamline the process in the future.

IV.3.3. Strategies in light of the limited life cycle of the MARC environment

Though many may see MARC as "dead," the system remains a viable tool that delivers metadata for data discovery. It is also, however, a legacy format that reflects in its somewhat baroque structure a long history of accretion to meet varied and changing needs. In pursuing its goals, the TG has adopted a strategy of pursuing changes that can be applied coherently across MARC and that maximize return on the library community's investment of effort. There are economical and sensible approaches to determining what to do. The TG always kept in mind that its recommendations must cause the least disruption for data transition from MARC to linked data. A wholesale insertion of HTTP URIs is unlikely to be possible; URIs can probably be accommodated in most, but not all, MARC fields and/or subfields.

The TG is committed to working through a list of tasks and identifying viable solutions. While $0, after one year's deliberation, seemed a straightforward solution for a URI representing a resource object, more discussion is needed with regard to the predicate that denotes a relationship. MARC data have not been consistent in expressing relationships. The combination of fields, indicators, and subfields adds complexity to the process.

IV.3.4. ILS analysis results

Some ILSs would not load the processed records because of the presence of $0. Others loaded but did nothing with the data.

The TG members mocked up files of bibliographic and authority data, adding various URIs in subfield $0 wherever subfield $0 is currently defined in MARC. These files were uploaded into a number of ILSs to see if the addition of subfield $0 with URIs caused problems. No significant problems were found. These files included URIs in subfield $0 which were not prefixed with the (uri) identifier.

In OCLC the same $0 subfields were also not problematic. OCLC's validation of subfield $0 does not check the structure of subfield $0 in the same way as it does for control numbers in 760-787 subfield $w or URLs in $u subfields. Use of URIs in subfield $4 to express relationship information would require a change to OCLC's validation of $4 subfields, but that may be readily changed without extensive effort.

IV.3.5. Tools needed: MarcNext, Authority Toolkit

Currently the TG has tested and continues to work with MarcNext and the Authority Toolkit. The TG members continue collecting and recording additional tools and resources that help practitioners identify and validate an RDF URI.

IV.3.6. Need to be able to easily report duplicates found in VIAF, etc., and need a way to know which URI to use when duplicates are found

Throughout the first year of investigation and deliberation, the TG learned that, though vocabularies and ontologies are structured per standards and published for adoption, some are more domain specific than others. Often there is more than one method to structure a body of data. Duplications can be expected across various datasets. The reconciliation of URIs is one of the tasks that the TG has recognized, yet it is not in a position to recommend a solution in the near term.

IV.4. Outcomes

IV.4.1. MAC Discussion Papers [Charge 4]

The TG was aware that some aspects of its intended goals were not yet accommodated by the MARC format. Following the defined workflows of MARC governance and standardization, the TG submitted several discussion papers to the MARC Advisory Committee (MAC). As an initial preparation, an informal discussion paper entitled URIs in MARC: A Call for Best Practices, by Steven Folsom, had been discussed during the June 2015 MAC meeting. It focused on subfield $0 (Authority record control number or standard number), its current usage, and its capability for URIs, and addressed some aspects of best practice. The paper generated extensive discussion, and there was broad agreement that the time was right for the library community to begin using URIs consistently. Steven Folsom was asked to cooperate with the PCC to develop a formal MAC Discussion Paper.

In fall 2015, the British Library (BL) submitted two papers to MAC for the January 2016 meeting, independently of the TG, covering title-to-title relationships via subfield $w and specific relationship information, then discussed using subfield $0. The approaches taken by the BL in its papers, coupled with the approach taken by the TG, resulted in MAC suggesting that the British Library and the PCC should collaborate on submitting a paper for June 2016.

During the MAC meetings at the ALA Annual Conference in Orlando in June 2016, three papers were presented by or in cooperation with the TG. Discussion Paper No. 2016-DP18, entitled Redefining Subfield $0 to Remove the Use of Parenthetical Prefix "(uri)" in the MARC 21 Authority, Bibliographic, and Holdings Formats, described the syntactical improvement that a subfield $0 containing a URI without the parenthetical prefix (uri) would allow, so that automated processes could use the content of these $0s without having to strip away the prefix. The discussion paper was discussed at the MAC meeting, and the recommendation was made that it be upgraded to proposal status; it was approved at the meeting as a proposal. From now on, a $0 containing an identifier in the form of a web retrieval protocol, e.g. an HTTP URI, should not be given a parenthetical prefix.
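
The practical effect of this change on automated processes can be shown with a small sketch. Legacy data may still carry "(uri)" prefixes, and $0 can also hold control numbers with other parenthetical prefixes, so a consumer has to tell the cases apart. The function name and the decision to return None for anything that is not an HTTP(S) URI are assumptions of this illustration, not part of the MARC definition.

```python
import re
from typing import Optional

# A leading parenthetical prefix such as "(uri)" or "(DLC)", then the rest of the value.
_PREFIX = re.compile(r"^\((?P<label>[^)]*)\)\s*(?P<rest>.*)$")


def uri_from_subfield_0(value: str) -> Optional[str]:
    """Return the HTTP(S) URI held in a $0 value, or None if it holds something else."""
    value = value.strip()
    match = _PREFIX.match(value)
    if match:
        if match.group("label").lower() != "uri":
            return None  # another prefix: treat the value as a control number
        value = match.group("rest")
    return value if value.startswith(("http://", "https://")) else None


for sample in (
    "(uri)http://id.loc.gov/authorities/names/n79021164",  # legacy, prefixed form
    "http://id.loc.gov/authorities/names/n79021164",       # form after the redefinition
    "(DLC)n  79021164",                                     # control number, not a URI
):
    print(repr(sample), "->", uri_from_subfield_0(sample))
```

With the prefix gone from newly created data, the first branch is needed only for older records.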

A second paper was presented to the MAC: Discussion Paper No. 2016-DP19, Adding Subfield $0 to Fields 257 and 377 in the MARC 21 Bibliographic Format and Field 377 in the MARC 21 Authority Format. It resulted from extensive analyses of the MARC Bibliographic and Authority formats by the TG, selecting fields which are to be controlled by an identifier. Only those fields where an identifier can be applied with clear correspondence between the field and one entity were included in the paper. The discussion paper was discussed at the MAC meeting, and the recommendation was made that it be upgraded to proposal status; it was also approved at the meeting as a proposal. Both changes will be included in Update No. 23 to the MARC documentation, expected in fall 2016.

The third paper, Discussion Paper No. 2016-DP17, Redefining Subfield $4 to Encompass URIs for Relationships in the MARC 21 Authority and Bibliographic Formats, was presented by the British Library in consultation with the TG. This paper generated lively discussion. It was acknowledged that the approach to recording URIs for relationships using subfield $4 was preferable to any of the other alternatives outlined by the paper. The distinction between relator codes and relationship codes in the MARC format was questioned. As of now, an across-the-board solution for recording URIs for any data element in MARC, subfield or field, seems to be preferred by NDMSO over what it regards as an ad hoc solution for single elements. This discussion will be continued; this paper should not be considered in isolation, but rather in the context of the other papers which the TG is in the process of submitting. Taken as a whole, it is hoped that they will achieve the comprehensive solution which is sought throughout the MARC formats.

IV.4.2. Formulating & Obtaining URIs document [Charge 3.2]

A draft document was prepared for commonly used sources for authorities and identifiers. For each source, screen captures were made showing where a URI could be found for a particular entity, or how to formulate a URI once the identifier for the entity is known. Before making this document widely available, it must be determined how best to organize it. Some resources provide URIs that directly represent a thing, and others provide URIs that reference an authority (e.g. controlled or standard vocabularies, which may or may not have underlying metadata about the thing) or a resource describing a thing. The document needs to be able to distinguish these and inform catalogers which URIs are for real world objects and which are not. In order to be helpful to developers building tools, the document is also intended to include descriptions of how data sources provide machine access to the data: is the data published as Linked Data, available through HTTP, available through a SPARQL endpoint, as data dumps, etc.? Another issue that must be determined is where to put the final document and how it will be maintained. Should it be cooperatively maintained by the community (such as on a wiki), or should some group within PCC take responsibility for keeping it up to date and adding to it?

IV.4.3. Revisions to OCLC handling of HTTP URIs [Charge 3.1]

The question arises as to whether it would be better for catalogers to enter all needed URIs directly into the shared bibliographic record in WorldCat, or whether OCLC should provide options for output of URIs based on data present in particular MARC fields and profiled library preferences. Clearly some libraries will embrace use of URIs for their web-based catalogs, while others may find them problematic in local displays of bibliographic information. OCLC staff have looked into the issue and believe that the use of output options would likely produce more consistent results as well as meet the varying needs of libraries.

The TG members are drafting a spreadsheet outlining the subfields that together name an entity for which a corresponding URI could be added in subfield $0. That spreadsheet will be useful as the basis for future specifications for use by OCLC system developers. It will allow for a comparison of what is desired by the PCC cataloging community in terms of URIs corresponding to the entire named entity versus the existing use of subfield $0 and subfield-$0-like information used in OCLC heading controlling functionality. That heading control functionality allows for control numbers in multiple $0 subfields corresponding to different parts of a named entity, i.e. corporate name hierarchies, names and titles, subjects and separately controlled subdivisions, etc. These are cases where output of multiple URIs corresponding only to part of the named entity would not be preferred.

OCLC cataloging policies in this area are expected to evolve as this TG makes recommendations and OCLC development work moves ahead on the proposed output options for URIs.

IV.5. Next steps and in-depth analyses in year 2 [Charge 3, Charge 4]

In 2016-2017 the TG will continue an agenda focused on practical outcomes. Work is already well advanced on several of the following items:

IV.5.1. In collaboration with OCLC, develop a specification for outputting URIs based on internal linkages present in WorldCat data
IV.5.2. Complete the MARC object/URI reconciliation document and seek to incorporate the information into formal MARC documentation
IV.5.3. Produce work ID recommendation and use it in pilot implementation
IV.5.4. Produce discussion paper or proposal for handling relationships in MARC
IV.5.5. Consider additional targeted reconciliation projects
IV.5.6. In consultation with stakeholders, evaluate need for additional MARC proposals or best practices
IV.5.7. RWO recommendations
IV.5.8. Identify "homes" in PCC or elsewhere for aspects of the TG's work that will need further exploration or continuing upkeep
IV.5.9. Outreach, advocacy, training
IV.5.10. Etc.

V. RECOMMENDATIONS TO STAKEHOLDERS

During its first year, the TG was very much focused on the needs and interests of the many different stakeholders. This is reflected both in the outcomes of the work completed so far (see Sec. IV.4, Outcomes) and in the plans laid out for year 2 (see Sec. IV.5, Next steps and in-depth analyses in year 2). After careful consideration, the TG proposes the implementation of URIs in MARC for the near term. The sooner this process can begin, the sooner the data providers, e.g. libraries, can produce data that can be more easily transformed into linked data. In order to facilitate progress towards this goal, the TG developed the recommendations already outlined in the report above, such as the spreadsheet identifying the phase 1 entities for identifiers, i.e. the subfields that together name an entity in each MARC field (see Sec. IV.4.3, Revisions to OCLC handling of HTTP URIs), and the draft document Formulating and Obtaining URIs: A Guide to Commonly Used Vocabularies and Reference Sources. The TG hopes that this document could be used as a starting point to develop an official list of PCC-sanctioned initial source vocabularies for embedding URIs.

For the sake of consistency, expediency, and accuracy, it is advisable to use automated processes for populating MARC records with URIs. Having individual catalogers do this work manually is not a desirable practice and could be less efficient. Several possible ways to accomplish this goal have been outlined in this report (see Secs. IV.2.4.1, MarcEdit; IV.2.4.2, Authority Toolkit; and IV.4.3, Revisions to OCLC handling of HTTP URIs).

Outreach, advocacy, and training will be a core goal of phase 2. The TG plans to work closely with stakeholders, such as other PCC committees, to influence cataloging policies and best practices that have been identified as problematic for the implementation of URIs in MARC.

Training needs related to implementation (for example, how to obtain URIs, or the difference between authorities and real world objects) will be communicated to the PCC Standing Committee on Training so that appropriate training can be either identified or developed.

Though MARC is the most prominently used schema for library metadata, it is frequently used alongside many others that may or may not allow for the inclusion of URIs. Additional concerns are the maintenance of identifiers, recommendations relating to reconciliation, and possible ILS functional requirements. The TG on URIs in MARC recommends that new TGs be formed concerning URIs for non-MARC metadata.

VI. REFERENCES

1. The subgroup Work IDs in MARC has identified potential fields and scenarios to accommodate a work identifier (or multiple work identifiers). Consideration has been given to legacy data; to whether or not a work identifier (ID) is already established in an authority format (7XX $t, 1XX/240); to an unambiguous relationship of a work ID among various vocabularies (024); and to relationships among variants of a work, etc. The subgroup will present recommendations to the community in 2017.

Links: Meetings of the MARC Advisory Committee, Agendas and Minutes

2015-06 MAC meeting: http://www.loc.gov/marc/mac/an2015_age.html, http://www.loc.gov/marc/mac/minutes/an-15.html

2016-01 MAC meeting: http://www.loc.gov/marc/mac/mw2016_age.html, http://www.loc.gov/marc/mac/minutes/mw-16.html

2016-06 MAC meeting: http://www.loc.gov/marc/mac/an2016_age.html, http://www.loc.gov/marc/mac/minutes/an-16.html

Papers

Informal discussion paper: URIs in MARC: A Call for Best Practices (Steven Folsom, Discovery Metadata Librarian, Cornell University). https://docs.google.com/document/d/1fuHvF8bXH7hldY_xJ7f_xn2rP2Dj8o-Ca9jhHghIeUg/edit?pli=1

Discussion Paper No. 2016-DP04: Extending the Use of Subfield $0 to Encompass Linking Fields in the MARC 21 Bibliographic Format (British Library). http://www.loc.gov/marc/mac/2016/2016-dp04.html

Discussion Paper No. 2016-DP05: Expanding the Definition of Subfield $w to Encompass Standard Numbers in the MARC 21 Bibliographic and Authority Formats (British Library). http://www.loc.gov/marc/mac/2016/2016-dp05.html

Discussion Paper No. 2016-DP17: Redefining Subfield $4 to Encompass URIs for Relationships in the MARC 21 Authority and Bibliographic Formats (British Library, in consultation with the PCC Task Group on URIs in MARC). http://www.loc.gov/marc/mac/2016/2016-dp17.html

Discussion Paper No. 2016-DP18: Redefining Subfield $0 to Remove the Use of Parenthetical Prefix “(uri)” in the MARC 21 Authority, Bibliographic, and Holdings Formats (PCC Task Group on URIs in MARC, in consultation with the British Library). http://www.loc.gov/marc/mac/2016/2016-dp18.html

Discussion Paper No. 2016-DP19: Adding Subfield $0 to Fields 257 and 377 in the MARC 21 Bibliographic Format and Field 377 in the MARC 21 Authority Format (PCC URI in MARC Task Group). http://www.loc.gov/marc/mac/2016/2016-dp19.html

MARC Format Overview, Status Information: http://www.loc.gov/marc/status.html

Examples for Sec. IV.2.1

This LC subject heading string is linked to three different authority records. The links are OCLC’s ARNs. No single $0 could be output for this subject access point.

650 0 ǂa Neurologists<Link 2068890> ǂz New Zealand<Link 255121> ǂv Biography<Link 4933801>

This medical subject string is linked to one authority record, although the controlling process links individual subfields. It is a candidate for output of a single $0 with a URI because the links all refer to the single authority record. In the case of MeSH, unlike LCSH, the $0 subfield displays in Connexion. See OCLC record 957132118.

650 12 ǂa Neurology<Link (DNLM)D009462Q000266> ǂx history<Link (DNLM)D009462Q000266>

Displays as:

650 12 Neurology ǂx history ǂ0 (DNLM)D009462Q000266

So it could be output with a single $0 containing the corresponding URI for the MeSH heading.
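The rule illustrated by these two examples can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `ControlledSubfield` structure and the function name are hypothetical and do not describe OCLC's internal representation of controlled headings.

```python
from typing import NamedTuple, Optional


class ControlledSubfield(NamedTuple):
    code: str   # MARC subfield code, e.g. "a"
    value: str  # heading text
    link: str   # authority record link recorded by the controlling process


def single_authority_link(subfields: list[ControlledSubfield]) -> Optional[str]:
    """Return the shared link if every subfield points at the same authority
    record; return None when the heading spans several records."""
    links = {sf.link for sf in subfields}
    return next(iter(links)) if len(links) == 1 else None


# The two headings from the examples above.
lcsh_heading = [
    ControlledSubfield("a", "Neurologists", "2068890"),
    ControlledSubfield("z", "New Zealand", "255121"),
    ControlledSubfield("v", "Biography", "4933801"),
]
mesh_heading = [
    ControlledSubfield("a", "Neurology", "(DNLM)D009462Q000266"),
    ControlledSubfield("x", "history", "(DNLM)D009462Q000266"),
]
```

Under this sketch the LCSH string yields no single link, while the MeSH string yields one identifier from which a single $0 could be produced.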


hold HTTP URI. The repeatability, ambiguity, and significance of the ordinal sequence are less trivial and more complex.

With regard to bulk insertion, system performance, and scalability, the Pilot Test also helped address SPARQL query adjustments on the server side. Though adding URIs by hand was the least desirable exercise, it could be inevitable, so the TG also began documenting resources that would assist such an endeavor.

The overall strategies that the TG adopted were carefully thought out in order to achieve iterative success that will build confidence throughout the phases of implementation.

IV. PROBLEM STATEMENT

To encode data suitable for transformation into RDF triples, it is necessary to be able to identify in MARC the data elements corresponding to the subject, predicate, and object in each statement and/or to provide URIs for them. It quickly became apparent that the task is not simply to add subfields to allow URIs to be given (itself a non-trivial problem given the limited number of unused subfields still available in MARC) but also to negotiate the often ambiguous semantics of MARC. The TG has sought to do this through a judicious combination of redefinition proposals, clarification of existing semantics, and best practice recommendations.
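As a minimal sketch of what this transformation involves, the Python below turns one subject field carrying a URI in $0 into an N-Triples statement. The record URI is a placeholder, and the choice of `dcterms:subject` as the predicate for a 650 field is an assumption made for illustration, not a mapping endorsed by the TG.

```python
RECORD_URI = "http://example.org/bib/1"                  # hypothetical subject URI
SUBJECT_PREDICATE = "http://purl.org/dc/terms/subject"   # assumed predicate for 650

# A 650 field reduced to (subfield code, value) pairs; $0 holds the object URI.
field_650 = {
    "tag": "650",
    "subfields": [("a", "Neurology"), ("0", "http://id.nlm.nih.gov/mesh/D009462")],
}


def to_ntriple(subject: str, predicate: str, obj: str) -> str:
    """Serialize one statement whose three parts are all URIs."""
    return f"<{subject}> <{predicate}> <{obj}> ."


def field_to_triples(record_uri: str, predicate: str, field: dict) -> list[str]:
    """Emit one triple per $0 that contains an HTTP URI."""
    return [
        to_ntriple(record_uri, predicate, value)
        for code, value in field["subfields"]
        if code == "0" and value.startswith(("http://", "https://"))
    ]


triples = field_to_triples(RECORD_URI, SUBJECT_PREDICATE, field_650)
```

The sketch makes the gap visible: the object comes from $0, but the subject and the predicate have to be inferred from outside the field, which is where MARC's ambiguous semantics come into play.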

Best practices for incorporating HTTP URIs in MARC BIB and Authority records without making major renovations to the MARC format (taking into consideration cost/benefit analysis for an ‘end of life’ technology).

IV.1. Where to place URIs in the MARC structure ($0, $4)? [Charge 3]

The TG developed a pilot to examine the issues surrounding the addition of identifiers to MARC 21 data. The work included identifying actionable source vocabularies and creating test record sets with dereferenceable URIs embedded in the data. A variety of formats were represented in the test data, and ILS vendors, programmers, system engineers, and discovery designers were consulted throughout the pilot to comment on the retrieval of actionable URIs and on the appropriate policies for ensuring that the data are actionable in MARC 21 data.

The TG also inventoried the MARC bibliographic and authority formats to identify MARC 21 fields that contain subfields capable of accommodating URIs. In the bibliographic format, subfields $0 and $4 were identified as existing candidates for containing URIs; subfields $0 and $4 were also candidates in the authority format. MARC 21 fields that might usefully contain a subfield for a URI, but which do not have one defined, were also noted.

The TG focused on subfields $0 and $4 for its first three MARC Discussion Papers, submitted to MAC at ALA Annual 2016.

IV.2. What difficulties are evidenced?

IV.2.1. Adding multiple $0 [Charge 1.2]

The nature and use of subfield $0 has evolved in MARC since the subfield was first implemented in 2007. In 2010 it was redefined and came to include standard numbers, including URIs, in addition to its original use for authority record control numbers.

However, MARC is not specific as to which parts of a controlled heading string correspond to the $0. Nothing in the MARC specification rules out one $0 subfield applying to one set of subfields in a heading while a different $0 applies to others. (To ameliorate this problem we formed a MARC object/URI reconciliation subgroup to enumerate the subfields naming the object in each MARC field; see IV.2.2 below.) And because $0 is repeatable, it is possible to find multiple $0 values corresponding to the same heading subfields naming the same entity. Indeed, the latter practice is adopted by design in some implementations, notably that at the German National Library.

The existence of different use cases and practices for relating headings to $0 has emerged as an issue that will need to be considered as the TG’s work proceeds. In the case of OCLC’s heading control functionality related to LC names and LCSH, subfield $0 data is included as an XML tag attribute in each subfield XML tag covered by a particular authority record, and is repeated as many times as needed depending on the number of subfields used to represent the name or subject. In the subsequent development of controlling for other authority files, the same approach has been taken, but instead retaining the same or different authority record control numbers in multiple $0 subfields. [See examples at end of document.]

This repeated use of $0 subfields, containing the same authority record number or different authority record numbers for different parts of a heading, runs contrary to the need that exists in an OCLC context for a single URI corresponding to the entire named entity given in the field. Extraneous $0 subfields are automatically deleted in WorldCat records in fields that are otherwise controlled to a particular authority file. However, this leaves unresolved the question of controlling via multiple source vocabularies within the same language of cataloging, which many see as a desirable medium- to long-term objective. Given the investment in its development and the number of controlled headings in WorldCat, completely changing the heading control functionality within WorldCat is not feasible, so the TG and OCLC staff have considered other alternatives allowing for output of needed URIs in the format which libraries would prefer in the future.

IV.2.2. How to identify an RDF object in a MARC data field [Charge 4.3]

This emerged as an important need because the ability to identify a URI with its corresponding label is necessary to support both reconciliation of existing data and updates to those labels based on their association with an identifier. The only realistic way to make this identification was to document the correspondences on a field-by-field basis. Fortunately, this was very achievable for the majority of fields in widespread use. [Link to recommendations] The investigation revealed a number of issues relating to the identification of single entities vs. larger sets (series, conferences) and the alignment of MARC and RDA vocabularies.

IV.2.3. What did we find in identifying relationships/multiple relationships? [Charge 4.1]

IV.2.3.1. Relationships are expressed in MARC by a variety of means, including:
IV.2.3.1.1. Field tagging, either alone (e.g. 830) or in combination with indicators (e.g. 780/785)
IV.2.3.1.2. Subfield codes, e.g. 041
IV.2.3.1.3. Codes given in subfields, e.g. 700 $4
IV.2.3.1.4. Controlled or natural language text given in subfields, e.g. 700 $i

IV.2.3.2. Some of these fields are very tightly bound to legacy MARC definitions, structures, and data. Redesigning 041, for example, to be hospitable to URIs would require a complete reconception of that field.

IV.2.3.3. There is the greatest value in provisioning for URIs following a 7XX$4$0$1 pattern, with $4 repurposed to house URIs much as $0 now does. This approach seems to present a relatively low barrier to implementation while having widespread application in MARC.

IV.2.3.4. Multiple relationships can cause ambiguity where they are associated with multiple objects or multiple labels. In such cases we recommend the expedient of simply repeating the field in order to make the associations unambiguous.
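The expedient recommended in IV.2.3.4 can be sketched as follows. The field representation (a tag plus a list of subfield pairs), the function name, and the use of $e for the label are hypothetical choices for illustration; the relator URIs follow the pattern published at id.loc.gov.

```python
def repeat_field_per_relationship(tag, name_subfields, relationships):
    """Return one field per (label, uri) relationship, so that each $4 URI
    sits next to exactly one label and one named entity."""
    fields = []
    for label, uri in relationships:
        subfields = list(name_subfields) + [("e", label), ("4", uri)]
        fields.append((tag, subfields))
    return fields


# One agent holding two relationships to the resource.
fields = repeat_field_per_relationship(
    "700",
    [("a", "Example, Author")],  # placeholder name
    [
        ("author", "http://id.loc.gov/vocabulary/relators/aut"),
        ("illustrator", "http://id.loc.gov/vocabulary/relators/ill"),
    ],
)
```

Each resulting 700 field carries a single relationship, so a consumer never has to guess which $4 belongs to which label.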

IV.2.4. How one obtains URIs for various data sources depends on the linked data source (different data sources make their URIs available differently) and on interoperability between the data source and the cataloging tools being used.

To help support obtaining the right URIs for its purposes in MARC, the TG has begun a document currently referred to as Formulating and Obtaining URIs: A Guide to Commonly Used Vocabularies and Reference Sources. For commonly used vocabularies in MARC, we want to document where in the data source UI one can find the canonical URIs that, when dereferenced, provide data. Going forward, for each entry in the document, we want to explain whether a data source publishes its data as Authorities, Real World Objects, or both. We also want to document the methods available for machine access to the data: is the data published as Linked Data, available through HTTP, available through a SPARQL endpoint, data dumps, etc.?

IV.2.4.1. MarcEdit [Charge 3]

In the summer of 2014, MarcEdit introduced a suite of tools designed to begin testing the feasibility of embedding linked data concepts into MARC records. Initially, the scope of the suite was limited to embedding HTTP URIs in the $0 in MARC fields 1xx, 6xx, and 7xx in bibliographic records. This initial work focused on integration with the US Library of Congress’s id.loc.gov service, as well as OCLC’s VIAF services, for resolution. However, over the past 2 years, and in response to many of the questions and issues surfaced through the TG, the Linking services have been expanded and revised to potentially support all use cases identified by this Task Force, as well as providing support for non-MARC21 users to configure the Linking tool for use with other MARC formats.

The MarcEdit Linking toolkit currently supports the generation of URIs for all fields identified by this Task Force for authority and bibliographic records. The application utilizes a rules file that documents field processing and service configuration values. This allows MarcEdit to quickly make changes to the rules governing field processing, as well as to add support for new collections and linked data endpoints. As of this report (9/21/2016), the MarcEdit Linked Data tool supports resolution against the following linked data services:

1. US Library of Congress NAF
2. US Library of Congress LCSH
3. US Library of Congress Children’s Subject Headings
4. US Library of Congress Demographic Group Terms
5. Thesaurus for Graphic Materials
6. US Library of Congress Genre/Form Terms
7. US Library of Congress Medium of Performance Thesaurus for Music
8. RDA Carrier Types
9. RDA Media Types
10. RDA Content Types
11. Getty Arts and Architecture Thesaurus
12. Getty ULAN
13. National Library of Medicine MeSH
14. OCLC FAST Headings
15. OCLC VIAF
16. German National Library (GND)
17. [15 national library name indexes via VIAF]
18. Japanese Diet Library

Additionally, users have the ability to configure their own linked data endpoints for use with MarcEdit, so long as the service in question supports SPARQL and JSON. There is presently a knowledge-base article at http://marcedit.reeset.net/editing-marcedits-linked-data-rules-file documenting how users can both add new collections and modify the rules used when processing a particular field.

Essentially, MarcEdit utilizes its rules file to configure MarcEdit’s linked data platform to identify the proper index/service, normalization (for data query purposes), and subfields to utilize as part of any lookup process. Additionally, each rules block identifies when a field should be processed (i.e. only when used in a bibliographic record, used in an authority record, or both). For example, here is the definition for the 650 field:

<field type="bibliographic">
  <tag>650</tag>
  <subfields>abvxyz</subfields>
  <ind2 value="0" vocab="lcsh" />
  <ind2 value="1" vocab="lcshac" />
  <ind2 value="2" vocab="mesh" />
  <ind2 value="7" vocab="none" />
  <index>2</index>
  <uri>0</uri>
  <special_instructions>subject</special_instructions>
</field>

Each MarcEdit rules block is a small segment of XML that profiles field usage within a record. This is why MarcEdit’s linking tool can be used with other flavors of MARC (like UNIMARC): the Linking service has no concept of MARC21, only of the ISO 2709 format, and the rules file provides that context.
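To show how such a rules block can drive processing, the sketch below reads the 650 definition with Python's standard `xml.etree.ElementTree` module and looks up the vocabulary for a given second indicator. The XML is the block as reconstructed from this report (quoting and closing tags are assumed), and the function is a hypothetical illustration, not MarcEdit's own code.

```python
import xml.etree.ElementTree as ET

# The 650 rules block, with quoting and closing tags assumed.
RULES_BLOCK = """
<field type="bibliographic">
  <tag>650</tag>
  <subfields>abvxyz</subfields>
  <ind2 value="0" vocab="lcsh" />
  <ind2 value="1" vocab="lcshac" />
  <ind2 value="2" vocab="mesh" />
  <ind2 value="7" vocab="none" />
  <index>2</index>
  <uri>0</uri>
  <special_instructions>subject</special_instructions>
</field>
"""


def vocabulary_for(rules_xml, tag, ind2):
    """Return the vocab code configured for this tag and second indicator,
    or None when the block does not cover the combination."""
    field = ET.fromstring(rules_xml.strip())
    if field.findtext("tag") != tag:
        return None
    for indicator in field.findall("ind2"):
        if indicator.get("value") == ind2:
            return indicator.get("vocab")
    return None
```

Because the tag, indicator, and subfield semantics all live in the XML, the same lookup would serve a UNIMARC profile simply by supplying a different rules block.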

This approach has allowed MarcEdit to quickly profile and examine the implications of developing URIs for linking fields like the 880 field, which provide some unique challenges but can be accommodated via the current rules file format.

Utilizing the current process, MarcEdit’s linking tool can accommodate a wide range of linking scenarios. For example:

[Figure: in an authority record]

[Figure: within a bibliographic record]

[Figure: across diverse vocabularies]

Current development on the tool will continue to focus on the inclusion and support of additional vocabularies; on continuing to work with linked data providers around scalability issues (and ways in which MarcEdit [or services like it] can reduce impacts on their services); and on working to profile this service to work with other flavors of MARC, like UNIMARC, to encourage further experimentation.

IV.2.4.2. Authority Toolkit [Charge 3]

The authority toolkit is a program for the construction and modification of authority records. One version is designed for use within OCLC’s Connexion program for records in the LC/NACO authority file, but another version can work with records in files, and so with records from other sources. Both versions of the toolkit have the same capabilities. At an early stage, the toolkit acquired the ability to test terms used in authority fields such as the 370 and 372 against vocabularies available at id.loc.gov (at present LCMPT, LCSH, LCDGT, AFSET, geographic area codes, RDA content terms, and the LC/NACO Authority File). Somewhat later, it added the ability to verify terms against the MeSH vocabulary. (Additional vocabularies may be added in the future based on user requests.) To perform this verification, the program needs to know which vocabularies are used to control terms in which parts of which authority fields, how to query the source to determine whether or not a term is defined, and how to react to the information returned by the source. The toolkit’s actions are controlled above all by the subfield $2 code appearing in the same field as the term, but in the absence of a subfield $2 code, operator preferences come into play as well. (For example, an operator may prefer that an unlabeled term be tested against MeSH first and, if not found, tested against LCSH; or perhaps tested only against LCDGT.) A detailed description of the toolkit’s process for verifying the content of authority fields can be found in the program’s documentation at httpfileslibrarynorthwesternedupublicoclcdocumentationverifymenu
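The selection order described above ($2 code first, operator preferences otherwise) can be sketched as below. This is not the toolkit's implementation: the function names are hypothetical, and `lookup` stands in for whatever query the toolkit sends to a vocabulary service.

```python
def vocabularies_to_try(subfield_2, operator_preferences):
    """A $2 code, when present, decides the vocabulary; otherwise fall back
    to the operator's ordered preferences."""
    if subfield_2:
        return [subfield_2]
    return list(operator_preferences)


def verify_term(term, subfield_2, operator_preferences, lookup):
    """Return the first vocabulary in which the term is found, else None.
    `lookup(vocab, term)` is a caller-supplied stand-in for a service query."""
    for vocab in vocabularies_to_try(subfield_2, operator_preferences):
        if lookup(vocab, term):
            return vocab
    return None


# A toy in-memory lookup used only to exercise the ordering logic.
_KNOWN = {("mesh", "Neurology"), ("lcsh", "Neurologists")}


def toy_lookup(vocab, term):
    return (vocab, term) in _KNOWN
```

With preferences `["mesh", "lcsh"]`, an unlabeled term is tried against MeSH first and against LCSH only if MeSH does not define it.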

If the toolkit’s search for an entire term is successful, the toolkit could easily supply the corresponding URI and add it to the authority record in subfield $0. This URI may be contained in the data provided by the source, or it could be constructed mechanically once the toolkit has extracted the appropriate identifier. As part of experimentation encouraged by the TG, on January 15, 2016, the toolkit acquired an option to add subfield $0 to fields which could be verified. (This option is described at httpfileslibrarynorthwesternedupublicoclcdocumentationoptionsverification0) If a field contains more than one term, the toolkit must divide the field into multiple fields (one for each term) before it can add subfield $0.
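Constructing a URI mechanically from an extracted identifier might look like the following sketch. The templates follow the URI patterns published by id.loc.gov and id.nlm.nih.gov; the vocabulary keys, the function name, and the whitespace handling are assumptions for illustration, and a production tool should confirm each pattern against the source's own documentation.

```python
# URI patterns per source vocabulary (keys are hypothetical labels).
URI_TEMPLATES = {
    "lcnaf": "http://id.loc.gov/authorities/names/{id}",
    "lcsh": "http://id.loc.gov/authorities/subjects/{id}",
    "mesh": "http://id.nlm.nih.gov/mesh/{id}",
}


def build_uri(vocab, identifier):
    """Build a URI from a control number by removing internal whitespace
    (e.g. 'sh 85091156' becomes 'sh85091156') and filling the template."""
    template = URI_TEMPLATES[vocab]  # KeyError signals an unsupported vocabulary
    return template.format(id="".join(identifier.split()))
```

For instance, `build_uri("mesh", "D009462")` gives the MeSH URI for the Neurology descriptor used in the examples for Sec. IV.2.1.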

The following illustration shows an authority record as verified by the authority toolkit, with the option to add subfield $0 during verification turned on. (For this experiment, subfield $0 was locally defined for some fields.)

Although the toolkit can often discover information about compound terms (such as some corporate bodies with subordinate units and some LCSH headings) for which an authority record exists for some parts but not all, the toolkit cannot supply subfield $0. (There is no authority record, and so no URI, that represents the entire term.) The toolkit also cannot add subfield $0 to fields that contain multiple terms if the field contains an aggregation of terms rather than a collection of independent items. (Example: the toolkit cannot add subfield $0 to the 382 field.)

The task of discovering that a term given in an authority record is defined in an external vocabulary is made more difficult because the searching mechanisms available do not always compensate appropriately for operator variations in punctuation, capitalization, and the use of combining diacritics. In addition, the response time experienced by the toolkit can vary widely, even for the same term searched repeatedly within a brief time, and some services are unavailable over the weekend. If the potential of linked data is to be enjoyed, services providing data must ensure that their entry mechanisms are robust, flexible, and available at all times.
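One way a client or a service could compensate for such variation is to compare terms through a normalized match key. The sketch below uses Python's standard `unicodedata` module; the specific normalization choices (dropping combining marks, treating punctuation as spaces, case folding) are assumptions for illustration rather than the behavior of any service named in this report.

```python
import unicodedata


def match_key(term):
    """Reduce a term to a comparison key that ignores case, punctuation,
    and diacritics, whether precomposed or entered as combining marks."""
    decomposed = unicodedata.normalize("NFD", term)
    chars = []
    for ch in decomposed:
        category = unicodedata.category(ch)
        if category == "Mn":  # combining mark: drop it
            continue
        chars.append(" " if category.startswith("P") else ch)
    # Case-fold, then collapse runs of whitespace.
    return " ".join("".join(chars).casefold().split())
```

Two operators who key the same name with different capitalization, punctuation, or diacritic encoding would then produce the same key.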

IV.2.4.3. Lookup online (e.g. VIAF, Getty ULAN, Geonames, Wikidata)

Online lookup requires manual operation. Users must be well versed in the SPARQL queries that individual services provide. Getty ULAN works differently from Geonames and Wikidata. The URI returned from a query may not be an RDF URI, but one that lands the user on a Web page or document.

IV.3. What did we learn? [Charge 1.3, Charge 3]

IV.3.1. Tackle low-hanging fruit/what can we do in 1 year?

The TG’s activities during Year 1 were designed to position the MARC community to take tangible steps toward incorporating linked data URIs into its processes within an achievable timeframe. Therefore the TG put aside some tasks, such as the overhaul of certain legacy MARC data elements, that would have delayed progress with the TG’s practical objectives. The tool development undertaken by Terry Reese and Gary Strawn was designed to advance these objectives, but so were the Formulating URIs document and the MARC object/URI reconciliation work, both of which document information that will be needed by other stakeholders, and the work IDs in MARC proposal, which seeks to remove one of the main barriers to routine incorporation of work identifiers in MARC records.

IV.3.2. Add $0 where it’s not defined (not simple)

One of the TG’s goals was also to identify fields that currently do not have $0 defined and to add it. The TG found that the following MARC fields needed $0 defined:

bibliographic: 046, 257, 260/264, 375, 753
authority: 046, 360, 375, 377, 663, 680, 681

These fields do not lend themselves to an easy resolution when considering a $0 that reflects the resource object for the entity described. The TG conducted thorough analyses and concluded that only 257 and 377 could contain a URI with an unambiguous correspondence between the field and the object it represents, leaving out more complicated cases, e.g. fields 264 (Production, Publication, Distribution, Manufacture, and Copyright Notice) and 382 (Medium of Performance).

One of the issues confronted in drafting discussion paper 2016-DP19 was the extent of the effort needed to individually propose subfield $0 for MARC 21 fields that do not contain it. MAC accepted the paper as a proposal, and there was agreement “that similar changes, such as those recommended [in] this paper, might in the future be considered as part of a MARC Fast-Track process.” Being able to fast-track proposals for defining subfield $0 in fields which do not contain it will considerably streamline the process in the future.

IV.3.3. Strategies in light of the limited life cycle of the MARC environment

Though many may see MARC as “dead”, the system remains a viable tool that delivers metadata for data discovery. It is also, however, a legacy format that reflects in its somewhat baroque structure a long history of accretion to meet varied and changing needs. In pursuing its goals, the TG has adopted a strategy of pursuing changes that can be applied coherently across MARC and that maximize the return on the library community’s investment of effort. There are economical and sensible approaches to determining what to do. The TG always kept in mind that its recommendations must cause the least disruption for the transition of data from MARC to linked data. A wholesale insertion of HTTP URIs is unlikely to be possible; URIs can probably be accommodated in most, but not all, MARC fields and/or subfields.

The TG is committed to working through a list of tasks and identifying viable solutions. While $0, after one year’s deliberation, seemed a straightforward solution for a URI representing a resource object, more discussion is needed with regard to the predicate that denotes a relationship. MARC data have not been consistent in expressing relationships. The combination of fields, indicators, and subfields adds complexity to the process.

IV.3.4. ILS analysis results

Some ILSs would not load the processed records because of the presence of $0. Others loaded but did nothing with the data.

The TG members mocked up files of bibliographic and authority data, adding various URIs in subfield $0 wherever subfield $0 is currently defined in MARC. These files were uploaded into a number of ILS systems to see if the addition of subfield $0 with URIs caused problems. No significant problems were found. These files included URIs in subfield $0 which were not prefixed with the (uri) identifier.

In OCLC, the same $0 subfields were also not problematic. OCLC’s validation of subfield $0 does not check the structure of subfield $0 in the same way as it does for control numbers in 760-787 subfield $w or URLs in $u subfields. Use of URIs in subfield $4 to express relationship information would require a change to OCLC’s validation of $4 subfields, but that may be readily changed without extensive effort.

IV.3.5. Tools needed: MarcNext, Authority Toolkit

Currently the TG has tested and continues to work with MarcNext and the Authority Toolkit. The TG members continue collecting and recording additional tools and resources that help practitioners identify and validate an RDF URI.

IV.3.6. Need to be able to easily report duplicates found in VIAF etc., and need a way to know which URI to use when duplicates are found

Throughout the first year of investigation and deliberation, the TG learned that, though vocabularies and ontologies are structured per standards and published for adoption, some are more domain-specific than others. Often there is more than one method to structure a body of data. Duplications can be expected across various datasets. The reconciliation of URIs is one of the tasks that the TG has recognized, but it is not yet in a position to recommend a solution in the near term.

IV.4. Outcomes

IV.4.1. MAC Discussion Papers [Charge 4]

The TG was aware that some aspects of its intended goals were not yet accommodated by the MARC format. Following the defined workflows of MARC governance and standardization, the TG submitted several discussion papers to the MARC Advisory Committee (MAC). As an initial preparation, an informal discussion paper entitled URIs in MARC: A Call for Best Practices, by Steven Folsom, had been discussed during the June 2015 MAC meeting. It focused on subfield $0 (Authority record control number or standard number), its current usage, and its capability for URIs, and addressed some aspects of best practice. The paper generated extensive discussion, and there was broad agreement that the time was right for the library community to begin using URIs consistently. Steven Folsom was asked to cooperate with the PCC to develop a formal MAC Discussion Paper.

In fall 2015, the British Library (BL) submitted two papers to MAC for the January 2016 meeting, independently of the TG, covering title-to-title relationships via subfield $w and specific relationship information, then discussed using subfield $0. The approaches taken by the BL in its papers, coupled with the approach taken by the TG, resulted in MAC suggesting that the British Library and the PCC should collaborate on submitting a paper for June 2016.

During the MAC meetings at the ALA Annual Conference in Orlando in June 2016, three papers were presented by or in cooperation with the TG. Discussion Paper No. 2016-DP18, entitled Redefining Subfield $0 to Remove the Use of Parenthetical Prefix “(uri)” in the MARC 21 Authority, Bibliographic, and Holdings Formats, described the syntactical improvement that a subfield $0 containing a URI without the parenthetical prefix “(uri)” would allow, so that automated processes could use the content of these $0s without having to strip away the prefix. The discussion paper was discussed at the MAC meeting, and the recommendation was made that it be upgraded to proposal status; it was approved at the meeting as a proposal. From now on, a $0 containing an identifier in the form of a web retrieval protocol, e.g. an HTTP URI, should not be given a parenthetical prefix.
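For legacy data that still carries the prefix, the stripping step that the proposal makes unnecessary can be sketched as below. The function name is hypothetical, and the rule that only HTTP(S) values are rewritten is an assumption chosen so that control numbers with other parenthetical prefixes pass through untouched.

```python
import re

# Matches a leading "(uri)" prefix, in any letter case, plus trailing spaces.
_URI_PREFIX = re.compile(r"^\(uri\)\s*", re.IGNORECASE)


def normalize_subfield_0(value):
    """Remove a legacy '(uri)' prefix from a $0 that holds an HTTP(S) URI.
    Any other value, such as '(DNLM)D009462Q000266', is returned unchanged."""
    stripped = _URI_PREFIX.sub("", value.strip())
    if stripped.lower().startswith(("http://", "https://")):
        return stripped
    return value
```

Once records follow the approved proposal, a consumer can use the $0 content directly and this step becomes a no-op.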

A second paper was presented to the MAC: Discussion Paper No. 2016-DP19, Adding Subfield $0 to Fields 257 and 377 in the MARC 21 Bibliographic Format and Field 377 in the MARC 21 Authority Format. It resulted from extensive analyses of the MARC Bibliographic and Authority formats by the TG, selecting fields which are to be controlled by an identifier. Only those fields where an identifier can be applied with clear correspondence between the field and one entity were included in the paper. The discussion paper was discussed at the MAC meeting, and the recommendation was made that it be upgraded to proposal status; it was also approved at the meeting as a proposal. Both changes will be included in Update 23 to the MARC documentation, expected in fall 2016.

The third paper, Discussion Paper No. 2016-DP17, Redefining Subfield $4 to Encompass URIs for Relationships in the MARC 21 Authority and Bibliographic Formats, was presented by the British Library in consultation with the TG. This paper generated lively discussion. It was acknowledged that the approach to recording URIs for relationships using subfield $4 was preferable to any of the other alternatives outlined by the paper. The distinction between relator codes and relationship codes in the MARC format was questioned. As of now, an across-the-board solution for recording URIs for any data element in MARC, subfield or field, seems to be preferred by NDMSO over what it regards as an ad hoc solution for single elements. This discussion will be continued; this paper should not be considered in isolation, but rather in the context of the other papers which the TG is in the process of submitting. Taken as a whole, it is hoped that they will achieve the comprehensive solution which is sought throughout the MARC formats.

IV.4.2. Formulating & Obtaining URIs document [Charge 3.2]

A draft document was created covering commonly used sources for authorities and identifiers. For each source, screen captures were made showing where a URI could be found for a particular entity, or how to formulate a URI once the identifier for the entity is known. Before making this document widely available, it must be determined how best to organize it. Some resources provide URIs that directly represent a thing, and others provide URIs that reference an authority (e.g. controlled or standard vocabularies, which may or may not have underlying metadata about the thing) or a resource describing a thing. The document needs to be able to distinguish these and inform catalogers which URIs are for real world objects and which are not. In order to be helpful to developers building tools, the document is also intended to include descriptions of how data sources provide machine access to the data: is the data published as Linked Data, available through HTTP, available through a SPARQL endpoint, data dumps, etc.? Another issue that must be determined is where to put the final document and how it will be maintained. Should it be cooperatively maintained by the community (such as on a wiki), or should some group within PCC take responsibility for keeping it up to date and adding to it?

IV.4.3. Revisions to OCLC handling of HTTP URIs [Charge 3.1]

The question arises as to whether it would be better for catalogers to enter all needed URIs directly into the shared bibliographic record in WorldCat, or whether OCLC should provide options for output of URIs based on data present in particular MARC fields and profiled library preferences. Clearly some libraries will embrace use of URIs for their web-based catalogs, while others may find them problematic in local displays of bibliographic information. OCLC staff have looked into the issue and believe that the use of output options would likely produce more consistent results as well as meet the varying needs of libraries.

The TG members are drafting a spreadsheet outlining the subfields that together name an entity for which a corresponding URI could be added in subfield $0. That spreadsheet will be useful as the basis for future specifications for use by OCLC system developers. It will allow for a comparison of what is desired by the PCC cataloging community in terms of URIs corresponding to the entire named entity versus the existing use of subfield $0 and subfield-$0-like information used in OCLC heading controlling functionality. That heading control functionality allows for control numbers in multiple $0 subfields corresponding to different parts of a named entity, i.e. corporate name hierarchies, names and titles, subjects and separately controlled subdivisions, etc. These are cases where output of multiple URIs corresponding only to part of the named entity would not be preferred.

OCLC cataloging policies in this area are expected to evolve as this TG makes recommendations and OCLC development work moves ahead on the proposed output options for URIs.

IV.5. Next steps and in-depth analyses in year 2 [Charge 3, Charge 4]

In 2016-2017 the TG will continue an agenda focused on practical outcomes. Work is already well advanced on several of the following items.

IV.5.1. In collaboration with OCLC, develop a specification for outputting URIs based on internal linkages present in WorldCat data.

IV.5.2. Complete the MARC object/URI reconciliation document and seek to incorporate the information into formal MARC documentation.

IV.5.3. Produce work ID recommendation and use it in pilot implementation.

IV.5.4. Produce discussion paper or proposal for handling relationships in MARC.

IV.5.5. Consider additional targeted reconciliation projects.

IV.5.6. In consultation with stakeholders, evaluate need for additional MARC proposals or best practices.

IV.5.7. RWO recommendations.

IV.5.8. Identify "homes" in PCC or elsewhere for aspects of the TG's work that will need further exploration or continuing upkeep.

IV.5.9. Outreach, advocacy, training.

IV.5.10. Etc.

V. RECOMMENDATIONS TO STAKEHOLDERS

During its first year, the TG was very much focused on the needs and interests of the many different stakeholders. This is reflected both in the outcomes of the work completed so far (see Sec. IV.4. Outcomes) and in the plans laid out for year 2 (see Sec. IV.5. Next steps and in-depth analyses in year 2). After careful consideration, the TG proposes the implementation of URIs in MARC for the near term. The sooner this process can begin, the sooner the data providers, e.g. libraries, can produce data that can be more easily transformed into linked data. In order to facilitate progress towards this goal, the TG developed the recommendations already outlined in the report above, such as the spreadsheet identifying the phase 1 entities for identifiers, i.e. the subfields that together name an entity in each MARC field (see Sec. IV.4.3. Revisions to OCLC handling of HTTP URIs), and the draft document Formulating and Obtaining URIs: A Guide to Commonly Used Vocabularies and Reference Sources. The TG hopes that this document could be used as a starting point to develop an official list of PCC-sanctioned initial source vocabularies for embedding URIs.

For the sake of consistency, expediency, and accuracy, it is advisable to use automated processes for populating MARC records with URIs. Having individual catalogers do this work manually is not a desirable practice and could be less efficient. Several possible ways to accomplish this goal have been outlined in this report (see Secs. IV.2.4.1. MarcEdit, IV.2.4.2. Authority Toolkit, and IV.4.3. Revisions to OCLC handling of HTTP URIs).

Outreach, advocacy, and training will be a core goal of phase 2. The TG is planning on working closely with stakeholders, such as other PCC committees, to influence cataloging policies and best practices that have been identified as problematic for the implementation of URIs in MARC.

Training needs related to implementation (for example, how to obtain URIs, or the difference between authorities and real world objects) will be communicated to the PCC Standing Committee on Training so that appropriate training can be either identified or developed.

Though MARC is the most prominently used schema for library metadata, it is frequently used alongside many others that may or may not allow for the inclusion of URIs. In addition to that concern, there are the maintenance of identifiers, recommendations in relation to reconciliation, and possible ILS functional requirements. The TG on URIs in MARC is recommending that new TGs be formed concerning URIs for non-MARC metadata.

VI. REFERENCES

1. The subgroup Work IDs in MARC has identified potential fields and scenarios to accommodate a work identifier (or multiple work identifiers). Consideration has been given to legacy data; to whether a work identifier (ID) is already established in an authority format or not (7XX $t, 1XX/240); to an unambiguous relationship of a work ID among various vocabularies (024); and to relationships among variants of a work, etc. The subgroup will present recommendations to the community in 2017.

Links: Meetings of the MARC Advisory Committee: Agendas and Minutes

2015-06 MAC meeting: http://www.loc.gov/marc/mac/an2015_age.html and http://www.loc.gov/marc/mac/minutes/an-15.html

2016-01 MAC meeting: http://www.loc.gov/marc/mac/mw2016_age.html and http://www.loc.gov/marc/mac/minutes/mw-16.html

2016-06 MAC meeting: http://www.loc.gov/marc/mac/an2016_age.html and http://www.loc.gov/marc/mac/minutes/an-16.html

Papers

Informal discussion paper: URIs in MARC: A Call for Best Practices (Steven Folsom, Discovery Metadata Librarian, Cornell University) https://docs.google.com/document/d/1fuHvF8bXH7hldY_xJ7f_xn2rP2Dj8o-Ca9jhHghIeUg/edit?pli=1

Discussion Paper No. 2016-DP04: Extending the Use of Subfield $0 to Encompass Linking Fields in the MARC 21 Bibliographic Format (British Library) http://www.loc.gov/marc/mac/2016/2016-dp04.html

Discussion Paper No. 2016-DP05: Expanding the Definition of Subfield $w to Encompass Standard Numbers in the MARC 21 Bibliographic and Authority Formats (British Library) http://www.loc.gov/marc/mac/2016/2016-dp05.html

Discussion Paper No. 2016-DP17: Redefining Subfield $4 to Encompass URIs for Relationships in the MARC 21 Authority and Bibliographic Formats (British Library, in consultation with the PCC Task Group on URIs in MARC) http://www.loc.gov/marc/mac/2016/2016-dp17.html

Discussion Paper No. 2016-DP18: Redefining Subfield $0 to Remove the Use of Parenthetical Prefix "(uri)" in the MARC 21 Authority, Bibliographic, and Holdings Formats (PCC Task Group on URIs in MARC, in consultation with the British Library) http://www.loc.gov/marc/mac/2016/2016-dp18.html

Discussion Paper No. 2016-DP19: Adding Subfield $0 to Fields 257 and 377 in the MARC 21 Bibliographic Format and Field 377 in the MARC 21 Authority Format (PCC URI in MARC Task Group) http://www.loc.gov/marc/mac/2016/2016-dp19.html

MARC Format Overview: Status Information: http://www.loc.gov/marc/status.html

Examples for Sec. IV.2.1.

This LC subject heading string is linked to three different authority records. The links are OCLC's ARNs. No single $0 could be output for this subject access point.

650 0 ǂa Neurologists<Link2068890> ǂz New Zealand<Link255121> ǂv Biography<Link4933801>

This medical subject string is linked to one authority record, although the controlling process links individual subfields. It is a candidate for output of a single $0 with a URI because the links all refer to the single authority record. In the case of MeSH, unlike LCSH, the $0 subfield displays in Connexion. See OCLC record 957132118.

650 12 ǂa Neurology<Link(DNLM)D009462Q000266> ǂx history<Link(DNLM)D009462Q000266>

Displays as:

650 12 Neurology ǂx history ǂ0 (DNLM)D009462Q000266

So it could be output with a single $0 containing the corresponding URI for the MeSH heading.
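The decision illustrated by these two examples can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the tuple layout and the function name `single_link` are hypothetical and do not describe OCLC's actual controlling data structures.

```python
from typing import List, Optional, Tuple

# (subfield code, heading text, linked authority record identifier)
# This tuple layout is an assumption made for the sketch.
Subfield = Tuple[str, str, str]


def single_link(subfields: List[Subfield]) -> Optional[str]:
    """Return the shared link when every controlled subfield points at the
    same authority record; return None when the links differ (or are absent),
    meaning no single $0 can stand for the whole access point."""
    links = {link for _code, _text, link in subfields}
    return links.pop() if len(links) == 1 else None


# The two headings from the examples above.
lcsh_heading = [
    ("a", "Neurologists", "2068890"),
    ("z", "New Zealand", "255121"),
    ("v", "Biography", "4933801"),
]
mesh_heading = [
    ("a", "Neurology", "(DNLM)D009462Q000266"),
    ("x", "history", "(DNLM)D009462Q000266"),
]

print(single_link(lcsh_heading))  # three different records: no single $0
print(single_link(mesh_heading))  # one record: candidate for a single $0
```

A real output routine would still need to turn the shared control number into the URI form that the source vocabulary publishes.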


IV.2. What difficulties are evidenced?

IV.2.1. Adding multiple $0 [Charge 1.2]

The nature and use of subfield $0 has evolved in MARC since the subfield was first implemented in 2007. In 2010 it was redefined and came to include standard numbers, including URIs, in addition to its original use for authority record control numbers.

However, MARC is not specific as to which parts of a controlled heading string correspond to the $0. Nothing in the MARC specification rules out one $0 subfield applying to one set of subfields in a heading while a different $0 applies to others. (To ameliorate this problem we formed a MARC object/URI reconciliation subgroup to enumerate the subfields naming the object in each MARC field; see IV.2.2 below.) And because $0 is repeatable, it is possible to find multiple $0 values corresponding to the same heading subfields naming the same entity. Indeed, the latter practice is adopted by design in some implementations, notably that at the German National Library.

The existence of different use cases and practices for relating headings to $0 has emerged as an issue that will need to be considered as the TG's work proceeds. In the case of OCLC's heading control functionality related to LC names and LCSH, subfield $0 data is included as an XML tag attribute in each subfield XML tag covered by a particular authority record, and is repeated as many times as needed depending on the number of subfields used to represent the name or subject. In the subsequent development of controlling for other authority files, the same approach has been taken, but instead retaining the same or different authority record control numbers in multiple $0 subfields. [See examples at end of document.]

This repeated use of $0 subfields, containing the same authority record number or different authority record numbers for different parts of a heading, runs contrary to the need that exists in an OCLC context for a single URI corresponding to the entire named entity given in the field. Extraneous $0 subfields are automatically deleted in WorldCat records in fields that are otherwise controlled to a particular authority file. However, this leaves unresolved the question of controlling via multiple source vocabularies within the same language of cataloging, which many see as a desirable medium-to-long-term objective. Given the investment in its development and the number of controlled headings in WorldCat, completely changing the heading control functionality within WorldCat is not feasible, so the TG and OCLC staff have considered other alternatives allowing for output of needed URIs in the format which libraries would prefer in the future.

IV.2.2. How to identify an RDF object in a MARC datafield [Charge 4.3]

This emerged as an important need because the ability to identify a URI with its corresponding label is necessary to support both reconciliation of existing data and updates to those labels based on their association with an identifier. The only realistic way to make this identification was to document the correspondences on a field-by-field basis. Fortunately this was very achievable for the majority of fields in widespread use. [Link to recommendations] The investigation revealed a number of issues relating to the identification of single entities vs. larger sets (series, conferences) and alignment of MARC and RDA vocabularies.

IV.2.3. What did we find in identifying relationships/multiple relationships? [Charge 4.1]

IV.2.3.1. Relationships are expressed in MARC by a variety of means, including:

IV.2.3.1.1. Field tagging, either alone (e.g. 830) or in combination with indicators (e.g. 780/785)

IV.2.3.1.2. Subfield codes, e.g. 041

IV.2.3.1.3. Codes given in subfields, e.g. 700 $4

IV.2.3.1.4. Controlled or natural language text given in subfields, e.g. 700 $i

IV.2.3.2. Some of these fields are very tightly bound to legacy MARC definitions, structures, and data. Redesigning 041, for example, to be hospitable to URIs would require a complete reconception of that field.

IV.2.3.3. There is the greatest value in provisioning for URIs following a 7XX $4 $0 $1 pattern, with $4 repurposed to house URIs much as $0 now does. This approach seems to present a relatively low barrier to implementation while having widespread application in MARC.

IV.2.3.4. Multiple relationships can cause ambiguity where they are associated with multiple objects or multiple labels. In such cases we recommend the expedient of simply repeating the field in order to make the associations unambiguous.
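The expedient of repeating the field can be pictured with a minimal Python sketch. It assumes, as the discussion paper proposes, that $4 may carry a relationship URI. The function name, the field representation, and the example.org object URI are hypothetical; the two relator URIs are used only as sample values.

```python
from typing import List, Tuple

# (tag, [(subfield code, value), ...]); a simplified stand-in for a MARC field.
Field = Tuple[str, List[Tuple[str, str]]]


def repeat_per_relationship(tag: str, label: str, object_uri: str,
                            relationship_uris: List[str]) -> List[Field]:
    """Emit one field per relationship so that each $4 is paired with
    exactly one label and one $0."""
    return [
        (tag, [("a", label), ("4", rel_uri), ("0", object_uri)])
        for rel_uri in relationship_uris
    ]


fields = repeat_per_relationship(
    "700",
    "Example, Person",
    "http://example.org/agent/1",  # hypothetical object URI
    [
        "http://id.loc.gov/vocabulary/relators/aut",
        "http://id.loc.gov/vocabulary/relators/ill",
    ],
)

for tag, subfields in fields:
    print(tag, " ".join(f"${code} {value}" for code, value in subfields))
```

Each output field then states one relationship between one label and one object, which is the unambiguous association the recommendation aims for.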

IV.2.4. How one obtains URIs for various data sources depends on the linked data source (different data sources make their URIs available differently) and on interoperability between the data source and the cataloging tools being used.

To help support obtaining the right URIs for their purposes in MARC, the TG has begun a document currently referred to as Formulating and Obtaining URIs: A Guide to Commonly Used Vocabularies and Reference Sources. For commonly used vocabularies in MARC, we want to document where in the data source UI one can find the canonical URIs that, when dereferenced, provide data. Going forward, for each entry in the document we want to explain whether a data source publishes its data as Authorities, Real World Objects, or both. Also, we want to document methods available for machine access to the data: Is the data published as Linked Data, available through HTTP, available through a SPARQL endpoint, data dumps, etc.?
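Formulating a URI once an identifier is known is often a matter of filling in a template. The following Python sketch shows the idea. The template table is an assumption for illustration, not content taken from the TG's guide, and each pattern should be checked against the source's own documentation; the identifier is a sample value only.

```python
# Assumed URI patterns for a few commonly used sources (verify before use).
URI_TEMPLATES = {
    "lcnaf": "http://id.loc.gov/authorities/names/{id}",
    "lcsh": "http://id.loc.gov/authorities/subjects/{id}",
    "mesh": "http://id.nlm.nih.gov/mesh/{id}",
    "viaf": "http://viaf.org/viaf/{id}",
}


def formulate_uri(source: str, identifier: str) -> str:
    """Build a URI from a source code and an identifier.

    Spaces are removed because control numbers are often displayed with
    internal spaces that do not appear in the URI form (an assumption that
    holds for the sample below but may not hold for every source).
    Raises KeyError for a source that has no template.
    """
    template = URI_TEMPLATES[source]
    return template.format(id=identifier.replace(" ", ""))


print(formulate_uri("lcnaf", "n 79021164"))
```

Keeping the patterns in a table rather than in code makes it easier to maintain them cooperatively, which is one of the maintenance questions the TG raises for the guide itself.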

IV.2.4.1. MarcEdit [Charge 3]

In the summer of 2014, MarcEdit introduced a suite of tools designed to begin testing the feasibility of embedding linked data concepts into MARC records. Initially, the scope of the suite was limited to embedding HTTP URIs in the $0 in MARC fields 1xx, 6xx, and 7xx in bibliographic records. This initial work focused on integration with the US Library of Congress's id.loc.gov service, as well as OCLC's VIAF services, for resolution. However, over the past 2 years, and in response to many of the questions and issues surfaced through the TG, the Linking services have been expanded and revised to potentially support all use cases identified by this Task Force, as well as providing support for non-MARC21 users to configure the Linking tool for use with other MARC formats.

The MarcEdit Linking toolkit currently supports the generation of URIs for all fields identified by this Task Force for authority and bibliographic records. The application utilizes a rules file that documents field processing and service configuration values. This allows MarcEdit to quickly make changes to the rules governing field processing, as well as adding support for new collections and linked data endpoints. As of this report (9/21/2016), the MarcEdit Linked Data tool supports resolution against the following linked data services:

1. US Library of Congress NAF
2. US Library of Congress LCSH
3. US Library of Congress Children's Subject Headings
4. US Library of Congress Demographic Group Terms
5. Thesaurus for Graphic Materials
6. US Library of Congress Genre/Form Terms
7. US Library of Congress Medium of Performance Thesaurus for Music
8. RDA Carrier Types
9. RDA Media Types
10. RDA Content Types
11. Getty Art & Architecture Thesaurus
12. Getty ULAN
13. National Library of Medicine MeSH
14. OCLC FAST Headings
15. OCLC VIAF
16. German National Library (GND)
17. [15 national library name indexes via VIAF]
18. Japanese Diet Library

Additionally, users have the ability to configure their own linked data endpoints for use with MarcEdit, so long as the service in question supports SPARQL and JSON. There is presently a knowledge-base article at http://marcedit.reeset.net/editing-marcedits-linked-data-rules-file documenting how users can add new collections or modify the rules used when processing a particular field.

Essentially, MarcEdit utilizes its rules file to configure its linked data platform to identify the proper index/service, the normalization (for data query purposes), and the subfields to utilize as part of any lookup process. Additionally, each rules block identifies when a field should be processed (i.e. only when used in a bibliographic record, only when used in an authority record, or both). For example, here is the definition for the 650 field:

    <field type="bibliographic">
        <tag>650</tag>
        <subfields>abvxyz</subfields>
        <ind2 value="0" vocab="lcsh" />
        <ind2 value="1" vocab="lcshac" />
        <ind2 value="2" vocab="mesh" />
        <ind2 value="7" vocab="none" />
        <index>2</index>
        <uri>0</uri>
        <special_instructions>subject</special_instructions>
    </field>

Each MarcEdit rules block is a small segment of XML that profiles field usage within a record. This is why MarcEdit's linking tool can be used with other flavors of MARC (like UNIMARC): the Linking service has no concept of MARC21, only of the ISO 2709 format, and the rules file provides that context.
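To make the role of such a rules block concrete, here is a minimal Python sketch that reads the 650 definition quoted above and looks up the vocabulary for a given second indicator. It is not MarcEdit's implementation; the function name `load_rule` and the shape of the returned dictionary are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# The 650 rules block as quoted in the report.
RULE_650 = """
<field type="bibliographic">
    <tag>650</tag>
    <subfields>abvxyz</subfields>
    <ind2 value="0" vocab="lcsh" />
    <ind2 value="1" vocab="lcshac" />
    <ind2 value="2" vocab="mesh" />
    <ind2 value="7" vocab="none" />
    <index>2</index>
    <uri>0</uri>
    <special_instructions>subject</special_instructions>
</field>
"""


def load_rule(xml_text: str) -> dict:
    """Parse one rules block into a plain dictionary (hypothetical shape)."""
    field = ET.fromstring(xml_text.strip())
    return {
        "record_type": field.get("type"),
        "tag": field.findtext("tag"),
        # One entry per subfield code that takes part in the lookup.
        "subfields": list(field.findtext("subfields")),
        # Second-indicator value -> vocabulary code.
        "vocab_by_ind2": {
            ind.get("value"): ind.get("vocab") for ind in field.findall("ind2")
        },
    }


rule = load_rule(RULE_650)
print(rule["tag"], rule["vocab_by_ind2"].get("2"))
```

Because the mapping lives in data rather than in code, supporting another MARC flavor is, in principle, a matter of supplying a different rules file.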

This approach has allowed MarcEdit to quickly profile and examine the implications of developing URIs for linking fields like the 880 field, which provide some unique challenges but can be accommodated via the current rules file format.

Utilizing the current process, MarcEdit's linking tool can accommodate a wide range of linking scenarios. For example:

[Figure: linking in an authority record]

[Figure: linking within a bibliographic record]

[Figure: linking across diverse vocabularies]

Current development on the tool will continue to focus on the inclusion and support of additional vocabularies; on continuing to work with linked data providers around scalability issues (and ways in which MarcEdit, or services like it, can reduce impacts on their services); and on profiling this service to work with other flavors of MARC, like UNIMARC, to encourage further experimentation.

IV.2.4.2. Authority Toolkit [Charge 3]

The authority toolkit is a program for the construction and modification of authority records. One version is designed for use within OCLC's Connexion program, for records in the LC/NACO authority file, but another version can work with records in files, and so with records from other sources. Both versions of the toolkit have the same capabilities. At an early stage the toolkit acquired the ability to test terms used in authority fields such as the 370 and 372 against vocabularies available at id.loc.gov (at present LCMPT, LCSH, LCDGT, AFSET, geographic area codes, RDA content terms, and the LC/NACO Authority File). Somewhat later it added the ability to verify terms against the MeSH vocabulary. (Additional vocabularies may be added in the future, based on user requests.) To perform this verification, the program needs to know which vocabularies are used to control terms in which parts of which authority fields, how to query the source to determine whether or not the term is defined, and how to react to the information returned by the source. The toolkit's actions are controlled above all by the subfield $2 code appearing in the same field as the term, but in the absence of a subfield $2 code, operator preferences come into play as well. (For example, an operator may prefer that an unlabeled term be tested against MeSH first and, if not found, tested against LCSH; or perhaps tested only against LCDGT.) A detailed description of the toolkit's process for verifying the content of authority fields can be found in the program's documentation at httpfileslibrarynorthwesternedupublicoclcdocumentationverifymenu

If the toolkit's search for an entire term is successful, the toolkit could easily supply the corresponding URI and add it to the authority record in subfield $0. This URI may be contained in the data provided by the source, or it could be constructed mechanically once the toolkit has extracted the appropriate identifier. As part of experimentation encouraged by the TG, on January 15, 2016, the toolkit acquired an option to add subfield $0 to fields which could be verified. (This option is described at httpfileslibrarynorthwesternedupublicoclcdocumentationoptionsverification0) If a field contains more than one term, the toolkit must divide the field into multiple fields (one for each term) before it can add subfield $0.
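The last step, dividing a multi-term field so that each term can carry its own $0, can be sketched as follows in Python. This is not the toolkit's code. The function name, the field representation, the assumption that terms sit in $a, the subfield ordering, and the example.org URIs are all hypothetical.

```python
from typing import Dict, List, Tuple

Subfields = List[Tuple[str, str]]  # [(subfield code, value), ...]


def split_and_link(tag: str, subfields: Subfields,
                   verified: Dict[str, str]) -> List[Tuple[str, Subfields]]:
    """Divide a field with several $a terms into one field per term and add
    $0 only for terms that were verified (present in `verified`)."""
    terms = [value for code, value in subfields if code == "a"]
    shared = [(code, value) for code, value in subfields if code != "a"]
    result = []
    for term in terms:
        new_subfields = [("a", term)] + shared
        uri = verified.get(term)
        if uri is not None:
            new_subfields.append(("0", uri))
        result.append((tag, new_subfields))
    return result


# Hypothetical verification results: term -> URI found at the source.
verified_terms = {"Poetry": "http://example.org/vocab/poetry"}

original = [("a", "Poetry"), ("a", "Fiction"), ("2", "lcsh")]
for tag, subs in split_and_link("372", original, verified_terms):
    print(tag, " ".join(f"${code} {value}" for code, value in subs))
```

Terms that could not be verified are kept in their own field without $0, so no information is lost by the division.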

The following illustration shows an authority record as verified by the authority toolkit, with the option to add subfield $0 during verification turned on. (For this experiment, subfield $0 was locally defined for some fields.)

[Figure: authority record verified by the authority toolkit with subfield $0 added]

Although the toolkit can often discover information about compound terms (such as some corporate bodies with subordinate units, and some LCSH headings) for which an authority record exists for some parts but not all, the toolkit cannot supply subfield $0. (There is no authority record, and so no URI, that represents the entire term.) The toolkit also cannot add subfield $0 to fields that contain multiple terms if the field contains an aggregation of terms rather than a collection of independent items. (Example: the toolkit cannot add subfield $0 to the 382 field.)

The task of discovering that a term given in an authority record is defined in an external vocabulary is made more difficult because the searching mechanisms available do not always compensate appropriately for operator variations in punctuation, capitalization, and the use of combining diacritics. In addition, the response time experienced by the toolkit can vary widely, even for the same term searched repeatedly within a brief time, and some services are unavailable over the weekend. If the potential of linked data is to be enjoyed, services providing data must ensure that their entry mechanisms are robust and flexible, and available at all times.
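One way a client could compensate for such variations before comparing terms is to reduce each term to a matching key. The Python sketch below uses only the standard library. The function name `match_key` and the particular normalization choices are assumptions, not the behavior of the toolkit or of any vocabulary service.

```python
import string
import unicodedata


def match_key(term: str) -> str:
    """Reduce a term to a key that ignores capitalization, ASCII punctuation,
    extra whitespace, and combining diacritics.

    The key is meant for finding candidate matches only; distinct terms can
    collapse to the same key, so a candidate still needs to be confirmed.
    """
    decomposed = unicodedata.normalize("NFKD", term)
    without_marks = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    without_punct = "".join(
        " " if ch in string.punctuation else ch for ch in without_marks
    )
    return " ".join(without_punct.casefold().split())


# Precomposed and combining-character spellings yield the same key.
print(match_key("Dvořák, Antonín") == match_key("dvor\u030ca\u0301k antoni\u0301n"))
```

Normalizing on the client side does not remove the need for robust search on the service side, but it reduces the number of failed lookups caused by purely typographic differences.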

IV.2.4.3. Lookup online (e.g. VIAF, Getty ULAN, Geonames, Wikidata)

Online lookup requires manual operation. Users must be well versed in the SPARQL queries that individual services provide. Getty ULAN works differently from Geonames and Wikidata. The URI returned from a query may not be an RDF URI, but one that lands the user on a Web page or document.
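As an illustration of the kind of query a user would need to write, the Python sketch below builds (but does not send) a request URL for the public Wikidata SPARQL endpoint, looking up an item by its Library of Congress authority ID (Wikidata property P244). The function name is hypothetical, the identifier is a sample value, and other services will require different queries and endpoints.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

WIKIDATA_SPARQL = "https://query.wikidata.org/sparql"


def wikidata_query_url(lccn: str) -> str:
    """Return a GET URL asking Wikidata for items whose P244 value
    (Library of Congress authority ID) equals `lccn`."""
    if not lccn.isalnum():
        # Keep the sketch safe: refuse values that could break the query text.
        raise ValueError("expected an alphanumeric identifier")
    query = 'SELECT ?item WHERE { ?item wdt:P244 "%s" . }' % lccn
    return WIKIDATA_SPARQL + "?" + urlencode({"query": query, "format": "json"})


url = wikidata_query_url("n79021164")
# Decode the URL again to show the query text that would be sent.
print(parse_qs(urlsplit(url).query)["query"][0])
```

The item URI returned by such a query identifies the Wikidata entity; whether that is the right URI to record in MARC is a separate cataloging decision.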

IV.3. What did we learn? [Charge 1.3, Charge 3]

IV.3.1. Tackle low-hanging fruit/what can we do in 1 year?

The TG's activities during Year 1 were designed to position the MARC community to take tangible steps toward incorporating linked data URIs into its processes within an achievable timeframe. Therefore, the TG put aside some tasks, such as the overhaul of certain legacy MARC data elements, that would have delayed progress with the TG's practical objectives. The tool development undertaken by Terry Reese and Gary Strawn was designed to advance these objectives, but so were the Formulating URIs document and the MARC object/URI reconciliation work, both of which document information that will be needed by other stakeholders, and the work IDs in MARC proposal, which seeks to remove one of the main barriers to routine incorporation of work identifiers in MARC records.

IV.3.2. Add $0 where it's not defined (not simple)

One of the TG's goals was also to identify and add $0 to fields that currently do not have one defined. The TG found the following MARC fields that needed $0 defined:

bibliographic: 046, 257, 260/264, 375, 753
authority: 046, 360, 375, 377, 663, 680, 681

These fields do not render an easy resolution when considering $0, which reflects the resource object for an entity described. The TG conducted thorough analyses and concluded that only 257 and 377 could contain a URI with an unambiguous correspondence between the field and the object it represents, leaving out more complicated cases, e.g. fields 264 (Production, Publication, Distribution, Manufacture, and Copyright Notice) and 382 (Medium of Performance).

One of the issues confronted in drafting Discussion Paper 2016-DP19 was the extent of effort needed to individually propose subfield $0 for MARC 21 fields that do not contain it. MAC accepted the paper as a proposal, and there was agreement "that similar changes, such as those recommended [in] this paper, might in the future be considered as part of a MARC Fast-Track process." Being able to fast-track proposals for defining subfield $0 in fields which do not contain it will considerably streamline the process in the future.

IV.3.3. Strategies in light of the limited life cycle of the MARC environment

Though many may see MARC as "dead," the system remains a viable tool that delivers metadata for data discovery. It is also, however, a legacy format that reflects in its somewhat baroque structure a long history of accretion to meet varied and changing needs. In pursuing its goals, the TG has adopted a strategy of pursuing changes that can be applied coherently across MARC and maximize return on the library community's investment of effort. There are economical and sensible approaches in determining what to do. The TG always kept in mind that its recommendations must cause the least disruption for data transition from MARC to linked data. A wholesale insertion of HTTP URIs is unlikely to be possible, though insertion may be possible in most, but not all, MARC fields and/or subfields.

The TG is committed to working through a list of tasks and identifying viable solutions. While $0, after one year's deliberation, seemed a straightforward solution for a URI representing a resource object, more discussion is needed with regard to the predicate that denotes a relationship. MARC data have not been consistent in expressing relationships. The combination of field tags, indicators, and subfields raises the complexity of the process.

IV.3.4. ILS analysis results

Some ILSs would not load the processed records because of the presence of $0. Others loaded but did nothing with the data.

The TG members mocked up files of bibliographic and authority data, adding various URIs in subfield $0 wherever subfield $0 is currently defined in MARC. These files were uploaded into a number of ILS systems to see if the addition of subfield $0 with URIs caused problems. No significant problems were found. These files included URIs in subfield $0 which were not prefixed with the "(uri)" identifier.

In OCLC, the same $0 subfields were also not problematic. OCLC's validation of subfield $0 does not check the structure of subfield $0 in the same way as it does for control numbers in 760-787 subfield $w or URLs in $u subfields. Use of URIs in subfield $4 to express relationship information would require a change to OCLC's validation of $4 subfields, but that may be readily changed without extensive effort.

IV.3.5. Tools needed: MarcNext, Authority Toolkit

Currently the TG has tested and continued to work with MarcNext and Authority Toolkit. The TG members continue collecting and recording additional tools and resources that help practitioners identify and validate an RDF URI.

IV.3.6. Need to be able to easily report duplicates found in VIAF, etc., and need a way to know which URI to use when duplicates are found

Throughout the first year of investigation and deliberation, the TG learned that, though vocabularies and ontologies are structured per standards and published for adoption, some are more domain-specific than others. Often there is more than one method to structure a body of data. Duplications can be expected across various datasets. The reconciliation of URIs is one of the tasks that the TG has recognized, yet it is not in a position to recommend a solution in the near term.

IV.4. Outcomes

IV.4.1. MAC Discussion Papers [Charge 4]

The TG was aware that some aspects of its intended goals were not yet accommodated by the MARC format. Following the defined workflows of MARC governance and standardization, the TG submitted several discussion papers to the MARC Advisory Committee (MAC). As an initial preparation, an informal discussion paper entitled URIs in MARC: A Call for Best Practices, by Steven Folsom, had been discussed during the June 2015 MAC meeting. It focused on subfield $0 (Authority record control number or standard number), its current usage, and its capability for URIs, and addressed some aspects of best practice. The paper generated extensive discussion, and there was broad agreement that the time was right for the library community to begin using URIs consistently. Steven Folsom was asked to cooperate with the PCC to develop a formal MAC Discussion Paper.

In fall 2015 the British Library (BL) submitted two papers to MAC for the January 2016 meeting, independently of the TG, covering title-to-title relationships via subfield $w and specific relationship information, then discussed using subfield $0. The approaches taken by the BL in its papers, coupled with the approach taken by the TG, resulted in MAC suggesting that the British Library and the PCC should collaborate on submitting a paper for June 2016.

During the MAC meetings at the ALA Annual Conference in Orlando in June 2016, three papers were presented by or in cooperation with the TG. Discussion Paper No. 2016-DP18, entitled Redefining Subfield $0 to Remove the Use of Parenthetical Prefix "(uri)" in the MARC 21 Authority, Bibliographic, and Holdings Formats, described the syntactical improvement that a subfield $0 containing a URI without the parenthetical prefix "(uri)" would allow, so that automated processes could use the content of these $0s without having to strip away the prefix. The discussion paper was discussed at the MAC meeting, and the recommendation was made that it be upgraded to proposal status; it was approved at the meeting as a proposal. From now on, a $0 containing an identifier in the form of a web retrieval protocol, e.g. an HTTP URI, should not be given a parenthetical prefix.
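The processing burden that the prefix creates for legacy data can be seen in a short Python sketch. The function name is hypothetical and the URI is a sample value; the point is only that every consumer of older $0 data has to perform a step like this, while control numbers with other parenthetical prefixes must be left alone.

```python
import re

# Matches a leading "(uri)" prefix, in any letter case, plus following spaces.
_URI_PREFIX = re.compile(r"^\(uri\)\s*", re.IGNORECASE)


def strip_uri_prefix(subfield_0: str) -> str:
    """Remove a leading "(uri)" from a $0 value; other values are returned
    unchanged apart from surrounding whitespace."""
    return _URI_PREFIX.sub("", subfield_0.strip())


print(strip_uri_prefix("(uri) http://id.loc.gov/authorities/names/n79021164"))
print(strip_uri_prefix("(DNLM)D009462Q000266"))
```

With the prefix no longer required, a $0 that begins with "http://" or "https://" can be used directly as a URI.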

A second paper was presented to the MAC: Discussion Paper No. 2016-DP19, Adding Subfield $0 to Fields 257 and 377 in the MARC 21 Bibliographic Format and Field 377 in the MARC 21 Authority Format. It resulted from extensive analyses of the MARC Bibliographic and Authority formats by the TG, selecting fields which are to be controlled by an identifier. Only those fields where an identifier can be applied with clear correspondence between the field and one entity were included in the paper. The discussion paper was discussed at the MAC meeting, and the recommendation was made that it be upgraded to proposal status; it was also approved at the meeting as a proposal. Both changes will be included in Update 23 to the MARC documentation, expected in fall 2016.

The third paper Discussion Paper No 2016-shy‐DP17 Redefining Subfield $4 to Encompass URIs for Relationships in13 the MARC13 21 Authority and13 Bibliographic Formats was presented13 by the British13 Library in consultation with the TG13 This paper generated vivid discussions It was acknowledged that the13 approach to recording URIs for relationships using subfield $ was preferable13 to any of the13 other alternatives outlined by the13 paper The13 distinction between relator codes and relationship codes in the MARC format was questioned As of now an across-shy‐the-shy‐board13 solution13 for recording URIs for any data element in MARC subfield or field seems to be13 preferred by NDMSO over what it regards as an ad hoc solution for single elements This13 discussion will be continued13 this paper should not be considered in isolation but rather in the context of the other13 papers which the TG13 in is the process of submitting Taken as whole it is hoped that they will achieve the comprehensive solution which is sought throughout the MARC13 formats

IV.4.2. Formulating & Obtaining URI document [Charge 3.2]

A draft document was prepared for commonly used sources for authorities and identifiers. For each source, screen captures were made showing where a URI could be found for a particular entity, or how to formulate a URI once the identifier for the entity is known. Before making this document widely available, it must be determined how best to organize it. Some resources provide URIs that directly represent a thing, and others provide URIs that reference an authority (e.g. controlled or standard vocabularies, which may or may not have underlying metadata about the thing) or a resource describing a thing. The document needs to be able to distinguish these and inform catalogers which URIs are for real world objects and which are not. In order to be helpful to developers building tools, the document is also intended to include descriptions of how data sources provide machine access to the data: is the data published as Linked Data, available through HTTP, available through a SPARQL endpoint, data dumps, etc.? Another issue that must be determined is where to put the final document and how it will be maintained. Should it be cooperatively maintained by the community (such as on a wiki), or should some group within PCC take responsibility for keeping it up to date and adding to it?

IV.4.3. Revisions to OCLC handling of HTTP URIs [Charge 3.1]

The question arises as to whether it would be better for catalogers to enter all needed URIs directly into the shared bibliographic record in WorldCat, or whether OCLC should provide options for output of URIs based on data present in particular MARC fields and profiled library preferences. Clearly, some libraries will embrace use of URIs for their web-based catalogs, while others may find them problematic in local displays of bibliographic information. OCLC staff have looked into the issue and believe that the use of output options would likely produce more consistent results as well as meet the varying needs of libraries.

The TG members are drafting a spreadsheet outlining the subfields that together name an entity for which a corresponding URI could be added in subfield $0. That spreadsheet will be useful as the basis for future specifications for use by OCLC system developers. It will allow for a comparison of what is desired by the PCC cataloging community, in terms of URIs corresponding to the entire named entity, versus the existing use of subfield $0 and subfield-$0-like information used in OCLC heading controlling functionality. That heading control functionality allows for control numbers in multiple $0 subfields corresponding to different parts of a named entity, i.e. corporate name hierarchies, names and titles, subjects and separately controlled subdivisions, etc. These are cases where output of multiple URIs corresponding only to part of the named entity would not be preferred.

OCLC cataloging policies in this area are expected to evolve as this TG makes recommendations and OCLC development work moves ahead on the proposed output options for URIs.

IV.5. Next steps and in-depth analyses in year 2 [Charge 3, Charge 4]

In 2016-2017 the TG will continue an agenda focused on practical outcomes. Work is already well advanced on several of the following items:

IV.5.1. In collaboration with OCLC, develop a specification for outputting URIs based on internal linkages present in WorldCat data.

IV.5.2. Complete the MARC object/URI reconciliation document and seek to incorporate the information into formal MARC documentation.

IV.5.3. Produce work ID recommendation and use it in pilot implementation.

IV.5.4. Produce discussion paper or proposal for handling relationships in MARC.

IV.5.5. Consider additional targeted reconciliation projects.

IV.5.6. In consultation with stakeholders, evaluate need for additional MARC proposals or best practices.

IV.5.7. RWO recommendations.

IV.5.8. Identify "homes" in PCC or elsewhere for aspects of the TG's work that will need further exploration or continuing upkeep.

IV.5.9. Outreach, advocacy, training.

IV.5.10. Etc.

V. RECOMMENDATIONS TO STAKEHOLDERS

During its first year, the TG was very much focused on the needs and interests of the many different stakeholders. This is reflected both in the outcomes of the work completed so far (see Sec. IV.4, Outcomes) and in the plans laid out for year 2 (see Sec. IV.5, Next steps and in-depth analyses in year 2). After careful consideration, the TG proposes the implementation of URIs in MARC for the near term. The sooner this process can begin, the sooner data providers, e.g. libraries, can produce data that can be more easily transformed into linked data. In order to facilitate progress towards this goal, the TG developed the recommendations already outlined in the report above, such as the spreadsheet identifying the phase 1 entities for identities, i.e. the subfields that together name an entity in each MARC field (see Sec. IV.4.3, Revisions to OCLC handling of HTTP URIs), and the draft document "Formulating and Obtaining URIs: A Guide to Commonly Used Vocabularies and Reference Sources." The TG hopes that this document could be used as a starting point to develop an official list of PCC-sanctioned initial source vocabularies for embedding URIs.

For the sake of consistency, expediency, and accuracy, it is advisable to use automated processes for populating MARC records with URIs. Having individual catalogers do this work manually is not a desirable practice and could be less efficient. Several possible ways to accomplish this goal have been outlined in this report (see Secs. IV.2.4.1, MarcEdit; IV.2.4.2, Authority Toolkit; and IV.4.3, Revisions to OCLC handling of HTTP URIs).

Outreach, advocacy, and training will be a core goal of phase 2. The TG is planning on working closely with stakeholders, such as other PCC committees, to influence cataloging policies and best practices that have been identified as problematic for the implementation of URIs in MARC.

Training needs related to implementation (for example, how to obtain URIs, or the difference between authorities and real world objects) will be communicated to the PCC Standing Committee on Training so that appropriate training can be either identified or developed.

Though MARC is the most prominently used schema for library metadata, it is frequently used alongside many others that may or may not allow for the inclusion of URIs. Related concerns include the maintenance of identifiers, recommendations on reconciliation, and possible ILS functional requirements. The TG on URIs in MARC is recommending that new TGs be formed concerning URIs for non-MARC metadata.

VI. REFERENCES

1. The subgroup "Work IDs in MARC" has identified potential fields and scenarios to accommodate a work identifier (or multiple work identifiers). Consideration has been given to legacy data; to whether or not a work identifier (ID) is already established in an authority format (7XX $t, 1XX/240); to an unambiguous relationship of a work ID among various vocabularies (024); and to relationships among variants of a work, etc. The subgroup will present recommendations to the community in 2017.

Links: Meetings of the MARC Advisory Committee, Agendas and Minutes

2015-06 MAC meeting: http://www.loc.gov/marc/mac/an2015_age.html http://www.loc.gov/marc/mac/minutes/an-15.html

2016-01 MAC meeting: http://www.loc.gov/marc/mac/mw2016_age.html http://www.loc.gov/marc/mac/minutes/mw-16.html

2016-06 MAC meeting: http://www.loc.gov/marc/mac/an2016_age.html http://www.loc.gov/marc/mac/minutes/an-16.html

Papers


Informal discussion paper: URIs in MARC: A Call for Best Practices (Steven Folsom, Discovery Metadata Librarian, Cornell University) https://docs.google.com/document/d/1fuHvF8bXH7hldY_xJ7f_xn2rP2Dj8o-Ca9jhHghIeUg/edit?pli=1

Discussion Paper No. 2016-DP04: Extending the Use of Subfield $0 to Encompass Linking Fields in the MARC 21 Bibliographic Format (British Library) http://www.loc.gov/marc/mac/2016/2016-dp04.html

Discussion Paper No. 2016-DP05: Expanding the Definition of Subfield $w to Encompass Standard Numbers in the MARC 21 Bibliographic and Authority Formats (British Library) http://www.loc.gov/marc/mac/2016/2016-dp05.html

Discussion Paper No. 2016-DP17: Redefining Subfield $4 to Encompass URIs for Relationships in the MARC 21 Authority and Bibliographic Formats (British Library in consultation with the PCC Task Group on URIs in MARC) http://www.loc.gov/marc/mac/2016/2016-dp17.html

Discussion Paper No. 2016-DP18: Redefining Subfield $0 to Remove the Use of Parenthetical Prefix (uri) in the MARC 21 Authority, Bibliographic, and Holdings Formats (PCC Task Group on URIs in MARC in consultation with the British Library) http://www.loc.gov/marc/mac/2016/2016-dp18.html

Discussion Paper No. 2016-DP19: Adding Subfield $0 to Fields 257 and 377 in the MARC 21 Bibliographic Format and Field 377 in the MARC 21 Authority Format (PCC Task Group on URIs in MARC) http://www.loc.gov/marc/mac/2016/2016-dp19.html

MARC Format Overview: Status Information http://www.loc.gov/marc/status.html

Examples for Sec. IV.2.1

This LC subject heading string is linked to three different authority records. The links are OCLC's ARNs. No single $0 could be output for this subject access point.

650 0 ǂa Neurologists<Link2068890> ǂz New Zealand<Link255121> ǂv Biography<Link4933801>

This medical subject string is linked to one authority record, although the controlling process links individual subfields. It is a candidate for output of a single $0 with a URI, because the links all refer to the single authority record. In the case of MeSH, unlike LCSH, the $0 subfield displays in Connexion. See OCLC record 957132118.

650 12 ǂa Neurology<Link(DNLM)D009462Q000266> ǂx history<Link(DNLM)D009462Q000266>


Displays as:

650 12 Neurology ǂx history ǂ0 (DNLM)D009462Q000266

So it could be output with a single $0 containing the corresponding URI for the MeSH heading.
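The decision described in these two examples can be sketched as follows. This is only an illustration under stated assumptions: the function names are hypothetical, the tuples are a simplified stand-in for controlled subfields, and the MeSH base URI is an assumption that should be checked against the vocabulary's own documentation. It does not describe OCLC's actual implementation.

```python
from typing import List, Optional, Tuple

# Assumed base for MeSH URIs; verify before relying on it.
MESH_BASE = "http://id.nlm.nih.gov/mesh/"

# (subfield code, text, link identifier)
LinkedSubfield = Tuple[str, str, str]


def single_link(subfields: List[LinkedSubfield]) -> Optional[str]:
    """Return the one link shared by every subfield, or None if they differ."""
    identifiers = {link for _code, _text, link in subfields}
    return identifiers.pop() if len(identifiers) == 1 else None


def mesh_uri(control_number: str) -> str:
    """Turn a "(DNLM)..." control number into a URI under MESH_BASE."""
    prefix = "(DNLM)"
    if not control_number.startswith(prefix):
        raise ValueError("not a (DNLM) control number")
    return MESH_BASE + control_number[len(prefix):]


lcsh = [("a", "Neurologists", "2068890"),
        ("z", "New Zealand", "255121"),
        ("v", "Biography", "4933801")]
mesh = [("a", "Neurology", "(DNLM)D009462Q000266"),
        ("x", "history", "(DNLM)D009462Q000266")]

for heading in (lcsh, mesh):
    link = single_link(heading)
    print("no single $0" if link is None else link)
```

The LCSH string yields no single identifier because each subfield points to a different authority record, while the MeSH string collapses to one control number that can be converted to one URI.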


the identification of single entities vs. larger sets (series, conferences), and alignment of MARC and RDA vocabularies.

IV.2.3. What did we find in identifying relationships/multiple relationships? [Charge 4.1]

IV.2.3.1. Relationships are expressed in MARC by a variety of means, including:

IV.2.3.1.1. Field tagging, either alone (e.g. 830) or in combination with indicators (e.g. 780/785)

IV.2.3.1.2. Subfield codes, e.g. 041

IV.2.3.1.3. Codes given in subfields, e.g. 700 $4

IV.2.3.1.4. Controlled or natural language text given in subfields, e.g. 700 $i

IV.2.3.2. Some of these fields are very tightly bound to legacy MARC definitions, structures, and data. Redesigning 041, for example, to be hospitable to URIs would require a complete reconception of that field.

IV.2.3.3. There is the greatest value in provisioning for URIs following a 7XX $4 $0 $1 pattern, with $4 repurposed to house URIs much as $0 now does. This approach seems to present a relatively low barrier to implementation while having widespread application in MARC.

IV.2.3.4. Multiple relationships can cause ambiguity where they are associated with multiple objects or multiple labels. In such cases we recommend the expedient of simply repeating the field in order to make the associations unambiguous.
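A minimal Python sketch of the "repeat the field" expedient is shown below. The data model (a tag plus a list of code/value pairs) and the function name are assumptions made for illustration; the sample name is invented, and the relationship URIs follow the id.loc.gov relator pattern only as an example.

```python
from typing import List, Tuple

Subfield = Tuple[str, str]          # (subfield code, value)
Field = Tuple[str, List[Subfield]]  # (tag, subfields)


def repeat_per_relationship(tag: str, subfields: List[Subfield]) -> List[Field]:
    """Emit one copy of the field per $4, so each copy carries one relationship."""
    relationships = [sf for sf in subfields if sf[0] == "4"]
    others = [sf for sf in subfields if sf[0] != "4"]
    if len(relationships) <= 1:
        # Nothing to disambiguate; keep the field as it is.
        return [(tag, list(subfields))]
    return [(tag, others + [rel]) for rel in relationships]


field = [
    ("a", "Example, Author"),
    ("4", "http://id.loc.gov/vocabulary/relators/aut"),
    ("4", "http://id.loc.gov/vocabulary/relators/ill"),
]
for tag, sfs in repeat_per_relationship("700", field):
    print(tag, " ".join(f"${code} {value}" for code, value in sfs))
```

Each output field pairs the name with exactly one relationship, which is the unambiguous association the recommendation aims for.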

IV.2.4. How one obtains URIs for various data sources depends on the linked data source (different data sources make their URIs available differently) and on interoperability between the data source and the cataloging tools being used.

To help support obtaining the right URIs for its purposes in MARC, the TG has begun a document currently referred to as "Formulating and Obtaining URIs: A Guide to Commonly Used Vocabularies and Reference Sources." For commonly used vocabularies in MARC, we want to document where in the data source UI one can find the canonical URIs that, when dereferenced, provide data. Going forward, for each entry in the document we want to explain whether a data source publishes its data as Authorities, Real World Objects, or both. We also want to document the methods available for machine access to the data: is the data published as Linked Data, available through HTTP, available through a SPARQL endpoint, data dumps, etc.?

IV.2.4.1. MarcEdit [Charge 3]

In the summer of 2014, MarcEdit introduced a suite of tools designed to begin testing the feasibility of embedding linked data concepts into MARC records. Initially, the scope of the suite was limited to embedding HTTP URIs in the $0 in MARC fields 1xx, 6xx, and 7xx in bibliographic records. This initial work focused on integration with the U.S. Library of Congress's id.loc.gov service as well as OCLC's VIAF services for resolution. However, over the past two years, and in response to many of the questions and issues surfaced through the TG, the Linking services have been expanded and revised to potentially support all use cases identified by this Task Force, as well as providing support for non-MARC21 users to configure the Linking tool for use with other MARC formats.


The MarcEdit Linking toolkit currently supports the generation of URIs for all fields identified by this Task Force, for authority and bibliographic records. The application utilizes a rules file that documents field processing and service configuration values. This allows MarcEdit to quickly make changes to the rules governing field processing, as well as to add support for new collections and linked data endpoints. As of this report (9/21/2016), the MarcEdit Linked Data tool supports resolution against the following linked data services:

1. U.S. Library of Congress NAF
2. U.S. Library of Congress LCSH
3. U.S. Library of Congress Children's Subject Headings
4. U.S. Library of Congress Demographic Group Terms
5. Thesaurus for Graphic Materials
6. U.S. Library of Congress Genre/Form Terms
7. U.S. Library of Congress Medium of Performance Thesaurus for Music
8. RDA Carrier Types
9. RDA Media Types
10. RDA Content Types
11. Getty Arts and Architecture Thesaurus
12. Getty ULAN
13. National Library of Medicine MeSH
14. OCLC FAST Headings
15. OCLC VIAF
16. German National Library (GND)
17. [15 national library name indexes via VIAF]
18. Japanese Diet Library

Additionally, users have the ability to configure their own linked data endpoints for use with MarcEdit, so long as the service in question supports SPARQL and JSON. There is presently a knowledge-base article at http://marcedit.reeset.net/editing-marcedits-linked-data-rules-file documenting how users can both add new collections and modify the rules used when processing a particular field.

Essentially, MarcEdit utilizes its rules file to configure MarcEdit's linked data platform to identify the proper index/service, normalization (for data query purposes), and subfields to utilize as part of any lookup process. Additionally, each rules block identifies when a field should be processed (i.e. only when used in a bibliographic record, used in an authority record, or both). For example, here is the definition for the 650 field:

    <field type="bibliographic">
        <tag>650</tag>
        <subfields>abvxyz</subfields>
        <ind2 value="0" vocab="lcsh"/>
        <ind2 value="1" vocab="lcshac"/>
        <ind2 value="2" vocab="mesh"/>
        <ind2 value="7" vocab="none"/>
        <index>2</index>
        <uri>0</uri>
        <special_instructions>subject</special_instructions>
    </field>

Each MarcEdit rules block is a small segment of XML that profiles field usage within a record. This is why MarcEdit's linking tool can be used with other flavors of MARC (like UNIMARC): the Linking service has no concept of MARC21, only of the ISO 2709 format; the rules file provides that context.
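To illustrate how a rules block of this kind could drive field processing, the following Python sketch parses the 650 definition with the standard library and looks up the vocabulary for a given second indicator. It is not MarcEdit's code: the function names are hypothetical, and treating vocab="none" as "no lookup" is an assumption made here for illustration.

```python
import xml.etree.ElementTree as ET
from typing import Optional

RULES_BLOCK = """
<field type="bibliographic">
  <tag>650</tag>
  <subfields>abvxyz</subfields>
  <ind2 value="0" vocab="lcsh"/>
  <ind2 value="1" vocab="lcshac"/>
  <ind2 value="2" vocab="mesh"/>
  <ind2 value="7" vocab="none"/>
  <index>2</index>
  <uri>0</uri>
  <special_instructions>subject</special_instructions>
</field>
"""


def load_rule(xml_text: str) -> dict:
    """Parse one <field> rules block into a plain dictionary."""
    field = ET.fromstring(xml_text.strip())
    return {
        "record_type": field.get("type"),
        "tag": field.findtext("tag"),
        # Subfields whose content is used when building the lookup string.
        "subfields": list(field.findtext("subfields", default="")),
        # Second-indicator value -> vocabulary label.
        "vocab_by_ind2": {e.get("value"): e.get("vocab")
                          for e in field.findall("ind2")},
        # Subfield that receives the URI.
        "uri_subfield": field.findtext("uri"),
    }


def vocabulary_for(rule: dict, second_indicator: str) -> Optional[str]:
    """Return the vocabulary for an indicator value, or None if no lookup applies."""
    vocab = rule["vocab_by_ind2"].get(second_indicator)
    return None if vocab in (None, "none") else vocab


rule = load_rule(RULES_BLOCK)
print(rule["tag"], vocabulary_for(rule, "2"), rule["uri_subfield"])
```

Because the tag, indicator mapping, and target subfield all come from the XML, the same code could in principle serve another MARC flavor by swapping the rules block rather than the program.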

This approach has allowed MarcEdit to quickly profile and examine the implication of developing URIs for linking fields like the 880 field, which provide some unique challenges but can be accommodated via the current rules file format.

Utilizing the current process, MarcEdit's linking tool can accommodate a wide range of linking scenarios. For example, in an authority record:

[Figure: MarcEdit linking in an authority record]

Within a bibliographic record:

[Figure: MarcEdit linking within a bibliographic record]

Across diverse vocabularies:

[Figure: MarcEdit linking across diverse vocabularies]

Current development on the tool will continue to focus on the inclusion and support of additional vocabularies, continuing to work with linked data providers around scalability issues (and ways in which MarcEdit, or services like it, can reduce impacts on their services), as well as working to profile this service to work with other flavors of MARC, like UNIMARC, to encourage further experimentation.

IV.2.4.2. Authority Toolkit [Charge 3]

The authority toolkit is a program for the construction and modification of authority records. One version is designed for use within OCLC's Connexion program for records in the LC/NACO authority file, but another version can work with records in files, and so with records from other sources. Both versions of the toolkit have the same capabilities. At an early stage, the toolkit acquired the ability to test terms used in authority fields such as the 370 and 372 against vocabularies available at id.loc.gov (at present: LCMPT, LCSH, LCDGT, AFSET, geographic area codes, RDA content terms, and the LC/NACO Authority File). Somewhat later, it added the ability to verify terms against the MeSH vocabulary. (Additional vocabularies may be added in the future, based on user requests.) To perform this verification, the program needs to know which vocabularies are used to control terms in which parts of which authority fields, how to query the source to determine whether or not a term is defined, and how to react to the information returned by the source. The toolkit's actions are controlled above all by the subfield $2 code appearing in the same field as the term, but in the absence of a subfield $2 code, operator preferences come into play as well. (For example, an operator may prefer that an unlabeled term be tested against MeSH first and, if not found, tested against LCSH; or perhaps tested only against LCDGT.) A detailed description of the toolkit's process for verifying the content of authority fields can be found in the program's documentation at http://files.library.northwestern.edu/public/oclc/documentation/#verifymenu.

If the toolkit's search for an entire term is successful, the toolkit could easily supply the corresponding URI and add it to the authority record in subfield $0. This URI may be contained in the data provided by the source, or it could be constructed mechanically once the toolkit has extracted the appropriate identifier. As part of experimentation encouraged by the TG, on January 15, 2016, the toolkit acquired an option to add subfield $0 to fields which could be verified. (This option is described at http://files.library.northwestern.edu/public/oclc/documentation/#optionsverification0.) If a field contains more than one term, the toolkit must divide the field into multiple fields (one for each term) before it can add subfield $0.
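The two mechanical steps mentioned here, constructing a URI from an extracted identifier and dividing a multi-term field, can be sketched in Python. This is an illustration under assumptions, not the toolkit's implementation: the vocabulary labels, function names, sample terms, and the identifier are hypothetical, and the base URIs follow the id.loc.gov pattern.

```python
from typing import Dict, List, Tuple

Subfield = Tuple[str, str]          # (subfield code, value)
Field = Tuple[str, List[Subfield]]  # (tag, subfields)

# Illustrative base URIs keyed by a hypothetical vocabulary label.
BASE_URIS: Dict[str, str] = {
    "lcsh": "http://id.loc.gov/authorities/subjects/",
    "naf": "http://id.loc.gov/authorities/names/",
}


def build_uri(vocab: str, identifier: str) -> str:
    """Construct a URI from a base and an identifier, dropping internal spaces."""
    return BASE_URIS[vocab] + identifier.replace(" ", "")


def split_terms(tag: str, subfields: List[Subfield],
                term_code: str = "a") -> List[Field]:
    """Divide a field into one field per term, each keeping the shared subfields."""
    terms = [sf for sf in subfields if sf[0] == term_code]
    shared = [sf for sf in subfields if sf[0] != term_code]
    return [(tag, [term] + shared) for term in terms]


# A 372 with two terms and one $2; the identifier below is made up.
fields = split_terms("372", [("a", "Poetry"), ("a", "Fiction"), ("2", "lcsh")])
for tag, sfs in fields:
    print(tag, sfs)
print(build_uri("lcsh", "sh 85012345"))
```

After the split, each resulting field names a single term, so a single $0 can be attached to it without ambiguity.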

The following illustration shows an authority record as verified by the authority toolkit, with the option to add subfield $0 during verification turned on. (For this experiment, subfield $0 was locally defined for some fields.)

[Figure: authority record verified by the authority toolkit, with subfield $0 added]

Although the toolkit can often discover information about compound terms (such as some corporate bodies with subordinate units, and some LCSH headings) for which an authority record exists for some parts but not all, the toolkit cannot supply subfield $0. (There is no authority record, and so no URI, that represents the entire term.) The toolkit also cannot add subfield $0 to fields that contain multiple terms if the field contains an aggregation of terms rather than a collection of independent items. (Example: the toolkit cannot add subfield $0 to the 382 field.)

The task of discovering that a term given in an authority record is defined in an external vocabulary is made more difficult because the searching mechanisms available do not always compensate appropriately for operator variations in punctuation, capitalization, and the use of combining diacritics. In addition, the response time experienced by the toolkit can vary widely, even for the same term searched repeatedly within a brief time, and some services are unavailable over the weekend. If the potential of linked data is to be enjoyed, services providing data must ensure that their entry mechanisms are robust, flexible, and available at all times.
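One way a client or service might compensate for these variations is to compare terms through a normalized match key. The Python sketch below uses only the standard library; the function name and sample terms are hypothetical, and removing diacritics is a deliberately lossy assumption that suits comparison only, never the stored heading.

```python
import string
import unicodedata

# Map each ASCII punctuation character to a space.
_PUNCT_TO_SPACE = str.maketrans({p: " " for p in string.punctuation})


def match_key(term: str) -> str:
    """Build a comparison key that ignores case, punctuation, and diacritics."""
    # Decompose so base letters and combining marks become separate characters.
    decomposed = unicodedata.normalize("NFD", term)
    # Drop the combining marks (accents, umlauts, and so on).
    no_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Replace punctuation with spaces, fold case, and collapse whitespace.
    return " ".join(no_marks.translate(_PUNCT_TO_SPACE).casefold().split())


variants = ["Brontë, Charlotte", "BRONTE CHARLOTTE", "Bronte\u0308, Charlotte."]
print({match_key(v) for v in variants})
```

All three variants reduce to the same key, so a lookup that compares keys would treat them as the same term regardless of how the operator typed it.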

IV.2.4.3. Lookup online (e.g. VIAF, Getty ULAN, Geonames, Wikidata)

Online lookup requires manual operation. Users must be well versed in the SPARQL queries that individual services provide. Getty ULAN works differently from Geonames and Wikidata. The URI returned from a query may not be an RDF URI, but one that lands the user on a Web page or document.

IV.3. What did we learn? [Charge 1, Charge 3]

IV.3.1. Tackle low-hanging fruit: what can we do in 1 year?

The TG's activities during Year 1 were designed to position the MARC community to take tangible steps toward incorporating linked data URIs into its processes within an achievable timeframe. Therefore, the TG put aside some tasks, such as overhaul of certain legacy MARC data elements, that would have delayed progress with the TG's practical objectives. The tool development undertaken by Terry Reese and Gary Strawn was designed to advance these objectives, but so were the Formulating URIs document and the MARC object/URI reconciliation work, both of which document information that will be needed by other stakeholders, and the work IDs in MARC proposal, which seeks to remove one of the main barriers to routine incorporation of work identifiers in MARC records.

IV.3.2. Add $0 where it's not defined (not simple)

One of the TG's goals was also to identify and add $0 to fields that currently do not have one defined. The TG found the following MARC fields that needed $0 defined:

bibliographic: 046, 257, 260/264, 375, 753
authority: 046, 360, 375, 377, 663, 680, 681

These fields do not lend themselves to easy resolution when considering a $0 that reflects the resource object for an entity described. The TG conducted thorough analyses and concluded that only 257 and 377 could contain a URI with an unambiguous correspondence between the field and the object it represents, leaving out more complicated cases, e.g. fields 264 (Production, Publication, Distribution, Manufacture, and Copyright Notice) and 382 (Medium of Performance).

One of the issues confronted in drafting Discussion Paper 2016-DP19 was the extent of effort needed to individually propose subfield $0 for MARC 21 fields that do not contain it. MAC accepted the paper as a proposal, and there was agreement "that similar changes, such as those recommended by this paper, might in the future be considered as part of a MARC Fast-Track process." Being able to fast-track proposals for defining subfield $0 in fields which do not contain it will considerably streamline the process in the future.

IV.3.3. Strategies in light of limited life cycle of MARC environment

Though many may see MARC as "dead," the system remains a viable tool that delivers metadata for data discovery. It is also, however, a legacy format that reflects, in its somewhat baroque structure, a long history of accretion to meet varied and changing needs. In pursuing its goals, the TG has adopted a strategy of pursuing changes that can be applied coherently across MARC and that maximize return on the library community's investment of effort. There are economical and sensible approaches to determining what to do. The TG always kept in mind that its recommendations must cause the least disruption for data transition from MARC to linked data. A wholesale insertion of HTTP URIs is unlikely to be possible; URIs can probably be accommodated in most, but not all, MARC fields and/or subfields.

The TG is committed to working through a list of tasks and identifying viable solutions. While $0, after one year's deliberation, seemed a straightforward solution for a URI representing a resource object, more discussion is needed with regard to the predicate that denotes a relationship. MARC data have not been consistent in expressing relationships. The combination of fields, indicators, and subfields adds complexity to the process.

IV.3.4. ILS analysis results

Some ILSs would not load the processed records because of the presence of $0. Others loaded the records but did nothing with the data.

The TG members mocked up files of bibliographic and authority data, adding various URIs in subfield $0 wherever subfield $0 is currently defined in MARC. These files were uploaded into a number of ILSs to see if the addition of subfield $0 with URIs caused problems. No significant problems were found. These files included URIs in subfield $0 which were not prefixed with the (uri) identifier.

In OCLC, the same $0 subfields were also not problematic. OCLC's validation of subfield $0 does not check the structure of subfield $0 in the same way as it does for control numbers in 760-787 subfield $w or URLs in $u subfields. Use of URIs in subfield $4 to express relationship information would require a change to OCLC's validation of $4 subfields, but that may be readily changed without extensive effort.

IV.3.5. Tools needed: MarcNext, Authority Toolkit

Currently, the TG has tested and continues to work with MarcNext and the Authority Toolkit. TG members continue collecting and recording additional tools and resources that help practitioners identify and validate an RDF URI.


IV.3.6. Need to be able to easily report duplicates found in VIAF, etc., and need a way to know which URI to use when duplicates are found

Throughout the first year of investigation and deliberation, the TG learned that, though vocabularies and ontologies are structured per standards and published for adoption, some are more domain-specific than others. Often there is more than one method to structure a body of data. Duplication can be expected across various datasets. The reconciliation of URIs is one of the tasks that the TG has recognized, yet it is not in a position to recommend a solution in the near term.

IV.4. Outcomes

IV.4.1. MAC Discussion Papers [Charge 4]

The TG was aware that some aspects of its intended goals were not yet accommodated by the MARC format. Following the defined workflows of MARC governance and standardization, the TG submitted several discussion papers to the MARC Advisory Committee (MAC). As an initial preparation, an informal discussion paper entitled "URIs in MARC: A Call for Best Practices," by Steven Folsom, had been discussed during the June 2015 MAC meeting. It focused on subfield $0 (Authority record control number or standard number), its current usage, and its capability for URIs, and addressed some aspects of best practice. The paper generated extensive discussion, and there was broad agreement that the time was right for the library community to begin using URIs consistently. Steven Folsom was asked to cooperate with the PCC to develop a formal MAC Discussion Paper.

In fall 2015, the British Library (BL) submitted two papers to MAC for the January 2016 meeting, independently of the TG, covering title-to-title relationships via subfield $w and specific relationship information, then discussed using subfield $0. The approaches taken by the BL in its papers, coupled with the approach taken by the TG, resulted in MAC suggesting that the British Library and the PCC should collaborate on submitting a paper for June 2016.

During the MAC meetings at the ALA Annual Conference in Orlando in June 2016 three papers were presented13 by or in13 cooperation with the TG Discussion Paper No 2016b DP18 entitled Redefining Subfield $ to Remove13 the13 Use13 of Parenthetical Prefix (uri) in the13 MARC 2 Authority Bibliographic and Holdings Formats described the syntactical improvement that a subfield $0 containing a URI13 without the parenthetical prefix (uri) would allow so that13 automated processes could use the content13 of13 these$0s without having to strip away prefix The13 discussion paper was discussed at the13 MAC meeting and the recommendation was made that the13 discussion paper be13 upgraded to proposal status it was approved at the13 meeting as proposal From now on a $0 containing an identifier in the13 form of a web retrieval protocol eg HTTP URI should not13 be given a parenthetical prefix

second13 paper was presented to the13 MAC Discussion Paper No 2016b DP19 Adding Subfield $0 to Fields 257 and 377 in the13 MARC 2 Bibliographic Format and Field 377 in the13 MARC 21 Authority Format It resulted from extensive13 analyses of the13 MARC Bibliographic and Authority formats by the TG selecting fields13 which are to be controlled by an identifier Only those fields13 where an identifier can be applied with clear correspondence13 between the13 field and one13 entity were13 included in the13 paper The13 discussion13 paper was discussed at the MAC meeting and the recommendation was13 made that the discussion13 paper be upgraded13 to13 proposal status it was also13 approved13 at the meeting as a proposal Both13 changes will be included13 into13 the update 23 to13 the MARC13 documentation to13 be expected in fall 2016

Page13 17 of 22

The third paper Discussion Paper No 2016-shy‐DP17 Redefining Subfield $4 to Encompass URIs for Relationships in13 the MARC13 21 Authority and13 Bibliographic Formats was presented13 by the British13 Library in consultation with the TG13 This paper generated vivid discussions It was acknowledged that the13 approach to recording URIs for relationships using subfield $ was preferable13 to any of the13 other alternatives outlined by the13 paper The13 distinction between relator codes and relationship codes in the MARC format was questioned As of now an across-shy‐the-shy‐board13 solution13 for recording URIs for any data element in MARC subfield or field seems to be13 preferred by NDMSO over what it regards as an ad hoc solution for single elements This13 discussion will be continued13 this paper should not be considered in isolation but rather in the context of the other13 papers which the TG13 in is the process of submitting Taken as whole it is hoped that they will achieve the comprehensive solution which is sought throughout the MARC13 formats

IV42 Formulating amp Obtaining URI document [Charge 32]

A draft document was for commonly used13 sources for authorities and13 identifiers For each source screen captures13 were made showing where a URI could be found for a particular entity or how to formulate a URI once the identifier13 for13 the entity is known Before making this document available widely it must be determined how13 best to organize it Some13 resources provide13 URIs that directly represent13 a thing and others provide URIs that reference an authority (eg13 controlled or standard vocabularies which may or may not have underlying metadata about the thing) or a resource describing a thing The document needs to be able to distinguish this and inform catalogers which URIs are for real world objects and which are not In order to be helpful to developers building tools the document13 intends to also include descriptions of how data sources provide machine access to the data13 Is the data published13 as Linked13 Data available through13 http available through13 a SPARQL endpoint data dumps etc13 Another issue that13 must13 be determined is where to put13 the final document and how it13 will be maintained Should it be13 cooperatively maintained by the13 community (such as on a wiki) or should some group within PCC take responsibility for keeping it up to date and adding to it

IV.4.3. Revisions to OCLC handling of HTTP URIs [Charge 3.1]

The question arises as to whether it would be better for catalogers to enter all needed URIs directly into the shared bibliographic record in WorldCat, or whether OCLC should provide options for output of URIs based on data present in particular MARC fields and profiled library preferences. Clearly, some libraries will embrace use of URIs for their web-based catalogs, while others may find them problematic in local displays of bibliographic information. OCLC staff have looked into the issue and believe that the use of output options would likely produce more consistent results as well as meet the varying needs of libraries.

The TG members are drafting a spreadsheet outlining the subfields that together name an entity for which a corresponding URI could be added in subfield $0. That spreadsheet will be useful as the basis for future specifications for use by OCLC system developers. It will allow for a comparison of what is desired by the PCC cataloging community in terms of URIs corresponding to the entire named entity versus the existing use of subfield $0 and subfield-$0-like information used in OCLC heading controlling functionality. That heading control functionality allows for control numbers in multiple $0 subfields corresponding to different parts of a named entity, i.e., corporate name hierarchies, names and titles, subjects and separately controlled subdivisions, etc. These are cases where output of multiple URIs corresponding only to part of the named entity would not be preferred.

OCLC cataloging policies in this area are expected to evolve as this TG makes recommendations and OCLC development work moves ahead on the proposed output options for URIs.

IV.5. Next steps and in-depth analyses in year 2 [Charge 3, Charge 4]

In 2016-2017 the TG will continue an agenda focused on practical outcomes. Work is already well advanced on several of the following items.

IV.5.1. In collaboration with OCLC, develop a specification for outputting URIs based on internal linkages present in WorldCat data
IV.5.2. Complete the MARC object/URI reconciliation document and seek to incorporate the information into formal MARC documentation
IV.5.3. Produce work ID recommendation and use it in pilot implementation
IV.5.4. Produce discussion paper or proposal for handling relationships in MARC
IV.5.5. Consider additional targeted reconciliation projects
IV.5.6. In consultation with stakeholders, evaluate need for additional MARC proposals or best practices
IV.5.7. RWO recommendations
IV.5.8. Identify "homes" in PCC or elsewhere for aspects of the TG's work that will need further exploration or continuing upkeep
IV.5.9. Outreach, advocacy, training
IV.5.10. Etc.

V. RECOMMENDATIONS TO STAKEHOLDERS

During its first year, the TG was very much focused on the needs and interests of the many different stakeholders. This is reflected both in the outcomes of the work completed so far (see Sec. IV.4, Outcomes) and in the plans laid out for year 2 (see Sec. IV.5, Next steps and in-depth analyses in year 2). After careful consideration, the TG proposes the implementation of URIs in MARC for the near term. The sooner this process can begin, the sooner data providers, e.g., libraries, can produce data that can be more easily transformed into linked data. In order to facilitate progress towards this goal, the TG developed the recommendations already outlined in the report above, such as the spreadsheet identifying the phase 1 entities for identifiers, i.e., the subfields that together name an entity in each MARC field (see Sec. IV.4.3, Revisions to OCLC handling of HTTP URIs), and the draft document Formulating and Obtaining URIs: A Guide to Commonly Used Vocabularies and Reference Sources. The TG hopes that this document could be used as a starting point to develop an official list of PCC-sanctioned initial source vocabularies for embedding URIs.

For the sake of consistency, expediency, and accuracy, it is advisable to use automated processes for populating MARC records with URIs. Having individual catalogers do this work manually is not a desirable practice and could be less efficient. Several possible ways to accomplish this goal have been outlined in this report (see Secs. IV.2.4.1, MarcEdit; IV.2.4.2, Authority Toolkit; and IV.4.3, Revisions to OCLC handling of HTTP URIs).

Outreach, advocacy, and training will be a core goal of phase 2. The TG is planning on working closely with stakeholders, such as other PCC committees, to influence cataloging policies and best practices that have been identified as problematic for the implementation of URIs in MARC.

Training needs related to implementation (for example, how to obtain URIs, or the difference between authorities and real world objects) will be communicated to the PCC Standing Committee on Training so that appropriate training can be either identified or developed.

Though MARC is the most prominently used schema for library metadata, it is frequently used alongside many others that may or may not allow for the inclusion of URIs. In addition to that concern are the maintenance of identifiers, recommendations in relation to reconciliation, and possible ILS functional requirements. The TG on URIs in MARC is recommending that new TGs be formed concerning URIs for non-MARC metadata.

VI. REFERENCES

1. The subgroup Work IDs in MARC has identified potential fields and scenarios to accommodate a work identifier (or multiple work identifiers). Consideration has been given to legacy data; to whether a work identifier (ID) is already established in an authority format or not (7XX $t, 1XX/240); to an unambiguous relationship of a work ID among various vocabularies (024); to relationships among variants of a work; etc. The subgroup will present recommendations to the community in 2017.

Links: Meetings of the MARC Advisory Committee, Agendas and Minutes

2015-06 MAC meeting: http://www.loc.gov/marc/mac/an2015_age.html, http://www.loc.gov/marc/mac/minutes/an-15.html

2016-01 MAC meeting: http://www.loc.gov/marc/mac/mw2016_age.html, http://www.loc.gov/marc/mac/minutes/mw-16.html

2016-06 MAC meeting: http://www.loc.gov/marc/mac/an2016_age.html, http://www.loc.gov/marc/mac/minutes/an-16.html

Papers

Informal discussion paper: URIs in MARC: A Call for Best Practices (Steven Folsom, Discovery Metadata Librarian, Cornell University). https://docs.google.com/document/d/1fuHvF8bXH7hldY_xJ7f_xn2rP2Dj8o-Ca9jhHghIeUg/edit?pli=1

Discussion Paper No. 2016-DP04: Extending the Use of Subfield $0 to Encompass Linking Fields in the MARC 21 Bibliographic Format (British Library). http://www.loc.gov/marc/mac/2016/2016-dp04.html

Discussion Paper No. 2016-DP05: Expanding the Definition of Subfield $w to Encompass Standard Numbers in the MARC 21 Bibliographic and Authority Formats (British Library). http://www.loc.gov/marc/mac/2016/2016-dp05.html

Discussion Paper No. 2016-DP17: Redefining Subfield $4 to Encompass URIs for Relationships in the MARC 21 Authority and Bibliographic Formats (British Library, in consultation with the PCC Task Group on URIs in MARC). http://www.loc.gov/marc/mac/2016/2016-dp17.html

Discussion Paper No. 2016-DP18: Redefining Subfield $0 to Remove the Use of Parenthetical Prefix "(uri)" in the MARC 21 Authority, Bibliographic, and Holdings Formats (PCC Task Group on URI in MARC, in consultation with the British Library). http://www.loc.gov/marc/mac/2016/2016-dp18.html

Discussion Paper No. 2016-DP19: Adding Subfield $0 to Fields 257 and 377 in the MARC 21 Bibliographic Format and Field 377 in the MARC 21 Authority Format (PCC URI in MARC Task Group). http://www.loc.gov/marc/mac/2016/2016-dp19.html

MARC Format Overview, Status Information: http://www.loc.gov/marc/status.html

Examples for Sec. IV.2.1

This LC subject heading string is linked to three different authority records. The links are OCLC's ARNs. No single $0 could be output for this subject access point.

650 0 ǂa Neurologists<Link2068890> ǂz New Zealand<Link255121> ǂv Biography<Link4933801>

This medical subject string is linked to one authority record, although the controlling process links individual subfields. It is a candidate for output of a single $0 with a URI, because the links all refer to the single authority record. In the case of MeSH, unlike LCSH, the $0 subfield displays in Connexion. See OCLC record 957132118.

650 12 ǂa Neurology<Link(DNLM)D009462Q000266> ǂx history<Link(DNLM)D009462Q000266>

Displays as:

650 12 Neurology ǂx history ǂ0 (DNLM)D009462Q000266

So it could be output with a single $0 containing the corresponding URI for the MeSH heading.
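The rule these two examples illustrate (a single $0 can be output only when every controlled subfield links to the same authority record) can be sketched as follows. The function name `single_link` and the list-of-pairs data shape are assumptions made for illustration; this is not a description of how OCLC's heading control is implemented.

```python
from typing import List, Optional, Tuple


def single_link(subfields: List[Tuple[str, Optional[str]]]) -> Optional[str]:
    """Return the one authority link shared by every subfield, else None.

    Each item is (subfield code, link or None). None is returned when the
    heading is empty, when any part is uncontrolled, or when the parts
    link to different authority records.
    """
    links = [link for _code, link in subfields]
    if not links or any(link is None for link in links):
        return None
    return links[0] if len(set(links)) == 1 else None


# The two headings shown above, with their per-subfield links.
lcsh_heading = [("a", "2068890"), ("z", "255121"), ("v", "4933801")]
mesh_heading = [("a", "(DNLM)D009462Q000266"), ("x", "(DNLM)D009462Q000266")]

print(single_link(lcsh_heading))  # None: three different authority records
print(single_link(mesh_heading))  # (DNLM)D009462Q000266
```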


The MarcEdit Linking toolkit currently supports the generation of URIs for all fields identified by this Task Force, for authority and bibliographic records. The application utilizes a rules file that documents field processing and service configuration values. This allows MarcEdit to quickly make changes to the rules governing field processing, as well as to add support for new collections and linked data endpoints. As of this report (9/21/2016), the MarcEdit Linked Data tool supports resolution against the following linked data services:

1. U.S. Library of Congress NAF
2. U.S. Library of Congress LCSH
3. U.S. Library of Congress Children's Subject Headings
4. U.S. Library of Congress Demographic Group Terms
5. Thesaurus for Graphic Materials
6. U.S. Library of Congress Genre/Form Terms
7. U.S. Library of Congress Medium of Performance Thesaurus for Music
8. RDA Carrier Types
9. RDA Media Types
10. RDA Content Types
11. Getty Art and Architecture Thesaurus
12. Getty ULAN
13. National Library of Medicine MeSH
14. OCLC FAST Headings
15. OCLC VIAF
16. German National Library (GND)
17. [15 national library name indexes via VIAF]
18. Japanese Diet Library

Additionally, users have the ability to configure their own linked data endpoints for use with MarcEdit, so long as the service in question supports SPARQL and JSON. There is presently a knowledge-base article at http://marcedit.reeset.net/editing-marcedits-linked-data-rules-file documenting how users can add new collections or modify the rules used when processing a particular field.

Essentially, MarcEdit utilizes its rules file to configure MarcEdit's linked data platform to identify the proper index/service, normalization (for data query purposes), and subfields to utilize as part of any lookup process. Additionally, each rules block identifies when a field should be processed (i.e., only when used in a bibliographic record, only when used in an authority record, or both). For example, here is the definition for the 650 field:

    <field type="bibliographic">
        <tag>650</tag>
        <subfields>abvxyz</subfields>
        <ind2 value="0" vocab="lcsh" />
        <ind2 value="1" vocab="lcshac" />
        <ind2 value="2" vocab="mesh" />
        <ind2 value="7" vocab="none" />
        <index>2</index>
        <uri>0</uri>
        <special_instructions>subject</special_instructions>
    </field>
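To show how a rules block of this kind could drive processing, the sketch below parses a 650 definition with Python's standard `xml.etree.ElementTree` module and looks up the vocabulary assigned to a given second-indicator value. The XML serialization is a reconstruction (quoting and self-closing tags are assumed), the reading of `vocab="none"` as "no vocabulary" is an assumption, and the function `vocab_for` is hypothetical; none of this is MarcEdit's own code.

```python
import xml.etree.ElementTree as ET

# Reconstructed 650 rules block; exact quoting and tag closing are assumed.
RULE = """
<field type="bibliographic">
  <tag>650</tag>
  <subfields>abvxyz</subfields>
  <ind2 value="0" vocab="lcsh" />
  <ind2 value="1" vocab="lcshac" />
  <ind2 value="2" vocab="mesh" />
  <ind2 value="7" vocab="none" />
  <index>2</index>
  <uri>0</uri>
  <special_instructions>subject</special_instructions>
</field>
"""


def vocab_for(rule_xml: str, second_indicator: str):
    """Return the vocabulary mapped to a second-indicator value, or None."""
    field = ET.fromstring(rule_xml)
    for ind in field.findall("ind2"):
        if ind.get("value") == second_indicator:
            vocab = ind.get("vocab")
            # Assumption: "none" means no vocabulary is configured.
            return None if vocab == "none" else vocab
    return None  # indicator value not listed in the rule


print(vocab_for(RULE, "0"))  # lcsh
print(vocab_for(RULE, "2"))  # mesh
```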

Each MarcEdit rules block is a small segment of XML that profiles field usage within a record. This is why MarcEdit's linking tool can be used with other flavors of MARC (like UNIMARC): the Linking service has no concept of MARC21, only of the ISO 2709 format, and the rules file provides that context.

This approach has allowed MarcEdit to quickly profile and examine the implications of developing URIs for linking fields like the 880 field, which present some unique challenges but can be accommodated via the current rules file format.

Utilizing the current process, MarcEdit's linking tool can accommodate a wide range of linking scenarios. For example:

In an authority record: [screenshot]

Within a bibliographic record: [screenshot]

Across diverse vocabularies: [screenshot]

Current development on the tool will continue to focus on the inclusion and support of additional vocabularies; on continuing to work with linked data providers around scalability issues (and ways in which MarcEdit, or services like it, can reduce impacts on their services); and on profiling this service to work with other flavors of MARC, like UNIMARC, to encourage further experimentation.

IV.2.4.2. Authority Toolkit [Charge 3]

The authority toolkit is a program for the construction and modification of authority records. One version is designed for use within OCLC's Connexion program for records in the LC/NACO authority file, but another version can work with records in files, and so with records from other sources. Both versions of the toolkit have the same capabilities. At an early stage, the toolkit acquired the ability to test terms used in authority fields such as the 370 and 372 against vocabularies available at id.loc.gov (at present LCMPT, LCSH, LCDGT, AFSET, geographic area codes, RDA content terms, and the LC/NACO Authority File). Somewhat later, it added the ability to verify terms against the MeSH vocabulary. (Additional vocabularies may be added in the future based on user requests.) To perform this verification, the program needs to know which vocabularies are used to control terms in which parts of which authority fields, how to query the source to determine whether or not a term is defined, and how to react to the information returned by the source. The toolkit's actions are controlled above all by the subfield $2 code appearing in the same field as the term, but in the absence of a subfield $2 code, operator preferences come into play as well. (For example, an operator may prefer that an unlabeled term be tested against MeSH first and, if not found, tested against LCSH; or perhaps tested only against LCDGT.) A detailed description of the toolkit's process for verifying the content of authority fields can be found in the program's documentation at httpfileslibrarynorthwesternedupublicoclcdocumentationverifymenu
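The precedence described here (an explicit subfield $2 code first, operator preferences only when no $2 is present) can be sketched as below. This is an illustrative assumption about the decision order, not the toolkit's actual code: the function names, the lowercase vocabulary labels, and the `fake_lookup` stand-in for a remote service are all hypothetical.

```python
from typing import Callable, List, Optional


def choose_vocabularies(sub2_code: Optional[str], preferences: List[str]) -> List[str]:
    """Return the vocabularies to test, in order.

    An explicit $2 code wins outright; otherwise the operator's
    preference list is used as given.
    """
    if sub2_code:
        return [sub2_code.strip().lower()]
    return list(preferences)


def verify_term(term: str, sub2_code: Optional[str], preferences: List[str],
                lookup: Callable[[str, str], bool]) -> Optional[str]:
    """Return the first vocabulary in which the term is found, else None."""
    for vocab in choose_vocabularies(sub2_code, preferences):
        if lookup(vocab, term):
            return vocab
    return None


# Stand-in data instead of a live service, purely for demonstration.
FAKE_DATA = {"mesh": {"Neurology"}, "lcsh": {"Neurologists", "Neurology"}}


def fake_lookup(vocab: str, term: str) -> bool:
    return term in FAKE_DATA.get(vocab, set())


# Unlabeled term: tested against MeSH first, then LCSH.
print(verify_term("Neurologists", None, ["mesh", "lcsh"], fake_lookup))
# Labeled term: only the vocabulary named in $2 is consulted.
print(verify_term("Neurology", "lcsh", ["mesh"], fake_lookup))
```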

If the toolkit's search for an entire term is successful, the toolkit could easily supply the corresponding URI and add it to the authority record in subfield $0. This URI may be contained in the data provided by the source, or it could be constructed mechanically once the toolkit has extracted the appropriate identifier. As part of experimentation encouraged by the TG, on January 15, 2016, the toolkit acquired an option to add subfield $0 to fields which could be verified. (This option is described at httpfileslibrarynorthwesternedupublicoclcdocumentationoptionsverification0.) If a field contains more than one term, the toolkit must divide the field into multiple fields (one for each term) before it can add subfield $0.

The following illustration shows an authority record as verified by the authority toolkit, with the option to add subfield $0 during verification turned on. (For this experiment, subfield $0 was locally defined for some fields.)

[illustration]

Although the toolkit can often discover information about compound terms (such as some corporate bodies with subordinate units and some LCSH headings) for which an authority record exists for some parts but not all, the toolkit cannot supply subfield $0. (There is no authority record, and so no URI, that represents the entire term.) The toolkit also cannot add subfield $0 to fields that contain multiple terms if the field contains an aggregation of terms rather than a collection of independent items. (Example: the toolkit cannot add subfield $0 to the 382 field.)

The task of discovering that a term given in an authority record is defined in an external vocabulary is made more difficult because the searching mechanisms available do not always compensate appropriately for operator variations in punctuation, capitalization, and the use of combining diacritics. In addition, the response time experienced by the toolkit can vary widely, even for the same term searched repeatedly within a brief time, and some services are unavailable over the weekend. If the potential of linked data is to be enjoyed, services providing data must ensure that their entry mechanisms are robust, flexible, and available at all times.
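One way a client or a service could compensate for variation in punctuation, capitalization, and combining diacritics is to compare terms through a normalized match key. The sketch below uses Python's standard `unicodedata` module; it is a deliberately simple illustration, not the toolkit's algorithm and not a full implementation of any published normalization rule set, and the name `match_key` is hypothetical.

```python
import unicodedata


def match_key(term: str) -> str:
    """Return a comparison key that ignores case, punctuation, and diacritics."""
    # Decompose characters so that diacritics become separate combining marks.
    decomposed = unicodedata.normalize("NFD", term)
    # Drop the combining marks, keeping the base letters.
    no_marks = "".join(c for c in decomposed if not unicodedata.combining(c))
    # Turn punctuation into spaces, then fold case and collapse whitespace.
    cleaned = "".join(c if c.isalnum() or c.isspace() else " " for c in no_marks)
    return " ".join(cleaned.casefold().split())


# Precomposed, decomposed, and unaccented forms all yield the same key.
print(match_key("Dvořák, Antonín"))
print(match_key("Dvor\u030ca\u0301k,  ANTONI\u0301N") == match_key("dvorak antonin"))
```

Stripping diacritics loses information, so a key like this is suited to finding candidate matches that are then confirmed against the source's own record, not to asserting identity.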

IV.2.4.3. Lookup online (e.g., VIAF, Getty ULAN, Geonames, Wikidata)

Online lookup requires manual operation. Users must be well versed in the SPARQL queries that individual services provide. Getty ULAN works differently from Geonames and Wikidata. The URI returned from a query may not be an RDF URI, but one that lands the user on a Web page or document.

IV.3. What did we learn? [Charge 1, Charge 3]

IV.3.1. Tackle low-hanging fruit: what can we do in 1 year?

The TG's activities during Year 1 were designed to position the MARC community to take tangible steps toward incorporating linked data URIs into its processes within an achievable timeframe. Therefore, the TG put aside some tasks, such as an overhaul of certain legacy MARC data elements, that would have delayed progress with the TG's practical objectives. The tool development undertaken by Terry Reese and Gary Strawn was designed to advance these objectives, but so were the Formulating URIs document and the MARC object/URI reconciliation work, both of which document information that will be needed by other stakeholders, and the work IDs in MARC proposal, which seeks to remove one of the main barriers to routine incorporation of work identifiers in MARC records.

IV.3.2. Add $0 where it's not defined (not simple)

One of the TG's goals was also to identify fields that currently do not have $0 defined and to add it. The TG found the following MARC fields that needed $0 defined:

bibliographic: 046, 257, 260/264, 375, 753
authority: 046, 360, 375, 377, 663, 680, 681

These fields do not lend themselves to an easy resolution when considering $0, which reflects the resource object for an entity described. The TG conducted thorough analyses and concluded that only 257 and 377 could contain a URI that unambiguously links the field and the object it represents, leaving out more complicated cases, e.g., fields 264 (Production, Publication, Distribution, Manufacture, and Copyright Notice) and 382 (Medium of Performance).

One of the issues confronted in drafting Discussion Paper 2016-DP19 was the extent of effort needed to individually propose subfield $0 for MARC 21 fields that do not contain it. MAC accepted the paper as a proposal, and there was agreement "that similar changes such as those recommended [in] this paper might in the future be considered as part of a MARC Fast-Track process." Being able to fast-track proposals for defining subfield $0 in fields which do not contain it will considerably streamline the process in the future.

IV.3.3. Strategies in light of the limited life cycle of the MARC environment

Though many may see MARC as "dead," the system remains a viable tool that delivers metadata for data discovery. It is also, however, a legacy format that reflects in its somewhat baroque structure a long history of accretion to meet varied and changing needs. In pursuing its goals, the TG has adopted a strategy of seeking changes that can be applied coherently across MARC and that maximize return on the library community's investment of effort. There are economical and sensible approaches in determining what to do. The TG always kept in mind that its recommendations must cause the least disruption for data transition from MARC to linked data. A wholesale insertion of HTTP URIs into all MARC fields and/or subfields is unlikely to be possible, though it may be possible for most of them.

The TG is committed to working through a list of tasks and identifying viable solutions. While $0, after one year's deliberation, seemed a straightforward solution for a URI representing a resource object, more discussion is needed with regard to the predicate that denotes a relationship. MARC data have not been consistent in expressing relationships. The combination of fields, indicators, and subfields adds complexity to the process.

IV.3.4. ILS analysis results

Some ILSs would not load the processed records because of the presence of $0. Others loaded the records but did nothing with the data.

The TG members mocked up files of bibliographic and authority data, adding various URIs in subfield $0 wherever subfield $0 is currently defined in MARC. These files were uploaded into a number of ILS systems to see if the addition of subfield $0 with URIs caused problems. No significant problems were found. These files included URIs in subfield $0 which were not prefixed with the (uri) identifier.

In OCLC, the same $0 subfields were also not problematic. OCLC's validation of subfield $0 does not check the structure of subfield $0 in the same way as it does for control numbers in 760-787 subfield $w, or URLs in $u subfields. Use of URIs in subfield $4 to express relationship information would require a change to OCLC's validation of $4 subfields, but that may be readily changed without extensive effort.

IV.3.5. Tools needed: MarcNext, Authority Toolkit

Currently the TG has tested and continues to work with MarcNext and Authority Toolkit. The TG members continue collecting and recording additional tools and resources that help practitioners identify and validate an RDF URI.


IV.3.6. Need to be able to easily report duplicates found in VIAF etc., and need a way to know which URI to use when duplicates are found

Throughout the first year of investigation and deliberation, the TG learned that, though vocabularies and ontologies are structured per standards and published for adoption, some are more domain-specific than others. Often there is more than one method to structure a body of data. Duplications can be expected across various datasets. The reconciliation of URIs is one of the tasks that the TG has recognized, yet the TG is not in a position to recommend a solution in the near term.

IV.4. Outcomes

IV.4.1. MAC Discussion Papers [Charge 4]

The TG was aware that some aspects of its intended goals were not yet accommodated by the MARC format. Following the defined workflows of MARC governance and standardization, the TG submitted several discussion papers to the MARC Advisory Committee (MAC). As an initial preparation, an informal discussion paper entitled URIs in MARC: A Call for Best Practices, by Steven Folsom, had been discussed during the June 2015 MAC meeting. It focused on subfield $0 (Authority record control number or standard number), its current usage, and its capability for URIs, and addressed some aspects of best practice. The paper generated extensive discussion, and there was broad agreement that the time was right for the library community to begin using URIs consistently. Steven Folsom was asked to cooperate with the PCC to develop a formal MAC Discussion Paper.

In fall 2015, the British Library (BL) submitted two papers to MAC for the January 2016 meeting, independently of the TG, covering title-to-title relationships via subfield $w and specific relationship information, then discussed using subfield $0. The approaches taken by the BL in its papers, coupled with the approach taken by the TG, resulted in MAC suggesting that the British Library and the PCC should collaborate on submitting a paper for June 2016.

During the MAC meetings at the ALA Annual Conference in Orlando in June 2016, three papers were presented by or in cooperation with the TG. Discussion Paper No. 2016-DP18, entitled Redefining Subfield $0 to Remove the Use of Parenthetical Prefix "(uri)" in the MARC 21 Authority, Bibliographic, and Holdings Formats, described the syntactical improvement that a subfield $0 containing a URI without the parenthetical prefix "(uri)" would allow, so that automated processes could use the content of these $0s without having to strip away a prefix. The discussion paper was discussed at the MAC meeting, and the recommendation was made that it be upgraded to proposal status; it was approved at the meeting as a proposal. From now on, a $0 containing an identifier in the form of a web retrieval protocol, e.g., an HTTP URI, should not be given a parenthetical prefix.
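The prefix stripping that automated processes previously had to perform on such $0 values can be sketched with a short regular expression. The function name `normalize_subfield_0` is hypothetical, and the sketch assumes the prefix appears at the very start of the subfield value; other parenthetical prefixes, such as an organization code on a control number, are deliberately left alone.

```python
import re

# Matches a leading "(uri)" prefix in any letter case, plus following spaces.
_URI_PREFIX = re.compile(r"^\(uri\)\s*", re.IGNORECASE)


def normalize_subfield_0(value: str) -> str:
    """Drop the parenthetical "(uri)" prefix from a $0 value, if present."""
    return _URI_PREFIX.sub("", value.strip())


# A legacy value carrying the prefix is reduced to the bare URI.
print(normalize_subfield_0("(uri)http://id.loc.gov/authorities/names/n79021164"))
# A control number with an organization code passes through unchanged.
print(normalize_subfield_0("(DNLM)D009462Q000266"))
```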

A second paper was presented to the MAC: Discussion Paper No. 2016-DP19, Adding Subfield $0 to Fields 257 and 377 in the MARC 21 Bibliographic Format and Field 377 in the MARC 21 Authority Format. It resulted from extensive analyses of the MARC Bibliographic and Authority formats by the TG, selecting fields which are to be controlled by an identifier. Only those fields where an identifier can be applied with clear correspondence between the field and one entity were included in the paper. The discussion paper was discussed at the MAC meeting, and the recommendation was made that it be upgraded to proposal status; it was also approved at the meeting as a proposal. Both changes will be included in Update 23 to the MARC documentation, expected in fall 2016.


2016-shy‐0613 MAC meeting httpwwwlocgovmarcmacan2016_agehtmlhttpwwwlocgovmarcmacminutesan-shy‐16html

Papers

Page13 20 of 22

Informal13 discussion paper13 URIs in MARC13 A Call13 for Best Practices (Steven Folsom Discovery Metadata13 Librarian Cornell University) httpsdocsgooglecomdocumentd1fuHvF8bXH7hldY_xJ7f_xn2rP2Dj8o-shy‐Ca9jhHghIeUgeditpli=1

Discussion Paper No 2016-shy‐DP04 Extending the Use of Subfield $0 to Encompass Linking Fields in theMARC 21 Bibliographic Format (British Library) httpwwwlocgovmarcmac20162016-shy‐dp04htmlDiscussion Paper No 2016-shy‐DP05 Expanding the Definition of Subfield $w to Encompass StandardNumbers in the MARC 21 Bibliographic and Authority Formats (British Library) httpwwwlocgovmarcmac20162016-shy‐dp05html

Discussion Paper No 2016-shy‐DP17 Redefining Subfield $4 to Encompass URIs for Relationships in theMARC 21 Authority and Bibliographic Formats (British Library in consultation with the PCC Task Groupo URIs in13 MARC) httpwwwlocgovmarcmac20162016-shy‐dp17html

Discussion Paper No 2016-shy‐DP18 Redefining Subfield $0 to Remove the Use of Parenthetical Prefix(uri)13 in the MARC 21 Authority Bibliographic and13 Holdings Formats (PCC13 Task Group13 o URI in13 MARC13 in consultation with the British Library) httpwwwlocgovmarcmac20162016-shy‐dp18html

Discussion Paper No 2016-shy‐DP19 Adding Subfield $0 to Fields 257 and 377 in the MARC 21 BibliographicFormat and Field 37 in the13 MARC 2 Authority Format (PCC URI in MARC Task Group) httpwwwlocgovmarcmac20162016-shy‐dp19html

MARC Format Overview Status Information httpwwwlocgovmarcstatushtml

Examples for Sec IV21

This LC subject heading string is linked to three different authority records The links are OCLCrsquosARNs No single13 $0 could be13 output for this subject access point

650 0 ǂa NeurologistsltLink2068890gt ǂz New ZealandltLink255121gt ǂv BiographyltLink4933801gt

This medical subject string is linked to one authority record although the controlling process linksindividual subfields It is a candidate for output of a single $0 with a URI13 because the links all13 refer to thesingle authority record In the case of MeSH unlike LCSH the $0 subfield displays in Connexion SeeOCLC record 957132118

650 12 ǂa NeurologyltLink(DNLM)D009462Q000266gt ǂx historyltLink(DNLM)D009462Q000266gt

Page13 21 of 22

Displays as650 12 Neurology ǂx history ǂ0 (DNLM)D009462Q000266

So it could be13 output with single13 $ containing the13 corresponding URI for the13 MeSH heading

Page13 22 of 22

<ind2 value="2" vocab="mesh"/>
<ind2 value="7" vocab="none"/>
<index>2</index>
<uri>0</uri>
<special_instructions>subject</special_instructions>
</field>

Each MarcEdit rules block is a small segment of XML that profiles field usage within a record. This is why MarcEdit's linking tool can be used with other flavors of MARC (like UNIMARC): the Linking service has no concept of MARC21, only of the ISO 2709 format, and the rules file provides that context.
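As a rough illustration of how a rules block of this kind can drive linking, the sketch below reads one block with Python's standard xml.etree.ElementTree and looks up which vocabulary a field's second indicator selects. Only the ind2, index, uri, and special_instructions elements come from the fragment above. The tag element, the reading of uri as the subfield that receives the URI, and the function names are assumptions made for illustration, not MarcEdit's documented behavior.

```python
import xml.etree.ElementTree as ET

# Hypothetical rules block. The <tag> element is an assumption; the other
# elements mirror the fragment shown in the report.
RULES_BLOCK = """
<field>
  <tag>650</tag>
  <ind2 value="2" vocab="mesh"/>
  <ind2 value="7" vocab="none"/>
  <index>2</index>
  <uri>0</uri>
  <special_instructions>subject</special_instructions>
</field>
"""


def load_rule(xml_text):
    """Parse one rules block into a plain dictionary."""
    field = ET.fromstring(xml_text)
    return {
        "tag": field.findtext("tag"),
        # Map each second-indicator value to the vocabulary it selects.
        "vocab_by_ind2": {
            el.get("value"): el.get("vocab") for el in field.findall("ind2")
        },
        # Assumed here to name the subfield that receives the URI.
        "uri_subfield": field.findtext("uri"),
        "special_instructions": field.findtext("special_instructions"),
    }


def vocabulary_for(rule, ind2):
    """Return the vocabulary for an indicator value, or None if unprofiled."""
    vocab = rule["vocab_by_ind2"].get(ind2)
    return None if vocab in (None, "none") else vocab


rule = load_rule(RULES_BLOCK)
print(vocabulary_for(rule, "2"))
print(vocabulary_for(rule, "7"))
```

Keeping the field profile in data rather than in code is what lets the same linking logic be pointed at a different MARC flavor by swapping the rules file.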

This approach has allowed MarcEdit to quickly profile and examine the implications of developing URIs for linking fields like the 880 field, which provide some unique challenges but can be accommodated via the current rules file format.

Utilizing the current process, MarcEdit's linking tool can accommodate a wide range of linking scenarios. For example:

[Figure: linking in an authority record]

[Figure: linking within a bibliographic record]

[Figure: linking across diverse vocabularies]

Current development on the tool will continue to focus on the inclusion and support of additional vocabularies, on continuing to work with linked data providers around scalability issues (and ways in which MarcEdit, or services like it, can reduce impacts on their services), and on profiling this service to work with other flavors of MARC, like UNIMARC, to encourage further experimentation.

IV.2.4.2. Authority Toolkit [Charge 3]

The authority toolkit is a program for the construction and modification of authority records. One version is designed for use within OCLC's Connexion program for records in the LC/NACO authority file, but another version can work with records in files, and so with records from other sources. Both versions of the toolkit have the same capabilities. At an early stage, the toolkit acquired the ability to test terms used in authority fields such as the 370 and 372 against vocabularies available at id.loc.gov (at present LCMPT, LCSH, LCDGT, AFSET, geographic area codes, RDA content terms, and the LC/NACO Authority File). Somewhat later, it added the ability to verify terms against the MeSH vocabulary. (Additional vocabularies may be added in the future, based on user requests.) To perform this verification, the program needs to know which vocabularies are used to control terms in which parts of which authority fields, how to query the source to determine whether or not a term is defined, and how to react to the information returned by the source. The toolkit's actions are controlled above all by the subfield $2 code appearing in the same field as the term, but in the absence of a subfield $2 code, operator preferences come into play as well. (For example, an operator may prefer that an unlabeled term be tested against MeSH first and, if not found, tested against LCSH, or perhaps tested only against LCDGT.) A detailed description of the toolkit's process for verifying the content of authority fields can be found in the program's documentation at httpfileslibrarynorthwesternedupublicoclcdocumentationverifymenu
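The selection logic described above (an explicit subfield $2 first, operator preferences otherwise) can be sketched as follows. This is a minimal illustration, not the toolkit's actual code: the function names, the (code, value) representation of subfields, and the stub lookup standing in for a query to the source are all assumptions.

```python
# Hypothetical operator preference: vocabularies to try, in order, when a
# field carries no subfield $2.
DEFAULT_PREFERENCES = ["mesh", "lcsh"]


def vocabularies_to_test(subfields, preferences=DEFAULT_PREFERENCES):
    """Return the vocabulary codes to query for a field, in order.

    `subfields` is a list of (code, value) pairs for a single field.
    """
    codes = [value for code, value in subfields if code == "2"]
    if codes:
        # An explicit $2 code takes priority over operator preferences.
        return codes[:1]
    return list(preferences)


def verify_term(term, subfields, lookup, preferences=DEFAULT_PREFERENCES):
    """Return the first vocabulary in which `lookup` finds the term, else None.

    `lookup(vocabulary, term)` stands in for a query against the source.
    """
    for vocabulary in vocabularies_to_test(subfields, preferences):
        if lookup(vocabulary, term):
            return vocabulary
    return None


# Stub data source used only for this example.
KNOWN_TERMS = {("lcsh", "Neurology"), ("lcdgt", "Neurologists")}


def stub_lookup(vocabulary, term):
    return (vocabulary, term) in KNOWN_TERMS


print(verify_term("Neurology", [("a", "Neurology")], stub_lookup))
print(verify_term("Neurologists", [("a", "Neurologists"), ("2", "lcdgt")], stub_lookup))
```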

If the toolkit's search for an entire term is successful, the toolkit could easily supply the corresponding URI and add it to the authority record in subfield $0. This URI may be contained in the data provided by the source, or it could be constructed mechanically once the toolkit has extracted the appropriate identifier. As part of experimentation encouraged by the TG, on January 15, 2016, the toolkit acquired an option to add subfield $0 to fields which could be verified. (This option is described at httpfileslibrarynorthwesternedupublicoclcdocumentationoptionsverification0.) If a field contains more than one term, the toolkit must divide the field into multiple fields (one for each term) before it can add subfield $0.
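Constructing a URI mechanically from an extracted identifier might look like the sketch below. The URI patterns and the vocabulary keys are assumptions chosen for illustration and should be confirmed against each provider's documentation. The whitespace normalization is likewise a simplification, and the function name is hypothetical.

```python
# URI patterns assumed for illustration; confirm against each provider's
# documentation before relying on them.
URI_TEMPLATES = {
    "naf": "http://id.loc.gov/authorities/names/{id}",
    "lcsh": "http://id.loc.gov/authorities/subjects/{id}",
    "mesh": "http://id.nlm.nih.gov/mesh/{id}",
}


def build_uri(vocabulary, identifier):
    """Construct a URI from a vocabulary code and an identifier."""
    template = URI_TEMPLATES.get(vocabulary)
    if template is None:
        raise ValueError(f"no URI pattern known for {vocabulary!r}")
    # Control numbers are often written with internal spaces
    # ("n  79021164"); drop all whitespace before building the URI.
    normalized = "".join(identifier.split())
    return template.format(id=normalized)


print(build_uri("naf", "n  79021164"))
print(build_uri("mesh", "D009462"))
```

Raising an error for an unknown vocabulary, rather than guessing a pattern, keeps a tool from writing plausible-looking but unresolvable URIs into subfield $0.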

The following illustration shows an authority record as verified by the authority toolkit, with the option to add subfield $0 during verification turned on. (For this experiment, subfield $0 was locally defined for some fields.)

Although the toolkit can often discover information about compound terms (such as some corporate bodies with subordinate units, and some LCSH headings) for which an authority record exists for some parts but not all, the toolkit cannot supply subfield $0. (There is no authority record, and so no URI, that represents the entire term.) The toolkit also cannot add subfield $0 to fields that contain multiple terms if the field contains an aggregation of terms rather than a collection of independent items. (Example: the toolkit cannot add subfield $0 to the 382 field.)

The task of discovering that a term given in an authority record is defined in an external vocabulary is made more difficult because the searching mechanisms available do not always compensate appropriately for operator variations in punctuation, capitalization, and the use of combining diacritics. In addition, the response time experienced by the toolkit can vary widely, even for the same term searched repeatedly within a brief time, and some services are unavailable over the weekend. If the potential of linked data is to be enjoyed, services providing data must ensure that their entry mechanisms are robust, flexible, and available at all times.

IV.2.4.3. Lookup online (e.g., VIAF, Getty ULAN, Geonames, Wikidata)

Online lookup requires manual operation. Users must be well versed in the SPARQL queries that individual services provide; Getty ULAN works differently from Geonames and Wikidata. The URI returned from a query may not be an RDF URI, but one that lands the user on a Web page or document.

IV.3. What did we learn? [Charge 1, Charge 3]

IV.3.1. Tackle low-hanging fruit: what can we do in 1 year?

The TG's activities during Year 1 were designed to position the MARC community to take tangible steps toward incorporating linked data URIs into its processes within an achievable timeframe. Therefore the TG put aside some tasks, such as an overhaul of certain legacy MARC data elements, that would have delayed progress with the TG's practical objectives. The tool development undertaken by Terry Reese and Gary Strawn was designed to advance these objectives, but so were the Formulating URIs document and the MARC object/URI reconciliation work, both of which document information that will be needed by other stakeholders, and the work IDs in MARC proposal, which seeks to remove one of the main barriers to routine incorporation of work identifiers in MARC records.

IV.3.2. Add $0 where it's not defined (not simple)

One of the TG's goals was also to identify fields that currently do not have $0 defined and to add it. The TG found the following MARC fields that needed $0 defined:

Bibliographic: 046, 257, 260/264, 375, 753
Authority: 046, 360, 375, 377, 663, 680, 681

These fields do not offer an easy resolution when considering $0, which reflects the resource object for an entity described. The TG conducted thorough analyses and concluded that only 257 and 377 could contain a URI with an unambiguous correspondence between the field and the object it represents, leaving out more complicated cases, e.g., fields 264 (Production, Publication, Distribution, Manufacture, and Copyright Notice) and 382 (Medium of Performance).

One of the issues confronted in drafting Discussion Paper No. 2016-DP19 was the extent of effort needed to individually propose subfield $0 for MARC 21 fields that do not contain it. MAC accepted the paper as a proposal, and there was agreement "that similar changes, such as those recommended [in] this paper, might in the future be considered as part of a MARC Fast-Track process." Being able to fast-track proposals for defining subfield $0 in fields which do not contain it will considerably streamline the process in the future.

IV.3.3. Strategies in light of the limited life cycle of the MARC environment

Though many may see MARC as "dead," the system remains a viable tool that delivers metadata for data discovery. It is also, however, a legacy format that reflects in its somewhat baroque structure a long history of accretion to meet varied and changing needs. In pursuing its goals, the TG has adopted a strategy of pursuing changes that can be applied coherently across MARC and that maximize the return on the library community's investment of effort. There are economical and sensible approaches to determining what to do. The TG always kept in mind that its recommendations must cause the least disruption to the data transition from MARC to linked data. A wholesale insertion of HTTP URIs is unlikely to be possible, though URIs can probably be added to most, but not all, MARC fields and/or subfields.

The TG is committed to working through a list of tasks and identifying viable solutions. While $0, after one year's deliberation, seemed a straightforward solution for a URI representing a resource object, more discussion is needed with regard to the predicate that denotes a relationship. MARC data have not been consistent in expressing relationships. The combination of field, indicators, and subfields adds complexity to the process.

IV.3.4. ILS analysis results

Some ILSs would not load the processed records because of the presence of $0. Others loaded but did nothing with the data.

The TG members mocked up files of bibliographic and authority data, adding various URIs in subfield $0 wherever subfield $0 is currently defined in MARC. These files were uploaded into a number of ILSs to see if the addition of subfield $0 with URIs caused problems. No significant problems were found. These files included URIs in subfield $0 which were not prefixed with the (uri) identifier.

In OCLC, the same $0 subfields were also not problematic. OCLC's validation of subfield $0 does not check the structure of subfield $0 in the same way as it does for control numbers in 760-787 subfield $w or URLs in $u subfields. Use of URIs in subfield $4 to express relationship information would require a change to OCLC's validation of $4 subfields, but that could be readily changed without extensive effort.

IV.3.5. Tools needed: MarcNext, Authority Toolkit

Currently, the TG has tested and continues to work with MarcNext and the Authority Toolkit. The TG members continue collecting and recording additional tools and resources that help practitioners identify and validate an RDF URI.

IV.3.6. Need to be able to easily report duplicates found in VIAF, etc., and need a way to know which URI to use when duplicates are found

Throughout the first year of investigation and deliberation, the TG learned that, though vocabularies and ontologies are structured per standards and published for adoption, some are more domain-specific than others. Often there is more than one method to structure a body of data. Duplications can be expected across various datasets. The reconciliation of URIs is one of the tasks that the TG has recognized, but it is not yet in a position to recommend a solution in the near term.

IV.4. Outcomes

IV.4.1. MAC Discussion Papers [Charge 4]

The TG was aware that some aspects of its intended goals were not yet accommodated by the MARC format. Following the defined workflows of MARC governance and standardization, the TG submitted several discussion papers to the MARC Advisory Committee (MAC). As an initial preparation, an informal discussion paper entitled URIs in MARC: A Call for Best Practices, by Steven Folsom, had been discussed during the June 2015 MAC meeting. It focused on subfield $0 (Authority record control number or standard number), its current usage, and its capability for URIs, and addressed some aspects of best practice. The paper generated extensive discussion, and there was broad agreement that the time was right for the library community to begin using URIs consistently. Steven Folsom was asked to cooperate with the PCC to develop a formal MAC Discussion Paper.

In fall 2015, the British Library (BL) submitted two papers to MAC for the January 2016 meeting, independently of the TG, covering title-to-title relationships via subfield $w and specific relationship information, then discussed using subfield $0. The approaches taken by the BL in its papers, coupled with the approach taken by the TG, resulted in MAC suggesting that the British Library and the PCC should collaborate on submitting a paper for June 2016.

During the MAC meetings at the ALA Annual Conference in Orlando in June 2016, three papers were presented by or in cooperation with the TG. Discussion Paper No. 2016-DP18, entitled Redefining Subfield $0 to Remove the Use of Parenthetical Prefix "(uri)" in the MARC 21 Authority, Bibliographic, and Holdings Formats, described the syntactical improvement that a subfield $0 containing a URI without the parenthetical prefix "(uri)" would allow, so that automated processes could use the content of these $0s without having to strip away a prefix. The discussion paper was discussed at the MAC meeting, and the recommendation was made that it be upgraded to proposal status; it was approved at the meeting as a proposal. From now on, a $0 containing an identifier in the form of a web retrieval protocol, e.g., an HTTP URI, should not be given a parenthetical prefix.
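For legacy data, the prefix-stripping step that this change makes unnecessary can be sketched as below. This is an illustrative helper, not part of any tool named in this report; the function names are hypothetical, and other parenthetical prefixes (such as a control number source code) are deliberately left untouched.

```python
import re

# Matches a leading parenthetical "(uri)" prefix, with optional whitespace.
URI_PREFIX = re.compile(r"^\(uri\)\s*", re.IGNORECASE)


def normalize_subfield_0(value):
    """Strip a legacy "(uri)" prefix; leave other $0 values untouched."""
    return URI_PREFIX.sub("", value.strip())


def is_http_uri(value):
    """Report whether a $0 value is a bare HTTP(S) URI."""
    return value.startswith(("http://", "https://"))


legacy = "(uri)http://id.loc.gov/authorities/names/n79021164"
print(normalize_subfield_0(legacy))
print(is_http_uri(normalize_subfield_0(legacy)))
print(normalize_subfield_0("(DNLM)D009462"))
```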

A second paper was presented to the MAC: Discussion Paper No. 2016-DP19, Adding Subfield $0 to Fields 257 and 377 in the MARC 21 Bibliographic Format and Field 377 in the MARC 21 Authority Format. It resulted from extensive analyses of the MARC Bibliographic and Authority formats by the TG, selecting fields which are to be controlled by an identifier. Only those fields where an identifier can be applied with clear correspondence between the field and one entity were included in the paper. The discussion paper was discussed at the MAC meeting, and the recommendation was made that it be upgraded to proposal status; it was also approved at the meeting as a proposal. Both changes will be included in Update 23 to the MARC documentation, expected in fall 2016.

The third paper, Discussion Paper No. 2016-DP17, Redefining Subfield $4 to Encompass URIs for Relationships in the MARC 21 Authority and Bibliographic Formats, was presented by the British Library in consultation with the TG. This paper generated lively discussion. It was acknowledged that the approach to recording URIs for relationships using subfield $4 was preferable to any of the other alternatives outlined by the paper. The distinction between relator codes and relationship codes in the MARC format was questioned. As of now, an across-the-board solution for recording URIs for any data element in MARC, subfield or field, seems to be preferred by NDMSO over what it regards as an ad hoc solution for single elements. This discussion will be continued. This paper should not be considered in isolation, but rather in the context of the other papers which the TG is in the process of submitting. Taken as a whole, it is hoped that they will achieve the comprehensive solution which is sought throughout the MARC formats.

IV.4.2. Formulating & Obtaining URIs document [Charge 3.2]

A draft document was prepared for commonly used sources for authorities and identifiers. For each source, screen captures were made showing where a URI could be found for a particular entity, or how to formulate a URI once the identifier for the entity is known. Before making this document widely available, it must be determined how best to organize it. Some resources provide URIs that directly represent a thing, and others provide URIs that reference an authority (e.g., controlled or standard vocabularies, which may or may not have underlying metadata about the thing) or a resource describing a thing. The document needs to be able to distinguish these and inform catalogers which URIs are for real world objects and which are not. In order to be helpful to developers building tools, the document is also intended to include descriptions of how data sources provide machine access to the data: is the data published as Linked Data, available through HTTP, available through a SPARQL endpoint, available as data dumps, etc.? Another issue that must be determined is where to put the final document and how it will be maintained. Should it be cooperatively maintained by the community (such as on a wiki), or should some group within PCC take responsibility for keeping it up to date and adding to it?

IV.4.3. Revisions to OCLC handling of HTTP URIs [Charge 3.1]

The question arises as to whether it would be better for catalogers to enter all needed URIs directly into the shared bibliographic record in WorldCat, or whether OCLC should provide options for output of URIs based on data present in particular MARC fields and profiled library preferences. Clearly, some libraries will embrace use of URIs for their web-based catalogs, while others may find them problematic in local displays of bibliographic information. OCLC staff have looked into the issue and believe that the use of output options would likely produce more consistent results as well as meet the varying needs of libraries.

The TG members are drafting a spreadsheet outlining the subfields that together name an entity for which a corresponding URI could be added in subfield $0. That spreadsheet will be useful as the basis for future specifications for use by OCLC system developers. It will allow for a comparison of what is desired by the PCC cataloging community in terms of URIs corresponding to the entire named entity versus the existing use of subfield $0 and subfield-$0-like information used in OCLC heading controlling functionality. That heading control functionality allows for control numbers in multiple $0 subfields corresponding to different parts of a named entity, i.e., corporate name hierarchies, names and titles, subjects and separately controlled subdivisions, etc. These are cases where output of multiple URIs corresponding only to part of the named entity would not be preferred.

OCLC cataloging policies in this area are expected to evolve as this TG makes recommendations and OCLC development work moves ahead on the proposed output options for URIs.

IV.5. Next steps and in-depth analyses in year 2 [Charge 3, Charge 4]

In 2016-2017, the TG will continue an agenda focused on practical outcomes. Work is already well advanced on several of the following items:

IV.5.1. In collaboration with OCLC, develop a specification for outputting URIs based on internal linkages present in WorldCat data
IV.5.2. Complete the MARC object/URI reconciliation document and seek to incorporate the information into formal MARC documentation
IV.5.3. Produce work ID recommendation and use it in pilot implementation
IV.5.4. Produce discussion paper or proposal for handling relationships in MARC
IV.5.5. Consider additional targeted reconciliation projects
IV.5.6. In consultation with stakeholders, evaluate need for additional MARC proposals or best practices
IV.5.7. RWO recommendations
IV.5.8. Identify "homes" in PCC or elsewhere for aspects of the TG's work that will need further exploration or continuing upkeep
IV.5.9. Outreach, advocacy, training
IV.5.10. Etc.

V. RECOMMENDATIONS TO STAKEHOLDERS

During its first year, the TG was very much focused on the needs and interests of the many different stakeholders. This is reflected both in the outcomes of the work completed so far (see Sec. IV.4 Outcomes) and in the plans laid out for year 2 (see Sec. IV.5 Next steps and in-depth analyses in year 2). After careful consideration, the TG proposes the implementation of URIs in MARC for the near term. The sooner this process can begin, the sooner the data providers, e.g., libraries, can produce data that can be more easily transformed into linked data. In order to facilitate progress towards this goal, the TG developed the recommendations already outlined in the report above, such as the spreadsheet identifying the phase 1 entities for identities, i.e., the subfields that together name an entity in each MARC field (see Sec. IV.4.3 Revisions to OCLC handling of HTTP URIs), and the draft document Formulating and Obtaining URIs: A Guide to Commonly Used Vocabularies and Reference Sources. The TG hopes that this document could be used as a starting point to develop an official list of PCC-sanctioned initial source vocabularies for embedding URIs.

For the sake of consistency, expediency, and accuracy, it is advisable to use automated processes for populating MARC records with URIs. Individual catalogers doing this work manually is not a desirable practice and could be less efficient. Several possible ways to accomplish this goal have been outlined in this report (see Secs. IV.2.4.1 MarcEdit, IV.2.4.2 Authority Toolkit, and IV.4.3 Revisions to OCLC handling of HTTP URIs).

Outreach, advocacy, and training will be a core goal of phase 2. The TG is planning on working closely with stakeholders, such as other PCC committees, to influence cataloging policies and best practices that have been identified as problematic for the implementation of URIs in MARC.

Training needs related to implementation (for example, how to obtain URIs, or the difference between authorities and real world objects) will be communicated to the PCC Standing Committee on Training so that appropriate training can be either identified or developed.

Though MARC is the most prominently used schema for library metadata, it is frequently used alongside many others that may or may not allow for the inclusion of URIs. In addition to that concern are the maintenance of identifiers, recommendations in relation to reconciliation, and possible ILS functional requirements. The TG on URIs in MARC is recommending that new TGs be formed concerning URIs for non-MARC metadata.

VI. REFERENCES

1. The subgroup Work IDs in MARC has identified potential fields and scenarios to accommodate a work identifier (or multiple work identifiers). Consideration has been given to legacy data; to whether or not a work identifier (ID) is already established in an authority format (7XX $t, 1XX/240); to an unambiguous relationship of a work ID among various vocabularies (024); and to relationships among variants of a work, etc. The subgroup will present recommendations to the community in 2017.

Links: Meetings of the MARC Advisory Committee, Agendas and Minutes

2015-06 MAC meeting: http://www.loc.gov/marc/mac/an2015_age.html, http://www.loc.gov/marc/mac/minutes/an-15.html

2016-01 MAC meeting: http://www.loc.gov/marc/mac/mw2016_age.html, http://www.loc.gov/marc/mac/minutes/mw-16.html

2016-06 MAC meeting: http://www.loc.gov/marc/mac/an2016_age.html, http://www.loc.gov/marc/mac/minutes/an-16.html

Papers

Informal discussion paper: URIs in MARC: A Call for Best Practices (Steven Folsom, Discovery Metadata Librarian, Cornell University) https://docs.google.com/document/d/1fuHvF8bXH7hldY_xJ7f_xn2rP2Dj8o-Ca9jhHghIeUg/edit?pli=1

Discussion Paper No. 2016-DP04: Extending the Use of Subfield $0 to Encompass Linking Fields in the MARC 21 Bibliographic Format (British Library) http://www.loc.gov/marc/mac/2016/2016-dp04.html

Discussion Paper No. 2016-DP05: Expanding the Definition of Subfield $w to Encompass Standard Numbers in the MARC 21 Bibliographic and Authority Formats (British Library) http://www.loc.gov/marc/mac/2016/2016-dp05.html

Discussion Paper No. 2016-DP17: Redefining Subfield $4 to Encompass URIs for Relationships in the MARC 21 Authority and Bibliographic Formats (British Library, in consultation with the PCC Task Group on URIs in MARC) http://www.loc.gov/marc/mac/2016/2016-dp17.html

Discussion Paper No. 2016-DP18: Redefining Subfield $0 to Remove the Use of Parenthetical Prefix "(uri)" in the MARC 21 Authority, Bibliographic, and Holdings Formats (PCC Task Group on URIs in MARC, in consultation with the British Library) http://www.loc.gov/marc/mac/2016/2016-dp18.html

Discussion Paper No. 2016-DP19: Adding Subfield $0 to Fields 257 and 377 in the MARC 21 Bibliographic Format and Field 377 in the MARC 21 Authority Format (PCC URI in MARC Task Group) http://www.loc.gov/marc/mac/2016/2016-dp19.html

MARC Format Overview, Status Information: http://www.loc.gov/marc/status.html

Examples for Sec. IV.2.1

This LC subject heading string is linked to three different authority records. The links are OCLC's ARNs. No single $0 could be output for this subject access point.

650 0 ǂa Neurologists<Link2068890> ǂz New Zealand<Link255121> ǂv Biography<Link4933801>

This medical subject string is linked to one authority record, although the controlling process links individual subfields. It is a candidate for output of a single $0 with a URI, because the links all refer to the single authority record. In the case of MeSH, unlike LCSH, the $0 subfield displays in Connexion. See OCLC record 957132118.

650 12 ǂa Neurology<Link(DNLM)D009462Q000266> ǂx history<Link(DNLM)D009462Q000266>

Displays as:

650 12 Neurology ǂx history ǂ0 (DNLM)D009462Q000266

So it could be output with a single $0 containing the corresponding URI for the MeSH heading.
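The distinction drawn in these two examples, whether all controlled subfields point to one authority record, can be expressed as a small check. The sketch below is only an illustration: the (code, text, link) tuples are an assumed representation, not OCLC's internal data model, and the function name is hypothetical.

```python
def single_control_number(controlled_subfields):
    """Return the shared link if every controlled subfield has the same one.

    `controlled_subfields` is a list of (code, text, link) tuples, where
    `link` identifies the authority record the subfield was controlled
    against. Returns None when the links differ (or the list is empty),
    meaning no single $0 can be output for the heading.
    """
    links = {link for _, _, link in controlled_subfields}
    return links.pop() if len(links) == 1 else None


# The two headings from the examples above.
lcsh_heading = [
    ("a", "Neurologists", "2068890"),
    ("z", "New Zealand", "255121"),
    ("v", "Biography", "4933801"),
]
mesh_heading = [
    ("a", "Neurology", "(DNLM)D009462Q000266"),
    ("x", "history", "(DNLM)D009462Q000266"),
]

print(single_control_number(lcsh_heading))
print(single_control_number(mesh_heading))
```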

Page13 22 of 22

Page 11: Task Groupon URIsinMARC !! Year OneReport · 06-10-2016  · The first year since the inception of theURI in MARCTaskGroup (TG)began, despite the extremely ... believedfollowing the

Within a Bibliographic Record

Page13 11 of 22

Page13 12 of 22

Across Diverse vocabularies

Current development on the tool will continue to13 focus on the inclusion13 and13 support of additional vocabularies continuing13 to work13 with linked data providers around scalability13 issues (and ways in which MarcEdit [or services like it] can reduce impacts on their13 services as well as working to profile this service to work with other flavors13 of MARC like UNIMARC to encourage further experimentation

IV24213 Authority Toolkit [Charge 3]

The authority toolkit is a program for the construction and modification of authority records13 One version is designed for use within OCLCs Connexion program for records in the LCNACO authority13 file but another version13 can13 work with13 records in13 files and13 so13 with13 records from13 other sources Both versions of the toolkit have the same capabilities At an early13 stage the toolkit acquired the ability13 to test13 terms used in authority fields such as the 370 and 372 against13 vocabularies available at13 idlocgov (at13 present LCMPT LCSH LCDGT AFSET geographic area13 codes RDA content terms and the13 LCNACO Authority File) Somewhat later it added13 the ability to13 verify terms against the MeSH vocabulary

Page13 13 of 22

(Additional vocabularies may be added in the future based on user13 requests)13 To perform this verification the program needs to know which vocabularies are used to control terms in which parts of which authority fields how13 to query the source to determine whether or not it is defined and how13 to react13 to the information returned by the source The toolkits actions are controlled above all by the subfield $2 code appearing in the same subfield as13 the term but in the absence of a subfield $2 code operator preferences come into13 play as well (For example an13 operator may prefer that an unlabeled term be tested against13 MeSH first and if13 not13 found tested against13 LCSH or13 perhaps tested only against13 LCDGT) A detailed description of the tookits process for verifying13 the content of authority13 fields can be found in the programs documentation athttpfileslibrarynorthwesternedupublicoclcdocumentationverifymenu

If the toolkit's search for an entire term is successful, the toolkit could easily supply the corresponding URI and add it to the authority record in subfield $0. This URI may be contained in the data provided by the source, or it could be constructed mechanically once the toolkit has extracted the appropriate identifier. As part of experimentation encouraged by the TG, on January 15, 2016, the toolkit acquired an option to add subfield $0 to fields which could be verified. (This option is described at http://files.library.northwestern.edu/public/oclc/documentation/#optionsverification0) If a field contains more than one term, the toolkit must divide the field into multiple fields (one for each term) before it can add subfield $0.
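The two mechanical steps in this paragraph, constructing a URI from an extracted identifier and dividing a multi-term field into one field per term, might look something like the sketch below. It is illustrative only: the URI templates, the function names, the tuple-based field representation, and the placeholder identifier in the usage note are assumptions, not the toolkit's implementation.

```python
# Hypothetical URI templates keyed by source code; assumed for illustration.
URI_TEMPLATES = {
    "lcsh": "http://id.loc.gov/authorities/subjects/{id}",
    "naf": "http://id.loc.gov/authorities/names/{id}",
    "mesh": "http://id.nlm.nih.gov/mesh/{id}",
}


def build_uri(vocabulary, identifier):
    """Construct a URI mechanically from an identifier, or return None."""
    template = URI_TEMPLATES.get(vocabulary)
    if template is None:
        return None
    return template.format(id=identifier.strip())


def split_and_link(tag, terms, vocabulary, lookup):
    """Emit one field per term, adding $0 only when the term was verified.

    `lookup` is any callable mapping a term to its identifier (or None);
    in practice it would wrap a query to the vocabulary's service.
    """
    fields = []
    for term in terms:
        subfields = [("a", term)]
        identifier = lookup(term)
        uri = build_uri(vocabulary, identifier) if identifier else None
        if uri:
            subfields.append(("0", uri))
        subfields.append(("2", vocabulary))
        fields.append((tag, subfields))
    return fields
```

For example, `split_and_link("372", ["Neurology", "Unverified term"], "lcsh", {"Neurology": "sh00000000"}.get)` (where `sh00000000` is a placeholder identifier) yields two 372 fields, and only the first carries a subfield $0.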

The following illustration shows an authority record as verified by the authority toolkit, with the option to add subfield $0 during verification turned on. (For this experiment, subfield $0 was locally defined for some fields.)

Although the toolkit can often discover information about compound terms (such as some corporate bodies with subordinate units, and some LCSH headings) for which an authority record exists for some parts but not all, the toolkit cannot supply subfield $0. (There is no authority record, and so no URI, that represents the entire term.) The toolkit also cannot add subfield $0 to fields that contain multiple terms if the field contains an aggregation of terms rather than a collection of independent items. (Example: the toolkit cannot add subfield $0 to the 382 field.)

The task of discovering that a term given in an authority record is defined in an external vocabulary is made more difficult because the searching mechanisms available do not always compensate appropriately for operator variations in punctuation, capitalization, and the use of combining diacritics. In addition, the response time experienced by the toolkit can vary widely, even for the same term searched repeatedly within a brief time, and some services are unavailable over the weekend. If the potential of linked data is to be realized, services providing data must ensure that their entry mechanisms are robust, flexible, and available at all times.
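One way a client could compensate for these variations before comparing an operator's term with a vocabulary label is to normalize both sides first. The sketch below uses only Python's standard `unicodedata` module; the function name `normalize_term` and the specific normalization choices (NFC composition, case folding, punctuation replaced by spaces) are assumptions for illustration, not a description of any service's matching rules.

```python
import unicodedata


def normalize_term(term):
    """Reduce a term to a form suitable for loose comparison."""
    # Compose combining diacritics so "e" + U+0301 compares equal to "é".
    text = unicodedata.normalize("NFC", term)
    # Ignore differences in capitalization.
    text = text.casefold()
    # Treat every punctuation character as a word separator.
    text = "".join(
        " " if unicodedata.category(ch).startswith("P") else ch for ch in text
    )
    # Collapse runs of whitespace left behind by the previous step.
    return " ".join(text.split())
```

Two spellings of the same term then compare equal when `normalize_term(a) == normalize_term(b)`, regardless of whether a diacritic was entered as a precomposed or a combining character.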

IV.2.4.3. Lookup online (e.g., VIAF, Getty ULAN, GeoNames, Wikidata)

Online lookup requires manual operation. Users must be well versed in the SPARQL queries that individual services provide. Getty ULAN works differently from GeoNames and Wikidata. The URI returned from a query may not be an RDF URI, but one that lands the user on a Web page or document.
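The distinction between a URI for a Web page and a URI for the entity itself can be illustrated with Wikidata, where the browsable page lives under `/wiki/` and the entity URI under `/entity/`. The following is a small sketch under that assumption; the function name is hypothetical, and other services (VIAF, ULAN, GeoNames) follow their own patterns that are not covered here.

```python
import re

# Matches the human-readable Wikidata page address for an item.
_WIKIDATA_PAGE = re.compile(r"^https?://www\.wikidata\.org/wiki/(Q\d+)$")


def wikidata_entity_uri(url):
    """Convert a Wikidata page URL to the entity URI; leave other input alone."""
    cleaned = url.strip()
    match = _WIKIDATA_PAGE.match(cleaned)
    if match:
        return "http://www.wikidata.org/entity/" + match.group(1)
    return cleaned
```

A cataloger who copies `https://www.wikidata.org/wiki/Q42` from the browser address bar would thus record `http://www.wikidata.org/entity/Q42` instead.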

IV.3. What did we learn? [Charge 1, Charge 3]

IV.3.1. Tackle low-hanging fruit: what can we do in 1 year?

The TG's activities during Year 1 were designed to position the MARC community to take tangible steps toward incorporating linked data URIs into its processes within an achievable timeframe. Therefore, the TG put aside some tasks, such as the overhaul of certain legacy MARC data elements, that would have delayed progress with the TG's practical objectives. The tool development undertaken by Terry Reese and Gary Strawn was designed to advance these objectives, but so were the Formulating URIs document and the MARC object/URI reconciliation work, both of which document information that will be needed by other stakeholders, and the work IDs in MARC proposal, which seeks to remove one of the main barriers to routine incorporation of work identifiers in MARC records.

IV.3.2. Add $0 where it's not defined (not simple)

One of the TG's goals was also to identify and add $0 to fields that currently do not have one defined. The TG found the following MARC fields that needed $0 defined:

Bibliographic: 046, 257, 260/264, 375, 753
Authority: 046, 360, 375, 377, 663, 680, 681

These fields do not lend themselves to an easy resolution when considering $0, which reflects the resource object for the entity described. The TG conducted thorough analyses and concluded that only 257 and 377 could contain a URI with an unambiguous correspondence between the field and the object it represents, leaving out more complicated cases, e.g., fields 264 (Production, Publication, Distribution, Manufacture, and Copyright Notice) and 382 (Medium of Performance).

One of the issues confronted in drafting discussion paper 2016-DP19 was the extent of effort needed to individually propose subfield $0 for MARC 21 fields that do not contain it. MAC accepted the paper as a proposal, and there was agreement "that similar changes such as those recommended [in] this paper might in the future be considered as part of a MARC Fast-Track process". Being able to fast-track proposals for defining subfield $0 in fields which do not contain it will considerably streamline the process in the future.

IV.3.3. Strategies in light of limited life cycle of MARC environment

Though many may see MARC as "dead", the system remains a viable tool that delivers metadata for data discovery. It is also, however, a legacy format that reflects, in its somewhat baroque structure, a long history of accretion to meet varied and changing needs. In pursuing its goals, the TG has adopted a strategy of pursuing changes that can be applied coherently across MARC and maximize the return on the library community's investment of effort. There are economical and sensible approaches to determining what to do. The TG always kept in mind that its recommendations must cause the least disruption for data transition from MARC to linked data. A wholesale insertion of HTTP URIs is unlikely to be possible, though insertion may be possible for most, but not all, MARC fields and/or subfields.

The TG is committed to working through a list of tasks and identifying viable solutions. While $0, after one year's deliberation, seemed a straightforward solution for a URI representing a resource object, more discussion is needed with regard to the predicate that denotes a relationship. MARC data have not been consistent in expressing relationships. The combination of field, indicators, and subfields adds complexity to the process.

IV.3.4. ILS analysis results

Some ILSs would not load the processed records because of the presence of $0. Others loaded but did nothing with the data.

The TG members mocked up files of bibliographic and authority data, adding various URIs in subfield $0 wherever subfield $0 is currently defined in MARC. These files were uploaded into a number of ILS systems to see if the addition of subfield $0 with URIs caused problems. No significant problems were found. These files included URIs in subfield $0 which were not prefixed with the "(uri)" identifier.

In OCLC, the same $0 subfields were also not problematic. OCLC's validation of subfield $0 does not check the structure of subfield $0 in the same way as it does for control numbers in 760-787 subfield $w or URLs in $u subfields. Use of URIs in subfield $4 to express relationship information would require a change to OCLC's validation of $4 subfields, but that could be changed readily, without extensive effort.

IV.3.5. Tools needed: MarcNext, Authority Toolkit

Currently, the TG has tested and continued to work with MarcNext and the Authority Toolkit. The TG members continue collecting and recording additional tools and resources that help practitioners identify and validate an RDF URI.


IV.3.6. Need to be able to easily report duplicates found in VIAF, etc., and need a way to know which URI to use when duplicates are found

Throughout the first year of investigation and deliberation, the TG learned that, though vocabularies and ontologies are structured per standards and published for adoption, some are more domain-specific than others. Often there is more than one method to structure a body of data. Duplications can be expected across various datasets. The reconciliation of URIs is one of the tasks that the TG has recognized, yet it is not in a position to recommend a solution in the near term.

IV.4. Outcomes

IV.4.1. MAC Discussion Papers [Charge 4]

The TG was aware that some aspects of its intended goals were not yet accommodated by the MARC format. Following the defined workflows of MARC governance and standardization, the TG submitted several discussion papers to the MARC Advisory Committee (MAC). As an initial preparation, an informal discussion paper entitled URIs in MARC: A Call for Best Practices, by Steven Folsom, had been discussed during the June 2015 MAC meeting. It focused on subfield $0 (Authority record control number or standard number), its current usage, and its capability for URIs, and addressed some aspects of best practice. The paper generated extensive discussion, and there was broad agreement that the time was right for the library community to begin using URIs consistently. Steven Folsom was asked to cooperate with the PCC to develop a formal MAC Discussion Paper.

In fall 2015, the British Library (BL) submitted two papers to MAC for the January 2016 meeting, independently of the TG, covering title-to-title relationships via subfield $w and specific relationship information, then discussed using subfield $0. The approaches taken by the BL in its papers, coupled with the approach taken by the TG, resulted in MAC suggesting that the British Library and the PCC should collaborate on submitting a paper for June 2016.

During the MAC meetings at the ALA Annual Conference in Orlando in June 2016, three papers were presented by, or in cooperation with, the TG. Discussion Paper No. 2016-DP18, entitled Redefining Subfield $0 to Remove the Use of Parenthetical Prefix "(uri)" in the MARC 21 Authority, Bibliographic, and Holdings Formats, described the syntactical improvement that a subfield $0 containing a URI without the parenthetical prefix "(uri)" would allow, so that automated processes could use the content of these $0s without having to strip away the prefix. The discussion paper was discussed at the MAC meeting, and the recommendation was made that it be upgraded to proposal status; it was approved at the meeting as a proposal. From now on, a $0 containing an identifier in the form of a web retrieval protocol, e.g., an HTTP URI, should not be given a parenthetical prefix.
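For legacy data that still carries the prefix, the stripping step that automated processes would otherwise need can be sketched in a few lines of Python. The function names are hypothetical, the example identifier in the usage note is a placeholder, and other parenthetical prefixes (such as an organization code before a control number) are deliberately left untouched.

```python
def strip_uri_prefix(value):
    """Remove a leading "(uri)" label from a subfield $0 value, if present."""
    text = value.strip()
    if text.lower().startswith("(uri)"):
        text = text[len("(uri)"):].lstrip()
    return text


def is_http_uri(value):
    """Report whether a value looks like an HTTP(S) URI rather than a control number."""
    return value.startswith(("http://", "https://"))
```

A value such as `(uri) http://id.loc.gov/authorities/names/n00000000` becomes a bare URI that can be dereferenced directly, while `(DNLM)D009462` passes through unchanged and is not reported as an HTTP URI.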

A second paper was presented to the MAC: Discussion Paper No. 2016-DP19, Adding Subfield $0 to Fields 257 and 377 in the MARC 21 Bibliographic Format and Field 377 in the MARC 21 Authority Format. It resulted from extensive analyses of the MARC Bibliographic and Authority formats by the TG, selecting fields which are to be controlled by an identifier. Only those fields where an identifier can be applied with clear correspondence between the field and one entity were included in the paper. The discussion paper was discussed at the MAC meeting, and the recommendation was made that it be upgraded to proposal status; it was also approved at the meeting as a proposal. Both changes will be included in Update 23 to the MARC documentation, expected in fall 2016.


The third paper, Discussion Paper No. 2016-DP17, Redefining Subfield $4 to Encompass URIs for Relationships in the MARC 21 Authority and Bibliographic Formats, was presented by the British Library in consultation with the TG. This paper generated lively discussion. It was acknowledged that the approach to recording URIs for relationships using subfield $4 was preferable to any of the other alternatives outlined by the paper. The distinction between relator codes and relationship codes in the MARC format was questioned. As of now, an across-the-board solution for recording URIs for any data element in MARC (subfield or field) seems to be preferred by NDMSO over what it regards as an ad hoc solution for single elements. This discussion will be continued; this paper should not be considered in isolation, but rather in the context of the other papers which the TG is in the process of submitting. Taken as a whole, it is hoped that they will achieve the comprehensive solution which is sought throughout the MARC formats.

IV.4.2. Formulating & Obtaining URIs document [Charge 3.2]

A draft document was created for commonly used sources for authorities and identifiers. For each source, screen captures were made showing where a URI could be found for a particular entity, or how to formulate a URI once the identifier for the entity is known. Before making this document widely available, it must be determined how best to organize it. Some resources provide URIs that directly represent a thing, and others provide URIs that reference an authority (e.g., controlled or standard vocabularies, which may or may not have underlying metadata about the thing) or a resource describing a thing. The document needs to be able to distinguish these and inform catalogers which URIs are for real world objects and which are not. In order to be helpful to developers building tools, the document also intends to include descriptions of how data sources provide machine access to the data. Is the data published as Linked Data, available through HTTP, available through a SPARQL endpoint, through data dumps, etc.? Another issue that must be determined is where to put the final document and how it will be maintained. Should it be cooperatively maintained by the community (such as on a wiki), or should some group within PCC take responsibility for keeping it up to date and adding to it?

IV.4.3. Revisions to OCLC handling of HTTP URIs [Charge 3.1]

The question arises as to whether it would be better for catalogers to enter all needed URIs directly into the shared bibliographic record in WorldCat, or whether OCLC should provide options for output of URIs based on data present in particular MARC fields and profiled library preferences. Clearly, some libraries will embrace use of URIs for their web-based catalogs, while others may find them problematic in local displays of bibliographic information. OCLC staff have looked into the issue and believe that the use of output options would likely produce more consistent results, as well as meet the varying needs of libraries.

The TG members are drafting a spreadsheet outlining the subfields that together name an entity for which a corresponding URI could be added in subfield $0. That spreadsheet will be useful as the basis for future specifications for use by OCLC system developers. It will allow for a comparison of what is desired by the PCC cataloging community, in terms of URIs corresponding to the entire named entity, versus the existing use of subfield $0 and subfield-$0-like information used in OCLC heading controlling functionality. That heading control functionality allows for control numbers in multiple $0 subfields corresponding to different parts of a named entity, i.e., corporate name hierarchies, names and titles, subjects and separately controlled subdivisions, etc. These are cases where output of multiple URIs corresponding only to part of the named entity would not be preferred.

OCLC cataloging policies in this area are expected to evolve as this TG makes recommendations and OCLC development work moves ahead on the proposed output options for URIs.

IV.5. Next steps and in-depth analyses in year 2 [Charge 3, Charge 4]

In 2016-2017, the TG will continue an agenda focused on practical outcomes. Work is already well advanced on several of the following items:

IV.5.1. In collaboration with OCLC, develop a specification for outputting URIs based on internal linkages present in WorldCat data
IV.5.2. Complete the MARC object/URI reconciliation document and seek to incorporate the information into formal MARC documentation
IV.5.3. Produce work ID recommendation and use it in pilot implementation
IV.5.4. Produce discussion paper or proposal for handling relationships in MARC
IV.5.5. Consider additional targeted reconciliation projects
IV.5.6. In consultation with stakeholders, evaluate need for additional MARC proposals or best practices
IV.5.7. RWO recommendations
IV.5.8. Identify "homes" in PCC or elsewhere for aspects of the TG's work that will need further exploration or continuing upkeep
IV.5.9. Outreach, advocacy, training
IV.5.10. Etc.

V. RECOMMENDATIONS TO STAKEHOLDERS

During its first year, the TG was very much focused on the needs and interests of the many different stakeholders. This is reflected both in the outcomes of the work completed so far (see Sec. IV.4. Outcomes) and in the plans laid out for year 2 (see Sec. IV.5. Next steps and in-depth analyses in year 2). After careful consideration, the TG proposes the implementation of URIs in MARC for the near term. The sooner this process can begin, the sooner the data providers, e.g., libraries, can produce data that can be more easily transformed into linked data. In order to facilitate progress towards this goal, the TG developed the recommendations already outlined in the report above, such as the spreadsheet identifying the phase 1 entities for identifiers, i.e., the subfields that together name an entity in each MARC field (see Sec. IV.4.3. Revisions to OCLC handling of HTTP URIs), and the draft document Formulating and Obtaining URIs: A Guide to Commonly Used Vocabularies and Reference Sources. The TG hopes that this document could be used as a starting point to develop an official list of PCC-sanctioned initial source vocabularies for embedding URIs.

For the sake of consistency, expediency, and accuracy, it is advisable to use automated processes for populating MARC records with URIs. Having individual catalogers do this work manually is not a desirable practice and could be less efficient. Several possible ways to accomplish this goal have been outlined in this report (see Secs. IV.2.4.1. MarcEdit, IV.2.4.2. Authority Toolkit, and IV.4.3. Revisions to OCLC handling of HTTP URIs).

Outreach, advocacy, and training will be a core goal of phase 2. The TG is planning on working closely with stakeholders, such as other PCC committees, to influence cataloging policies and best practices that have been identified as problematic for the implementation of URIs in MARC.

Training needs related to implementation (for example, how to obtain URIs, or the difference between authorities and real world objects) will be communicated to the PCC Standing Committee on Training so that appropriate training can be either identified or developed.

Though MARC is the most prominently used schema for library metadata, it is frequently used alongside many others that may or may not allow for the inclusion of URIs. In addition to that concern, there are the maintenance of identifiers, recommendations in relation to reconciliation, and possible ILS functional requirements. The TG on URIs in MARC is recommending that new TGs be formed concerning URIs for non-MARC metadata.

VI. REFERENCES

1. The subgroup Work IDs in MARC has identified potential fields and scenarios to accommodate a work identifier (or multiple work identifiers). Consideration has been given to legacy data; to whether or not a work identifier (ID) is already established in an authority format (7XX $t, 1XX/240); to an unambiguous relationship of a work ID among various vocabularies (024); and to relationships among variants of a work, etc. The subgroup will present recommendations to the community in 2017.

Links: Meetings of the MARC Advisory Committee: Agendas and Minutes

2015-06 MAC meeting: http://www.loc.gov/marc/mac/an2015_age.html ; http://www.loc.gov/marc/mac/minutes/an-15.html

2016-01 MAC meeting: http://www.loc.gov/marc/mac/mw2016_age.html ; http://www.loc.gov/marc/mac/minutes/mw-16.html

2016-06 MAC meeting: http://www.loc.gov/marc/mac/an2016_age.html ; http://www.loc.gov/marc/mac/minutes/an-16.html

Papers


Informal discussion paper: URIs in MARC: A Call for Best Practices (Steven Folsom, Discovery Metadata Librarian, Cornell University). https://docs.google.com/document/d/1fuHvF8bXH7hldY_xJ7f_xn2rP2Dj8o-Ca9jhHghIeUg/edit?pli=1

Discussion Paper No. 2016-DP04: Extending the Use of Subfield $0 to Encompass Linking Fields in the MARC 21 Bibliographic Format (British Library). http://www.loc.gov/marc/mac/2016/2016-dp04.html

Discussion Paper No. 2016-DP05: Expanding the Definition of Subfield $w to Encompass Standard Numbers in the MARC 21 Bibliographic and Authority Formats (British Library). http://www.loc.gov/marc/mac/2016/2016-dp05.html

Discussion Paper No. 2016-DP17: Redefining Subfield $4 to Encompass URIs for Relationships in the MARC 21 Authority and Bibliographic Formats (British Library, in consultation with the PCC Task Group on URIs in MARC). http://www.loc.gov/marc/mac/2016/2016-dp17.html

Discussion Paper No. 2016-DP18: Redefining Subfield $0 to Remove the Use of Parenthetical Prefix "(uri)" in the MARC 21 Authority, Bibliographic, and Holdings Formats (PCC Task Group on URIs in MARC, in consultation with the British Library). http://www.loc.gov/marc/mac/2016/2016-dp18.html

Discussion Paper No. 2016-DP19: Adding Subfield $0 to Fields 257 and 377 in the MARC 21 Bibliographic Format and Field 377 in the MARC 21 Authority Format (PCC URI in MARC Task Group). http://www.loc.gov/marc/mac/2016/2016-dp19.html

MARC Format Overview: Status Information. http://www.loc.gov/marc/status.html

Examples for Sec. IV.2.1

This LC subject heading string is linked to three different authority records. The links are OCLC's ARNs. No single $0 could be output for this subject access point.

650 0 ǂa Neurologists<Link2068890> ǂz New Zealand<Link255121> ǂv Biography<Link4933801>

This medical subject string is linked to one authority record, although the controlling process links individual subfields. It is a candidate for output of a single $0 with a URI because the links all refer to the single authority record. In the case of MeSH, unlike LCSH, the $0 subfield displays in Connexion. See OCLC record 957132118.

650 12 ǂa Neurology<Link(DNLM)D009462Q000266> ǂx history<Link(DNLM)D009462Q000266>


Displays as:
650 12 Neurology ǂx history ǂ0 (DNLM)D009462Q000266

So it could be output with a single $0 containing the corresponding URI for the MeSH heading.
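The rule illustrated by these two examples (a single $0 is possible only when every controlled subfield links to the same authority record) can be expressed as a short check. This is a sketch in Python under assumed data structures: each heading is represented as a list of (subfield code, value, link) tuples, and the function name is hypothetical rather than part of any OCLC system.

```python
def single_link_target(linked_subfields):
    """Return the one shared link if all subfields point to it, else None.

    `linked_subfields` is a list of (code, value, link) tuples; a link of
    None stands for a subfield that is not controlled at all.
    """
    targets = {link for _code, _value, link in linked_subfields}
    if len(targets) == 1 and None not in targets:
        return next(iter(targets))
    return None


# The two headings from the examples above.
lcsh_heading = [
    ("a", "Neurologists", "2068890"),
    ("z", "New Zealand", "255121"),
    ("v", "Biography", "4933801"),
]
mesh_heading = [
    ("a", "Neurology", "(DNLM)D009462Q000266"),
    ("x", "history", "(DNLM)D009462Q000266"),
]
```

Here `single_link_target(lcsh_heading)` returns `None`, since three different authority records are involved, while `single_link_target(mesh_heading)` returns the shared MeSH control number from which a single URI could then be formed.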


Page 12: Task Groupon URIsinMARC !! Year OneReport · 06-10-2016  · The first year since the inception of theURI in MARCTaskGroup (TG)began, despite the extremely ... believedfollowing the

Page13 12 of 22

Across Diverse vocabularies

Current development on the tool will continue to13 focus on the inclusion13 and13 support of additional vocabularies continuing13 to work13 with linked data providers around scalability13 issues (and ways in which MarcEdit [or services like it] can reduce impacts on their13 services as well as working to profile this service to work with other flavors13 of MARC like UNIMARC to encourage further experimentation

IV24213 Authority Toolkit [Charge 3]

The authority toolkit is a program for the construction and modification of authority records13 One version is designed for use within OCLCs Connexion program for records in the LCNACO authority13 file but another version13 can13 work with13 records in13 files and13 so13 with13 records from13 other sources Both versions of the toolkit have the same capabilities At an early13 stage the toolkit acquired the ability13 to test13 terms used in authority fields such as the 370 and 372 against13 vocabularies available at13 idlocgov (at13 present LCMPT LCSH LCDGT AFSET geographic area13 codes RDA content terms and the13 LCNACO Authority File) Somewhat later it added13 the ability to13 verify terms against the MeSH vocabulary

Page13 13 of 22

(Additional vocabularies may be added in the future based on user13 requests)13 To perform this verification the program needs to know which vocabularies are used to control terms in which parts of which authority fields how13 to query the source to determine whether or not it is defined and how13 to react13 to the information returned by the source The toolkits actions are controlled above all by the subfield $2 code appearing in the same subfield as13 the term but in the absence of a subfield $2 code operator preferences come into13 play as well (For example an13 operator may prefer that an unlabeled term be tested against13 MeSH first and if13 not13 found tested against13 LCSH or13 perhaps tested only against13 LCDGT) A detailed description of the tookits process for verifying13 the content of authority13 fields can be found in the programs documentation athttpfileslibrarynorthwesternedupublicoclcdocumentationverifymenu

If the toolkits search for an entire term is successful the toolkit could easily supply the corresponding URI and add it to the authority record in subfield $0 This URI may be contained in the data provided by the source or13 it13 could be constructed mechanically once the toolkit13 has extracted the appropriate identifier13 As part13 of13 experimentation encouraged by the TG13 on January 1513 201613 the toolkit acquired an option to add subfield $0 to fields which could be13 verified (This option is described athttpfileslibrarynorthwesternedupublicoclcdocumentationoptionsverification0 If13 a field contains13 more than one term the toolkit must divide the field into multiple fields13 (one for each term) before it can13 add13 subfield $0

The following illustration shows an authority record as verified by the authority toolkit with the option to add subfield $0 during verification turned on (For13 this experiment subfield $0 was locally defined for13 some fields)

Although13 the toolkit13 can often discover13 information about13 compound terms (such as some corporate bodies with13 subordinate units and13 some LCSH headings) for which13 an13 authority record13 exists for some parts but not all the toolkit cannot supply subfield13 $0 (There is no authority record and so no URI that represents the entire term)13 The toolkit13 also cannot13 add subfield $0 to fields that13 contain multiple terms if the field contains an aggregation of terms rather than a collection of independent items13 (Example13 the toolkit cannot add13 subfield13 $0 to13 the 382 field)

The task of discovering that a term given in an authority record is defined in an external vocabulary is made more difficult because the searching mechanisms available do not always compensate

Page13 14 of 22

appropriately for operator variations in13 punctuation capitalization13 and13 the use of combining diacriticsIn addition the response time experienced by the toolkit can vary widely even for the same termsearched repeatedly within a brief time and some services13 are unavailable13 over the13 weekend If the13 potential of linked13 data is to13 be enjoyed services providing data must ensure that their entrymechanisms are robust and flexible and available at all times

IV24313 Lookup online (eg VIAF Getty ULAN Geonames Wikidata)

Online lookup requires manual operation Users must be well versed in SPARQL queries that individualservices13 provide Getty ULAN works13 differently to Geonames13 and Wikidata The URI returns from aquery may not be a RDF URI but one that may land13 user onto a Web page or document

IV3 What did we learn [Charge 13 Charge 3]

IV31 Tackle13 low hanging fruitwhat can we13 do in 1year13

The TGrsquos activities during Year were designed to position the MARC community to take tangible steps toward incorporating linked data URIs into its processes within an achievable timeframe13 Therefore the TG put aside some tasks such as overhaul of certain legacy MARC data13 elements that would have delayed13 progress with13 the TGrsquos practical objectives The tool development undertaken by Terry Reese13 and Gary Strawn was designed to advance13 these13 objectives but so were13 the13 Formulating URIs document and the13 MARC objectURI reconciliation work both of which document information that will be13 needed by other stakeholders and the work IDs in MARC proposal which seeks to remove one of13 the main barriers to13 routine incorporation13 of work identifiers in13 MARC13 records

IV32 Add $0 where13 itrsquos not defined (not simple)

One of the TGrsquos goals was also to identify and add $0 to13 fields that currently do not have one defined The TG found the followings MARC field that needed $0 defined

bibliographic 046 257 260264 375 753authority 046 360 375 377 663 680 681

These fields do not render an easy resolution when considering $0 which reflects the resource object for an entity described The13 TG conducted thorough analyses and concluded that only 25 and 37 could contain a URI that is13 an unambiguous13 between the field13 and13 the object it represents leaving out more complicated cases eg fields13 264 Production Publication Distribution Manufacture and Copyright Notice and 382 Medium of Performance

One of the issues confronted with drafting discussion paper 2016b DP19 was the extent of effort needed to individually propose subfield $0 for13 MARC 21 fields that13 do not13 contain it MAC accepted the paper13 as

Page13 15 of 22

a proposal and there13 was agreement ldquothat similar changes such as those13 recommended this paper might in the future be considered13 as part of a MARC13 Fastb Track processrdquo Being able to fastb track proposals for13 defining subfield13 $0 in13 field13 which13 do not contain13 it will considerably streamline the process in13 the future

IV33 Strategies in lieu of limited life cycle of MARC13 environment

Though many may see MARC is ldquodeadrdquo the system remains a viable tool that delivers metadata13 for data13 discovery It is also however a legacy format that reflects in13 its somewhat baroque structure a long history of accretion13 to13 meet varied and changing needs In pursuing its goals the TG has adopted a strategy of pursuing changes13 that can be applied coherently across13 MARC and maximize return on the library communityrsquos investment of effort13 There are economical13 and sensible approaches in13 determining what to do The TG always kept in mind of recommendations must cause the least disruption for data transition from MARC to linked data There is unlikely to have a wholesale possibility of13 inserting HTTP URI though possibly most but not all of MARC13 fields andor subfields

The TG is committed to work through a list of tasks and identify viable solutions While $0 after one yearrsquos deliberation seemed a straightforward solution for URI representing13 resource object more discussions needed13 with regards to predicate that denotes relationship13 MARC data have not been consistent in expressing relationship Combination field indicators and subfields13 raises13 complexity13 for the process

IV34 ILS13 analysis results

Some13 ILSs would not load the processed records because of13 the presence of13 $0 Others loaded but did nothing with13 the data

The TG members mocked up files of bibliographic and authority data adding various URIs in subfield $0 wherever subfield $0 is currently defined in MARC These files were uploaded13 into13 a number of ILS systems13 to see if the addition of subfield $0 with URIs13 caused problems No significant problems were found These files included URIs in subfield $0 which were not prefixed with the (uri) identifier

In OCLC the same $0 subfields were also not problematic OCLCrsquos validation of subfield $0 does not check13 the structure of subfield $0 in the same way13 as13 it does13 for control numbers13 in 760-shy‐78713 subfield $w or URLs in13 $u13 subfields Use of URIs in subfield $4 to express relationship information would require a change to OCLCrsquos13 validation of $4 subfields but that may13 be readily13 changed without extensive effort

IV35 Tools needed MarcNext Authority Toolkit

Currently the TG has tested13 and13 continued13 to13 work with MarcNext and Authority Toolkit The TG members continues collecting and recording additional tools and resources that facilitate practitioners in identifying and validating an RDF13 URI

IV.3.6. Need to be able to easily report duplicates found in VIAF, etc., and need a way to know which URI to use when duplicates are found

Throughout the first year of investigation and deliberation, the TG learned that, although vocabularies and ontologies are structured per standards and published for adoption, some are more domain-specific than others. Often there is more than one way to structure a body of data. Duplication can be expected across various datasets. The reconciliation of URIs is one of the tasks that the TG has recognized, but the TG is not yet in a position to recommend a solution in the near term.

IV.4. Outcomes

IV.4.1. MAC Discussion Papers [Charge 4]

The TG was aware that some aspects of its intended goals were not yet accommodated by the MARC format. Following the defined workflows of MARC governance and standardization, the TG submitted several discussion papers to the MARC Advisory Committee (MAC). As initial preparation, an informal discussion paper entitled “URIs in MARC: A Call for Best Practices” by Steven Folsom had been discussed during the June 2015 MAC meeting. It focused on subfield $0 (Authority record control number or standard number), its current usage, and its capability for URIs, and it addressed some aspects of best practice. The paper generated extensive discussion, and there was broad agreement that the time was right for the library community to begin using URIs consistently. Steven Folsom was asked to cooperate with the PCC to develop a formal MAC Discussion Paper.

In fall 2015, the British Library (BL), independently of the TG, submitted two papers to MAC for the January 2016 meeting, covering title-to-title relationships via subfield $w and specific relationship information, then discussed using subfield $0. The approaches taken by the BL in its papers, coupled with the approach taken by the TG, resulted in MAC suggesting that the British Library and the PCC collaborate on submitting a paper for June 2016.

During the MAC meetings at the ALA Annual Conference in Orlando in June 2016, three papers were presented by or in cooperation with the TG. Discussion Paper No. 2016-DP18, entitled “Redefining Subfield $0 to Remove the Use of Parenthetical Prefix (uri) in the MARC 21 Authority, Bibliographic, and Holdings Formats,” described the syntactical improvement that a subfield $0 containing a URI without the parenthetical prefix (uri) would allow: automated processes could use the content of these $0s without having to strip away the prefix. The discussion paper was discussed at the MAC meeting, and the recommendation was made that it be upgraded to proposal status; it was approved at the meeting as a proposal. From now on, a $0 containing an identifier in the form of a web retrieval protocol, e.g., an HTTP URI, should not be given a parenthetical prefix.
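To illustrate the processing step that the (uri) prefix imposes on automated tools, the following Python sketch normalizes a subfield $0 value. It is only an illustration under stated assumptions: the function names and the sample values are hypothetical and are not taken from the discussion paper. Other parenthetical prefixes, such as (DNLM), mark control numbers rather than URIs and are left untouched.

```python
import re

# Matches a leading "(uri)" prefix, case-insensitively, plus any spaces after it.
_URI_PREFIX = re.compile(r"^\(uri\)\s*", re.IGNORECASE)


def normalize_subfield_0(value: str) -> str:
    """Return a $0 value with a leading "(uri)" prefix removed, if present."""
    return _URI_PREFIX.sub("", value.strip())


def is_http_uri(value: str) -> bool:
    """Report whether a $0 value is an HTTP(S) URI rather than a prefixed control number."""
    return value.startswith(("http://", "https://"))
```

With the prefix removed at the source, as the approved proposal provides, a consumer would need only the second check.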

A second paper presented to the MAC, Discussion Paper No. 2016-DP19, “Adding Subfield $0 to Fields 257 and 377 in the MARC 21 Bibliographic Format and Field 377 in the MARC 21 Authority Format,” resulted from extensive analyses of the MARC Bibliographic and Authority formats by the TG, selecting fields that are to be controlled by an identifier. Only those fields where an identifier can be applied with a clear correspondence between the field and one entity were included in the paper. The discussion paper was discussed at the MAC meeting, and the recommendation was made that it be upgraded to proposal status; it was also approved at the meeting as a proposal. Both changes will be included in Update 23 to the MARC documentation, expected in fall 2016.

The third paper, Discussion Paper No. 2016-DP17, “Redefining Subfield $4 to Encompass URIs for Relationships in the MARC 21 Authority and Bibliographic Formats,” was presented by the British Library in consultation with the TG. This paper generated lively discussion. It was acknowledged that the approach of recording URIs for relationships using subfield $4 was preferable to any of the other alternatives outlined by the paper. The distinction between relator codes and relationship codes in the MARC format was questioned. As of now, an across-the-board solution for recording URIs for any data element in MARC, subfield or field, seems to be preferred by NDMSO over what it regards as an ad hoc solution for single elements. This discussion will be continued; this paper should not be considered in isolation but rather in the context of the other papers that the TG is in the process of submitting. Taken as a whole, it is hoped that they will achieve the comprehensive solution that is sought throughout the MARC formats.

IV.4.2. Formulating & Obtaining URI document [Charge 3.2]

A draft document was prepared for commonly used sources of authorities and identifiers. For each source, screen captures were made showing where a URI can be found for a particular entity or how to formulate a URI once the identifier for the entity is known. Before this document is made widely available, it must be determined how best to organize it. Some resources provide URIs that directly represent a thing, and others provide URIs that reference an authority (e.g., controlled or standard vocabularies, which may or may not have underlying metadata about the thing) or a resource describing a thing. The document needs to distinguish these cases and inform catalogers which URIs are for real world objects and which are not. To be helpful to developers building tools, the document is also intended to include descriptions of how data sources provide machine access to the data: is the data published as Linked Data, available through HTTP, available through a SPARQL endpoint, as data dumps, etc.? Another issue that must be determined is where to put the final document and how it will be maintained. Should it be cooperatively maintained by the community (such as on a wiki), or should some group within PCC take responsibility for keeping it up to date and adding to it?
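Formulating a URI once the identifier is known can often be reduced to filling in a template, as the Python sketch below suggests. The template table and the function name are assumptions made for illustration only; they are not drawn from the draft document, and each pattern should be checked against the documentation of the source concerned before use.

```python
# Hypothetical table of URI templates, keyed by an informal source name.
# The patterns are illustrative assumptions; verify them with each source.
URI_TEMPLATES = {
    "lcnaf": "http://id.loc.gov/authorities/names/{id}",
    "lcsh": "http://id.loc.gov/authorities/subjects/{id}",
    "mesh": "http://id.nlm.nih.gov/mesh/{id}",
    "viaf": "http://viaf.org/viaf/{id}",
}


def formulate_uri(source: str, identifier: str) -> str:
    """Build a URI from a source name and an identifier.

    Whitespace inside the identifier is removed first, since control
    numbers are often recorded with internal spaces (e.g. "n  79021164").
    """
    key = source.lower()
    if key not in URI_TEMPLATES:
        raise ValueError(f"no URI template known for source {source!r}")
    compact = "".join(identifier.split())
    return URI_TEMPLATES[key].format(id=compact)
```

A table of this kind says nothing about whether the resulting URI identifies an authority or a real world object, which is the distinction the document still has to draw for catalogers.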

IV.4.3. Revisions to OCLC handling of HTTP URIs [Charge 3.1]

The question arises as to whether it would be better for catalogers to enter all needed URIs directly into the shared bibliographic record in WorldCat, or whether OCLC should provide options for output of URIs based on data present in particular MARC fields and on profiled library preferences. Clearly, some libraries will embrace the use of URIs for their web-based catalogs, while others may find them problematic in local displays of bibliographic information. OCLC staff have looked into the issue and believe that the use of output options would likely produce more consistent results as well as meet the varying needs of libraries.

The TG members are drafting a spreadsheet outlining the subfields that together name an entity for which a corresponding URI could be added in subfield $0. That spreadsheet will be useful as the basis for future specifications for use by OCLC system developers. It will allow for a comparison of what is desired by the PCC cataloging community, in terms of URIs corresponding to the entire named entity, with the existing use of subfield $0 and subfield-$0-like information in OCLC heading-controlling functionality. That heading control functionality allows for control numbers in multiple $0 subfields corresponding to different parts of a named entity, i.e., corporate name hierarchies, names and titles, subjects and separately controlled subdivisions, etc. These are cases where output of multiple URIs, each corresponding only to part of the named entity, would not be preferred.

OCLC cataloging policies in this area are expected to evolve as this TG makes recommendations and OCLC development work moves ahead on the proposed output options for URIs.

IV.5. Next steps and in-depth analyses in year 2 [Charge 3, Charge 4]

In 2016-2017, the TG will continue an agenda focused on practical outcomes. Work is already well advanced on several of the following items:

IV.5.1. In collaboration with OCLC, develop a specification for outputting URIs based on internal linkages present in WorldCat data
IV.5.2. Complete the MARC object/URI reconciliation document and seek to incorporate the information into formal MARC documentation
IV.5.3. Produce work ID recommendation and use it in pilot implementation
IV.5.4. Produce discussion paper or proposal for handling relationships in MARC
IV.5.5. Consider additional targeted reconciliation projects
IV.5.6. In consultation with stakeholders, evaluate need for additional MARC proposals or best practices
IV.5.7. RWO recommendations
IV.5.8. Identify “homes” in PCC or elsewhere for aspects of the TG’s work that will need further exploration or continuing upkeep
IV.5.9. Outreach, advocacy, training
IV.5.10. Etc.

V. RECOMMENDATIONS TO STAKEHOLDERS

During its first year, the TG was very much focused on the needs and interests of the many different stakeholders. This is reflected both in the outcomes of the work completed so far (see Sec. IV.4, Outcomes) and in the plans laid out for year 2 (see Sec. IV.5, Next steps and in-depth analyses in year 2). After careful consideration, the TG proposes the implementation of URIs in MARC for the near term. The sooner this process can begin, the sooner data providers, e.g., libraries, can produce data that can be more easily transformed into linked data. To facilitate progress toward this goal, the TG developed the recommendations already outlined in the report above, such as the spreadsheet identifying the phase 1 entities for identities, i.e., the subfields that together name an entity in each MARC field (see Sec. IV.4.3, Revisions to OCLC handling of HTTP URIs), and the draft document “Formulating and Obtaining URIs: A Guide to Commonly Used Vocabularies and Reference Sources.” The TG hopes that this document could be used as a starting point for developing an official list of PCC-sanctioned initial source vocabularies for embedding URIs.

For the sake of consistency, expediency, and accuracy, it is advisable to use automated processes for populating MARC records with URIs. Having individual catalogers do this work manually is not a desirable practice and could be less efficient. Several possible ways to accomplish this goal have been outlined in this report (see Secs. IV.2.4.1, MarcEdit; IV.2.4.2, Authority Toolkit; and IV.4.3, Revisions to OCLC handling of HTTP URIs).

Outreach, advocacy, and training will be a core goal of phase 2. The TG is planning to work closely with stakeholders, such as other PCC committees, to influence cataloging policies and best practices that have been identified as problematic for the implementation of URIs in MARC.

Training needs related to implementation (for example, how to obtain URIs, or the difference between authorities and real world objects) will be communicated to the PCC Standing Committee on Training so that appropriate training can be either identified or developed.

Though MARC is the most prominently used schema for library metadata, it is frequently used alongside many others that may or may not allow for the inclusion of URIs. In addition to that concern, there are the maintenance of identifiers, recommendations in relation to reconciliation, and possible ILS functional requirements. The TG on URIs in MARC recommends that new TGs be formed to address URIs in non-MARC metadata.

VI. REFERENCES

1. The subgroup Work IDs in MARC has identified potential fields and scenarios to accommodate a work identifier (or multiple work identifiers). Consideration has been given to legacy data; to whether a work identifier (ID) is already established in an authority format or not (7XX $t, 1XX/240); to an unambiguous relationship of a work ID among various vocabularies (024); and to relationships among variants of a work, etc. The subgroup will present recommendations to the community in 2017.

Links: Meetings of the MARC Advisory Committee, Agendas and Minutes

2015-06 MAC meeting:
http://www.loc.gov/marc/mac/an2015_age.html
http://www.loc.gov/marc/mac/minutes/an-15.html

2016-01 MAC meeting:
http://www.loc.gov/marc/mac/mw2016_age.html
http://www.loc.gov/marc/mac/minutes/mw-16.html

2016-06 MAC meeting:
http://www.loc.gov/marc/mac/an2016_age.html
http://www.loc.gov/marc/mac/minutes/an-16.html

Papers

Informal discussion paper: URIs in MARC: A Call for Best Practices (Steven Folsom, Discovery Metadata Librarian, Cornell University). https://docs.google.com/document/d/1fuHvF8bXH7hldY_xJ7f_xn2rP2Dj8o-Ca9jhHghIeUg/edit?pli=1

Discussion Paper No. 2016-DP04: Extending the Use of Subfield $0 to Encompass Linking Fields in the MARC 21 Bibliographic Format (British Library). http://www.loc.gov/marc/mac/2016/2016-dp04.html

Discussion Paper No. 2016-DP05: Expanding the Definition of Subfield $w to Encompass Standard Numbers in the MARC 21 Bibliographic and Authority Formats (British Library). http://www.loc.gov/marc/mac/2016/2016-dp05.html

Discussion Paper No. 2016-DP17: Redefining Subfield $4 to Encompass URIs for Relationships in the MARC 21 Authority and Bibliographic Formats (British Library, in consultation with the PCC Task Group on URIs in MARC). http://www.loc.gov/marc/mac/2016/2016-dp17.html

Discussion Paper No. 2016-DP18: Redefining Subfield $0 to Remove the Use of Parenthetical Prefix (uri) in the MARC 21 Authority, Bibliographic, and Holdings Formats (PCC Task Group on URIs in MARC, in consultation with the British Library). http://www.loc.gov/marc/mac/2016/2016-dp18.html

Discussion Paper No. 2016-DP19: Adding Subfield $0 to Fields 257 and 377 in the MARC 21 Bibliographic Format and Field 377 in the MARC 21 Authority Format (PCC URI in MARC Task Group). http://www.loc.gov/marc/mac/2016/2016-dp19.html

MARC Format Overview, Status Information: http://www.loc.gov/marc/status.html

Examples for Sec. IV.2.1

This LC subject heading string is linked to three different authority records. The links are OCLC’s ARNs. No single $0 could be output for this subject access point.

650 0 ǂa Neurologists<Link2068890> ǂz New Zealand<Link255121> ǂv Biography<Link4933801>

This medical subject string is linked to one authority record, although the controlling process links individual subfields. It is a candidate for output of a single $0 with a URI because the links all refer to the single authority record. In the case of MeSH, unlike LCSH, the $0 subfield displays in Connexion. See OCLC record 957132118.

650 12 ǂa Neurology<Link(DNLM)D009462Q000266> ǂx history<Link(DNLM)D009462Q000266>

Displays as:

650 12 Neurology ǂx history ǂ0 (DNLM)D009462Q000266

So it could be output with a single $0 containing the corresponding URI for the MeSH heading.
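The rule illustrated by the two examples, that a single $0 is a candidate only when every controlled subfield links to the same authority record, can be sketched in Python as follows. The data layout (tuples of subfield code, text, and link) and the function name are assumptions made for illustration; they do not describe how OCLC stores or processes its internal links.

```python
from typing import List, Optional, Tuple

# Each controlled subfield is modeled as (subfield code, text, link identifier).
ControlledSubfield = Tuple[str, str, str]


def single_link(subfields: List[ControlledSubfield]) -> Optional[str]:
    """Return the shared link if all subfields point to one record, else None."""
    links = {link for _code, _text, link in subfields if link}
    return links.pop() if len(links) == 1 else None


# The two access points from the examples above.
lcsh_heading = [
    ("a", "Neurologists", "2068890"),
    ("z", "New Zealand", "255121"),
    ("v", "Biography", "4933801"),
]
mesh_heading = [
    ("a", "Neurology", "(DNLM)D009462Q000266"),
    ("x", "history", "(DNLM)D009462Q000266"),
]
```

Here `single_link(lcsh_heading)` yields no result because three different records are involved, while `single_link(mesh_heading)` yields the one MeSH identifier from which a URI could then be formulated.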

Across Diverse vocabularies

Current development on the tool will continue to focus on the inclusion and support of additional vocabularies, on continuing to work with linked data providers around scalability issues (and ways in which MarcEdit [or services like it] can reduce impacts on their services), and on working to profile this service to work with other flavors of MARC, like UNIMARC, to encourage further experimentation.

IV.2.4.2. Authority Toolkit [Charge 3]

The Authority Toolkit is a program for the construction and modification of authority records. One version is designed for use within OCLC’s Connexion program for records in the LC/NACO authority file, but another version can work with records in files, and so with records from other sources. Both versions of the toolkit have the same capabilities. At an early stage, the toolkit acquired the ability to test terms used in authority fields such as the 370 and 372 against vocabularies available at id.loc.gov (at present LCMPT, LCSH, LCDGT, AFSET, geographic area codes, RDA content terms, and the LC/NACO Authority File). Somewhat later, it added the ability to verify terms against the MeSH vocabulary. (Additional vocabularies may be added in the future based on user requests.) To perform this verification, the program needs to know which vocabularies are used to control terms in which parts of which authority fields, how to query the source to determine whether or not the term is defined, and how to react to the information returned by the source. The toolkit’s actions are controlled above all by the subfield $2 code appearing in the same field as the term, but in the absence of a subfield $2 code, operator preferences come into play as well. (For example, an operator may prefer that an unlabeled term be tested against MeSH first and, if not found, tested against LCSH, or perhaps tested only against LCDGT.) A detailed description of the toolkit’s process for verifying the content of authority fields can be found in the program’s documentation at httpfileslibrarynorthwesternedupublicoclcdocumentationverifymenu
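The selection logic described above (a subfield $2 code takes precedence, and operator preferences apply only when no code is present) might be sketched in Python as below. This is not the toolkit’s code: the function names are hypothetical, and `lookup` stands in for whatever query a real program would send to a vocabulary service.

```python
from typing import Callable, List, Optional, Sequence, Tuple

# A lookup takes (vocabulary code, term) and returns a URI, or None if not found.
Lookup = Callable[[str, str], Optional[str]]


def vocabularies_to_try(subfield_2: Optional[str],
                        operator_preferences: Sequence[str]) -> List[str]:
    """A $2 code, when present, decides the vocabulary; otherwise use preferences."""
    if subfield_2:
        return [subfield_2]
    return list(operator_preferences)


def verify_term(term: str,
                subfield_2: Optional[str],
                operator_preferences: Sequence[str],
                lookup: Lookup) -> Optional[Tuple[str, str]]:
    """Return (vocabulary, URI) for the first vocabulary that defines the term."""
    for vocabulary in vocabularies_to_try(subfield_2, operator_preferences):
        uri = lookup(vocabulary, term)
        if uri is not None:
            return vocabulary, uri
    return None
```

An operator preference of MeSH first and LCSH second would be passed as `["mesh", "lcsh"]`; a term carrying its own $2 code is never tested against the preference list.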

If the toolkit’s search for an entire term is successful, the toolkit can easily supply the corresponding URI and add it to the authority record in subfield $0. This URI may be contained in the data provided by the source, or it can be constructed mechanically once the toolkit has extracted the appropriate identifier. As part of the experimentation encouraged by the TG, on January 15, 2016, the toolkit acquired an option to add subfield $0 to fields that could be verified. (This option is described at httpfileslibrarynorthwesternedupublicoclcdocumentationoptionsverification0) If a field contains more than one term, the toolkit must divide the field into multiple fields (one for each term) before it can add subfield $0.

The following illustration shows an authority record as verified by the Authority Toolkit, with the option to add subfield $0 during verification turned on. (For this experiment, subfield $0 was locally defined for some fields.)

Although the toolkit can often discover information about compound terms (such as some corporate bodies with subordinate units and some LCSH headings) for which an authority record exists for some parts but not all, the toolkit cannot supply subfield $0 for them. (There is no authority record, and so no URI, that represents the entire term.) The toolkit also cannot add subfield $0 to fields that contain multiple terms if the field contains an aggregation of terms rather than a collection of independent items. (Example: the toolkit cannot add subfield $0 to the 382 field.)

The task of discovering that a term given in an authority record is defined in an external vocabulary is made more difficult because the searching mechanisms available do not always compensate appropriately for operator variations in punctuation, capitalization, and the use of combining diacritics. In addition, the response time experienced by the toolkit can vary widely, even for the same term searched repeatedly within a brief time, and some services are unavailable over the weekend. If the potential of linked data is to be enjoyed, services providing data must ensure that their entry mechanisms are robust, flexible, and available at all times.
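The kind of compensation a search mechanism could apply for the three variations named above can be sketched with Python’s standard library. The function name is hypothetical, and the rules shown are a deliberately simple assumption; they are not the NACO normalization rules and not what any particular service implements.

```python
import unicodedata


def match_key(term: str) -> str:
    """Reduce a term to a comparison key that ignores diacritics, punctuation, and case."""
    # Decompose characters so that accents become separate combining marks.
    decomposed = unicodedata.normalize("NFD", term)
    # Drop the combining marks, leaving the base letters.
    without_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Replace punctuation with spaces, then fold case and collapse whitespace.
    spaced = "".join(ch if ch.isalnum() or ch.isspace() else " " for ch in without_marks)
    return " ".join(spaced.casefold().split())
```

Two forms of a heading that differ only in these respects, such as "Dvořák, Antonín" and "DVORAK ANTONIN", reduce to the same key and can then be compared directly. A key like this is lossy, so it suits candidate matching rather than final identification.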

IV.2.4.3. Lookup online (e.g., VIAF, Getty ULAN, Geonames, Wikidata)

Online lookup requires manual operation. Users must be well versed in the SPARQL queries that individual services provide. Getty ULAN works differently from Geonames and Wikidata. The URI returned from a query may not be an RDF URI but one that lands the user on a Web page or document.

IV.3. What did we learn? [Charge 1.3, Charge 3]

IV.3.1. Tackle low-hanging fruit: what can we do in 1 year?

The TG’s activities during Year 1 were designed to position the MARC community to take tangible steps toward incorporating linked data URIs into its processes within an achievable timeframe. Therefore, the TG put aside some tasks, such as the overhaul of certain legacy MARC data elements, that would have delayed progress on the TG’s practical objectives. The tool development undertaken by Terry Reese and Gary Strawn was designed to advance these objectives, but so were the Formulating URIs document and the MARC object/URI reconciliation work, both of which document information that will be needed by other stakeholders, and the Work IDs in MARC proposal, which seeks to remove one of the main barriers to the routine incorporation of work identifiers in MARC records.

IV.3.2. Add $0 where it’s not defined (not simple)

One of the TG’s goals was also to identify fields that currently do not have $0 defined and to add it to them. The TG found that the following MARC fields needed $0 defined:

bibliographic: 046, 257, 260/264, 375, 753
authority: 046, 360, 375, 377, 663, 680, 681

These fields do not offer an easy resolution when considering $0, which reflects the resource object for the entity described. The TG conducted thorough analyses and concluded that only 257 and 377 could contain a URI with an unambiguous correspondence between the field and the object it represents, leaving out more complicated cases, e.g., fields 264 (Production, Publication, Distribution, Manufacture, and Copyright Notice) and 382 (Medium of Performance).

One of the issues confronted in drafting Discussion Paper No. 2016-DP19 was the extent of the effort needed to individually propose subfield $0 for MARC 21 fields that do not contain it. MAC accepted the paper as a proposal, and there was agreement “that similar changes such as those recommended [in] this paper might in the future be considered as part of a MARC Fast-Track process.” Being able to fast-track proposals for defining subfield $0 in fields that do not contain it will considerably streamline the process in the future.

IV33 Strategies in lieu of limited life cycle of MARC13 environment

Though many may see MARC is ldquodeadrdquo the system remains a viable tool that delivers metadata13 for data13 discovery It is also however a legacy format that reflects in13 its somewhat baroque structure a long history of accretion13 to13 meet varied and changing needs In pursuing its goals the TG has adopted a strategy of pursuing changes13 that can be applied coherently across13 MARC and maximize return on the library communityrsquos investment of effort13 There are economical13 and sensible approaches in13 determining what to do The TG always kept in mind of recommendations must cause the least disruption for data transition from MARC to linked data There is unlikely to have a wholesale possibility of13 inserting HTTP URI though possibly most but not all of MARC13 fields andor subfields

The TG is committed to work through a list of tasks and identify viable solutions While $0 after one yearrsquos deliberation seemed a straightforward solution for URI representing13 resource object more discussions needed13 with regards to predicate that denotes relationship13 MARC data have not been consistent in expressing relationship Combination field indicators and subfields13 raises13 complexity13 for the process

IV34 ILS13 analysis results

Some13 ILSs would not load the processed records because of13 the presence of13 $0 Others loaded but did nothing with13 the data

The TG members mocked up files of bibliographic and authority data adding various URIs in subfield $0 wherever subfield $0 is currently defined in MARC These files were uploaded13 into13 a number of ILS systems13 to see if the addition of subfield $0 with URIs13 caused problems No significant problems were found These files included URIs in subfield $0 which were not prefixed with the (uri) identifier

In OCLC the same $0 subfields were also not problematic OCLCrsquos validation of subfield $0 does not check13 the structure of subfield $0 in the same way13 as13 it does13 for control numbers13 in 760-shy‐78713 subfield $w or URLs in13 $u13 subfields Use of URIs in subfield $4 to express relationship information would require a change to OCLCrsquos13 validation of $4 subfields but that may13 be readily13 changed without extensive effort

IV35 Tools needed MarcNext Authority Toolkit

Currently the TG has tested13 and13 continued13 to13 work with MarcNext and Authority Toolkit The TG members continues collecting and recording additional tools and resources that facilitate practitioners in identifying and validating an RDF13 URI

Page13 16 of 22

IV36 Need to be13 able13 to easily report duplicates found in VIAF etc and need away to know13 which URI to use when duplicates are found

Throughout the first year of investigation and deliberation the TG learned though vocabularies and ontologies are structured13 per standards and13 published13 for adoption13 some are more domain13 specific than others Often there are more than one methods to structure a body of13 data Duplications can be expected across various datasets The13 reconciliation of URI is one13 of the13 tasks that the13 TG has recognized yet not in a position to recommend solution in the13 near term

IV4 Outcomes

IV41 MAC Discussion Papers [Charge 4]

The TG was aware that some aspects of its13 intended goals13 were not yet accommodated by the MARC format Following the defined workflows of13 MARC governance and standardization the TG submitted several discussion papers13 to the MARC Advisory Committee (MAC) As13 an initial preparation an informal discussion13 paper entitled13 URIs in MARC A Call for Best Practices by Steven13 Folsom had13 been13 discussed13 during the June 2015 MAC13 meeting It focused13 o subfield13 $0 Authority record13 control number or standard13 number its current usage its capability for URIs and13 addressed13 some aspects of best practice The paper generated13 extensive discussion and13 there was broad13 agreement that the13 time13 was right for the library community to begin using URIs consistently Steven Folsom was asked to cooperate with the PCC to develop a formal MAC Discussion Paper

In fall13 2015 the British Library (BL) submitted two papers to MAC for the13 January 2016 meeting independently of the TG covering title to title relationships via subfield $w and specific relationship information then discussed using subfield $013 The approaches taken by the BL in its papers coupled with the approach taken by13 the TG resulted in MAC suggesting13 that the British Library13 and the PCC should collaborate on submitting a paper for June 2016

During the MAC meetings at the ALA Annual Conference in Orlando in June 2016 three papers were presented13 by or in13 cooperation with the TG Discussion Paper No 2016b DP18 entitled Redefining Subfield $ to Remove13 the13 Use13 of Parenthetical Prefix (uri) in the13 MARC 2 Authority Bibliographic and Holdings Formats described the syntactical improvement that a subfield $0 containing a URI13 without the parenthetical prefix (uri) would allow so that13 automated processes could use the content13 of13 these$0s without having to strip away prefix The13 discussion paper was discussed at the13 MAC meeting and the recommendation was made that the13 discussion paper be13 upgraded to proposal status it was approved at the13 meeting as proposal From now on a $0 containing an identifier in the13 form of a web retrieval protocol eg HTTP URI should not13 be given a parenthetical prefix

second13 paper was presented to the13 MAC Discussion Paper No 2016b DP19 Adding Subfield $0 to Fields 257 and 377 in the13 MARC 2 Bibliographic Format and Field 377 in the13 MARC 21 Authority Format It resulted from extensive13 analyses of the13 MARC Bibliographic and Authority formats by the TG selecting fields13 which are to be controlled by an identifier Only those fields13 where an identifier can be applied with clear correspondence13 between the13 field and one13 entity were13 included in the13 paper The13 discussion13 paper was discussed at the MAC meeting and the recommendation was13 made that the discussion13 paper be upgraded13 to13 proposal status it was also13 approved13 at the meeting as a proposal Both13 changes will be included13 into13 the update 23 to13 the MARC13 documentation to13 be expected in fall 2016

Page13 17 of 22

The third paper Discussion Paper No 2016-shy‐DP17 Redefining Subfield $4 to Encompass URIs for Relationships in13 the MARC13 21 Authority and13 Bibliographic Formats was presented13 by the British13 Library in consultation with the TG13 This paper generated vivid discussions It was acknowledged that the13 approach to recording URIs for relationships using subfield $ was preferable13 to any of the13 other alternatives outlined by the13 paper The13 distinction between relator codes and relationship codes in the MARC format was questioned As of now an across-shy‐the-shy‐board13 solution13 for recording URIs for any data element in MARC subfield or field seems to be13 preferred by NDMSO over what it regards as an ad hoc solution for single elements This13 discussion will be continued13 this paper should not be considered in isolation but rather in the context of the other13 papers which the TG13 in is the process of submitting Taken as whole it is hoped that they will achieve the comprehensive solution which is sought throughout the MARC13 formats

IV42 Formulating amp Obtaining URI document [Charge 32]

A draft document was for commonly used13 sources for authorities and13 identifiers For each source screen captures13 were made showing where a URI could be found for a particular entity or how to formulate a URI once the identifier13 for13 the entity is known Before making this document available widely it must be determined how13 best to organize it Some13 resources provide13 URIs that directly represent13 a thing and others provide URIs that reference an authority (eg13 controlled or standard vocabularies which may or may not have underlying metadata about the thing) or a resource describing a thing The document needs to be able to distinguish this and inform catalogers which URIs are for real world objects and which are not In order to be helpful to developers building tools the document13 intends to also include descriptions of how data sources provide machine access to the data13 Is the data published13 as Linked13 Data available through13 http available through13 a SPARQL endpoint data dumps etc13 Another issue that13 must13 be determined is where to put13 the final document and how it13 will be maintained Should it be13 cooperatively maintained by the13 community (such as on a wiki) or should some group within PCC take responsibility for keeping it up to date and adding to it

IV43 Revisions to13 OCLC13 handling13 of HTTP URIs [Charge 31]

The question arises as to whether it would be better for catalogers to enter all needed URIs directly into the shared bibliographic record in WorldCat or whether OCLC13 should13 provide options for output of URIs based13 o data present in13 particular MARC13 fields and13 profiled13 library preferences Clearly some libraries will embrace use of URIs for their webb based13 catalogs while others may find13 them problematic in local13 displays of bibliographic information OCLC staff have looked into the issue and believe that the use of output options would13 likely produce more consistent results as well as meet the varying needs of libraries

The TG members are drafting a spreadsheet outlining the subfields that together name an entity for which a corresponding URI could be added in subfield $0. That spreadsheet will be useful as the basis for future specifications for use by OCLC system developers. It will allow for a comparison of what is desired by the PCC cataloging community in terms of URIs corresponding to the entire named entity versus the existing use of subfield $0 and subfield-$0-like information used in OCLC heading controlling functionality. That heading control functionality allows for control numbers in multiple $0 subfields corresponding to different parts of a named entity, i.e., corporate name hierarchies, names and titles, subjects and separately controlled subdivisions, etc. These are cases where output of multiple URIs corresponding only to part of the named entity would not be preferred.

OCLC cataloging policies in this area are expected to evolve as this TG makes recommendations and OCLC development work moves ahead on the proposed output options for URIs.

IV.5. Next steps and in-depth analyses in year 2 [Charge 3, Charge 4]

In 2016-2017 the TG will continue an agenda focused on practical outcomes. Work is already well advanced on several of the following items:

IV.5.1. In collaboration with OCLC, develop a specification for outputting URIs based on internal linkages present in WorldCat data.
IV.5.2. Complete the MARC object/URI reconciliation document and seek to incorporate the information into formal MARC documentation.
IV.5.3. Produce work ID recommendation and use it in pilot implementation.
IV.5.4. Produce discussion paper or proposal for handling relationships in MARC.
IV.5.5. Consider additional targeted reconciliation projects.
IV.5.6. In consultation with stakeholders, evaluate need for additional MARC proposals or best practices.
IV.5.7. RWO recommendations.
IV.5.8. Identify "homes" in PCC or elsewhere for aspects of the TG's work that will need further exploration or continuing upkeep.
IV.5.9. Outreach, advocacy, training.
IV.5.10. Etc.

V. RECOMMENDATIONS TO STAKEHOLDERS

During its first year, the TG was very much focused on the needs and interests of the many different stakeholders. This is reflected both in the outcomes of the work completed so far (see Sec. IV.4, Outcomes) and in the plans laid out for year 2 (see Sec. IV.5, Next steps and in-depth analyses in year 2). After careful consideration, the TG proposes the implementation of URIs in MARC for the near term. The sooner this process can begin, the sooner the data providers, e.g., libraries, can produce data that can be more easily transformed into linked data. In order to facilitate progress towards this goal, the TG developed the recommendations already outlined in the report above, such as the spreadsheet identifying the phase 1 entities for identities, i.e., the subfields that together name an entity in each MARC field (see Sec. IV.4.3, Revisions to OCLC handling of HTTP URIs), and the draft document Formulating and Obtaining URIs: A Guide to Commonly Used Vocabularies and Reference Sources. The TG hopes that this document could be used as a starting point to develop an official list of PCC-sanctioned initial source vocabularies for embedding URIs.

For the sake of consistency, expediency, and accuracy, it is advisable to use automated processes for populating MARC records with URIs. Individual catalogers doing this work manually is not a desirable practice and could be less efficient. Several possible ways to accomplish this goal have been outlined in this report (see Secs. IV.2.4.1 MarcEdit, IV.2.4.2 Authority Toolkit, and IV.4.3 Revisions to OCLC handling of HTTP URIs).

Outreach, advocacy, and training will be a core goal of phase 2. The TG is planning on working closely with stakeholders, such as other PCC committees, to influence cataloging policies and best practices that have been identified as problematic for the implementation of URIs in MARC.

Training needs related to implementation (for example, how to obtain URIs, or the difference between authorities and real world objects) will be communicated to the PCC Standing Committee on Training so that appropriate training can be either identified or developed.

Though MARC is the most prominently used schema for library metadata, it is frequently used alongside many others that may or may not allow for the inclusion of URIs. Further concerns are the maintenance of identifiers, recommendations in relation to reconciliation, and possible ILS functional requirements. The TG on URIs in MARC is recommending that new TGs be formed concerning URIs for non-MARC metadata.

VI. REFERENCES

1. The subgroup Work IDs in MARC has identified potential fields and scenarios to accommodate a work identifier (or multiple work identifiers). Consideration has been given to legacy data, to whether a work identifier (ID) is already established in an authority format or not (7XX $t, 1XX/240), to an unambiguous relationship of a work ID among various vocabularies (024), to relationships among variants of a work, etc. The subgroup will present recommendations to the community in 2017.

Links: Meetings of the MARC Advisory Committee, Agendas and Minutes

2015-06 MAC meeting:
http://www.loc.gov/marc/mac/an2015_age.html
http://www.loc.gov/marc/mac/minutes/an-15.html

2016-01 MAC meeting:
http://www.loc.gov/marc/mac/mw2016_age.html
http://www.loc.gov/marc/mac/minutes/mw-16.html

2016-06 MAC meeting:
http://www.loc.gov/marc/mac/an2016_age.html
http://www.loc.gov/marc/mac/minutes/an-16.html

Papers

Informal discussion paper: URIs in MARC: A Call for Best Practices (Steven Folsom, Discovery Metadata Librarian, Cornell University). https://docs.google.com/document/d/1fuHvF8bXH7hldY_xJ7f_xn2rP2Dj8o-Ca9jhHghIeUg/edit?pli=1

Discussion Paper No. 2016-DP04: Extending the Use of Subfield $0 to Encompass Linking Fields in the MARC 21 Bibliographic Format (British Library). http://www.loc.gov/marc/mac/2016/2016-dp04.html

Discussion Paper No. 2016-DP05: Expanding the Definition of Subfield $w to Encompass Standard Numbers in the MARC 21 Bibliographic and Authority Formats (British Library). http://www.loc.gov/marc/mac/2016/2016-dp05.html

Discussion Paper No. 2016-DP17: Redefining Subfield $4 to Encompass URIs for Relationships in the MARC 21 Authority and Bibliographic Formats (British Library, in consultation with the PCC Task Group on URIs in MARC). http://www.loc.gov/marc/mac/2016/2016-dp17.html

Discussion Paper No. 2016-DP18: Redefining Subfield $0 to Remove the Use of Parenthetical Prefix "(uri)" in the MARC 21 Authority, Bibliographic, and Holdings Formats (PCC Task Group on URI in MARC, in consultation with the British Library). http://www.loc.gov/marc/mac/2016/2016-dp18.html

Discussion Paper No. 2016-DP19: Adding Subfield $0 to Fields 257 and 377 in the MARC 21 Bibliographic Format and Field 377 in the MARC 21 Authority Format (PCC URI in MARC Task Group). http://www.loc.gov/marc/mac/2016/2016-dp19.html

MARC Format Overview, Status Information: http://www.loc.gov/marc/status.html

Examples for Sec. IV.2.1

This LC subject heading string is linked to three different authority records. The links are OCLC's ARNs. No single $0 could be output for this subject access point.

650 0 ǂa Neurologists<Link2068890> ǂz New Zealand<Link255121> ǂv Biography<Link4933801>

This medical subject string is linked to one authority record, although the controlling process links individual subfields. It is a candidate for output of a single $0 with a URI because the links all refer to the single authority record. In the case of MeSH, unlike LCSH, the $0 subfield displays in Connexion. See OCLC record 957132118.

650 12 ǂa Neurology<Link(DNLM)D009462Q000266> ǂx history<Link(DNLM)D009462Q000266>

Displays as:

650 12 Neurology ǂx history ǂ0 (DNLM)D009462Q000266

So it could be output with a single $0 containing the corresponding URI for the MeSH heading.
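The rule illustrated by these two examples can be sketched in Python. This is only an illustration under stated assumptions, not OCLC's implementation: the function names `single_link` and `mesh_uri` are hypothetical, each controlled subfield is modelled as a (code, value, link) tuple, and the base `http://id.nlm.nih.gov/mesh/` is an assumed URI pattern for MeSH that does not come from the report.

```python
from typing import List, Optional, Tuple

# Assumed base URI for MeSH identifiers; not taken from the report.
MESH_BASE = "http://id.nlm.nih.gov/mesh/"

# A controlled subfield is modelled as (subfield code, text, link identifier).
Subfield = Tuple[str, str, str]


def single_link(subfields: List[Subfield]) -> Optional[str]:
    """Return the shared link if every subfield points at the same
    authority record; return None when the links differ or are absent."""
    links = {link for _code, _text, link in subfields}
    return links.pop() if len(links) == 1 else None


def mesh_uri(link: str) -> str:
    """Build a URI from a '(DNLM)'-prefixed identifier (assumed pattern)."""
    prefix = "(DNLM)"
    if not link.startswith(prefix):
        raise ValueError(f"not a DNLM identifier: {link!r}")
    return MESH_BASE + link[len(prefix):]


# The two 650 fields from the examples above.
lcsh_field = [("a", "Neurologists", "2068890"),
              ("z", "New Zealand", "255121"),
              ("v", "Biography", "4933801")]
mesh_field = [("a", "Neurology", "(DNLM)D009462Q000266"),
              ("x", "history", "(DNLM)D009462Q000266")]

print(single_link(lcsh_field))            # three different links, so no single $0
print(mesh_uri(single_link(mesh_field)))  # one shared link, so one URI
```

Under these assumptions the LCSH string yields no single link, while the MeSH string yields one identifier from which a single $0 value could be formed.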


(Additional vocabularies may be added in the future based on user requests.) To perform this verification, the program needs to know which vocabularies are used to control terms in which parts of which authority fields, how to query the source to determine whether or not a term is defined, and how to react to the information returned by the source. The toolkit's actions are controlled above all by the subfield $2 code appearing in the same field as the term, but in the absence of a subfield $2 code, operator preferences come into play as well. (For example, an operator may prefer that an unlabeled term be tested against MeSH first and, if not found, tested against LCSH; or perhaps tested only against LCDGT.) A detailed description of the toolkit's process for verifying the content of authority fields can be found in the program's documentation at httpfileslibrarynorthwesternedupublicoclcdocumentationverifymenu
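The selection order described above (subfield $2 first, operator preferences otherwise) might be sketched as follows. This is a hypothetical illustration, not the Authority Toolkit's code: the names `vocabularies_to_try` and `verify_term`, the vocabulary codes, and the in-memory `SAMPLE_TERMS` table standing in for a real query against a source are all assumptions.

```python
from typing import Dict, List, Optional, Set

# Stand-in for querying an external source; the contents are made up.
SAMPLE_TERMS: Dict[str, Set[str]] = {
    "mesh": {"Neurology"},
    "lcsh": {"Neurologists", "Neurology"},
    "lcdgt": {"Neurologists"},
}


def vocabularies_to_try(subfield_2: Optional[str],
                        operator_preferences: List[str]) -> List[str]:
    """A $2 code, when present, decides the vocabulary outright;
    otherwise fall back to the operator's ordered preferences."""
    if subfield_2:
        return [subfield_2]
    return list(operator_preferences)


def verify_term(term: str, subfield_2: Optional[str],
                operator_preferences: List[str]) -> Optional[str]:
    """Return the first vocabulary in which the term is defined, or None."""
    for vocabulary in vocabularies_to_try(subfield_2, operator_preferences):
        if term in SAMPLE_TERMS.get(vocabulary, set()):
            return vocabulary
    return None


# Unlabeled term: test MeSH first, then LCSH.
print(verify_term("Neurologists", None, ["mesh", "lcsh"]))
# Labeled term: only the vocabulary named in $2 is consulted.
print(verify_term("Neurologists", "mesh", ["lcsh"]))
```

The point of the design is that operator preferences are consulted only when the record itself does not say which vocabulary controls the term.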

If the toolkit's search for an entire term is successful, the toolkit could easily supply the corresponding URI and add it to the authority record in subfield $0. This URI may be contained in the data provided by the source, or it could be constructed mechanically once the toolkit has extracted the appropriate identifier. As part of experimentation encouraged by the TG, on January 15, 2016 the toolkit acquired an option to add subfield $0 to fields which could be verified. (This option is described at httpfileslibrarynorthwesternedupublicoclcdocumentationoptionsverification0) If a field contains more than one term, the toolkit must divide the field into multiple fields (one for each term) before it can add subfield $0.
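Constructing a URI "mechanically" from an extracted identifier could look roughly like the following Python sketch. It is not the toolkit's code: the function names are hypothetical, a field is modelled as a list of (subfield code, value) pairs, and the two id.loc.gov templates keyed by source code are assumptions chosen for illustration.

```python
from typing import Dict, List, Optional, Tuple

# Assumed URI templates keyed by vocabulary source code.
URI_TEMPLATES: Dict[str, str] = {
    "naf": "http://id.loc.gov/authorities/names/{id}",
    "lcsh": "http://id.loc.gov/authorities/subjects/{id}",
}

Field = List[Tuple[str, str]]  # (subfield code, value) pairs


def build_uri(vocabulary: str, identifier: str) -> Optional[str]:
    """Fill the vocabulary's template with the identifier, with internal
    spaces removed; return None when no template is known."""
    template = URI_TEMPLATES.get(vocabulary)
    if template is None:
        return None
    return template.format(id=identifier.replace(" ", ""))


def add_subfield_0(field: Field, uri: str) -> Field:
    """Return a copy of the field with a subfield $0 appended."""
    return field + [("0", uri)]


field = [("a", "Neurologists")]
uri = build_uri("lcsh", "sh 85091150")  # identifier shown for illustration only
if uri is not None:
    field = add_subfield_0(field, uri)
print(field)
```

A field holding several terms would first have to be split into one field per term, as the paragraph above notes, before a function like `add_subfield_0` could be applied to each.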

The following illustration shows an authority record as verified by the authority toolkit, with the option to add subfield $0 during verification turned on. (For this experiment, subfield $0 was locally defined for some fields.)

Although the toolkit can often discover information about compound terms (such as some corporate bodies with subordinate units, and some LCSH headings) for which an authority record exists for some parts but not all, the toolkit cannot supply subfield $0. (There is no authority record, and so no URI, that represents the entire term.) The toolkit also cannot add subfield $0 to fields that contain multiple terms if the field contains an aggregation of terms rather than a collection of independent items. (Example: the toolkit cannot add subfield $0 to the 382 field.)

The task of discovering that a term given in an authority record is defined in an external vocabulary is made more difficult because the searching mechanisms available do not always compensate appropriately for operator variations in punctuation, capitalization, and the use of combining diacritics. In addition, the response time experienced by the toolkit can vary widely, even for the same term searched repeatedly within a brief time, and some services are unavailable over the weekend. If the potential of linked data is to be enjoyed, services providing data must ensure that their entry mechanisms are robust, flexible, and available at all times.
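One way a client might reduce the effect of these variations before comparing terms is sketched below in Python. This is an assumption-laden illustration, not a description of any service or of the toolkit: the function name `normalize_term` is hypothetical, and the rules (drop combining marks, treat punctuation as spaces, fold case) are one reasonable choice among several.

```python
import unicodedata


def normalize_term(term: str) -> str:
    """Reduce a term to a comparison key: decompose to NFD, drop
    combining marks, turn punctuation into spaces, fold case, and
    collapse runs of whitespace."""
    kept = []
    for ch in unicodedata.normalize("NFD", term):
        category = unicodedata.category(ch)
        if category.startswith("M"):   # combining diacritics
            continue
        if category.startswith("P"):   # punctuation
            kept.append(" ")
            continue
        kept.append(ch)
    return " ".join("".join(kept).casefold().split())


# Precomposed versus combining diacritic, with different punctuation and case.
a = normalize_term("Brontë, Charlotte")
b = normalize_term("Bronte\u0308  charlotte")
print(a == b)
```

Such a key is only useful for matching; the form of the term returned by the source, not the normalized key, would still be what goes into the record.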

IV.2.4.3. Lookup online (e.g., VIAF, Getty ULAN, Geonames, Wikidata)

Online lookup requires manual operation. Users must be well versed in the SPARQL queries that individual services provide. Getty ULAN works differently from Geonames and Wikidata. The URI returned from a query may not be an RDF URI, but one that lands the user on a Web page or document.

IV.3. What did we learn? [Charge 1.3, Charge 3]

IV.3.1. Tackle low hanging fruit: what can we do in 1 year?

The TG's activities during Year 1 were designed to position the MARC community to take tangible steps toward incorporating linked data URIs into its processes within an achievable timeframe. Therefore, the TG put aside some tasks, such as overhaul of certain legacy MARC data elements, that would have delayed progress with the TG's practical objectives. The tool development undertaken by Terry Reese and Gary Strawn was designed to advance these objectives, but so were the Formulating URIs document and the MARC object/URI reconciliation work, both of which document information that will be needed by other stakeholders, and the work IDs in MARC proposal, which seeks to remove one of the main barriers to routine incorporation of work identifiers in MARC records.

IV.3.2. Add $0 where it's not defined (not simple)

One of the TG's goals was also to identify and add $0 to fields that currently do not have one defined. The TG found the following MARC fields that needed $0 defined:

Bibliographic: 046, 257, 260/264, 375, 753
Authority: 046, 360, 375, 377, 663, 680, 681

These fields do not lend themselves to an easy resolution when considering $0, which reflects the resource object for an entity described. The TG conducted thorough analyses and concluded that only 257 and 377 could contain a URI with an unambiguous correspondence between the field and the object it represents, leaving out more complicated cases, e.g., fields 264 (Production, Publication, Distribution, Manufacture, and Copyright Notice) and 382 (Medium of Performance).

One of the issues confronted in drafting discussion paper 2016-DP19 was the extent of effort needed to individually propose subfield $0 for MARC 21 fields that do not contain it. MAC accepted the paper as a proposal, and there was agreement "that similar changes, such as those recommended by this paper, might in the future be considered as part of a MARC Fast-Track process". Being able to fast-track proposals for defining subfield $0 in fields which do not contain it will considerably streamline the process in the future.

IV.3.3. Strategies in light of limited life cycle of MARC environment

Though many may see MARC as "dead", the system remains a viable tool that delivers metadata for data discovery. It is also, however, a legacy format that reflects in its somewhat baroque structure a long history of accretion to meet varied and changing needs. In pursuing its goals, the TG has adopted a strategy of pursuing changes that can be applied coherently across MARC and maximize return on the library community's investment of effort. There are economical and sensible approaches in determining what to do. The TG always kept in mind that its recommendations must cause the least disruption for data transition from MARC to linked data. A wholesale insertion of HTTP URIs is unlikely, though insertion may be possible in most, but not all, MARC fields and/or subfields.

The TG is committed to working through a list of tasks and identifying viable solutions. While $0, after one year's deliberation, seemed a straightforward solution for a URI representing a resource object, more discussion is needed with regard to the predicate that denotes a relationship. MARC data have not been consistent in expressing relationships. The combination of field, indicators, and subfields adds complexity to the process.

IV.3.4. ILS analysis results

Some ILSs would not load the processed records because of the presence of $0. Others loaded but did nothing with the data.

The TG members mocked up files of bibliographic and authority data, adding various URIs in subfield $0 wherever subfield $0 is currently defined in MARC. These files were uploaded into a number of ILS systems to see if the addition of subfield $0 with URIs caused problems. No significant problems were found. These files included URIs in subfield $0 which were not prefixed with the (uri) identifier.

In OCLC, the same $0 subfields were also not problematic. OCLC's validation of subfield $0 does not check the structure of subfield $0 in the same way as it does for control numbers in 760-787 subfield $w or URLs in $u subfields. Use of URIs in subfield $4 to express relationship information would require a change to OCLC's validation of $4 subfields, but that may be readily changed without extensive effort.

IV.3.5. Tools needed: MarcNext, Authority Toolkit

Currently the TG has tested and continues to work with MarcNext and Authority Toolkit. The TG members continue collecting and recording additional tools and resources that help practitioners identify and validate an RDF URI.

IV.3.6. Need to be able to easily report duplicates found in VIAF, etc., and need a way to know which URI to use when duplicates are found

Throughout the first year of investigation and deliberation, the TG learned that, though vocabularies and ontologies are structured per standards and published for adoption, some are more domain specific than others. Often there is more than one method to structure a body of data. Duplications can be expected across various datasets. The reconciliation of URIs is one of the tasks that the TG has recognized, yet it is not in a position to recommend a solution in the near term.

IV.4. Outcomes

IV.4.1. MAC Discussion Papers [Charge 4]

The TG was aware that some aspects of its intended goals were not yet accommodated by the MARC format. Following the defined workflows of MARC governance and standardization, the TG submitted several discussion papers to the MARC Advisory Committee (MAC). As an initial preparation, an informal discussion paper entitled URIs in MARC: A Call for Best Practices, by Steven Folsom, had been discussed during the June 2015 MAC meeting. It focused on subfield $0 (Authority record control number or standard number), its current usage, and its capability for URIs, and addressed some aspects of best practice. The paper generated extensive discussion, and there was broad agreement that the time was right for the library community to begin using URIs consistently. Steven Folsom was asked to cooperate with the PCC to develop a formal MAC Discussion Paper.

In fall 2015, the British Library (BL) submitted two papers to MAC for the January 2016 meeting, independently of the TG, covering title-to-title relationships via subfield $w and specific relationship information, then discussed using subfield $0. The approaches taken by the BL in its papers, coupled with the approach taken by the TG, resulted in MAC suggesting that the British Library and the PCC should collaborate on submitting a paper for June 2016.

During the MAC meetings at the ALA Annual Conference in Orlando in June 2016, three papers were presented by, or in cooperation with, the TG. Discussion Paper No. 2016-DP18, entitled Redefining Subfield $0 to Remove the Use of Parenthetical Prefix "(uri)" in the MARC 21 Authority, Bibliographic, and Holdings Formats, described the syntactical improvement that a subfield $0 containing a URI without the parenthetical prefix (uri) would allow, so that automated processes could use the content of these $0s without having to strip away a prefix. The discussion paper was discussed at the MAC meeting, and the recommendation was made that it be upgraded to proposal status; it was approved at the meeting as a proposal. From now on, a $0 containing an identifier in the form of a web retrieval protocol, e.g., an HTTP URI, should not be given a parenthetical prefix.
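For legacy data that still carries the prefix, the stripping step that automated processes had to perform can be sketched in Python. This is a minimal illustration, not part of any MARC tool: the function names are hypothetical, and the choice to strip only when an HTTP(S) URI follows is an assumption made so that other parenthetical prefixes, such as (DNLM), are left untouched.

```python
import re

# Matches a leading "(uri)" prefix, case-insensitively, plus any spaces after it.
_URI_PREFIX = re.compile(r"^\(uri\)\s*", re.IGNORECASE)


def is_http_uri(value: str) -> bool:
    """True when the value begins with an HTTP or HTTPS scheme."""
    return value.startswith(("http://", "https://"))


def normalize_subfield_0(value: str) -> str:
    """Remove a '(uri)' prefix when what follows is an HTTP(S) URI;
    return any other value unchanged apart from outer whitespace."""
    value = value.strip()
    stripped = _URI_PREFIX.sub("", value)
    return stripped if is_http_uri(stripped) else value


print(normalize_subfield_0("(uri) http://id.loc.gov/authorities/names/n79021164"))
print(normalize_subfield_0("(DNLM)D009462Q000266"))
```

With the prefix no longer recorded, a consumer of $0 needs only a check like `is_http_uri` to tell a dereferenceable identifier from a prefixed control number.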

A second paper was presented to the MAC: Discussion Paper No. 2016-DP19, Adding Subfield $0 to Fields 257 and 377 in the MARC 21 Bibliographic Format and Field 377 in the MARC 21 Authority Format. It resulted from extensive analyses of the MARC Bibliographic and Authority formats by the TG, selecting fields which are to be controlled by an identifier. Only those fields where an identifier can be applied with clear correspondence between the field and one entity were included in the paper. The discussion paper was discussed at the MAC meeting, and the recommendation was made that it be upgraded to proposal status; it was also approved at the meeting as a proposal. Both changes will be included in Update 23 to the MARC documentation, expected in fall 2016.

The third paper, Discussion Paper No. 2016-DP17, Redefining Subfield $4 to Encompass URIs for Relationships in the MARC 21 Authority and Bibliographic Formats, was presented by the British Library in consultation with the TG. This paper generated lively discussion. It was acknowledged that the approach to recording URIs for relationships using subfield $4 was preferable to any of the other alternatives outlined by the paper. The distinction between relator codes and relationship codes in the MARC format was questioned. As of now, an across-the-board solution for recording URIs for any data element in MARC, subfield or field, seems to be preferred by NDMSO over what it regards as an ad hoc solution for single elements. This discussion will be continued; this paper should not be considered in isolation, but rather in the context of the other papers which the TG is in the process of submitting. Taken as a whole, it is hoped that they will achieve the comprehensive solution which is sought throughout the MARC formats.

IV.4.2. Formulating & Obtaining URI document [Charge 3.2]

A draft document was for commonly used13 sources for authorities and13 identifiers For each source screen captures13 were made showing where a URI could be found for a particular entity or how to formulate a URI once the identifier13 for13 the entity is known Before making this document available widely it must be determined how13 best to organize it Some13 resources provide13 URIs that directly represent13 a thing and others provide URIs that reference an authority (eg13 controlled or standard vocabularies which may or may not have underlying metadata about the thing) or a resource describing a thing The document needs to be able to distinguish this and inform catalogers which URIs are for real world objects and which are not In order to be helpful to developers building tools the document13 intends to also include descriptions of how data sources provide machine access to the data13 Is the data published13 as Linked13 Data available through13 http available through13 a SPARQL endpoint data dumps etc13 Another issue that13 must13 be determined is where to put13 the final document and how it13 will be maintained Should it be13 cooperatively maintained by the13 community (such as on a wiki) or should some group within PCC take responsibility for keeping it up to date and adding to it

IV43 Revisions to13 OCLC13 handling13 of HTTP URIs [Charge 31]

The question arises as to whether it would be better for catalogers to enter all needed URIs directly into the shared bibliographic record in WorldCat or whether OCLC13 should13 provide options for output of URIs based13 o data present in13 particular MARC13 fields and13 profiled13 library preferences Clearly some libraries will embrace use of URIs for their webb based13 catalogs while others may find13 them problematic in local13 displays of bibliographic information OCLC staff have looked into the issue and believe that the use of output options would13 likely produce more consistent results as well as meet the varying needs of libraries

The TG members are drafting a spreadsheet outlining the subfields that together name an entity for which a corresponding URI could be added in subfield $0 That spreadsheet will be useful as the basis for13 future specifications for13 use by OCLC system developers It will13 allow for a comparison of what is desired13 by the PCC13 cataloging community in13 terms of URIs corresponding to13 the entire named13 entity versus the existing13 use of subfield $0 and subfieldb $0b like information used in OCLC heading controlling functionality13 That heading control functionality allows for control numbers in multiple $ subfields corresponding to different parts13 of a named entity ie corporate name hierarchies names13 and titles

Page13 18 of 22

subjects13 and separately controlled subdivisions etc These are cases13 where output of multiple URIs13 corresponding only13 to part of the named entity13 would not be preferred

OCLC cataloging policies in this area are expected to evolve as this TG makes recommendations andOCLC development work moves ahead on the proposed13 output options for URIs

IV5 Next steps and in-depths analyses in year 2 [Charge 3 Charge 4]

In 2016-shy‐2017 the13 TG will continue13 an agenda13 focused on practical outcomes Work is already welladvanced on several of the13 following items

IV51 In collaboration with OCLC develop a specification for outputting URIs based on internal linkages present in13 WorldCat data

V52 Complete13 the13 MARC objectURI reconciliation document and seek toincorporate the information into formal13 MARC documentation

IV53 Produce13 work ID recommendation and use13 it in pilot implementationIV54 Produce13 discussion paper or proposal for handling relationships in MARCIV55 Consider additional targeted reconciliation projectsIV56 In consultation with stakeholders evaluate need for additional MARC

proposals or best practices IV57 RWO recommendationsIV58 Identify ldquohomesrdquo in PCC or elsewhere for aspects of the TGrsquos work that will

need13 further exploration13 or continuing upkeep IV59 Outreach advocacy trainingIV510 Etc

V RECOMMENDATIONS TO STAKEHOLDERS

During its first year the TG13 was very much focused on the needs and interests of the many differentstakeholders This13 is13 reflected both in the outcomes13 of the work completed so far13 (see Sec IV4Outcomes as well as in the plans laid out13 for13 year13 2 (see Sec III 5 Next steps and in-shy‐depths analysis in13 year 2) After careful consideration the TG proposes the implementation13 of URIs in13 MARC13 for thenear-shy‐term The sooner13 this process can begin the sooner13 the data providers eg libraries can producethe data that13 can be more easily transformed into linked data In order13 to facilitate progress towardsthis goal the TG developed the recommendations already outlined13 in13 the report above such13 as thespreadsheet identifying the phase 1 entities13 for identities ie the subfields13 that together name an entityin each MARC field (see Sec IV43 Revisions to OCLC handling of HTTP13 URIs) and the draft13 document13 Formulating13 an Obtaining13 URIs A Guide to13 Commonly Used13 Vocabularies an Reference Sources TheTG hopes that this document could be used as starting point to develop an official list of PCCsanctioned initial source vocabularies13 for embedding URIs

For the sake of13 consistency expediency and accuracy it13 is advisable to use automated processes for13 populating MARC13 records with13 URIs Individual catalogers doing this work manually is not a desirable

Page13 19 of 22

practice and13 could13 be less efficient13 Several13 possible ways to accomplish this goal have been outlined in this report13 (see Secs IV241 MarcEdit IV2 42 Authority Toolkit and IV43 Revisions to OCLC handling of HTTP URIs)

Outreach advocacy and training will be a core goal of phase 2 The TG is planning on working closely with stakeholders such as other PCC committees to influence cataloging policies and best practices that have been13 identified13 problematic for the implementation13 of URIs in13 MARC

Training needs related to implementation (for13 example13 how to obtain URIs or the13 difference13 between authorities and real world objects) will be13 communicated to the13 PCC Standing Committee13 on Training so that13 appropriate training can be either13 identified or13 developed

Though MARC is the most prominently13 used schema for library13 metadata it is frequently13 used alongside many others that may or may not allow for the inclusion of URIs In addition to that concern are the maintenance of identifiers recommendation in relation to reconciliation and possible13 ILS13 functional requirements The TG on URIs in MARC is recommending that13 new TGs be formed13 concerning URIs for non-shy‐MARC metadata

VI REFERENCES

1 The subgroup Work IDs in MARC has identified potential fields13 and scenarios13 to accommodate a work identifier (or multiple work identifiers)13 Considerations have been given to legacy data whether a work identifier (ID) already established in an authority format13 or not (7XX $t13 1XX240)13 An unambiguous relationship of13 a work ID among various vocabularies (024) and relationships among variant of a work etc The subgroup will present recommendations to the13 community in 2017

Links Meetings of the MARC Advisory Committee Agendas and Minutes

2015-shy‐0613 MAC meeting httpwwwlocgovmarcmacan2015_agehtmlhttpwwwlocgovmarcmacminutesan-shy‐15html

2016-shy‐0113 MAC meeting httpwwwlocgovmarcmacmw2016_agehtmlhttpwwwlocgovmarcmacminutesmw-shy‐16html

2016-shy‐0613 MAC meeting httpwwwlocgovmarcmacan2016_agehtmlhttpwwwlocgovmarcmacminutesan-shy‐16html

Papers

Page13 20 of 22

Informal13 discussion paper13 URIs in MARC13 A Call13 for Best Practices (Steven Folsom Discovery Metadata13 Librarian Cornell University) httpsdocsgooglecomdocumentd1fuHvF8bXH7hldY_xJ7f_xn2rP2Dj8o-shy‐Ca9jhHghIeUgeditpli=1

Discussion Paper No 2016-shy‐DP04 Extending the Use of Subfield $0 to Encompass Linking Fields in theMARC 21 Bibliographic Format (British Library) httpwwwlocgovmarcmac20162016-shy‐dp04htmlDiscussion Paper No 2016-shy‐DP05 Expanding the Definition of Subfield $w to Encompass StandardNumbers in the MARC 21 Bibliographic and Authority Formats (British Library) httpwwwlocgovmarcmac20162016-shy‐dp05html

Discussion Paper No 2016-shy‐DP17 Redefining Subfield $4 to Encompass URIs for Relationships in theMARC 21 Authority and Bibliographic Formats (British Library in consultation with the PCC Task Groupo URIs in13 MARC) httpwwwlocgovmarcmac20162016-shy‐dp17html

Discussion Paper No 2016-shy‐DP18 Redefining Subfield $0 to Remove the Use of Parenthetical Prefix(uri)13 in the MARC 21 Authority Bibliographic and13 Holdings Formats (PCC13 Task Group13 o URI in13 MARC13 in consultation with the British Library) httpwwwlocgovmarcmac20162016-shy‐dp18html

Discussion Paper No 2016-shy‐DP19 Adding Subfield $0 to Fields 257 and 377 in the MARC 21 BibliographicFormat and Field 37 in the13 MARC 2 Authority Format (PCC URI in MARC Task Group) httpwwwlocgovmarcmac20162016-shy‐dp19html

MARC Format Overview Status Information httpwwwlocgovmarcstatushtml

Examples for Sec IV21

This LC subject heading string is linked to three different authority records The links are OCLCrsquosARNs No single13 $0 could be13 output for this subject access point

650 0 ǂa NeurologistsltLink2068890gt ǂz New ZealandltLink255121gt ǂv BiographyltLink4933801gt

This medical subject string is linked to one authority record although the controlling process linksindividual subfields It is a candidate for output of a single $0 with a URI13 because the links all13 refer to thesingle authority record In the case of MeSH unlike LCSH the $0 subfield displays in Connexion SeeOCLC record 957132118

650 12 ǂa NeurologyltLink(DNLM)D009462Q000266gt ǂx historyltLink(DNLM)D009462Q000266gt

Page13 21 of 22

Displays as650 12 Neurology ǂx history ǂ0 (DNLM)D009462Q000266

So it could be13 output with single13 $ containing the13 corresponding URI for the13 MeSH heading

Page13 22 of 22

Page 15: Task Groupon URIsinMARC !! Year OneReport · 06-10-2016  · The first year since the inception of theURI in MARCTaskGroup (TG)began, despite the extremely ... believedfollowing the

appropriately for operator variations in13 punctuation capitalization13 and13 the use of combining diacriticsIn addition the response time experienced by the toolkit can vary widely even for the same termsearched repeatedly within a brief time and some services13 are unavailable13 over the13 weekend If the13 potential of linked13 data is to13 be enjoyed services providing data must ensure that their entrymechanisms are robust and flexible and available at all times

IV24313 Lookup online (eg VIAF Getty ULAN Geonames Wikidata)

Online lookup requires manual operation Users must be well versed in SPARQL queries that individualservices13 provide Getty ULAN works13 differently to Geonames13 and Wikidata The URI returns from aquery may not be a RDF URI but one that may land13 user onto a Web page or document

IV.3. What did we learn? [Charge 1.3, Charge 3]

IV.3.1. Tackle low-hanging fruit: what can we do in 1 year?

The TG’s activities during Year 1 were designed to position the MARC community to take tangible steps toward incorporating linked data URIs into its processes within an achievable timeframe. Therefore, the TG put aside some tasks, such as the overhaul of certain legacy MARC data elements, that would have delayed progress with the TG’s practical objectives. The tool development undertaken by Terry Reese and Gary Strawn was designed to advance these objectives. So were the Formulating URIs document and the MARC object/URI reconciliation work, both of which document information that will be needed by other stakeholders, and the work IDs in MARC proposal, which seeks to remove one of the main barriers to routine incorporation of work identifiers in MARC records.

IV.3.2. Add $0 where it’s not defined (not simple)

One of the TG’s goals was also to identify fields that currently do not have $0 defined and to add it. The TG found that the following MARC fields needed $0 defined:

bibliographic: 046, 257, 260/264, 375, 753
authority: 046, 360, 375, 377, 663, 680, 681

These fields do not offer an easy resolution when considering a $0 that reflects the resource object for the entity described. The TG conducted thorough analyses and concluded that only 257 and 377 could contain a URI with an unambiguous correspondence between the field and the object it represents, leaving out more complicated cases, e.g., fields 264 (Production, Publication, Distribution, Manufacture, and Copyright Notice) and 382 (Medium of Performance).

One of the issues confronted in drafting Discussion Paper 2016-DP19 was the extent of effort needed to individually propose subfield $0 for MARC 21 fields that do not contain it. MAC accepted the paper as a proposal, and there was agreement “that similar changes such as those recommended [in] this paper might in the future be considered as part of a MARC Fast-Track process.” Being able to fast-track proposals for defining subfield $0 in fields which do not contain it will considerably streamline the process in the future.

IV.3.3. Strategies in light of the limited life cycle of the MARC environment

Though many may see MARC as “dead,” the system remains a viable tool that delivers metadata for data discovery. It is also, however, a legacy format that reflects, in its somewhat baroque structure, a long history of accretion to meet varied and changing needs. In pursuing its goals, the TG has adopted a strategy of pursuing changes that can be applied coherently across MARC and that maximize the return on the library community’s investment of effort. There are economical and sensible approaches to determining what to do. The TG always kept in mind that its recommendations must cause the least disruption for data transition from MARC to linked data. It is unlikely that HTTP URIs can be inserted wholesale; insertion may be possible in most, but not all, MARC fields and/or subfields.

The TG is committed to working through a list of tasks and identifying viable solutions. While $0, after one year’s deliberation, seemed a straightforward solution for a URI representing a resource object, more discussion is needed with regard to the predicate that denotes a relationship. MARC data have not been consistent in expressing relationships. The combination of field, indicators, and subfields adds complexity to the process.

IV.3.4. ILS analysis results

Some ILSs would not load the processed records because of the presence of $0. Others loaded them but did nothing with the data.

The TG members mocked up files of bibliographic and authority data, adding various URIs in subfield $0 wherever subfield $0 is currently defined in MARC. These files were uploaded into a number of ILSs to see if the addition of subfield $0 with URIs caused problems. No significant problems were found. These files included URIs in subfield $0 which were not prefixed with the (uri) identifier.

In OCLC, the same $0 subfields were also not problematic. OCLC’s validation of subfield $0 does not check the structure of subfield $0 in the same way as it does for control numbers in 760-787 subfield $w or URLs in $u subfields. Use of URIs in subfield $4 to express relationship information would require a change to OCLC’s validation of $4 subfields, but that change could be made readily, without extensive effort.

IV.3.5. Tools needed: MarcNext, Authority Toolkit

Currently, the TG has tested and continues to work with MarcNext and Authority Toolkit. The TG members continue to collect and record additional tools and resources that help practitioners identify and validate an RDF URI.


IV.3.6. Need to be able to easily report duplicates found in VIAF, etc., and need a way to know which URI to use when duplicates are found

Throughout the first year of investigation and deliberation, the TG learned that, though vocabularies and ontologies are structured per standards and published for adoption, some are more domain-specific than others. Often there is more than one method to structure a body of data. Duplications can be expected across various datasets. The reconciliation of URIs is one of the tasks that the TG has recognized, yet it is not in a position to recommend a solution in the near term.

IV.4. Outcomes

IV.4.1. MAC Discussion Papers [Charge 4]

The TG was aware that some aspects of its intended goals were not yet accommodated by the MARC format. Following the defined workflows of MARC governance and standardization, the TG submitted several discussion papers to the MARC Advisory Committee (MAC). As an initial preparation, an informal discussion paper entitled URIs in MARC: A Call for Best Practices, by Steven Folsom, had been discussed during the June 2015 MAC meeting. It focused on subfield $0 (Authority record control number or standard number), its current usage, and its capability for URIs, and addressed some aspects of best practice. The paper generated extensive discussion, and there was broad agreement that the time was right for the library community to begin using URIs consistently. Steven Folsom was asked to cooperate with the PCC to develop a formal MAC Discussion Paper.

In fall 2015, the British Library (BL) submitted two papers to MAC for the January 2016 meeting, independently of the TG, covering title-to-title relationships via subfield $w and specific relationship information, then discussed using subfield $0. The approaches taken by the BL in its papers, coupled with the approach taken by the TG, resulted in MAC suggesting that the British Library and the PCC should collaborate on submitting a paper for June 2016.

During the MAC meetings at the ALA Annual Conference in Orlando in June 2016, three papers were presented by or in cooperation with the TG. Discussion Paper No. 2016-DP18, entitled Redefining Subfield $0 to Remove the Use of Parenthetical Prefix (uri) in the MARC 21 Authority, Bibliographic, and Holdings Formats, described the syntactical improvement that a subfield $0 containing a URI without the parenthetical prefix (uri) would allow, so that automated processes could use the content of these $0s without having to strip away a prefix. The paper was discussed at the MAC meeting, and the recommendation was made that it be upgraded to proposal status; it was approved at the meeting as a proposal. From now on, a $0 containing an identifier in the form of a web retrieval protocol, e.g., an HTTP URI, should not be given a parenthetical prefix.
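The prefix handling that this change makes unnecessary for new data, but that legacy data may still require, might look like the following minimal Python sketch. The function name is hypothetical, the example.org URIs are placeholders, and tolerating a space after the prefix is an assumption about legacy data rather than a rule from the proposal.

```python
from typing import Optional

URI_PREFIX = "(uri)"


def uri_from_subfield_zero(value: str) -> Optional[str]:
    """Return the HTTP(S) URI carried by a $0 value, or None if there is none.

    Accepts both the older "(uri)http://..." convention and a bare URI.
    Values with other parenthetical prefixes, such as "(DNLM)D009462Q000266",
    are treated as control numbers and yield None.
    """
    text = value.strip()
    if text.lower().startswith(URI_PREFIX):
        # Legacy form: drop the parenthetical prefix before using the URI.
        text = text[len(URI_PREFIX):].strip()
    if text.startswith(("http://", "https://")):
        return text
    return None


print(uri_from_subfield_zero("(uri)http://example.org/id/123"))
print(uri_from_subfield_zero("http://example.org/id/123"))
print(uri_from_subfield_zero("(DNLM)D009462Q000266"))
```

With the prefix no longer recorded, the first branch is needed only for records created under the earlier convention.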

A second paper was presented to the MAC: Discussion Paper No. 2016-DP19, Adding Subfield $0 to Fields 257 and 377 in the MARC 21 Bibliographic Format and Field 377 in the MARC 21 Authority Format. It resulted from extensive analyses of the MARC Bibliographic and Authority formats by the TG, selecting fields which are to be controlled by an identifier. Only those fields where an identifier can be applied with clear correspondence between the field and one entity were included in the paper. The paper was discussed at the MAC meeting, and the recommendation was made that it be upgraded to proposal status; it was also approved at the meeting as a proposal. Both changes will be included in Update 23 to the MARC documentation, expected in fall 2016.


The third paper, Discussion Paper No. 2016-DP17, Redefining Subfield $4 to Encompass URIs for Relationships in the MARC 21 Authority and Bibliographic Formats, was presented by the British Library in consultation with the TG. This paper generated lively discussion. It was acknowledged that the approach to recording URIs for relationships using subfield $4 was preferable to any of the other alternatives outlined by the paper. The distinction between relator codes and relationship codes in the MARC format was questioned. As of now, an across-the-board solution for recording URIs for any data element in MARC, subfield or field, seems to be preferred by NDMSO over what it regards as an ad hoc solution for single elements. This discussion will be continued; this paper should not be considered in isolation, but rather in the context of the other papers which the TG is in the process of submitting. Taken as a whole, it is hoped that they will achieve the comprehensive solution which is sought throughout the MARC formats.

IV.4.2. Formulating & Obtaining URI document [Charge 3.2]

A draft document was prepared covering commonly used sources for authorities and identifiers. For each source, screen captures were made showing where a URI could be found for a particular entity, or how to formulate a URI once the identifier for the entity is known. Before making this document widely available, it must be determined how best to organize it. Some resources provide URIs that directly represent a thing, and others provide URIs that reference an authority (e.g., controlled or standard vocabularies, which may or may not have underlying metadata about the thing) or a resource describing a thing. The document needs to be able to distinguish these and inform catalogers which URIs are for real world objects and which are not. In order to be helpful to developers building tools, the document is also intended to include descriptions of how data sources provide machine access to the data: is the data published as Linked Data, available through HTTP, available through a SPARQL endpoint, as data dumps, etc.? Another issue that must be determined is where to put the final document and how it will be maintained. Should it be cooperatively maintained by the community (such as on a wiki), or should some group within PCC take responsibility for keeping it up to date and adding to it?
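Formulating a URI once the identifier is known amounts to filling an identifier into a per-source pattern, as in the minimal Python sketch below. The three patterns are assumptions based on how these services publish identifiers and are not taken from the draft document; each should be verified against the source's own documentation. The function and dictionary names are hypothetical.

```python
# Assumed URI patterns, to be verified against each source's documentation.
URI_TEMPLATES = {
    "lc_names": "http://id.loc.gov/authorities/names/{id}",
    "viaf": "http://viaf.org/viaf/{id}",
    "mesh": "http://id.nlm.nih.gov/mesh/{id}",
}


def formulate_uri(source: str, identifier: str) -> str:
    """Build a URI from a source key and a known identifier.

    Whitespace is removed because control numbers are sometimes recorded
    with spaces (an assumption about the input, not a rule from the report).
    """
    try:
        template = URI_TEMPLATES[source]
    except KeyError:
        raise ValueError(f"no URI pattern recorded for source {source!r}") from None
    return template.format(id="".join(identifier.split()))


# Identifier taken from the MeSH example in this report, without its (DNLM) prefix.
print(formulate_uri("mesh", "D009462Q000266"))
```

A table of this kind does not say whether the resulting URI denotes a real world object or an authority record; that distinction still has to be recorded per source.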

IV.4.3. Revisions to OCLC handling of HTTP URIs [Charge 3.1]

The question arises as to whether it would be better for catalogers to enter all needed URIs directly into the shared bibliographic record in WorldCat, or whether OCLC should provide options for output of URIs based on data present in particular MARC fields and profiled library preferences. Clearly, some libraries will embrace the use of URIs for their web-based catalogs, while others may find them problematic in local displays of bibliographic information. OCLC staff have looked into the issue and believe that the use of output options would likely produce more consistent results as well as meet the varying needs of libraries.

The TG members are drafting a spreadsheet outlining the subfields that together name an entity for which a corresponding URI could be added in subfield $0. That spreadsheet will be useful as the basis for future specifications for use by OCLC system developers. It will allow for a comparison of what is desired by the PCC cataloging community, in terms of URIs corresponding to the entire named entity, versus the existing use of subfield $0 and subfield-$0-like information in OCLC heading-controlling functionality. That heading control functionality allows for control numbers in multiple $0 subfields corresponding to different parts of a named entity, i.e., corporate name hierarchies, names and titles, subjects and separately controlled subdivisions, etc. These are cases where output of multiple URIs, each corresponding only to part of the named entity, would not be preferred.

OCLC cataloging policies in this area are expected to evolve as this TG makes recommendations and OCLC development work moves ahead on the proposed output options for URIs.

IV.5. Next steps and in-depth analyses in year 2 [Charge 3, Charge 4]

In 2016-2017, the TG will continue an agenda focused on practical outcomes. Work is already well advanced on several of the following items:

IV.5.1. In collaboration with OCLC, develop a specification for outputting URIs based on internal linkages present in WorldCat data
IV.5.2. Complete the MARC object/URI reconciliation document and seek to incorporate the information into formal MARC documentation
IV.5.3. Produce work ID recommendation and use it in pilot implementation
IV.5.4. Produce discussion paper or proposal for handling relationships in MARC
IV.5.5. Consider additional targeted reconciliation projects
IV.5.6. In consultation with stakeholders, evaluate need for additional MARC proposals or best practices
IV.5.7. RWO recommendations
IV.5.8. Identify “homes” in PCC or elsewhere for aspects of the TG’s work that will need further exploration or continuing upkeep
IV.5.9. Outreach, advocacy, training
IV.5.10. Etc.

V. RECOMMENDATIONS TO STAKEHOLDERS

During its first year, the TG was very much focused on the needs and interests of the many different stakeholders. This is reflected both in the outcomes of the work completed so far (see Sec. IV.4, Outcomes) and in the plans laid out for year 2 (see Sec. IV.5, Next steps and in-depth analyses in year 2). After careful consideration, the TG proposes the implementation of URIs in MARC for the near term. The sooner this process can begin, the sooner the data providers, e.g., libraries, can produce data that can be more easily transformed into linked data. In order to facilitate progress towards this goal, the TG developed the recommendations already outlined in the report above, such as the spreadsheet identifying the phase 1 entities for identifiers, i.e., the subfields that together name an entity in each MARC field (see Sec. IV.4.3, Revisions to OCLC handling of HTTP URIs), and the draft document Formulating and Obtaining URIs: A Guide to Commonly Used Vocabularies and Reference Sources. The TG hopes that this document could be used as a starting point to develop an official list of PCC-sanctioned initial source vocabularies for embedding URIs.

For the sake of consistency, expediency, and accuracy, it is advisable to use automated processes for populating MARC records with URIs. Having individual catalogers do this work manually is not a desirable practice and could be less efficient. Several possible ways to accomplish this goal have been outlined in this report (see Secs. IV.2.4.1, MarcEdit; IV.2.4.2, Authority Toolkit; and IV.4.3, Revisions to OCLC handling of HTTP URIs).

Outreach, advocacy, and training will be a core goal of phase 2. The TG is planning on working closely with stakeholders, such as other PCC committees, to influence cataloging policies and best practices that have been identified as problematic for the implementation of URIs in MARC.

Training needs related to implementation (for example, how to obtain URIs, or the difference between authorities and real world objects) will be communicated to the PCC Standing Committee on Training so that appropriate training can be either identified or developed.

Though MARC is the most prominently used schema for library metadata, it is frequently used alongside many others that may or may not allow for the inclusion of URIs. In addition to that concern are the maintenance of identifiers, recommendations in relation to reconciliation, and possible ILS functional requirements. The TG on URIs in MARC is recommending that new TGs be formed concerning URIs for non-MARC metadata.

VI. REFERENCES

1. The subgroup Work IDs in MARC has identified potential fields and scenarios to accommodate a work identifier (or multiple work identifiers). Consideration has been given to legacy data; to whether or not a work identifier (ID) is already established in an authority format (7XX $t, 1XX/240); to an unambiguous relationship of a work ID among various vocabularies (024); and to relationships among variants of a work, etc. The subgroup will present recommendations to the community in 2017.

Links: Meetings of the MARC Advisory Committee, Agendas and Minutes

2015-06 MAC meeting:
http://www.loc.gov/marc/mac/an2015_age.html
http://www.loc.gov/marc/mac/minutes/an-15.html

2016-01 MAC meeting:
http://www.loc.gov/marc/mac/mw2016_age.html
http://www.loc.gov/marc/mac/minutes/mw-16.html

2016-06 MAC meeting:
http://www.loc.gov/marc/mac/an2016_age.html
http://www.loc.gov/marc/mac/minutes/an-16.html

Papers

Informal discussion paper: URIs in MARC: A Call for Best Practices (Steven Folsom, Discovery Metadata Librarian, Cornell University). https://docs.google.com/document/d/1fuHvF8bXH7hldY_xJ7f_xn2rP2Dj8o-Ca9jhHghIeUg/edit?pli=1

Discussion Paper No. 2016-DP04: Extending the Use of Subfield $0 to Encompass Linking Fields in the MARC 21 Bibliographic Format (British Library). http://www.loc.gov/marc/mac/2016/2016-dp04.html

Discussion Paper No. 2016-DP05: Expanding the Definition of Subfield $w to Encompass Standard Numbers in the MARC 21 Bibliographic and Authority Formats (British Library). http://www.loc.gov/marc/mac/2016/2016-dp05.html

Discussion Paper No. 2016-DP17: Redefining Subfield $4 to Encompass URIs for Relationships in the MARC 21 Authority and Bibliographic Formats (British Library, in consultation with the PCC Task Group on URIs in MARC). http://www.loc.gov/marc/mac/2016/2016-dp17.html

Discussion Paper No. 2016-DP18: Redefining Subfield $0 to Remove the Use of Parenthetical Prefix (uri) in the MARC 21 Authority, Bibliographic, and Holdings Formats (PCC Task Group on URIs in MARC, in consultation with the British Library). http://www.loc.gov/marc/mac/2016/2016-dp18.html

Discussion Paper No. 2016-DP19: Adding Subfield $0 to Fields 257 and 377 in the MARC 21 Bibliographic Format and Field 377 in the MARC 21 Authority Format (PCC URI in MARC Task Group). http://www.loc.gov/marc/mac/2016/2016-dp19.html

MARC Format Overview, Status Information: http://www.loc.gov/marc/status.html




2015-shy‐0613 MAC meeting httpwwwlocgovmarcmacan2015_agehtmlhttpwwwlocgovmarcmacminutesan-shy‐15html

2016-shy‐0113 MAC meeting httpwwwlocgovmarcmacmw2016_agehtmlhttpwwwlocgovmarcmacminutesmw-shy‐16html

2016-shy‐0613 MAC meeting httpwwwlocgovmarcmacan2016_agehtmlhttpwwwlocgovmarcmacminutesan-shy‐16html

Papers

Page13 20 of 22

Informal13 discussion paper13 URIs in MARC13 A Call13 for Best Practices (Steven Folsom Discovery Metadata13 Librarian Cornell University) httpsdocsgooglecomdocumentd1fuHvF8bXH7hldY_xJ7f_xn2rP2Dj8o-shy‐Ca9jhHghIeUgeditpli=1

Discussion Paper No 2016-shy‐DP04 Extending the Use of Subfield $0 to Encompass Linking Fields in theMARC 21 Bibliographic Format (British Library) httpwwwlocgovmarcmac20162016-shy‐dp04htmlDiscussion Paper No 2016-shy‐DP05 Expanding the Definition of Subfield $w to Encompass StandardNumbers in the MARC 21 Bibliographic and Authority Formats (British Library) httpwwwlocgovmarcmac20162016-shy‐dp05html

Discussion Paper No 2016-shy‐DP17 Redefining Subfield $4 to Encompass URIs for Relationships in theMARC 21 Authority and Bibliographic Formats (British Library in consultation with the PCC Task Groupo URIs in13 MARC) httpwwwlocgovmarcmac20162016-shy‐dp17html

Discussion Paper No 2016-shy‐DP18 Redefining Subfield $0 to Remove the Use of Parenthetical Prefix(uri)13 in the MARC 21 Authority Bibliographic and13 Holdings Formats (PCC13 Task Group13 o URI in13 MARC13 in consultation with the British Library) httpwwwlocgovmarcmac20162016-shy‐dp18html

Discussion Paper No 2016-shy‐DP19 Adding Subfield $0 to Fields 257 and 377 in the MARC 21 BibliographicFormat and Field 37 in the13 MARC 2 Authority Format (PCC URI in MARC Task Group) httpwwwlocgovmarcmac20162016-shy‐dp19html

MARC Format Overview Status Information httpwwwlocgovmarcstatushtml

Examples for Sec IV21

This LC subject heading string is linked to three different authority records The links are OCLCrsquosARNs No single13 $0 could be13 output for this subject access point

650 0 ǂa NeurologistsltLink2068890gt ǂz New ZealandltLink255121gt ǂv BiographyltLink4933801gt

This medical subject string is linked to one authority record although the controlling process linksindividual subfields It is a candidate for output of a single $0 with a URI13 because the links all13 refer to thesingle authority record In the case of MeSH unlike LCSH the $0 subfield displays in Connexion SeeOCLC record 957132118

650 12 ǂa NeurologyltLink(DNLM)D009462Q000266gt ǂx historyltLink(DNLM)D009462Q000266gt

Page13 21 of 22

Displays as650 12 Neurology ǂx history ǂ0 (DNLM)D009462Q000266

So it could be13 output with single13 $ containing the13 corresponding URI for the13 MeSH heading

Page13 22 of 22

IV.3.6. Need to be able to easily report duplicates found in VIAF, etc., and need a way to know which URI to use when duplicates are found

Throughout the first year of investigation and deliberation, the TG learned that, though vocabularies and ontologies are structured per standards and published for adoption, some are more domain-specific than others. Often there is more than one method to structure a body of data. Duplications can be expected across various datasets. The reconciliation of URIs is one of the tasks that the TG has recognized, yet it is not in a position to recommend a solution in the near term.

IV.4. Outcomes

IV.4.1. MAC Discussion Papers [Charge 4]

The TG was aware that some aspects of its intended goals were not yet accommodated by the MARC format. Following the defined workflows of MARC governance and standardization, the TG submitted several discussion papers to the MARC Advisory Committee (MAC). As an initial preparation, an informal discussion paper entitled "URIs in MARC: A Call for Best Practices" by Steven Folsom had been discussed during the June 2015 MAC meeting. It focused on subfield $0 (Authority record control number or standard number), its current usage, and its capability for URIs, and addressed some aspects of best practice. The paper generated extensive discussion, and there was broad agreement that the time was right for the library community to begin using URIs consistently. Steven Folsom was asked to cooperate with the PCC to develop a formal MAC Discussion Paper.

In fall 2015 the British Library (BL), independently of the TG, submitted two papers to MAC for the January 2016 meeting, covering title-to-title relationships via subfield $w and specific relationship information, then discussed using subfield $0. The approaches taken by the BL in its papers, coupled with the approach taken by the TG, resulted in MAC suggesting that the British Library and the PCC should collaborate on submitting a paper for June 2016.

During the MAC meetings at the ALA Annual Conference in Orlando in June 2016, three papers were presented by or in cooperation with the TG. Discussion Paper No. 2016-DP18, entitled "Redefining Subfield $0 to Remove the Use of Parenthetical Prefix "(uri)" in the MARC 21 Authority, Bibliographic, and Holdings Formats", described the syntactical improvement that a subfield $0 containing a URI without the parenthetical prefix "(uri)" would allow, so that automated processes could use the content of these $0s without having to strip away a prefix. The discussion paper was discussed at the MAC meeting, and the recommendation was made that it be upgraded to proposal status; it was approved at the meeting as a proposal. From now on, a $0 containing an identifier in the form of a web retrieval protocol, e.g. an HTTP URI, should not be given a parenthetical prefix.
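To illustrate what the prefix change means for automated processing, the following is a minimal Python sketch of how a script might normalize legacy $0 values that still carry the "(uri)" prefix. The function names are hypothetical and not part of any MARC tool; the sketch assumes the $0 value has already been extracted as a plain string, and it deliberately leaves other parenthetical prefixes, such as "(DNLM)", untouched.

```python
import re

# Matches a leading "(uri)" prefix (any letter case) plus any following spaces.
_URI_PREFIX = re.compile(r"^\(uri\)\s*", re.IGNORECASE)


def normalize_subfield_0(value: str) -> str:
    """Return the $0 value without a legacy "(uri)" prefix.

    Other parenthetical prefixes, e.g. "(DNLM)", are organization codes for
    control numbers and are left as they are.
    """
    return _URI_PREFIX.sub("", value.strip())


def is_http_uri(value: str) -> bool:
    """Rough check that a $0 value is an HTTP(S) URI rather than a control number."""
    return value.startswith(("http://", "https://"))
```

With unprefixed URIs in $0, a consuming process can skip the normalization step and only needs the second check to tell URIs apart from prefixed control numbers.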

A second paper was presented to the MAC: Discussion Paper No. 2016-DP19, "Adding Subfield $0 to Fields 257 and 377 in the MARC 21 Bibliographic Format and Field 377 in the MARC 21 Authority Format". It resulted from extensive analyses of the MARC Bibliographic and Authority formats by the TG, selecting fields which are to be controlled by an identifier. Only those fields where an identifier can be applied with clear correspondence between the field and one entity were included in the paper. The discussion paper was discussed at the MAC meeting, and the recommendation was made that it be upgraded to proposal status; it was also approved at the meeting as a proposal. Both changes will be included in Update 23 to the MARC documentation, expected in fall 2016.

The third paper, Discussion Paper No. 2016-DP17, "Redefining Subfield $4 to Encompass URIs for Relationships in the MARC 21 Authority and Bibliographic Formats", was presented by the British Library in consultation with the TG. This paper generated lively discussion. It was acknowledged that the approach to recording URIs for relationships using subfield $4 was preferable to any of the other alternatives outlined by the paper. The distinction between relator codes and relationship codes in the MARC format was questioned. As of now, an across-the-board solution for recording URIs for any data element in MARC, subfield or field, seems to be preferred by NDMSO over what it regards as an ad hoc solution for single elements. This discussion will be continued; this paper should not be considered in isolation but rather in the context of the other papers which the TG is in the process of submitting. Taken as a whole, it is hoped that they will achieve the comprehensive solution which is sought throughout the MARC formats.

IV.4.2. Formulating & Obtaining URI document [Charge 3.2]

A draft document was prepared for commonly used sources for authorities and identifiers. For each source, screen captures were made showing where a URI could be found for a particular entity, or how to formulate a URI once the identifier for the entity is known. Before making this document available widely, it must be determined how best to organize it. Some resources provide URIs that directly represent a thing, and others provide URIs that reference an authority (e.g. controlled or standard vocabularies, which may or may not have underlying metadata about the thing) or a resource describing a thing. The document needs to be able to distinguish these cases and inform catalogers which URIs are for real world objects and which are not. In order to be helpful to developers building tools, the document also intends to include descriptions of how data sources provide machine access to the data: is the data published as Linked Data, available through HTTP, available through a SPARQL endpoint, data dumps, etc.? Another issue that must be determined is where to put the final document and how it will be maintained. Should it be cooperatively maintained by the community (such as on a wiki), or should some group within PCC take responsibility for keeping it up to date and adding to it?
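As a rough illustration of "formulating a URI once the identifier for the entity is known", the Python sketch below fills an identifier into a per-source URI pattern. The source keys, the function name, and the template table are assumptions made for this example and are not taken from the draft document; the patterns shown follow the publicly visible URI forms of id.loc.gov and VIAF, and each should be checked against the source's own documentation before use.

```python
# Hypothetical lookup table: source key -> URI pattern (assumed, see lead-in).
URI_TEMPLATES = {
    "lcnaf": "http://id.loc.gov/authorities/names/{id}",
    "lcsh": "http://id.loc.gov/authorities/subjects/{id}",
    "viaf": "http://viaf.org/viaf/{id}",
}


def formulate_uri(source: str, identifier: str) -> str:
    """Build a URI for an entity from its source key and identifier.

    Whitespace inside the identifier is removed, since control numbers
    stored in MARC records may contain padding blanks.
    """
    template = URI_TEMPLATES.get(source)
    if template is None:
        raise KeyError(f"no URI pattern recorded for source {source!r}")
    return template.format(id="".join(identifier.split()))
```

A table of this kind does not say whether the resulting URI identifies an authority or a real world object; that distinction would need to be recorded separately for each source.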

IV.4.3. Revisions to OCLC handling of HTTP URIs [Charge 3.1]

The question arises as to whether it would be better for catalogers to enter all needed URIs directly into the shared bibliographic record in WorldCat, or whether OCLC should provide options for output of URIs based on data present in particular MARC fields and profiled library preferences. Clearly, some libraries will embrace use of URIs for their web-based catalogs, while others may find them problematic in local displays of bibliographic information. OCLC staff have looked into the issue and believe that the use of output options would likely produce more consistent results as well as meet the varying needs of libraries.

The TG members are drafting a spreadsheet outlining the subfields that together name an entity for which a corresponding URI could be added in subfield $0. That spreadsheet will be useful as the basis for future specifications for use by OCLC system developers. It will allow for a comparison of what is desired by the PCC cataloging community in terms of URIs corresponding to the entire named entity versus the existing use of subfield $0 and subfield-$0-like information used in OCLC heading controlling functionality. That heading control functionality allows for control numbers in multiple $0 subfields corresponding to different parts of a named entity, i.e. corporate name hierarchies, names and titles, subjects and separately controlled subdivisions, etc. These are cases where output of multiple URIs corresponding only to part of the named entity would not be preferred.

OCLC cataloging policies in this area are expected to evolve as this TG makes recommendations and OCLC development work moves ahead on the proposed output options for URIs.

IV.5. Next steps and in-depth analyses in year 2 [Charge 3, Charge 4]

In 2016-2017 the TG will continue an agenda focused on practical outcomes. Work is already well advanced on several of the following items:

IV.5.1. In collaboration with OCLC, develop a specification for outputting URIs based on internal linkages present in WorldCat data
IV.5.2. Complete the MARC object/URI reconciliation document and seek to incorporate the information into formal MARC documentation
IV.5.3. Produce work ID recommendation and use it in pilot implementation
IV.5.4. Produce discussion paper or proposal for handling relationships in MARC
IV.5.5. Consider additional targeted reconciliation projects
IV.5.6. In consultation with stakeholders, evaluate need for additional MARC proposals or best practices
IV.5.7. RWO recommendations
IV.5.8. Identify "homes" in PCC or elsewhere for aspects of the TG's work that will need further exploration or continuing upkeep
IV.5.9. Outreach, advocacy, training
IV.5.10. Etc.

V. RECOMMENDATIONS TO STAKEHOLDERS

During its first year the TG was very much focused on the needs and interests of the many different stakeholders. This is reflected both in the outcomes of the work completed so far (see Sec. IV.4. Outcomes) and in the plans laid out for year 2 (see Sec. IV.5. Next steps and in-depth analyses in year 2). After careful consideration, the TG proposes the implementation of URIs in MARC for the near term. The sooner this process can begin, the sooner the data providers, e.g. libraries, can produce data that can be more easily transformed into linked data. In order to facilitate progress towards this goal, the TG developed the recommendations already outlined in the report above, such as the spreadsheet identifying the phase 1 entities for identities, i.e. the subfields that together name an entity in each MARC field (see Sec. IV.4.3. Revisions to OCLC handling of HTTP URIs), and the draft document "Formulating and Obtaining URIs: A Guide to Commonly Used Vocabularies and Reference Sources". The TG hopes that this document could be used as a starting point to develop an official list of PCC-sanctioned initial source vocabularies for embedding URIs.

For the sake of consistency, expediency, and accuracy, it is advisable to use automated processes for populating MARC records with URIs. Having individual catalogers do this work manually is not a desirable practice and could be less efficient. Several possible ways to accomplish this goal have been outlined in this report (see Secs. IV.2.4.1 MarcEdit, IV.2.4.2 Authority Toolkit, and IV.4.3 Revisions to OCLC handling of HTTP URIs).

Outreach, advocacy, and training will be a core goal of phase 2. The TG is planning on working closely with stakeholders, such as other PCC committees, to influence cataloging policies and best practices that have been identified as problematic for the implementation of URIs in MARC.

Training needs related to implementation (for example, how to obtain URIs, or the difference between authorities and real world objects) will be communicated to the PCC Standing Committee on Training so that appropriate training can be either identified or developed.

Though MARC is the most prominently used schema for library metadata, it is frequently used alongside many others that may or may not allow for the inclusion of URIs. In addition to that concern are the maintenance of identifiers, recommendations in relation to reconciliation, and possible ILS functional requirements. The TG on URIs in MARC is recommending that new TGs be formed concerning URIs for non-MARC metadata.

VI. REFERENCES

1. The subgroup Work IDs in MARC has identified potential fields and scenarios to accommodate a work identifier (or multiple work identifiers). Considerations have been given to legacy data, whether a work identifier (ID) is already established in an authority format or not (7XX $t, 1XX/240), an unambiguous relationship of a work ID among various vocabularies (024), relationships among variants of a work, etc. The subgroup will present recommendations to the community in 2017.

Links: Meetings of the MARC Advisory Committee, Agendas and Minutes

2015-06 MAC meeting: http://www.loc.gov/marc/mac/an2015_age.html http://www.loc.gov/marc/mac/minutes/an-15.html

2016-01 MAC meeting: http://www.loc.gov/marc/mac/mw2016_age.html http://www.loc.gov/marc/mac/minutes/mw-16.html

2016-06 MAC meeting: http://www.loc.gov/marc/mac/an2016_age.html http://www.loc.gov/marc/mac/minutes/an-16.html

Papers

Informal discussion paper: URIs in MARC: A Call for Best Practices (Steven Folsom, Discovery Metadata Librarian, Cornell University) https://docs.google.com/document/d/1fuHvF8bXH7hldY_xJ7f_xn2rP2Dj8o-Ca9jhHghIeUg/edit?pli=1

Discussion Paper No. 2016-DP04: Extending the Use of Subfield $0 to Encompass Linking Fields in the MARC 21 Bibliographic Format (British Library) http://www.loc.gov/marc/mac/2016/2016-dp04.html

Discussion Paper No. 2016-DP05: Expanding the Definition of Subfield $w to Encompass Standard Numbers in the MARC 21 Bibliographic and Authority Formats (British Library) http://www.loc.gov/marc/mac/2016/2016-dp05.html

Discussion Paper No. 2016-DP17: Redefining Subfield $4 to Encompass URIs for Relationships in the MARC 21 Authority and Bibliographic Formats (British Library in consultation with the PCC Task Group on URIs in MARC) http://www.loc.gov/marc/mac/2016/2016-dp17.html

Discussion Paper No. 2016-DP18: Redefining Subfield $0 to Remove the Use of Parenthetical Prefix "(uri)" in the MARC 21 Authority, Bibliographic, and Holdings Formats (PCC Task Group on URIs in MARC in consultation with the British Library) http://www.loc.gov/marc/mac/2016/2016-dp18.html

Discussion Paper No. 2016-DP19: Adding Subfield $0 to Fields 257 and 377 in the MARC 21 Bibliographic Format and Field 377 in the MARC 21 Authority Format (PCC URI in MARC Task Group) http://www.loc.gov/marc/mac/2016/2016-dp19.html

MARC Format Overview: Status Information http://www.loc.gov/marc/status.html

Examples for Sec. IV.2.1

This LC subject heading string is linked to three different authority records. The links are OCLC's ARNs. No single $0 could be output for this subject access point.

650 0 ǂa Neurologists<Link2068890> ǂz New Zealand<Link255121> ǂv Biography<Link4933801>

This medical subject string is linked to one authority record, although the controlling process links individual subfields. It is a candidate for output of a single $0 with a URI because the links all refer to the single authority record. In the case of MeSH, unlike LCSH, the $0 subfield displays in Connexion. See OCLC record 957132118.

650 12 ǂa Neurology<Link(DNLM)D009462Q000266> ǂx history<Link(DNLM)D009462Q000266>

Displays as:

650 12 Neurology ǂx history ǂ0 (DNLM)D009462Q000266

So it could be output with a single $0 containing the corresponding URI for the MeSH heading.
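The decision illustrated by the two examples above can be sketched in Python as follows. This is only an illustration under stated assumptions: the tuple layout for a controlled subfield (code, text, link) and the function name are invented for this sketch and do not describe OCLC's internal data model or any planned output specification.

```python
from typing import List, Optional, Tuple

# Hypothetical representation of one controlled subfield: (code, text, link).
Subfield = Tuple[str, str, str]


def single_link(subfields: List[Subfield]) -> Optional[str]:
    """Return the shared link if every subfield points to the same authority
    record; return None when the links differ or there are no subfields."""
    links = {link for _code, _text, link in subfields}
    return links.pop() if len(links) == 1 else None


# The LCSH string above: three different authority records, so no single $0.
lcsh_field = [
    ("a", "Neurologists", "2068890"),
    ("z", "New Zealand", "255121"),
    ("v", "Biography", "4933801"),
]

# The MeSH string above: both subfields link to one authority record.
mesh_field = [
    ("a", "Neurology", "(DNLM)D009462Q000266"),
    ("x", "history", "(DNLM)D009462Q000266"),
]

print(single_link(lcsh_field))
print(single_link(mesh_field))
```

Turning the shared control number into the URI to be output in $0 is a separate step that depends on the URI pattern published by the vocabulary in question.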

Page13 22 of 22

Page 18: Task Groupon URIsinMARC !! Year OneReport · 06-10-2016  · The first year since the inception of theURI in MARCTaskGroup (TG)began, despite the extremely ... believedfollowing the

The third paper Discussion Paper No 2016-shy‐DP17 Redefining Subfield $4 to Encompass URIs for Relationships in13 the MARC13 21 Authority and13 Bibliographic Formats was presented13 by the British13 Library in consultation with the TG13 This paper generated vivid discussions It was acknowledged that the13 approach to recording URIs for relationships using subfield $ was preferable13 to any of the13 other alternatives outlined by the13 paper The13 distinction between relator codes and relationship codes in the MARC format was questioned As of now an across-shy‐the-shy‐board13 solution13 for recording URIs for any data element in MARC subfield or field seems to be13 preferred by NDMSO over what it regards as an ad hoc solution for single elements This13 discussion will be continued13 this paper should not be considered in isolation but rather in the context of the other13 papers which the TG13 in is the process of submitting Taken as whole it is hoped that they will achieve the comprehensive solution which is sought throughout the MARC13 formats

IV42 Formulating amp Obtaining URI document [Charge 32]

A draft document was for commonly used13 sources for authorities and13 identifiers For each source screen captures13 were made showing where a URI could be found for a particular entity or how to formulate a URI once the identifier13 for13 the entity is known Before making this document available widely it must be determined how13 best to organize it Some13 resources provide13 URIs that directly represent13 a thing and others provide URIs that reference an authority (eg13 controlled or standard vocabularies which may or may not have underlying metadata about the thing) or a resource describing a thing The document needs to be able to distinguish this and inform catalogers which URIs are for real world objects and which are not In order to be helpful to developers building tools the document13 intends to also include descriptions of how data sources provide machine access to the data13 Is the data published13 as Linked13 Data available through13 http available through13 a SPARQL endpoint data dumps etc13 Another issue that13 must13 be determined is where to put13 the final document and how it13 will be maintained Should it be13 cooperatively maintained by the13 community (such as on a wiki) or should some group within PCC take responsibility for keeping it up to date and adding to it

IV43 Revisions to13 OCLC13 handling13 of HTTP URIs [Charge 31]

The question arises as to whether it would be better for catalogers to enter all needed URIs directly into the shared bibliographic record in WorldCat or whether OCLC13 should13 provide options for output of URIs based13 o data present in13 particular MARC13 fields and13 profiled13 library preferences Clearly some libraries will embrace use of URIs for their webb based13 catalogs while others may find13 them problematic in local13 displays of bibliographic information OCLC staff have looked into the issue and believe that the use of output options would13 likely produce more consistent results as well as meet the varying needs of libraries

The TG members are drafting a spreadsheet outlining the subfields that together name an entity for which a corresponding URI could be added in subfield $0 That spreadsheet will be useful as the basis for13 future specifications for13 use by OCLC system developers It will13 allow for a comparison of what is desired13 by the PCC13 cataloging community in13 terms of URIs corresponding to13 the entire named13 entity versus the existing13 use of subfield $0 and subfieldb $0b like information used in OCLC heading controlling functionality13 That heading control functionality allows for control numbers in multiple $ subfields corresponding to different parts13 of a named entity ie corporate name hierarchies names13 and titles

Page13 18 of 22

subjects13 and separately controlled subdivisions etc These are cases13 where output of multiple URIs13 corresponding only13 to part of the named entity13 would not be preferred

OCLC cataloging policies in this area are expected to evolve as this TG makes recommendations andOCLC development work moves ahead on the proposed13 output options for URIs

IV5 Next steps and in-depths analyses in year 2 [Charge 3 Charge 4]

In 2016-shy‐2017 the13 TG will continue13 an agenda13 focused on practical outcomes Work is already welladvanced on several of the13 following items

IV51 In collaboration with OCLC develop a specification for outputting URIs based on internal linkages present in13 WorldCat data

V52 Complete13 the13 MARC objectURI reconciliation document and seek toincorporate the information into formal13 MARC documentation

IV53 Produce13 work ID recommendation and use13 it in pilot implementationIV54 Produce13 discussion paper or proposal for handling relationships in MARCIV55 Consider additional targeted reconciliation projectsIV56 In consultation with stakeholders evaluate need for additional MARC

proposals or best practices IV57 RWO recommendationsIV58 Identify ldquohomesrdquo in PCC or elsewhere for aspects of the TGrsquos work that will

need13 further exploration13 or continuing upkeep IV59 Outreach advocacy trainingIV510 Etc

V RECOMMENDATIONS TO STAKEHOLDERS

During its first year the TG13 was very much focused on the needs and interests of the many differentstakeholders This13 is13 reflected both in the outcomes13 of the work completed so far13 (see Sec IV4Outcomes as well as in the plans laid out13 for13 year13 2 (see Sec III 5 Next steps and in-shy‐depths analysis in13 year 2) After careful consideration the TG proposes the implementation13 of URIs in13 MARC13 for thenear-shy‐term The sooner13 this process can begin the sooner13 the data providers eg libraries can producethe data that13 can be more easily transformed into linked data In order13 to facilitate progress towardsthis goal the TG developed the recommendations already outlined13 in13 the report above such13 as thespreadsheet identifying the phase 1 entities13 for identities ie the subfields13 that together name an entityin each MARC field (see Sec IV43 Revisions to OCLC handling of HTTP13 URIs) and the draft13 document13 Formulating13 an Obtaining13 URIs A Guide to13 Commonly Used13 Vocabularies an Reference Sources TheTG hopes that this document could be used as starting point to develop an official list of PCCsanctioned initial source vocabularies13 for embedding URIs

For the sake of13 consistency expediency and accuracy it13 is advisable to use automated processes for13 populating MARC13 records with13 URIs Individual catalogers doing this work manually is not a desirable

Page13 19 of 22

practice and13 could13 be less efficient13 Several13 possible ways to accomplish this goal have been outlined in this report13 (see Secs IV241 MarcEdit IV2 42 Authority Toolkit and IV43 Revisions to OCLC handling of HTTP URIs)

Outreach advocacy and training will be a core goal of phase 2 The TG is planning on working closely with stakeholders such as other PCC committees to influence cataloging policies and best practices that have been13 identified13 problematic for the implementation13 of URIs in13 MARC

Training needs related to implementation (for13 example13 how to obtain URIs or the13 difference13 between authorities and real world objects) will be13 communicated to the13 PCC Standing Committee13 on Training so that13 appropriate training can be either13 identified or13 developed

Though MARC is the most prominently13 used schema for library13 metadata it is frequently13 used alongside many others that may or may not allow for the inclusion of URIs In addition to that concern are the maintenance of identifiers recommendation in relation to reconciliation and possible13 ILS13 functional requirements The TG on URIs in MARC is recommending that13 new TGs be formed13 concerning URIs for non-shy‐MARC metadata

VI REFERENCES

1 The subgroup Work IDs in MARC has identified potential fields13 and scenarios13 to accommodate a work identifier (or multiple work identifiers)13 Considerations have been given to legacy data whether a work identifier (ID) already established in an authority format13 or not (7XX $t13 1XX240)13 An unambiguous relationship of13 a work ID among various vocabularies (024) and relationships among variant of a work etc The subgroup will present recommendations to the13 community in 2017

Links Meetings of the MARC Advisory Committee Agendas and Minutes

2015-shy‐0613 MAC meeting httpwwwlocgovmarcmacan2015_agehtmlhttpwwwlocgovmarcmacminutesan-shy‐15html

2016-shy‐0113 MAC meeting httpwwwlocgovmarcmacmw2016_agehtmlhttpwwwlocgovmarcmacminutesmw-shy‐16html

2016-shy‐0613 MAC meeting httpwwwlocgovmarcmacan2016_agehtmlhttpwwwlocgovmarcmacminutesan-shy‐16html

Papers

Page13 20 of 22

Informal13 discussion paper13 URIs in MARC13 A Call13 for Best Practices (Steven Folsom Discovery Metadata13 Librarian Cornell University) httpsdocsgooglecomdocumentd1fuHvF8bXH7hldY_xJ7f_xn2rP2Dj8o-shy‐Ca9jhHghIeUgeditpli=1

Discussion Paper No 2016-shy‐DP04 Extending the Use of Subfield $0 to Encompass Linking Fields in theMARC 21 Bibliographic Format (British Library) httpwwwlocgovmarcmac20162016-shy‐dp04htmlDiscussion Paper No 2016-shy‐DP05 Expanding the Definition of Subfield $w to Encompass StandardNumbers in the MARC 21 Bibliographic and Authority Formats (British Library) httpwwwlocgovmarcmac20162016-shy‐dp05html

Discussion Paper No 2016-shy‐DP17 Redefining Subfield $4 to Encompass URIs for Relationships in theMARC 21 Authority and Bibliographic Formats (British Library in consultation with the PCC Task Groupo URIs in13 MARC) httpwwwlocgovmarcmac20162016-shy‐dp17html

Discussion Paper No 2016-shy‐DP18 Redefining Subfield $0 to Remove the Use of Parenthetical Prefix(uri)13 in the MARC 21 Authority Bibliographic and13 Holdings Formats (PCC13 Task Group13 o URI in13 MARC13 in consultation with the British Library) httpwwwlocgovmarcmac20162016-shy‐dp18html

Discussion Paper No 2016-shy‐DP19 Adding Subfield $0 to Fields 257 and 377 in the MARC 21 BibliographicFormat and Field 37 in the13 MARC 2 Authority Format (PCC URI in MARC Task Group) httpwwwlocgovmarcmac20162016-shy‐dp19html

MARC Format Overview Status Information httpwwwlocgovmarcstatushtml

Examples for Sec IV21

This LC subject heading string is linked to three different authority records The links are OCLCrsquosARNs No single13 $0 could be13 output for this subject access point

650 0 ǂa NeurologistsltLink2068890gt ǂz New ZealandltLink255121gt ǂv BiographyltLink4933801gt

This medical subject string is linked to one authority record although the controlling process linksindividual subfields It is a candidate for output of a single $0 with a URI13 because the links all13 refer to thesingle authority record In the case of MeSH unlike LCSH the $0 subfield displays in Connexion SeeOCLC record 957132118

650 12 ǂa NeurologyltLink(DNLM)D009462Q000266gt ǂx historyltLink(DNLM)D009462Q000266gt

Page13 21 of 22

Displays as650 12 Neurology ǂx history ǂ0 (DNLM)D009462Q000266

So it could be13 output with single13 $ containing the13 corresponding URI for the13 MeSH heading

Page13 22 of 22

Page 19: Task Groupon URIsinMARC !! Year OneReport · 06-10-2016  · The first year since the inception of theURI in MARCTaskGroup (TG)began, despite the extremely ... believedfollowing the

subjects13 and separately controlled subdivisions etc These are cases13 where output of multiple URIs13 corresponding only13 to part of the named entity13 would not be preferred

OCLC cataloging policies in this area are expected to evolve as this TG makes recommendations andOCLC development work moves ahead on the proposed13 output options for URIs

IV5 Next steps and in-depths analyses in year 2 [Charge 3 Charge 4]

In 2016-shy‐2017 the13 TG will continue13 an agenda13 focused on practical outcomes Work is already welladvanced on several of the13 following items

IV.5.1. In collaboration with OCLC, develop a specification for outputting URIs based on internal linkages present in WorldCat data.
IV.5.2. Complete the MARC object/URI reconciliation document and seek to incorporate the information into formal MARC documentation.
IV.5.3. Produce work ID recommendation and use it in pilot implementation.
IV.5.4. Produce discussion paper or proposal for handling relationships in MARC.
IV.5.5. Consider additional targeted reconciliation projects.
IV.5.6. In consultation with stakeholders, evaluate need for additional MARC proposals or best practices.
IV.5.7. RWO recommendations.
IV.5.8. Identify "homes" in PCC or elsewhere for aspects of the TG's work that will need further exploration or continuing upkeep.
IV.5.9. Outreach, advocacy, training.
IV.5.10. Etc.

V. RECOMMENDATIONS TO STAKEHOLDERS

During its first year the TG was very much focused on the needs and interests of the many different stakeholders. This is reflected both in the outcomes of the work completed so far (see Sec. IV.4. Outcomes) as well as in the plans laid out for year 2 (see Sec. IV.5. Next steps and in-depth analyses in year 2). After careful consideration, the TG proposes the implementation of URIs in MARC for the near term. The sooner this process can begin, the sooner the data providers, e.g. libraries, can produce the data that can be more easily transformed into linked data. In order to facilitate progress towards this goal, the TG developed the recommendations already outlined in the report above, such as the spreadsheet identifying the phase 1 entities for identities, i.e. the subfields that together name an entity in each MARC field (see Sec. IV.4.3. Revisions to OCLC handling of HTTP URIs), and the draft document Formulating and Obtaining URIs: A Guide to Commonly Used Vocabularies and Reference Sources. The TG hopes that this document could be used as a starting point to develop an official list of PCC-sanctioned initial source vocabularies for embedding URIs.

For the sake of consistency, expediency, and accuracy, it is advisable to use automated processes for populating MARC records with URIs. Individual catalogers doing this work manually is not a desirable practice and could be less efficient. Several possible ways to accomplish this goal have been outlined in this report (see Secs. IV.2.4.1. MarcEdit, IV.2.4.2. Authority Toolkit, and IV.4.3. Revisions to OCLC handling of HTTP URIs).

Outreach, advocacy, and training will be a core goal of phase 2. The TG is planning on working closely with stakeholders, such as other PCC committees, to influence cataloging policies and best practices that have been identified as problematic for the implementation of URIs in MARC.

Training needs related to implementation (for example, how to obtain URIs, or the difference between authorities and real world objects) will be communicated to the PCC Standing Committee on Training so that appropriate training can be either identified or developed.

Though MARC is the most prominently used schema for library metadata, it is frequently used alongside many others that may or may not allow for the inclusion of URIs. In addition to that concern are the maintenance of identifiers, recommendations in relation to reconciliation, and possible ILS functional requirements. The TG on URIs in MARC is recommending that new TGs be formed concerning URIs for non-MARC metadata.

VI. REFERENCES

1. The subgroup Work IDs in MARC has identified potential fields and scenarios to accommodate a work identifier (or multiple work identifiers). Considerations have been given to legacy data, whether a work identifier (ID) is already established in an authority format or not (7XX $t, 1XX/240), an unambiguous relationship of a work ID among various vocabularies (024), relationships among variants of a work, etc. The subgroup will present recommendations to the community in 2017.

Links: Meetings of the MARC Advisory Committee, Agendas and Minutes

2015-06 MAC meeting
http://www.loc.gov/marc/mac/an2015_age.html
http://www.loc.gov/marc/mac/minutes/an-15.html

2016-01 MAC meeting
http://www.loc.gov/marc/mac/mw2016_age.html
http://www.loc.gov/marc/mac/minutes/mw-16.html

2016-06 MAC meeting
http://www.loc.gov/marc/mac/an2016_age.html
http://www.loc.gov/marc/mac/minutes/an-16.html

Papers

Informal discussion paper: URIs in MARC: A Call for Best Practices (Steven Folsom, Discovery Metadata Librarian, Cornell University) https://docs.google.com/document/d/1fuHvF8bXH7hldY_xJ7f_xn2rP2Dj8o-Ca9jhHghIeUg/edit?pli=1

Discussion Paper No. 2016-DP04: Extending the Use of Subfield $0 to Encompass Linking Fields in the MARC 21 Bibliographic Format (British Library) http://www.loc.gov/marc/mac/2016/2016-dp04.html

Discussion Paper No. 2016-DP05: Expanding the Definition of Subfield $w to Encompass Standard Numbers in the MARC 21 Bibliographic and Authority Formats (British Library) http://www.loc.gov/marc/mac/2016/2016-dp05.html

Discussion Paper No. 2016-DP17: Redefining Subfield $4 to Encompass URIs for Relationships in the MARC 21 Authority and Bibliographic Formats (British Library, in consultation with the PCC Task Group on URIs in MARC) http://www.loc.gov/marc/mac/2016/2016-dp17.html

Discussion Paper No. 2016-DP18: Redefining Subfield $0 to Remove the Use of Parenthetical Prefix "(uri)" in the MARC 21 Authority, Bibliographic, and Holdings Formats (PCC Task Group on URI in MARC, in consultation with the British Library) http://www.loc.gov/marc/mac/2016/2016-dp18.html

Discussion Paper No. 2016-DP19: Adding Subfield $0 to Fields 257 and 377 in the MARC 21 Bibliographic Format and Field 377 in the MARC 21 Authority Format (PCC URI in MARC Task Group) http://www.loc.gov/marc/mac/2016/2016-dp19.html

MARC Format Overview: Status Information http://www.loc.gov/marc/status.html

Examples for Sec. IV.2.1

This LC subject heading string is linked to three different authority records. The links are OCLC's ARNs. No single $0 could be output for this subject access point.

650 0 ǂa Neurologists<Link2068890> ǂz New Zealand<Link255121> ǂv Biography<Link4933801>

This medical subject string is linked to one authority record, although the controlling process links individual subfields. It is a candidate for output of a single $0 with a URI, because the links all refer to the single authority record. In the case of MeSH, unlike LCSH, the $0 subfield displays in Connexion. See OCLC record 957132118.

650 12 ǂa Neurology<Link(DNLM)D009462Q000266> ǂx history<Link(DNLM)D009462Q000266>

Displays as:

650 12 Neurology ǂx history ǂ0 (DNLM)D009462Q000266

So it could be output with a single $0 containing the corresponding URI for the MeSH heading.
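The test applied in the two examples above (a single $0 is possible only when every controlled subfield links to the same authority record) can be sketched in a few lines of Python. This is an illustration only, not OCLC's controlling logic: the tuple representation of a controlled subfield, the function names single_link and to_uri, and the base URI http://example.org/mesh/ are all assumptions made for the sketch.

```python
import re
from typing import List, Optional, Tuple

# Hypothetical representation of a controlled subfield:
# (subfield code, heading text, link to an authority record).
Subfield = Tuple[str, str, str]


def single_link(subfields: List[Subfield]) -> Optional[str]:
    """Return the shared link if all subfields point to one authority record."""
    links = {link for _code, _text, link in subfields}
    # Exactly one distinct link means a single $0 can represent the whole string.
    return next(iter(links)) if len(links) == 1 else None


def to_uri(link: str, base: str) -> str:
    """Drop a leading parenthetical prefix such as (DNLM) and append to a base URI.

    The base URI is supplied by the caller; the one used below is a placeholder.
    """
    identifier = re.sub(r"^\([^)]*\)", "", link)
    return base + identifier


# The two 650 fields from the examples above.
lcsh = [
    ("a", "Neurologists", "2068890"),
    ("z", "New Zealand", "255121"),
    ("v", "Biography", "4933801"),
]
mesh = [
    ("a", "Neurology", "(DNLM)D009462Q000266"),
    ("x", "history", "(DNLM)D009462Q000266"),
]

lcsh_link = single_link(lcsh)   # three different records, so no single $0
mesh_link = single_link(mesh)   # one shared record, so a single $0 is possible
mesh_uri = to_uri(mesh_link, "http://example.org/mesh/") if mesh_link else None
```

Under these assumptions the LCSH string yields no shared link, while the MeSH string yields one identifier that could be turned into a URI once the real base URI for the vocabulary is known.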


