Automating Name Authority Record Updates and Bibliographic File Maintenance
Catalog Management Interest Group, ALA Annual, Chicago, IL, June 29, 2013
Lucas Mak, Michigan State University Libraries
A Proof of Concept
Authority Control at MSU
1.5 million authority records (1.1 million NARs)
In-house
NACO institution
Database maintenance
Post-cataloging authority control
New Headings Report
• Download NARs from SkyRiver
Updates to NARs not necessarily caught
• 1XX (no item cataloged under the changed 1XX, so it is not in the New Headings Report)
• Elements other than 1XX (e.g. 4XX, 670)
LC/NACO NAF RDA Transition
PCC Day 1 for RDA NAR: Mar. 31, 2013
Phased reissuance of NARs
Phase 1
• Scope
– NARs with characteristics known to be at variance with RDA practice
– Not candidates for any of the mechanical changes to be made during Phase 2
• Adding a 667 note “THIS 1XX FIELD CANNOT BE USED UNDER RDA UNTIL THIS RECORD HAS BEEN REVIEWED AND/OR UPDATED”
– Completed Aug. 20, 2012 (436,943 records processed)
Phase 2
• Programmatic changes to 1XX headings that are not acceptable under RDA (e.g., changes to Bible headings, spelling out “Dept.” and months, abbreviations in subfield $d for personal names, etc.)
• Completed March 27, 2013 (371,942 records changed)
Updates of NARs by NACO institutions
Reviewing, upgrading, and recoding Phase 1 records to RDA
Adding any of the 17 new MARC fields (e.g. 046, 372, etc.)
Routine NAR maintenance
• PCC post-RDA test guidelines “strongly encourage” evaluating and recoding the “RDA-acceptable AACR2 NARs” to RDA whenever possible
Objectives
To catch changes to NARs
Changes in 1XX
Addition, deletion, or update of elements other than 1XX
To perform related BFM if the 1XX in a NAR is changed
Tasks
To download NARs one by one or in bulk
To detect updates to NARs already existing in the ILS
To overlay existing NARs with updated ones
To update authorized access points (AAPs) in bib records if the 1XX in a NAR is updated
To automate and link up the above tasks
Task #1: Download NARs
OCLC LCNAF SRU Service
Can be searched by LCCN
Available in multiple schemas, including MARCXML
SRU-based service (HTTP request)
FREE!!
But:
• Updated every Monday night
• Bulk download – by search term (e.g. after a certain date)
Implementation
• Search LCCNs one by one with an AutoIt script
– Around 10 records/sec. retrieved
• Download XML files into one folder (files named by LCCN)
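The one-by-one search amounts to composing one SRU searchRetrieve request per LCCN. The sketch below is in Python rather than AutoIt and only builds the URL and the output file name. The base URL, the CQL index name `local.lccn`, and the `marcxml` schema identifier are placeholders assumed for illustration, not the actual OCLC service settings; only the standard SRU parameter names are taken as given.

```python
from urllib.parse import urlencode

# Hypothetical placeholders: substitute the real service address and index name.
SRU_BASE_URL = "https://example.org/lcnaf/sru"
LCCN_INDEX = "local.lccn"


def build_sru_url(lccn, base_url=SRU_BASE_URL, index=LCCN_INDEX):
    """Build an SRU searchRetrieve URL that asks for one record as MARCXML."""
    params = {
        "operation": "searchRetrieve",   # standard SRU operation name
        "version": "1.1",
        "query": f'{index} = "{lccn}"',  # CQL query on the assumed LCCN index
        "recordSchema": "marcxml",       # schema identifier is an assumption
        "maximumRecords": "1",
    }
    return base_url + "?" + urlencode(params)


def output_filename(lccn):
    """Name the downloaded file by LCCN, with spaces removed."""
    return lccn.replace(" ", "") + ".xml"
```

Each URL can then be fetched with any HTTP client and the response saved under `output_filename(lccn)`, matching the "files named by LCCN" convention above.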
Task #2: NAR Update Detection
Compare NARs from the ILS with NARs from the LC/NACO NAF by XSLT
MARC 005 (timestamp)
If the timestamp is more current on the NAR from the NAF, overlay the NAR in the ILS
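The timestamp test can be sketched as follows. The deck does this comparison in XSLT; this Python version is only an illustration. It relies on the MARC 005 layout `yyyymmddhhmmss.f`, and the function names are made up for the example. A record with no 005 at all is not handled here.

```python
from datetime import datetime


def parse_005(value):
    """Parse a MARC 005 value (yyyymmddhhmmss.f) into a datetime."""
    return datetime.strptime(value.strip(), "%Y%m%d%H%M%S.%f")


def needs_overlay(local_005, naf_005):
    """Return True when the NAF copy carries a later 005 than the ILS copy."""
    return parse_005(naf_005) > parse_005(local_005)
```

For example, `needs_overlay("20120820101500.0", "20130327074512.0")` is true, because the NAF copy was modified later.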
Task #3: Export/Overlay of NARs
MarcEdit
Export updated NARs into the ILS
Through TCP/IP (host address, port, .mrc file)
One by one (though a .mrc file can contain multiple NARs)
Task #4: Updates of Bib AAPs
XSLT
To detect changes in 1XX between old and new NARs
To build an AAP conversion table (a TXT file) when the 1XX is changed
AutoIt
Automate bib AAP updates through the “Global Update” module in the ILS
• Read old and new AAPs from the TXT file and fill out the info required in the “Global Update” process
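A rough Python equivalent of the 1XX comparison, for illustration only (the deck uses XSLT). It reads two MARCXML records and returns one conversion-table row when the heading differs. The tab-separated row layout and the function names are assumptions; the deck says only that the table is a TXT file.

```python
import xml.etree.ElementTree as ET

MARC_NS = {"marc": "http://www.loc.gov/MARC21/slim"}


def heading_1xx(marcxml):
    """Return (tag, joined subfield text) of the first 1XX field, or None."""
    record = ET.fromstring(marcxml)
    for field in record.iterfind(".//marc:datafield", MARC_NS):
        tag = field.get("tag", "")
        if tag.startswith("1"):
            parts = [sf.text or "" for sf in field.iterfind("marc:subfield", MARC_NS)]
            return tag, " ".join(parts)
    return None


def conversion_row(old_marcxml, new_marcxml):
    """Return 'old<TAB>new' when the 1XX differs between the two NARs, else None."""
    old, new = heading_1xx(old_marcxml), heading_1xx(new_marcxml)
    if old and new and old != new:
        return f"{old[1]}\t{new[1]}"
    return None
```

Rows collected this way, one per changed NAR, would make up the text file that the AutoIt script reads when filling in the "Global Update" form.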
Task #5: Automation
Use AutoIt to:
Link up various steps in the workflow
Automate searching against the OCLC LCNAF SRU Service by compiling and sending HTTP requests
Execute various XSLTs in a predetermined sequence
• e.g. NAR comparison, then AAP comparison
Read TXT files (LCCN list, AAP conversion table) created by XSLT processes
Run MarcEdit to overlay obsolete NARs
Execute the “Global Update” process
Basic Workflow
[Workflow diagram: ILS → (extract by Create Lists) → ILS NARs → (extract by XSLT) → LCCNs → (search by AutoIt, retrieve) → LC/NACO NARs → (compare by XSLT) → Updated NARs → (overlay by MarcEdit) → ILS; Updated Headings → (Global Update) → ILS]
Data Integrity Issue #1
No ILS ARN in extracted NARs
Needed for the 949 overlay command
Solution
• Extract “LCCN” & “ILS ARN” pairs through Create Lists
• Merge the ARN into extracted NARs (907$a) by XSLT/MarcEdit
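The merge step can be sketched in Python on MARCXML (the deck uses XSLT/MarcEdit). The lookup table is assumed to be a dictionary keyed by the LCCN with spaces removed, built from the Create Lists export; the function names and the sample ARN in the usage note are hypothetical.

```python
import xml.etree.ElementTree as ET

NS = "http://www.loc.gov/MARC21/slim"


def lccn_of(record):
    """Return the 010$a of a MARCXML record element, with spaces removed."""
    sf = record.find(f"{{{NS}}}datafield[@tag='010']/{{{NS}}}subfield[@code='a']")
    return sf.text.replace(" ", "") if sf is not None and sf.text else None


def add_arn(record, arn_by_lccn):
    """Append a 907$a holding the ARN paired with this record's LCCN."""
    arn = arn_by_lccn.get(lccn_of(record))
    if arn is None:
        return False  # no pair found; leave the record unchanged
    field = ET.SubElement(record, f"{{{NS}}}datafield",
                          {"tag": "907", "ind1": " ", "ind2": " "})
    ET.SubElement(field, f"{{{NS}}}subfield", {"code": "a"}).text = arn
    return True
```

Calling `add_arn(record, {"n85342238": "a1234567"})` on a record whose 010$a is `n  85342238` appends the 907 field and returns `True`. Records with no matching pair are left untouched.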
Data Integrity Issue #2
NARs without 010
010 contains the LCCN
Some LCCNs transposed into 035
• Original prefix (n, no, nb, nr) removed
• Prepended with prefix (OCoLC)
• Possibly done during system migration
Solution
1. Search the string in 035 (excl. prefix) as a keyword in SkyRiver
2. Retrieve the complete LCCN from the matched record
3. Search the retrieved LCCN against the OCLC service and download the record
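Step 1 needs the 035 string with its "(OCoLC)" prefix stripped. A small Python sketch, assuming the 035 value has the form "(OCoLC)" followed directly by the number; the function name is made up for the example.

```python
import re


def keyword_from_035(value):
    """Strip a leading '(OCoLC)' prefix and return the remaining string, or None."""
    match = re.match(r"^\(OCoLC\)\s*(\S+)$", value.strip())
    return match.group(1) if match else None
```

Values that do not carry the prefix return `None`, so they can be set aside rather than searched with a wrong keyword.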
Data Integrity Issue #3
Existing NARs without 005
No timestamp
• Bring in the new NAR whenever the old NAR lacks 005
Data Integrity Issue #4
Local data in NAR
Local call no. (e.g. 050, 090, 053$5)
Institution code & initials (shared catalog)
Copy local data into the new NAR before overlay
Search and Retrieval Issue #1
“Blank” XML file from OCLC LCNAF SRU Service
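Blank responses can be picked out by file size; the next slide puts the cutoff at under 2KB. A Python sketch, assuming the downloads sit in one folder as `<LCCN>.xml` and reading "2KB" as 2048 bytes (both are assumptions made for this example):

```python
from pathlib import Path

SIZE_THRESHOLD = 2 * 1024  # "< 2KB" from the slides, taken here as 2048 bytes


def blank_result_lccns(folder):
    """Return LCCNs (file stems) whose downloaded XML is under the threshold."""
    return sorted(
        p.stem for p in Path(folder).glob("*.xml")
        if p.stat().st_size < SIZE_THRESHOLD
    )
```

The returned list is the set of LCCNs to re-search by keyword.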
Search and Retrieval Issue #1 (Cont’d)
No hit for some LCCNs
XML file size: < 2KB
LCCNs in places other than 010$a
Not indexed
• Cancelled LCCNs (010$z)
Solution
1. Compile a list of LCCNs with file size < 2KB
2. Search LCCNs in SkyRiver by keyword
3. Get new LCCNs from 010$a
4. Search OCLC LCNAF SRU Service using the new LCCNs
But …
Search and Retrieval Issue #2
Keyword search in SkyRiver returns multiple hits
Undifferentiated & related NARs
Write LCCNs with multiple hits to a log file for manual review
Person broken out from undifferentiated NAR
Original undifferentiated NAR cancelled
Search and Retrieval Issue #2 (Cont’d)
• Keyword search in SkyRiver returns multiple hits
Same numeric part of LCCN with different prefixes
Write LCCNs with multiple hits to a log file for manual review
NAR contributed via RLIN
NAR contributed via OCLC
Search and Retrieval Issue #2 (Cont’d)
Keyword search in SkyRiver returns no hit
The LCCN in question no longer exists in NAF
• NAR containing the cancelled LCCN was cancelled again
– Loss of 010$z
• Write no-hit LCCNs into a log file for manual review
Search and Retrieval Issue #2 (Cont’d)
Keyword search in SkyRiver returns no hit
False negative
• Space between prefix and number removed
• Hyphen within number removed (e.g. n 85-342238 → n 85342238)
– Search normalized LCCNs
• Delay in returning the result of a search due to a slow or unstable Internet connection
– Set a longer wait time before trying to copy the new LCCN
– Run the keyword search in SkyRiver in a loop until
» the number of entries in the log file equals that of the immediately preceding round, or
» the file size of the no-hit log file equals zero
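Both remedies can be sketched together in Python. `normalize_lccn` drops spaces and the hyphen, and left-pads the part after the hyphen to six digits as LC's LCCN normalization rules do. `search_until_stable` repeats the search until nothing is left or a round resolves nothing new. The `search` callable stands in for the SkyRiver keyword search and is hypothetical, as is the `max_rounds` safety cap.

```python
def normalize_lccn(lccn):
    """Drop spaces; remove a hyphen and zero-pad the serial part to six digits."""
    compact = lccn.replace(" ", "")
    if "-" in compact:
        head, serial = compact.split("-", 1)
        compact = head + serial.zfill(6)
    return compact


def search_until_stable(lccns, search, max_rounds=10):
    """Re-run `search` on unresolved LCCNs until none remain or no progress is made.

    `search` takes a normalized LCCN and returns the new LCCN, or None for no hit.
    Returns (resolved mapping, list of LCCNs still without a hit).
    """
    resolved, pending = {}, list(lccns)
    for _ in range(max_rounds):
        still_missing = []
        for lccn in pending:
            hit = search(normalize_lccn(lccn))
            if hit is None:
                still_missing.append(lccn)
            else:
                resolved[lccn] = hit
        # Stop when the no-hit list is empty or the same size as last round.
        if not still_missing or len(still_missing) == len(pending):
            return resolved, still_missing
        pending = still_missing
    return resolved, pending
```

The second return value corresponds to the no-hit log that goes to manual review.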
Global Update Issues
ILS interface navigation
AAPs with diacritics
Found by search in Global Update module but couldn’t be replaced
Code points & exact match in Global Update
Old AAPs not found
Corresponding bib records deleted
“Orphan” NARs
Write LCCN to log file for manual review
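One plausible reading of the diacritics problem is that the same heading can be stored as different code-point sequences (precomposed letters versus base letters plus combining marks), so an exact match fails even though the strings look identical. Whether this is what happened in the Global Update module is an assumption. The Python sketch below only illustrates the effect, with a heading chosen as an example.

```python
import unicodedata


def same_heading(a, b):
    """Compare two headings after normalizing both to NFC."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


# Illustrative heading, written two ways.
precomposed = "Dvo\u0159\u00e1k, Anton\u00edn"    # single code points for the accented letters
decomposed = "Dvor\u030ca\u0301k, Antoni\u0301n"  # base letters + combining marks

exact_match = precomposed == decomposed           # code-point-for-code-point comparison
normalized_match = same_heading(precomposed, decomposed)
```

Here `exact_match` is false while `normalized_match` is true, so normalizing both sides to one form before matching removes this kind of mismatch.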
Revised Workflow
[Workflow diagram. Nodes: ILS, ILS NARs, ARN-LCCN, LCCNs, LC/NACO NARs, Updated NARs, Fishy NARs, Updated AAPs, Log File. Arrow labels: Extract, Extract, Merge, Search, Found & Retrieve, Not Found & Search, Retrieve New LCCN, Search, Not Found/Multiple Hits, Compare, Overlay by MarcEdit, Global Update, AAPs Not Found]
Test Results
82,398 NARs tested
81,362 NARs needed to be overlaid*
4,584 AAPs became obsolete
10,900 bib records had at least one heading flipped
* Many NARs exported from the ILS do not contain field 005
Limitations
Identities broken out from undifferentiated NARs can’t be detected
Partially taken care of by “New Headings Report”
AAPs that have no corresponding NARs
Non-Latin script parallel APs in field 880
Scalability issues
Slow export using MarcEdit
Slow “Global Update” process
Memory-intensive XSLT process
• “Java heap space” out-of-memory error
Possible Enhancements
“Data Exchange” module for NAR overlay
Data Exchange module – record load function
Manual intervention needed
SQL backend of Sierra (Sierra DNA)
Write SQL commands to batch changes
But the EDIT function is not yet available through SQL commands
AACP (Automatic Authority Control Processing)
Flip AAPs matching 4XX in NARs to the corresponding 1XX in an overnight process
Replace “Global Update” with AACP
• “Rig” updated NARs by inserting the obsolete AAP as a 4XX
• Export “rigged” NARs to the ILS to trigger the overnight process
• Overlay exported “rigged” NARs in the ILS with the original updated NARs
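The "rigging" step could look roughly like this in Python on MARCXML. It copies the updated NAR and adds the old 1XX as a see-from field, with the tag mapped from 1XX to 4XX (e.g. 100 → 400). This is a sketch of the idea only: the function name is hypothetical, the new field is appended at the end rather than placed in tag order, and how the ILS would treat such a record is untested here.

```python
import copy
import xml.etree.ElementTree as ET

NS = "http://www.loc.gov/MARC21/slim"


def rig_record(new_record, old_1xx):
    """Return a copy of new_record with the old 1XX added as a 4XX field."""
    rigged = copy.deepcopy(new_record)            # keep the original updated NAR intact
    tag_4xx = "4" + old_1xx.get("tag")[1:]        # e.g. 100 -> 400, 110 -> 410
    field = ET.SubElement(
        rigged, f"{{{NS}}}datafield",
        {"tag": tag_4xx, "ind1": old_1xx.get("ind1", " "), "ind2": " "},
    )
    for sf in old_1xx.findall(f"{{{NS}}}subfield"):
        ET.SubElement(field, f"{{{NS}}}subfield", {"code": sf.get("code")}).text = sf.text
    return rigged
```

Because the function works on a copy, the untouched updated NAR remains available for the final overlay in the last step.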