
Post on 12-Jan-2016

The DeathFlip Project: Killing Off Your Authors Practically Painlessly

Michael Kreyche, Systems Librarian
Amey Park, Database Maintenance Librarian

Kent State University
May 18-20, 2009

2

Summary

At a Glance

LC Policy Change—Adding Death Dates

Record Maintenance Nightmare

Brainstorming for Solutions

"DeathFlip" Authority Records

Mostly Automated Procedures

Record Creation

3

The Short Version:

library.kent.edu/deathflip

4

5

Typical DeathFlip Record

00239nz 2200097n 4500

008 080125n| acannaab |n aaa ¶

100 1 ‡a Burns, George, ‡d 1896-1996¶

400 1 ‡a Burns, George, ‡d 1896-¶

667 ‡a DeathFlip¶

670 ‡a 20060201¶

910 ‡a n 79065048 ¶

6

The General Idea

1. Load all the DeathFlip records. Twice.

2. Wait for AACP to run overnight.

3. Check your “Updated bibliographic headings” report.

4. Gloat.

5. Check your “Near matches” report.

6. Do a little cleanup.

7

FAQ

How many DF records do you have?
– 18,358 so far.

Did you say load them TWICE?
– Yes, once as names and once as subjects.

Should I load them all?
– Yes.

Won’t I get a lot of blind references reported?
– Yes. But it’s OK.

Won’t I get a lot of duplicates reported?
– Yes. But it’s OK. Really.

8

Interesting equation: b + d = tDF

If:
– Your authority file is up to date and
– You load a DeathFlip file

The following equation holds true for your reports:

Blind references + Duplicate records = Total DeathFlip records in file

9

20 Year Old LC Policy

“Don't add dates to existing headings”

Criticized by members of the library community, especially when headings
– are for famous dead people and
– include a birth date but lack a death date

Examples:
• 100 1# $a Warhol, Andy, $d 1928-
• 100 1# $a Dali, Salvador, $d 1904-
• 100 1# $a Nixon, Richard M. $q (Richard Milhous), $d 1913-
• 100 0# $a Diana, $c Princess of Wales, $d 1961-
• 100 0# $a John Paul $b II, $c Pope, $d 1920-

10

Death Dates Chronology

June 2005: LC Proposal

September 2005: LC Decisions

Fall 2005: LCRI 22.17 revision

2006: Big LC project to add death dates to "selected" headings

2007: KSU finds a solution

11

LC's Proposed Change, 2005

“Allow the optional addition of dates (birth, death or both) to existing personal name headings at will.... Catalogers would not be required to add dates to existing personal name headings (other than to resolve conflicts) but may exercise judgment and add the date or dates if these are judged to be useful.”

Comments?

http://www.loc.gov/catdir/cpso/pndatesorig.html

12

Comments

93 “yes” comments: wholeheartedly supporting the proposal as posted

28 “partial yes” comments: ... not the addition of dates “at will” ... reduce the initial impact of BFM*

12 “non-voting” comments

6 “no” comments: totally disapproved ... generally cited the impact of BFM*

* Bibliographic File Maintenance

13

A Few Specific Comments

Provide notification lists of the names changed in order to expedite BFM in local catalogs

Retain former heading in a 400 field to expedite BFM or machine “flipping” in some local systems

Use “b.” for all beginning dates, thus eliminating open dates altogether

14

LC's Key Decisions

Allow the optional addition of death dates to established headings that contain birth dates only

Investigate the development of a notification service for changed headings

Investigate changes to the MARC 21 authority format for coding “former headings” in a discrete MARC tag

15

OCLC Feed and Archive

http://www.oclc.org/rss/feeds/authorityrecords/default.htm

16

MARC Revision Chronology

December 2006: proposal published (http://www.loc.gov/marc/marbi/2007/2007-02.html)

January 2007: amended/approved by MARC Advisory Committee

May 2007: approved by LC/LAC/BL

October 2007: MARC21 Authorities Update 8

Proposal No. 2007-02: "Incorporating invalid former headings in 4XX fields of the MARC 21 Authority Format"

17

MARC21 Authorities Update 8

Two Changes
– Added code "h" for 4xx $w, byte 1 (No reference structures): Indicates that the reference is not valid in any reference structure [i.e. name/subject/series].
– Expanded usage of 4xx $i: When subfield $w/1 contains code h (No reference structures), subfield $i may contain the date that a heading became invalid.

18

Suppressing Display of “Flip” 4xx Fields

Existing coding (seems good enough to me):

100 1 ‡a Burns, George, ‡d 1896-1996¶

400 1 ‡wnnea ‡a Burns, George, ‡d 1896-¶

New coding:

100 1 ‡a Burns, George, ‡d 1896-1996¶

400 1 ‡wnhe ‡a Burns, George, ‡d 1896-¶

19

For the Time Being:

No LC 4xx for “Former Headings”

AACP can't flip the bib headings automatically.

Each new death date requires global changes or manual correction

Very unhappy authority control librarian!

20

Workflow Implications

Records from Vendor:
– updated authority record only
  • heading shows up on blind reference list
  • use global update when more than a few records
  • manual editing for onesies/twosies
– with a "current" bib record
  • older bibliographic records not reported: split file!!!
  • check all headings on the Death Date list?

21

Vendor Option (Backstage)

Developed for Wofford College
– Supply brief records to flip headings
– Supply full records to replace brief ones
– 72% increase in cost over existing service

Can we do it ourselves?

22

DeathFlip Plan A

Create DF records from OCLC feed

Load DF records with custom table
– Match existing records
– Add identifying field

Output a copy of matched records

Overlay matched records with DF records

Wait overnight for AACP

Overlay DF with original records

23

Thinking the Unthinkable

What if:

We loaded all the DeathFlip records
– Even if it meant two records per heading
– Even if we didn’t have bib headings

Would AACP work?

Yes!

24

DeathFlip Plan B

Create DF records from OCLC feed

Load all DF records
– Some will "duplicate" real records
– Some will report as blind references

Wait overnight for AACP

Suppress/Delete DF records

Repeat as needed

25

Typical Basic Workflow

Day 1
– Clear headings reports
– Load DeathFlip records

Day 2
– Generate reports
– Suppress or delete records
– Clear reports (wait till next day if suppressed)

Names/Subjects can be done separately

26

Last Quarterly Loads (March 2009)

                      Subjects   Names     All
Records loaded           1,115   1,115
Fields updated             247   3,497   3,744
Bib records updated        133   2,493   2,626
Headings                    24     333     577

27

Timing

Load DF before vendor records
– Flips retro records and the newly returned records.
– The report from loading the vendor (bslw) records shows lots of duplicate records, which can pretty much be ignored or scanned quickly. Blinds are worth looking at.

Otherwise:
– New records with a death date report as blind because non-current bib records haven't been flipped yet. You have to look at all of them because some really are legit (especially conference and corporate names).

28

Initial Record Loads: December 2007

                   Subjects   Names     All
Records loaded        9,949   9,949
Fields updated          461   4,078   4,539
Near matches            617     419   1,036

Headings from February 2006-August 2007

29

Testing the equation: b + d = tDF

                         Subjects   Names   All
tDF (records loaded)        9,949   9,949
b (blind references)        8,280   4,845
d (duplicate records)       1,554   4,965
b + d                       9,834   9,810
Missing records               115     139   254
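The check in this table is plain arithmetic: the missing count is tDF minus (b + d). A minimal Python sketch, using the counts reported for the initial load (the function name and dictionary layout are illustrative assumptions, not the authors' code):

```python
def missing_records(loaded, blind, duplicate):
    """Return tDF - (b + d): DeathFlip records that appear on neither report."""
    return loaded - (blind + duplicate)

# Counts reported on the slide for the initial load.
loads = {
    "Subjects": {"loaded": 9949, "blind": 8280, "duplicate": 1554},
    "Names": {"loaded": 9949, "blind": 4845, "duplicate": 4965},
}

missing = {index: missing_records(**counts) for index, counts in loads.items()}
for index, count in missing.items():
    print(f"{index}: {count} DeathFlip records unaccounted for")
print(f"All: {sum(missing.values())}")
```

A nonzero result means the equation does not balance for that index; the slides treat those records as pointers to authority records missing from the local file.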

30

Identifying Missing Authority Records

Tentative process:
1. Make a list of loaded DF records; export record number, 910, 1xx subject, 1xx name to text file
2. Make a list from the blind/duplicate report; export record number, 1xx subject, 1xx name to text file
3. Merge and sort text files; identify non-duplicated record numbers
4. Look up in OCLC by LCCN (910)
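Step 3 can also be done as a set difference on record numbers instead of a merge and sort. The Python sketch below assumes both exports are tab-delimited text with the record number in the first column, and that the report lists the DF records' own record numbers. The function names, column layout, and sample values are hypothetical.

```python
import csv
import io

def read_rows(text):
    """Parse tab-delimited export text into a dict keyed by record number (first column)."""
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    return {row[0]: row for row in reader if row}

def unreported(loaded_text, reported_text):
    """Return rows of loaded DF records whose record number is absent from the report."""
    loaded = read_rows(loaded_text)
    reported = read_rows(reported_text)
    return [loaded[number] for number in sorted(loaded.keys() - reported.keys())]

# Hypothetical sample data: record number, 910 (LCCN), heading.
loaded_export = (
    "a1001\tn 79065048\tBurns, George, 1896-1996\n"
    "a1002\tn 00000001\tExample, Name, 1900-1990\n"
)
# Hypothetical report export: record number, heading.
report_export = "a1001\tBurns, George, 1896-1996\n"

for number, lccn, heading in unreported(loaded_export, report_export):
    # Each remaining row is a candidate to look up in OCLC by its LCCN (step 4).
    print(f"{number}\t{lccn}\t{heading}")
```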

31

Timing for duplicate detection

Duplicate detection works best when your database is more up to date than the DeathFlip records.

If you get records from a vendor, load the DeathFlip records after the vendor records.

32

OCLC's RSS Feed

33

Processing the Feed—Raw Code

34

Step 1: Download Feed

Perl script saves current feed
– Covers about the last two months
– Older data migrated to HTML archive

Two stages:
– Initially, harvested archive
– Harvest feed at least every two months

Character encoding changes over time

35

Step 2: Extract Data

Another Perl script
– Parses feed
– Produces tab-delimited file
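The Perl script itself is not shown in the slides. As a rough illustration of the same step in Python, the sketch below turns the `<item>` elements of a generic RSS 2.0 document into tab-delimited text. The choice of elements (`title`, `pubDate`, `description`) and the sample item are assumptions; the actual content of the OCLC feed items may be laid out differently.

```python
import csv
import io
import xml.etree.ElementTree as ET

def feed_to_tsv(feed_xml, fields=("title", "pubDate", "description")):
    """Convert the <item> elements of an RSS 2.0 document to tab-delimited text."""
    root = ET.fromstring(feed_xml)
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    for item in root.iter("item"):
        # Missing elements become empty strings; runs of whitespace
        # (including tabs and newlines) inside a value collapse to one space.
        row = [" ".join((item.findtext(name) or "").split()) for name in fields]
        writer.writerow(row)
    return out.getvalue()

# Hypothetical feed fragment, for illustration only.
sample = """<rss version="2.0"><channel><title>Sample</title>
<item><title>Burns, George, 1896-1996</title><pubDate>Wed, 01 Feb 2006 00:00:00 GMT</pubDate>
<description>Old heading: Burns, George, 1896-</description></item>
</channel></rss>"""

print(feed_to_tsv(sample), end="")
```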

36

Step 3: Build MARC Record

Yet another Perl script
– Fixes delimiters and other characters
– Restructures data into MARC
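To make the target of this step concrete, here is a small Python sketch that assembles the variable fields of the typical DeathFlip record shown earlier and renders them in the same ‡-delimited display form. It is not the authors' script: the function names and argument layout are assumptions, and it does not produce the leader, the 008, or a binary MARC file.

```python
def deathflip_fields(name, birth, death, feed_date, lccn):
    """Return (tag, indicators, subfields) tuples mirroring the typical DeathFlip record."""
    return [
        ("100", "1", [("a", f"{name},"), ("d", f"{birth}-{death}")]),  # new heading with death date
        ("400", "1", [("a", f"{name},"), ("d", f"{birth}-")]),         # old open-dated heading to flip from
        ("667", "", [("a", "DeathFlip")]),                             # note identifying the record as DeathFlip
        ("670", "", [("a", feed_date)]),                               # date string, as in the example record
        ("910", "", [("a", lccn)]),                                    # LCCN, used later for OCLC lookups
    ]

def format_field(tag, indicators, subfields):
    """Render one field as 'TAG IND ‡x value ...', omitting blank indicators."""
    body = " ".join(f"‡{code} {value}" for code, value in subfields)
    parts = [tag] + ([indicators] if indicators else []) + [body]
    return " ".join(parts)

fields = deathflip_fields("Burns, George", "1896", "1996", "20060201", "n 79065048")
for field in fields:
    print(format_field(*field))
```

Keeping the fields as plain tuples makes it easy to hand them to whatever MARC writer is used locally.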

37

Occasional Retrospective Flip?

Unsuppress or reload earlier DF records to fix newly loaded bib records

March 2008 (3 months after first load):
– 17 fields changed

May 2009 (1½ years after first load):
– 122 fields changed for 36 headings

How long is it worthwhile???

38

Outcomes

Pluses
– Saves a huge amount of work
– Can find missing authority records

Minuses
– Doesn't catch TOC fields (970)
– Doesn't catch headings with $e, $t, etc.
– Sometimes the DF headings get updated

39

Original:
– Gimbutas, Marija Alseikaitė,|d1921-

DeathFlip:
– Gimbutas, Marija Alseikaitė,|d1921-1994

Latest:
– Gimbutas, Marija,|d1921-1994
– This heading reported as blind.
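In this example the latest heading differs from the original by more than an added death date, so it is no longer a simple flip. One way to spot such cases is to test whether a heading is exactly the original plus a death year. The Python sketch below does that under the assumption that dates are plain four-digit years; the function name is hypothetical and qualified dates such as "ca." or "?" are not handled.

```python
import re

def is_death_date_flip(old_heading, new_heading):
    """True when new_heading equals old_heading with a four-digit death year appended."""
    if not re.search(r"\d{4}-$", old_heading):
        return False  # the old heading has no open birth date to close
    pattern = re.escape(old_heading) + r"\d{4}"
    return re.fullmatch(pattern, new_heading) is not None

original = "Gimbutas, Marija Alseikaitė,|d1921-"
deathflip = "Gimbutas, Marija Alseikaitė,|d1921-1994"
latest = "Gimbutas, Marija,|d1921-1994"

print(is_death_date_flip(original, deathflip))  # the DeathFlip heading only adds the death year
print(is_death_date_flip(original, latest))     # the latest heading also changes the name
```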

40

Go For It!

library.kent.edu/deathflip

41

Bibliography

Addition of Death Dates to Personal Names. http://www.loc.gov/catdir/cpso/pndates.html. See also:

– http://www.loc.gov/catdir/cpso/pndatesorig.html

– http://www.loc.gov/catdir/cpso/deathdates.pdf

LC Proposal, Addition of Dates to Existing Personal Name Headings. June, 2005. http://www.loc.gov/catdir/cpso/pndatesorig.html

LC Analysis of Comments Received and Corresponding Decisions. September, 2005. http://www.loc.gov/catdir/cpso/deathdates.pdf

Policy on the Implementation of Revised LCRI 22.17, Notice of OCLC RSS Feed for Local Bibliographic File Maintenance. Fall 2005. http://www.loc.gov/catdir/cpso/lcri22_17imp.html

Park, Amey L. Death Dates Added to Some Personal Name Headings. TECHKNOW: A Quarterly Review of Bright Ideas For the Technical Services Division. Ohio Library Council. http://www.library.kent.edu/files/TechKNOW_July_2006.pdf