Implementing FRBR on Large Databases

Post on 23-Feb-2016


Implementing FRBR on Large Databases

Thomas Hickey
Diane Vizine-Goetz

OCLC Research

CNI 2002 Fall Task Force

What is FRBR
• IFLA study group report: Functional Requirements for Bibliographic Records
• Bibliographic model independent of cataloging rules
• Clusters bibliographic items into a four-level structure
  • Work
  • Expression
  • Manifestation
  • Item

Control of Entities in FRBR
[Diagram]
Entities: Item, Manifestation, Expression, Work, Corporate Body, Person, Concept, Place, Event, Object
Surrogates: Uniform titles, Citations, Names, Subjects

Why FRBR?
• Potential to improve:
  – Cataloging
  – Discovery
  – Delivery
• By
  – Bringing versions of works together
  – Showing relationships of various kinds
  – Enabling users to navigate to level of interest

Research on FRBR & WorldCat
• Subsets
  – By library, region
  – Example/problem sets
    • Shakespeare, the Bible
    • Humphry Clinker
    • 1,000 random works
  – By genre
    • Dissertations
    • Fiction
• Whole file, 47 million bibliographic records

Our Approach
• Concentrating on work-level
  – Problems with expression-level clusters
• Efficient, maintainable, understandable
• Few, if any, false matches with correct cataloging
  – Err on the side of missed matches
  – Some accommodation of frequent variants
• Compare with manually clustered

The Algorithm
• A key is generated for each record
• Extract author, title
  – Look up in NACO authority file
  – Added entry information as needed
• Form a key from bibliographic record
  – Author, title, added entry information
  – These can be sorted, compared
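The key-building step can be sketched roughly as follows. This is a minimal illustration in Python, not the code used at OCLC: `normalize` is a simplified stand-in for NACO normalization, the authority-file lookup and added-entry handling are omitted, and the function names and input shape (a list of author heading parts plus a title) are assumptions. The `\` and `/` separators mirror the author/title keys shown in the cluster tables.

```python
import re
import unicodedata


def normalize(text: str) -> str:
    """Simplified normalization (an assumption, not the actual NACO rules)."""
    # Decompose accented characters and drop the combining marks.
    decomposed = unicodedata.normalize("NFKD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    lowered = stripped.lower()
    # Remove apostrophes outright so "Alice's" becomes "alices".
    lowered = lowered.replace("'", "").replace("\u2019", "")
    # Turn any other punctuation into spaces, keeping commas inside names.
    cleaned = re.sub(r"[^a-z0-9, ]+", " ", lowered)
    # Collapse runs of whitespace and trim stray spaces and commas.
    return re.sub(r"\s+", " ", cleaned).strip(" ,")


def work_key(author_parts: list[str], title: str) -> str:
    """Build a sortable author/title key; title-only when no author is given."""
    author = "\\".join(normalize(part) for part in author_parts if part)
    norm_title = normalize(title)
    return f"{author}/{norm_title}" if author else norm_title


key = work_key(["Twain, Mark", "1835-1910"], "Adventures of Huckleberry Finn")
print(key)
```

Because the keys are plain strings, records that produce the same key can be brought together with an ordinary sort or a dictionary lookup, which fits the "sorted, compared" point on the slide.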

Problems
• Many (17%) records do not have
  – Author main-entry
  – Uniform title
• In general these cannot be matched
  – Look at added entries
  – Information at the expression and manifestation levels
  – Handled separately
  – 180,000 clusters involving ~400,000 records

Top 10 WorldCat Clusters
# Recs  Author/Title Key
8,383   bible\n t
8,055   bible
6,174   bible\authorized
4,033   bible\o t\psalms
3,964   haggadah
3,477   great britain/treaties etc
2,402   bible\o t
2,248   koran
2,153   arabian nights
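Once every record has a key, a ranking like the one above amounts to counting records per key and sorting by count. The sketch below assumes the keys are already available as a list of strings; `top_clusters` and the toy input are hypothetical, not part of the OCLC software or data.

```python
from collections import Counter


def top_clusters(keys: list[str], n: int = 10) -> list[tuple[str, int]]:
    """Return the n most frequent work keys with their record counts."""
    return Counter(keys).most_common(n)


# Toy input: one key per bibliographic record (illustrative only).
sample_keys = ["bible", "koran", "bible", "haggadah", "bible", "koran"]
for key, count in top_clusters(sample_keys, n=2):
    print(f"{count:>6,}  {key}")
```

For a file the size of WorldCat, an external sort on the key followed by a single counting pass would be the more likely approach, since the keys may not fit comfortably in memory.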

Top 10 from a Public Library
# Recs  Author/Title Key
89      bible\authorized
85      mother goose
84      chopin, frederic\1810 1849/piano music
81      schulz, charles m/peanuts
63      davis, jim/garfield
61      moore, clement clarke\1779 1863/night before christmas
60      mozart, wolfgang amadeus\1756 1791/instrumental music
58      bach, johann sebastian\1685 1750/cantatas
57      beethoven, ludwig van\1770 1827/sonatas
56      twain, mark\1835 1910/adventures of huckleberry finn

Results
• Manual estimate: 1.5 manifestations/work in WorldCat
• Algorithm: ~1.3
• 25,844 clusters have 20 or more records
• 401,659 clusters have 5 or more records

Preliminary Plans
• Build structures for FRBR into new catalog
• Expose FRBR clustering for searching
• Make visible in cataloging
  – As consensus on implementation is developed
  – As cataloging rules accommodate FRBR

Spin-offs
• NACO normalization code
  – Testbed
  – Server
• Authority work
  – ePrints UK
• FRBR in other projects
  – FictionFinder
  – NDLTD union catalog

Fiction Subset
• 2,665,662 WorldCat records
• 1,758,479 work clusters
• 1.5 records/cluster
• 3,866 clusters have 20 or more records
• 50,540 clusters have 5 or more records
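The 1.5 records/cluster figure follows directly from the two counts on this slide. The short check below uses only the numbers given above; the variable names are illustrative, and the percentage shares are derived here rather than stated in the presentation.

```python
records = 2_665_662          # WorldCat fiction records
clusters = 1_758_479         # work clusters formed from them
clusters_20_plus = 3_866     # clusters with 20 or more records
clusters_5_plus = 50_540     # clusters with 5 or more records

# Average cluster size, rounded as on the slide.
records_per_cluster = round(records / clusters, 1)

# Share of clusters at each size threshold, as percentages.
share_5_plus = 100 * clusters_5_plus / clusters    # just under 3%
share_20_plus = 100 * clusters_20_plus / clusters  # roughly 0.2%

print(records_per_cluster, round(share_5_plus, 2), round(share_20_plus, 2))
```

The low average alongside a small share of large clusters suggests that most works are represented by a single record, while a few heavily published works account for the big clusters.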

Top 10 clusters for fiction
# Recs  Author/Title Key
1,296   defoe, daniel\1661 1731/robinson crusoe
1,248   carroll, lewis\1832 1898/alices adventures in wonderland
971     cervantes saavedra, miguel de\1547 1616/don quixote
828     stevenson, robert louis\1850 1894/treasure island
689     twain, mark\1835 1910/adventures of huckleberry finn
624     twain, mark\1835 1910/adventures of tom sawyer
618     swift, jonathan\1667 1745/gullivers travels
600     andersen, h c\hans christian\1805 1875/tales
581     stowe, harriet beecher\1811 1896/uncle toms cabin
570     arabian nights

FictionFinder
• Employs work clusters in a prototype system for searching and browsing bibliographic records for fiction
• Indexes records at the work level and organizes displays by work and expression (primarily language)
• Includes records for textual items; additional modes of expression (moving image, sound) to be added later
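One way to picture the work-then-expression organization described above is a two-level grouping: records are first gathered under their work key, then split by language as a proxy for expression. This is only a sketch under assumed inputs; the record fields (`id`, `work_key`, `language`) and the function name are hypothetical and do not describe FictionFinder's actual data model.

```python
from collections import defaultdict


def group_by_work_and_language(records: list[dict]) -> dict[str, dict[str, list[int]]]:
    """Nest record ids as work key -> language -> [ids]."""
    works: dict[str, dict[str, list[int]]] = defaultdict(lambda: defaultdict(list))
    for rec in records:
        works[rec["work_key"]][rec["language"]].append(rec["id"])
    # Convert to plain dicts so the result prints and compares cleanly.
    return {work: dict(langs) for work, langs in works.items()}


# Illustrative records only; ids and language codes are made up.
sample = [
    {"id": 1, "work_key": "crichton, michael\\1942/jurassic park", "language": "eng"},
    {"id": 2, "work_key": "crichton, michael\\1942/jurassic park", "language": "spa"},
    {"id": 3, "work_key": "crichton, michael\\1942/jurassic park", "language": "eng"},
    {"id": 4, "work_key": "crichton, michael\\1942/congo", "language": "eng"},
]
grouped = group_by_work_and_language(sample)
print(grouped)
```

A display layer could then list one entry per work and expand it into its language groups on request.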

395 records for author “crichton, michael\1942” clustered into 17 entries

 23  airframe
 40  andromeda strain
  5  binary
 11  case of need
 44  congo
 26  disclosure
  5  disclosure a novel
 16  eaters of the dead
  7  eaters of the dead the manuscript of ibn fadlan relating his experiences with the northmen in a d 922
 27  great train robbery
 47  jurassic park
 25  lost world
 37  rising sun
 31  sphere
  7  sphere a novel
 19  terminal man
 25  timeline
395  total

Typical Results Set Display

Typical Work-level Display


Benefits
• Aggregated displays for works and expressions
• Enhancement of (fiction) records at work level
  – with elements from records within the work cluster (e.g., summaries, genre terms, subject headings, class numbers)
  – with external data (e.g., literary prizes, prequels/sequels, evaluative content)

Challenges
• Identifying appropriate bibliographic data for systematically grouping or differentiating works and expressions
  – Works
    • Genre (graphic novel vs. novel)
    • Genre + mode of expression (audio book vs. radio play)
    • Degree of modification (abridgement of juvenile work vs. an adaptation for young children)
  – Expressions
    • translators, illustrators, editors

Next Steps
• FRBR algorithm
  – Explore applications
  – Refine algorithm as needed
• FictionFinder
  – Add records for sound and image
  – Conduct user studies

Links
• Functional Requirements for Bibliographic Records - Final Report
  – http://www.ifla.org/VII/s13/frbr/frbr.htm
• Experiments with the IFLA Functional Requirements for Bibliographic Records (FRBR)
  – http://www.dlib.org/dlib/september02/hickey/09hickey.html
• OCLC Research Activities and IFLA's Functional Requirements for Bibliographic Records
  – http://www.oclc.org/research/projects/frbr/index.shtm
• Implementing FRBR on Large Databases
  – http://staff.oclc.org/~vizine/CNI/OCLCFRBR.htm