18 March 2009
Identifiers and GLIMIR
OCLC Symposium for Publishers and Librarians
Janifer Gatenby, Research Integration and Standards, OCLC
Identifiers
Resource Identifiers
Creator Identifiers
Institution Identifiers
Importance of identifiers
Identifiers seal uniqueness: without one, “n” other elements are necessary to establish uniqueness
Commerce:
distribution, promotion, rights management, copyright protection, royalty payments
On the web:
key to navigation among sites for resources & information about resources
GLIMIR
GLIMIR (Global Library Manifestation Identifier) is a project to connect all metadata records for the same resource or publication, starting with WorldCat.
Importance of RANK
Ultimate goal of GLIMIR: to link resources on different sites through a single agreed “canonical” identifier, clustering hits and thereby maximizing the rank of library resources on the web
Resource Identifiers: GLIMIR
Global Library Manifestation Identifier (Manifestation = Resource)
• No one single manifestation identifier
• ISBN, ISSN, ISMN (music), ISRC (sound recordings), V-ISAN (audio-visual)
• DOI
Only 30% of WorldCat resources have an international identifier
Outside use of OCLC numbers…
A subsidiary of the US ISBN Agency
Linking inwards: OCLC Permalinks
• Simple URLs permitting direct access into WorldCat
• Want to use them (or equivalent) for accessing same resource on other databases
www.worldcat.org/oclc/225507364
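The permalink pattern above can be generated mechanically from an OCLC number. A minimal sketch: only the `www.worldcat.org/oclc/<number>` pattern comes from the slide, while the function name `oclc_permalink` and the `http://` scheme prefix are assumptions made for illustration.

```python
def oclc_permalink(oclc_number):
    """Build a WorldCat permalink from an OCLC number (URL pattern taken from the slide)."""
    digits = str(oclc_number).strip()
    if not (digits.isascii() and digits.isdigit()):
        raise ValueError(f"OCLC number must be numeric, got {oclc_number!r}")
    # Assumption: the slide shows the URL without a scheme, so http:// is added here.
    return f"http://www.worldcat.org/oclc/{digits}"


print(oclc_permalink(225507364))
```

The same function could serve as the template for an equivalent link into another database, which is the stated wish on this slide, by swapping the host and path.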
Linking inwards: WorldCat API
Identifiers [SRU or OpenSearch]
– direct access to single metadata record
– thence to full text
– thence to enriched content
• Citations, holdings, OPAC links
• Potentially audience level, copyright
http://worldcat.org/devnet/index.php/Main_Page
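Direct access to a single record by identifier over SRU comes down to a `searchRetrieve` request carrying a one-clause CQL query. The sketch below only builds such a URL. The endpoint `http://example.org/sru` and the index name `dc.identifier` are placeholders, not the actual WorldCat API address or index set, and the protocol version is assumed.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def sru_lookup_url(base_url, index, identifier, max_records=1):
    """Build an SRU searchRetrieve URL that asks for the record matching one identifier."""
    params = {
        "operation": "searchRetrieve",         # standard SRU operation name
        "version": "1.1",                      # assumed protocol version
        "query": f'{index} = "{identifier}"',  # CQL clause: index, relation, term
        "maximumRecords": str(max_records),
    }
    return f"{base_url}?{urlencode(params)}"


# Hypothetical endpoint and index name, for illustration only.
url = sru_lookup_url("http://example.org/sru", "dc.identifier", "225507364")
print(url)
print(parse_qs(urlsplit(url).query)["query"])
```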
Linking outwards
To:
xISBN xISSN xOCLCNUM
• Web services to find all related editions of a resource
• Easily incorporated into library catalogs, Web sites, and other library applications
• See http://worldcat.org/devnet for more info
100+ ISBNs for Sorcerer's Stone
32 English (US and UK)
9 Spanish
3 each: Russian, German, Finnish, Latin
2 each: Chinese, Czech, French, Korean, Norwegian, Persian, Polish, Portuguese, Romanian, Turkish, Welsh
1 each: Afrikaans, Albanian, Armenian, Basque, Bengali, Georgian, Galician, Gaelic, Ancient Greek, Greek, Gujarati, Hindi, Hungarian, Icelandic, Italian, Japanese, Latvian, Lithuanian, Malayalam, Sherpa, Slovenian, Swedish, Thai, Ukrainian, Urdu
16 Audio
59 Book
ISSN History Tool
http://worldcat.org/xissn/titlehistory?issn=0888-5885
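A title history request of this form can be assembled after checking the ISSN's check digit locally, so malformed identifiers never leave the client. A minimal sketch: the URL pattern is the one shown on the slide, the check-digit rule is the standard ISSN one (weights 8 down to 2 over the first seven digits, modulo 11), and the function names are illustrative.

```python
from urllib.parse import urlencode


def issn_is_valid(issn):
    """Verify the ISSN check digit: weights 8..2 over the first seven digits, modulo 11."""
    chars = issn.replace("-", "").upper()
    if len(chars) != 8 or not chars[:7].isdigit() or chars[7] not in "0123456789X":
        return False
    total = sum(int(d) * w for d, w in zip(chars[:7], range(8, 1, -1)))
    check = (11 - total % 11) % 11
    expected = "X" if check == 10 else str(check)
    return chars[7] == expected


def title_history_url(issn):
    """Build the xISSN title history URL for a valid ISSN (pattern taken from the slide)."""
    if not issn_is_valid(issn):
        raise ValueError(f"invalid ISSN: {issn!r}")
    return "http://worldcat.org/xissn/titlehistory?" + urlencode({"issn": issn})


print(title_history_url("0888-5885"))
```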
SRU record update: Near Real Time
Inserts, updates, deletions
Machine & QA corrections and merges
Identifiers, inserts and corrections
Near Real Time Update
• Dutch union catalogue (GGC) in 12 months 560,000 records, 2 million holdings
• Libraries Australia went live on 16 January 2009
Data identifier export service
• Provision of work level identifiers associated with a member’s subset of WorldCat
• Permitting clustering of result sets (without any retrospective data conversion)
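One way to read the clustering point: the exported work-level identifiers are held as a lookup table beside the local records, and result sets are grouped at display time, so the records themselves are never rewritten. The sketch below assumes a plain mapping from local record id to work id; the record ids, the work ids, and the `unclustered:` key prefix are all invented for illustration.

```python
def cluster_by_work(hits, work_ids):
    """Group ranked hits under their work-level identifier.

    hits     -- local record ids in rank order
    work_ids -- exported mapping of local record id -> work identifier
    Records with no exported work id become single-member clusters.
    """
    clusters = {}
    for record_id in hits:
        key = work_ids.get(record_id, f"unclustered:{record_id}")
        clusters.setdefault(key, []).append(record_id)
    return clusters  # dict order follows the best-ranked member of each cluster


# Invented sample data.
exported = {"rec1": "W1", "rec2": "W2", "rec3": "W1"}
print(cluster_by_work(["rec1", "rec2", "rec3", "rec4"], exported))
```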
Family of Identifiers
[Diagram. Entities: Work, Expression, Manifestation, Authors, Subjects (Dewey +). Identifier labels: GLIMIR; ISBN, ISSN, ISMN, V-ISAN; WCat Identities, VIAF; ISNI, ISIL; FRBR clusters; ISTC, ISWC, ISAN; ISCI]
Identifiers and Required Groupings
[Diagram. Labels: Author, Subject, Class, IR Lang, Phase 1]
Linked content at the right level
[Diagram. Entities: Work, Expression, Manifestation, Authors, Subjects (Dewey +). Linked content labels: full text links, usage statistics, cover art, holdings; biographies, affiliations; reviews, evaluation, lists, prizes]
Tidy identifiers
All resources identified in a global scheme
Possible to do cross database links more reliably; mashups
Improved quality of WorldCat
Statistical data is consolidated
Important for copyright registry
Step 1 - OCLC WorldCat Quality
DDR project – Duplicate Detection and Resolution (Q1 2009)
• Will not eliminate “intentional duplicates” – different language of cataloguing, different schemes of transliteration, institutional records
• Require mechanism to cluster these variants & be able to select the most appropriate for display depending on the user
Impact on quality
• Manifestation count, in addition to record count
• Impact on all products and services
• Part of identifiers architecture
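The mechanism asked for above, clustering intentional duplicates and choosing the most appropriate record for the user, might look like the sketch below. It is an illustration under stated assumptions, not OCLC's design: the `Record` fields, the `manifestation_id` cluster key (standing in for a GLIMIR-style identifier), and selection by language of cataloguing alone are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Record:
    record_id: str
    manifestation_id: str       # hypothetical cluster key shared by intentional duplicates
    cataloguing_language: str


def pick_for_display(records, preferred_languages):
    """Choose one record per manifestation cluster, favouring the user's languages in order."""
    def rank(record):
        if record.cataloguing_language in preferred_languages:
            return preferred_languages.index(record.cataloguing_language)
        return len(preferred_languages)  # languages not listed rank last

    chosen = {}
    for record in records:
        best = chosen.get(record.manifestation_id)
        if best is None or rank(record) < rank(best):
            chosen[record.manifestation_id] = record
    return chosen


# Invented sample: two records for one manifestation, one for another.
records = [Record("r1", "m1", "eng"), Record("r2", "m1", "fre"), Record("r3", "m2", "ger")]
chosen = pick_for_display(records, ["fre", "eng"])
print({m: r.record_id for m, r in chosen.items()})
print("records:", len(records), "manifestations:", len(chosen))
```

The last line shows the two counts side by side: once duplicates are clustered rather than deleted, a manifestation count can be reported in addition to the record count.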
Further Steps
Optimise OCLC products and services
• Identifiers at work and contributor level
• Display most appropriate record
• e.g. Collection analysis, WCRS, xISBN, xISSN
Towards a global identifier
• Diffusion of identifiers
• Increased coverage
• Manifestation resolution services
Weibel Lines
Read more at: http://weibel-lines.typepad.com/weibelines/2008/02/a-glimir-of-the.html
“Interesting and challenging issues arise in the design of such identifiers and their supporting infrastructure. Broad adoption will require a careful balance of use-cases, business issues, and community participation in meeting the need. All of this in an environment already crowded with myriad special purpose identifiers.”
“To the extent that such identifiers are canonical – that is, become the dominant identifier for a given asset, they increase the “URI equity” for library assets and will strengthen the library presence on the Web.”
Author identification
• OCLC & libraries potential core part of ISNI consortium
• BnF, BL, CISAC, ALCS, Adami, Bowker, IFRRO, ProLitteris
• ISNI cannot be managed in the same way as ISBN etc.
• Authors do not stay with one publisher – cannot hand blocks to publishers
• Need database for management & ideal to start with an existing database
• ISNI proof of concept
• VIAF & WorldCat identities + ALCS, UK, Adami & CISAC
ISNI identifiers
• Seal uniqueness of an identity
• Permit the liaison of the same identity in different databases
• Significant redundancies can be eliminated by sharing data
• And expanding: Czech Republic, Israel, Italy, Japan, Portugal, Slovenia and Spain
VIAF (www.viaf.org)
         Enhanced Records  Records with links  Total links
LC              4,788,892           1,072,159    1,488,970
BnF             1,053,729             593,484      917,209
DNB             3,093,385             721,477    1,097,228
Sweden            116,599              51,606      122,439
Totals          9,053,605           2,438,726    3,625,846
VIAF
Swedish
WorldCat Identities
Dependence on
• VIAF
• Authority records in WorldCat
• FRBR comparison algorithm (work level)
Sources:
• WorldCat (126 million bibliographic records)
• VIAF
• Wikipedia
Each ‘name’ in WorldCat
Personal names: 24,669,126
Corporate names: 7,029,257
Subject names: 14,445 (e.g. animals, gods, imaginary characters)
http://www.worldcat.org/identities/
WorldCat Identities
VIAF LINKS
ISNI Proof of Concept
• For VIAF & libraries
• Enables significant honing of author / work links
• Enriched data (biographic)
• For ALCS & industry
• Matching techniques & expertise
• Freely available data (9 million VIAF, 31+ million WCat identities)
• direct search and SRU API
• Wiki interface for corrections and enhancements (trial)
• Biographical links (Wikipedia ++)
• Enriching links to holdings, translations, non commercial works (unreported royalties?)
ALCS data
• 311,046 line entries (author / work)
• Semicolon-delimited
• contactid; nameid; IPINameNumber;
• prefix; forename; middlename; surname; suffix; dateofbirth; dateofdeath; nationality;
• contribution; ISBN; title; subtitle
• 35,088 unique contact identifiers (parties), about 40,000 public identities
• 27,319 matches (78%)
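The file layout described above can be read with a standard delimited-text parser. A minimal sketch, assuming the fields appear in exactly the order listed on the slide, with no header row and no quoting; the sample line is invented and does not come from the ALCS data. The last lines recompute the reported match rate from the two counts on the slide.

```python
import csv
import io

# Field order as listed on the slide; assumed to be the column order in the file.
ALCS_FIELDS = [
    "contactid", "nameid", "IPINameNumber",
    "prefix", "forename", "middlename", "surname", "suffix",
    "dateofbirth", "dateofdeath", "nationality",
    "contribution", "ISBN", "title", "subtitle",
]


def read_alcs(text):
    """Parse semicolon-delimited author / work line entries into dictionaries."""
    reader = csv.DictReader(io.StringIO(text), fieldnames=ALCS_FIELDS, delimiter=";")
    return list(reader)


# Invented sample line, for illustration only.
sample = "c001;n001;00000000000;;Jane;;Example;;1950;;GB;Author;0000000000;An invented title;\n"
rows = read_alcs(sample)
parties = {row["contactid"] for row in rows}  # unique contact identifiers
print(rows[0]["surname"], rows[0]["title"], len(parties))

# Match rate from the slide: 27,319 matches out of 35,088 unique contact identifiers.
match_rate = 27_319 / 35_088
print(f"{match_rate:.0%}")
```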
Jack Curtis / David Harsent (& Oliver Dalton, Francis Greig, David Lawrence, David Pascoe)
A bird's idea of flight ALCS OCLC
After dark LC OCLC
Another round at the pillars ALCS LC OCLC
Confessor ALCS OCLC
Conjure me ALCS OCLC
Crows' parliament ALCS LC DNB OCLC (Sam Lawrence)
Der blick des magiers DNB OCLC
Der schrei der schwalbe ALCS DNB OCLC
Die spur der krahe ALCS DNB OCLC
Dreams of the dead ALCS LC OCLC
From an inland sea ALCS OCLC
Gawain libretto ALCS LC BNF OCLC
Glory ALCS LC DNB OCLC
Le parlement des corbeaux ALCS BNF OCLC
Legion ALCS LC BNF OCLC
Les enfants du matin ALCS BNF OCLC
Livewire chillers ‐ down came a spider ALCS OCLC (no author attribution)
Jack Curtis / David Harsent
Marriage ALCS BNF OCLC
Mirrors kill ALCS LC OCLC
Mort ou vif BNF
Mr. punch ALCS LC OCLC
News from the front LC OCLC
Point of impact LC OCLC
Potted priest OCLC
Ricordati di me ALCS
Ruchlos DNB OCLC
Selected poems 1969‐2005 ALCS LC OCLC
Sons of the morning ALCS OCLC
Sorrow of Sarajevo OCLC
Sprinting from the graveyard LC OCLC
Storybook hero OCLC
Terrahawks ALCS OCLC
Tonight's lover OCLC
Truce LC OCLC
Violent entry ALCS LC OCLC
David Harsent
Jack Curtis
Jack Curtis BNF
This record would match with the LC / DNB cluster if the ALCS data were in VIAF
Jack Curtis
David Lawrence
Cold kill ALCS LC DNB OCLC
Dead sit round in a ring ALCS LC DNB OCLC
Der kreis der toten DNB OCLC
Down into darkness ALCS LC OCLC
Geruch des todes ALCS DNB OCLC
Nothing like the night ALCS LC DNB OCLC
Quatre morts assis en rond ALCS BNF
Vier doden in een kring ALCS OCLC
David Lawrence
David Lawrence BNF
Metadata requirements
• Essential for matching process
• Contact, name and IPI identifiers
• Titles of works and their ISBNs
• Name: prefix, forename, surname, suffix, birth and death dates
• It would be nice to keep the contact id in the database without the identity of the party (non displayable)
• Possibility of engaging authors in their own data maintenance is appealing (WIKI)
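The requirements above suggest how names and works combine in a match. The sketch below illustrates one simple rule and is not the matching technique actually used in the proof of concept: forename and surname must agree after normalisation, and the two sides must share an ISBN or a title. The candidate structure (`isbns`, `titles`) is hypothetical. The sample values reuse a name and title from the earlier slides.

```python
import unicodedata


def normalise(text):
    """Lower-case and strip accents and punctuation, so 'Krähe' and 'krahe' compare equal."""
    decomposed = unicodedata.normalize("NFKD", text)
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    kept = "".join(c for c in no_accents if c.isalnum() or c.isspace())
    return " ".join(kept.lower().split())


def is_match(alcs_row, candidate):
    """True when the names agree and the two sides share an ISBN or a normalised title."""
    same_name = (
        normalise(alcs_row["surname"]) == normalise(candidate["surname"])
        and normalise(alcs_row["forename"]) == normalise(candidate["forename"])
    )
    if not same_name:
        return False
    shared_isbn = bool(alcs_row["ISBN"]) and alcs_row["ISBN"] in candidate["isbns"]
    shared_title = normalise(alcs_row["title"]) in {normalise(t) for t in candidate["titles"]}
    return shared_isbn or shared_title


# Hypothetical structures; name and title taken from the earlier slides.
row = {"forename": "Jack", "surname": "Curtis", "ISBN": "", "title": "Crows' Parliament"}
identity = {"forename": "Jack", "surname": "Curtis", "isbns": set(), "titles": ["Crows' parliament"]}
print(is_match(row, identity))
```

Birth and death dates, also listed as essential, would be a natural tie-breaker when several identities share a name; they are left out here to keep the sketch short.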
Institution Identifiers
WorldCat Registry of Libraries
Research is also working on a publisher registry by mining WorldCat. Currently 1,750 publishers are mapped to 8.5 million resources. Only another 120 million resources to go!
NISO I2 Committee
http://www.niso.org/workrooms/i2
Common identifier for all institutions in the journal supply chain
Discussion Points
• Potential use via identifiers of:
• Existing WorldCat APIs – Permalinks, XID, WorldCat API
• Work data service
• VIAF, WorldCat identities APIs
• Registries: WorldCat Institutions, Publishers, Copyright evidence (BRR)
• Further cooperation / collaboration?
• Action items