Developments in catalogues and data sharing

Post on 17-Nov-2014

description

A talk given at the Bodleian Libraries 'From cataloguing to metadata' event in November 2011. Personal opinions on changing trends in library metadata creation and consumption. Also considers the challenges and rewards associated with providing and licensing data for re-use by machines and the people that program them.

transcript

• How our catalogues are evolving
• Opening and sharing the data within them

• Ed Chamberlain
• Systems Development Librarian – Cambridge University Library

Systems Development Librarian at the other place

Data ‘munger’

Data consumer?

Control over data creation

Control over data consumption

Control over data environment

Control over data technology

No longer the single authority for content and data

Commercial, social and academic discovery mechanisms

Explosion of digital content

Illusion of ‘all on the web’

Studies into Google Generation / ‘Generation Y’ 1

Cambridge Arcadia IRIS report 2009 2

Preference for search engine over catalogue

Online over in-building

Trust tutors and peers over Librarian

Still respect the library ‘brand’

1) “The Google generation: the information behaviour of the researcher of the future”, Aslib Proceedings, vol. 60, issue 4, doi:10.1108/00012530810887953

2) Arcadia IRIS Project report - http://arcadiaproject.lib.cam.ac.uk/docs/Report_IRIS_final.pdf

So far …

Evolution of catalogues

Changes in exposure of data

To come? Greater sharing of data

Library data used in non-library environments

Keyword based discovery services

New ways to exploit old data

Relevancy ranking

Rich faceting

Greater linking

Search is the new browse

Repositories and archives

Is the OPAC dead?

Citations

Abstracts

Table of Contents

Tags
Public lists
Reader reviews

Dramatic growth in access points
Input from true subject specialists

o Lack of structure
o No quality control
o Compromise of sanctity?

Web scale - resource discovery concept taken further
Primo Central
Summon
EBSCO Discovery
WorldCat Local

HathiTrust data can be used for full text searching of print collections

Catalogue data is now:
Consumed as keywords (not left anchored)
Faceted (not browsed)
Supplemented
Transformed
Merged
Amalgamated

Our local catalogues

National / international aggregations

Joe Public

Teenage software developer / hacker

Booksellers

Web start-ups

Search engines

Wikipedia

Other libraries

Research group website

Bibliographic data linked to many aspects of successful teaching and research

Citation lists – measure output

Shared bibliography – core of research group work

Reading lists – backbone of undergraduate teaching

High quality data needed for re-use

Not all possible whilst data resides in the library ‘silo’

“Library catalogues have imposed on them librarian or supplier-made decisions about what can/can’t be searched and in what way. Some of these decisions are limited by current cataloguing rules, but not all; often the data is recorded, but not in a usable way, or is there but isn’t tapped by the interface. For example, in most catalogues you can limit by publication type to newspapers, but you can’t limit by frequency of the issues.”

“Releasing data means that people can start to use it in the way they want to.”

Success of distributed access outside of cultural heritage

Single point of discovery?

Taxpayer generated – give it back!

Why not share?

The past few years have seen a massive release of public data in the government and cultural heritage sectors
Open Government Data - http://data.gov.uk
Open Knowledge Foundation - http://okfn.org

EU Commission mandate to open data

Shared in ways for easy reuse and linking

RLUK and JISC initiative

Galleries, libraries, archives, museums

The Discovery principles propose that:

'Open metadata creates the opportunity for enhancing impact through the release of descriptive data about library, archival and museum resources. It allows such data to be made freely available and innovatively reused to serve researchers, teachers, students, service providers and the wider community in the UK and internationally.'

http://discovery.ac.uk

Why not?

WorldCat has done this for years

Schema.org microdata – some semantic structure

Use case for catalogue data in an advertising environment?

Google taken 10% (so far)

<h1 itemprop="name">The Cambridge companion to Spenser edited by Andrew Hadfield. [electronic resource] /</h1>

<span style="display: none;" itemprop="publisher">Cambridge University Press,</span> <span style="display: none;" itemprop="datePublished">2001.</span>
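A minimal sketch of how a data consumer, such as a search engine crawler, might pull these itemprop values back out of a catalogue page using only the Python standard library. The class and function names are illustrative assumptions, not part of any Cambridge or Schema.org tooling, and full microdata parsing (itemscope nesting, itemtype) is more involved than this.

```python
from html.parser import HTMLParser


class ItempropCollector(HTMLParser):
    """Collect the text inside every element that carries an itemprop attribute."""

    def __init__(self):
        super().__init__()
        self.properties = {}
        self._open = []  # stack of (tag, itemprop name or None)

    def handle_starttag(self, tag, attrs):
        self._open.append((tag, dict(attrs).get("itemprop")))

    def handle_endtag(self, tag):
        if self._open and self._open[-1][0] == tag:
            self._open.pop()

    def handle_data(self, data):
        # Only keep text that sits directly inside an itemprop element.
        if self._open and self._open[-1][1]:
            name = self._open[-1][1]
            self.properties[name] = self.properties.get(name, "") + data


# Markup taken from the slide above.
SAMPLE = (
    '<h1 itemprop="name">The Cambridge companion to Spenser edited by '
    "Andrew Hadfield. [electronic resource] /</h1>"
    '<span style="display: none;" itemprop="publisher">Cambridge University Press,</span> '
    '<span style="display: none;" itemprop="datePublished">2001.</span>'
)


def extract_microdata(html):
    """Return a dict of itemprop name -> text content for the given HTML."""
    collector = ItempropCollector()
    collector.feed(html)
    collector.close()
    return {name: text.strip() for name, text in collector.properties.items()}


if __name__ == "__main__":
    for name, value in extract_microdata(SAMPLE).items():
        print(f"{name}: {value}")
```

The extracted values keep their ISBD punctuation ("Cambridge University Press," and "2001."), which shows how cataloguing conventions leak into data that non-library consumers then have to clean up.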

Application Programming Interface (API)

Layered over LMS

Expose catalogue data feeds for developers

Anyone can use them

Simple request, simple response

http://www.lib.cam.ac.uk/api

http://www.lib.cam.ac.uk/api/voyager/newtonSearch.cgi?searchArg=darwin&databases=depfacaedb
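The "simple request" can be sketched as nothing more than a URL with two query parameters. The parameter names searchArg and databases and the value depfacaedb come from the example URL above; the function name is an assumption, and the sketch deliberately stops short of fetching anything, since the slides do not describe the response format and the service may no longer be available.

```python
from urllib.parse import urlencode

# Endpoint as shown on the slide; it may no longer be live.
BASE_URL = "http://www.lib.cam.ac.uk/api/voyager/newtonSearch.cgi"


def build_search_url(search_arg, databases="depfacaedb"):
    """Build a catalogue search URL in the shape shown on the slide."""
    query = urlencode({"searchArg": search_arg, "databases": databases})
    return f"{BASE_URL}?{query}"


if __name__ == "__main__":
    print(build_search_url("darwin"))
    print(build_search_url("origin of species"))
```

A developer could paste the resulting URL into a browser or hand it to any HTTP client, which is the point of layering a plain web API over the LMS.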

COMET project

80% of CUL bib records converted to Resource Description Framework (RDF)

Enriched with direct links to the Library of Congress

Vocab in line with British Library work

OCLC FAST and VIAF authority sources

http://data.lib.cam.ac.uk

Marc21 …
001 1000346
245 $aEarly medieval history of Kashmir : $b[with special reference to the Loharas] A.D. 1003-1171 /

DC XML …
<dc:identifier>1000346</dc:identifier>
<dc:title>Early medieval history of Kashmir : [with special reference to the Loharas] A.D. 1003-1171</dc:title>

RDF triples …
<http://data.lib.cam.ac.uk/id/entry/cambrdgedb_1000346> <http://purl.org/dc/terms/title> "Early medieval history of Kashmir : [with special reference to the Loharas] A.D. 1003-1171"

<http://data.lib.cam.ac.uk/id/entry/cambrdgedb_1000346> <http://purl.org/dc/terms/title> "Early medieval history of Kashmir : [with special reference to the Loharas] A.D. 1003-1171" .
<http://data.lib.cam.ac.uk/id/entry/cambrdgedb_1000346> <http://purl.org/dc/terms/type> <http://data.lib.cam.ac.uk/id/type/1cb251ec0d568de6a929b520c4aed8d1> .
<http://data.lib.cam.ac.uk/id/entry/cambrdgedb_1000346> <http://purl.org/dc/terms/type> <http://data.lib.cam.ac.uk/id/type/46657eb180382684090fda2b5670335d> .
<http://data.lib.cam.ac.uk/id/entry/cambrdgedb_1000346> <http://purl.org/dc/terms/identifier> "UkCU1000346" .
<http://data.lib.cam.ac.uk/id/entry/cambrdgedb_1000346> <http://purl.org/dc/terms/issued> "1981" .
<http://data.lib.cam.ac.uk/id/entry/cambrdgedb_1000346> <http://purl.org/dc/terms/creator> <http://data.lib.cam.ac.uk/id/entity/cambrdgedb_a5a6f7a184ff02e08b1befedc1b3a4d0> .
<http://data.lib.cam.ac.uk/id/entry/cambrdgedb_1000346> <http://purl.org/dc/terms/language> <http://id.loc.gov/vocabulary/iso639-2/eng> .
<http://data.lib.cam.ac.uk/id/entry/cambrdgedb_1000346> <http://RDVocab.info/Elements/placeOfPublication> <http://id.loc.gov/vocabulary/countries/ii> .
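The step from Dublin Core XML to RDF triples shown on these slides can be sketched in a few lines. This is not the COMET project's actual conversion code: the wrapping record element, the namespace declaration, the function names, and the rule of turning every non-identifier element into a dcterms literal are all assumptions made for illustration. Only the subject URI pattern and the title predicate are taken from the example output above.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
DCTERMS = "http://purl.org/dc/terms/"
# Subject URI prefix as it appears in the triples on the slide.
ENTRY_BASE = "http://data.lib.cam.ac.uk/id/entry/cambrdgedb_"

# The slide's DC fragment, wrapped in a hypothetical root element so it parses.
SAMPLE_XML = f"""<record xmlns:dc="{DC_NS}">
  <dc:identifier>1000346</dc:identifier>
  <dc:title>Early medieval history of Kashmir : [with special reference to the Loharas] A.D. 1003-1171</dc:title>
</record>"""


def escape_literal(text):
    """Escape backslashes, quotes and newlines for an N-Triples literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def dc_to_ntriples(xml_text):
    """Turn each DC element (except the identifier) into one N-Triples line."""
    root = ET.fromstring(xml_text)
    identifier = root.findtext(f"{{{DC_NS}}}identifier")
    subject = f"<{ENTRY_BASE}{identifier}>"
    prefix = f"{{{DC_NS}}}"
    lines = []
    for element in root:
        if not element.tag.startswith(prefix):
            continue  # ignore anything outside the DC namespace
        local_name = element.tag[len(prefix):]
        if local_name == "identifier":
            continue  # the identifier is used to mint the subject URI
        literal = escape_literal(element.text or "")
        lines.append(f'{subject} <{DCTERMS}{local_name}> "{literal}" .')
    return lines


if __name__ == "__main__":
    print("\n".join(dc_to_ntriples(SAMPLE_XML)))
```

The remaining triples on the slide (type, creator, language, place of publication) point at other URIs rather than literals, which needs lookups against external vocabularies and is beyond this sketch.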

The Linking Open Data cloud diagram - http://richard.cyganiak.de/2007/10/lod

Wikipedia

Archives Hub

British Library BNB

British Museum

Library of Congress

LOD at Bibliothèque nationale de France

BBC Nature

University of Southampton

Open University

More data out there for cataloguers to reuse

More access points in records

Better mechanisms for record enrichment

Scope for revised cataloguing workflows

Records have a permanent identity on the web

Initial attempts with RDF

Newer lightweight formats and databases

Focus on citation metadata for the sciences

New ways for scientists to share and work with bibliography

http://openbiblio.net/

http://openbiblio.net/principles/

If developers are now consumers of our data …

Most Cambridge data could be released under a permissive license (PDDL)

Europeana Digital Library approve Creative Commons ‘Zero’ licensing of data

British Library BNB – Creative Commons ‘Zero’

OCLC looking at attribution only licensing

Move away from ‘non-commercial’ wording

Open Data Commons Public Domain Dedication and License (PDDL)

No one wants OCLC to go under (partners on COMET)

Valued partners

Focus on sharing ‘non-marc21’ formats of greater use to the non-Librarian

Vendors aim to profit from services based on data rather than data for its own sake?

Based on a 40 year old format

Based on a need to print a human readable card

Syntax, vocabulary, field names and content all intertwined

According to OCLC Research:
Only 10% of all Marc tags in Worldcat appear in 100% of all Worldcat records
65% of tags appear in less than 1% of records.

AACR2 / MARC21 uses punctuation to denote content (100$d)

Mixed fields (text and numbers) (020$a)

Duplication of author name format
One hundred notes fields (or close enough)?

df100$aBradford, Gamaliel$d1863 - 1932.
<authorParsed><surname>Bradford</surname><restOfName> Gamaliel</restOfName><birthDate>1863</birthDate><birthDateNormalised>18630101</birthDateNormalised><deathDate>1932</deathDate><deathDateNormalised>19320101</deathDateNormalised></authorParsed>
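The example above shows what a consumer has to do to recover structure that MARC encodes through punctuation. A minimal sketch of that parsing step follows; the function names and the regular expressions are assumptions, and the "0101" suffix on the normalised dates simply mirrors the convention visible in the authorParsed output on the slide.

```python
import re


def parse_subfields(field_text):
    """Split '$aBradford, Gamaliel$d1863 - 1932.' into {'a': ..., 'd': ...}."""
    pairs = re.findall(r"\$(\w)([^$]*)", field_text)
    return {code: value.strip() for code, value in pairs}


def parse_author(field_text):
    """Recover surname, forenames and life dates from a MARC 100 field string."""
    subfields = parse_subfields(field_text)
    # The comma in $a is the only thing separating surname from forenames.
    surname, _, rest = subfields.get("a", "").partition(",")
    # The dates in $d are free text, so look for four-digit years.
    years = re.findall(r"\d{4}", subfields.get("d", ""))
    birth = years[0] if len(years) > 0 else None
    death = years[1] if len(years) > 1 else None
    return {
        "surname": surname.strip(),
        "restOfName": rest.strip(),
        "birthDate": birth,
        "birthDateNormalised": birth + "0101" if birth else None,
        "deathDate": death,
        "deathDateNormalised": death + "0101" if death else None,
    }


if __name__ == "__main__":
    print(parse_author("$aBradford, Gamaliel$d1863 - 1932."))
```

Every rule here is a guess about punctuation, which is why such parsing is fragile: a name without a comma or a date written as "fl. 12th cent." would need further special cases.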

Marc21 is binary encoded

Web-friendly standards are now the norm (XML/JSON)

Numbers for field names?

Bad character encoding allowed
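To make the points above concrete, here is a rough sketch of the Marc21 transmission layout (ISO 2709): a 24-byte leader, a directory of 12-byte entries (3-digit tag, 4-digit field length, 5-digit start offset), then the field data, separated by control characters. The function names and the fixed leader values are assumptions for illustration, the sample uses ASCII only, and a real application would use an established MARC library rather than this.

```python
FIELD_END = b"\x1e"   # terminates the directory and every field
RECORD_END = b"\x1d"  # terminates the whole record
SUBFIELD = b"\x1f"    # precedes each subfield code


def build_record(fields):
    """Encode (tag, body bytes) pairs as a single ISO 2709 style record."""
    directory = b""
    data = b""
    for tag, body in fields:
        body = body + FIELD_END
        # Directory entry: tag (3), field length (4), start offset (5).
        directory += f"{tag}{len(body):04d}{len(data):05d}".encode("ascii")
        data += body
    directory += FIELD_END
    base_address = 24 + len(directory)
    record_length = base_address + len(data) + len(RECORD_END)
    # Leader: bytes 0-4 record length, bytes 12-16 base address of data.
    leader = f"{record_length:05d}nam a22{base_address:05d} a 4500".encode("ascii")
    return leader + directory + data + RECORD_END


def parse_record(raw):
    """Decode a record back into (tag, body bytes) pairs using the directory."""
    base_address = int(raw[12:17])
    directory = raw[24:base_address - 1]  # drop the directory terminator
    fields = []
    for offset in range(0, len(directory), 12):
        entry = directory[offset:offset + 12]
        tag = entry[0:3].decode("ascii")
        length = int(entry[3:7])
        start = int(entry[7:12])
        # Drop the trailing field terminator from each field body.
        body = raw[base_address + start:base_address + start + length - 1]
        fields.append((tag, body))
    return fields


if __name__ == "__main__":
    sample = [
        ("001", b"1000346"),
        ("245", b"10" + SUBFIELD + b"aEarly medieval history of Kashmir :"),
    ]
    raw = build_record(sample)
    print(raw)
    print(parse_record(raw))
```

Nothing in the record names a field in words: a consumer has to know in advance that 245 means title and that the first two bytes of the field are indicators, which is a large part of why the format is hard for non-librarian developers to approach.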

LOC Bibliographic Framework Transition declares a shift away from Marc21

Is the delay in introduction of RDA until we get a ‘better container’ ?

No system vendor is going forward with Marc21

Will take 10+ years

What is to come next?

Steering for RDA and Marc replacement needs non-librarian input or ownership

Offer from NISO to take the work on

Karen Coyle criticises the Marc21 Bibliographic Framework Transition Initiative for not including museums, publishing, and IT professionals …

She argues that our data is not just for us to consume alone …

“The next data carrier for libraries needs to be developed as a truly open effort. It should be led by a neutral organization (possibly ad hoc) that can bring together the wide range of interested parties and make sure that all voices are heard. Technical development should be done by computer professionals with expertise in metadata design. The resulting system should be rigorous yet flexible enough to allow growth and specialization.”

http://kcoyle.blogspot.com/2011/08/bibliographic-framework-transition.html

It becomes (even) easier to go to Amazon

Our status as authoritative data providers will be (further) eroded

No-one will want to play with us if we cannot learn to share

http://www.discovery.ac.uk - Discovery

NGC4LIB mailing list

http://okfn.org - Open Knowledge Foundation

http://data.lib.cam.ac.uk

Ed Chamberlain

@edchamberlain emc59@cam.ac.uk http://www.slideshare.net/EdmundChamberlain/