
Developments in catalogues and data sharing

Date post: 17-Nov-2014
Upload: edmund-chamberlain
Description:
A talk given at the Bodleian Libraries 'From cataloguing to metadata' event in November 2011. Personal opinions on changing trends in library metadata creation and consumption. Also considers the challenges and rewards associated with providing and licensing data for re-use by machines and the people who program them.
Transcript
Page 1: Developments in catalogues and data sharing

• How our catalogues are evolving
• Opening and sharing the data within them

• Ed Chamberlain
• Systems Development Librarian – Cambridge University Library

Page 2: Developments in catalogues and data sharing

Systems Development Librarian at the other place

Data ‘munger’

Data consumer?

Page 3: Developments in catalogues and data sharing

Control over data creation

Control over data consumption

Control over data environment

Control over data technology

Page 6: Developments in catalogues and data sharing

No longer the single authority for content and data

Commercial, social and academic discovery mechanisms

Explosion of digital content

Illusion of ‘all on the web’

Page 8: Developments in catalogues and data sharing

Studies into Google Generation / ‘Generation Y’ 1

Cambridge Arcadia IRIS report 2009 2

Preference for search engine over catalogue

Online over in-building

Trust tutors and peers over Librarian

Still respect the library ‘brand’

1) “The Google generation: the information behaviour of the researcher of the future”, Aslib Proceedings, vol. 60, issue 4, doi:10.1108/00012530810887953

2) Arcadia IRIS Project report - http://arcadiaproject.lib.cam.ac.uk/docs/Report_IRIS_final.pdf

Page 9: Developments in catalogues and data sharing

So far …

Evolution of catalogues

Changes in exposure of data

To come? Greater sharing of data

Library data used in non-library environments

Page 10: Developments in catalogues and data sharing

Keyword-based discovery services

New ways to exploit old data

Relevancy ranking

Rich faceting

Greater linking

Search is the new browse

Repositories and archives

Is the OPAC dead?

Page 12: Developments in catalogues and data sharing

Citations

Abstracts

Table of Contents

Page 14: Developments in catalogues and data sharing

Tags

Public lists

Reader reviews

Dramatic growth in access points

Input from true subject specialists

o Lack of structure
o No quality control
o Compromise of sanctity?

Page 15: Developments in catalogues and data sharing

Web-scale discovery: the resource discovery concept taken further

Primo Central

Summon

EBSCO Discovery

WorldCat Local

HathiTrust data can be used for full-text searching of print collections

Page 16: Developments in catalogues and data sharing

Catalogue data is now:

Consumed as keywords (not left-anchored)

Faceted (not browsed)

Supplemented

Transformed

Merged

Amalgamated
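A minimal Python sketch of the difference between left-anchored browsing and keyword matching, plus facet counting over a result set. The records, field names and whitespace tokenisation are invented for illustration; real discovery layers sit on search engines with stemming and relevancy ranking, not anything this simple.

```python
from collections import Counter

# Hypothetical mini-catalogue: titles and field values are invented for illustration.
RECORDS = [
    {"title": "The origin of species", "format": "Book", "language": "eng"},
    {"title": "Darwin and the origin of species", "format": "Book", "language": "eng"},
    {"title": "Species and speciation", "format": "Journal", "language": "fre"},
]


def left_anchored(records, prefix):
    """Card-catalogue style browse: the title must *start* with the search string."""
    p = prefix.lower()
    return [r for r in records if r["title"].lower().startswith(p)]


def keyword_search(records, query):
    """Keyword style: every query word must appear somewhere in the title."""
    words = query.lower().split()
    return [r for r in records
            if all(w in r["title"].lower().split() for w in words)]


def facet_counts(records, field):
    """Count the values of one field across a result set, as a facet sidebar does."""
    return Counter(r[field] for r in records)


if __name__ == "__main__":
    print(left_anchored(RECORDS, "origin"))       # [] : no title begins with "origin"
    hits = keyword_search(RECORDS, "species")
    print(facet_counts(hits, "format"))           # Counter({'Book': 2, 'Journal': 1})
```

The same query that finds nothing in a left-anchored browse returns three keyword hits, which the facet counts then let a reader narrow by format.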

Page 19: Developments in catalogues and data sharing

Our local catalogues

National / international aggregations

Joe Public

Teenage software developer / hacker

Booksellers

Web start-ups

Search engines

Wikipedia

Other libraries

Research group website

Page 20: Developments in catalogues and data sharing

Bibliographic data linked to many aspects of successful teaching and research

Citation lists – measure output

Shared bibliography – core of research group work

Reading lists – backbone of undergraduate teaching

High quality data needed for re-use

Not all possible whilst data resides in the library ‘silo’

Page 21: Developments in catalogues and data sharing

“Library catalogues have imposed on them librarian or supplier-made decisions about what can/can’t be searched and in what way. Some of these decisions are limited by current cataloguing rules, but not all; often the data is recorded, but not in a usable way, or is there but isn’t tapped by the interface. For example, in most catalogues you can limit by publication type to newspapers, but you can’t limit by frequency of the issues.”

“Releasing data means that people can start to use it in the way they want to.”

Page 22: Developments in catalogues and data sharing

Success of distributed access outside of cultural heritage

Single point of discovery?

Taxpayer-generated – give it back!

Why not share?

Page 23: Developments in catalogues and data sharing

Past few years have seen a massive release of public data in government and cultural heritage sectors

Open Government Data - http://data.gov.uk

Open Knowledge Foundation - http://okfn.org

EU Commission mandate to open data

Shared in ways for easy reuse and linking

Page 24: Developments in catalogues and data sharing

RLUK and JISC initiative

Galleries, libraries, archives, museums

The Discovery principles propose that:

'Open metadata creates the opportunity for enhancing impact through the release of descriptive data about library, archival and museum resources. It allows such data to be made freely available and innovatively reused to serve researchers, teachers, students, service providers and the wider community in the UK and internationally.'

http://discovery.ac.uk

Page 26: Developments in catalogues and data sharing

Why not?

WorldCat has done this for years

Schema.org microdata: some semantic structure

Use case for catalogue data in an advertising environment?

Google taken 10% (so far)

Page 27: Developments in catalogues and data sharing

<h1 itemprop="name">The Cambridge companion to Spenser edited by Andrew Hadfield. [electronic resource] /</h1>

<span style="display: none;" itemprop="publisher">Cambridge University Press,</span> <span style="display: none;" itemprop="datePublished">2001.</span>
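To see what a machine makes of this markup, here is a small sketch using Python's standard `html.parser` to pull out the `itemprop` values. It works under simplifying assumptions (no `itemscope` nesting, no `content=` attributes) and the class and function names are hypothetical; it is not a description of how any search engine actually crawls microdata. Note that the hidden spans are still read, and that the ISBD punctuation (" /", ",", ".") travels along into the extracted values.

```python
from html.parser import HTMLParser

SNIPPET = """<h1 itemprop="name">The Cambridge companion to Spenser edited by Andrew Hadfield. [electronic resource] /</h1>
<span style="display: none;" itemprop="publisher">Cambridge University Press,</span> <span style="display: none;" itemprop="datePublished">2001.</span>"""


class ItempropCollector(HTMLParser):
    """Gather the text of every element that carries an itemprop attribute.

    Deliberately simplified: itemscope/itemtype nesting, content= attributes
    and repeated properties are not handled.
    """

    def __init__(self):
        super().__init__()
        self.props = {}   # itemprop name -> accumulated text
        self._open = []   # stack of (tag, itemprop or None) for open elements

    def handle_starttag(self, tag, attrs):
        prop = dict(attrs).get("itemprop")
        self._open.append((tag, prop))
        if prop:
            self.props.setdefault(prop, "")

    def handle_endtag(self, tag):
        # Pop back to the most recent matching start tag.
        for i in range(len(self._open) - 1, -1, -1):
            if self._open[i][0] == tag:
                del self._open[i:]
                break

    def handle_data(self, data):
        # Text counts towards every enclosing element that names a property.
        for _, prop in self._open:
            if prop:
                self.props[prop] += data


def extract_itemprops(html):
    parser = ItempropCollector()
    parser.feed(html)
    parser.close()
    return {name: text.strip() for name, text in parser.props.items()}


if __name__ == "__main__":
    for name, value in extract_itemprops(SNIPPET).items():
        print(f"{name}: {value}")
```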

Page 28: Developments in catalogues and data sharing

Application Programming Interface (API)

Layered over LMS

Expose catalogue data feeds for developers

Anyone can use them

Simple request, simple response

http://www.lib.cam.ac.uk/api

Page 29: Developments in catalogues and data sharing

http://www.lib.cam.ac.uk/api/voyager/newtonSearch.cgi?searchArg=darwin&databases=depfacaedb
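"Simple request, simple response" can be made concrete: the example request is just a URL with two query parameters. The sketch below rebuilds it in Python. The endpoint and the `searchArg`/`databases` parameters are copied from the slide; the helper names are hypothetical, the service dates from 2011 and may no longer respond, and nothing is assumed about the format of its response.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint and parameter names copied from the example request on the slide.
# No assumption is made here about the shape of the response.
BASE_URL = "http://www.lib.cam.ac.uk/api/voyager/newtonSearch.cgi"


def newton_search_url(term, database="depfacaedb"):
    """Build a 'simple request' URL: one search term, one database code."""
    return BASE_URL + "?" + urlencode({"searchArg": term, "databases": database})


def fetch(url, timeout=10):
    """Return the raw response body as bytes (performs a network request)."""
    with urlopen(url, timeout=timeout) as response:
        return response.read()


if __name__ == "__main__":
    print(newton_search_url("darwin"))
```

`urlencode` takes care of escaping, so a multi-word term such as "charles darwin" becomes `searchArg=charles+darwin` without any hand-written quoting.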

Page 31: Developments in catalogues and data sharing

COMET project

80% of CUL bib records converted to Resource Description Framework (RDF)

Enriched with direct links to the Library of Congress

Vocabulary in line with British Library work

OCLC FAST and VIAF authority sources

http://data.lib.cam.ac.uk

Page 32: Developments in catalogues and data sharing

Marc21 …
001 1000346
245 $aEarly medieval history of Kashmir : $b[with special reference to the Loharas] A.D. 1003-1171 /

DC XML …
<dc:identifier>1000346</dc:identifier>
<dc:title>Early medieval history of Kashmir : [with special reference to the Loharas] A.D. 1003-1171</dc:title>

RDF triples …
<http://data.lib.cam.ac.uk/id/entry/cambrdgedb_1000346> <http://purl.org/dc/terms/title> "Early medieval history of Kashmir : [with special reference to the Loharas] A.D. 1003-1171" .
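The step from Marc21 to Dublin Core to an RDF triple can be sketched in a few lines of Python. This is not the COMET conversion code. The in-memory record layout, the helper names and the punctuation rule (dropping the trailing " /" that precedes $c) are assumptions made for the example; only the URI pattern and the `dcterms:title` property come from the slide.

```python
from xml.sax.saxutils import escape

DCTERMS_TITLE = "http://purl.org/dc/terms/title"
ENTRY_BASE = "http://data.lib.cam.ac.uk/id/entry/cambrdgedb_"  # URI pattern from the slide

# Hypothetical in-memory form of the Marc21 record shown on the slide.
SAMPLE = {
    "001": "1000346",
    "245": {"a": "Early medieval history of Kashmir :",
            "b": "[with special reference to the Loharas] A.D. 1003-1171 /"},
}


def title_from_245(subfields):
    """Join 245 $a and $b, then drop the trailing ' /' that ISBD puts before $c."""
    title = " ".join(subfields[code] for code in ("a", "b") if code in subfields)
    return title.rstrip(" /")


def to_dc_xml(record):
    """Flat Dublin Core elements (namespace declarations omitted for brevity)."""
    return (f"<dc:identifier>{escape(record['001'])}</dc:identifier>"
            f"<dc:title>{escape(title_from_245(record['245']))}</dc:title>")


def nt_literal(text):
    """Quote a plain string as an N-Triples literal."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def to_ntriple(record):
    subject = f"<{ENTRY_BASE}{record['001']}>"
    return f"{subject} <{DCTERMS_TITLE}> {nt_literal(title_from_245(record['245']))} ."


if __name__ == "__main__":
    print(to_dc_xml(SAMPLE))
    print(to_ntriple(SAMPLE))
```

Most of the work is undoing card-era punctuation. Once the title is clean, each output format is a one-line template.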

Page 33: Developments in catalogues and data sharing

<http://data.lib.cam.ac.uk/id/entry/cambrdgedb_1000346> <http://purl.org/dc/terms/title> "Early medieval history of Kashmir : [with special reference to the Loharas] A.D. 1003-1171" .
<http://data.lib.cam.ac.uk/id/entry/cambrdgedb_1000346> <http://purl.org/dc/terms/type> <http://data.lib.cam.ac.uk/id/type/1cb251ec0d568de6a929b520c4aed8d1> .
<http://data.lib.cam.ac.uk/id/entry/cambrdgedb_1000346> <http://purl.org/dc/terms/type> <http://data.lib.cam.ac.uk/id/type/46657eb180382684090fda2b5670335d> .
<http://data.lib.cam.ac.uk/id/entry/cambrdgedb_1000346> <http://purl.org/dc/terms/identifier> "UkCU1000346" .
<http://data.lib.cam.ac.uk/id/entry/cambrdgedb_1000346> <http://purl.org/dc/terms/issued> "1981" .
<http://data.lib.cam.ac.uk/id/entry/cambrdgedb_1000346> <http://purl.org/dc/terms/creator> <http://data.lib.cam.ac.uk/id/entity/cambrdgedb_a5a6f7a184ff02e08b1befedc1b3a4d0> .
<http://data.lib.cam.ac.uk/id/entry/cambrdgedb_1000346> <http://purl.org/dc/terms/language> <http://id.loc.gov/vocabulary/iso639-2/eng> .
<http://data.lib.cam.ac.uk/id/entry/cambrdgedb_1000346> <http://RDVocab.info/ElementsplaceOfPublication> <http://id.loc.gov/vocabulary/countries/ii> .
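Because each of these lines is a self-contained subject, predicate, object statement, a consumer can regroup them without knowing anything about Marc21. A minimal Python sketch follows. It assumes only the two line shapes seen on this slide (IRI objects and plain string literals), and the function names are hypothetical. A real application would use a full RDF library rather than this regular expression.

```python
import re
from collections import defaultdict

# Matches only the shapes used on this slide:  <IRI> <IRI> <IRI> .  or  <IRI> <IRI> "literal" .
# Blank nodes, language tags and datatyped literals are not handled.
TRIPLE = re.compile(
    r'^<([^>]*)>\s+<([^>]*)>\s+(?:<([^>]*)>|"((?:[^"\\]|\\.)*)")\s*\.\s*$'
)

SAMPLE = """\
<http://data.lib.cam.ac.uk/id/entry/cambrdgedb_1000346> <http://purl.org/dc/terms/title> "Early medieval history of Kashmir : [with special reference to the Loharas] A.D. 1003-1171" .
<http://data.lib.cam.ac.uk/id/entry/cambrdgedb_1000346> <http://purl.org/dc/terms/type> <http://data.lib.cam.ac.uk/id/type/1cb251ec0d568de6a929b520c4aed8d1> .
<http://data.lib.cam.ac.uk/id/entry/cambrdgedb_1000346> <http://purl.org/dc/terms/type> <http://data.lib.cam.ac.uk/id/type/46657eb180382684090fda2b5670335d> .
<http://data.lib.cam.ac.uk/id/entry/cambrdgedb_1000346> <http://purl.org/dc/terms/identifier> "UkCU1000346" .
"""


def parse_ntriples(text):
    """Return (subject, predicate, object) tuples; literal escapes are left as-is."""
    triples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        match = TRIPLE.match(line)
        if match is None:
            raise ValueError(f"unsupported N-Triples line: {line}")
        subject, predicate, iri, literal = match.groups()
        triples.append((subject, predicate, iri if iri is not None else literal))
    return triples


def describe(triples, subject):
    """Group one subject's objects by predicate, rather like a record display."""
    grouped = defaultdict(list)
    for s, p, o in triples:
        if s == subject:
            grouped[p].append(o)
    return dict(grouped)


if __name__ == "__main__":
    record = describe(parse_ntriples(SAMPLE),
                      "http://data.lib.cam.ac.uk/id/entry/cambrdgedb_1000346")
    for predicate, objects in record.items():
        print(predicate, objects)
```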

Page 35: Developments in catalogues and data sharing

The Linking Open Data cloud diagram - http://richard.cyganiak.de/2007/10/lod

Page 36: Developments in catalogues and data sharing

Wikipedia

Archives Hub

British Library BNB

British Museum

Library of Congress

LOD at Bibliothèque nationale de France

BBC Nature

University of Southampton

Open University

Page 37: Developments in catalogues and data sharing

More data out there for cataloguers to reuse

More access points in records

Better mechanisms for record enrichment

Scope for revised cataloguing workflows

Records have a permanent identity on the web

Page 38: Developments in catalogues and data sharing

Initial attempts with RDF

Newer lightweight formats and databases

Focus on citation metadata for the sciences

New ways for scientists to share and work with bibliography

http://openbiblio.net/

http://openbiblio.net/principles/

Page 39: Developments in catalogues and data sharing

If developers are now consumers of our data …

Page 40: Developments in catalogues and data sharing

Most Cambridge data could be released under a permissive license (PDDL)

Europeana Digital Library approves Creative Commons ‘Zero’ licensing of data

British Library BNB – Creative Commons ‘Zero’

OCLC looking at attribution-only licensing

Move away from ‘non-commercial’ wording

Open Data Commons Public Domain Dedication and License (PDDL)

Page 41: Developments in catalogues and data sharing

No one wants OCLC to go under (partners on COMET)

Valued partners

Focus on sharing ‘non-Marc21’ formats of greater use to the non-Librarian

Vendors aim to profit from services based on data rather than data for its own sake?

Page 42: Developments in catalogues and data sharing

Based on a 40-year-old format

Based on a need to print a human readable card

Syntax, vocabulary, field names and content all intertwined

According to OCLC Research, only 10% of all Marc tags in WorldCat appear in 100% of all WorldCat records

65% of tags appear in fewer than 1% of records
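Figures like these come from counting, for each tag, how many records contain it at least once. Below is a hypothetical Python sketch of that calculation, with each record reduced to a list of its tags. It does not reproduce OCLC Research's methodology or data, and the sample records are invented.

```python
from collections import Counter


def tag_coverage(records):
    """Map each tag to the fraction of records that contain it at least once.

    Each record is represented simply as a list of its field tags.
    """
    counts = Counter(tag for record in records for tag in set(record))
    return {tag: n / len(records) for tag, n in counts.items()}


def summarise(coverage, rare_threshold=0.01):
    """Share of tags present in every record, and share present in < 1% of records."""
    total = len(coverage)
    everywhere = sum(1 for share in coverage.values() if share == 1.0)
    rare = sum(1 for share in coverage.values() if share < rare_threshold)
    return everywhere / total, rare / total


if __name__ == "__main__":
    # Invented sample: 200 minimal records plus one carrying a local note (590).
    sample = [["001", "245"]] * 200 + [["001", "245", "590"]]
    print(summarise(tag_coverage(sample)))
```

Counting each tag once per record (`set(record)`) matters. Otherwise a record with many repeated notes fields would inflate that tag's apparent coverage.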

Page 43: Developments in catalogues and data sharing

AACR2 / MARC21 uses punctuation to denote content (100$d)

Mixed fields (text and numbers) (020$a)

Duplication of author name formats

One hundred notes fields (or close enough)?

df100$aBradford, Gamaliel$d1863 - 1932.

<authorParsed>
  <surname>Bradford</surname>
  <restOfName> Gamaliel</restOfName>
  <birthDate>1863</birthDate>
  <birthDateNormalised>18630101</birthDateNormalised>
  <deathDate>1932</deathDate>
  <deathDateNormalised>19320101</deathDateNormalised>
</authorParsed>
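The parsed form on this slide can be produced mechanically from the 100 field. Here is a Python sketch under stated assumptions. It treats `$` as the subfield delimiter (as in the display form above), ignores anything before the first `$`, and splits $a at its first comma. As on the slide, it pads a bare year to 1 January to give the normalised date. The element names come from the slide; the functions are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

DATES = re.compile(r"(\d{4})\s*-\s*(\d{4})?")   # "1863 - 1932" or an open "1863-"


def subfields(field):
    """Split '100$aBradford, Gamaliel$d1863 - 1932.' into {'a': ..., 'd': ...}.

    Assumes '$' delimits subfields; text before the first '$' (the tag) is
    ignored and only the first occurrence of each code is kept.
    """
    out = {}
    for chunk in field.split("$")[1:]:
        if chunk:
            out.setdefault(chunk[0], chunk[1:].strip())
    return out


def parse_personal_name(field):
    sf = subfields(field)
    surname, _, rest = sf.get("a", "").partition(",")
    parsed = {"surname": surname.strip(), "restOfName": rest.strip().rstrip(",")}
    match = DATES.search(sf.get("d", ""))
    if match:
        birth, death = match.groups()
        parsed["birthDate"] = birth
        parsed["birthDateNormalised"] = birth + "0101"   # year only: pad to 1 January
        if death:
            parsed["deathDate"] = death
            parsed["deathDateNormalised"] = death + "0101"
    return parsed


def to_xml(parsed):
    root = ET.Element("authorParsed")
    for name, value in parsed.items():
        ET.SubElement(root, name).text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(to_xml(parse_personal_name("100$aBradford, Gamaliel$d1863 - 1932.")))
```

Each piece of the name becomes its own element, so no consumer has to rely on the comma and hyphen conventions to recover it.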

Page 44: Developments in catalogues and data sharing

Marc21 is binary encoded

Web-friendly standards are now the norm (XML/JSON) 1

Numbers for field names?

Bad character encoding allowed
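Marc21 records are normally exchanged in the ISO 2709 structure. This is a 24-character leader, a directory of 12-character entries (tag, field length, starting position), then the field data, all delimited by control characters rather than anything a web developer would recognise. The Python sketch below builds and decodes a minimal record and re-expresses it as JSON, loosely following the MARC-in-JSON convention. It writes UTF-8 and marks that with 'a' in leader position 09. MARC-8 records (a blank in that position) are not handled, and the other leader values are placeholders. The function names are hypothetical; for real files a maintained library such as pymarc is a better choice.

```python
import json

FT, SD, RT = "\x1e", "\x1f", "\x1d"   # field terminator, subfield delimiter, record terminator


def build(fields):
    """Encode fields as an ISO 2709 record (UTF-8, placeholder leader values).

    fields: list of (tag, value); value is a str for control fields (00X),
    or (ind1, ind2, [(code, text), ...]) for data fields.
    """
    directory, data = "", ""
    for tag, value in fields:
        if tag < "010":
            body = value + FT
        else:
            ind1, ind2, subs = value
            body = ind1 + ind2 + "".join(SD + code + text for code, text in subs) + FT
        # Lengths and offsets are counted in bytes, not characters.
        directory += f"{tag}{len(body.encode()):04d}{len(data.encode()):05d}"
        data += body
    directory += FT
    base = 24 + len(directory.encode())
    total = base + len(data.encode()) + 1
    leader = f"{total:05d}nam a22{base:05d} a 4500"   # position 09 'a' = UCS/Unicode
    return (leader + directory + data + RT).encode("utf-8")


def parse(raw):
    """Decode ISO 2709 bytes into a MARC-in-JSON-like dict."""
    leader = raw[:24].decode("ascii")
    base = int(leader[12:17])                 # base address of the field data
    directory = raw[24:base - 1]              # drop the directory's field terminator
    fields = []
    for i in range(0, len(directory), 12):
        entry = directory[i:i + 12].decode("ascii")
        tag, length, start = entry[:3], int(entry[3:7]), int(entry[7:12])
        body = raw[base + start: base + start + length - 1].decode("utf-8")  # drop FT
        if tag < "010":
            fields.append({tag: body})
        else:
            indicators, *subs = body.split(SD)
            fields.append({tag: {"ind1": indicators[0], "ind2": indicators[1],
                                 "subfields": [{s[0]: s[1:]} for s in subs]}})
    return {"leader": leader, "fields": fields}


if __name__ == "__main__":
    raw = build([("001", "1000346"),
                 ("245", ("1", "0", [("a", "Early medieval history of Kashmir :")]))])
    print(json.dumps(parse(raw), indent=2))
```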

Page 45: Developments in catalogues and data sharing

LOC Bibliographic Framework Transition declares a shift away from Marc21

Is the introduction of RDA being delayed until we get a ‘better container’?

No system vendor is going forward with Marc21

Will take 10+ years

What is to come next?

Page 46: Developments in catalogues and data sharing

Steering for RDA and Marc replacement needs non-librarian input or ownership

Offer from NISO to take the work on

Karen Coyle criticises the Marc21 Bibliographic Framework Transition Initiative for not including museums, publishing, and IT professionals …

She argues that our data is not just for us to consume alone …

“The next data carrier for libraries needs to be developed as a truly open effort. It should be led by a neutral organization (possibly ad hoc) that can bring together the wide range of interested parties and make sure that all voices are heard. Technical development should be done by computer professionals with expertise in metadata design. The resulting system should be rigorous yet flexible enough to allow growth and specialization.”

http://kcoyle.blogspot.com/2011/08/bibliographic-framework-transition.html

Page 47: Developments in catalogues and data sharing

It becomes (even) easier to go to Amazon

Our status as authoritative data providers will be (further) eroded

No-one will want to play with us if we cannot learn to share

Page 48: Developments in catalogues and data sharing

http://www.discovery.ac.uk - Discovery

NGC4Lib mailing list

http://okfn.org - Open Knowledge Foundation

http://data.lib.cam.ac.uk

Page 49: Developments in catalogues and data sharing

Ed Chamberlain

@edchamberlain http://www.slideshare.net/EdmundChamberlain/

