Open Archives Initiative Protocol for Metadata Harvesting

Post on 20-Jan-2015

Dublin Core conference 2009 Seoul, Oct 2009

The Open Archives Initiative Protocol for Metadata Harvesting

Muriel Foulonneau

Tudor Research Centre

muriel.foulonneau@tudor.lu

The protocol was born

To create a minimal layer of interoperability between distributed repositories of scientific publications

An alternative to federated search

Networking of digital repositories

“OAI divides the world between data providers and service providers”

Sharing metadata: data aggregation

The portal gathers metadata and implements its own retrieval system

[Diagram: a user query (“Mill?”) is answered from the portal's aggregated records, e.g. <title>My resource</title>]

E.g. search engines, union catalogs, OAI

The OAI framework

[Diagram labels: service provider, harvester, aggregator, portal interface; data providers, each with a repository]

Mechanisms to transfer large datasets

Resumption tokens

Incremental harvesting
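
Resumption tokens let a repository split a large result list into pages: each response carries a token, and the harvester sends it back to get the next page. The loop can be sketched as follows. This is a minimal sketch, not a full harvester: the function names and the injectable `fetch` parameter are assumptions made for illustration, and error handling (OAI error codes, HTTP retries) is left out.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode
from urllib.request import urlopen

OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"


def fetch_xml(base_url, params):
    """Issue one OAI-PMH request over HTTP and return the parsed XML root."""
    with urlopen(f"{base_url}?{urlencode(params)}") as response:
        return ET.fromstring(response.read())


def harvest_records(base_url, metadata_prefix="oai_dc", fetch=fetch_xml):
    """Yield every <record> element, following resumption tokens page by page."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    while True:
        root = fetch(base_url, params)
        yield from root.iter(f"{OAI_NS}record")
        token = root.find(f".//{OAI_NS}resumptionToken")
        if token is None or not (token.text or "").strip():
            break  # no token, or an empty one, marks the last page
        # A follow-up request carries only the verb and the token.
        params = {"verb": "ListRecords", "resumptionToken": token.text.strip()}
```

Passing `fetch` as a parameter keeps the paging logic separate from the HTTP call, so the loop can be exercised against canned responses.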

Incremental harvest

Harvester to data providers: “What’s new since the last time I came?”

• New or modified records
• Deleted records
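
In protocol terms, "what's new since the last time I came?" is a ListRecords request with a `from` date, and a deleted record is a header carrying `status="deleted"`. The sketch below shows both halves. The function names and the sample dates are assumptions for illustration; whether deletions are reported at all depends on the repository's declared deleted-record policy.

```python
import xml.etree.ElementTree as ET

OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"


def incremental_params(last_harvest, metadata_prefix="oai_dc"):
    """Arguments for a ListRecords request limited to changes since last_harvest."""
    return {
        "verb": "ListRecords",
        "metadataPrefix": metadata_prefix,
        "from": last_harvest,  # e.g. "2009-09-30" (date of the previous harvest)
    }


def split_changes(response_xml):
    """Return (updated_ids, deleted_ids) found in a ListRecords response."""
    updated, deleted = [], []
    for header in ET.fromstring(response_xml).iter(f"{OAI_NS}header"):
        identifier = header.findtext(f"{OAI_NS}identifier")
        if header.get("status") == "deleted":
            deleted.append(identifier)
        else:
            updated.append(identifier)
    return updated, deleted
```

The harvester would update its local copy for the first list and drop the records in the second.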

OAI is based on standards

HTTP protocol

XML and XML Schemas

Dublin Core

Dublin Core

MARC21

MODS

Multiple representations of an object

[School of arts for girls Kiz Sanayi Mektebi]

oai:lcoa1.loc.gov:loc.pnp/cph.3b23005
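
One item can be requested in each format the repository offers for it: ListMetadataFormats reports the available prefixes, and GetRecord is then issued once per prefix. A minimal sketch follows. The function names are illustrative, and the prefixes `oai_dc`, `marc21` and `mods` in the usage note are assumed values, since each repository chooses its own prefixes apart from the mandatory `oai_dc`.

```python
import xml.etree.ElementTree as ET

OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"


def metadata_prefixes(list_formats_xml):
    """Extract the metadataPrefix values from a ListMetadataFormats response."""
    root = ET.fromstring(list_formats_xml)
    return [
        fmt.findtext(f"{OAI_NS}metadataPrefix")
        for fmt in root.iter(f"{OAI_NS}metadataFormat")
    ]


def representation_requests(identifier, prefixes):
    """Build one set of GetRecord arguments per available representation."""
    return [
        {"verb": "GetRecord", "identifier": identifier, "metadataPrefix": prefix}
        for prefix in prefixes
    ]
```

For example, `representation_requests("oai:lcoa1.loc.gov:loc.pnp/cph.3b23005", ["oai_dc", "marc21", "mods"])` yields three requests for the same item, one per format.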

OAI repositories can be organized in sets

What do sets represent?

Journals: issues

Institutional repositories: departments, research centers, etc.

EPrint archives: subject, publication status

Cultural heritage repositories: collections with intent

Set representations may be constrained by the software package used.

Sets enable selective harvesting

Sets can overlap: one item can belong to multiple sets

Sets can be described (e.g. with DC or DC Collection)
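
Selective harvesting works by first listing the sets a repository exposes and then passing one `setSpec` as the `set` argument of ListRecords or ListIdentifiers. A minimal sketch, with illustrative function names; the set name in the usage note is a made-up example.

```python
import xml.etree.ElementTree as ET

OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"


def parse_sets(list_sets_xml):
    """Map each setSpec to its human-readable setName from a ListSets response."""
    root = ET.fromstring(list_sets_xml)
    return {
        s.findtext(f"{OAI_NS}setSpec"): s.findtext(f"{OAI_NS}setName")
        for s in root.iter(f"{OAI_NS}set")
    }


def selective_params(set_spec, metadata_prefix="oai_dc"):
    """Arguments for harvesting only the records that belong to one set."""
    return {"verb": "ListRecords", "metadataPrefix": metadata_prefix, "set": set_spec}
```

With a ListSets response that declares a set `emblems`, `selective_params("emblems")` restricts the harvest to that set. Because sets can overlap, the same record may come back from several selective harvests, so a harvester should de-duplicate on the record identifier.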

OAI supports 6 verbs

Identify: http://aerialphotos.grainger.uiuc.edu/oai.asp?verb=Identify

ListSets: http://aerialphotos.grainger.uiuc.edu/oai.asp?verb=ListSets

ListRecords: http://aerialphotos.grainger.uiuc.edu/oai.asp?verb=ListRecords&metadataPrefix=oai_dc

ListMetadataFormats: http://aerialphotos.grainger.uiuc.edu/oai.asp?verb=ListMetadataFormats

ListIdentifiers: http://aerialphotos.grainger.uiuc.edu/oai.asp?verb=ListIdentifiers&metadataPrefix=oai_dc

GetRecord: http://aerialphotos.grainger.uiuc.edu/oai.asp?verb=GetRecord&identifier=oai:aerialphotos.grainger.uiuc.edu:AP-1A-1-1940&metadataPrefix=oai_dc
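
Every request is the repository's base URL plus a `verb` and its arguments in the query string, so the URLs on this slide can be generated rather than typed. The helper below is a sketch; `oai_url` and `VERBS` are names chosen for illustration, and the helper does not check which arguments each verb requires.

```python
from urllib.parse import urlencode

BASE_URL = "http://aerialphotos.grainger.uiuc.edu/oai.asp"

# The six verbs defined by the protocol.
VERBS = {
    "Identify",
    "ListMetadataFormats",
    "ListSets",
    "ListIdentifiers",
    "ListRecords",
    "GetRecord",
}


def oai_url(base_url, verb, **arguments):
    """Build the request URL for one verb; arguments become query parameters."""
    if verb not in VERBS:
        raise ValueError(f"unknown OAI-PMH verb: {verb}")
    query = urlencode({"verb": verb, **arguments})
    return f"{base_url}?{query}"
```

For instance, `oai_url(BASE_URL, "ListRecords", metadataPrefix="oai_dc")` reproduces the ListRecords URL above. `urlencode` percent-encodes reserved characters such as the colons in an item identifier, which the server decodes back to the same value.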

An OAI response

<record>
  <header>
    <identifier>oai:images.library.uiuc.edu:emblems/324</identifier>
    <datestamp>2003-10-22</datestamp>
    <setSpec>emblems</setSpec>
  </header>
  <metadata>
    <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
               xmlns:dc="http://purl.org/dc/elements/1.1/"
               xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
               xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/ http://www.openarchives.org/OAI/2.0/oai_dc.xsd">
      <dc:creator>Müller, Johann Heinrich Traugott, 1631-1675</dc:creator>
      <dc:identifier>http://images.library.uiuc.edu:8081/u?/emblems,324</dc:identifier>
    </oai_dc:dc>
  </metadata>
</record>

The optional about section is often not used. It can, e.g., state rights on the metadata record.
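
A harvester typically splits such a record into its header (identifier, datestamp, sets) and its Dublin Core fields. The sketch below parses the record from this slide. One assumption: the `xmlns` declaration on `<record>` is added here because a full response declares the OAI-PMH namespace on its root element, while the slide shows only a fragment. `parse_record` is an illustrative name.

```python
import xml.etree.ElementTree as ET

NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

RECORD = """<record xmlns="http://www.openarchives.org/OAI/2.0/">
  <header>
    <identifier>oai:images.library.uiuc.edu:emblems/324</identifier>
    <datestamp>2003-10-22</datestamp>
    <setSpec>emblems</setSpec>
  </header>
  <metadata>
    <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
               xmlns:dc="http://purl.org/dc/elements/1.1/">
      <dc:creator>Müller, Johann Heinrich Traugott, 1631-1675</dc:creator>
      <dc:identifier>http://images.library.uiuc.edu:8081/u?/emblems,324</dc:identifier>
    </oai_dc:dc>
  </metadata>
</record>"""


def parse_record(record_xml):
    """Split one OAI record into header fields and repeatable DC elements."""
    record = ET.fromstring(record_xml)
    header = record.find("oai:header", NS)
    dc = {}
    # Children of the metadata container, whatever its wrapper element is.
    for element in record.iterfind("oai:metadata/*/*", NS):
        name = element.tag.split("}", 1)[-1]  # drop the namespace part
        dc.setdefault(name, []).append(element.text)
    return {
        "identifier": header.findtext("oai:identifier", namespaces=NS),
        "datestamp": header.findtext("oai:datestamp", namespaces=NS),
        "sets": [s.text for s in header.findall("oai:setSpec", NS)],
        "dc": dc,
    }
```

Each DC element is stored as a list because every Dublin Core element is repeatable in `oai_dc`.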

Examples of repositories

Library of Congress

http://memory.loc.gov/cgi-bin/oai2_0

ContentDM at UIUC

http://images.library.uiuc.edu:8081/cgi-bin/oai.exe

Ohio State Knowledge Bank

https://kb.osu.edu/dspace-oai/request

PictureAustralia

Aggregates from large institutions

Web crawling for small ones

Flickr for individuals

“Using OAI has the advantage that only new and changed records need to be harvested, while for web crawl harvesting all records have to be re-harvested each time a harvest is run.”

http://www.pictureaustralia.org/schemas/pa/index.html

DRIVER – aggregation as an infrastructure

Europeana

IVOA – synchronization of service repositories

Turnkey systems

ContentDM: http://contentdm.com/

Digitool: http://www.exlibrisgroup.com/digitool.htm

DSpace: http://www.dspace.org/

EPrints: http://software.eprints.org/

Interoperability in practice

Quality issues with OAI aggregations

Metadata formats

DC, QDC, ETDMS, MODS, MARC, EAD, …

Require an XML schema

Most implementations only use simple DC

Example of values found in DC:Date

September 29–October 28, 51 AD; 1970

second half of IXth century AD; 1978

Rebuilt 1984

Possibly Vth/VIth century AD; 1935

Planted 1985

n/a

n.d.

Mid IInd century AD; 1973

Jul-51

circa 900 AD

ca. 701 BC

Begun 14th century

184-?

1839

18–?

August 23, 2000

between 1827 and 183

VIIIth/IXth century AD ? (TC);1965

Vth-VIth century AD (McNamee); IVth century AD (Cribiore); 1982

XVIII Dynasty

Winter 2003

era of redevelopment

various

2002-00

1980, refurbished 1997

China: Neolithic Period (5000 BCE-ca 1600 BCE)

?

1969

1968

21. Nouemb. Anno. 1564.

And finisshed on the euen of thanunciacion of our said bilissid Lady falling on the wednesday the xxiiij daye of Marche. in the xix yeer of Kyng Edwarde the fourthe [1479]]

19193

xxxx Oct xx

Various

1938-05-38

1963 to 1953

[not after 1579]

163[5?]
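
An aggregator usually has to sort such values into those a machine can use directly and those that need human review or normalization. The check below is a sketch under a stated assumption: it accepts only the date-only forms `YYYY`, `YYYY-MM` and `YYYY-MM-DD` and ignores the time-of-day forms that W3CDTF also allows. `is_machine_readable` is an illustrative name.

```python
import re
from datetime import date

# Date-only forms: YYYY, YYYY-MM, YYYY-MM-DD.
DATE_ONLY = re.compile(r"^(\d{4})(?:-(\d{2})(?:-(\d{2}))?)?$")


def is_machine_readable(value):
    """True if value is a well-formed, calendar-valid date-only string."""
    match = DATE_ONLY.match(value.strip())
    if not match:
        return False
    year, month, day = match.groups()
    try:
        # Missing month or day default to 1; invalid ones raise ValueError.
        date(int(year), int(month or 1), int(day or 1))
    except ValueError:
        return False
    return True


samples = ["1839", "August 23, 2000", "1938-05-38", "184-?", "n.d.", "circa 900 AD"]
usable = [value for value in samples if is_machine_readable(value)]
```

Of these samples only `1839` passes. `1938-05-38` has the right shape but fails the calendar check, which is why pattern matching alone is not enough.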

Who is metadata made for?

Machine: dc:type “Text.Correspondence.Letter”, dc:language “wln”

H: dc:type “Correspondence”, dc:language “wallon”

Who knows? dc:date “197-”, dc:description “First ed. Cf. BM.”
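
One way to serve both audiences is to store the coded value and derive the display form from it. The sketch below illustrates the idea. The lookup table and both function names are hypothetical, and a real service would use a complete language-code registry rather than three entries.

```python
# Hypothetical display labels for a few language codes.
LANGUAGE_LABELS = {"wln": "Walloon", "fra": "French", "eng": "English"}


def display_language(value):
    """Return a readable label for a known code, else the value unchanged."""
    return LANGUAGE_LABELS.get(value.strip().lower(), value)


def type_levels(value):
    """Split a hierarchical dc:type such as 'Text.Correspondence.Letter'."""
    return value.split(".")
```

`display_language("wln")` gives a label a person can read, while a free-text value such as "wallon" passes through unchanged and cannot be matched reliably by software.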

Improving quality

Quality certificates for open access repositories: DINI (Deutsche Initiative für Netzwerkinformation)

Best practices for OAI and shareable metadata by the Digital Library Federation and the National Science Digital Library

http://www.diglib.org/pubs/dlf108.pdf

Meeting with software providers

Test environment (e.g. Europeana)

Community guidelines

Conclusion

Has the protocol “crossed the chasm”?

The objective is to create a network of repositories rather than to network individual resources

Lack of a specific mechanism to relate resources to each other

Approach to linked data and OAI-ORE

References

OAI-PMH: http://www.openarchives.org/pmh/

Best practices for OAI and shareable metadata: http://www.diglib.org/pubs/dlf108.pdf

Tim Cole and Muriel Foulonneau, Using the Open Archives Initiative Protocol for Metadata Harvesting, Libraries Unlimited, 2007

Muriel Foulonneau and Jenn Riley, Metadata for Digital Resources, Chandos Publishing, 2008