Open Archives Initiative Protocol for Metadata Harvesting

Dublin Core conference 2009 Seoul, Oct 2009
Page 1: Open Archives Initiative Protocol for Metadata Harvesting

Dublin Core conference 2009, Seoul, 10/2009

The Open Archives Initiative Protocol for Metadata Harvesting

Muriel Foulonneau

Tudor Research Centre

[email protected]

Page 2: Open Archives Initiative Protocol for Metadata Harvesting

The protocol was born

To create a minimal layer of interoperability between distributed repositories of scientific publications

An alternative to federated search
Networking of digital repositories

Page 3: Open Archives Initiative Protocol for Metadata Harvesting

“OAI divides the world between data providers and service providers”

Page 4: Open Archives Initiative Protocol for Metadata Harvesting

Sharing metadata: data aggregation

The portal gathers metadata and implements its own retrieval system

[Diagram: sample metadata record fragment, <title>My resource</title>]

E.g. search engines, union catalogs, OAI

Page 5: Open Archives Initiative Protocol for Metadata Harvesting

The OAI framework

[Diagram: several data providers, each with a repository; an aggregator; a service provider with a harvester and a portal interface]

Mechanisms to transfer large datasets:
Resumption tokens
Incremental harvesting
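The resumption token mechanism named on this slide can be sketched as follows. This is a minimal illustration, not a full harvester: the base URL `http://example.org/oai`, the function name `next_request_url`, and the trimmed sample response are assumptions made for the example. It relies on the OAI-PMH 2.0 rule that a follow-up request carries only the verb and the token, and that a missing or empty token marks the end of the list.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"

def next_request_url(base_url, response_xml):
    """Return the URL of the next ListRecords page, or None when the list is complete."""
    root = ET.fromstring(response_xml)
    token = root.find(f".//{OAI_NS}resumptionToken")
    # A missing or empty resumptionToken element marks the last page.
    if token is None or not (token.text or "").strip():
        return None
    # A follow-up request carries only the verb and the token.
    params = {"verb": "ListRecords", "resumptionToken": token.text.strip()}
    return base_url + "?" + urlencode(params)

# Trimmed, hypothetical response page (records omitted for brevity).
SAMPLE_PAGE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <resumptionToken>page-2-token</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

print(next_request_url("http://example.org/oai", SAMPLE_PAGE))
```

A harvester would call this after each page and stop when it returns None.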

Page 6: Open Archives Initiative Protocol for Metadata Harvesting

Incremental harvest

Harvester to data providers: "What’s new since the last time I came?"

• New or modified records
• Deleted records
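An incremental harvest of this kind might look like the sketch below. The `from` argument and the `status="deleted"` header attribute are part of OAI-PMH 2.0; the base URL, the function names, and the trimmed sample response are assumptions for illustration. Whether a repository reports deletions at all depends on the deleted-record policy it declares in its Identify response.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"

def incremental_url(base_url, last_harvest, metadata_prefix="oai_dc"):
    """Build a ListRecords request limited to records changed since last_harvest."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix, "from": last_harvest}
    return base_url + "?" + urlencode(params)

def split_changes(response_xml):
    """Return (updated_ids, deleted_ids) from a ListRecords response."""
    updated, deleted = [], []
    for header in ET.fromstring(response_xml).iter(f"{OAI_NS}header"):
        identifier = header.findtext(f"{OAI_NS}identifier")
        # Deleted records are flagged by a status attribute on the header.
        if header.get("status") == "deleted":
            deleted.append(identifier)
        else:
            updated.append(identifier)
    return updated, deleted

# Trimmed, hypothetical response (metadata parts omitted).
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header>
        <identifier>oai:example.org:1</identifier>
        <datestamp>2009-10-05</datestamp>
      </header>
    </record>
    <record>
      <header status="deleted">
        <identifier>oai:example.org:2</identifier>
        <datestamp>2009-10-06</datestamp>
      </header>
    </record>
  </ListRecords>
</OAI-PMH>"""

print(incremental_url("http://example.org/oai", "2009-10-01"))
print(split_changes(SAMPLE))
```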

Page 7: Open Archives Initiative Protocol for Metadata Harvesting

OAI is based on standards

HTTP protocol
XML and XML Schemas
Dublin Core

Page 8: Open Archives Initiative Protocol for Metadata Harvesting

Multiple representations of an object

Object: [School of arts for girls Kiz Sanayi Mektebi]
Identifier: oai:lcoa1.loc.gov:loc.pnp/cph.3b23005
Representations: Dublin Core, MARC21, MODS
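One way to picture multiple representations: the same item identifier is requested with different metadataPrefix values. In the sketch below, only `oai_dc` is guaranteed by the protocol; the prefixes `marc21` and `mods`, the helper name `representation_urls`, and the pairing of this identifier with the Library of Congress base URL are assumptions. A real harvester would first ask the repository which prefixes it supports via ListMetadataFormats.

```python
from urllib.parse import urlencode

def representation_urls(base_url, identifier, prefixes):
    """Map each metadataPrefix to the GetRecord URL for one item."""
    return {
        prefix: base_url + "?" + urlencode(
            {"verb": "GetRecord", "identifier": identifier, "metadataPrefix": prefix}
        )
        for prefix in prefixes
    }

urls = representation_urls(
    "http://memory.loc.gov/cgi-bin/oai2_0",       # assumed endpoint for this item
    "oai:lcoa1.loc.gov:loc.pnp/cph.3b23005",
    ["oai_dc", "marc21", "mods"],                 # marc21 and mods are assumed prefixes
)
for prefix, url in urls.items():
    print(prefix, url)
```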

Page 9: Open Archives Initiative Protocol for Metadata Harvesting

OAI repositories can be organized in sets

What do sets represent?

Journals: issues
Institutional repositories: departments, research centers, etc.
EPrint archives: subject, publication status
Cultural heritage repositories: collections with intent

Set representations may be constrained by the software package used.

Sets enable selective harvesting
Sets can overlap: one item can belong to multiple sets
Sets can be described (e.g. with DC or DC Collection)
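Overlapping sets show up in responses as several setSpec elements in one record header. The sketch below groups item identifiers by set; the sample response, the set names, and the function name `group_by_set` are hypothetical. Selective harvesting itself is requested by adding a `set` argument, for instance `set=physics`, to a ListRecords or ListIdentifiers request.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"

# Hypothetical ListIdentifiers response: item 1 belongs to two sets.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListIdentifiers>
    <header>
      <identifier>oai:example.org:1</identifier>
      <datestamp>2009-10-01</datestamp>
      <setSpec>physics</setSpec>
      <setSpec>status:published</setSpec>
    </header>
    <header>
      <identifier>oai:example.org:2</identifier>
      <datestamp>2009-10-02</datestamp>
      <setSpec>physics</setSpec>
    </header>
  </ListIdentifiers>
</OAI-PMH>"""

def group_by_set(response_xml):
    """Return a dict mapping each setSpec to the identifiers listed under it."""
    groups = defaultdict(list)
    for header in ET.fromstring(response_xml).iter(f"{OAI_NS}header"):
        identifier = header.findtext(f"{OAI_NS}identifier")
        # One header may carry several setSpec elements.
        for set_spec in header.findall(f"{OAI_NS}setSpec"):
            groups[set_spec.text].append(identifier)
    return dict(groups)

print(group_by_set(SAMPLE))
```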

Page 10: Open Archives Initiative Protocol for Metadata Harvesting

OAI supports 6 verbs

Identify
http://aerialphotos.grainger.uiuc.edu/oai.asp?verb=Identify

ListSets
http://aerialphotos.grainger.uiuc.edu/oai.asp?verb=ListSets

ListRecords
http://aerialphotos.grainger.uiuc.edu/oai.asp?verb=ListRecords&metadataPrefix=oai_dc

ListMetadataFormats
http://aerialphotos.grainger.uiuc.edu/oai.asp?verb=ListMetadataFormats

ListIdentifiers
http://aerialphotos.grainger.uiuc.edu/oai.asp?verb=ListIdentifiers&metadataPrefix=oai_dc

GetRecord
http://aerialphotos.grainger.uiuc.edu/oai.asp?verb=GetRecord&identifier=oai:aerialphotos.grainger.uiuc.edu:AP-1A-1-1940&metadataPrefix=oai_dc
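The six requests above all follow one pattern: a base URL plus a verb and its arguments. A minimal sketch of a request builder is given below. The table of required arguments follows OAI-PMH 2.0 for initial requests; the function name `build_request` is an assumption, and the sketch deliberately leaves out resumption tokens and the optional `from`, `until`, and `set` checks.

```python
from urllib.parse import urlencode

# Required arguments per verb for an initial request (OAI-PMH 2.0).
REQUIRED = {
    "Identify": (),
    "ListMetadataFormats": (),
    "ListSets": (),
    "ListIdentifiers": ("metadataPrefix",),
    "ListRecords": ("metadataPrefix",),
    "GetRecord": ("identifier", "metadataPrefix"),
}

def build_request(base_url, verb, args=None):
    """Return the request URL, or raise ValueError for an unknown verb or missing argument."""
    args = dict(args or {})
    if verb not in REQUIRED:
        raise ValueError(f"unknown OAI-PMH verb: {verb}")
    missing = [name for name in REQUIRED[verb] if name not in args]
    if missing:
        raise ValueError(f"{verb} requires: {', '.join(missing)}")
    return base_url + "?" + urlencode({"verb": verb, **args})

BASE = "http://aerialphotos.grainger.uiuc.edu/oai.asp"
print(build_request(BASE, "Identify"))
print(build_request(BASE, "ListRecords", {"metadataPrefix": "oai_dc"}))
```

Note that `urlencode` percent-encodes the colons in an item identifier, so a GetRecord URL built this way looks different from the one on the slide while carrying the same argument values.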

Page 11: Open Archives Initiative Protocol for Metadata Harvesting

An OAI response

<record>
  <header>
    <identifier>oai:images.library.uiuc.edu:emblems/324</identifier>
    <datestamp>2003-10-22</datestamp>
    <setSpec>emblems</setSpec>
  </header>
  <metadata>
    <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
               xmlns:dc="http://purl.org/dc/elements/1.1/"
               xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
               xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/ http://www.openarchives.org/OAI/2.0/oai_dc.xsd">
      <dc:creator>Müller, Johann Heinrich Traugott, 1631-1675</dc:creator>
      <dc:identifier>http://images.library.uiuc.edu:8081/u?/emblems,324</dc:identifier>
    </oai_dc:dc>
  </metadata>
</record>

The about section is often not used. It can serve, e.g., to state rights on the metadata record.
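A record like the one on this slide can be read with a standard XML parser. The sketch below uses Python's `xml.etree.ElementTree`; the function name `parse_record` is an assumption, and the sample adds the OAI-PMH default namespace on the record element, which in a full response would be inherited from the enclosing root element.

```python
import xml.etree.ElementTree as ET

NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# The slide's record, with the OAI-PMH namespace declared explicitly (assumption).
RECORD = """<record xmlns="http://www.openarchives.org/OAI/2.0/">
  <header>
    <identifier>oai:images.library.uiuc.edu:emblems/324</identifier>
    <datestamp>2003-10-22</datestamp>
    <setSpec>emblems</setSpec>
  </header>
  <metadata>
    <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
               xmlns:dc="http://purl.org/dc/elements/1.1/">
      <dc:creator>Müller, Johann Heinrich Traugott, 1631-1675</dc:creator>
      <dc:identifier>http://images.library.uiuc.edu:8081/u?/emblems,324</dc:identifier>
    </oai_dc:dc>
  </metadata>
</record>"""

def parse_record(xml_text):
    """Extract the header fields and the Dublin Core elements of one record."""
    record = ET.fromstring(xml_text)
    header = record.find("oai:header", NS)
    fields = {}
    for element in record.iterfind("oai:metadata/oai_dc:dc/*", NS):
        name = element.tag.split("}", 1)[1]  # strip the "{namespace}" prefix
        fields.setdefault(name, []).append(element.text)
    return {
        "identifier": header.findtext("oai:identifier", namespaces=NS),
        "datestamp": header.findtext("oai:datestamp", namespaces=NS),
        "sets": [s.text for s in header.findall("oai:setSpec", NS)],
        "dc": fields,
    }

print(parse_record(RECORD))
```

Dublin Core elements are repeatable, which is why each field maps to a list of values.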

Page 12: Open Archives Initiative Protocol for Metadata Harvesting

Examples of repositories

Library of Congress

http://memory.loc.gov/cgi-bin/oai2_0

ContentDM at UIUC

http://images.library.uiuc.edu:8081/cgi-bin/oai.exe

Ohio State Knowledge Bank

https://kb.osu.edu/dspace-oai/request

Page 13: Open Archives Initiative Protocol for Metadata Harvesting

PictureAustralia

Aggregates from large institutions
Web crawling for small ones
Flickr for individuals

“Using OAI has the advantage that only new and changed records need to be harvested, while for web crawl harvesting all records have to be re-harvested each time a harvest is run.”

http://www.pictureaustralia.org/schemas/pa/index.html

Page 14: Open Archives Initiative Protocol for Metadata Harvesting

DRIVER – aggregation as an infrastructure

Page 15: Open Archives Initiative Protocol for Metadata Harvesting

Europeana

Page 16: Open Archives Initiative Protocol for Metadata Harvesting

IVOA – synchronization of service repositories

Page 17: Open Archives Initiative Protocol for Metadata Harvesting

Turnkey systems

ContentDM: http://contentdm.com/
Digitool: http://www.exlibrisgroup.com/digitool.htm
DSpace: http://www.dspace.org/
EPrints: http://software.eprints.org/

Page 18: Open Archives Initiative Protocol for Metadata Harvesting

Interoperability in practice

Quality issues with OAI aggregations

Page 19: Open Archives Initiative Protocol for Metadata Harvesting

Metadata formats

DC, QDC, ETDMS, MODS, MARC, EAD, …

Require an XML schema

Most implementations only use simple DC

Page 20: Open Archives Initiative Protocol for Metadata Harvesting

Examples of values found in dc:date

September 29–October 28, 51 AD; 1970

second half of IXth century AD; 1978

Rebuilt 1984

Possibly Vth/VIth century AD; 1935

Planted 1985

n/a

n.d.

Mid IInd century AD; 1973

Jul-51

circa 900 AD

ca. 701 BC

Begun 14th century

184-?

1839

18–?

August 23, 2000

between 1827 and 183

VIIIth/IXth century AD ? (TC);1965

Vth-VIth century AD (McNamee); IVth century AD (Cribiore); 1982

XVIII Dynasty

Winter 2003

era of redevelopment

various

2002-00

1980, refurbished 1997

China: Neolithic Period (5000 BCE-ca 1600 BCE)

?1969

1968

21. Nouemb. Anno. 1564.

And finisshed on the euen of thanunciacion of our said bilissid Lady falling on the wednesday the xxiiij daye of Marche. in the xix yeer of Kyng Edwarde the fourthe

[1479]]19193

xxxx Oct xx

Various

1938-05-38

1963 to 1953

[not after 1579]163[5?]
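A harvester can at least separate the values it can process automatically from those it cannot. The sketch below checks a value against the date-only forms of W3CDTF (YYYY, YYYY-MM, YYYY-MM-DD), the encoding commonly recommended for dc:date. The function name and the choice of pattern are assumptions, and the check is deliberately rough: it does not verify month lengths, so a value such as 2009-02-31 would still pass.

```python
import re

# Date-only W3CDTF forms: YYYY, YYYY-MM, YYYY-MM-DD.
W3CDTF_DATE = re.compile(r"^\d{4}(-(0[1-9]|1[0-2])(-(0[1-9]|[12]\d|3[01]))?)?$")

def is_w3cdtf_date(value):
    """Return True if the value has the shape of a date-only W3CDTF string."""
    return bool(W3CDTF_DATE.match(value.strip()))

# Values taken from the slide.
samples = ["1839", "August 23, 2000", "1938-05-38", "n.d.", "ca. 701 BC", "2002-00"]
for value in samples:
    print(f"{value!r}: {is_w3cdtf_date(value)}")
```

Of these six values only "1839" passes; the rest would need normalization or manual review.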

Page 21: Open Archives Initiative Protocol for Metadata Harvesting

Who is metadata made for?

Machine:
dc:type “Text.Correspondence.Letter”
dc:language “wln”

H:
dc:type “Correspondence”
dc:language “wallon”

Who knows?
dc:date “197-”
dc:description “First ed. Cf. BM.”
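The gap between the machine-oriented and human-oriented values can be bridged by an aggregator only when it knows how to map one to the other. The sketch below illustrates this for dc:language. The lookup table and the function name are hypothetical; the shape check only tests that a value looks like a two- or three-letter code, not that the code is actually assigned in ISO 639.

```python
import re

# Shape of an ISO 639 language code (two or three lowercase letters).
CODE_SHAPE = re.compile(r"^[a-z]{2,3}$")

# Hypothetical, hand-made lookup from human-readable labels to codes.
LABEL_TO_CODE = {"wallon": "wln", "walloon": "wln"}

def normalize_language(value):
    """Return a language code, or None when the value cannot be mapped."""
    cleaned = value.strip().lower()
    if CODE_SHAPE.match(cleaned):
        return cleaned  # already code-shaped; validity is not checked
    return LABEL_TO_CODE.get(cleaned)

for value in ["wln", "wallon", "Klingon"]:
    print(value, "->", normalize_language(value))
```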

Page 22: Open Archives Initiative Protocol for Metadata Harvesting

Improving quality

Quality certificates for open access repositories: DINI (Deutsche Initiative für Netzwerkinformation)

Best practices for OAI and shareable metadata by the Digital Library Federation and the National Science Digital Library

http://www.diglib.org/pubs/dlf108.pdf

Meeting with software providers

Test environment (eg Europeana)

Community guidelines

Page 23: Open Archives Initiative Protocol for Metadata Harvesting

Conclusion

Has the protocol “crossed the chasm”?

The objective is to create a network of repositories rather than networking individual resources

Lack of specific mechanism to relate resources to each other

Approach to linked data and OAI-ORE

Page 24: Open Archives Initiative Protocol for Metadata Harvesting

References

OAI-PMH: http://www.openarchives.org/pmh/

Best practices for OAI and shareable metadata: http://www.diglib.org/pubs/dlf108.pdf

Tim Cole and Muriel Foulonneau, Using the Open Archives Initiative Protocol for Metadata Harvesting, Libraries Unlimited, 2007

Muriel Foulonneau and Jenn Riley, Metadata for Digital Resources, Chandos Publishing, 2008

