Y.T.
a brief history of the OAI
Source: Herbert van de Sompel
Open Archives Initiative – Protocol for Metadata Harvesting
Yaşar Tonta
HÜ BBY
DOK 422: Information Networks
the OAI roots
“The Open Archives Initiative has been set up to create a forum to discuss and solve matters of interoperability between preprint solutions, as a way to promote their global acceptance.” (Paul Ginsparg, Rick Luce & Herbert Van de Sompel)
=> Santa Fe Convention: preprint metadata harvesting
Source: Herbert van de Sompel
interest from other communities
• Digital Library Federation meetings: the research library community has many materials for which it would like to ‘expose’ metadata
• OAI San Antonio meeting: interest from librarians, publishers, and others
Source: Herbert van de Sompel
resulting actions: organizational
• establish organizational stability for the OAI:
  – institutional backing from CNI & DLF
  – steering committee: policy guidance
  – technical committee: technical specifications
  – executive group: day to day coordination
  – workshops: public dissemination, feedback
Source: Herbert van de Sompel
resulting actions: technical
• [09/2000] revise specifications to allow adoption beyond preprints: technical committee
• [09/2000-01/2001] compile new specifications: editing by Carl Lagoze and Herbert Van de Sompel
• [11/2000-01/2001] alpha-test specifications: oai-alpha group
• [01/2001] discontinue the Santa Fe Convention
• [01/2001] release version 1.0 of the OAI protocol
Source: Herbert van de Sompel
the OAI Metadata Harvesting protocol
Source: Herbert van de Sompel
The OAMH protocol is a low-barrier interoperability specification for the recurrent exchange of metadata between systems.
Source: Herbert van de Sompel
the OAMH protocol
[Diagram: the harvester (service provider) sends requests to the repository (data provider); the repository sends replies back]
Source: Herbert van de Sompel
federated services
[Diagram: federated services across A&I, image, FTXT (full text), OPAC, and e-print collections]
Source: Herbert van de Sompel
metadata harvesting via OAMH
[Diagram: a harvester collects metadata from A&I, image, OPAC, e-print, and FTXT (full text) collections]
Source: Herbert van de Sompel
federated services via OAMH
[Diagram: federated services built on harvested metadata (author, title, abstract, identifier) from A&I, image, FTXT (full text), e-print, and OPAC collections]
Source: Herbert van de Sompel
core concepts in OAMH
• low-barrier interoperability
• data-provider & service-provider model
• metadata harvesting model
• OAMH protocol: HTTP based; replies described by XML Schema; self contained
• shared metadata format (Dublin Core) and parallel, community-specific metadata formats
Source: Herbert van de Sompel
OAI harvesting tools
[Diagram: the harvester (service provider) harvests records from the repository (data provider); labels: Records, Datestamp, Identifier, Set]
Source: Herbert van de Sompel
OAI harvesting tools
Supporting protocol requests:
• Identify
• ListMetadataFormats
• ListSets
Harvesting protocol requests:
• ListRecords
• ListIdentifiers
• GetRecord
[Diagram: harvester (service provider) and repository (data provider)]
Source: Herbert van de Sompel
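The six requests can be sketched as plain HTTP query strings. This is a minimal illustration, not part of the original slides: `build_request` and the base URL `http://repo.example.org/oai` are hypothetical, and the `verb` parameter and the `oai_dc` prefix follow OAI-PMH version 2.0.

```python
from urllib.parse import urlencode

# The two groups of requests named on the slide.
SUPPORTING_VERBS = ("Identify", "ListMetadataFormats", "ListSets")
HARVESTING_VERBS = ("ListRecords", "ListIdentifiers", "GetRecord")


def build_request(base_url, verb, arguments=None):
    """Return the URL for one protocol request (hypothetical helper).

    `arguments` is a dict of extra request arguments such as
    {"metadataPrefix": "oai_dc"}; entries whose value is None are skipped.
    """
    if verb not in SUPPORTING_VERBS + HARVESTING_VERBS:
        raise ValueError("unknown verb: " + verb)
    query = {"verb": verb}
    for name, value in (arguments or {}).items():
        if value is not None:
            query[name] = value
    return base_url + "?" + urlencode(query)


if __name__ == "__main__":
    base = "http://repo.example.org/oai"  # assumed base URL, for illustration only
    print(build_request(base, "Identify"))
    print(build_request(base, "GetRecord", {
        "identifier": "oai:example.org:1",
        "metadataPrefix": "oai_dc",
    }))
```

A harvester would typically start with the supporting requests to learn what a repository offers, then issue harvesting requests.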
supporting protocol requests
Request (harvester to repository): ListMetadataFormats
Reply (repository to harvester): ListMetadataFormats / Time / Request, followed by one entry per format (REPEAT ... /REPEAT):
• Format prefix
• Format XML schema
Source: Herbert van de Sompel
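A reply to ListMetadataFormats can be read with a standard XML parser. The sketch below assumes the OAI-PMH 2.0 response layout and namespace; the sample reply and the helper name `parse_metadata_formats` are made up for illustration.

```python
import xml.etree.ElementTree as ET

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}

# Hand-written sample reply (assumption: OAI-PMH 2.0 element names).
SAMPLE_RESPONSE = """\
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <responseDate>2004-12-01T12:00:00Z</responseDate>
  <request verb="ListMetadataFormats">http://repo.example.org/oai</request>
  <ListMetadataFormats>
    <metadataFormat>
      <metadataPrefix>oai_dc</metadataPrefix>
      <schema>http://www.openarchives.org/OAI/2.0/oai_dc.xsd</schema>
      <metadataNamespace>http://www.openarchives.org/OAI/2.0/oai_dc/</metadataNamespace>
    </metadataFormat>
  </ListMetadataFormats>
</OAI-PMH>
"""


def parse_metadata_formats(xml_text):
    """Return one (format prefix, XML schema URL) pair per repeated entry."""
    root = ET.fromstring(xml_text)
    formats = []
    for fmt in root.iterfind(".//oai:metadataFormat", OAI_NS):
        prefix = fmt.findtext("oai:metadataPrefix", namespaces=OAI_NS)
        schema = fmt.findtext("oai:schema", namespaces=OAI_NS)
        formats.append((prefix, schema))
    return formats


if __name__ == "__main__":
    for prefix, schema in parse_metadata_formats(SAMPLE_RESPONSE):
        print(prefix, schema)
```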
harvesting requests
Request (harvester to repository): ListRecords, with arguments:
* from=a
* until=b
* set=klm
* metadataPrefix=dc
Reply (repository to harvester): ListRecords / Time / Request, followed by one entry per record (REPEAT ... /REPEAT):
• Identifier
• Datestamp
• Metadata
Source: Herbert van de Sompel
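Each repeated entry in a ListRecords reply carries an identifier, a datestamp, and the metadata itself. The following sketch pulls those three parts out of a hand-written sample; it assumes the OAI-PMH 2.0 layout, where the Dublin Core format is named `oai_dc` rather than the `dc` shown on the slide, and `parse_records` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Hand-written sample reply with a single record (illustrative values only).
SAMPLE_RESPONSE = """\
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <responseDate>2004-12-01T12:00:00Z</responseDate>
  <request verb="ListRecords" metadataPrefix="oai_dc">http://repo.example.org/oai</request>
  <ListRecords>
    <record>
      <header>
        <identifier>oai:repo.example.org:1</identifier>
        <datestamp>2004-11-15</datestamp>
        <setSpec>klm</setSpec>
      </header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>An example e-print</dc:title>
          <dc:creator>Doe, Jane</dc:creator>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>
"""


def parse_records(xml_text):
    """Return a list of dicts with identifier, datestamp and DC titles."""
    root = ET.fromstring(xml_text)
    records = []
    for record in root.iterfind(".//oai:record", NS):
        header = record.find("oai:header", NS)
        metadata = record.find("oai:metadata", NS)
        # A record may come without a metadata part; treat that as no titles.
        titles = []
        if metadata is not None:
            titles = [t.text for t in metadata.iterfind(".//dc:title", NS)]
        records.append({
            "identifier": header.findtext("oai:identifier", namespaces=NS),
            "datestamp": header.findtext("oai:datestamp", namespaces=NS),
            "titles": titles,
        })
    return records


if __name__ == "__main__":
    print(parse_records(SAMPLE_RESPONSE))
```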
Applications of the OAMH protocol?
• federated services [S&R, SDI, alerting, linking, ...]
• database synchronization
• harvesting the deep Web
• ...
Source: Herbert van de Sompel
OAI background
• background in the e-prints (pre-prints) community
• need to provide ‘search’ services across multiple e-prints archives
• distributed cross-searching felt not to be appropriate
• adopted approach based on metadata harvesting
• OAI has been linked to a political agenda that wants to change the academic publishing model, but...
• ...core activity is the OAI-MHP, the OAI Metadata Harvesting Protocol
static repository 1
http://an.oai.org/ma/mini.xml
static repository n
http://site1.org/mini/file1
Source: Lagoze, http://eprints.rclis.org/archive/00000789/
static repository 1
http://an.oai.org/ma/mini.xml
static repository n
http://site1.org/mini/file1
static repository gateway
http://gateway.institution.org/oai/
http://gateway.institution.org/oai/site1.org/mini/file1
http://gateway.institution.org/oai/an.oai.org/ma/mini.xml
Source: Lagoze, http://eprints.rclis.org/archive/00000789/
static repository 1
http://an.oai.org/ma/mini.xml
static repository n
http://site1.org/mini/file1
static repository gateway
http://gateway.institution.org/oai/
http://gateway.institution.org/oai/site1.org/mini/file1
http://gateway.institution.org/oai/an.oai.org/ma/mini.xml
[Diagram: the OAI-PMH harvester talks OAI-PMH to the static repository gateway; the gateway fetches each static repository file over HTTP]
Source: Lagoze, http://eprints.rclis.org/archive/00000789/
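The gateway addresses on these slides follow a visible pattern: the gateway URL, followed by the static repository URL with its `http://` prefix dropped. A minimal sketch of that mapping, where `gateway_base_url` is a hypothetical helper name and the pattern is inferred only from the URLs shown here:

```python
def gateway_base_url(gateway_url, static_repository_url):
    """Compose the address under which a gateway exposes a static repository.

    Assumption: the address is the gateway URL plus the static repository
    URL without its scheme, as in the examples on the slides.
    """
    scheme, separator, rest = static_repository_url.partition("://")
    if not separator or not rest:
        raise ValueError("expected an absolute URL: " + static_repository_url)
    return gateway_url.rstrip("/") + "/" + rest


if __name__ == "__main__":
    gateway = "http://gateway.institution.org/oai/"
    for static_url in ("http://an.oai.org/ma/mini.xml",
                       "http://site1.org/mini/file1"):
        print(gateway_base_url(gateway, static_url))
```

A harvester would then send its OAI-PMH requests to the composed address, exactly as it would to any other repository.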
The OAI-PMH data model
Source: http://www.dlib.org/dlib/december04/vandesompel/12vandesompel.html
Content transfer between archives using the OAI-PMH
Source: http://www.dlib.org/dlib/december04/vandesompel/12vandesompel.html
OAICat
What’s in a name?
• ‘open’ means that specs are freely available
• may be some formal standards activity in the future
• currently at version 2.0
• ‘archive’ as in e-print archive, i.e. repository of documents
• NOT ‘archive’ as used by the library and archival communities
OAI-MHP
• generic protocol for sharing metadata between services
• NOT a distributed search protocol
[Diagram: service providers send requests to repositories, which return responses. Repositories are databases of stuff (metadata and/or full-text) and may be partitioned into ‘sets’.]
OAI-MHP
• requests sent as HTTP GET
• responses returned as XML over HTTP
• OAI-MHP based on HTTP, XML, XML schemas, XML namespaces
• 6 requests
  – Identify, ListIdentifiers, ListRecords, GetRecord, ListMetadataFormats, ListSets
• large responses may be split using simple ‘resumption token’ mechanism
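The ‘resumption token’ mechanism amounts to a loop: repeat the request with the token from the previous reply until a reply arrives with no token, or with an empty one. The sketch below assumes OAI-PMH 2.0 element names; `harvest_identifiers`, `fake_fetch` and the two canned pages are hypothetical and stand in for real HTTP GET calls so the example runs offline.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

OAI = "{http://www.openarchives.org/OAI/2.0/}"

# Two canned replies standing in for a repository (illustrative only).
PAGE_1 = """\
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListIdentifiers>
    <header><identifier>oai:example.org:1</identifier><datestamp>2004-01-10</datestamp></header>
    <header><identifier>oai:example.org:2</identifier><datestamp>2004-06-01</datestamp></header>
    <resumptionToken>page2</resumptionToken>
  </ListIdentifiers>
</OAI-PMH>
"""
PAGE_2 = """\
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListIdentifiers>
    <header><identifier>oai:example.org:3</identifier><datestamp>2004-12-24</datestamp></header>
    <resumptionToken/>
  </ListIdentifiers>
</OAI-PMH>
"""


def fake_fetch(url):
    """Stand-in for an HTTP GET: returns the second page when asked to resume."""
    return PAGE_2 if "resumptionToken=page2" in url else PAGE_1


def harvest_identifiers(base_url, fetch, metadata_prefix="oai_dc"):
    """Yield every identifier, following resumption tokens across replies."""
    params = {"verb": "ListIdentifiers", "metadataPrefix": metadata_prefix}
    while True:
        root = ET.fromstring(fetch(base_url + "?" + urlencode(params)))
        for header in root.iter(OAI + "header"):
            yield header.findtext(OAI + "identifier")
        token = root.find(".//" + OAI + "resumptionToken")
        if token is None or not (token.text or "").strip():
            break  # missing or empty token: the list is complete
        # Follow-up requests carry only the verb and the token.
        params = {"verb": "ListIdentifiers", "resumptionToken": token.text.strip()}


if __name__ == "__main__":
    print(list(harvest_identifiers("http://repo.example.org/oai", fake_fetch)))
```

Passing the fetch function in as a parameter keeps the loop independent of any particular HTTP library.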
Harvesting metadata
• service provider can ask repository for
  – all records
  – records in particular set
  – records modified in particular date span
• metadata records returned using XML
• support for arbitrary XML schemas
• repositories MUST support ‘simple DC’ XML record format
• some existing support for other schemas including an XML encoding for MARC
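The three kinds of harvest (all records, one set, one date span) can be pictured from the repository side as a filter over record headers. This is only a sketch under stated assumptions: the `Header` class, `select` function and sample data are hypothetical; date bounds are treated as inclusive and set names as colon-separated hierarchies, which is how OAI-PMH 2.0 is understood to define them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Header:
    identifier: str
    datestamp: str         # ISO 8601 day such as "2004-11-15"; sorts as text
    set_specs: tuple = ()  # sets the record belongs to, e.g. ("physics:hep",)


def in_set(header, set_spec):
    """True if the record is in the set or in one of its sub-sets."""
    return any(s == set_spec or s.startswith(set_spec + ":")
               for s in header.set_specs)


def select(headers, set_spec=None, from_=None, until=None):
    """Return identifiers matching the optional set and date-span arguments."""
    selected = []
    for header in headers:
        if set_spec is not None and not in_set(header, set_spec):
            continue
        if from_ is not None and header.datestamp < from_:
            continue
        if until is not None and header.datestamp > until:
            continue
        selected.append(header.identifier)
    return selected


# Made-up headers for illustration.
HEADERS = [
    Header("oai:example.org:1", "2004-01-10", ("physics",)),
    Header("oai:example.org:2", "2004-06-01", ("physics:hep",)),
    Header("oai:example.org:3", "2004-12-24", ("math",)),
]

if __name__ == "__main__":
    print(select(HEADERS))                                       # all records
    print(select(HEADERS, set_spec="physics"))                   # one set
    print(select(HEADERS, from_="2004-06-01", until="2004-12-31"))  # date span
```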