Herbert Van de Sompel, Research Library, Los Alamos National Laboratory
OAI4, October 20-22 2005, CERN, Geneva, Switzerland
Lessons in Cross-Repository Interoperability learned from the aDORe effort
Herbert Van de Sompel
Research Library, Los Alamos National Laboratory, USA
The repository model
"Pattern Recognition: The 2003 OCLC Environmental Scan" http://www.oclc.org/membership/escan/toc.htm
Credits
The reported material is based on the following work:
o The LANL aDORe repository effort
o The upcoming PhD thesis by Jeroen Bekaert (Advisor Herbert Van de Sompel) regarding protocol-based interfaces for Open Archival Information Systems (OAIS)
o The NSF-funded Pathways project in collaboration with the Information Science group at Cornell University (Carl Lagoze, Sandy Payette, Simeon Warner)
Outline
• aDORe: A few words about the aDORe architecture
• A Federation of Repositories: A new level of cross-repository interoperability
• Pathways InterDisseminator: A context-sensitive service overlay for a federation of repositories
aDORe
[Diagram: the aDORe architecture. Labels shown: content sources (LANL TechReport, publisher FTXT, A&I Publisher); Ingest; DIM Inserter (DID => DID + DIM); repositories of DIDs, each exposed through an OAI-PMH baseURL (baseURL(1) ... baseURL(x)); ARC files, each accessed through OpenURL; Repository Index; Identifier Locator (Content-id or Package-id => baseURL(n) & Package-id); OAI-PMH Federator; OpenURL gateway; MPEG-21 DIP Engine with a Profile/Behavior Registry and a Registry of transformations, returning transformed content; package formats DID, METS, IMS-CP, ...]
aDORe effort
aDORe is 2 things:
o Standards-based, modular repository architecture
  - Distributed architecture
  - Protocol-based interactions between modules
  - Applicable to create interoperable federations of heterogeneous repositories
o Actual implementation of the architecture at LANL for local storage of digital assets (currently in its 2nd version)
aDORe is not a product
o Components of aDORe software, usable in other environments, will be released
aDORe effort
• Standards used in aDORe include:
  o XML
  o XML Schema
  o MPEG-21 Digital Item Declaration
  o MPEG-21 Digital Item Identification
  o W3C XML Signatures
  o OAI-PMH
  o NISO OpenURL Framework for Context-Sensitive Services
  o Internet Archive ARC file format
  o OAIS concepts
[Diagram: the aDORe architecture diagram shown twice with callouts. First build highlights: Compound objects, Repository Registry, Identifier Locator. Second build highlights: OpenURL Resolver, OAI-PMH Federator, Dynamic Dissemination Engine.]
OAI-PMH Federator & OpenURL Resolver

aDORe front-end   | Interface standard | Identifier                                                    | OAIS Access Type      | # items in response
OAI-PMH Federator | OAI-PMH            | Package Identifier                                            | OAIS DIP              | 1 or more
OpenURL Resolver  | NISO OpenURL       | Content Identifier, Package Identifier (with XML ID fragment) | OAIS DIP & Result Set | 1
aDORe effort
• Standards
• Distributed architecture
• Protocol-based communication
=> Insights in Cross-Repository Interoperability
Outline
• aDORe: A few words about the aDORe architecture
• A Federation of Repositories: A new level of cross-repository interoperability
• Pathways InterDisseminator: A context-sensitive service overlay for a federation of repositories
The interoperable repository model
I will try to show that:
• a significantly higher level of cross-repository interoperability can be achieved with relatively modest means
• those means are largely available and agreed upon in our community
Part 1 : Requirements for a repository in a federation
Repositories & Units of Communication
• Data-oriented research => not only textual materials, but also datasets, software, simulations, dynamic knowledge presentations, …
• Research results represented by a variety of digital media => these media must receive a status similar to that of text in the current system
• Materials in various stages of certification => units of communication are not only ‘papers’ but also preprints, raw datasets, prototype simulations, …
• Facilitate collaboration => re-use of units of communication
Repositories & Units of Communication
• Handling this requires:
  o a compound object view of a unit of communication
  o no longer thinking in terms of metadata versus content
• Compound object:
  o Has a persistent identifier
  o Contains materials and metadata about those materials
  o Can contain other compound objects
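The compound object view above can be sketched as a small recursive data structure. This is only an illustration, not the aDORe data model: the class names, the `walk` helper, and the assumption that URI_7 contains URI_3 and URI_9 are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, List


@dataclass
class Datastream:
    """A constituent material plus metadata about that material (hypothetical)."""
    mime_type: str
    location: str
    metadata: Dict[str, str] = field(default_factory=dict)


@dataclass
class CompoundObject:
    """A unit of communication with a persistent identifier (hypothetical)."""
    identifier: str  # a URI; not necessarily a locator
    datastreams: List[Datastream] = field(default_factory=list)
    children: List["CompoundObject"] = field(default_factory=list)

    def walk(self) -> Iterator["CompoundObject"]:
        """Yield this object and every contained object, depth-first."""
        yield self
        for child in self.children:
            yield from child.walk()


# Assumed nesting, for illustration only: URI_7 contains URI_3 and URI_9.
paper = CompoundObject(
    "URI_7",
    children=[CompoundObject("URI_3"), CompoundObject("URI_9")],
)
identifiers = [obj.identifier for obj in paper.walk()]
```

Metadata lives next to the material it describes, so nothing in the structure separates "metadata" from "content".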
Compound objects
[Diagram: a compound object whose parts carry the identifiers URI_7, URI_3 and URI_9]
URIs:
• minted by different repositories
• from different namespaces
• not (necessarily) locators
XML-based representation of compound objects
[Diagram: a compound object (URI_7, URI_3, URI_9) mapped to an XML-based representation]
Formats for the XML-based representation: MPEG-21 DIDL, METS, IMS/CP, RDF
Repository Interop Interface 1: OAI-PMH & CO
[Diagram: an OAI-PMH harvester collects compound objects (URI_7, URI_3, URI_9) from repository_a through its OAI-PMH interface at baseURL_m]
• machine consumption
• batches of compound objects
• OAI-PMH datestamp ~ new version of object
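Because the OAI-PMH datestamp tracks new versions of an object, a harvester can ask only for objects changed since its last visit. The sketch below builds such a selective ListRecords request with the standard OAI-PMH `from` argument. The base URL, the `didl` metadata prefix, and the function name are assumptions for illustration.

```python
from urllib.parse import urlencode


def list_records_url(base_url, metadata_prefix, from_datestamp=None):
    """Build an OAI-PMH ListRecords request, optionally limited by datestamp."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_datestamp is not None:
        # Only records whose datestamp is on or after this value are returned.
        params["from"] = from_datestamp
    return base_url + "?" + urlencode(params)


# Hypothetical repository address and metadata prefix.
url = list_records_url(
    "http://repository-a.example.org/oai", "didl", from_datestamp="2005-10-01"
)
```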
OAI-PMH interface to OAIS (Jeroen Bekaert)
[Diagram: an agent issues OAI-PMH requests to the OAIS]
• Request: baseURL(OAIPMH_CIID)?verb=ListMetadataFormats
  Response: list of DIP formats (ListMetadataFormats response)
• Request: baseURL(OAIPMH_CIID)?verb=ListRecords&metadataPrefix=info:pathways/svc/dip.rdf
  Response: list of DIPs (derived from most recent AIPs)
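An agent could read the available DIP formats out of the ListMetadataFormats response roughly as follows. This is a sketch under assumptions: the sample response is invented and trimmed (a real response also carries schema and namespace elements), and the `dip.didl` prefix is only a guess modeled on the `dip.rdf` prefix shown on the slide.

```python
import xml.etree.ElementTree as ET

# Namespace of OAI-PMH 2.0 responses.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}

# Invented, trimmed response used only to exercise the parser.
SAMPLE_RESPONSE = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListMetadataFormats>
    <metadataFormat>
      <metadataPrefix>info:pathways/svc/dip.rdf</metadataPrefix>
    </metadataFormat>
    <metadataFormat>
      <metadataPrefix>info:pathways/svc/dip.didl</metadataPrefix>
    </metadataFormat>
  </ListMetadataFormats>
</OAI-PMH>
"""


def dip_formats(xml_text):
    """Return the metadataPrefix values listed in a ListMetadataFormats response."""
    root = ET.fromstring(xml_text.strip())
    return [el.text for el in root.findall(".//oai:metadataPrefix", OAI_NS)]


formats = dip_formats(SAMPLE_RESPONSE)
```

Each returned prefix can then be used as the metadataPrefix of a ListRecords request to harvest DIPs in that format.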
Repository Interop Interface 1: OAI-PMH & CO
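[Diagram: an OAI-PMH harvester collects the compound object (URI_7, URI_3, URI_9); repository_b adds value and recombines it (adding URI_12) and exposes the result through its own OAI-PMH interface at baseURL_n]
• include provenance ~ version of compound object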
Repository Interop Interface 2: OpenURL & CO
[Diagram: an OpenURL request for the compound object (URI_7, URI_3, URI_9) is sent to the OpenURL interface of repository_n at baseURL_o]
OpenURL: baseURL_x?url_ver=Z39.88-2004&rft_id=URI_7&svc_id=info:pathways/svc/dip.*
• machine (& human) consumption
• single object dissemination ~ identifier of compound object
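A request of this shape can be assembled in a few lines. The sketch below builds a KEV-style OpenURL over HTTP GET; the base URL, the example identifier, and the function name are assumptions, and `:` and `/` are left unescaped only so the result reads like the slide.

```python
from urllib.parse import urlencode


def dip_openurl(base_url, object_uri, svc_id="info:pathways/svc/dip.rdf"):
    """Build an OpenURL asking for one compound object in a given DIP format."""
    params = {
        "url_ver": "Z39.88-2004",  # OpenURL Framework version key
        "rft_id": object_uri,      # identifier of the compound object (Referent)
        "svc_id": svc_id,          # requested service: a DIP representation
    }
    return base_url + "?" + urlencode(params, safe=":/")


# Hypothetical repository address and object identifier.
url = dip_openurl("http://repository-n.example.org/openurl", "info:example/URI_7")
```

The caller needs only the object identifier and the address of the interface, not any scheme-specific resolution mechanism.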
Repository Interop Interface 2: OpenURL & CO
• ServiceType = Request a representation of the DO expressed using a compound object format
  o Example:
    - svc_id = info:pathways/svc/dip.didl (request MPEG-21 DIDL representation)
    - svc_id = info:pathways/svc/dip.mets (request METS representation)
    - svc_id = info:pathways/svc/dip.rdf (request RDF representation – see later)
• Other Entities could be added to Interface #2 (think Requester)
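On the repository side, the svc_id values listed on this slide could be mapped to a compound object format with a simple lookup. This is a hypothetical sketch; the function name and the error handling are assumptions, and only the three svc_id values from the slide are covered.

```python
DIP_PREFIX = "info:pathways/svc/dip."

# The three formats named on the slide, keyed by svc_id suffix.
FORMATS = {"didl": "MPEG-21 DIDL", "mets": "METS", "rdf": "RDF"}


def requested_format(svc_id):
    """Return the compound object format requested by a svc_id value."""
    if not svc_id.startswith(DIP_PREFIX):
        raise ValueError("not a DIP service identifier: " + svc_id)
    suffix = svc_id[len(DIP_PREFIX):]
    if suffix not in FORMATS:
        raise ValueError("unsupported DIP format: " + suffix)
    return FORMATS[suffix]


fmt = requested_format("info:pathways/svc/dip.mets")
```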
Repository Interop Interface 2: OpenURL & CO
[Diagram: OpenURL requests sent to the OpenURL interface of repository_n at baseURL_o]
• independent of nature of identifiers
• ‘resolution’ independent of scheme-specific mechanisms
• conceptual interface is persistent over time

• KEV & HTTP
• XML & SOAP
• …
OpenURL interface to OAIS (Jeroen Bekaert)
[Diagram: an agent reaches a DIP through successive OpenURL requests]
1. Request: BaseURL(OpenURL_CIID)?url_ver=Z39.88-2004&rft_id=ContentInfoIdentifier&svc_id=info:pathways/svc/dip
   Response: list of ContextObjects, for each AIP (version):
   BaseURL(OpenURL_CIID)?url_ver=Z39.88-2004&rft_id=ContentInfoIdentifier&rft_val_fmt=info:ofi/fmt:kev:mtx:pathways&rft.aip=AIPIdentifier&svc_id=info:pathways/svc/dip
2. Request: BaseURL(OpenURL_CIID)?url_ver=Z39.88-2004&rft_id=ContentInfoIdentifier&rft_val_fmt=info:ofi/fmt:kev:mtx:pathways&rft.aip=AIPIdentifier&svc_id=info:pathways/svc/dip
   Response: list of ContextObjects, for each DIP format:
   BaseURL(OpenURL_CIID)?url_ver=Z39.88-2004&rft_id=ContentInfoIdentifier&rft_val_fmt=info:ofi/fmt:kev:mtx:pathways&rft.aip=AIPIdentifier&svc_id=info:pathways/svc/dip.*
3. Request: BaseURL(OpenURL_CIID)?url_ver=Z39.88-2004&rft_id=ContentInfoIdentifier&rft_val_fmt=info:ofi/fmt:kev:mtx:pathways&rft.aip=AIPIdentifier&svc_id=info:pathways/svc/dip.rdf
   Response: DIP (RDF)
Part 2 : Requirements for an infrastructure supporting a federation of repositories
Repository Registry: Who is part of the Federation?
[Diagram: repositories register with the Repository Registry]
Per Repository:
• Repository identifier
• baseURL of OAI-PMH interface
• baseURL of OpenURL interface
• any other information that helps downstream applications understand the nature of the repository
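The per-repository record above maps naturally onto a small keyed store. The following is a minimal in-memory sketch, not a description of an actual registry implementation: the class names, field names, and example URLs are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class RepositoryEntry:
    """What the registry holds per repository (field names are hypothetical)."""
    repository_id: str
    oai_pmh_base_url: str
    openurl_base_url: str
    description: Dict[str, str] = field(default_factory=dict)  # free-form extras


class RepositoryRegistry:
    """In-memory registry keyed by repository identifier."""

    def __init__(self) -> None:
        self._entries: Dict[str, RepositoryEntry] = {}

    def register(self, entry: RepositoryEntry) -> None:
        if entry.repository_id in self._entries:
            raise ValueError("already registered: " + entry.repository_id)
        self._entries[entry.repository_id] = entry

    def lookup(self, repository_id: str) -> RepositoryEntry:
        return self._entries[repository_id]


registry = RepositoryRegistry()
registry.register(
    RepositoryEntry(
        "repository_a",
        "http://repository-a.example.org/oai",
        "http://repository-a.example.org/openurl",
        {"software": "aDORe"},
    )
)
```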
Object Registry: What is part of the Federation?
[Diagram: the Object Registry harvests identifiers from the repositories; access labels shown: SRU/SRW, handle]
Per compound object (for the object itself and for its contained objects):
• Object identifier
• Object datetime ~ OAI-PMH datestamp
• OAI-PMH identifier
• Repository identifier
OAI-PMH & OpenURL access to objects in federation
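[Diagram: a lookup of URI_7 against the Object Registry (access labels: SRU/SRW, handle) and the Repository Registry leads to the compound object (URI_7, URI_3, URI_9)]
For URI_7:
• List of existing copies
• Per copy:
  o OAI-PMH access info
  o OpenURL access info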
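The lookup amounts to joining the two registries on the repository identifier. The sketch below uses plain dictionaries with invented records; the field names, OAI identifiers, and URLs are assumptions chosen to mirror the per-object and per-repository fields listed earlier.

```python
# Hypothetical Repository Registry: repository identifier -> interface addresses.
REPOSITORY_REGISTRY = {
    "repository_a": {
        "oai_pmh": "http://repository-a.example.org/oai",
        "openurl": "http://repository-a.example.org/openurl",
    },
    "repository_b": {
        "oai_pmh": "http://repository-b.example.org/oai",
        "openurl": "http://repository-b.example.org/openurl",
    },
}

# Hypothetical Object Registry: one record per copy of an object.
OBJECT_REGISTRY = [
    {"object_id": "URI_7", "datestamp": "2005-09-01",
     "oai_id": "oai:repository-a.example.org:7", "repository_id": "repository_a"},
    {"object_id": "URI_7", "datestamp": "2005-10-01",
     "oai_id": "oai:repository-b.example.org:7", "repository_id": "repository_b"},
    {"object_id": "URI_3", "datestamp": "2005-09-01",
     "oai_id": "oai:repository-a.example.org:3", "repository_id": "repository_a"},
]


def copies_of(object_id):
    """List every known copy of an object with its OAI-PMH and OpenURL access info."""
    copies = []
    for record in OBJECT_REGISTRY:
        if record["object_id"] != object_id:
            continue
        repository = REPOSITORY_REGISTRY[record["repository_id"]]
        copies.append({
            "repository_id": record["repository_id"],
            "datestamp": record["datestamp"],
            "oai_pmh": {"base_url": repository["oai_pmh"],
                        "identifier": record["oai_id"]},
            "openurl": {"base_url": repository["openurl"],
                        "rft_id": object_id},
        })
    return copies


copies = copies_of("URI_7")
```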
Part 3 : Summary of requirements
Summary of requirements
Requirement                       | Repository | Infrastructure
Compound Object model support     | X          |
XML-based representations support | X          | ?
OAI-PMH CO support                | X          |
OpenURL CO support                | X          |
Repository Registry               |            | X
Object Registry                   |            | X
Summary of requirements
Many variations on the design possible, yet most of this can be achieved with:
• Off-the-shelf tools
  o OAI-PMH tools
  o Handle system, SRU/W tools
  o OpenURL tools
  o Tools to generate XML-based representations of objects
• Surprisingly little effort
• A feasible amount of coordination/specification
• Some shared infrastructure
Outline
• aDORe: A few words about the aDORe architecture
• A Federation of Repositories: A new level of cross-repository interoperability
• Pathways InterDisseminator: A context-sensitive service overlay for a federation of repositories
Pathways InterDisseminator Service Overlay
• Pathways InterDisseminator: Dynamic Service-Oriented Overlay upon the federated architecture
• Assumes the existence of:
  o OpenURL Interface to all repositories in the federation
  o Object Registry (given an identifier, at which OpenURL interface is the object available?)
  o Availability of an RDF-based representation of DO compliant with a Pathways OWL core ontology
• Is itself exposed as a different OpenURL Resolver
Pathways InterDisseminator : core ontology
[Diagram: core ontology. Classes: Entity, Representation, Location, Format. Properties: hasEntity, hasRepresentation (Entity hasRepresentation Representation), hasLocation, hasFormat]
[Diagram: a "magic" engine acting as a Service Overlay (an OpenURL Application) above Interop Interface 2 (OpenURL) of DSpace, Fedora and aDORe repositories. Other labels: OpenURL ContextObject Container; RDF; compound object (URI_7, URI_3, URI_9)]
OpenURL requests shown:
• baseURL_y?url_ver=Z39.88-2004&rft_id=URI_7&svc_id=info:pathways/boostrap
• baseURL_y?url_ver=Z39.88-2004&rft_id=URI_7&svc_id=info:pathways/dip.rdf
Pathways InterDisseminator Service Overlay
• There can be many of these engines in a federation. The result is the ability to provide context-sensitive disseminations of DOs in (a federation of) repositories.
[Diagram: a service execution engine, connected to a web service, above DSpace, Fedora and aDORe repositories. Other labels: RDF; compound object (URI_7, URI_3, URI_9)]
OpenURL requests shown:
• baseURL_y?url_ver=Z39.88-2004&rft_id=URI_7&svc_id=info:magic/justdoit
• baseURL_y?url_ver=Z39.88-2004&rft_id=URI_7&svc_id=info:pathways/dip.rdf
Pathways InterDisseminator Demo
aDORe Digital Object in Demo
                         | Type            | MIME                                | Identifier
Digital Object           | scholarly paper | N/A                                 | DOI
Constituent Datastream 1 | metadata record | application/xml (MARCXML)           | aDORe datastream id (info URI)
Constituent Datastream 2 | metadata record | application/xml (original metadata) | aDORe datastream id (info URI)
Constituent Datastream 3 | fulltext file   | application/pdf                     | aDORe datastream id (info URI)
Demo
• Install TSCC codec (http://www.techsmith.com)
• Launch movie Pathways_InterDisseminator.avi in same path as this presentation
Comments, Flames, Questions