Slavic Digital Text Workshop 2006 The Open Archives Initiative Protocol for Metadata Harvesting: an Opportunity for Sharing Content in a Distributed Environment Muriel Foulonneau ([email protected] ) Grainger Engineering Library University of Illinois at Urbana-Champaign UIUC June 2006
Page 1: Slavic Digital Text Workshop 2006 The Open Archives Initiative Protocol for Metadata Harvesting: an Opportunity for Sharing Content in a Distributed Environment.

Slavic Digital Text Workshop 2006

The Open Archives Initiative Protocol for Metadata Harvesting: an Opportunity for

Sharing Content in a Distributed Environment

Muriel Foulonneau ([email protected])

Grainger Engineering Library

University of Illinois at Urbana-Champaign

UIUC June 2006


June 15th, 2006

[email protected] of Illinois at UC 2

Outline

Improving resource discoverability
  Hidden Web, portals and distributed digital libraries

Interoperability
  Metadata and protocols

The Open Archives Protocol for Metadata Harvesting
  The protocol, examples of services and repositories

Issues for digital libraries of distributed objects


Improving resource discoverability


Sharing content

New services, new representations of the content, new audiences

Bring your content to the attention of new users outside your immediate community

37% of visits to images of the State Library of New South Wales came from the PictureAustralia portal in 2002/3


Integrated Access to CIC Metadata

http://cicharvest.grainger.uiuc.edu/


Thematic access to resources


Russian Publics collection at UIUC


On the CIC metadata portal


Search on Google


Multiple services use different features

[Figure labels, in order: Full text; Metadata; Collection descriptions; Metadata AND resources; Metadata; Metadata AND resources]


Interoperability


Content and services

Building services

=> New services need content with similar features

[Diagram labels: Collection, Service]


What is interoperability

Interoperability is the capacity for different systems to talk to each other

I need
  A standard language
  An interpreter

01-04-04
- “01-04-04”
- this is a month
- 01 = “Jan”
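The "01-04-04" example can be made concrete with a minimal sketch. It assumes three conventions a receiving system might apply to that string; the convention labels are illustrative and not from the slides. Without an agreed "standard language", each interpreter reads a different date.

```python
from datetime import datetime

raw = "01-04-04"

# Three plausible conventions for the same string (labels are illustrative).
conventions = {
    "month-day-year": "%m-%d-%y",
    "day-month-year": "%d-%m-%y",
    "year-month-day": "%y-%m-%d",
}

# Parse the same string under each convention and normalise to ISO 8601.
readings = {
    name: datetime.strptime(raw, fmt).date().isoformat()
    for name, fmt in conventions.items()
}

for name, iso in readings.items():
    print(f"{name:>15}: {iso}")
```

Exchanging the normalised form (for example 2004-01-04) instead of the raw string removes the need for the receiver to guess.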


Various types of interoperability

Technical
  Protocols, hardware, …
  Mac/PC, Netscape/IE …

Organizational
  Who is in charge? Competence? Politics? Update? Rules

Content-related = metadata
  What do you talk about? The “item” = granularity and nature of the object
  Semantic: date… Created? Published?
  Syntactical: 04 January 2004
  Linguistic: 04 Enero 2004


Metadata

Are used to
  Manage
  Provide information
  Retrieve
  Preserve
  Define rights and conditions of use
  Describe structure

Descriptive
Administrative
Structural


A metadata format

Is a set of elements (pieces of information), mandatory or optional, applied together to meet one of the objectives listed above

Standard
  As a text
  As a DTD in SGML
  As an XML Schema in XML

=> MARC, EAD, MODS, Dublin Core, LOM, MPEG7, MyHomeCookedSchema …


The Dublin Core Metadata Element Set

15 elements

Content: Coverage, Description, Relation, Type, Source, Title, Subject

Intellectual property: Rights, Contributor, Publisher, Creator

Instantiation: Language, Identifier, Format, Date


Where metadata lie

“Internal”
  Webpage

Embedded
  TEI, EAD

External
  Catalogs
  XML records …

Includes a link to the resource
  => Third party metadata

Library of Congress home page

<HTML>
<HEAD>
<TITLE>The Library of Congress</TITLE>
<META NAME="description" CONTENT="Home page of the Library of Congress, Washington, D.C. The Library of Congress is the nation's oldest federal cultural institution, and it serves as the research arm of Congress. […].">
<META NAME="keywords" CONTENT="library of congress, home page, catalog, copyright office, […]">
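As a rough illustration of how a service could read "internal" metadata such as the META tags above, here is a minimal sketch using Python's standard html.parser module. The class name MetaExtractor is hypothetical, and the sample page is an abbreviated stand-in for the Library of Congress excerpt, with the elided parts left out rather than filled in.

```python
from html.parser import HTMLParser


class MetaExtractor(HTMLParser):
    """Collects the <title> text and name/content pairs from <meta> tags."""

    def __init__(self):
        super().__init__()
        self.meta = {}
        self.title = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        # html.parser lower-cases tag and attribute names for us.
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and "name" in attrs:
            self.meta[attrs["name"].lower()] = attrs.get("content", "")

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


# Abbreviated sample page (assumption: shortened from the slide's excerpt).
page = """<HTML><HEAD><TITLE>The Library of Congress</TITLE>
<META NAME="description" CONTENT="Home page of the Library of Congress, Washington, D.C.">
<META NAME="keywords" CONTENT="library of congress, home page, catalog, copyright office">
</HEAD></HTML>"""

parser = MetaExtractor()
parser.feed(page)
print(parser.title)
print(parser.meta["keywords"])
```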


Sharing metadata: Federated search

My user wants “mills”… wherever they come from

[Diagram: the query “Mill?” passes through a federated search to three sources, each holding records such as <title>My resource</title><date>04]

E.g. Z39.50, SRU/SRW, WAIS


Sharing metadata: Data aggregation

The portal gathers metadata (and resources?)

Mill?

<title>My resource</title><date>04

E.g. search engines, union catalogs, OAI
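The difference between the two sharing models can be sketched in a few lines. This is a toy illustration only: the source names, records and function names are invented, and real federated search (Z39.50, SRU/SRW) or harvesting involves network requests that are replaced here by in-memory lists.

```python
# Two toy "repositories"; names and records are invented for illustration.
SOURCES = {
    "library_a": [{"title": "Water mills of Illinois"}, {"title": "Prairie maps"}],
    "museum_b": [{"title": "Windmill photographs"}, {"title": "Mill town letters"}],
}


def matches(record, term):
    """Case-insensitive substring match on the title."""
    return term.lower() in record["title"].lower()


def federated_search(sources, term):
    """Federated model: ask every source at query time and merge the answers."""
    hits = []
    for name, records in sources.items():
        hits += [(name, r["title"]) for r in records if matches(r, term)]
    return hits


def aggregate(sources):
    """Aggregation model: copy all metadata into one local index ahead of time."""
    return [(name, r) for name, records in sources.items() for r in records]


def search_index(index, term):
    """Search only the local copy; the sources are not contacted."""
    return [(name, r["title"]) for name, r in index if matches(r, term)]


index = aggregate(SOURCES)  # done once, e.g. by a harvester
print(federated_search(SOURCES, "mill"))
print(search_index(index, "mill"))
```

Both calls return the same hits here; the design difference is when the sources are contacted: at every query for federated search, or once in advance for aggregation.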


OAI divides the world between data providers and service providers


The OAI framework

[Diagram: several data providers, each exposing a repository; a service provider running a harvester; an aggregator]


OAI repositories can be organized in sets

What do sets represent?

Journals: issues

Institutional repositories: departments, research centers, etc.

EPrint archives: subject, publication status

Cultural heritage repositories: collections with intent

Set representations may be constrained by the software package used.
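To show how sets are used in practice, the following sketch reads a ListSets response and builds a ListRecords request restricted to one set. The response text, the set names and the base URL http://example.org/oai are all invented for illustration; only the element names (set, setSpec, setName), the namespace and the "set" request argument come from OAI-PMH 2.0.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}

# Invented ListSets response, trimmed to the parts used here.
LIST_SETS = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListSets>
    <set><setSpec>emblems</setSpec><setName>Emblem books</setName></set>
    <set><setSpec>maps</setSpec><setName>Historical maps</setName></set>
  </ListSets>
</OAI-PMH>"""


def parse_sets(xml_text):
    """Return {setSpec: setName} for every set in a ListSets response."""
    root = ET.fromstring(xml_text)
    return {
        s.findtext("oai:setSpec", namespaces=OAI_NS): s.findtext("oai:setName", namespaces=OAI_NS)
        for s in root.iterfind(".//oai:set", OAI_NS)
    }


def list_records_url(base_url, set_spec, prefix="oai_dc"):
    """Build a selective-harvesting request limited to one set."""
    query = urlencode({"verb": "ListRecords", "metadataPrefix": prefix, "set": set_spec})
    return f"{base_url}?{query}"


sets = parse_sets(LIST_SETS)
url = list_records_url("http://example.org/oai", "emblems")  # hypothetical base URL
print(sets)
print(url)
```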


Multiple representations of an object

Honoré Daumier Lithograph (Brandeis University)

MARC record (in XML)

Dublin Core record (in XML)

Qualified Dublin Core record (in XML)

MODS record (in XML)


OAI is based on standards

HTTP protocol

XML

XML Schemas

Dublin Core


OAI supports 6 verbs

Identify
http://aerialphotos.grainger.uiuc.edu/oai.asp?verb=Identify

ListSets
http://aerialphotos.grainger.uiuc.edu/oai.asp?verb=ListSets

ListRecords
http://aerialphotos.grainger.uiuc.edu/oai.asp?verb=ListRecords&metadataPrefix=oai_dc

ListMetadataFormats
http://aerialphotos.grainger.uiuc.edu/oai.asp?verb=ListMetadataFormats

ListIdentifiers
http://aerialphotos.grainger.uiuc.edu/oai.asp?verb=ListIdentifiers&metadataPrefix=oai_dc

GetRecord
http://aerialphotos.grainger.uiuc.edu/oai.asp?verb=GetRecord&identifier=oai:aerialphotos.grainger.uiuc.edu:AP-1A-1-1940&metadataPrefix=oai_dc
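The six requests differ only in the verb and a few arguments, so they can be generated by one small helper. This is a sketch, not a harvester: build_request and REQUIRED are hypothetical names, the base URL is the one shown on the slide and may no longer respond, and the table lists only the arguments OAI-PMH 2.0 requires on a first request.

```python
from urllib.parse import urlencode

# Base URL taken from the slide; it may no longer be online.
BASE_URL = "http://aerialphotos.grainger.uiuc.edu/oai.asp"

# Arguments each verb requires on a first request (OAI-PMH 2.0).
REQUIRED = {
    "Identify": set(),
    "ListMetadataFormats": set(),
    "ListSets": set(),
    "ListIdentifiers": {"metadataPrefix"},
    "ListRecords": {"metadataPrefix"},
    "GetRecord": {"identifier", "metadataPrefix"},
}


def build_request(verb, **args):
    """Return the request URL for a verb, checking its required arguments."""
    if verb not in REQUIRED:
        raise ValueError(f"unknown verb: {verb}")
    missing = REQUIRED[verb] - set(args)
    if missing:
        raise ValueError(f"{verb} requires {sorted(missing)}")
    return f"{BASE_URL}?{urlencode({'verb': verb, **args})}"


print(build_request("Identify"))
print(build_request("ListRecords", metadataPrefix="oai_dc"))
print(build_request(
    "GetRecord",
    identifier="oai:aerialphotos.grainger.uiuc.edu:AP-1A-1-1940",
    metadataPrefix="oai_dc",
))
```

urlencode percent-encodes the colons in the identifier (%3A), so the GetRecord URL looks slightly different from the one on the slide while carrying the same value.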


An OAI response

<record>
  <header>
    <identifier>oai:images.library.uiuc.edu:emblems/324</identifier>
    <datestamp>2003-10-22</datestamp>
    <setSpec>emblems</setSpec>
  </header>
  <metadata>
    <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
               xmlns:dc="http://purl.org/dc/elements/1.1/"
               xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
               xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/oai_dc/ http://www.openarchives.org/OAI/2.0/oai_dc.xsd">
      <dc:creator>Müller, Johann Heinrich Traugott, 1631-1675</dc:creator>
      <dc:identifier>http://images.library.uiuc.edu:8081/u?/emblems,324</dc:identifier>
    </oai_dc:dc>
  </metadata>
</record>
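A harvester has to turn such a record into fields it can index. The sketch below parses the record from the slide with Python's standard xml.etree.ElementTree; parse_record is a hypothetical helper. One assumption to note: the excerpt is parsed as shown, with record and header in no namespace, whereas in a full response these elements inherit the OAI-PMH default namespace and the lookups would need that prefix.

```python
import xml.etree.ElementTree as ET

NS = {
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# The record from the slide, without the schemaLocation attribute.
RECORD = """<record>
  <header>
    <identifier>oai:images.library.uiuc.edu:emblems/324</identifier>
    <datestamp>2003-10-22</datestamp>
    <setSpec>emblems</setSpec>
  </header>
  <metadata>
    <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
               xmlns:dc="http://purl.org/dc/elements/1.1/">
      <dc:creator>Müller, Johann Heinrich Traugott, 1631-1675</dc:creator>
      <dc:identifier>http://images.library.uiuc.edu:8081/u?/emblems,324</dc:identifier>
    </oai_dc:dc>
  </metadata>
</record>"""


def parse_record(xml_text):
    """Split a record into header fields and repeatable Dublin Core fields."""
    record = ET.fromstring(xml_text)
    header = record.find("header")
    dc_root = record.find("metadata/oai_dc:dc", NS)
    fields = {}
    for element in dc_root:
        name = element.tag.split("}", 1)[1]  # drop the "{namespace}" prefix
        fields.setdefault(name, []).append(element.text)
    return {
        "identifier": header.findtext("identifier"),
        "datestamp": header.findtext("datestamp"),
        "sets": [s.text for s in header.findall("setSpec")],
        "dc": fields,
    }


parsed = parse_record(RECORD)
print(parsed["identifier"], parsed["datestamp"], parsed["sets"])
print(parsed["dc"]["creator"])
```

Dublin Core fields are stored as lists because every element of the set may repeat within a record.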


Examples of repositories

Library of Congress
http://memory.loc.gov/cgi-bin/oai2_0

ContentDM at UIUC
http://images.library.uiuc.edu:8081/cgi-bin/oai.exe

Ohio State Knowledge Bank
https://kb.osu.edu/dspace-oai/request


Examples of services

http://oaister.umdl.umich.edu

http://nsdl.org/

http://www.americansouth.org/

http://cicharvest.grainger.uiuc.edu/

http://imlsdcc.grainger.uiuc.edu/

http://www.language-archives.org/

http://www.pictureaustralia.org/


Turn key systems and modules

CWIS: http://scout.wisc.edu/Projects/CWIS/

ContentDM: http://contentdm.com/

Digitool: http://www.exlibrisgroup.com/digitool.htm

DSpace: http://www.dspace.org/

EPrints: http://software.eprints.org/

DLXS: http://www.dlxs.org/

OAICat: http://www.oclc.org/research/software/oai/cat.htm

XMLFile: http://www.dlib.vt.edu/projects/OAi/software/xmlfile/xmlfile.html

DLESE OAI software: http://dlese.org/oai/index.jsp


Useful tools

UIUC OAI registry
http://gita.grainger.uiuc.edu/registry/

OAI repository explorer
http://re.cs.uct.ac.za/

Errol
http://errol.oclc.org/


Digital libraries of distributed objects


Metadata shareability issues

Granularity

Loss of context

Completeness

DLF-NSDL best practices on shareable metadata
http://oai-best.comm.nsdl.org/cgi-bin/wiki.pl?TableOfContents


What is behind URLs


Conveying actionable URLs

http://rama.grainger.uiuc.edu/assetactions/

View

Resize

Select

Annotate

Share


Conclusions

Interoperability is technical, content-related and organizational; OAI is the easy part

Works even better for particular communities with similar organizational structures and metadata formats

Extensions of the protocol for:
  Objects
  Actionable URLs


References and useful material

The Open Archives Website
http://www.openarchives.org/OAI/2.0/guidelines.htm

DLF/NSDL best practices for OAI and shareable metadata
http://oai-best.comm.nsdl.org/cgi-bin/wiki.pl?TableOfContents

OAForum Tutorial
http://www.oaforum.org/tutorial/

Getting a Leg Up on OAI
http://nsdl.comm.nsdl.org/meeting/session_docs/2004/2620_National_Science_Digital_Library_Conference.doc

