Post on 21-Dec-2015
transcript
Outline
Brief history/overview of OAI
Why OAIster was created
OAIster: digital union catalog
EPrints, open access and OAIster
Google (and others) and OAIster
What is OAI?
OAI stands for Open Archives Initiative“…develops and promotes interoperability standards that aim to facilitate the efficient dissemination of content.”
Probably should have been called SAI: Shared Archives Initiative
Includes a Protocol for Metadata Harvesting (PMH), i.e., what we use to fill OAIster
Consists of data providers and service providers
Metadata records
Data providers use protocol to share their metadata records
Service providers harvest the metadata so they can provide a service using them
Metadata needs to beXML1.1 compliantUTF-8 enabledSufficient for discovery
OAI: what it is not
OAI ≠ open access “…defining and promoting machine interfaces that facilitate the
availability of content from a variety of providers. Openness does not mean ‘free’ or ‘unlimited’ access to the information repositories that conform to the OAI-PMH.”
However, a large majority of OAIster records are available to all and sundry
Perfect opportunity-- freely sharing free stuff
Why OAIster?
Initially, wanted to build the Academic HotBot (now we would say the Academic Google)
Essentially, a union catalog of digital objects that often can’t be roboted or spidered
Currently, have more records that link to “objects” than there are records in our OPAC: 13+ million
What does OAIster contain?
Pre-prints, post-prints, published articles, grey literature, scanned images, archival videos…
Harvest everything available except obvious test repositories
Keep nearly everything must have a valid digital object link must have decent metadata must be scholarly or informational
http://memory.loc.gov/mbrs/varsmp/0526.mpgLibrary of Congress Digitized Historical Collections
http://name.umdl.umich.edu/ADM0370.0002.001University of Michigan Digital Collections
Why do (should) people use it?
It’s big-- will pass 14 million shortly
It’s varied-- besides articles, photos, and videos, it contains datasets, audio files, finding aids, manuscripts…
It keeps growing-- as long as they keep paying my salary
EPrints in OAIster
Kept pace with EPrints movement
Currently actively harvest 138 EPrints repositories (another 18 are inactive)
All told, 190K+ records
A word about repository discovery…
EPrints in OAIster
Not a drop in the bucket, when consider that EPrints is less than a decade old
Don’t forget that OAIster contains more than records pointing to full-text
And it also doesn’t point to only open access materials
Restricted materials
Effort needed to partition restricted from freely accessible
Often, full-text objects are embargoed, but not reflected in metadata
Also felt not providing restricted materials could be a disservice
Benefit of EPrints in OAIster
All valid records in one place
Actively checked and re-harvested
Inclusive search, so end-users retrieve associated and serendipitous materials
When use OAIster, when Google?
Google vs. everything elseGoogle can’t get at everything, until it
starts using OAI itself (DSpace aside)Google contains junk, OAIster rarely
does (anymore)Most importantly, we use metadata,
Google doesn’t