Metadata 101
Amy Benson
NELINET, Inc.
November 7, 2005
Overview
Terms and definitions– What (the heck) do all those acronyms mean?
Categories of metadata schemes and tools– How do they relate to each other?
Uses and functions– What do you do with them?
Staying power– Which ones do you really have to pay attention to?
Standards
Increase interoperability
Lower use and participation barriers
Build larger communities of users, which can drive creation of a wider range of relevant services and tools (Windows vs Mac)
Improve chances of long term survival of materials
Prefer open over proprietary
Categories
Metadata containers– XML, RDF
Metadata standards– MARC, MODS, DC, EAD, TEI, ONIX, FGDC, GILS
Metadata content standards
Transmission standards and protocols– METS, OAI, SOAP, Z39.50, SRW
Identifiers– URI, URL, PURL, URN, DOI, ISTC
Metadata - What is it?
Data about data
Information about any aspect of a resource - size, location, attributes, topic, origin, use, audience, creator, quality, access rights, reviews… the list is endless
An aid to the discovery, identification, assessment, and management of described entities
Types of Metadata
Descriptive– What is it?
Discovery– How can I find it?
Structural– What files comprise it?
Administrative– When was it created?
Types of Metadata
Identifiers– How can I get to it?
Terms & conditions– Can I use it?
Preservation– Which key characteristics of the resource need to be maintained?
Metadata Terms
Structured metadata
Extensibility– Modify to suit local needs
Granularity– Level at which item or collection of items is described
Interoperability– Works with other systems– Share data across systems
Metadata - Who needs it?
Impact of metadata on collection access
– Without metadata there is no service to users
– Metadata provides the means for resource discovery, grouping, filtering, matching user needs
– Keyword searching works only for resources that are text-based - excludes photographs, data sets, objects, maps, audio, video…
Metadata itself as valuable content
– Item descriptions, Finding aids, Reviews
Metadata
Description vs. discovery
– Full description is important for collection inventory and management - less so for discovery
– Full description of a resource includes much information that will never be part of a user’s search key
Deep vs. shallow
– Basic discovery metadata supports broad, cross-domain searching that can lead users to more complete search mechanisms and descriptions
Interoperability
Interoperability allows different computer systems, networks, and software to work together and share information
Usually achieved by following standards
Generally, an increase in specialization results in a decrease in interoperability
Allows different systems to make use of same data
Interoperability
Advantages
– Can increase awareness and use of collections
– Reduces geographic and domain-specific isolation of collections
– Creates new avenues for scholarship
– Likely to assist / promote the longevity of data and collections
– Holy Grail = one-stop access to the universe of online resources
Interoperability
Disadvantages
– Consensus
– Compromise
– Delays
– Loss of independence
– Uniformity
– Increased implementation difficulties
– Loss of specificity and detail
Worthy goal?
Interoperability
The NINCH (National Initiative for a Networked Cultural Heritage) Guide to Good Practice lists these as the first two of its six core principles:
1. Optimize interoperability of materials
2. Enable broadest use
Interoperability
Canadian Culture Online (CCO) Technical Standards and Guidelines
– Technical requirements that CCO-funded projects must meet
– Six metadata elements are required when describing objects to ensure interoperability: title, creator, subject, date created, language, identifier
XML
eXtensible Markup Language
– Based on SGML - Standard Generalized Markup Language
– Developed by the World Wide Web Consortium (W3C)
– Open standard (non-proprietary)
– Uses language tags, similar to HTML
<title>Gone with the wind</title>
A structure for storing and tagging information, without prescribing how the information is displayed or used
XML
Data stored in XML can be of many types
Its simple syntax is easy for machines to process
Natural language tags make XML understandable to humans
XML defines the syntax, but not the data elements that make up an XML document
XML
The structure of XML allows for hierarchical relationships – often necessary for complex documents, 3-D objects, archives, etc.
XML is extensible – an important feature that allows tags to be created by users or a community of users
XML-encoded data is easily transformed or re-purposed
XML - Elements Example
<!DOCTYPE list [ <!ELEMENT list (book+)>
<!ELEMENT book (title, author*, date+, year, comment*, code*)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT author (aulast*, aufirst*)>
<!ELEMENT aulast (#PCDATA)>
<!ELEMENT aufirst (#PCDATA)>
<!ELEMENT date (day*, month*)>
<!ELEMENT day (#PCDATA)>
<!ELEMENT month (#PCDATA)>
<!ELEMENT year (#PCDATA)>
<!ELEMENT comment (#PCDATA)>
<!ELEMENT code (#PCDATA)> ]>
XML – Record Example
<book> <title>Weaving the Web</title>
<author><aulast>Berners-Lee,</aulast>
<aufirst>Tim</aufirst></author>
<date> <day>6</day>
<month>January</month></date>
<year>2002</year>
<comment>Interesting topic, but not too well written.</comment>
<code>nonfiction</code>
</book>
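Because the record is plain, well-formed XML, any standard parser can pull individual elements out of it. A minimal sketch using Python's standard library; the variable names and the summary format are illustrative choices, not part of the original slides:

```python
import xml.etree.ElementTree as ET

# The record from the slide, reproduced as a string.
RECORD = """
<book>
  <title>Weaving the Web</title>
  <author><aulast>Berners-Lee,</aulast><aufirst>Tim</aufirst></author>
  <date><day>6</day><month>January</month></date>
  <year>2002</year>
  <comment>Interesting topic, but not too well written.</comment>
  <code>nonfiction</code>
</book>
"""

book = ET.fromstring(RECORD.strip())

# Child elements are addressed by their tag names, following the hierarchy.
title = book.findtext("title")
last = book.findtext("author/aulast").rstrip(",")  # drop the display comma
first = book.findtext("author/aufirst")
day = book.findtext("date/day")
month = book.findtext("date/month")
year = book.findtext("year")

summary = f"{first} {last}, {title} ({day} {month} {year})"
print(summary)
```

The same data could just as easily be rendered in a different order or format, which is the point of separating structure from display.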
XML - Partial list of ONIX elements
RecipeML
XML
Usually, tags, definitions, and requirements are defined and adhered to by a specific community
– DTD (Document Type Definition): describes the permissible data structure for an XML file
– Schema: also describes the permissible data structure for an XML file; a newer, XML-based way to define XML document types
XML DTDs and Schemas
DTDs and schemas
– Lay out the logical structure of the data
– Establish rules about which elements a document may have, which are required, which can repeat, etc.
– Establish a root element, parent and child elements, and where data can be placed within hierarchy
– DTDs can be placed within an XML file, or be external to it, and then referenced
– Schemas are external
XML – Simple DTD Example
<!DOCTYPE list [ <!ELEMENT list (book+)>
<!ELEMENT book (title, author*, date+, year, comment*, code*)>
<!ELEMENT title (#PCDATA)>
<!ELEMENT author (aulast*, aufirst*)>
<!ELEMENT aulast (#PCDATA)>
<!ELEMENT aufirst (#PCDATA)>
<!ELEMENT date (day*, month*)>
<!ELEMENT day (#PCDATA)>
<!ELEMENT month (#PCDATA)>
<!ELEMENT year (#PCDATA)>
<!ELEMENT comment (#PCDATA)>
<!ELEMENT code (#PCDATA)> ]>
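To make the occurrence markers concrete (no marker = exactly once, `*` = zero or more, `+` = one or more), the sketch below checks a record's child elements against the `book` content model from the DTD. It is a deliberately simplified illustration that handles only flat, comma-separated models; it is not a DTD validator, and the function names are made up for this example.

```python
import re
import xml.etree.ElementTree as ET

def model_to_regex(model):
    """Turn a flat, comma-separated content model into a regex over child tag names."""
    pattern = ""
    for part in model.strip("() ").split(","):
        name, occurrence = re.fullmatch(r"\s*(\w+)([*+?]?)\s*", part).groups()
        # Each child is written as "name " so that tag names cannot run together.
        pattern += f"(?:{name} ){occurrence}"
    return re.compile(pattern)

def children_match(element, model_regex):
    """True if the element's direct children satisfy the content model."""
    sequence = "".join(child.tag + " " for child in element)
    return model_regex.fullmatch(sequence) is not None

BOOK_MODEL = model_to_regex("(title, author*, date+, year, comment*, code*)")

good = ET.fromstring(
    "<book><title>T</title><date><day>6</day></date><year>2002</year></book>"
)
bad = ET.fromstring("<book><title>T</title><year>2002</year></book>")  # no <date>

print(children_match(good, BOOK_MODEL))
print(children_match(bad, BOOK_MODEL))
```

The first record passes because `author`, `comment`, and `code` are optional; the second fails because `date+` requires at least one `date` element.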
XML – Ways to use XML
XML-encoded data is able to be re-purposed: re-used in multiple contexts
Due to its ability to be easily parsed, software can transform it in countless ways, thereby allowing:
– Easy migration paths
– Alternative displays
– On-the-fly response to user needs
Transform XML for display via style sheets (XSL) and transformations (XSLT)
XML - XSL
XML prescribes the structure of a document/record, but not content or display
XSL - eXtensible Stylesheet Language
– XML uses stylesheets to display the code in user-friendly ways
– Use different stylesheets to render the data in different ways
– Similar to Cascading stylesheets used for HTML
XML - XSLT
XSL Transformations (XSLT)
– A markup language and programming syntax for processing XML
– Is most often used to:
Transform XML to HTML for delivery to standard web clients
Transform XML from one set of XML tags to another
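The XML-to-HTML case can be illustrated without an XSLT processor. Python's standard library does not include one, so the sketch below performs the equivalent transformation in plain Python; it stands in for what a stylesheet would do and is not XSLT itself. The choice of `<li>` and `<cite>` as output tags is an assumption made for the example.

```python
import xml.etree.ElementTree as ET

# A trimmed version of the book record used earlier in the slides.
RECORD = (
    "<book><title>Weaving the Web</title>"
    "<author><aulast>Berners-Lee</aulast><aufirst>Tim</aufirst></author>"
    "<year>2002</year></book>"
)

def book_to_html(book):
    """Map the descriptive <book> tags onto display-oriented HTML tags."""
    li = ET.Element("li")
    cite = ET.SubElement(li, "cite")
    cite.text = book.findtext("title")
    author = f"{book.findtext('author/aufirst')} {book.findtext('author/aulast')}"
    cite.tail = f" by {author} ({book.findtext('year')})"
    return ET.tostring(li, encoding="unicode")

html = book_to_html(ET.fromstring(RECORD))
print(html)
```

A second function with different output tags would give a second rendering of the same record, which is the role different stylesheets play.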
XML File
XML File Transformation
XML vs Traditional Database Software
If your information is…
– Tightly structured
– Fixed field length
– Massive numbers of individual items
You need a database
If your information is…
– Loosely structured
– Variable field length
– Massive record size
You need XML
XML Software
Software
– XMLSpy: http://www.xmlspy.com/
– XMetal: http://www.xmetal.com/
– AxKit: http://axkit.org/
– Cocoon: http://xml.apache.org/cocoon/
Used to
– Assist with content authoring and coding
– Apply dynamic transformations to XML content
– Render HTML for standard web browsers, PDAs, cell phones, etc.
Namespaces
A namespace identifies a specific set of elements
Namespaces allow metadata terms to be unambiguously used across applications
– Defines what ‘Date’ or ‘Title’ means in a specific usage, or namespace
Each namespace has a unique identifier associated with it
Namespaces - Example
<dc:DC xmlns:dc='http://purl.org/dc/elements/1.1/'>
<dc:title>Internet Ethics</dc:title>
<dc:creator>Duncan Langford</dc:creator>
<dc:format>Book</dc:format>
<dc:identifier>ISBN 0333776267</dc:identifier>
</dc:DC>
Namespaces - Example
<s:student xmlns:s='http://www.develop.com/student' xmlns:w='urn:schemas.develop.com:workshop'>
<s:id>3235329</s:id>
<s:name>Jeff Smith</s:name>
<w:name>Emerging Metadata Topics</w:name>
<s:institution>XNL</s:institution>
</s:student>
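The student record contains two different `name` elements, and the namespace is what tells them apart. A minimal sketch of how a parser sees this, using a well-formed variant of the record (the `s` prefix on the root element is an assumption, since the root must use a declared prefix):

```python
import xml.etree.ElementTree as ET

STUDENT = """
<s:student xmlns:s='http://www.develop.com/student'
           xmlns:w='urn:schemas.develop.com:workshop'>
  <s:id>3235329</s:id>
  <s:name>Jeff Smith</s:name>
  <w:name>Emerging Metadata Topics</w:name>
  <s:institution>XNL</s:institution>
</s:student>
"""

# Prefixes are local shorthand; the namespace identifier is what matters.
NS = {
    "s": "http://www.develop.com/student",
    "w": "urn:schemas.develop.com:workshop",
}

root = ET.fromstring(STUDENT.strip())
student_name = root.findtext("s:name", namespaces=NS)
workshop_name = root.findtext("w:name", namespaces=NS)

print(root.tag)        # the parser expands the prefix to the full identifier
print(student_name)
print(workshop_name)
```

Both elements are called `name`, yet each lookup returns only the one belonging to its own namespace.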
Resource Description Framework (RDF)
A structured framework for multiple resource description schemas
Problem: data providers offer well organized repositories of metadata, but use different description systems
Solution: RDF - a way for machines to understand multiple description systems or metadata schemas and the relationship(s) between them
RDF
Allows interoperability among multiple resource description methods
– Communities define and state their metadata schemas in XML documents
– Systems use the definitions and statements to “understand” the metadata
In practice the element sets are namespaces which are “called” or “stated” within RDF
RDF schemas “owned” by known groups provide basis for trusted metadata
RDF Example
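A minimal sketch of an RDF description in its XML serialization, with Dublin Core as the stated namespace, and a few lines of Python that read it back as subject / property / value statements. The resource URL under `example.org` is a placeholder; the title and creator are reused from the namespace example. This reads only the simplest RDF/XML pattern and is not a general RDF parser.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC_NS = "http://purl.org/dc/elements/1.1/"

RDF_XML = f"""
<rdf:RDF xmlns:rdf="{RDF_NS}" xmlns:dc="{DC_NS}">
  <rdf:Description rdf:about="http://example.org/books/internet-ethics">
    <dc:title>Internet Ethics</dc:title>
    <dc:creator>Duncan Langford</dc:creator>
  </rdf:Description>
</rdf:RDF>
"""

def statements(rdf_xml):
    """Return (subject, property, value) tuples for each described resource."""
    root = ET.fromstring(rdf_xml.strip())
    result = []
    for description in root.findall(f"{{{RDF_NS}}}Description"):
        subject = description.get(f"{{{RDF_NS}}}about")
        for prop in description:
            # ElementTree stores tags as "{namespace}local"; joining the two
            # parts gives the full identifier of the property.
            predicate = prop.tag.replace("{", "").replace("}", "")
            result.append((subject, predicate, prop.text))
    return result

for statement in statements(RDF_XML):
    print(statement)
```

Each property is identified by its namespace plus element name, so a system that knows the Dublin Core namespace knows what `title` means here.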
MARC
Advantages
– Rich set of descriptive elements
– Highly interoperable within library community
– Long, established history
Disadvantages
– Low extensibility
– As is, not interoperable beyond the library world
– Weak on administrative, rights, and other kinds of metadata important for digital resources
MARC
Future of MARC
– Must MARC die? No. New life through XML
MARC XML from the Library of Congress (LC)
MODS: a version of MARC encoded in XML, developed by the Library of Congress
Crosswalks between MARC and many other metadata schemas already exist
MARC XML
LC has developed a MARC XML schema, stylesheets, and tools
The schema allows representation of a complete MARC record in XML
– Lossless conversion
Will support new transformations to new uses of MARC data
– MARC to MARCXML to Dublin Core and MODS
Metadata Object Description Schema (MODS)
Set of 20 bibliographic elements - a subset of the MARC 21 Format for Bibliographic Data
Not as complete as the full MARC format, but richer than Dublin Core (for example)
Highly interoperable with existing MARC records
Uses language-based tags, rather than numbers like MARC 21 (245, 650, etc.)
Under development by the LC Network Development and MARC Standards Office
MODS
XML-based
– Intended to work with/complement other metadata formats
Can be used for conversion of existing MARC records or to create new resource description records
Useful particularly for library applications that want to go beyond the OPAC
Shares features of MARC and Dublin Core
MODS Elements
TitleInfo Name TypeOfResource Genre PublicationInfo Language PhysicalDescription Abstract TableOfContents TargetAudience
Note Cartographics Subject Classification RelatedItem Identifier Location AccessCondition Extension RecordInfo
MODS Elements
Title element is mandatory, all others are optional
Elements can have subelements and attributes which provide refining detail for the element
Elements and sub-elements are repeatable, except in certain cases
Elements display in any order
MODS Example
MODS Implementation
MODS User Guidelines
– http://www.loc.gov/standards/mods/registry.html
MODS Implementation Registry
Contains descriptions of MODS projects planned, in progress, and fully implemented
– http://www.loc.gov/standards/mods/registry.html
Dublin Core (DC)
A method of describing resources intended to facilitate the discovery of electronic resources
Designed to allow simple description of resources by non-catalogers as well as specialists
National and International standard
– ANSI/NISO standard Z39.85-2001
– ISO standard 15836
Includes 15 “core” elements
Dublin Core Elements
Title Creator Subject Description Publisher Contributor Date Type
Format Identifier Source Language Relation Coverage Rights
Dublin Core
All elements optional and repeatable
Elements display in any order
Authority control not required
Simple and Qualified DC
Extensible
Flexible
International
Dublin Core
Simple
– Lowest common denominator
– Less rich
– Discovery role – leads to resource or more complete description of resource
Qualified
– More precise
– Less interoperable
Dublin Core Examples
Generic
Title="The sound of music"
HTML
<meta name="DC.Title" content="The sound of music">
XML
<?xml version="1.0"?>
<metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
<dc:title>The Sound of Music</dc:title>
</metadata>
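The HTML and XML forms above are two serializations of the same element/value pairs. A minimal sketch that produces both from one Python dictionary; the `language` value is an assumption added so the record has more than one element, and the function names are illustrative.

```python
import xml.etree.ElementTree as ET
from html import escape

DC_NS = "http://purl.org/dc/elements/1.1/"

# One simple Dublin Core record: element name -> value.
record = {"title": "The Sound of Music", "language": "en"}  # "en" is assumed

def to_html_meta(record):
    """Render each element as an HTML <meta> tag using the DC.Element convention."""
    return [
        f'<meta name="DC.{name.capitalize()}" content="{escape(value)}">'
        for name, value in record.items()
    ]

def to_xml(record):
    """Render the same record as XML, with elements in the DC namespace."""
    ET.register_namespace("dc", DC_NS)
    root = ET.Element("metadata")
    for name, value in record.items():
        ET.SubElement(root, f"{{{DC_NS}}}{name}").text = value
    return ET.tostring(root, encoding="unicode")

meta_tags = to_html_meta(record)
xml_text = to_xml(record)
print("\n".join(meta_tags))
print(xml_text)
```

Since every element is optional and repeatable, a fuller version would map each name to a list of values; the dictionary here keeps the sketch short.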
Dublin Core Examples - HTML
Dublin Core Examples - XML
DC Record in OCLC Connexion
Other Metadata Standards
Encoded Archival Description (EAD)
Text Encoding Initiative (TEI)
Visual Resources Association (VRA)
Global Information Locator Service (GILS)
Online Information Exchange (ONIX)
Content Standard for Digital Geospatial Metadata (CSDGM) aka FGDC
Data Documentation Initiative (DDI)
ONline Information eXchange (ONIX)
Developed and maintained by EDItEUR jointly with Book Industry Communication and the Book Industry Study Group
ONIX is the international standard for representing and communicating book industry product information in electronic form
XML-based
ONIX
Highly focused on e-commerce of books
ONIX was developed as a solution to two perceived problems
– (1) The need for richer book data online to improve sales
– (2) The widely varying format requirements of the major book wholesalers and retailers - interoperability
May appear in future library applications
CSDGM / FGDC
Primary standard for geospatial metadata
All federal agencies are required to produce and collect geospatial data in this format
Allows for very detailed description
– 334 different metadata elements
Tremendous potential uses
Challenge is to establish interoperability with other metadata standards
Metadata for Images in XML - MIX
An XML-based set of technical data elements required to manage digital image collections
Encodes information such as image source, compression scheme, & image editing software
Currently being developed by LC and the NISO Technical Metadata for Digital Still Images Standards Committee
Draft 0.2 available for review and comment
– http://www.loc.gov/standards/mix/
Data Documentation Initiative (DDI)
International, XML-based standard for the content, presentation, transport, and preservation of documentation for datasets in the social and behavioral sciences
Creating appropriate metadata will enable effective, efficient, and accurate use of the datasets
http://www.icpsr.umich.edu/DDI/codebook/
Crosswalks
Crosswalks map an element from one scheme to its closest equivalent in another scheme
– Example: MARC 1XX field is mapped to DC ‘creator’
Instrumental for converting data in one format to another format - one that is potentially more widely accessible
Support the demand for cross-domain searching and interoperability
Crosswalks
There is rarely a one-to-one correlation between elements of different schemes
– One to many - DC to MARC
– Many to one or none - MARC to DC
– None to one or many
MARC to DC
– http://www.loc.gov/marc/marc2dc.html#unqualif
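In code, a crosswalk is essentially a lookup table. The sketch below shows the "many to one or none" direction, MARC to DC: several MARC tags collapse into one DC element, and a tag with no equivalent is dropped. The table is a small illustrative subset chosen for this example, not the full LC crosswalk, and the sample field values are made up apart from the book already used in these slides.

```python
# Illustrative subset only: MARC tag -> unqualified Dublin Core element.
MARC_TO_DC = {
    "100": "creator",  # the 1XX fields all land on 'creator' (many to one)
    "110": "creator",
    "111": "creator",
    "245": "title",
    "650": "subject",
}

def crosswalk(marc_fields):
    """Convert (tag, value) pairs to a DC record; report tags with no mapping."""
    dc_record = {}
    dropped = []
    for tag, value in marc_fields:
        element = MARC_TO_DC.get(tag)
        if element is None:
            dropped.append(tag)  # many to none: detail is lost
        else:
            dc_record.setdefault(element, []).append(value)
    return dc_record, dropped

marc = [
    ("100", "Berners-Lee, Tim"),
    ("245", "Weaving the Web"),
    ("650", "World Wide Web"),
    ("040", "DLC"),  # cataloging source: no place in this table
]

dc_record, dropped = crosswalk(marc)
print(dc_record)
print(dropped)
```

Going the other way (DC to MARC) is harder, because a single `creator` value does not say which of the 1XX fields it came from.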
Content Standards
AACR (Anglo-American Cataloguing Rules)
– “The rules cover the description of, and the provision of access points for, all library materials commonly collected at the present time.”
– The current text is the 2nd ed, 2002 Revision (with 2003, 2004, and 2005 updates)
– The Joint Steering Committee for Revision of AACR (JSC) is working on a new code, “RDA: Resource Description and Access” scheduled to be published in 2008
Content Standards
International Standard Bibliographic Description (ISBD)
– A family of standards to regularize the form and content of bibliographic descriptions
– Available for different material types: monographs, computer files, etc.
– Designed to promote record sharing and exchange
Content Standards
Book Industry Standards And Communications (BISAC)
– Metadata Committee has the responsibility for the continued development and maintenance of ONIX for Books in North America
– Developed the Metadata Best Practices document
– Intended as a response to the question, “I’ve downloaded the ONIX documentation. Now what?”
– http://www.bisg.org/docs/Best_Practices_Document.pdf
Content Standards
Describing Archives: A Content Standard (DACS)
– Designed to facilitate consistent, appropriate, and self-explanatory description of archival materials and creators of archival materials
– Replaces Archives, Personal Papers, and Manuscripts (APPM)
Content Standards
Western States Dublin Core Metadata Best Practices
– Provide guidelines for creating metadata records for digitized cultural heritage resources
– Element set based on Dublin Core
– http://www.cdpheritage.org/resource/metadata/wsdcmbp/
Content Standards
Cataloging Cultural Objects (CCO)
– Provides guidelines for selecting, ordering, and formatting data used to populate catalog records
– Designed to promote good descriptive cataloging, shared documentation, and enhanced end-user access
– Feb. 2005 draft available for review
– A project of the Visual Resources Association
– http://www.vraweb.org/ccoweb/
Content Standards
Descriptive Metadata Guidelines for RLG Cultural Materials
– Designed to help institutions with decision making about metadata for online access to collections
– Can be used to create or review local best practice in describing collections of cultural objects, regardless of the specific metadata standard used
– http://www.rlg.org/en/pdfs/RLG_desc_metadata.pdf
Application Profiles
Elements from one or more metadata standards combined to suit the needs of a specific community
May also include usage guidelines
– Example: Title element is required
A Library Application Profile for Dublin Core is under development
– Working draft is available from the DCMI web site
Authority Control Anyone?
Recommended, but not required by many schemas
Librarians know its value
Controlled vocabularies: LCSH
Thesauri
– Getty Art & Architecture Thesaurus; LC Thesaurus for Graphic Materials I & II
Pre-set searches
FAST
Faceted Application of Subject Terminology (FAST)
LCSH is by far the most commonly used and widely accepted subject vocabulary for general application
Need for a new approach to subject vocabulary for electronic resources
Easy to maintain and amenable to automatic authority control and computer manipulation
FAST
Maintains upward compatibility with LCSH, and any valid set of LC subject headings can be converted to FAST headings
Retains the advantages of a controlled vocabulary
– Most LCSH headings are synthesized by catalogers based on rules
– For FAST, all headings (except chronological) are established and only established headings can be assigned
Faceting of LCSH
LCSH
650 American loyalists $z England.
651 United States $x History $y Revolution, 1775-1783 $v Biography.
650 Secret service $z Great Britain.
650 Painters $z United States.
FAST
648 1775 - 1783
650 American loyalists
650 Revolution (United States, 1775-1783)
650 Secret service
650 Painters
651 England
651 United States
651 Great Britain
655 Biography
655 History
Authority Control: FAST vs. LCSH
LCSH
– Many headings are established; most assigned headings are synthesized by catalogers based on rules
– Very large number (billions plus) of possible headings
– Most headings are distinct (based on NACO normalization rules*); some conflicts occur particularly with $x & $v
FAST
– All headings (except chronological) are established
– Faceting limits the number of possible headings to a few million
– All headings are distinct; tagging and subfield coding provides no unique information
*http://www.loc.gov/catdir/pcc/naco/normrule.html
Metadata Encoding & Transmission Standard (METS)
A system for packaging metadata necessary for both the management of digital library objects within a repository and the exchange of such objects between repositories, or between repositories and their users
Used for: Digital collection repositories
Developed by the Digital Library Federation (DLF) and Library of Congress (LC)
Metadata Encoding & Transmission Standard (METS)
METS can be understood as a binder that unites metadata about a particular resource
A METS record includes six parts:
– Header
– Descriptive metadata
– Administrative metadata
– File groups
– Structural map
– Behavior section
Object Components (21 files and counting…)
Files: 100 Pixel GIF, 800 Pixel JPG, 1400 Pixel JPG, 2000 Pixel JPG, TIFF, PDF, TEI, MrSid, AIFF
Structure: Whole Document, Page 1, Page 2, Page 3, Page 4
METS Schema
metsHdr (METS Header)
dmdSec (Descriptive Metadata)
amdSec (Administrative Metadata)
fileSec (Files)
structMap (Structure)
behaviorSec (Viewers)
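The "binder" idea can be seen in a bare skeleton: one root element holding the six sections named in the schema. The sketch below only builds the empty containers; a real METS document needs IDs, attributes, and content inside each section to be valid, and the function name is illustrative.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"

# The six top-level sections, in schema order.
SECTIONS = ["metsHdr", "dmdSec", "amdSec", "fileSec", "structMap", "behaviorSec"]

def mets_skeleton():
    """Build an empty METS wrapper: one root element with six child sections."""
    ET.register_namespace("mets", METS_NS)
    root = ET.Element(f"{{{METS_NS}}}mets")
    for name in SECTIONS:
        ET.SubElement(root, f"{{{METS_NS}}}{name}")
    return root

doc = mets_skeleton()
section_names = [child.tag.split("}")[1] for child in doc]
print(section_names)
print(ET.tostring(doc, encoding="unicode"))
```

Descriptive records such as MODS or Dublin Core would sit inside `dmdSec`, file references inside `fileSec`, and the page-by-page structure inside `structMap`.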
Open Archives Initiative (OAI)
A tool that supports interoperability among multiple databases
OAI goal: coarse-granularity resource discovery
OAI handles simple discovery from multiple community-specific repositories with metadata crosswalked to unqualified Dublin Core
OAI
Roots are in the science community interested in locating and searching multiple repositories of pre- and e-prints of scientific papers
Not really an archive, the way we traditionally think of the word
OAI
Data providers expose (make available) the metadata for their collections
Service providers harvest the exposed metadata and aggregate it (so that one search does it all) and/or provide additional services related to the harvested metadata, such as providing easy access to recent additions, updated materials, pre-set searches, etc.
OAI
OAI Protocol for Metadata Harvesting
– Metadata content must be encoded in XML and have a corresponding XML schema for validation
– Metadata must be supplied in unqualified Dublin Core format, at least
– Other metadata formats are optional
– Metadata may optionally include a link to the actual content / resource
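A harvest is an ordinary HTTP request with a `verb` and a `metadataPrefix`, answered with XML. The sketch below builds such a request URL and reads titles out of a response. The base URL and record identifier are placeholders under `example.org`, the response is a trimmed sample embedded in the code (a real one carries more header information), and nothing is fetched over the network.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

BASE_URL = "http://repository.example.org/oai"  # hypothetical data provider

def list_records_url(base_url, metadata_prefix="oai_dc"):
    """Build a ListRecords request asking for unqualified Dublin Core."""
    query = urlencode({"verb": "ListRecords", "metadataPrefix": metadata_prefix})
    return base_url + "?" + query

# Trimmed sample of what a data provider might return.
SAMPLE_RESPONSE = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:repository.example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Internet Ethics</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>
"""

NS = {"dc": "http://purl.org/dc/elements/1.1/"}

def harvested_titles(response_xml):
    """Collect every dc:title found in a ListRecords response."""
    root = ET.fromstring(response_xml.strip())
    return [title.text for title in root.iterfind(".//dc:title", NS)]

request_url = list_records_url(BASE_URL)
titles = harvested_titles(SAMPLE_RESPONSE)
print(request_url)
print(titles)
```

A service provider would run this against many data providers and index the combined results.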
OAI Infrastructure
[Diagram labels: repository (four shown), DC, Harvester, Service Provider, user, search, Repository]
OAI Harvesters - Examples
Registered OAI Service Providers
– http://www.openarchives.org/service/listproviders.html
OAIster
– http://oaister.umdl.umich.edu/o/oaister/
OAI - Advantages
Data providers – more exposure of, and therefore, ideally, more access to one’s data
Overcome the geographical and domain-specific isolation that can occur
Service providers – more data in one place is of value to users
Service providers may offer additional services beyond increased access: prints, rights negotiation, etc.
Simple Object Access Protocol (SOAP)
A protocol that defines how to request services, objects, and information in a platform-independent manner using HTTP and XML
The main goal of SOAP is to facilitate interoperability between systems that need to interact
– Can run applications as if local user
Used for: Web services & e-commerce
Z39.50
Z39.50 is a search and retrieval protocol, maintained by LC, capable of operating over TCP/IP
Negotiates queries with multiple, separate databases – does not harvest + create new db
Built in to some library software systems
OAI not intended to replace other approaches, but to provide an easy-to-use alternative for different constituencies and purposes
Search/Retrieve Web Service
The primary function of SRW is to allow a user to search remote databases of records
Protocol uses easily available technologies -- XML, SOAP, HTTP, URI -- to perform tasks traditionally done using proprietary solutions such as database queries and responses
Builds on Z39.50 and moves it forward
– ZING: Z39.50 International: Next Generation
Functional Requirements for Bibliographic Records (FRBR)
A study by IFLA (International Federation of Library Associations and Institutions) of the full range of functions performed by the bibliographic record
– What do we use bibliographic records for? Description, access, location, identification, annotations ...
The report provides a framework for the nature of and uses for bibliographic records
A conceptual model that can be used as a means to meet user needs and expectations
Functional Requirements for Bibliographic Records (FRBR)
Tasks we use bibliographic records for:
– Finding
– Identifying
– Selecting
– Obtaining access to resources
FRBR should allow systems to handle bibliographic data in new, useful ways that fulfill these tasks
Functional Requirements for Bibliographic Records (FRBR)
Conceptual model of relationships between bibliographic entities
Hierarchical relationships
– Work: the intellectual product
– Expression: an ‘expression’ of the parent work such as a translation, edition, revisions, annotated text, etc.
– Expressions entail additional intellectual effort
Functional Requirements for Bibliographic Records (FRBR)
Hierarchical relationships
– Manifestation: published runs of each expression in multiple formats over time; the level at which we traditionally create a catalog record
– Item: each copy of a specific manifestation; circulation records track items
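The four levels nest naturally as a data structure. A minimal sketch, using the two Madame Bovary editions that appear later under ISTC; the class and field names, the expression descriptions, and the barcode are illustrative assumptions, not part of the FRBR report.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Item:
    barcode: str  # one physical copy; what circulation tracks

@dataclass
class Manifestation:
    publisher: str
    year: int
    isbn: str  # the level a traditional catalog record describes
    items: List[Item] = field(default_factory=list)

@dataclass
class Expression:
    description: str  # e.g. a translation or revised edition
    manifestations: List[Manifestation] = field(default_factory=list)

@dataclass
class Work:
    title: str  # the intellectual product
    expressions: List[Expression] = field(default_factory=list)

    def isbns(self):
        """Every ISBN under this work, across all expressions."""
        return [m.isbn for e in self.expressions for m in e.manifestations]

bovary = Work("Madame Bovary", [
    Expression("French text", [
        Manifestation("Gallimard", 2001, "207041311X"),
    ]),
    Expression("English translation", [
        Manifestation("Penguin", 2001, "0140448187",
                      items=[Item("copy-1")]),  # hypothetical barcode
    ]),
])

print(bovary.isbns())
```

A catalog built this way can show one entry for the work and let the user drill down to language, edition, and copy.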
Functional Requirements for Bibliographic Records (FRBR)
OCLC is researching the application of FRBR to WorldCat
– “FRBRization”
They have created an algorithm that groups records automatically based on the Work/Expression/Manifestation/Item model
http://www.oclc.org/research/projects/frbr/algorithm.htm
OCLC & FRBR
OCLC Research has developed an algorithm to build FRBR “work” sets using author/title keys
Fiction Finder Project: Research team mined record content from all records for fiction materials in WorldCat, applied FRBR algorithm to yield
– An enriched record view for every work of fiction represented in WorldCat
– Better search results displays for WorldCat fiction records including links to groups of related WorldCat records by language, format, manifestation/edition, etc.
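The idea of grouping records into work sets by author/title key can be sketched in a few lines. This is a simplified illustration of the general approach only; it is not OCLC's algorithm, and the normalization rule, field names, and record ids are assumptions made for the example.

```python
import re
from collections import defaultdict

def normalize(text):
    """Lowercase and reduce punctuation to single spaces so variants compare equal."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()

def work_key(author, title):
    """Combine normalized author and title into one grouping key."""
    return normalize(author) + "/" + normalize(title)

def group_into_works(records):
    """Map each author/title key to the ids of the records that share it."""
    works = defaultdict(list)
    for record in records:
        works[work_key(record["author"], record["title"])].append(record["id"])
    return dict(works)

records = [  # hypothetical catalog records
    {"id": "rec1", "author": "Flaubert, Gustave", "title": "Madame Bovary"},
    {"id": "rec2", "author": "FLAUBERT, Gustave.", "title": "Madame Bovary."},
    {"id": "rec3", "author": "Berners-Lee, Tim", "title": "Weaving the Web"},
]

work_sets = group_into_works(records)
print(work_sets)
```

Translated titles and variant name forms would not match under a rule this simple, which is why authority control matters for this kind of clustering.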
xISBN
A web service that takes as input an ISBN and returns a list of other ISBNs of associated intellectual works
Developed by OCLC’s Office of Research
Results intended for use by computer systems to generate new searches such as in OPAC
RLG’s RedLightGreen
Search interface for the RLG union catalog of 126 million bibliographic records representing 42 million titles
FRBR-esque implementation
– Uses FRBR concepts such as Work, Expression and Manifestation for record clusters
Designed for the web-savvy undergraduate
Offers filtering and grouping of search results
– http://www.redlightgreen.com
Identifiers
Four potential purposes
– Locator: where is the document I seek?
– Identifier: unique label for a resource
– Gatherers: groups like resources, similar to a uniform title
– Differentiator: helps identify different versions of same resource
Identifiers
Uniform Resource Identifiers (URI)
– Generic set of all names/addresses that refer to resources on the Web including:
Uniform Resource Locator (URL)
Persistent Uniform Resource Locator (PURL)
Uniform Resource Name (URN)
OpenURL
DOI
ISTC
Uniform Resource Locator (URL)
Web address or location at which a resource is held, not an identifier for the resource itself
Most common way to locate documents / items on the Web (http, ftp, mailto, etc.)
Not particularly stable or permanent
– Error 404: File not Found
No metadata, but important starting point as we look at some of the related technologies
Persistent Uniform Resource Locator (PURL)
PURL Service is managed by OCLC
Functionally, a PURL is a URL
The PURL remains constant even if the URL changes - its function is to automatically re-direct a user to the current URL
PURL system/resolver is updated by resource manager to reflect any changes to location of the file, or URL
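The mechanism is one level of indirection: a table from the persistent address to the current one, which the resource manager updates when a file moves. A toy sketch of that table; the class, method names, and `example.org` addresses are all made up, and a real resolver answers over HTTP with a redirect.

```python
class PurlResolver:
    """Toy lookup table from persistent URLs to current locations."""

    def __init__(self):
        self._targets = {}

    def register(self, purl, url):
        """Create or update the target of a PURL (done by the resource manager)."""
        self._targets[purl] = url

    def resolve(self, purl):
        """Return the current location a user should be redirected to."""
        return self._targets[purl]

resolver = PurlResolver()
purl = "http://purl.example.org/net/annual-report"  # hypothetical PURL

resolver.register(purl, "http://www.example.org/reports/2004/annual.html")
first_target = resolver.resolve(purl)

# The file moves; only the resolver entry changes, not the published PURL.
resolver.register(purl, "http://archive.example.org/annual-2004.html")
second_target = resolver.resolve(purl)

print(first_target)
print(second_target)
```

Catalog records and documents that cite the PURL need no edits after the move.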
PURLs
PURLs can be used both in documents and in cataloging systems
PURLs increase the probability of correct resolution and long-term access to resources
Use of PURLs can reduce the burden and expense of catalog maintenance (and business card printing)
PURL - Example
US Government is a big user of PURLs
– http://www.ccny.cuny.edu/library/Divisions/Government/iraqbib.html
OpenURL
OpenURL = context-sensitive linking
OpenURL is a method of transporting metadata and identifiers within URLs to allow for the delivery of context-sensitive services
For example, a URL can carry with it information such as author / title from a previous search to allow a system to re-execute a search in a second database without re-entry of the data by the user
OpenURL Metadata
OpenURL Example
OpenURL incorporates data from a citation search
Embeds metadata such as ISSN, date, volume number, pages, etc. in an OpenURL
A valid OpenURL incorporating the metadata: http://sfx.library.yale.edu/sfx_local?sid=Entrez:PubMed&id=pmid:16135848
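An OpenURL is a resolver address followed by key/value pairs, so it can be taken apart and assembled with ordinary URL tools. The sketch below parses the example above and then builds a second OpenURL that carries citation metadata instead of an identifier. The key names `genre`, `issn`, `date`, `volume`, and `spage` follow common OpenURL usage as an assumption, and the citation values are placeholders.

```python
from urllib.parse import urlsplit, parse_qs, urlencode

SOURCE = "http://sfx.library.yale.edu/sfx_local?sid=Entrez:PubMed&id=pmid:16135848"

# Split the example into the resolver address and the metadata it carries.
parts = urlsplit(SOURCE)
resolver = f"{parts.scheme}://{parts.netloc}{parts.path}"
params = parse_qs(parts.query)

def build_openurl(resolver, **metadata):
    """Append citation metadata to a resolver address as query parameters."""
    return resolver + "?" + urlencode(metadata)

# Placeholder citation values, for illustration only.
citation_url = build_openurl(
    resolver, genre="article", issn="1234-5678", date="2005", volume="12", spage="34"
)

print(resolver)
print(params)
print(citation_url)
```

The resolver decides what to offer (full text, holdings, interlibrary loan) based on the metadata and on who is asking, which is what makes the link context-sensitive.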
Uniform Resource Name (URN)
Uniform Resource Names (URNs) are intended to serve as persistent, location-independent resource identifiers
Globally unique
Never change
Format
– urn:<namespace identifier>:<namespace specific string>
Use a resolver system to indicate current location of resource
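The three-part format is easy to take apart in code. A minimal sketch; the pattern is a simplified reading of the URN syntax rather than a complete validator, and the sample URN wraps an ISBN used elsewhere in these slides in the `isbn` namespace.

```python
import re

# urn:<namespace identifier>:<namespace specific string>
URN_PATTERN = re.compile(r"^urn:([a-z0-9][a-z0-9-]{0,31}):(.+)$", re.IGNORECASE)

def parse_urn(urn):
    """Split a URN into (namespace identifier, namespace specific string)."""
    match = URN_PATTERN.match(urn)
    if match is None:
        raise ValueError(f"not a URN: {urn!r}")
    # The namespace identifier is case-insensitive, so normalize it.
    return match.group(1).lower(), match.group(2)

namespace, specific = parse_urn("urn:isbn:0140448187")
print(namespace, specific)
```

A resolver would use the namespace identifier to decide which lookup service to ask about the specific string.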
Digital Object Identifier (DOI)
Overseen by the International DOI Foundation
DOIs are persistent, location-independent identifiers of resources
Developed to enable management of copyrightable materials in an electronic environment (locate, buy, sell, track, license)
Specific type / implementation of a URN
DOI
A two-part number with a prefix identifying the original publisher and a suffix identifying the specific work
– Similar to the ISBN
A DOI resolution request for a specific resource would return one or more URLs - *locations* where a user could obtain access to the resource
– Appropriate copy: online, text, free, illustrated, etc.
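The two-part structure can be shown by splitting a DOI at its first slash. A minimal sketch; `10.1000/182` is used purely as a sample value, and the checks are a simplification rather than full DOI syntax validation.

```python
def split_doi(doi):
    """Split a DOI into (prefix, suffix) at the first slash."""
    prefix, separator, suffix = doi.partition("/")
    if not separator or not prefix.startswith("10.") or not suffix:
        raise ValueError(f"not a DOI: {doi!r}")
    return prefix, suffix

prefix, suffix = split_doi("10.1000/182")
print(prefix)  # identifies the registrant
print(suffix)  # identifies the specific item
```

Only the first slash separates the two parts, so a suffix may itself contain slashes.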
DOI
Applications of the DOI will require metadata
The basis of the DOI metadata scheme is a minimal "kernel" of elements
DOI minimal kernel elements of metadata:
– DOI, DOI genre, identifier, title, type, origination, primary agent, agent role, and administrative data such as registrant, and date of registration
International Standard Text Code (ISTC)
Type of URN
Persistent and unique identifiers for textual works – abstract, conceptual entities rather than specific bibliographic manifestations
International Standard Codes are also being developed for Audiovisual Works (ISAN) and Musical Works (ISWC)
Emerging ISO standard
ISTC
ISTC Registration Authority will be managed by a consortium comprised of CISAC, Nielsen BookData, and R.R. Bowker Inc.
ISTCs will be assigned by the Registration Authority and Regional Agencies
ISTCs can and will be assigned to works retrospectively
Each registered work must include basic metadata such as author, title, subject (ONIX)
ISTC
Similar to ISBN, but focused on the work versus the manifestation
– Madame Bovary, Chez Gallimard, 2001: 207041311X
– Madame Bovary, Penguin, 2001: 0140448187
– Two ISBNs, one single ISTC for the work, Madame Bovary
ISTC
The ISTC will allow computer systems to bring together all manifestations of an intellectual work
What’s the point?
– As multiple versions of books, documents, articles proliferate, systems need a way to control presentation and access to users who generally don’t care about the difference between the Penguin 2001 edition and the Signet Classic 2001 edition
Semantic Web
The mother of all metadata projects, under development by the W3C
An extension of the current Web in which information is given well-defined meaning, understandable to people and computers
This in turn, provides better integration of existing information on the Web
Key components: URIs, XML, RDF
Summary
Planning and goal setting are two important factors for successful metadata implementation
Stick with open standards (non-proprietary), where possible
Keep an eye on XML, DC, OAI, METS - but don’t quote me
Questions?
Amy Benson
Program Director
NELINET Digital Services
NELINET, Inc.
508.597.1937
800.635.4638 x1937