Metadata and identifiers for e-journals
Copenhagen 13.-14.3.2000
Juha Hakala
Helsinki University Library
Contents
• Introduction
• Traditional cataloguing
• Full-text indexing
• Embedded metadata + Dublin Core
• DIEPER choices
• Identification of e-journals
Introduction
• Metadata = structured description of a resource
• The structure of metadata is defined in a format
– simple formats (AltaVista)
– complex formats (MARC)
– structured formats (Dublin Core)
• Choices have important cost and quality implications (good is not free)
Traditional cataloguing
• Routinely done for journals (ISSN DB)
• Articles are indexed only selectively
– Finnish article index Arto: 1,100 journals; 65,000 articles and 10 man-years of work annually; 40 libraries co-operate in production
• Extending MARC cataloguing to all digitised articles is too expensive
• Are there any selection criteria for “good material”?
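The Arto figures make the cost argument concrete. The sketch below derives Arto's average throughput: 65,000 articles per 10 man-years, or 6,500 articles per man-year. It then estimates the effort for any other article volume, under the hypothetical assumption that the work scales linearly at that rate. The function names are illustrative, and so is the volume used in the usage note, which is not a figure from the source.

```python
# Figures taken from the Arto bullet above.
ARTO_ARTICLES_PER_YEAR = 65_000
ARTO_MAN_YEARS_PER_YEAR = 10


def articles_per_man_year() -> float:
    """Average number of articles indexed per man-year in Arto."""
    return ARTO_ARTICLES_PER_YEAR / ARTO_MAN_YEARS_PER_YEAR


def estimated_man_years(article_count: int) -> float:
    """Effort to catalogue `article_count` articles, ASSUMING (hypothetically)
    that the work scales linearly at Arto's average rate."""
    return article_count / articles_per_man_year()


if __name__ == "__main__":
    print(articles_per_man_year())          # 6500.0
    print(estimated_man_years(1_000_000))   # illustrative volume only
```

For a hypothetical one million digitised articles, this gives roughly 154 man-years of cataloguing effort.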
Full-text indexing
• Will not replace cataloguing...
– in large databases, precision is still poor
• ...but we should follow what is happening
– RDBMSs are becoming document-literate (Oracle Intermedia)
– new search techniques (e.g. fuzzy searching)
– efficient use of language technologies
– knowledge management
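Fuzzy searching can be illustrated with edit distance. The sketch below takes one common, assumed approach: Levenshtein distance computed over a small term list. It does not describe how Oracle Intermedia or any other product implements fuzzy search. `fuzzy_search` and its `max_distance` threshold are hypothetical names.

```python
from typing import List


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn `a` into `b`."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def fuzzy_search(query: str, terms: List[str], max_distance: int = 1) -> List[str]:
    """Return the index terms within `max_distance` edits of `query`."""
    q = query.lower()
    return [t for t in terms if levenshtein(q, t.lower()) <= max_distance]
```

For example, `fuzzy_search("metdata", ["metadata", "meta", "data"])` returns `["metadata"]`, because inserting one letter turns the misspelled query into that index term. A production index would not compare the query against every term one by one; the loop only shows the matching criterion.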
Embedded metadata (1)
• Three issues to solve:
– semantics: which metadata format should my metadata be in?
– syntax: is it possible and feasible to embed metadata into this document (does the document format allow metadata to be included)?
– tools: once issues 1 and 2 have been solved, are there tools for creating, harvesting and indexing my metadata?
Embedded metadata - syntax
• It must be possible to include metadata in uncompromised form and to specify each data element separately
• Most document formats do not allow efficient use of metadata
– “flat files”, image formats, Word 97
• “This is the Dublin Core Identifier element, and it contains an ISBN”
Embedded metadata - syntax (2)
• HTML 4.0
– META tag enables sophisticated metadata
– explicit specification for how to embed Dublin Core-based metadata (RFC 2731)
• XML/RDF
– “Resource Description Framework makes data machine understandable”
– very versatile, but may be tough to implement
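To make the two syntaxes concrete, the sketch below serialises the same hypothetical statements twice. Each statement is an (element, scheme, value) triple, so the element and the kind of identifier it holds are stated separately. The first output is HTML META tags in the style of RFC 2731; the second is minimal RDF/XML using the Dublin Core 1.1 namespace.

The journal title, the `http://example.org/ejournal/` address and the function names are placeholders. The ISSN is the one from the SICI example later in this talk. The RDF version drops the scheme for brevity, since expressing it properly in RDF needs additional structure.

```python
import xml.etree.ElementTree as ET
from html import escape
from typing import List, Optional, Tuple

# Each statement names the element, the scheme of the value (if any) and the value.
Statement = Tuple[str, Optional[str], str]

DC_NS = "http://purl.org/dc/elements/1.1/"
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

ET.register_namespace("dc", DC_NS)
ET.register_namespace("rdf", RDF_NS)


def to_html_meta(statements: List[Statement]) -> str:
    """HTML 4.0 LINK/META tags in the style of RFC 2731."""
    lines = [f'<link rel="schema.DC" href="{DC_NS}">']
    for element, scheme, value in statements:
        scheme_attr = f' scheme="{escape(scheme)}"' if scheme else ""
        lines.append(
            f'<meta name="DC.{escape(element)}"{scheme_attr} content="{escape(value)}">'
        )
    return "\n".join(lines)


def to_rdf_xml(about: str, statements: List[Statement]) -> str:
    """Minimal RDF/XML description; the scheme is omitted for brevity."""
    root = ET.Element(f"{{{RDF_NS}}}RDF")
    desc = ET.SubElement(root, f"{{{RDF_NS}}}Description", {f"{{{RDF_NS}}}about": about})
    for element, _scheme, value in statements:
        # DC 1.1 properties are lower-case in RDF (dc:title, dc:identifier).
        ET.SubElement(desc, f"{{{DC_NS}}}{element.lower()}").text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    record = [
        ("Title", None, "Example e-journal"),
        ("Identifier", "ISSN", "0002-8231"),
    ]
    print(to_html_meta(record))
    print(to_rdf_xml("http://example.org/ejournal/", record))
```

The HTML form is easy to produce with existing tools. The RDF form is more verbose but can carry richer structure, which matches the slide's trade-off between versatility and ease of implementation.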
Embedded metadata - semantics
• Metadata formats tend to be domain-specific, complex and hard to learn
• Dublin Core as an alternative:
– simple (in its basic form)
– generic (no domain dependency)
– extensible (local elements possible)
• Is there any competition left?
Status of Dublin Core Initiative
• maintenance in reliable hands
• 15 elements stable (DC 1.1)
• syntax for HTML 4.0 stable
• core qualifiers under development
– proposals published in December 1999
– agreement in DC-AC in March 2000
– will result in 50-60 qualifiers
Tools for Dublin Core
• Metadata support in Web indexes becoming more popular
• Metadata creation emerging in document management systems
• Text editors: XML support in place, RDF yet to come
DIEPER choices
• Document format will be XML/RDF
– extensible and open document format that will become very popular in the future
• Metadata format will be based on DC
– DC tags: Identifier, Title, Creator, Contributor, Publisher, Language, Subject
– Local tags: e.g. SerialsNumbering, PlaceOfPublication, SizeSourcePrint
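A DIEPER record thus mixes standard and local elements. A small check like the sketch below could separate them, using the 15 Dublin Core 1.1 element names as the reference list. The tag list is the one on this slide; because the local tags are given only as examples, it is not complete. The helper names are hypothetical.

```python
from typing import Iterable, List, Tuple

# The 15 elements of Dublin Core 1.1.
DC11_ELEMENTS = frozenset({
    "Title", "Creator", "Subject", "Description", "Publisher",
    "Contributor", "Date", "Type", "Format", "Identifier",
    "Source", "Language", "Relation", "Coverage", "Rights",
})

# Tags named on the DIEPER slide (the local tags are examples only).
DIEPER_TAGS = [
    "Identifier", "Title", "Creator", "Contributor", "Publisher",
    "Language", "Subject",
    "SerialsNumbering", "PlaceOfPublication", "SizeSourcePrint",
]


def split_elements(tags: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Split tag names into (Dublin Core 1.1 elements, local extensions),
    preserving the input order."""
    core: List[str] = []
    local: List[str] = []
    for tag in tags:
        (core if tag in DC11_ELEMENTS else local).append(tag)
    return core, local


if __name__ == "__main__":
    dc_tags, local_tags = split_elements(DIEPER_TAGS)
    print("DC 1.1:", dc_tags)
    print("local: ", local_tags)
```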
Identifiers for e-journals
• Two different issues:
– how to identify the journals themselves
– how to identify articles and possibly sections of articles (table of contents etc.)
• Do we need a resolution mechanism (based on DOI or URN)?
E-journals
• ISSN must be used, also for digitised journals
– the digitised version may have the same ISSN as the original paper version
• ISSN should not be embedded in issues or articles, since this would increase recall too much: a search for the ISSN would retrieve every issue and article of the journal
• Broadened scope: serials + integrating resources
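Since the ISSN is the journal-level identifier here, software handling these records will probably want to validate it. The ISSN check character is computed modulo 11 over the first seven digits, using weights 8 down to 2; a result of 10 is written as “X”. The sketch below implements that rule; the function names are illustrative.

```python
def issn_check_char(first_seven: str) -> str:
    """Check character for the first seven ISSN digits (mod 11, weights 8..2)."""
    if len(first_seven) != 7 or not (first_seven.isascii() and first_seven.isdigit()):
        raise ValueError(f"expected seven ASCII digits, got {first_seven!r}")
    total = sum(int(d) * w for d, w in zip(first_seven, range(8, 1, -1)))
    check = (11 - total % 11) % 11
    return "X" if check == 10 else str(check)


def is_valid_issn(issn: str) -> bool:
    """True if `issn` (with or without the hyphen) has a correct check character."""
    compact = issn.replace("-", "").upper()
    if len(compact) != 8 or not (compact[:7].isascii() and compact[:7].isdigit()):
        return False
    return issn_check_char(compact[:7]) == compact[7]
```

Take 0002-8231, the ISSN in the SICI example on a later slide:

- weighted sum: 0·8 + 0·7 + 0·6 + 2·5 + 8·4 + 2·3 + 3·2 = 54
- 54 mod 11 = 10
- 11 - 10 = 1, which matches the final digit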
Issues & articles
• SICI (Serial Item and Contribution Identifier) should be used
• ANSI/NISO standard Z39.56 (1996)
– http://sunsite.berkeley.edu/SICI/
• Not widely supported yet; e-commerce is likely to change this
– need to identify whatever can be sold
• SICI generator available
Properties of SICI
• Extensible: can identify issue/article/section within article
• Can be created automatically (from structured source document)
• Complex, e.g.
– 0002-8231(1929)30:1<ZBDMSU>2.0.CO;2-Z
• Can be used as URN or DOI
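Because a SICI is built from well-defined segments, it can be parsed mechanically. The regular expression below is a simplified sketch based on our reading of Z39.56. It expects, in order:

- the ISSN
- the chronology in parentheses
- the enumeration
- the contribution segment in angle brackets
- a control segment ending in a version number and a check character

It accepts the example on this slide but does not cover every variant the standard allows, and it does not verify the check character. The group names are our own, including the assumed csi/dpi/mfi labels for the three parts of the control segment.

```python
import re
from typing import Dict

# Simplified SICI grammar (a sketch, not the full Z39.56 syntax).
SICI_PATTERN = re.compile(
    r"(?P<issn>\d{4}-\d{3}[\dX])"                   # ISSN of the serial
    r"\((?P<chronology>[^)]*)\)"                    # chronology, e.g. year
    r"(?P<enumeration>[^<]*)"                       # enumeration, e.g. volume:issue
    r"<(?P<contribution>[^>]*)>"                    # contribution segment
    r"(?P<csi>\d)\.(?P<dpi>\d)\.(?P<mfi>[A-Z]{2})"  # control segment
    r";(?P<version>\d+)-(?P<check>[0-9A-Z#])"       # version and check character
)


def parse_sici(sici: str) -> Dict[str, str]:
    """Split a SICI into its named parts; raise ValueError if it does not match."""
    match = SICI_PATTERN.fullmatch(sici)
    if match is None:
        raise ValueError(f"not a SICI this sketch understands: {sici!r}")
    return match.groupdict()
```

Applied to the slide's example, this yields:

- issn "0002-8231", chronology "1929", enumeration "30:1"
- contribution "ZBDMSU"
- control segment "2", "0" and "CO"
- version "2", check character "Z"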
URN & DOI
• Umbrella systems that provide e.g. persistent linkage between a reference and the resource via a resolution service
• DOI is a publisher-driven initiative, URN comes from the Internet community
• DOIs can be used as URNs, not vice versa
Digital object identifier
• Consists of a prefix and a suffix, separated by a slash
– e.g. 10.1045/february2000-risher
• Suffix may be anything; it gives no hint of its content
• Prefix identifies the publisher + indicates where to find a resolution service
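Because the suffix is opaque, the only part of a DOI that software can interpret is the prefix. The sketch below splits a DOI using only the structure described on this slide, plus the fact that DOI prefixes begin with “10.”. It splits at the first slash, so any further slashes stay inside the suffix. `split_doi` is a hypothetical helper.

```python
from typing import Tuple


def split_doi(doi: str) -> Tuple[str, str]:
    """Return (prefix, suffix); only the first slash separates them."""
    prefix, sep, suffix = doi.partition("/")
    if not sep or not prefix.startswith("10.") or not suffix:
        raise ValueError(f"not a DOI: {doi!r}")
    return prefix, suffix
```

`split_doi("10.1045/february2000-risher")` returns `("10.1045", "february2000-risher")`. The prefix 10.1045 is the part that identifies the publisher and indicates where to find a resolution service.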
Uniform resource name
• Consists of three parts:
– the string “urn:”
– Namespace identifier (NID)
– Namespace specific string (NSS)
• When NID is known, creating URNs from existing identifiers is trivially easy
• No hint of where to find a resolution service
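Once the NID is known, creating a URN from an existing identifier is mostly string assembly. The sketch below assumes the RFC 2141 rules for the NID: a letter or digit followed by up to 31 letters, digits or hyphens, with “urn” itself reserved. It does not check which characters are allowed in the NSS. The `issn` namespace is used only as an illustration, and the helper names are hypothetical.

```python
import re
from typing import Tuple

# NID syntax per RFC 2141: one letter/digit, then up to 31 letters/digits/hyphens.
NID_RE = re.compile(r"[A-Za-z0-9][A-Za-z0-9-]{0,31}")


def make_urn(nid: str, nss: str) -> str:
    """Build 'urn:<NID>:<NSS>' from an existing identifier (NSS not validated)."""
    if not NID_RE.fullmatch(nid) or nid.lower() == "urn":
        raise ValueError(f"invalid namespace identifier: {nid!r}")
    return f"urn:{nid}:{nss}"


def split_urn(urn: str) -> Tuple[str, str]:
    """Return (NID, NSS); the 'urn:' prefix and the NID are case-insensitive."""
    parts = urn.split(":", 2)
    if len(parts) != 3 or parts[0].lower() != "urn":
        raise ValueError(f"not a URN: {urn!r}")
    return parts[1].lower(), parts[2]
```

Nothing in the resulting string, such as `urn:issn:0002-8231`, says where a resolver can be found; that information has to come from elsewhere.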
Business models
• DOI: annual payment for each DOI assigned
– no decision yet on the size of the payment
– flat fee for publisher ID
• URN: no price at all
– but someone has to pay for the resolution services
DIEPER policy
• URNs will be used, in order to enable URN-based resolution services
• ISSN/SICI will be used
• ISSN International Centre will assist in the creation of URN resolution services
– the ISSN database will be contacted first, in order to get the address of the resolution service
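The two-step lookup in this policy can be sketched as follows. Everything concrete here is hypothetical:

- the in-memory dictionary stands in for the ISSN database
- the resolver address is a placeholder under example.org
- the `?urn=` query format is invented for illustration

The sketch also assumes that the namespace-specific string begins with the ISSN, as an ISSN or a SICI does.

```python
from typing import Dict
from urllib.parse import quote

# HYPOTHETICAL stand-in for the ISSN database: ISSN -> resolver base address.
ISSN_REGISTRY: Dict[str, str] = {
    "0002-8231": "http://resolver.example.org/resolve?urn=",
}


def resolution_url(urn: str, registry: Dict[str, str]) -> str:
    """Step 1: ask the (hypothetical) ISSN registry for the resolver address.
    Step 2: build the request URL for that resolver."""
    parts = urn.split(":", 2)
    if len(parts) != 3 or parts[0].lower() != "urn":
        raise ValueError(f"not a URN: {urn!r}")
    issn = parts[2][:9]  # assumes the NSS starts with an ISSN (nnnn-nnnc)
    base = registry.get(issn)
    if base is None:
        raise LookupError(f"no resolution service registered for ISSN {issn}")
    return base + quote(urn, safe="")
```

For `urn:issn:0002-8231` this returns `http://resolver.example.org/resolve?urn=urn%3Aissn%3A0002-8231`. One consequence of making the ISSN database the first stop is that a journal's resolver can move without changing any identifiers: only the registry entry has to be updated.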