ABS Governance Dialogues: The Role of Documentation in ABS and TK Governance
Lima, Peru, 21 January 2007
An Overview of Persistent Identifiers
George M. Garrity
Microbiology and Molecular Genetics
Michigan State University
The phone call from Peru…
My assignment
Provide the TEG with an overview of persistent identifiers and digital objects
Explore both the technical and social/policy issues
Provide some perspective on how persistent identifiers have been applied in two settings
Mature application - CrossRef
Evolving application - NamesforLife
Offer some thoughts on how PIDs might be applied to Certificates of Origin and Traditional Knowledge
Disclaimers
An end-user of persistent identifiers
Dual interests and IP in this space
So, what’s the problem?
“…link heterogeneous electronic libraries. The difficulties inherent in this third objective ultimately led to this paper.”
Kahn and Wilensky 1993
“But for the bioinformatician concerned with integrating and computing upon distributed information… In second place is perhaps naming (identifying), with all the gloriously idiosyncratic embedded semantics of local identifiers in disparate forms.”
Clark 2003
So, what’s the problem?
“Even well-formed and properly applied names can serve as a source of confusion and considerable frustration. This is hardly a new problem.”
Garrity and Lyons 2003
“Although used every day, identifiers are a mystery to many people, including people responsible for building complex information systems.”
Report of the NISO Identifiers Roundtable 2006
“And now, a much more succinct way to say this: our systems are autistic. They don’t make inferences. When we learn something in one system or one area, it doesn’t carry over to other areas.”
McComb 2006
Let’s start with some working definitions
Digital objects
An instance of an abstract data type that has two components: data and key metadata
Key metadata includes a handle
A handle is a globally unique identifier that is bound to the digital object
Other key properties: digital objects
differ from database records and files,
are stored in network-accessible repositories,
and are accessed using a repository access protocol.
From: Kahn and Wilensky 2006, Int. J. Digit. Libr. 6: 115-123
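The definition above can be sketched as a Python data type. It has two components, data and key metadata, and the key metadata carries the handle. This is a minimal illustration only. The class and field names are hypothetical, and repositories and the repository access protocol are not modeled. Both classes are frozen so that the handle stays bound to the object once it is created.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KeyMetadata:
    """Key metadata; per the definition, it must include a handle."""
    handle: str  # globally unique identifier bound to the digital object


@dataclass(frozen=True)
class DigitalObject:
    """Abstract data type with two components: data and key metadata."""
    data: bytes
    key_metadata: KeyMetadata

    @property
    def handle(self) -> str:
        """Convenience accessor for the handle stored in the key metadata."""
        return self.key_metadata.handle


# Hypothetical example object; the handle string is the sample DOI used later in the talk.
obj = DigitalObject(data=b"...", key_metadata=KeyMetadata(handle="10.1234/myownnumbers-123.00001"))
print(obj.handle)
```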
Identifiers
Essential elements in human-machine and machine-machine communications
Ideally…
Exist as an unambiguous string
Context- and application-dependent
Actionable
Resolvable
Other points to consider
Semantically opaque
Global or local
Unique or non-unique
Unanticipated uses
Definitions (continued)
Persistent identifiers
A name or an identifier for a resource that uniquely identifies that resource and will be forever associated with that resource. It will never be reassigned to any other resource and will not change regardless of where the resource is located or whatever protocol is used to access it.
Use of a well-managed persistent identifier rather than a location will ensure that when a document is moved, or its ownership changes, the links to it will remain actionable.
From: Diana Dack, "Persistence is a Virtue", Information Online Conference, Sydney, January 2001
Definitions (continued)
Name resolution: the process of mapping a persistent identifier to a URL that retrieves a resource. The URL locates the named resource identified by the persistent identifier (the name).
[Figure: name resolution. PID1, PID2, and PID3 are mapped to URL1, URL2, and URL3; the PID identifies the resource and the URL locates it.]
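Name resolution can be pictured as a lookup table from names to locations. The minimal Python sketch below uses hypothetical class, method, and host names and is not any real resolver's API. It shows the property quoted earlier: a citing document stores only the PID, so when the resource moves, only the table entry changes and the stored PID still resolves.

```python
class NameResolver:
    """Minimal name-resolution table: persistent identifier (name) -> URL (location)."""

    def __init__(self) -> None:
        self._locations: dict[str, str] = {}

    def bind(self, pid: str, url: str) -> None:
        """Record, or update, the URL at which the resource named by `pid` lives."""
        self._locations[pid] = url

    def resolve(self, pid: str) -> str:
        """Return the current URL for `pid`; raises KeyError for an unknown name."""
        return self._locations[pid]


resolver = NameResolver()
resolver.bind("PID1", "http://old-host.example/report")
cited = "PID1"  # what a citing document stores: the name, not the URL
resolver.bind("PID1", "http://new-host.example/report")  # the resource moves
print(resolver.resolve(cited))  # http://new-host.example/report
```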
Inherent in the design of such systems….
[Figure, two panels: (1) name registration and name resolution, in which an authority registers PIDs and their key metadata in a global registry that users query to map each PID to the URL of a resource and its metadata; (2) the same arrangement for DOIs, in which an assigner registers DOIs in the DOI directory, which resolves them to the URLs of the content.]
Source: Norman Paskin, International DOI Foundation
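The registration half of the figure can be sketched as a toy global registry. Each authority (assigner) owns a prefix, and only that authority may register names under it. A registered PID is never reassigned to a different resource. This is an illustrative Python sketch under those assumptions. The class, method, and authority names are hypothetical, and it is not how the DOI directory is actually implemented.

```python
class Registry:
    """Toy global registry: authorities own prefixes; registered PIDs are never reassigned."""

    def __init__(self) -> None:
        self._prefix_owner: dict[str, str] = {}  # prefix -> authority (assigner)
        self._records: dict[str, dict] = {}      # PID -> {"url": ..., "metadata": ...}

    def assign_prefix(self, prefix: str, authority: str) -> None:
        """Delegate a prefix to exactly one authority."""
        if prefix in self._prefix_owner:
            raise ValueError(f"prefix {prefix!r} is already assigned")
        self._prefix_owner[prefix] = authority

    def register(self, authority: str, pid: str, url: str, metadata: dict) -> None:
        """Bind a new PID (prefix/suffix) to a URL and its key metadata."""
        prefix, sep, suffix = pid.partition("/")
        if not sep or not suffix:
            raise ValueError(f"malformed PID {pid!r}")
        if self._prefix_owner.get(prefix) != authority:
            raise PermissionError(f"{authority!r} does not own prefix {prefix!r}")
        if pid in self._records:
            raise ValueError(f"{pid!r} is already registered; PIDs are never reassigned")
        self._records[pid] = {"url": url, "metadata": dict(metadata)}

    def resolve(self, pid: str) -> str:
        """Map a registered PID to its current URL."""
        return self._records[pid]["url"]


reg = Registry()
reg.assign_prefix("10.1234", "assigner-a")
reg.register("assigner-a", "10.1234/obj-1", "http://example.org/obj-1", {"title": "Example"})
print(reg.resolve("10.1234/obj-1"))
```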
Comparing identifiers
Enumeration
A single unambiguous string
A numbering scheme
A label that identifies an entity, e.g. ISBN 0-387-98771-1, ATCC 27126*, L-681,572-001
A method of providing consistent syntax to denote class membership of an entity
A formal standard or industry convention
An arbitrary internal system
Key point is establishing a 1:1 correspondence between labels and members
The numbers or labels are simply strings
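The ISBN on this slide comes from a formal numbering standard, so it carries a built-in rule. Its last character is a check digit chosen so that the digits, weighted 10 down to 1, sum to a multiple of 11. The Python sketch below (function name hypothetical) checks that rule. Note that the check only tests the syntax of the string. It says nothing about which entity the label identifies.

```python
def is_valid_isbn10(isbn: str) -> bool:
    """Return True if `isbn` has 10 characters (ignoring hyphens/spaces) and a valid check digit."""
    chars = [c for c in isbn if c not in "- "]
    if len(chars) != 10:
        return False
    total = 0
    # Weights run 10, 9, ..., 1 from the first character to the check character.
    for weight, ch in zip(range(10, 0, -1), chars):
        if ch in "0123456789":
            value = int(ch)
        elif ch in "Xx" and weight == 1:  # 'X' means 10, allowed only as the check character
            value = 10
        else:
            return False
        total += weight * value
    # A well-formed ISBN-10 makes the weighted sum divisible by 11.
    return total % 11 == 0


print(is_valid_isbn10("0-387-98771-1"))  # True
```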
Comparing identifiers (cont.)
An infrastructure specification
A syntax by which an identifier can be expressed in a form suitable for use within a specific infrastructure
Does not mandate a method of creating labels
Does not create a managed environment
Examples
Actionable identifiers
URI (URN and URL)
ISBN numbers as UPC/EAN identifiers
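"ISBN numbers as UPC/EAN identifiers" is a good example of an infrastructure specification. The same ISBN is re-expressed so that retail barcode systems can carry it. The Python sketch below (function name hypothetical) adds the 978 book prefix and drops the old mod-11 check digit. It then computes a new EAN-13 check digit, weighting the digits 1 and 3 alternately. It does not validate the incoming ISBN-10 check digit. No new label is created: the same identifier is simply restated in the syntax the infrastructure expects.

```python
def isbn10_to_ean13(isbn10: str) -> str:
    """Re-express an ISBN-10 as a 13-digit EAN ("Bookland") number."""
    digits = isbn10.replace("-", "").replace(" ", "")
    if len(digits) != 10 or any(c not in "0123456789" for c in digits[:9]):
        raise ValueError(f"not an ISBN-10: {isbn10!r}")
    # Prepend the 978 book prefix and drop the old mod-11 check digit.
    body = "978" + digits[:9]
    # EAN-13 check digit: weights alternate 1, 3, 1, 3, ... from the left.
    total = sum(int(d) * (3 if i % 2 else 1) for i, d in enumerate(body))
    return body + str((10 - total % 10) % 10)


print(isbn10_to_ean13("0-387-98771-1"))  # 9780387987712
```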
Comparing identifiers (cont.)
A fully implemented identifier system
Includes
Unique identifiers
A formalized infrastructure
Management policies for registration, structured interoperable metadata, policy, and governance mechanisms
Examples
UPC/EAN barcodes and RFID tags
Digital object identifiers (digital identifiers of objects)
Desired properties of a candidate PID
Semantically opaque - avoid the pitfalls of embedded meaning
Governance - is there a technical and social framework overseeing the development, implementation and “marketing” of the PID?
Persistence - is there a mechanism in place to guarantee persistence of issued PIDs, when so desired?
Registration - is there a mechanism for global registration of the PIDs or can anyone issue PIDs?
Metadata - is there a minimal requirement for metadata associated with each identified object?
Accepted standard - is there evidence that the PID is an accepted standard?
Globally unique - are the PIDs globally unique?
Widespread usage - how many PIDs have been issued and what is the rate of issuance of new PIDs?
Desired properties of a candidate PID (cont)
Object/location - what does the PID identify?
Actionable - are network services attached/embedded?
Unique - does the resolution service check for uniqueness at the local level?
Interoperability - can the identifiers be readily incorporated into other applications without modification or permission?
Granularity - can the identifiers be assigned to subcomponents (nesting of entities within entities)?
Business model - is there a compelling business need for the PIDs to ensure that the infrastructure can be maintained in a self-supporting manner?
Comparison of identifier properties
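Identifier | Opaque | Governance | Persistent | Registration | Metadata | Accepted standard | Global | Widespread use | Object | Actionable | Unique | Interoperable
Accession numbers | - | - | V | - | V | - | - | + | + | - | - | -
LSID | - | - | ? | - | V | ? | V | ? | - | + | + | ?
Gene names | V | - | - | - | - | + | - | + | + | - | - | -
PURL | - | - | - | - | + | ? | - | - | + | + | + | +
Taxid | + | - | - | - | + | - | - | ? | + | V | + | ?
DNS | - | + | - | + | - | + | + | + | - | + | + | +
Taxonomic names | - | + | + | V | - | + | + | + | + | - | - | -
Handle | + | - | + | + | + | - | + | ? | + | + | + | +
DOI | + | + | + | + | + | + | + | + | + | + | + | +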
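One way to use the comparison when screening a candidate PID is to encode it and ask which properties each scheme lacks. The ratings below are transcribed from the comparison of identifier properties. The slide does not explain "V" or "?", so this sketch simply counts them, like "-", as not satisfied. The variable and function names are hypothetical. Under that reading, DOI is the only scheme rated "+" on every property.

```python
PROPERTIES = [
    "Opaque", "Governance", "Persistent", "Registration", "Metadata",
    "Accepted standard", "Global", "Widespread use", "Object",
    "Actionable", "Unique", "Interoperable",
]

# One rating per property, in column order, transcribed from the comparison table.
RATINGS = {
    "Accession numbers": "- - V - V - - + + - - -",
    "LSID":              "- - ? - V ? V ? - + + ?",
    "Gene names":        "V - - - - + - + + - - -",
    "PURL":              "- - - - + ? - - + + + +",
    "Taxid":             "+ - - - + - - ? + V + ?",
    "DNS":               "- + - + - + + + - + + +",
    "Taxonomic names":   "- + + V - + + + + - - -",
    "Handle":            "+ - + + + - + ? + + + +",
    "DOI":               "+ + + + + + + + + + + +",
}


def profile(scheme: str) -> dict[str, str]:
    """Return {property: rating} for one identifier scheme."""
    ratings = RATINGS[scheme].split()
    if len(ratings) != len(PROPERTIES):
        raise ValueError(f"{scheme}: expected {len(PROPERTIES)} ratings, got {len(ratings)}")
    return dict(zip(PROPERTIES, ratings))


def missing(scheme: str) -> list[str]:
    """Properties not rated '+' (treating 'V' and '?' as not satisfied)."""
    return [p for p, r in profile(scheme).items() if r != "+"]


for scheme in RATINGS:
    print(f"{scheme:18} lacks {len(missing(scheme))} of {len(PROPERTIES)} properties")
```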
What does a Digital Object Identifier look like?
10.1234/myownnumbers-123.00001
(prefix / suffix / subsuffix)
The prefix is assigned to the content provider by a DOI Registration Agency, or by the Handle System directly.
The suffix is an opaque string supplied by the content provider.
Handle software stores a mapping of the Handle to one or more locations (or services). In virtually all cases today, the Handle is mapped to a location (URL).
http://dx.doi.org/10.1007/bergeysoutline resolves to
http://141.150.157.80/bergeysoutline/main.htm
which used to be:
http://www.springer-ny.com/bergeysoutline
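The anatomy above can be sketched in Python. The DOI is split at the first "/" into prefix and suffix, then appended to an HTTP proxy to get an actionable URL. This is an illustrative sketch, not DOI Foundation code, and the function names are hypothetical. The "10." prefix test is a simplification, and the subsuffix is not parsed because no rule for it is given here. The proxy is a parameter: the slide uses http://dx.doi.org/, while https://doi.org/ is the form generally used today.

```python
from urllib.parse import quote

DOI_PROXY = "http://dx.doi.org/"  # proxy shown on the slide


def split_doi(doi: str) -> tuple[str, str]:
    """Split a DOI at the first '/' into (prefix, suffix)."""
    prefix, sep, suffix = doi.partition("/")
    # Simplified check: DOI prefixes start with "10." and the suffix must be non-empty.
    if not sep or not suffix or not prefix.startswith("10."):
        raise ValueError(f"not a DOI: {doi!r}")
    return prefix, suffix


def doi_to_url(doi: str, proxy: str = DOI_PROXY) -> str:
    """Make a DOI actionable by appending it to an HTTP proxy resolver."""
    prefix, suffix = split_doi(doi)
    # Percent-encode characters that are unsafe in a URL path; '/' is kept as-is.
    return f"{proxy}{prefix}/{quote(suffix, safe='/')}"


print(split_doi("10.1234/myownnumbers-123.00001"))  # ('10.1234', 'myownnumbers-123.00001')
print(doi_to_url("10.1007/bergeysoutline"))         # http://dx.doi.org/10.1007/bergeysoutline
```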
Syntax of some other PIDs in “common” use
Handles
<Handle> ::= <Handle Prefix> "/" <Handle Suffix>
http://hdl.handle.net/10.1099/ijs.0.64483-0
Persistent URLs (PURLs)
<purl> ::= <protocol>/<resolver>/<name>
http://purl.oclc.org/OCLC/OCLC/PURL/FAQ
Life Science Identifiers (LSIDs)
urn:lsid:<AuthorityID>:<Namespace>:<Object>[:<Revision>]
http://lsid.biopathways.org/resolver/data/urn:LSID:ncbi.nlm.nih.gov:GenBank/accession:NT_001063:2
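The LSID syntax can be parsed by splitting on ":". The sketch below assumes, as a simplification, that no component contains a colon. It treats the revision as optional. It is not the API of any LSID resolver library, and `LSID` and `parse_lsid` are hypothetical names. The input is the LSID carried in the resolver URL on this slide, with the URL prefix removed by hand.

```python
from typing import NamedTuple, Optional


class LSID(NamedTuple):
    authority: str
    namespace: str
    object_id: str
    revision: Optional[str]  # the revision component is optional


def parse_lsid(text: str) -> LSID:
    """Split urn:lsid:<authority>:<namespace>:<object>[:<revision>] into its parts."""
    parts = text.split(":")
    if len(parts) not in (5, 6) or parts[0].lower() != "urn" or parts[1].lower() != "lsid":
        raise ValueError(f"not an LSID: {text!r}")
    revision = parts[5] if len(parts) == 6 else None
    return LSID(parts[2], parts[3], parts[4], revision)


print(parse_lsid("urn:LSID:ncbi.nlm.nih.gov:GenBank/accession:NT_001063:2"))
```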
Two implementations using DOIs
CrossRef: an independent membership association, founded and directed by STM publishers. Its mission is to connect users to primary research literature through a DOI Registration Agency (RA) that performs reference cross-linking, subject to publisher access controls.
The largest and most successful implementation of DOI services.
NamesforLife is a proprietary semantic resolution service developed at MSU. It provides a method for persistently linking the occurrence of a biological name or other technical term in third party content to managed information about its origins, formal definition, current usage, and related goods and services.
Rumsfeld’s axiom and knowledge bleed
“…because as we know, there are known knowns; there are things we know we know. We also know there are known unknowns; that is to say we know there are some things we do not know. But there are also unknown unknowns -- the ones we don't know we don't know.”
The knowledge gradient
[Figure: a knowledge gradient spanning unknown unknowns, known unknowns, unknown knowns, and known knowns]
Basic and applied research advances knowledge.
Knowledge bleed results in a loss of knowledge that has already been gained.
Semantic resolution provides a mechanism to combat knowledge bleed.
Ramifications of misunderstanding a name
Names trigger specific responses
But the concepts to which names apply are not static
May not always map 1:1
May require expertise for accurate interpretation
Highly significant
Wrong assumptions, assertions, or hypotheses
Misdiagnosis of infectious diseases
Misapplication of public policies
Significant
Lost opportunities
Failure to reach potential customers interested in marketed content, goods, and services at point of need
The long-tail phenomenon*
Some thoughts on selecting a PID for CO and TK
The intended use of the identifier
Syntactic rules governing the form of the identifier
What the identifier resolves to
The technical infrastructure that is available to support the identifier and the parties operating it
Policies governing creation, maintenance, support, and persistence of the identifier
Information about any metadata related to the identifier that is or must be made available
A history about the identifier, including any changes in any of the above points over time.
Source: Report of the NISO Identifiers Roundtable 2006
Questions?