Norman Paskin, International DOI Foundation
Digital Object Identifiers for Science Data
Digital Object Identifier = DOI
• A name (not a location) for an entity on digital networks
• A system for persistent and actionable identification and interoperable exchange of managed information on digital networks
– Standards-based components (detail in a moment)
– Now to become an International Standard (in ISO TC46)
• Developed as a cross-industry, cross-sector, not-for-profit effort managed by an open-membership collaborative development body
– International DOI Foundation (IDF)
• In widespread use now:
– Over 15 million assigned, over 1000 naming authorities (users)
– Key feature of scientific primary publishing as part of the CrossRef system
– Adopted for government documents (EC, OECD, UK, etc.)
• In use, it is a mechanism “behind the scenes”
– e.g. looks like a URL in a web context
• Offers an interoperable common system for identification of science data; two projects considered as examples:
– TIB project (citation of primary data sets)
– Names for Life (biological taxonomy)
Identifiers
• The word “identifier” can mean several different things, e.g.:
– Labels: output of numbering schemes, e.g. “ISBN 3-540-40465-1”
– Specifications for using labels: e.g. on the internet URL, URN, URI (URI = Uniform Resource Identifier)
– Implemented systems: labels, following a specification, in a system, e.g. the DOI system. A packaged system offering label + tools + implementation mechanisms
• Requirements: reliability, automated global access, and interoperability
– Interoperability = the possibility of use in services outside the direct control of the issuing assigner
• Persistence implies interoperability (with the future)
• Interoperability implies extensibility (future uses are not known)
• Hence DOI is a generic framework applicable to any digital object
– A digital object can be a representation of any entity
DOI is the combination of four components: numbering scheme, Internet resolution, data model, and policies
DOI syntax can include any existing identifier “label”, formal or informal, of any entity
• An identifier “container”, e.g.
– 10.1234/NP5678
– 10.5678/ISBN-0-7645-4889-4
– 10.2224/2004-10-ISO-DOI
• NISO Z39.84, DOI Syntax
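The “container” idea can be made concrete with a small parser. The sketch below assumes only the general shape of a DOI name: a prefix beginning with `10.`, a `/`, and a suffix that may hold any existing label. The function name `split_doi` and the tolerance for a leading `doi:` label are assumptions for illustration; it does not implement the full NISO Z39.84 rules.

```python
def split_doi(doi: str) -> tuple[str, str]:
    """Split a DOI name into (prefix, suffix) at the first '/'."""
    doi = doi.strip()
    # Accept the common "doi:" label in front of the name (an assumption).
    if doi.lower().startswith("doi:"):
        doi = doi[4:].strip()
    # Only the first '/' separates prefix and suffix; the suffix may contain more.
    prefix, sep, suffix = doi.partition("/")
    if not sep or not prefix.startswith("10.") or len(prefix) <= 3 or not suffix:
        raise ValueError(f"not a DOI name: {doi!r}")
    return prefix, suffix


if __name__ == "__main__":
    print(split_doi("10.5678/ISBN-0-7645-4889-4"))
    print(split_doi("doi:10.1594/WDCC/W_Han_2003_MMB_2"))
```

Because the split happens at the first `/` only, an existing identifier such as an ISBN is carried unchanged inside the suffix.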
Internet resolution allows a DOI to link to any and multiple pieces of current data
• Resolve from DOI to data
– initially to location (URL) – persistence
• May be to multiple data:
– Multiple locations
– Metadata
– Services
– Extensible, user-defined
• Uses the Handle System
– Implementing URI/URN concept
– Running on TCP/IP (common co-inventor)
– IETF RFCs 3650, 3651, 3652
– See Release 1.0, September 2003, “Online Registries: The DNS and Beyond...” [doi:10.1340/309registries]
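Multiple resolution can be pictured as one identifier holding a list of typed values. The following is a toy in-memory sketch, not the Handle System protocol or its client API: the record contents, the type labels `URL`, `METADATA` and `SERVICE`, the `example.org` addresses, and the function names are all assumptions made for illustration.

```python
from typing import NamedTuple, Optional


class Value(NamedTuple):
    type: str  # kind of data this value points to (labels are illustrative)
    data: str  # the current data, here always an address


# Hypothetical record: one DOI, several pieces of current data.
RECORDS = {
    "10.1234/NP5678": [
        Value("URL", "http://example.org/np5678"),
        Value("URL", "http://mirror.example.org/np5678"),
        Value("METADATA", "http://example.org/np5678/meta.xml"),
        Value("SERVICE", "http://example.org/np5678/rights"),
    ],
}


def resolve(doi: str, wanted: Optional[str] = None) -> list[Value]:
    """Return all values for a DOI, or only those of the wanted type."""
    values = RECORDS[doi]  # raises KeyError for an unknown DOI
    if wanted is None:
        return list(values)
    return [v for v in values if v.type == wanted]


def first_location(doi: str) -> str:
    """Single resolution: the first URL value, as a plain redirect would use."""
    urls = resolve(doi, "URL")
    if not urls:
        raise LookupError(f"no location registered for {doi}")
    return urls[0].data


if __name__ == "__main__":
    print(first_location("10.1234/NP5678"))
    print(resolve("10.1234/NP5678", "METADATA"))
```

The name stays fixed while the values behind it can be updated or extended, which is where the persistence comes from.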
<indecs> Data Dictionary + DOI AP framework
• DOI Data Model = metadata tools:
– a data dictionary to define +
– a grouping mechanism to relate
• Necessary for interoperability
– “Enabling information that originates in one context to be used in another in ways that are as highly automated as possible”
• Able to use existing metadata
– Mapped using a standard dictionary
– Can describe any entity at any level of granularity
– indecsDD, which incorporates ISO MPEG-21 RDD
• IDF is the MPEG-21 RDD registration authority
DOI policies allow any model for practical implementations
• Implementation through IDF
– Governance and agreed scope, policy, “rules of the road”
– Technical infrastructure: resolution mechanism, proxy servers, mirrors, back-up, central dictionary
– Social infrastructure: persistence commitments, fall-back procedures, cost recovery (self-sustaining), shared use of system
– Not a standard but a Registration Authority/maintenance agency
• IDF delegates through Registration Agencies
– Each can develop its own applications
– Use in “own brand” ways appropriate for their community
DOI to become ISO TC46/SC9 standard
Home of “identification numbering”: identifiers for semantically meaningful entities:
ISO 2108 International Standard Book Numbering (ISBN)
ISO 3297 International Standard Serial Number (ISSN)
ISO 3901 International Standard Recording Code (ISRC)
ISO 10444 International Standard Technical Report Number (ISRN)
ISO 10957 International Standard Music Number (ISMN)
ISO 15706 International Standard Audiovisual Number (ISAN)
ISO 15707 International Standard Musical Work Code (ISWC)
ISO Project 20925 Version identifier for Audiovisual Works (V-ISAN)
ISO Project 21047 International Standard Text Code (ISTC)
http://www.collectionscanada.ca/iso/tc46sc9/index.htm
Information and Documentation - Identification and Description
DOI combination of components
Resolve
The Handle resolution technology allows you to access any kind of service associated with your DOI, e.g. services can include metadata services
Identify
DOI syntax can include any existing identifier, formal or informal, of any entity, e.g.
– 10.2341/0-7645-4889-1
– 10.5678/978-0-7645-4889-4
– 10.1000/ISBN 0764548891
– 10.1234/Norman_presentation
– 10.2224/2004-10-28-ISO-DOI
Describe
DOI metadata can be of any type, standard or proprietary, e.g.
– ONIX for Books
– ONIX for Serials
– IEEE/LOM
– MARC
– Dublin Core
– Proprietary scheme
(To interoperate with anyone else in the DOI network, map to the <indecs> Data Dictionary (iDD).)
A package of services is an Application Profile
DOI and scientific data
• DOI is already the core technology for maintaining cross-references
– persistent links between a citation and internet access to the article
• CrossRef system used by 350+ publishers representing the bulk of STM articles (as pre-publication link builder): www.crossref.org
• 9,000 DOIs per day added to CrossRef
– Over 12 million DOIs now registered with CrossRef
– Over 850,000 assigned to books and conference proceedings
• Several projects suggested to IDF using DOIs for data (not connected with CrossRef)
– physico-chemical property data; biological microscopy images
– See Paskin, ICSTI 2002 paper
• Some projects have developed their own identifiers, very useful for their own area
– E.g. Life Science Identifier (I3C/IBM): simple URN mechanism, non-generic, non-global
– These can be incorporated into a DOI if needed to make them globally interoperable and extensible
• Two projects in particular have developed DOI applications:
(1) TIB: Citation of Primary Data
• Problem: re-use of existing data sets
– Attribution of data source: make data publications citable in a standard way (cf. articles, Citation Index)
– Archiving of data in context so as to be discoverable and interoperable (usable by others)
• Background
– CODATA National Committee WG, grant-aided by DFG (Sept 2001 to May 2002): report “Concept of Citing Scientific Primary Data”
– Continuation as project for pilot implementation funded by DFG, Oct 2003 to Oct 2005, at TIB (German National Library of Science & Technology)
– Development of DOI registration agency for data
• Solution:
– DOIs for data sets, with associated metadata
– Core management metadata applicable to all datasets
– Structured metadata extensible to specific science disciplines
(1) Citation of Primary Data: illustration of solution
• During her research for the World Data Center Climate (WDCC), Dr. Weather gains primary data about the weather in Hannover in the year 2003.
– Primary data is tested, evaluated, stored and administered at the WDCC
– Primary data is registered and allocated a DOI at the TIB
– With quality control of metadata, no change once allocated, etc.
• Dr. Weather can now cite this with a resolvable DOI, e.g. DOI:10.1594/WDCC/W_Han_2003_MMB_2
– 10.1594 (prefix) = TIB as the registration agency
– WDCC = research institute
– W_Han_2003_MMB_2 = internal name of the data
• DOI is resolvable directly, or via HTTP as http://dx.doi.org/10.1594/WDCC/W_Han_2003_MMB_2
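Turning a DOI into its HTTP form is mostly string concatenation with the proxy address shown on the slide. The sketch below assumes that characters unsafe in a URL path (a space, for example) should be percent-encoded while `/` is left readable; the function name `doi_to_url` and the handling of a leading `doi:` label are assumptions for illustration.

```python
from urllib.parse import quote

PROXY = "http://dx.doi.org/"  # proxy address as given on the slide


def doi_to_url(doi: str, proxy: str = PROXY) -> str:
    """Build the HTTP form of a DOI for use in a web context."""
    doi = doi.strip()
    # Drop an optional "doi:" / "DOI:" label (an assumption about input style).
    if doi.lower().startswith("doi:"):
        doi = doi[4:].strip()
    # Percent-encode unsafe characters but keep '/' as a path separator.
    return proxy + quote(doi, safe="/")


if __name__ == "__main__":
    print(doi_to_url("DOI:10.1594/WDCC/W_Han_2003_MMB_2"))
    print(doi_to_url("10.1000/ISBN 0764548891"))
```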
(1) Citation of Primary Data: illustration of solution
Usage scenario 1:
• Dr. Storm is reading publications by Dr. Weather in a journal and would like to analyse her data under different aspects.
• He can resolve the DOI to obtain the data set for use.
• In his publication “Comparison of the weather from Hannover and Miami”, Dr. Storm cites Dr. Weather’s data using its DOI, referring to the uniqueness and own identity of the original data.
• Citation example: Weather, 2003: “Weather in Hannover for 2003” doi:10.1594/WDCC/W_Han_2003_MMB_2
Usage scenario 2:
• Mr. Nice is writing a paper about the sales figures of ice cream in Hannover in 2003, but he has no information about the weather.
• He searches via the TIB central registration agency metadata search.
• The result is doi:10.1594/WDCC/W_Han_2003_MMB_2
• He resolves the DOI to find the data.
• The metadata refers him to the WDCC as publisher and data archive.
• In his paper he cites the data using the DOI.
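Both scenarios rest on a small amount of core metadata registered with each DOI. The sketch below is a stand-in for the registration agency's metadata search, not its actual interface: the record fields, the keyword matching, and the function names `search` and `cite` are assumptions for illustration, and the single record simply restates the Dr. Weather example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataSetRecord:
    doi: str
    creator: str
    year: int
    title: str
    publisher: str  # also the data archive in this example


# One illustrative record, taken from the example on the slides.
REGISTRY = [
    DataSetRecord(
        doi="10.1594/WDCC/W_Han_2003_MMB_2",
        creator="Weather",
        year=2003,
        title="Weather in Hannover for 2003",
        publisher="WDCC",
    ),
]


def search(registry: list[DataSetRecord], *keywords: str) -> list[DataSetRecord]:
    """Return records whose metadata contains every keyword (case-insensitive)."""
    hits = []
    for rec in registry:
        haystack = f"{rec.title} {rec.creator} {rec.publisher} {rec.year}".lower()
        if all(k.lower() in haystack for k in keywords):
            hits.append(rec)
    return hits


def cite(rec: DataSetRecord) -> str:
    """Format a citation in the style of the slide's citation example."""
    return f'{rec.creator}, {rec.year}: "{rec.title}" doi:{rec.doi}'


if __name__ == "__main__":
    for hit in search(REGISTRY, "hannover", "2003"):
        print(cite(hit))
```

Mr. Nice's path is then a search on `"hannover"` and `"2003"`, followed by resolving the DOI he gets back and citing it in the same form Dr. Storm used.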
(2) Names for Life: Biological taxonomy
• Problem: “Future-proofing biological nomenclature”
– See Garrity and Lyons, OMICS, 2003
• For a given nomenclature in a biological taxonomy, change occurs
– e.g. new species recognised; species reassigned as the founding species of new genera; synonyms; species split into subspecies which later become separate species
– resulting in changes of names, genera, families, classes, relationships over time
– How does a researcher keep track?
• Solution: DOI proposed as tool
– a data model of nomenclature and taxonomy
– enabling disambiguation of synonyms and competing taxonomies
– a metadata resolution service
– enabling dissemination of archived and updated information objects through persistent links
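The tracking problem can be sketched as a persistent identifier that stays fixed while the names attached to it change. This is a minimal illustration and not the Names for Life data model: the identifiers are made up, the class and function names are assumptions, and the two name histories are simplified from the deck's own timeline figure (Alteromonas species in 1972, placed in Marinomonas from 1984).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NameUse:
    year: int  # first year in which this name applies
    name: str


# Made-up identifiers in DOI form; the histories are simplified illustrations.
HISTORIES = {
    "10.99999/example.communis": (
        NameUse(1972, "Alteromonas communis"),
        NameUse(1984, "Marinomonas communis"),
    ),
    "10.99999/example.vaga": (
        NameUse(1972, "Alteromonas vaga"),
        NameUse(1984, "Marinomonas vaga"),
    ),
}


def find_identifier(name: str) -> str:
    """Find the persistent identifier behind any name, old or current."""
    for ident, uses in HISTORIES.items():
        if any(use.name == name for use in uses):
            return ident
    raise KeyError(name)


def synonyms(name: str) -> list[str]:
    """All names recorded for the same entity, oldest first."""
    uses = HISTORIES[find_identifier(name)]
    return [use.name for use in sorted(uses, key=lambda u: u.year)]


def name_in(ident: str, year: int) -> str:
    """The name that applied to the entity in a given year."""
    current = None
    for use in sorted(HISTORIES[ident], key=lambda u: u.year):
        if use.year <= year:
            current = use.name
    if current is None:
        raise LookupError(f"{ident} has no name recorded for {year}")
    return current


if __name__ == "__main__":
    ident = find_identifier("Marinomonas communis")
    print(synonyms("Marinomonas communis"))
    print(name_in(ident, 1980), "/", name_in(ident, 1990))
```

A paper from 1980 and one from 1990 would use different names, but both lead to the same identifier, so a lookup service can return the full synonym list for either.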
(2) Names for Life: illustration of problem
[Figure, animated over successive slides: nomenclature timeline with frames for 1972, 1973, 1976, 1977, 1978, 1979, 1981, 1982, 1984, 1986, 1987, 1988, 1990, 1992, 1995, 1997, 2000, 2001, 2002 and 2004. In 1972 the genus Alteromonas holds three species: macleodii (type), communis and vaga. Further species are added through 1982. From 1984 species are reassigned among Alteromonas, Marinomonas and Oceanospirillum; Shewanella appears in the 1986 frame and Pseudoalteromonas in the 1995 frame, and new species continue to be added through 2004.]
(2) Names for Life: illustration of solution
[Diagram: the information objects (name, taxon, combined name, exemplar, nomos) sit between “links from the web” and “links to the web”. Journal articles, strain records, gene annotations and any online information connect to them through DOIs, which support dissemination.]
By reasoning over information objects, construct services that can be offered through multiple resolution:
• Look up this name and all its synonyms in PubMed
• Determine whether this exemplar is part of a taxon in another nomos
• Compare this name to the current state (contents) of the taxon
Summary: DOI
• A system for persistent and actionable identification and interoperable exchange of managed information on digital networks
– Standards-based components (detail in a moment)
– Now to become an International Standard (in ISO TC46)
• Developed as a cross-industry, cross-sector, not-for-profit effort managed by an open-membership collaborative development body
– International DOI Foundation (IDF)
• In widespread use now:
– Over 15 million assigned, over 1000 naming authorities (users)
– Key feature of scientific primary publishing as part of the CrossRef system
– Adopted for government documents (EC, OECD, UK, etc.)
• In use, it is a mechanism “behind the scenes”
– e.g. looks like a URL in a web context
• Offers an interoperable common system for identification of science data; two projects considered as examples:
– TIB project (citation of primary data sets)
– Names for Life (biological taxonomy)