Creating Citable Data Identifiers

Post on 23-Feb-2016

Ryan Scherle and Mark Diggory

Mimosa house 807 South Virginia Dare Trail Kill Devil Hills, NC USA 27948

1903-12-17 36.019705 N, 75.668769 W

79330-S84-A41 WP0ZZZ99ZTS392124

Loxosceles reclusa

Citing identifiers
Mimosa house
807 South Virginia Dare Trail
1903-12-17
27948
Loxosceles reclusa
36.019705 N, 75.668769 W
79330-S84-A41
WP0ZZZ99ZTS392124

Identifiers matter
Some identifiers are machine-friendly, some are human-friendly
For citations, you need to strike a balance
Good identifiers are a critical selling point for a repository

http://purl.dlib.indiana.edu/iudl/lilly/slocum/LL-SLO-009276

Principles of citable identifiers

1. Use DOIs
http://dx.doi.org/10.5061/dryad.123ab
Scientists are familiar with DOIs
DOIs are supported by many tools and services

Current support:
EPrints   DSpace   Fedora
No        No       With work
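DOIs turn up in citations in several written forms (bare, with a "doi:" label, or as a resolver URL), so a repository usually has to normalize them before comparing or linking. The following is a minimal sketch of that step. The function names and the pattern are assumptions made for illustration; the pattern is a loose heuristic, not the full DOI syntax.

```python
import re

# Loose DOI shape: "10.", a registrant code of 4-9 digits, "/", then a non-empty suffix.
# This is a heuristic for illustration, not the complete DOI syntax.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")

# Resolver prefix as written on the slides.
RESOLVER = "http://dx.doi.org/"

# Written forms that may precede the bare DOI.
KNOWN_PREFIXES = (
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "https://doi.org/",
    "http://doi.org/",
    "doi:",
)


def normalize_doi(text: str) -> str:
    """Return the bare DOI (e.g. '10.5061/dryad.123ab') from a common written form."""
    doi = text.strip()
    for prefix in KNOWN_PREFIXES:
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    if not DOI_PATTERN.match(doi):
        raise ValueError(f"not a DOI: {text!r}")
    return doi


def doi_to_url(text: str) -> str:
    """Return a resolver URL for any accepted written form of a DOI."""
    return RESOLVER + normalize_doi(text)
```

With this, "doi:10.5061/dryad.bb7m4" and "http://dx.doi.org/10.5061/dryad.bb7m4" both reduce to the same bare DOI, while a human-friendly label such as "Mimosa house" is rejected.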

2. Keep identifiers simple
http://dx.doi.org/10.5061/dryad.123ab
Complex identifiers are fine for machines, but they’re bad for humans.
Despite best intentions, humans sometimes need to work with identifiers manually.

http://dx.doi.org/10.1179/1743131X11Y.0000000009

http://dx.doi.org/10.1016/B978-0-12-220851-5.00003-4

Current support:
EPrints   DSpace   Fedora
Yes       Yes      Yes

3. Use syntax to illustrate relationships
http://dx.doi.org/10.5061/dryad.123ab/3
Adding a tiny bit of semantics to an identifier is incredibly useful
http://files.eprints.org/691/
http://files.eprints.org/447/
http://files.eprints.org/556/
Useful for various human “hacks”
Useful for statistics

Current support:
EPrints   DSpace   Fedora
No        No       With work
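To make the "hacks" and statistics point concrete, here is a small sketch that reads the relationship out of an identifier shaped like the slide's example, 10.5061/dryad.123ab/3. It assumes that a trailing numeric segment names a file inside a data package; the names DataIdentifier, parse_identifier, and downloads_per_package are hypothetical.

```python
from collections import Counter
from typing import Iterable, NamedTuple, Optional


class DataIdentifier(NamedTuple):
    package: str          # DOI of the whole data package
    part: Optional[int]   # file number within the package, None for the package itself


def parse_identifier(doi: str) -> DataIdentifier:
    """Split a DOI such as '10.5061/dryad.123ab/3' into package DOI and file number."""
    prefix, _, suffix = doi.partition("/")
    if not prefix or not suffix:
        raise ValueError(f"expected 'prefix/suffix', got {doi!r}")
    head, sep, tail = suffix.rpartition("/")
    # Only a trailing all-digit segment is treated as a file number (an assumption).
    if sep and tail.isdigit():
        return DataIdentifier(f"{prefix}/{head}", int(tail))
    return DataIdentifier(doi, None)


def downloads_per_package(dois: Iterable[str]) -> Counter:
    """Roll per-file download events up to their parent package."""
    return Counter(parse_identifier(d).package for d in dois)
```

A person can perform the same "hack" by hand, deleting the final "/3" to reach the package, which is why keeping the relationship visible in the syntax pays off. Non-numeric trailing segments are left alone here and treated as part of the identifier.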

4. When “meaning-bearing” content changes, create a versioned identifier
Scientists want data to be invariant to enable reuse by machines
Even a single bit makes a difference
Watch out for implicit abstractions…

http://dx.doi.org/10.5061/dryad.123ab/thumbnail

What about DOI conventions?

5. When “meaningless” content changes, retain the current identifier
Descriptive metadata must be editable without creating a new identifier.
Humans rarely care about metadata changes, especially for citation purposes!

Caveat: machine-oriented systems may consider the “metadata” to be data, which requires identifier changes
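Principles 4 and 5 together amount to a simple rule: compare the data bytes, not the descriptive metadata, when deciding whether to issue a new identifier. The sketch below illustrates that rule under stated assumptions. The DataPackage class, the function names, and the ".N" version suffix are hypothetical conventions chosen for the example, not a description of any repository's actual scheme.

```python
import hashlib
from dataclasses import dataclass, replace


def checksum(data: bytes) -> str:
    """SHA-256 of the bytes; changing a single bit changes the digest."""
    return hashlib.sha256(data).hexdigest()


@dataclass(frozen=True)
class DataPackage:
    base_doi: str
    data: bytes
    title: str
    version: int = 1

    @property
    def identifier(self) -> str:
        # Hypothetical convention: version 1 keeps the base DOI,
        # later versions append ".N".
        if self.version == 1:
            return self.base_doi
        return f"{self.base_doi}.{self.version}"


def edit_metadata(pkg: DataPackage, title: str) -> DataPackage:
    """Descriptive metadata changes keep the current identifier."""
    return replace(pkg, title=title)


def edit_data(pkg: DataPackage, new_data: bytes) -> DataPackage:
    """Changed bytes produce a new version; the original package is left untouched."""
    if checksum(new_data) == checksum(pkg.data):
        return pkg
    return replace(pkg, data=new_data, version=pkg.version + 1)
```

Because the dataclass is frozen, each edit returns a new object and the earlier version stays citable as it was. A machine-oriented system that treats metadata as data would move the title into the checksummed bytes, which is the caveat noted above.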

Current versioning support
EPrints: Support for flexible versioning/relationships, but no support for expressing these relationships in identifiers.
DSpace: None.
Fedora: Implicit versioning of all data and metadata. This is highly useful, but it is too granular for citation purposes.

Principles of citable identifiers
1. Use DOIs
2. Keep identifiers simple
3. Use syntax to illustrate relationships
4. When “meaning-bearing” content changes, create a versioned identifier
5. When “meaningless” content changes, retain the current identifier

Hacking DSpace to support…

DOI identifier registration
Semantics in identifiers
Citation publication
Versioning

DSpace identifier services
Handle system independence: more future identifier systems will come.
Granular control: separate reservation from registration
Citation: registration of metadata with external services
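The split between reservation and registration can be pictured as a two-step lifecycle: an identifier is first minted locally so it can be shown to the submitter, and only later sent to the external service together with its metadata. The sketch below is a hypothetical in-memory illustration of that idea and is not DSpace's actual API; the class name, method names, placeholder prefix, and the callback standing in for the external service are all assumptions.

```python
from enum import Enum
from typing import Callable, Dict


class State(Enum):
    RESERVED = "reserved"
    REGISTERED = "registered"


class IdentifierService:
    """Hypothetical in-memory service separating reservation from registration."""

    def __init__(self, prefix: str, register_externally: Callable[[str, dict], None]):
        self.prefix = prefix
        # Stand-in for a call to an external registration service.
        self._register_externally = register_externally
        self._states: Dict[str, State] = {}
        self._counter = 0

    def reserve(self) -> str:
        """Mint an identifier locally; nothing is sent to the external service yet."""
        self._counter += 1
        doi = f"{self.prefix}/example.{self._counter}"
        self._states[doi] = State.RESERVED
        return doi

    def register(self, doi: str, metadata: dict) -> None:
        """Send a previously reserved identifier and its metadata to the external service."""
        if self._states.get(doi) is not State.RESERVED:
            raise ValueError(f"{doi} is not reserved or is already registered")
        self._register_externally(doi, metadata)
        self._states[doi] = State.REGISTERED

    def state(self, doi: str) -> State:
        return self._states[doi]
```

Keeping the external call behind a plain callable also reflects the independence point: a different identifier system can be swapped in without touching the reservation logic.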

DSpace identifier services

DataCite content service

Promoting accurate citations
Added suggested citation formats up front
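A suggested citation can be generated from metadata the repository already holds, so authors can copy it instead of composing one. The following sketch shows one possible layout; the function name, the field order, and the punctuation are assumptions for illustration and do not reproduce any particular journal's or repository's required format.

```python
from typing import Sequence


def suggested_citation(
    authors: Sequence[str],
    year: int,
    title: str,
    repository: str,
    doi: str,
) -> str:
    """Build a copy-and-paste citation string that ends with a resolvable DOI link."""
    names = ", ".join(authors)
    return f"{names} ({year}) {title}. {repository}. http://dx.doi.org/{doi}"
```

For example, calling it with placeholder values ["Author A", "Author B"], 2011, "Data from: Example article", "Example Repository", and "10.5061/dryad.123ab" yields a single line that ends in the DOI link, which keeps the identifier visible in the citation itself.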

Versioning
Versioning is item “editioning”
Creation of new versions is a “user mediated” process (submitter or reviewer)
Versioning does not alter the original item
Version relationships are maintained independent of the item’s metadata

Submission-based revisions

Result: Citable data versions
doi:10.5061/dryad.bb7m4

Future technical directions
Add metadata versioning under the hood (may need to rethink some of the current system)
Integrate our changes into core DSpace
Moving these features into the core requires further discussion with the DSpace user community

How are we doing?
For 186 articles associated with Dryad deposits:
77% had “good” citations to the data
2% had “bad” citations to the data
21% had no data citations

Standards for data citation are still evolving. Journals have yet to agree on where to place data citations, and authors are just starting to become familiar with the concept.

What should you do now?
Analyze how data is used and cited outside the repository
Determine whether use is more machine-oriented or more human-oriented
Design identifiers and identifier management to facilitate the observed uses

Thanks!

Ryan Scherle, ryan@scherle.org

Mark Diggory, mdiggory@atmire.com