Post on 15-Apr-2017

3:AM in BUCHAREST | September 28-29, 2016

Evaluating the possibilities of DataCite for developing ‘Open data metrics’ on the production and usage of datasets worldwide

Nicolas Robinson-Garcia, Philippe Mongeon, Wei Jeng & Rodrigo Costas

Promotion of data sharing infrastructures

• Data repositories

• Data Citation Index

• Persistent linkages (DOIs)

Promotion of data sharing practices

• Data sharing incentives

• Perceived benefits

Sharing and citing data

DATA CITATIONS

Maximizing investment

Searching for evidence of data sharing

Aims of this study

1. Who shares data?

• Which countries are sharing scientific data (in DataCite)?

• Are there biases by discipline (in DataCite)?

2. Is there evidence of data reuse?

• Are researchers using DOIs to link papers to datasets?

• Are they mentioning datasets through social media?

From citing papers to citing data

Citations to publications

• Based on researchers’ communication patterns

• Influenced by research evaluation schemes

• Highly standardized and widespread within scientific practice

Citations to datasets

• Promoted by funding bodies

• Not embedded in scholarly communication patterns

• Heterogeneous forms of acknowledgement (paper, dataset, none…)

The metadata schema of DataCite

Mandatory fields

Source: DataCite Metadata Working Group (2016). http://doi.org/10.5438/0012

Citations to data

Recommended format

Creator (PublicationYear): Title. Publisher. Identifier

Source: DataCite Metadata Working Group (2016). http://doi.org/10.5438/0012
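The recommended format above can be sketched as a small Python helper. This is only an illustration: the function name, the choice to join several creators with "; ", and every sample value below are assumptions for the example, not part of the slides or of any DataCite tooling.

```python
from typing import Sequence


def format_data_citation(
    creators: Sequence[str],
    publication_year: int,
    title: str,
    publisher: str,
    identifier: str,
) -> str:
    """Build a citation as 'Creator (PublicationYear): Title. Publisher. Identifier'."""
    # Assumption: multiple creators are separated by "; ".
    creator_part = "; ".join(creators)
    return f"{creator_part} ({publication_year}): {title}. {publisher}. {identifier}"


# Hypothetical record, used only to show the resulting layout.
example = format_data_citation(
    creators=["Doe, J."],
    publication_year=2016,
    title="Example dataset",
    publisher="Example Repository",
    identifier="https://doi.org/10.1234/example",
)
print(example)
```

The sketch uses only the five fields named in the recommended format, which is why a record with missing Publisher information cannot be cited cleanly this way.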

Preliminary results

• Types of data

• Availability of Publisher information

• What is a Publisher?

Preliminary results

DataCite records with DOI: 6352875

                                          Citations   Altmetric.com (Twitter)
Records with metrics (matched on DOI)     6432        14314
% records with metrics                    0.10%       0.23%
Intensity (records with metrics/metric)   17.9        4.1

• Citation and altmetric analysis

• Matches based only on DOIs!
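A minimal sketch of DOI-only matching and of the share of records with metrics, in Python. The prefix list, the function names and the sample DOIs are assumptions made for illustration; the slides do not describe the actual matching procedure. Only the three record counts are taken from the results reported here.

```python
from typing import Iterable, Set

# Assumption: these are the resolver prefixes worth stripping before comparison.
DOI_PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)


def normalize_doi(raw: str) -> str:
    """Lower-case a DOI and strip a resolver prefix, since DOIs are case-insensitive."""
    doi = raw.strip().lower()
    for prefix in DOI_PREFIXES:
        if doi.startswith(prefix):
            return doi[len(prefix):]
    return doi


def match_on_doi(record_dois: Iterable[str], metric_dois: Iterable[str]) -> Set[str]:
    """Return the normalized DOIs present in both sources."""
    records = {normalize_doi(d) for d in record_dois}
    metrics = {normalize_doi(d) for d in metric_dois}
    return records & metrics


def coverage_percent(matched: int, total: int) -> float:
    """Share of all records that have at least one metric, in percent."""
    return 100 * matched / total


# Counts reported in the results: total records, records with citations, records with tweets.
TOTAL_RECORDS = 6_352_875
print(f"{coverage_percent(6_432, TOTAL_RECORDS):.2f}%")   # citations
print(f"{coverage_percent(14_314, TOTAL_RECORDS):.2f}%")  # Altmetric.com (Twitter)
```

Because the match relies on the DOI alone, a dataset mentioned by title, URL or accession number would not be counted under this approach.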

The no. 1s

The most cited dataset?

The most tweeted dataset?

Preliminary results

Country of origin based on publisher info

Country          # Records
UK               1728428
Germany          966289
Switzerland      591062
USA              560799
Canada           81471
Spain            32795
Netherlands      26791
Italy            25241
Australia        21059
Ireland          19416
Austria          18571
USA, UK          12981
France           9443
Denmark          8804
BE,DE,IT,NL,ES   5366
Sweden           2816
Korea            2

Bibliometric limitations

Technical

• Heterogeneity of sources

• Lack of basic data (affiliation)

• Lack of standard normalisation

Conceptual

• Publication vs. data production patterns

• Data citations vs. data reuse

• Conceptual heterogeneity

Some examples

Publication author vs. data producer distribution

Authors (non disambiguated)   WoS records   Creators               DataCite records
WANG, Y                       56596         Geml, József           487363
ZHANG, Y                      54203         Ryberg, Martin         487351
WANG, J                       49817         Lumbsch, H. Thorsten   487350
LIU, Y                        46307         Tedersoo, Leho         487350
LI, Y                         45773         Hampe, Felix           487350

Most productive data creator

Some examples

[ "#IHaveWrittenMyOwnOneNewScientificPaperOnTheGeographyAndMarsLife. #IHaveAlreadySuccessfullyOFFICIALLYPublishedItOn #DiscoveryNews #ScienceAlert #DiscoveryChannelIndia #AndAlsoOn #DiscoveryCommunications #AndItsOFFICIALPublicationIsThereOnAllOfItSoPlea

Heterogeneity of Publisher information

Phys.Rev. C75 (2007) 045203

ETH-Bibliothek Zürich, Bildarchiv / Fotograf: Unbekannt / Fel_027418-VE / Public Domain Mark (see the metadata of copyright: http://www.e-pics.ethz.ch/index/ethbib.bildarchiv/ETHBIB.Bildarchiv_Fel_008192-RE_257002.html)

JHEP 1311 (2013) 183

• References

• DOI numbers

• Copyright statements

• Hashtags???
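One way to get a first picture of this heterogeneity is a rough rule-based tagging of Publisher values. The Python sketch below is hypothetical: the category labels follow the four kinds of content listed above, but the regular expressions and the order in which the rules are applied are assumptions, not the method used in the study.

```python
import re

# Assumption: simple patterns are enough for a first, rough pass.
DOI_PATTERN = re.compile(r"\b10\.\d{4,9}/\S+")
HASHTAG_PATTERN = re.compile(r"(?:^|\s)#\w+")
COPYRIGHT_PATTERN = re.compile(r"copyright|public domain|©", re.IGNORECASE)
REFERENCE_PATTERN = re.compile(r"\(\d{4}\)\s*\d+")  # e.g. "(2007) 045203"


def classify_publisher(value: str) -> str:
    """Tag a Publisher string with the first matching category."""
    text = value.strip()
    if DOI_PATTERN.search(text):
        return "doi"
    if HASHTAG_PATTERN.search(text):
        return "hashtag"
    if COPYRIGHT_PATTERN.search(text):
        return "copyright"
    if REFERENCE_PATTERN.search(text):
        return "reference"
    return "other"


for sample in (
    "Phys.Rev. C75 (2007) 045203",
    "JHEP 1311 (2013) 183",
    "#DiscoveryNews #ScienceAlert",
):
    print(sample, "->", classify_publisher(sample))
```

Values tagged "other" would still need manual inspection, since a real publisher name and free text are indistinguishable to rules this simple.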