
Data Citation Principles workshop, Harvard 16-17 May 2011

http://www.orcid.org G. A. Thorisson, University of Leicester / ORCID

ORCID and data publication
Identifying knowledge contributors to motivate sharing


Gudmundur A. Thorisson <gt50@le.ac.uk>
Tony Brookes bioinformatics group

Department of Genetics, University of Leicester

-- Outline --

• Preamble: my route to this workshop

• Ongoing & planned data publication projects

• Disease genetics data

• Planned integration with ORCID for researcher identification

• Role of ORCID in data publication ecosystem?

• [shameless] plug for Sept workshop on researcher identity

This work can be freely copied, redistributed and adapted, as long as proper attribution is given

Monday, 16 May 2011


Prof Anthony J Brookes
GEN2PHEN coordinator
Chair, Bioinformatics and Genomics
Department of Genetics, University of Leicester, UK


The data sharing problem



Lack of incentives for sharing

• Effort required to prepare, package and submit datasets to public repositories

• Time better spent writing papers & grants

• All sticks (funders, journals) - no carrots

• Need incentives - treat data as publications and credit creators


“[...] Many of the issues regarding data availability can be addressed if the principles of “publication” rather than “sharing” are applied. However, online data publication systems also need to develop mechanisms for data citation and indices of data access comparable to those for citation systems in print journals”

Costello, M. Motivating Online Publication of Data. BioScience (2009) vol. 59 (5) pp. 418-427


Name ambiguity => attribution challenges


Are these authors all the same person?

G. Thorisson, University of Leicester
G. A. Thorisson, University of Leicester
G. A. Thorisson, Cold Spring Harbor Laboratory

Or these?

J. Smith
J. Smith
J. Smith
J. Smith
J. Smith [etc.]

∼2/3 of the ∼6 million authors in MEDLINE share a last name and first initial with at least one other author, and an ambiguous name refers to ∼8 persons on average.

Torvik and Smalheiser. Author name disambiguation in MEDLINE. ACM Transactions on Knowledge Discovery from Data (2009) vol. 3 (3)

How about these?





ORCID - tackling the contributor identity problem

ORCID ID: B-1242-2010
G. Thorisson, Univ. Leicester
G. A. Thorisson, Univ. Leicester
G. A. Thorisson, Cold Spring Harbor Lab.

ORCID ID: G-1442-2009
J. Smith, Univ. North Pole

ORCID ID: D-2400-2010
J. Smith, Luthor Corporation
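To make the ambiguity concrete, the Python sketch below groups the author strings from these slides two ways: by a naive "surname + first initial" key (the kind of collision counted in the MEDLINE statistic) and by the identifier attached to each record. It is an illustration only, not part of ORCID; the record list and the helper names `name_key` and `group_by` are hypothetical, and the IDs are the illustrative ones shown on the slide.

```python
from collections import defaultdict

# Hypothetical author records: (display name, affiliation, identifier from the slide)
RECORDS = [
    ("G. Thorisson", "Univ. Leicester", "B-1242-2010"),
    ("G. A. Thorisson", "Univ. Leicester", "B-1242-2010"),
    ("G. A. Thorisson", "Cold Spring Harbor Lab.", "B-1242-2010"),
    ("J. Smith", "Univ. North Pole", "G-1442-2009"),
    ("J. Smith", "Luthor Corporation", "D-2400-2010"),
]


def name_key(display_name: str) -> str:
    """Naive key: surname plus first initial, e.g. 'thorisson g'."""
    parts = display_name.replace(".", " ").split()
    return f"{parts[-1].lower()} {parts[0][0].lower()}"


def group_by(records, key_func):
    """Group affiliations under whatever key key_func returns."""
    groups = defaultdict(list)
    for name, affiliation, ident in records:
        groups[key_func(name, ident)].append(affiliation)
    return dict(groups)


by_name = group_by(RECORDS, lambda name, ident: name_key(name))
by_id = group_by(RECORDS, lambda name, ident: ident)

print(len(by_name))  # 2 groups: the two J. Smiths are merged into one "person"
print(len(by_id))    # 3 groups: one per identifier
```

The name-based key both merges different people (the two J. Smiths) and depends on how consistently names are written; the identifier-based grouping avoids both problems.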



Cafe Variome - facilitating exchange of genetic data

1. Diagnostic laboratories

2. Central ‘clearinghouse’

3. End-users (e.g. LSDB curators)

Publish data / Retrieve Atom feeds

Submitting mutations from diagnostic labs using “Café RouGE enabled” software via simple button click

Data are shared with diverse 3rd parties via manual retrieval or automated feed-based monitoring/retrieval
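The automated feed-based retrieval by end-users can be sketched with Python's standard-library XML parser. This is a minimal illustration assuming a plain Atom 1.0 feed; the sample feed, its entry IDs and the `new_entries` helper are hypothetical and do not reproduce the actual Cafe Variome / Café RouGE feed format.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

ATOM = "{http://www.w3.org/2005/Atom}"

# Hypothetical feed: two mutation entries published by diagnostic labs
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example mutation feed</title>
  <id>urn:example:feed</id>
  <updated>2011-01-21T12:00:00Z</updated>
  <entry>
    <id>urn:example:mutation:1</id>
    <title>Variant in BRCA2</title>
    <updated>2011-01-10T09:00:00Z</updated>
  </entry>
  <entry>
    <id>urn:example:mutation:2</id>
    <title>Variant in COL1A1</title>
    <updated>2011-01-21T12:00:00Z</updated>
  </entry>
</feed>"""


def parse_time(text: str) -> datetime:
    """Parse a UTC timestamp of the form 2011-01-21T12:00:00Z."""
    return datetime.strptime(text, "%Y-%m-%dT%H:%M:%SZ")


def new_entries(feed_xml: str, last_checked: datetime):
    """Return (id, title) for every entry updated after the last poll."""
    root = ET.fromstring(feed_xml)
    results = []
    for entry in root.findall(f"{ATOM}entry"):
        updated = parse_time(entry.findtext(f"{ATOM}updated"))
        if updated > last_checked:
            results.append((entry.findtext(f"{ATOM}id"), entry.findtext(f"{ATOM}title")))
    return results


print(new_entries(SAMPLE_FEED, datetime(2011, 1, 15)))
```

A curator's monitoring job would simply remember the time of its last poll and pass it as `last_checked`, so only newly published entries are processed.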



Cafe Variome - facilitating exchange of genetic data

dbSNP (coding), UniProt

Submission from diag. lab

Metadata describing variation data published elsewhere

Data shared with diverse 3rd parties and data usage/citation tracked via DOI

DOI assigned to incoming data upload

Already stable IDs, so no DOI assigned

Attribution given to data submitters via ORCID unique identifier
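The identifier policy in this diagram (mint a DOI for new uploads, reuse the stable IDs of records already held elsewhere, attribute the submitter by researcher ID) fits in a small decision function. This is a hypothetical sketch: the `Submission` fields, the `cv.` suffix scheme and the use of DataCite's `10.5072` test prefix are assumptions for illustration, not Cafe Variome's implementation.

```python
from dataclasses import dataclass
from typing import Optional

TEST_DOI_PREFIX = "10.5072"  # DataCite test prefix; a real service would use its own


@dataclass
class Submission:
    local_id: str                       # hypothetical internal accession
    submitter_id: str                   # researcher identifier used for attribution
    external_id: Optional[str] = None   # e.g. an existing dbSNP or UniProt ID


def assign_identifier(sub: Submission) -> dict:
    """Reuse a stable external ID if one exists, otherwise mint a DOI."""
    if sub.external_id:
        identifier = sub.external_id
    else:
        identifier = f"doi:{TEST_DOI_PREFIX}/cv.{sub.local_id}"
    return {"identifier": identifier, "attributed_to": sub.submitter_id}


print(assign_identifier(Submission("2352354", "A-883-2010")))
print(assign_identifier(Submission("17", "A-883-2010", external_id="rs19243")))
```

In both branches the submitter is still credited; only the choice of persistent identifier for the data changes.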



Publication credit for Cafe Variome deposits

G. Thorisson, Univ. Leicester
gthorisson@gmail.com

ORCID ID: A-883-2010

4x variants in BRCA2 gene in patient X

CV user has linked his user account with his ORCID profile


G. A. Thorisson (A-883-2010). 4x variants in BRCA2 gene. Published online via Cafe Variome (21 January 2011) doi:10.1255/caferouge.BRCA2-2352354

=> http://api.caferouge.org/atomserver/v1/caferouge/mutations/2352354
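A citation string like the one above can be assembled mechanically from a deposit's metadata. The Python sketch below is illustrative only: the `Deposit` fields and `format_citation` are hypothetical, and the output follows the layout of the example citation on this slide rather than any formal citation style.

```python
from dataclasses import dataclass
from datetime import date

# Fixed English month names, so output does not depend on the system locale
MONTHS = ("January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December")


@dataclass
class Deposit:
    creator: str      # display name of the submitter
    creator_id: str   # researcher identifier linked to the user account
    title: str
    publisher: str    # service through which the data were published
    published: date
    doi: str


def format_citation(d: Deposit) -> str:
    """Render 'Creator (ID). Title. Published online via X (D Month YYYY) doi:...'."""
    when = f"{d.published.day} {MONTHS[d.published.month - 1]} {d.published.year}"
    return (f"{d.creator} ({d.creator_id}). {d.title}. "
            f"Published online via {d.publisher} ({when}) doi:{d.doi}")


example = Deposit("G. A. Thorisson", "A-883-2010", "4x variants in BRCA2 gene",
                  "Cafe Variome", date(2011, 1, 21), "10.1255/caferouge.BRCA2-2352354")
print(format_citation(example))
```

Because the researcher ID travels inside the citation, any system that indexes the citation can attribute it without having to disambiguate the name.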


GWAS nanopublications

• Foray into semantic publishing

– GWAS Central as ‘nano-publisher’

– variant<->disease assertion as nanopub

rs19243 <associatedWith> Type II diabetes + condition & provenance

• Provenance part to include:

– Contributor IDs

– Contributor roles:

• Author(s) on original GWAS paper

• Curator

• Registrant

• Citability: register DOI for nanopub?
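The assertion-plus-provenance structure described above can be sketched as plain Python data. This is a simplified model for illustration, not the nanopublication RDF schema: the `Nanopub` and `Contribution` classes, the placeholder contributor IDs and the condition text are hypothetical; the three roles are the ones listed on this slide.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

ROLES = {"author", "curator", "registrant"}  # contributor roles listed on the slide


@dataclass
class Contribution:
    contributor_id: str  # researcher identifier (placeholder values below)
    role: str

    def __post_init__(self):
        if self.role not in ROLES:
            raise ValueError(f"unknown role: {self.role}")


@dataclass
class Nanopub:
    assertion: Tuple[str, str, str]   # (subject, predicate, object)
    condition: str                    # context under which the assertion holds
    provenance: List[Contribution] = field(default_factory=list)
    doi: Optional[str] = None         # open question on the slide: register one?

    def contributors(self, role: str) -> List[str]:
        """IDs of everyone credited in the given role."""
        return [c.contributor_id for c in self.provenance if c.role == role]


nanopub = Nanopub(
    assertion=("rs19243", "associatedWith", "Type II diabetes"),
    condition="hypothetical study population",
    provenance=[Contribution("ID-AUTHOR-1", "author"),
                Contribution("ID-AUTHOR-2", "author"),
                Contribution("ID-CURATOR-1", "curator"),
                Contribution("ID-REG-1", "registrant")],
)
print(nanopub.contributors("author"))  # ['ID-AUTHOR-1', 'ID-AUTHOR-2']
```

Keeping roles in the provenance, rather than a flat author list, lets the original GWAS authors, the curator and the registrant each receive distinct credit for the same assertion.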



BRIF - measuring bioresource use and impact

• Biobanks: collections of biomaterials + associated metadata

– Identification: citing, acknowledging, tracking their use

– Evaluation: assess impact

– Attribution: crediting PIs, repository managers, technicians [?]

• Digital resources, incl. biomedical databases

– E.g. locus-specific databases (LSDBs), variation archives (e.g. Cafe Variome)

– How to acknowledge researchers who:

• Maintain vital community resource (e.g. http://www.wormbase.org )

• Undertake value-adding curation

– Microattribution: Giardine, B. et al. Systematic documentation and analysis of human genetic variation in hemoglobinopathies using the microattribution approach. Nature Genetics, advance online publication (2011). http://dx.doi.org/10.1038/ng.785

• BRIF online group: http://bit.ly/brif-group



Identifying & citing databases



• Bio-databases are often cited as a collection

– E.g. “In our analysis, we used release X of SwissProt”; “Our results were compared with the COL3A1 database as of Jan 2011”

– Example: OI variant database:
Dalgleish, R. (1998) Nucleic Acids Research 26(1), 253 http://dx.doi.org/10.1093/nar/26.1.253
Dalgleish, R. (1997) Nucleic Acids Research 25(1), 181 http://dx.doi.org/10.1093/nar/25.1.181
Osteogenesis Imperfecta Variant Database - https://oi.gene.le.ac.uk



• Are DOIs appropriate? Databases are not ‘unchanging entities’



• Minimal information about a database: include DOI name?

– What does the DOI point to? URL for database site vs. URL for database description
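One possible way to reconcile a persistent identifier with a changing database is to let the DOI resolve to a stable description page and to pin each citation to a release or access date. The sketch below assumes exactly that design, which is only one of the options raised here; the `DatabaseRecord` fields and `cite` helper are hypothetical, and the example record has no DOI because the slide leaves that as an open question.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DatabaseRecord:
    name: str
    site_url: str                    # the live, changing database
    description_url: Optional[str]   # stable description page a DOI could resolve to
    doi: Optional[str] = None


def cite(db: DatabaseRecord, release: Optional[str] = None,
         accessed: Optional[str] = None) -> str:
    """Cite a database as a collection, pinned to a release and/or access date."""
    parts = [db.name]
    if release:
        parts.append(f"release {release}")
    if accessed:
        parts.append(f"accessed {accessed}")
    parts.append(f"doi:{db.doi}" if db.doi else db.site_url)
    return ", ".join(parts)


oi = DatabaseRecord("Osteogenesis Imperfecta Variant Database",
                    "https://oi.gene.le.ac.uk", description_url=None)
print(cite(oi, accessed="January 2011"))
```

Separating `site_url` from `description_url` makes the question "what does the DOI point to?" an explicit design decision rather than an accident of whichever URL was registered.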



Attributing contributions to bio-resources



• Database curation

– Management: R. Dalgleish A-3523-534-144 <maintained> 10.5335/lsdb.oi.325dff

– Temporary curator appointment: J. Smith G-1442-2009 <curated> 10.5335/lsdb.oi.325dff

– Microattribution: fine-grained tracking of curator activity (insert/update/delete)



• Biobanking activities

– Principal Investigator responsible for project (aka ‘corresponding author’)

– Laboratory personnel?

– Clinical collaborators?
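The curation statements above are essentially (contributor, role, resource) triples, and microattribution adds a per-curator tally of edits. A minimal Python sketch: the IDs and the DOI-like resource name are the illustrative ones from the slide, while the edit log is hypothetical.

```python
from collections import Counter
from typing import List, Tuple

# (contributor ID, role, resource) statements, as written on the slide
ATTRIBUTIONS: List[Tuple[str, str, str]] = [
    ("A-3523-534-144", "maintained", "10.5335/lsdb.oi.325dff"),
    ("G-1442-2009", "curated", "10.5335/lsdb.oi.325dff"),
]

# Hypothetical curation log: one (contributor ID, action) entry per database edit
EDIT_LOG = [
    ("G-1442-2009", "insert"),
    ("G-1442-2009", "insert"),
    ("G-1442-2009", "update"),
    ("A-3523-534-144", "delete"),
]


def contributors_to(resource: str) -> List[Tuple[str, str]]:
    """Who contributed to a resource, and in what role."""
    return [(who, role) for who, role, res in ATTRIBUTIONS if res == resource]


def microattribution(log) -> Counter:
    """Count edits per (contributor, action) pair."""
    return Counter(log)


print(contributors_to("10.5335/lsdb.oi.325dff"))
print(microattribution(EDIT_LOG)[("G-1442-2009", "insert")])  # 2
```

The coarse triples answer "who is responsible for this resource", while the edit counts give the fine-grained credit that microattribution is after.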



Characterizing citations and contributions



• What is the nature of the resource citation?

– acknowledgement / earlier or related work

– reused data or materials

– extended methodology

– ‘..this study is flawed and complete rubbish!!’



• What is the nature of my contribution to the resource?

– Paper: authored / undertook analysis / conceived of study / designed experiment

– Dataset: created / submitted / managed

– Database: curator / manager / PI responsible

– Biobank: sample collector / day-to-day manager / ??

– Temporal aspect:

• E.g. Mummi contributed in a curator role for SwissProt Jun 2004 to Oct 2009
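The temporal aspect can be captured by giving each role a start and an optional end date. In this hypothetical sketch the `RoleSpell` class and `roles_on` helper are assumptions, the day-of-month values are invented because the slide gives only months, and the contributor is named for readability where a real record would carry a researcher ID.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class RoleSpell:
    contributor: str
    role: str
    resource: str
    start: date
    end: Optional[date] = None  # None means the role is ongoing

    def active_on(self, day: date) -> bool:
        return self.start <= day and (self.end is None or day <= self.end)


def roles_on(spells: List[RoleSpell], contributor: str, day: date) -> List[str]:
    """Roles a contributor held, on any resource, on a given day."""
    return [f"{s.role}@{s.resource}" for s in spells
            if s.contributor == contributor and s.active_on(day)]


spells = [RoleSpell("Mummi", "curator", "SwissProt", date(2004, 6, 1), date(2009, 10, 31))]
print(roles_on(spells, "Mummi", date(2007, 3, 1)))  # ['curator@SwissProt']
print(roles_on(spells, "Mummi", date(2010, 1, 1)))  # []
```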



Semantic frameworks for scientific publishing


Shotton, D., 2010. CiTO, the Citation Typing Ontology. Journal of Biomedical Semantics, 1(Suppl 1). doi:10.1186/2041-1480-1-S1-S6


my study <cito:extends> Thorisson et al. 2008 doi:10.433/888544jamaX

my study <cito:usesSamplesFrom> Biobank X doi:10.424/35xxjapan.5 ??

G. Thorisson (A-523-44-3423) <pro:manager> Biobank X doi:10.424/35xxjapan??
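Written out in full, statements like these become RDF triples. The sketch below expands the `cito:` and `pro:` prefixes and prints N-Triples lines using only the standard library. The namespace IRIs are assumed to be those of the SPAR CiTO and PRO ontologies; `cito:usesSamplesFrom` and `pro:manager` are the slide's tentative terms and may not be defined in those ontologies; the `example.org` IRIs are hypothetical stand-ins for "my study", the researcher and Biobank X, whose DOI is still marked "??".

```python
# Assumed namespace IRIs for the SPAR ontologies
PREFIXES = {
    "cito": "http://purl.org/spar/cito/",
    "pro": "http://purl.org/spar/pro/",
}


def expand(curie: str) -> str:
    """Turn 'cito:extends' into a full IRI; leave full IRIs unchanged."""
    prefix, sep, local = curie.partition(":")
    if sep and prefix in PREFIXES:
        return PREFIXES[prefix] + local
    return curie


def ntriple(subject: str, predicate: str, obj: str) -> str:
    """Serialize one triple whose three terms are all IRIs."""
    return f"<{expand(subject)}> <{expand(predicate)}> <{expand(obj)}> ."


statements = [
    ("http://example.org/my-study", "cito:extends", "http://dx.doi.org/10.433/888544jamaX"),
    ("http://example.org/my-study", "cito:usesSamplesFrom", "http://example.org/biobank-x"),
    ("http://example.org/researcher/A-523-44-3423", "pro:manager", "http://example.org/biobank-x"),
]
for s, p, o in statements:
    print(ntriple(s, p, o))
```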




Role of ORCID?


Why track all this stuff?
Enable aggregation of contributions by unique researcher ID

G. Thorisson, Univ. Leicester
gthorisson@gmail.com

ORCID ID: A-883-2010

• Who contributed to dataset 10.4259/psycho.5gtpq-thorisson?

• All data publications by ORCID A-883-2010?

• Which papers have cited the works of ORCID A-883-2010?

• Total no. citations to datasets by A-883-2010 in the last 2 years?

• Total no. downloads of datasets by A-883-2010?

• Which database projects has A-883-2010 contributed to?

• [...]
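Once every dataset record carries contributor IDs, most of these questions reduce to filters and sums keyed on the ID. A minimal Python sketch over a hypothetical in-memory list: the first DOI and the researcher IDs come from the slides, while the other DOIs and all citation and download counts are invented placeholders, not real metrics. The "last 2 years" query would additionally need a date on each citation.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DatasetRecord:
    doi: str
    contributors: List[str]  # researcher IDs
    citations: int
    downloads: int


# Hypothetical records; counts are placeholders for illustration only
DATASETS = [
    DatasetRecord("10.4259/psycho.5gtpq-thorisson", ["A-883-2010"], citations=3, downloads=40),
    DatasetRecord("10.5072/example-2", ["A-883-2010", "G-1442-2009"], citations=1, downloads=12),
    DatasetRecord("10.5072/example-3", ["G-1442-2009"], citations=5, downloads=7),
]


def contributors_of(doi: str) -> List[str]:
    """Who contributed to a given dataset?"""
    return next((r.contributors for r in DATASETS if r.doi == doi), [])


def datasets_by(researcher_id: str) -> List[str]:
    """All data publications credited to a researcher ID."""
    return [r.doi for r in DATASETS if researcher_id in r.contributors]


def totals_for(researcher_id: str) -> dict:
    """Total citations and downloads across a researcher's datasets."""
    mine = [r for r in DATASETS if researcher_id in r.contributors]
    return {"citations": sum(r.citations for r in mine),
            "downloads": sum(r.downloads for r in mine)}


print(datasets_by("A-883-2010"))
print(totals_for("A-883-2010"))  # {'citations': 4, 'downloads': 52}
```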


Current ORCID status & timeline

• Alpha prototype

– Running on a sandbox website for limited testing

• Partial functionality, based on ResearcherID software

• Early adopters / collaborators

• Looking to collaborate with projects

– Gather use cases => feed requirements for ORCID core system

– WHERE/HOW might ORCID be used to identify contributors?

– Joint fund-seeking to do pilot implementations




• Timeline for live beta system: early 2012


Example: SageCite?

• i) dataset published in SageCommons

– assigned DOI via DataCite

– attribution link deposited in ORCID

• ii) derivative datasets published in SageCommons

– assigned DOI via DataCite

– attribution link deposited in ORCID

• iii) analysis workflow published via myExperiment

– attribution => ORCID (creator/submitter & others who contributed)

– DOI (or not - not essential?)
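Steps i) and ii) amount to DOI registration metadata that names the creator by ORCID identifier and, for a derivative dataset, points back to its source. The sketch below builds such a record with `xml.etree.ElementTree`. The element and attribute names are assumed to follow the DataCite Metadata Schema 4.x (a later schema than existed in 2011) and should be checked against the published schema; the DOIs use DataCite's `10.5072` test prefix, the ORCID iD is ORCID's documented example identifier (Josiah Carberry), and the title and publisher are placeholders.

```python
import xml.etree.ElementTree as ET

NS = "http://datacite.org/schema/kernel-4"
ET.register_namespace("", NS)  # serialize with DataCite as the default namespace


def q(tag: str) -> str:
    """Qualify a tag name with the DataCite namespace."""
    return f"{{{NS}}}{tag}"


def datacite_record(doi, creator, orcid, title, publisher, year, derived_from=None):
    resource = ET.Element(q("resource"))
    ET.SubElement(resource, q("identifier"), identifierType="DOI").text = doi

    # Creator identified by name and by ORCID iD (the attribution link)
    creator_el = ET.SubElement(ET.SubElement(resource, q("creators")), q("creator"))
    ET.SubElement(creator_el, q("creatorName")).text = creator
    ET.SubElement(creator_el, q("nameIdentifier"),
                  nameIdentifierScheme="ORCID", schemeURI="https://orcid.org").text = orcid

    ET.SubElement(ET.SubElement(resource, q("titles")), q("title")).text = title
    ET.SubElement(resource, q("publisher")).text = publisher
    ET.SubElement(resource, q("publicationYear")).text = str(year)
    ET.SubElement(resource, q("resourceType"), resourceTypeGeneral="Dataset").text = "Dataset"

    if derived_from:  # step ii): link a derivative dataset to its source dataset
        related = ET.SubElement(ET.SubElement(resource, q("relatedIdentifiers")),
                                q("relatedIdentifier"),
                                relatedIdentifierType="DOI", relationType="IsDerivedFrom")
        related.text = derived_from
    return ET.tostring(resource, encoding="unicode")


xml_out = datacite_record("10.5072/example-derived", "Carberry, Josiah", "0000-0002-1825-0097",
                          "Example derivative dataset", "SageCommons", 2011,
                          derived_from="10.5072/example-source")
print(xml_out)
```

Carrying the ORCID iD in the DOI metadata itself means the attribution link survives wherever the DOI record is harvested, independently of any separate deposit into ORCID.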




GEN2PHEN Consortium
http://www.gen2phen.org/about-gen2phen/partners

Prof Anthony J. Brookes Bioinformatics Group

This work has received funding from the European Community's Seventh Framework Programme (FP7/2007-2013) under grant agreement number 200754 - the GEN2PHEN project.


Contact me! Gudmundur ‘Mummi’ Thorisson

<gt50@le.ac.uk> | <gthorisson@gmail.com>
http://friendfeed.com/mummi

