Data Citation Principles workshop, Harvard 16-17 May 2011
http://www.orcid.org G. A. Thorisson, University of Leicester / ORCID
ORCID and data publication
Identifying knowledge contributors to motivate sharing
Gudmundur A. Thorisson <gt50@le.ac.uk>
Tony Brookes bioinformatics group
Department of Genetics, University of Leicester
-- Outline --
• Pretext: my route to workshop
• Ongoing & planned data publication projects
• Disease genetics data
• Planned integration with ORCID for researcher identification
• Role of ORCID in data publication ecosystem?
• [shameless] plug for Sept workshop on researcher identity
This work can be freely copied, redistributed and adapted, as long as proper attribution is given
Monday, 16 May 2011
Pretext
Prof Anthony J Brookes
GEN2PHEN coordinator
Chair, Bioinformatics and Genomics
Department of Genetics
University of Leicester, UK
The data sharing problem
Lack of incentives for sharing
• Effort required to prepare, package and submit datasets to public repositories
• Time better spent writing papers & grants
• All sticks (funders, journals) - no carrots
• Need incentives - treat data as publications and credit creators
“[...] Many of the issues regarding data availability can be addressed if the principles of “publication” rather than “sharing” are applied. However, online data publication systems also need to develop mechanisms for data citation and indices of data access comparable to those for citation systems in print journals”
Costello, M. Motivating Online Publication of Data. BioScience (2009) vol. 59 (5) pp. 418-427
Name ambiguity => attribution challenges
Are these authors all the same person?
G. Thorisson, University of Leicester
G. A. Thorisson, University of Leicester
G. A. Thorisson, Cold Spring Harbor Laboratory
Or these?
J. Smith, J. Smith, J. Smith, J. Smith, J. Smith [etc.]
∼2/3 of the ∼6 million authors in MEDLINE share a last name and first initial with at least one other author, and an ambiguous name refers to ∼8 persons on average.
Torvik and Smalheiser. Author name disambiguation in MEDLINE. ACM Transactions on Knowledge Discovery from Data (2009) vol. 3 (3)
How about these?
ORCID - tackling the contributor identity problem
ORCID ID: B-1242-2010
G. Thorisson, Univ. Leicester
G. A. Thorisson, Univ. Leicester
G. A. Thorisson, Cold Spring Harbor Lab.
ORCID ID: G-1442-2009
J. Smith, Univ. North Pole
ORCID ID: D-2400-2010
J. Smith, Luthor Corporation
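The idea on this slide can be sketched in a few lines of Python: once every name string carries a contributor identifier, grouping works by identifier rather than by name. This is only an illustration. The identifiers are the mock values shown on the slide, treated as opaque strings, and the function name `group_by_identifier` is made up for this sketch.

```python
from collections import defaultdict

# (name as printed on a work, contributor identifier) pairs taken from the slide.
# The identifiers are mock values and are treated as opaque strings.
RECORDS = [
    ("G. Thorisson, Univ. Leicester", "B-1242-2010"),
    ("G. A. Thorisson, Univ. Leicester", "B-1242-2010"),
    ("G. A. Thorisson, Cold Spring Harbor Lab.", "B-1242-2010"),
    ("J. Smith, Univ. North Pole", "G-1442-2009"),
    ("J. Smith, Luthor Corporation", "D-2400-2010"),
]


def group_by_identifier(records):
    """Collect all name variants that share one identifier (hypothetical helper)."""
    groups = defaultdict(list)
    for name, identifier in records:
        groups[identifier].append(name)
    return dict(groups)


groups = group_by_identifier(RECORDS)
for identifier, names in groups.items():
    print(identifier, "->", names)
```

The three Thorisson variants collapse into one person, while the two J. Smith entries stay apart even though the printed name is identical.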
Projects
Cafe Variome - facilitating exchange of genetic data
1. Diagnostic laboratories
2. Central ‘clearinghouse’
3. End-users (e.g. LSDB curators)
Publish data
Retrieve Atom feeds
Submitting mutations from diagnostic labs using “Café RouGE enabled” software via simple button click
Data are shared with diverse 3rd parties via manual retrieval or automated feed-based monitoring/retrieval
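As a rough illustration of feed-based monitoring, the sketch below parses an Atom document with Python's standard library and keeps only entries updated after a given time. The feed content is invented for the example and is not real Cafe Variome output. The helper names `list_entries` and `new_since` are also hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the Atom Syndication Format.
ATOM_NS = {"atom": "http://www.w3.org/2005/Atom"}

# Made-up feed used only for illustration.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example mutation feed</title>
  <entry>
    <id>urn:example:mutation:1</id>
    <title>Variant report 1</title>
    <updated>2011-01-21T00:00:00Z</updated>
  </entry>
  <entry>
    <id>urn:example:mutation:2</id>
    <title>Variant report 2</title>
    <updated>2011-03-02T00:00:00Z</updated>
  </entry>
</feed>"""


def list_entries(feed_xml):
    """Return id, title and updated timestamp for every entry in the feed."""
    root = ET.fromstring(feed_xml)
    return [
        {
            "id": entry.findtext("atom:id", namespaces=ATOM_NS),
            "title": entry.findtext("atom:title", namespaces=ATOM_NS),
            "updated": entry.findtext("atom:updated", namespaces=ATOM_NS),
        }
        for entry in root.findall("atom:entry", ATOM_NS)
    ]


def new_since(entries, last_seen):
    """Keep entries updated after last_seen.

    Timestamps in the same UTC format compare correctly as plain strings.
    """
    return [e for e in entries if e["updated"] > last_seen]


entries = list_entries(SAMPLE_FEED)
fresh = new_since(entries, "2011-02-01T00:00:00Z")
print([e["id"] for e in fresh])
```

A 3rd-party consumer would fetch the feed on a schedule and remember the newest timestamp it has processed.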
Cafe Variome - facilitating exchange of genetic data
Submission from diag. lab
✔ DOI assigned to incoming data upload
Metadata describing variation data published elsewhere (dbSNP (coding), UniProt, PhenCode)
× Already stable IDs so no DOI assigned
Attribution given to data submitters via ORCID unique identifier
Data shared with diverse 3rd parties and data usage/citation tracked via DOI
Publication credit for Cafe Variome deposits
CV user has linked his user account with his ORCID profile
G. Thorisson, Univ. Leicester
gthorisson@gmail.com
ORCID ID: A-883-2010
4x variants in BRCA2 gene in patient X
G. A. Thorisson (A-883-2010). 4x variants in BRCA2 gene. Published online via Cafe Variome. 21 January (2011) doi:10.1255/caferouge.BRCA2-2352354
=> http://api.caferouge.org/atomserver/v1/caferouge/mutations/2352354
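One way to picture how such a citation line could be assembled from a deposit record is sketched below. The `Deposit` class and its field names are assumptions made for this example, not Cafe Variome's actual data model. The values are the ones shown on the slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Deposit:
    """Hypothetical record for one data deposit."""
    contributor: str
    orcid_id: str   # mock identifier from the slide, treated as an opaque string
    title: str
    publisher: str
    day_month: str
    year: int
    doi: str


def format_citation(d):
    """Build a citation line in the layout used on the slide."""
    return (
        f"{d.contributor} ({d.orcid_id}). {d.title}. "
        f"Published online via {d.publisher}. {d.day_month} ({d.year}) doi:{d.doi}"
    )


deposit = Deposit(
    contributor="G. A. Thorisson",
    orcid_id="A-883-2010",
    title="4x variants in BRCA2 gene",
    publisher="Cafe Variome",
    day_month="21 January",
    year=2011,
    doi="10.1255/caferouge.BRCA2-2352354",
)
citation = format_citation(deposit)
print(citation)
```

Keeping the contributor identifier and the DOI as separate fields means the same record can feed both the human-readable citation and any machine-readable attribution link.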
GWAS nanopublications
• Foray into semantic publishing
– GWAS Central as ‘nano-publisher’
– variant<->disease assertion as nanopub
rs19243 <associatedWith> Type II diabetes + condition & provenance
• Provenance part to include:
– Contributor IDs
– Contributor roles:
• Author(s) on original GWAS paper
• Curator
• Registrant
• Citability: register DOI for nanopub?
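A nanopublication as outlined above could be modelled, very roughly, as an assertion plus conditions and a provenance list of contributor/role pairs. The class names, field names and role assignments below are invented for illustration. Only the assertion itself and the three role labels come from the slide.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Contributor:
    identifier: str  # contributor ID, e.g. a mock ORCID ID from these slides
    role: str        # one of: "author", "curator", "registrant"


@dataclass
class Nanopublication:
    """Hypothetical container: one assertion with conditions and provenance."""
    subject: str
    predicate: str
    obj: str
    conditions: dict = field(default_factory=dict)
    provenance: list = field(default_factory=list)

    def contributors_with_role(self, role):
        """Return identifiers of all contributors recorded with the given role."""
        return [c.identifier for c in self.provenance if c.role == role]


nanopub = Nanopublication(
    subject="rs19243",
    predicate="associatedWith",
    obj="Type II diabetes",
    # The role assignments here are made up for the example.
    provenance=[
        Contributor("G-1442-2009", "author"),
        Contributor("A-883-2010", "curator"),
    ],
)
print(nanopub.contributors_with_role("curator"))
```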
BRIF - measuring bioresource use and impact
• Biobanks: collections of biomaterials + associated metadata
– Identification: citing, acknowledging, tracking use of
– Evaluation: assess impact
– Attribution: crediting PIs, repository managers, technicians [?]
• Digital resources, incl. biomedical databases
– E.g. locus-specific databases (LSDBs), variation archives (e.g. Cafe Variome)
– How to acknowledge researchers who:
• Maintain vital community resource (e.g. http://www.wormbase.org )
• Undertake value-adding curation
– Micro-attribution: Giardine, B. et al. Systematic documentation and analysis of human genetic variation in hemoglobinopathies using the microattribution approach. Nature Genetics advance on, (2011). http://dx.doi.org/10.1038/ng.785
• BRIF online group: http://bit.ly/brif-group
Identifying & citing databases
• Bio-databases are often cited as a collection
– E.g. “In our analysis, we used release X of SwissProt”
“Our results were compared with the COL3A1 database as of Jan11”
– Example: OI variant database:
Dalgleish, R. (1998) Nucleic Acids Research 26(1), 253 http://dx.doi.org/10.1093/nar/26.1.253
Dalgleish, R. (1997) Nucleic Acids Research 25(1), 181 http://dx.doi.org/10.1093/nar/25.1.181
Osteogenesis Imperfecta Variant Database - https://oi.gene.le.ac.uk
• Are DOIs appropriate? - db’s are not ‘unchanging entities’
• Minimal information about a database - include DOI name?
– What does the DOI point to? URL for database site vs. URL for db description
Attributing contributions to bio-resources
• Database curation
– Management: R. Dalgleish A-3523-534-144 <maintained> 10.5335/lsdb.oi.325dff
– Temporary curator appointment: J. Smith G-1442-2009 <curated> 10.5335/lsdb.oi.325dff
– Microattribution: fine-grained tracking of curator activity (insert/update/delete)
• Biobanking activities
– Principal Investigator responsible for project (aka ‘corresponding author’)
– Laboratory personnel?
– Clinical collaborators?
Characterizing citations and contributions
• What is the nature of the resource citation?
– acknowledgement / earlier or related work
– reused data or materials
– extended methodology
– ‘..this study is flawed and complete rubbish!!’
• What is the nature of my contribution to the resource?
– Paper: authored / undertook analysis / conceived of study / designed experiment
– Dataset: created / submitted / managed
– Database: curator / manager / PI responsible
– Biobank: sample collector / day-to-day manager / ??
– Temporal aspect:
• E.g. Mummi contributed in a curator role for SwissProt Jun 2004 to Oct 2009
Semantic frameworks for scientific publishing
my study <cito:extends> Thorisson et al. 2008 doi:10.433/888544jamaX
my study <cito:usesSamplesFrom> Biobank X doi:10.424/35xxjapan.5 ??
G. Thorisson (A-523-44-3423) <pro:manager> Biobank X doi:10.424/35xxjapan??
Shotton, D., 2010. CiTO, the Citation Typing Ontology. Journal of Biomedical Semantics, 1(Suppl 1). doi:10.1186/2041-1480-1-S1-S6
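The typed statements on this slide can be pictured as simple subject/predicate/object triples. The sketch below stores them as plain Python tuples and separates citation-type statements from role-type statements by predicate prefix. It makes no claim about which predicates actually exist in CiTO or any role vocabulary: the predicate strings and the placeholder DOIs are copied from the slide, question marks omitted, and the `Statement` type and `with_prefix` helper are invented here.

```python
from typing import NamedTuple


class Statement(NamedTuple):
    subject: str
    predicate: str
    obj: str


# Copied from the slide; the DOIs there are placeholders, not real identifiers.
STATEMENTS = [
    Statement("my study", "cito:extends", "doi:10.433/888544jamaX"),
    Statement("my study", "cito:usesSamplesFrom", "doi:10.424/35xxjapan.5"),
    Statement("A-523-44-3423", "pro:manager", "doi:10.424/35xxjapan"),
]


def with_prefix(statements, prefix):
    """Select statements whose predicate starts with the given vocabulary prefix."""
    return [s for s in statements if s.predicate.startswith(prefix + ":")]


citations = with_prefix(STATEMENTS, "cito")
roles = with_prefix(STATEMENTS, "pro")
print(len(citations), "citation statements;", len(roles), "role statement")
```

Typing the link, rather than recording a bare citation, is what lets a consumer tell "extends" apart from "uses samples from" or from a contributor role.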
Role of ORCID?
• Who contributed to dataset 10.4259/psycho.5gtpq-thorisson?
• All data publications by ORCID A-883-2010 ?
• Which papers have cited the works of ORCID A-883-2010 ?
• Total no. citations to datasets by A-883-2010 in the last 2 years?
• Total no. downloads of datasets by A-883-2010?
• Which database projects has A-883-2010 contributed to?
• [...]
G. Thorisson, Univ. Leicester
gthorisson@gmail.com
ORCID ID: A-883-2010
Why track all this stuff?
Enable aggregation of contributions by unique researcher ID
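The questions listed on this slide become simple lookups once each contribution record carries a researcher identifier. The following is a minimal sketch under that assumption. The record layout, the counts and the second dataset and database entries are invented. Only the identifiers A-883-2010 and G-1442-2009 and the dataset DOI 10.4259/psycho.5gtpq-thorisson appear in the slides.

```python
# Invented contribution records; only the identifiers and the first DOI come from the slides.
CONTRIBUTIONS = [
    {"orcid_id": "A-883-2010", "kind": "dataset",
     "work": "10.4259/psycho.5gtpq-thorisson", "citations": 3, "downloads": 40},
    {"orcid_id": "G-1442-2009", "kind": "dataset",
     "work": "10.4259/psycho.5gtpq-thorisson", "citations": 3, "downloads": 40},
    {"orcid_id": "A-883-2010", "kind": "dataset",
     "work": "example-dataset-2", "citations": 1, "downloads": 10},
    {"orcid_id": "A-883-2010", "kind": "database",
     "work": "example-database", "citations": 0, "downloads": 0},
]


def contributors_to(work):
    """Who contributed to a given work?"""
    return sorted({c["orcid_id"] for c in CONTRIBUTIONS if c["work"] == work})


def works_by(orcid_id, kind):
    """All works of one kind (dataset, database, ...) by one researcher."""
    return [c["work"] for c in CONTRIBUTIONS
            if c["orcid_id"] == orcid_id and c["kind"] == kind]


def total(orcid_id, kind, metric):
    """Sum a metric (citations or downloads) over one researcher's works of one kind."""
    return sum(c[metric] for c in CONTRIBUTIONS
               if c["orcid_id"] == orcid_id and c["kind"] == kind)


print(contributors_to("10.4259/psycho.5gtpq-thorisson"))
print(works_by("A-883-2010", "dataset"))
print(total("A-883-2010", "dataset", "downloads"))
```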
Current ORCID status & timeline
• Alpha prototype
– Running on a sandbox website for limited testing
• partial functionality - based on ResearcherID software
• Early adopters / collaborators
• Looking to collaborate with projects
– Gather use cases => feed requirements for ORCID core system
– WHERE/HOW might ORCID be used to identify contributors?
– Joint fund-seeking to do pilot implementations
• Timeline for live beta system: early 2012
Example: SageCite?
• i) dataset published in SageCommons
– assigned DOI via DataCite
– attribution link deposited in ORCID
• ii) derivative datasets published in SageCommons
– assigned DOI => DataCite
– attribution link deposited in ORCID
• iii) analysis workflow published via myExperiment
– attribution => ORCID (creator/submitter & others who contributed)
– DOI (or not - not essential?)
Acknowledgements
GEN2PHEN Consortium http://www.gen2phen.org/about-gen2phen/partners
Prof Anthony J. Brookes Bioinformatics Group
This work has received funding from the European Community's Seventh Framework Programme (FP7/2007-2013) under grant agreement number 200754 - the GEN2PHEN project.
Contact me! Gudmundur ‘Mummi’ Thorisson
<gt50@le.ac.uk> | <gthorisson@gmail.com>
http://friendfeed.com/mummi
http://www.linkedin.com/in/mummi
http://www.twitter.com/gthorisson