Show me the data!
Data peer review at Scientific Data
Varsha Khodiyar, Scientific Data
30.03.2017
Scientific Data, a Nature Research journal
Data Descriptor: Primary article type; sound science and facilitates data reuse
Analysis: New analyses or meta-analyses of existing data
Article: Original reports on advances in data sharing & reuse
Comment: Announcements of broad interest; usually invited
www.nature.com/scientificdata
Under the hood of a Data Descriptor
• Context for data generation (background)
• How was data generated?
• How was data processed?
• Where is the data?
• Synthesis
• Analysis
• Conclusions
A key principle of publishing at Scientific Data
Wilkinson M.D., et al. The FAIR Guiding Principles for scientific data management and stewardship. Scientific Data 3, 160018 (2016). doi:10.1038/sdata.2016.18
Findable – (meta)data is uniquely and persistently identifiable.
Accessible – data is reachable and accessible by humans and machines, using standard formats and protocols.
Interoperable – (meta)data is machine readable and annotated with resolvable vocabularies and ontologies.
Reusable – (meta)data is sufficiently well-described to allow integration with compatible data.
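As a rough illustration of what these four principles could mean for a single metadata record, the sketch below runs one simple check per principle. The field names (`identifier`, `access_url`, `keywords`, `term_uri`, `description`, `licence`), the example vocabulary URI and the checks themselves are assumptions made for illustration; they are not taken from the FAIR paper or from any Scientific Data schema.

```python
import re

# Hypothetical metadata record; all field names are illustrative assumptions.
record = {
    "identifier": "doi:10.1038/sdata.2016.18",
    "access_url": "https://doi.org/10.1038/sdata.2016.18",
    "keywords": [
        {"label": "data stewardship", "term_uri": "http://example.org/vocab/0001"}
    ],
    "description": "Guiding principles for scientific data management.",
    "licence": "CC BY 4.0",
}

# Loose DOI shape check (assumption: identifiers are written as "doi:10.xxxx/...").
DOI_PATTERN = re.compile(r"^doi:10\.\d{4,9}/\S+$")


def fair_checks(rec):
    """Return rough, illustrative pass/fail checks, one per FAIR principle."""
    keywords = rec.get("keywords", [])
    return {
        # Findable: a persistent identifier is present.
        "findable": bool(DOI_PATTERN.match(rec.get("identifier", ""))),
        # Accessible: reachable over a standard protocol.
        "accessible": rec.get("access_url", "").startswith(("http://", "https://")),
        # Interoperable: every keyword points at a resolvable vocabulary term.
        "interoperable": bool(keywords) and all(k.get("term_uri") for k in keywords),
        # Reusable: described well enough to judge reuse (description plus licence).
        "reusable": bool(rec.get("description")) and bool(rec.get("licence")),
    }


print(fair_checks(record))
```

Checks like these can only flag obvious gaps; whether a description is sufficient for reuse still needs human judgement.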
Data Descriptors have human and machine understandable components
Human readable representation of study, i.e. article (HTML & PDF)
Data Descriptors have human and machine understandable components
Machine accessible representation of study, i.e. metadata
What types of data can be published?
Decades old dataset
Standalone dataset
Data that has been used in an analysis article
Large consortium dataset
Data from a single experiment
Any data that the researcher finds valuable and that others might find useful too
Data associated with a high impact analysis article
When can a Data Descriptor be published?
After data analysis has been published
Before analysis has been published
Authors not intending to analyse data
Publication alongside analysis article
Data Descriptors can be submitted and published at any point in the research workflow, i.e. whenever it makes most sense for your data
Why peer review data?
Researchers are sharing and reusing data
• Direct contact between researchers (on request) is the most common way of sharing data
• Repositories are the second most common method of sharing
Why might direct contact be the most preferred method?
Fig 2A & C; Kratz and Strasser, PLOS ONE (2015) doi: 10.1371/journal.pone.0117619
Researchers see peer review as a mark of data quality
• Respondents trust peer review above all else: 72% (n = 175) say peer review confers high or complete confidence in the data
Figure 6B; Kratz and Strasser, PLOS ONE (2015) doi: 10.1371/journal.pone.0117619
How is data peer reviewed at Scientific Data?
Editorial office
Susanna-Assunta Sansone, Honorary Academic Editor
Andrew L. Hufton, Managing Editor
Varsha K. Khodiyar, Data Curation Editor
Selection of Editorial Board members
Experts in their discipline
AND
Demonstrable experience of data standards, data reuse or data analysis in their discipline
www.nature.com/sdata/about/editorial-board#eb
Data peer review
www.nature.com/sdata/policies/for-referees
Experimental Rigor and Technical Data Quality
Were data produced in a sound manner?
Technical quality of data – appropriate statistical analyses?
Experimental rigor - appropriate depth, coverage?
Completeness of the Description
Sufficient detail to allow others to reproduce these steps?
Sufficient detail to allow others to reuse this data?
Consistent with relevant minimum reporting standards?
Integrity of the Data Files and Repository Record
Do data files appear complete and match manuscript descriptions?
Are data archived to the most appropriate repository?
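One way an author or reviewer might approach the file-integrity question is to compare the files in an archive against a manifest of expected file names and checksums. The sketch below is a minimal illustration under that assumption, not a description of Scientific Data's actual procedure; the manifest format (file name mapped to SHA-256 digest) and the function names `sha256_of` and `check_archive` are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path):
    """Compute the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_archive(folder, manifest):
    """Compare files in `folder` against `manifest` (file name -> expected digest).

    Returns three sorted lists: files listed but missing, files whose checksum
    differs from the manifest, and files present but not listed.
    """
    folder = Path(folder)
    present = {p.name for p in folder.iterdir() if p.is_file()}
    missing = sorted(set(manifest) - present)
    unexpected = sorted(present - set(manifest))
    changed = sorted(
        name
        for name in set(manifest) & present
        if sha256_of(folder / name) != manifest[name]
    )
    return missing, changed, unexpected
```

A call such as `check_archive("my_dataset", manifest)` returning three empty lists suggests the archive matches the manifest; it says nothing about whether the contents are scientifically sound, which remains the reviewer's job.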
Metadata curation and final data checking
We capture metadata about the dataset being described in each Data Descriptor.
During the metadata curation process
• Manuscript re-read
• Data archive checked
• Minor issues with the data and/or manuscript often identified
Why a Data Descriptor may be rejected
Reject without review
• Out of scope or no data present
Reject after review
• Serious flaws in the study design, e.g. lack of crucial controls
• Serious issues identified in the data files by the peer reviewers
After rejection
• Address concerns and resubmit to Scientific Data
• Resubmit to another data journal
• Withdraw data from Scientific Data integrated repositories
Data should be technically reliable and suitable for use by others
Ensuring your data is peer review ready
Create a data management plan
• Can avoid problems later
• Increasingly required by funders
• Critically evaluate existing practices – you may be setting standards for your field
• Some aspects of best practice may incur costs
• Find people and resources that can help you
Datasets, Code, Metadata, Research paper
Nature Genetics
Archive your data to the most appropriate repository
We currently list around 90 repositories, across biological, medical, physical and social sciences
www.nature.com/sdata/policies/repositories
Considerations:
1. Is there a discipline or data-specific repository for your data?
2. If no discipline or data-specific repository for your data exists, does your funder or institution mandate deposition to a particular repository?
Spot the mistakes
Unhelpful document name
Formatting used to convey information
Special characters can cause text mining errors
Meaningless column titles
Undefined abbreviation
No units are given
Increasing intelligibility
Self-explanatory document name
Removed cell formatting
Removed special characters
Meaningful column titles
Defined ‘BUN’
Increasing assessability
Information which was asterisked is now added to results section
Added Units column
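The kinds of fixes listed on these slides can be sketched in a few lines of code. The example below is a minimal illustration only: the input table, the column names, the mapping dictionaries and the choice of mg/dL as the unit are all hypothetical, and it records the asterisked information in an explicit `flagged` column rather than in a results section. The only domain fact relied on is that BUN is the standard abbreviation for blood urea nitrogen.

```python
import csv
import io

# Hypothetical "before" table: vague headers, an undefined abbreviation,
# an asterisk used as a flag, and no units.
raw = """ID,val1,val2
1,BUN,14*
2,BUN,22
"""

# Assumed mappings that the dataset author would supply.
COLUMN_NAMES = {"ID": "patient_id", "val1": "measurement", "val2": "value"}
ABBREVIATIONS = {"BUN": "blood urea nitrogen (BUN)"}
UNITS = {"BUN": "mg/dL"}


def tidy(text):
    """Rename columns, expand abbreviations, add units, and move flags to a column."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        clean = {COLUMN_NAMES[key]: value.strip() for key, value in row.items()}
        code = clean["measurement"]
        # An asterisk is formatting that carries meaning; record it explicitly.
        clean["flagged"] = "yes" if clean["value"].endswith("*") else "no"
        clean["value"] = clean["value"].rstrip("*")
        clean["units"] = UNITS.get(code, "")
        clean["measurement"] = ABBREVIATIONS.get(code, code)
        rows.append(clean)
    return rows


def to_csv(rows):
    """Serialise the tidied rows back to CSV text."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=list(rows[0]), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


print(to_csv(tidy(raw)))
```

Writing the result as plain CSV, rather than a formatted spreadsheet, means nothing depends on cell colour or special characters to be understood.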
Increasing re-usability
Additional information to be added to methods section or table legend
Increasing reproducibility
• Include any additional information needed to understand the data, methods, parameters, e.g. which instrument (make and model) was used to measure blood carbon dioxide levels?
• Include availability statements for any code that was used to view, parse or analyse the data, in support of the conclusions.
Reporting Guidelines
What happens when data is shared well?
Data reuse by other researchers in the same field
“The Data Descriptor made it easier to use the data, for me it was critical that everything was there…all the technical details like voxel size.”
Professor Daniele Marinazzo
Data reuse by the non-research community
www.bbc.co.uk/news/science-environment-33057402
Data reuse by the non-research community
http://www.nytimes.com/interactive/2014/12/30/science/history-of-ebola-in-24-outbreaks.html
Data peer review at Scientific Data
Data Archive
• Checked multiple times
• Scientific reasoning underlying data reviewed by active researchers
• Technical validity reviewed by discipline experts
Data Citations
• Citation accuracy confirmed by specialist editor
• Citation format checked by editorial team
• Data linkage tested by production team
Data Peer Review
• Does not have to be onerous
• Can save overall reviewing time
• Results in data that is reusable and useful!