SCIENCE IN THE OPEN, WHAT DOES IT TAKE?
MELISSA HAENDEL
DATA JAMBOREE, MARCH 3RD, 2017
@ontowonka
THERE ARE OVER 1500 PUBLIC DATABASES IN THE NUCLEIC ACIDS RESEARCH DATABASE COLLECTION
https://doi.org/10.1093/nar/gkw1188
HOW MANY OF THESE ARE TRULY OPEN?
OPENNESS IS AN NAR REQUIREMENT, BUT …
OPEN DATA IS FAIR DATA
http://www.nature.com/articles/sdata201618
Findable Accessible Interoperable Reusable
ANATOMY OF FAIRNESS
Metadata
Identifiers
Registration
Preservation
Standards
McMurry et al Identifiers for the 21st century bit.ly/identifiers-revision-2017
WHAT MAKES DATA FAIR: FINDABLE
F1. (meta)data are assigned a globally unique and persistent identifier
F2. data are described with rich metadata (defined by R1 below)
F3. metadata clearly and explicitly include the identifier of the data it describes
F4. (meta)data are registered or indexed in a searchable resource
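A minimal sketch of how F1 to F4 might be checked for a single metadata record. The record layout (`identifier`, `describes`, `title`, `description`, `keywords`), the in-memory `index`, and the `example.org` identifiers are all assumptions made for illustration; the FAIR principles do not prescribe any of them, and the F1 test is only a crude proxy for "globally unique and persistent".

```python
def findability_report(record, index):
    """Return a dict mapping F1..F4 to True/False for one metadata record."""
    identifier = record.get("identifier", "")
    return {
        # F1: crude proxy, the identifier looks like a resolvable global ID
        "F1": identifier.startswith(("http://", "https://", "doi:")),
        # F2: the descriptive fields we (hypothetically) require are all filled in
        "F2": all(record.get(key) for key in ("title", "description", "keywords")),
        # F3: the metadata explicitly names the data it describes
        "F3": bool(record.get("describes")),
        # F4: the record is registered in a searchable index
        "F4": identifier in index,
    }


# Hypothetical record and index, for illustration only.
record = {
    "identifier": "https://example.org/metadata/123",
    "describes": "https://example.org/dataset/123",
    "title": "Example dataset",
    "description": "A placeholder description.",
    "keywords": ["example", "placeholder"],
}
index = {record["identifier"]: record}

report = findability_report(record, index)
print(report)
```

A record that fails any of the four checks is a candidate for curation before it is registered.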
WHAT MAKES DATA FAIR: ACCESSIBLE
A1. (meta)data are retrievable by their identifier using a standardized communications protocol
A1.1 the protocol is open, free, and universally implementable
A1.2 the protocol allows for an authentication and authorization procedure, where necessary
A2. metadata are accessible, even when the data are no longer available
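As one possible reading of A1, HTTP is an open, free, universally implementable protocol that also allows authentication where necessary. The sketch below only builds the request and does not send it. The function name `build_metadata_request`, the `application/json` media type, the bearer-token scheme, and the `example.org` identifier are assumptions for illustration, not requirements of the principles.

```python
import urllib.request


def build_metadata_request(identifier, token=None):
    """Build (but do not send) an HTTP GET request for the metadata behind an identifier."""
    headers = {"Accept": "application/json"}  # assumed media type for metadata
    if token is not None:
        # A1.2: authentication and authorization only where necessary
        headers["Authorization"] = "Bearer " + token
    return urllib.request.Request(identifier, headers=headers)


# Hypothetical identifier, for illustration only.
request = build_metadata_request("https://example.org/dataset/123")
print(request.get_method(), request.full_url)

# To actually retrieve the metadata, one would pass the request to
# urllib.request.urlopen(request); that step is omitted here.
```

A2 is a policy on the server side: the identifier should keep resolving to metadata even after the data themselves are withdrawn.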
WHAT MAKES DATA FAIR: INTEROPERABLE
I1. (meta)data use a formal, accessible, shared, and broadly applicable language for knowledge representation.
I2. (meta)data use vocabularies that follow FAIR principles
I3. (meta)data include qualified references to other (meta)data
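A qualified reference (I3) can be sketched as a subject, predicate, object triple in which every term is a full identifier rather than a bare string. In this sketch the `EX` prefix, the `has_phenotype` predicate, and the helper names are hypothetical; the `HP` prefix follows the OBO PURL pattern, and the local ID is used only as a placeholder.

```python
# Prefix map used to expand compact identifiers (CURIEs) into full IRIs.
PREFIXES = {
    "HP": "http://purl.obolibrary.org/obo/HP_",
    "EX": "https://example.org/",  # hypothetical namespace
}


def expand(curie):
    """Expand a CURIE such as 'HP:0001250' into a full IRI."""
    prefix, local_id = curie.split(":", 1)
    return PREFIXES[prefix] + local_id


def qualified_reference(subject, predicate, obj):
    """Return a (subject, predicate, object) triple of full IRIs."""
    return tuple(expand(term) for term in (subject, predicate, obj))


triple = qualified_reference("EX:patient/42", "EX:has_phenotype", "HP:0001250")
print(triple)
```

The predicate is what makes the reference "qualified": it states how the two records relate, not merely that they are linked.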
WHAT MAKES DATA FAIR: REUSABLE
R1. (meta)data are richly described with a plurality of accurate and relevant attributes
R1.1. (meta)data are released with a clear and accessible data usage license
R1.2. (meta)data are associated with detailed provenance
R1.3. (meta)data meet domain-relevant community standards
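The three R1 sub-principles lend themselves to a simple completeness check. The sketch below reports which are unmet for a metadata record; the field names `license`, `provenance`, and `conforms_to`, and the example values, are assumptions for illustration. It only tests that each field is present, not that its content is adequate.

```python
def reusability_gaps(record):
    """List the R1 sub-principles a metadata record does not yet satisfy."""
    gaps = []
    if not record.get("license"):
        gaps.append("R1.1: no data usage license declared")
    if not record.get("provenance"):
        gaps.append("R1.2: no provenance recorded")
    if not record.get("conforms_to"):
        gaps.append("R1.3: no community standard declared")
    return gaps


# Hypothetical record with a license but nothing else.
record = {"license": "CC-BY-4.0"}
gaps = reusability_gaps(record)
for gap in gaps:
    print(gap)
```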
LET'S CALL OUT A FEW MORE THINGS.
https://zenodo.org/record/203295
Findable Accessible Interoperable Reusable
FAIR-TLC
Traceable Licensed Connected
FAIR-TLC: TRACEABILITY
T1: Provenance
The data’s provenance is well documented and attributed (the data within the resource)
T2: Attribution
The contributions to the content (data, tools, algorithms, sources, etc.) are clearly declared.
Documentation on how to cite a record from a source or the whole resource
FAIR-TLC: LICENSURE
http://peterdesmet.com/posts/analyzing-gbif-data-licenses.html
Not all data resources are free to use, derive, and redistribute, even if they are publicly funded and seemingly publicly available.
FAIR-TLC: CONNECTED
BECAUSE AGGREGATED != INTEGRATED
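The difference between aggregated and integrated can be shown with a small sketch. Aggregation stacks records from several sources side by side; integration additionally resolves records about the same entity to one shared identifier. All records, field names, identifiers, and the `same_as` mapping below are invented for illustration, and the "first value wins" merge rule is one simple choice among many.

```python
# Two hypothetical sources describing the same entity under different local IDs.
source_a = [{"id": "A:17", "name": "gene-x", "expression": "high"}]
source_b = [{"id": "B:942", "name": "GENE X", "variant_count": 3}]

# Hypothetical mapping from local IDs to a shared identifier.
same_as = {"A:17": "EX:0001", "B:942": "EX:0001"}


def aggregate(*sources):
    """Aggregation: concatenate records without reconciling them."""
    return [record for source in sources for record in source]


def integrate(records, same_as):
    """Integration: merge records that map to the same shared identifier."""
    merged = {}
    for record in records:
        key = same_as.get(record["id"], record["id"])
        entity = merged.setdefault(key, {"id": key, "sources": []})
        entity["sources"].append(record["id"])  # keep track of where data came from
        for field, value in record.items():
            if field != "id":
                entity.setdefault(field, value)  # first value seen wins
    return merged


aggregated = aggregate(source_a, source_b)
integrated = integrate(aggregated, same_as)
print(len(aggregated), "aggregated records")
print(len(integrated), "integrated entities")
```

Keeping the `sources` list on each merged entity preserves traceability back to the contributing records.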
FAIR-TLC AS AN EVAL RUBRIC
Room for improvement
bit.ly/open-science-prize
Open imaging
DISCUSSION: HOW DO WE DO BETTER?
Make the right thing the easy thing:
- Carrots:
  - Tenure & promotion cycles
  - Dedicated funding for increasing FAIR-TLC
- Sticks:
  - Publication requirements
  - Funding requirements
- Tools:
  - Tracking tools
  - Documentation tools
  - Social tools
FAIR-TLC = OPEN SCIENCE
Findable Accessible Interoperable Reusable
Traceable Licensed Connected
Coming soon: reusabledata.org