FAIR Assessment of Research Data
Anusuriya Devaraju
PANGAEA, University of Bremen
Open Science Workshop, University of Ljubljana
23rd January 2020
How much do you know about the FAIR principles?
A. Haven’t heard of it
B. Heard of it, but don’t know much about it
C. Know the principles, but not in great detail
D. I know everything about FAIR – Champion
Devaraju, A., FAIR Assessment of Research Data, Open Science Workshop 2020.
Research Data
• Research data is any information that has been collected, observed, generated or created to validate research findings.*
• Examples of research data
*https://library.leeds.ac.uk/info/14062/research_data_management/61/research_data_management_explained icons: https://icons8.com
FAIR Principles
• The idea emerged from a Lorentz Workshop in 2014, and the principles were published* in 2016.
• The FAIR principles are high-level guidelines to improve the reuse of digital resources, by both humans and machines.
• Digital resources include, e.g., data, software, and workflows.
Findable Accessible Interoperable Reusable
*https://doi.org/10.1038/sdata.2016.18 icons: https://icons8.com
FAIR Principles Adoption
• Endorsed and recommended by funders, publishers, communities, etc.
https://doi.org/10.1038/d41586-019-00075-3
https://libereurope.eu/blog/2018/07/13/fairdataconsultation/liber-fair-data-2/
https://www.dfg.de/en/research_funding/announcements_proposals/2019/info_wissenschaft_19_37/index.html
https://ec.europa.eu/research/participants/data/ref/h2020/grants_manual/hi/oa_pilot/h2020-hi-oa-data-mgt_en.pdf
https://www.force11.org/group/fairgroup/fairprinciples
To Be Findable:
F1. (meta)data are assigned a globally unique and eternally persistent identifier.
F2. data are described with rich metadata.
F3. metadata specify the data identifier.
F4. (meta)data are registered or indexed in a searchable resource.
To Be Accessible:
A1. (meta)data are retrievable by their identifier using a standardized communications protocol.
A1.1. the protocol is open, free, and universally implementable.
A1.2. the protocol allows for an authentication and authorization procedure, where necessary.
A2. metadata are accessible, even when the data are no longer available.
To Be Interoperable:
I1. (meta)data use a formal, accessible, shared, and broadly applicable language for knowledge representation.
I2. (meta)data use vocabularies that follow FAIR principles.
I3. (meta)data include qualified references to other (meta)data.
To Be Re-usable:
R1. meta(data) have a plurality of accurate and relevant attributes.
R1.1. (meta)data are released with a clear and accessible data usage license.
R1.2. (meta)data are associated with their provenance.
R1.3. (meta)data meet domain-relevant community standards.
The 15 FAIR Principles: Wilkinson, M., Dumontier, M., Aalbersberg, I. et al. The FAIR Guiding Principles for scientific data management and stewardship. Sci Data 3, 160018 (2016). doi:10.1038/sdata.2016.18
Example (F1 Principle)
(Meta)data are assigned a globally unique and eternally persistent identifier.
http://dataservices.gfz-potsdam.de/panmetaworks/showshort.php?id=escidoc:4047893
http://doi.org/10.5880/fidgeo.2019.011
Resolver Service
Cite
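The mapping from DOI to resolvable URL shown above can be sketched in a few lines. This is an illustrative helper only (the function name and prefix handling are assumptions, not part of any FAIRsFAIR tool): a DOI becomes actionable by prefixing it with the doi.org resolver service.

```python
# Illustrative sketch: turn a DOI into a resolvable URL via the doi.org
# resolver, as in the example above. Not part of any official tool.

DOI_RESOLVER = "https://doi.org/"

def doi_to_url(identifier: str) -> str:
    """Normalize a DOI string into a resolvable URL."""
    doi = identifier.strip()
    # Strip an existing resolver prefix or the 'doi:' scheme, if present.
    for prefix in ("https://doi.org/", "http://doi.org/", "doi:"):
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    return DOI_RESOLVER + doi

print(doi_to_url("10.5880/fidgeo.2019.011"))
# https://doi.org/10.5880/fidgeo.2019.011
```

The same resolver URL works for both humans (landing page) and machines (content negotiation), which is why persistent identifiers underpin several of the principles that follow.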
Example (F4 Principle)
(Meta)data are registered or indexed in a searchable resource.
A dataset published via PANGAEA (A Trustworthy Repository for Earth and Environmental Science Data)
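One common way a repository makes dataset pages indexable by search engines is to embed structured schema.org metadata (JSON-LD) in the landing page. The sketch below is illustrative only, not PANGAEA's actual markup: it extracts an embedded JSON-LD block from a toy HTML page using just the Python standard library.

```python
import json
from html.parser import HTMLParser

class JsonLdExtractor(HTMLParser):
    """Collect the contents of <script type="application/ld+json"> blocks."""
    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self.blocks = []
    def handle_starttag(self, tag, attrs):
        if tag == "script" and ("type", "application/ld+json") in attrs:
            self._in_jsonld = True
    def handle_data(self, data):
        if self._in_jsonld:
            self.blocks.append(json.loads(data))
    def handle_endtag(self, tag):
        if tag == "script":
            self._in_jsonld = False

# A toy landing page with embedded schema.org metadata (values are invented).
PAGE = """<html><head>
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "Dataset",
 "name": "Example dataset", "identifier": "https://doi.org/10.5880/example"}
</script>
</head><body>...</body></html>"""

parser = JsonLdExtractor()
parser.feed(PAGE)
dataset = parser.blocks[0]
print(dataset["@type"], "-", dataset["name"])
# Dataset - Example dataset
```

Search-engine crawlers (and assessment tools) can read this block without scraping the visible page, which directly supports F4.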
Finding Relevant & Trustworthy Repositories
The Registry of Research Data Repositories, https://www.re3data.org/
FAIR Data Assessment
• FAIR principles are open to various interpretations.
• Metrics (also called indicators) and associated tools have been developed to measure FAIRness.
[Screenshot: the Findable principles (F1–F4) as presented in the 5-star assessment tool, http://oznome.csiro.au/5star/]
FAIR Assessment
• For a comparison of existing FAIR assessment tools, see Christophe Bahim, Makx Dekkers, Brecht Wyns (2019). Results of an Analysis of Existing FAIR assessment tools. Research Data Alliance. DOI: 10.15497/RDA00035
• Assessment mechanisms
• Manual self-assessment (survey, checklist)
• Automatic assessment
• A combination of the above
FAIRassist, https://fairassist.org/
From Principles to Practice
• FAIR principles are not strict measures, but rather ‘fundamental concepts’ which should be further developed and implemented.
• Be inclusive – disciplinary practices are heterogeneous (& valid).
• Who – Identify and keep the stakeholders in the loop!
• When – Integrate FAIR assessment as part of the research data lifecycle.
• FAIR assessment mechanisms
• Manual assessment (e.g., by data authors and stewards) provides insightful analysis, but is not feasible at scale.
• Automatic assessment saves costs, but at present not all components of the research data ecosystem are machine-friendly.
• Both technical and non-technical (organizational, educational) solutions enabling FAIR data are important.
• FAIR Data vs. Open Data: FAIR data is not necessarily open, and open data is not necessarily FAIR.
FAIRsFAIR - Fostering FAIR Data Practices in Europe
• Aims to supply practical solutions for the use of the FAIR data principles throughout the research data life cycle.
• Budget: €10 million
• Start date: 1st March 2019
• Duration: 36 months
• 22 partners from 8 member states
• Project coordinator: DANS-KNAW
https://www.fairsfair.eu
Work Package 4: FAIR Certification
• FAIR assessment of data objects in trustworthy repositories
• Use case-based iterative approach building on existing work on FAIR data assessment.
• FAIR-enabling repositories
• Alignment of FAIR with existing CoreTrustSeal core repository certification requirements.
• European network of Trustworthy Digital Repositories (TDRs)
• Collaborate with external repositories through the FAIRsFAIR Open Calls.
• FAIRsFAIR provides support for achieving CoreTrustSeal certification and improving data FAIRness.
Task 4.5
Task 4.5 (FAIR Assessment of Data Objects)
1. Identify use cases (& stakeholders involved)
2. Develop requirements (including metrics) to assess the FAIRness of a data object in the context of the selected use cases.
3. Implement a toolset that enables the stakeholders to evaluate the FAIRness of their data objects.
FAIR stakeholders; figure derived from 8.3 Stakeholder Groups Assigned Actions (European Commission Expert Group on FAIR Data, 2018). Dotted lines represent the stakeholders of the FAIRsFAIR Task 4.5 use cases.
Use Cases and Stakeholders
Use Case 1
• Evaluator: Researchers
• Description: Researchers self-assess their data objects manually to check certain aspects of the objects' FAIRness before depositing the objects in a repository.
• Repository of interest: DANS EASY
• Repository domains: humanities, health sciences, social and behavioural sciences, oral history and spatial sciences

Use Case 2
• Evaluator: Trustworthy data repositories
• Description: A trustworthy data repository automatically assesses its published datasets for their level of FAIRness.
• Repository of interest: PANGAEA
• Repository domain: Earth and environmental sciences
[Figure: data publication timeline. Before publication, researchers need FAIR awareness and education; after publication, trustworthy data repositories perform FAIR data assessment of published datasets. Images: Michael Thompson and Parkjisun, Noun Project]
Pilot testing will cover 3 additional repositories, e.g., repositories selected through the FAIRsFAIR Open Calls.
FAIRsFAIR Data Assessment Metrics
• There are 13 metrics built on existing work:
• RDA FAIR Data Maturity Model
• DANS Fairdat/FAIREnough
• WDS/RDA Assessment of Data Fitness
• Iteratively improve and extend the metrics through a number of pilot tests.
FAIRsFAIR metrics, by FAIR principle, with metric identifier, short name, question, and maturity levels:

F1 – FsF-F1-01D (Universally Unique Identifier)
Does the data have a universally unique identifier assigned?
• Yes • No

F1 – FsF-F1-02D (Persistent Identifier)
Does the data have a persistent identifier assigned?
• Yes • No

F2 – FsF-F2-01M (Descriptive Metadata)
Are metadata elements to support data citation and discovery provided (e.g., creator, title, data identifier, publisher, publication date/year, summary/keywords describing the data)?
• Not provided • Partially provided • Completely provided

F3 – FsF-F3-01M (Inclusion of data identifier in metadata)
Does the metadata include the data identifier?
• Yes • No

F4 – FsF-F4-01M (Searchable metadata)
Is the metadata offered in such a way that it can be harvested?
• Metadata is not offered
• Metadata is offered through a metadata registry (e.g., general-purpose, domain/discipline-specific, or institutional registries)
• Metadata is offered as structured data on the data page for use by a web search engine
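As a rough illustration of how a maturity-level metric such as FsF-F2-01M could be evaluated automatically: the sketch below checks which citation/discovery elements are present in a metadata record and maps the result to a level. The element list, function name, and thresholds are assumptions for illustration, not the official FAIRsFAIR implementation.

```python
# Hypothetical sketch of an FsF-F2-01M-style check (element list and
# thresholds are illustrative, not the official FAIRsFAIR definition).

CITATION_ELEMENTS = {"creator", "title", "identifier", "publisher",
                     "publication_date", "summary", "keywords"}

def descriptive_metadata_level(metadata: dict) -> str:
    """Map the available citation/discovery elements to a maturity level."""
    present = {k for k, v in metadata.items() if k in CITATION_ELEMENTS and v}
    if not present:
        return "Not provided"
    if present == CITATION_ELEMENTS:
        return "Completely provided"
    return "Partially provided"

record = {"title": "Soil moisture measurements", "creator": "A. Author"}
print(descriptive_metadata_level(record))   # Partially provided
print(descriptive_metadata_level({}))       # Not provided
```

Binary metrics (e.g., FsF-F3-01M) reduce to a single membership test, while graded metrics like this one need an explicit mapping from evidence to maturity level.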
Toolset Implementation
* Wilkinson, Mark D., Michel Dumontier, Susanna-Assunta Sansone, Luiz Olavo Bonino da Silva Santos, Mario Prieto, Dominique Batista, Peter McQuilton, et al. 2019. ‘Evaluating FAIR Maturity through a Scalable, Automated, Community-Governed Framework’. Scientific Data 6 (1): 1–12. https://doi.org/10.1038/s41597-019-0184-5
• Use Case 1 (Researchers): prototype available at https://satifyd.dans.knaw.nl/
• Use Case 2 (Data Repositories): adapt and reuse an existing automatic assessment tool* to evaluate published data objects.
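Automatic assessment for Use Case 2 depends on machine-testable rules. For example, the persistent-identifier metric (FsF-F1-02D) can be approximated by recognizing common PID schemes syntactically. The sketch below is a hypothetical simplification (patterns and names are assumptions): a real tool would also attempt to resolve the identifier.

```python
import re

# Hypothetical sketch of an automated persistent-identifier check
# (FsF-F1-02D-style); real tools would also resolve the identifier.

PID_PATTERNS = {
    "doi":    re.compile(r"^(https?://(dx\.)?doi\.org/|doi:)?10\.\d{4,9}/\S+$", re.I),
    "handle": re.compile(r"^(https?://hdl\.handle\.net/|hdl:)\S+$", re.I),
    "urn":    re.compile(r"^urn:[a-z0-9][a-z0-9-]{0,31}:\S+$", re.I),
}

def identifier_scheme(identifier: str):
    """Return the recognized persistent-identifier scheme, or None."""
    for scheme, pattern in PID_PATTERNS.items():
        if pattern.match(identifier.strip()):
            return scheme
    return None

print(identifier_scheme("https://doi.org/10.5880/fidgeo.2019.011"))  # doi
print(identifier_scheme("https://example.org/my-dataset"))           # None
```

A plain repository URL fails the check even though it is unique, which is exactly the distinction the F1 metrics draw between "universally unique" and "persistent".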
Conclusions
• Use-case driven approach
• Iterative implementation and testing (through pilots)
• Adapt and reuse
Upcoming Milestones
• Feb 2020 - D4.1: Draft recommendations on requirements for FAIR data objects in trustworthy data repositories
• Aug 2020 – M4.9 Pilots of FAIR data assessment (DANS EASY, PANGAEA)
• Aug 2021 – D4.5 FAIR Assessment of data objects in 5 repositories implemented and tested.
If you would like to participate in pilot testing, please contact [email protected]
FAIRsFAIR Upcoming Activities
• Open Call for FAIR Champions (apply by 31st January 2020), https://www.fairsfair.eu/form/open-call-european-fair-champions
• We value your feedback!
• FAIRsFAIR Landscape Analysis & Competence Centres outputs, https://www.fairsfair.eu/fairsfair-landscape-analysis-competence-centres-outputs
• FAIRsFAIR workshop at IDCC2020 (register by 31st January 2020), https://www.fairsfair.eu/events/ten-things-you-can-do-support-fair-data-culture-fairsfair-workshop-idcc20
• FAIRsFAIR webinar on requirements for persistence and interoperability, 11th February, 11:00 AM CET, https://www.fairsfair.eu/persistence-and-interoperability-fair-research-data-management
• For more related events and outreach activities, please visit https://www.fairsfair.eu/