Research data management: Open Research Data pilot, data management (plans), FAIR data, data...

Date posted: 11-Apr-2017
Research data management: Open Research Data pilot, data management (plans), FAIR data, data repositories, metadata
Everlasting project, General Assembly, TU/e, 21-03-2017

Introducing myself, the IEC/Library and the RDM Programme of TU/e


1. Horizon 2020: Open Research Data pilot
2. Requirements of the Open Research Data pilot
   + Data management plan and FAIR data
   + Depositing research data and 4TU.Centre for Research Data
3. Metadata

Horizon 2020: guiding principles for research data management

1. Scientific integrity
   + Traceability of research results, from the figure in a paper to the underlying raw data
2. Reuse
   + Build on previous results (data-driven science)
   + Encourage collaboration/avoid duplication of effort (collaborative science)
   + Innovation/progress to market
   + Involve citizens and society

Ideological shift: from trust to responsibility and accountability

These guiding principles are mentioned in “Guidelines on open access to scientific publications and research data in Horizon 2020” (p. 5). They return in the Open Research Data pilot of Horizon 2020.

Horizon 2020: the Open Research Data (ORD) pilot

“The ORD pilot aims to improve and maximize access to and re-use of research data generated by Horizon 2020 projects…”

“The ORD pilot applies primarily to the data needed to validate the results presented in scientific publications.”

“Good research data management is not a goal in itself, but rather the key conduit leading to knowledge discovery and innovation…”

Some quotes from “Guidelines on FAIR data management in Horizon 2020” (p. 3) illustrating the two general guiding principles just mentioned, as they apply to the ORD pilot of Horizon 2020.

ORD pilot: is participation mandatory?

Participation is the default option. However, projects can opt out at any stage, including after signature of the agreement.

Participation is encouraged! See also: “Guidelines on open access to scientific publications and research data in Horizon 2020”, p. 8

ORD pilot: is open access of data the goal?

The ORD pilot follows the principle “as open as possible, as closed as necessary”. It recognizes “…the need to balance openness and protection of scientific information, commercialisation and Intellectual Property Rights (IPR), privacy concerns, security…”

So, no: making your research data available as open access is not the aim of the ORD pilot. It is not about open access but about the reuse of research data. See also: “Guidelines on open access to scientific publications and research data in Horizon 2020”, p. 8

ORD pilot: which data should be made available?

The ORD pilot applies primarily to:

“the ‘underlying data’ (the data needed to validate the results presented in scientific publications), including the associated metadata (i.e. metadata describing the research data deposited).”

Other data can also be provided.

See: “Guidelines on open access to scientific publications and research data in Horizon 2020”, p. 9

ORD pilot: are costs eligible for refund?

“Costs related to open access to research data (…) are eligible for reimbursement during the duration of the project…”

In practice, how the funder will handle this is less straightforward. Does it also cover, for example, the costs of an e-science expert or a data steward?

ORD pilot: good research data management

Besides the reuse of research data, the ORD pilot also focuses on good data management! Good data management prepares for reuse; reuse implies data management.

“… participating in the ORD pilot does not necessarily mean opening up all your research data. Rather, the focus of the Pilot is on encouraging good data management as an essential element of research best practice.”

Of course the two, reuse and good data management, are connected. Good data management prepares for reuse; reuse implies data management.

ORD pilot: FAIR principles

Good research data management is data management following the FAIR principles.

Research data should be Findable, Accessible, Interoperable and Reusable.

I’ll come back to the FAIR principles in detail. The conditions set by Horizon 2020 with regard to research data management come down to two requirements (next slide).

ORD pilot: requirements

The conditions set by Horizon 2020 with regard to research data management come down to two requirements:
1. Formulate a data management plan;
2. Deposit research data.

Use a single DMP for your project to cover its overall approach. However, where there are specific issues for individual datasets (e.g. regarding openness), you should spell these out clearly.

Data management plan

The data management plan provides information on the handling of research data during and after the end of the project, with the FAIR principles in mind:
+ Data collection (newly generated data versus pre-existing data, file formats, special tools needed, data size);
+ Data storage and back-up (storage media, safe and secure storage);
+ Data documentation (metadata);
+ Whether, how and what data will be shared/made open access during and after the project;
+ Data preservation and archiving after the project.
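The DMP topics listed above can double as a checklist while drafting. The sketch below is only an illustration, assuming a draft DMP kept as a Python dictionary that maps a topic name to the text written so far; the labels in `DMP_TOPICS` and the function `missing_topics` are hypothetical and not part of any Horizon 2020 or DMPonline tool. It reports which topics are still absent or empty.

```python
# Hypothetical topic labels, taken from the DMP contents listed above.
DMP_TOPICS = (
    "data collection",
    "data storage and back-up",
    "data documentation",
    "data sharing",
    "data preservation and archiving",
)


def missing_topics(draft: dict[str, str]) -> list[str]:
    """Return the DMP topics that are absent from the draft or still empty."""
    return [topic for topic in DMP_TOPICS if not draft.get(topic, "").strip()]


if __name__ == "__main__":
    # Made-up draft for illustration only.
    draft = {
        "data collection": "New measurements, CSV files, roughly 20 GB.",
        "data storage and back-up": "Institutional storage with daily back-up.",
        "data documentation": "",  # not written yet
    }
    print(missing_topics(draft))
    # ['data documentation', 'data sharing', 'data preservation and archiving']
```

Keeping the topics in one tuple means the checklist follows the order of the DMP outline, so the report reads in the same order as the plan itself.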

“Once a project has had its funding approved and has started, you must submit a first version of your DMP (as a deliverable) within the first six months of the project.”

“A data management plan (DMP) is required for all projects participating in the extended ORD pilot…”

Participating projects will be required to develop a Data Management Plan (DMP), in which they specify what data will be open: detailing what data the project will generate, whether and how it will be exploited or made accessible for verification and reuse, and how it will be curated and preserved.

The DMP needs to be updated over the course of the project whenever significant changes arise, such as (but not limited to):
+ new data;
+ changes in consortium policies (new innovation potential, decision to file for a patent);
+ changes in consortium composition and external factors (new members joining or old members leaving).

The DMP should be updated as a minimum in time with the periodic evaluation/assessment of the project. A DMP is a deliverable, due within the first six months of the project!

A Data Management Plan (DMP) provides information on:
+ the data the research will generate;
+ how to ensure its curation, preservation and sustainability;
+ what parts of that data will be open (and how).

Experience shows that researchers have trouble understanding the last two bullets. Making data available via a project website is not archiving or preserving data.

+ Data sharing during the project: for project partners, use data sharing platforms. Dropbox is not recommended when sensitive/confidential data are at stake.
+ Data sharing after the project: can be the same as archiving data after the project. Give details per data set.
+ Data documentation: costs a lot of time, so researchers are reluctant to do this well.
+ Data storage: don’t mention USB drives, etc. Try to use the institutional ICT infrastructure as much as possible. Secure storage is about access control: who has access to the data during the project?
+ Data collection: can be a lot of work because different partners, each with their own data sets, are involved.

Data management plan: FAIR principles

Findable: easy to find by both humans and computer systems. Data are assigned a DOI after the research and described by rich metadata; naming conventions and versioning are used during the research.

Accessible: easy to obtain by humans and computers. Access to data (who, where, how long), storage during and archiving after the project; can data be made open access?

Interoperable: easy to combine with other data sets by humans and computers. Data exchange between researchers, institutions and machines; are standard metadata and vocabularies used, and open data formats?

Reusable: easy to use for future research and to process further by humans and computational methods. Data quality and provenance; licenses added to data (who can use the data under which conditions).
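The slide leaves "naming conventions" and "versioning" open. As one possible reading, the sketch below assumes a hypothetical convention `<project>_<dataset>_<YYYYMMDD>_v<NN>.<ext>`; the pattern, the helpers `make_name` and `next_version`, and the example names are illustrations only. The DOI itself is assigned by the repository at deposit and is not generated here.

```python
import re
from datetime import date

# Hypothetical convention: <project>_<dataset>_<YYYYMMDD>_v<NN>.<ext>
NAME_PATTERN = re.compile(
    r"^(?P<project>[a-z0-9]+)_(?P<dataset>[a-z0-9-]+)_"
    r"(?P<date>\d{8})_v(?P<version>\d{2,})\.(?P<ext>[a-z0-9]+)$"
)


def make_name(project: str, dataset: str, day: date, version: int, ext: str) -> str:
    """Build a file name and check that it follows the convention."""
    name = f"{project}_{dataset}_{day:%Y%m%d}_v{version:02d}.{ext}"
    if NAME_PATTERN.match(name) is None:
        raise ValueError(f"name does not follow the convention: {name}")
    return name


def next_version(name: str) -> str:
    """Return the same name with its version number raised by one."""
    m = NAME_PATTERN.match(name)
    if m is None:
        raise ValueError(f"name does not follow the convention: {name}")
    version = int(m["version"]) + 1
    return f"{m['project']}_{m['dataset']}_{m['date']}_v{version:02d}.{m['ext']}"


if __name__ == "__main__":
    first = make_name("everlasting", "battery-cycles", date(2017, 3, 21), 1, "csv")
    print(first)                # everlasting_battery-cycles_20170321_v01.csv
    print(next_version(first))  # everlasting_battery-cycles_20170321_v02.csv
```

Encoding the date and version in the name keeps versions sortable and makes a file traceable to a point in the project even before a DOI exists.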

It’s still unclear how to turn each of these components into reality!

FAIR data is what Horizon 2020 wants! However, it’s still unclear how to turn each of these components into reality. It’s still a pilot! DISCUSSION about the FAIR principles from a researcher’s point of view

Data management plan: templates

DMP template Horizon 2020 (via DMPonline): recommended but voluntary

DMP template by 4TU.Centre for Research Data

Examples of H2020 DMPs:

Deposit research data

+ ‘Underlying data’ of a scholarly paper, including the associated metadata needed;
+ Preferably in a research data repository;
+ Take measures to enable others to access, exploit, reproduce and disseminate the deposited data;
+ Provide information via the chosen repository about the tools that are needed to validate the results.
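As an illustration of documenting what is deposited and which tools are needed to validate the results, the sketch below builds a small JSON manifest for a deposit folder. The manifest layout, the function `build_manifest` and the tool description are assumptions; the chosen repository may ask for this information in its own metadata form instead.

```python
import json
import tempfile
from pathlib import Path


def build_manifest(data_dir: Path, tools: list[str]) -> dict:
    """List every file in the deposit (relative path and size) plus the tools needed."""
    files = [
        {"path": p.relative_to(data_dir).as_posix(), "size_bytes": p.stat().st_size}
        for p in sorted(data_dir.rglob("*"))
        if p.is_file()
    ]
    return {"files": files, "tools_needed_to_validate": tools}


if __name__ == "__main__":
    # Build a throw-away example folder; file names and contents are made up.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "raw").mkdir()
        (root / "raw" / "cell01.csv").write_text("cycle,capacity_Ah\n1,2.00\n")
        (root / "README.txt").write_text("Readme for the example deposit\n")
        manifest = build_manifest(root, ["Python 3 with pandas (hypothetical)"])
        (root / "manifest.json").write_text(json.dumps(manifest, indent=2))
        print(json.dumps(manifest, indent=2))
```

Sorting the paths makes the manifest deterministic, so two builds of the same deposit produce the same file and differences between versions are easy to spot.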

Deposit research data: 4TU.Centre for Research Data #1

4TU.Centre for Research Data is for static data (‘frozen’ data sets, ‘milestone’ data sets) after the project has ended.

Deposit research data: 4TU.Centre for Research Data #2

4TU.Centre for Research Data

With 4TU.Centre for Research Data you're halfway there! However, is the glass half full or half empty?

Demo 4TU.Centre for Research Data

Research data management: documentation and metadata #1

Enhancing the reusability and interchangeability of your measurement data:
1. by adding a readme file with data-specific information on:
   + the size of the data set, what’s included and excluded;
   + the provenance of the data (how you collected the data and data manipulation steps);
   + the parameters/variables used (how each was measured), measurement units, codes/symbols used, etc.
2. by using the same metadata scheme.
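The readme contents listed under point 1 can be captured in a fixed structure, so that every data set gets the same sections in the same order. The sketch assumes plain-text output; the class names `Variable` and `ReadmeInfo`, the function `render_readme` and all example values are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Variable:
    name: str
    unit: str
    how_measured: str


@dataclass
class ReadmeInfo:
    title: str
    size: str
    included: str
    excluded: str
    collection_method: str
    processing_steps: list[str]
    variables: list[Variable]
    codes: dict[str, str] = field(default_factory=dict)  # code/symbol -> meaning


def render_readme(info: ReadmeInfo) -> str:
    """Render the readme sections listed on the slide as plain text."""
    lines = [
        info.title,
        "",
        f"Size of the data set: {info.size}",
        f"Included: {info.included}",
        f"Excluded: {info.excluded}",
        "",
        "Provenance",
        f"How the data were collected: {info.collection_method}",
        "Data manipulation steps:",
        *(f"  {i}. {step}" for i, step in enumerate(info.processing_steps, start=1)),
        "",
        "Parameters/variables (unit): how measured",
        *(f"  {v.name} ({v.unit}): {v.how_measured}" for v in info.variables),
    ]
    if info.codes:
        lines += ["", "Codes/symbols used"]
        lines += [f"  {code} = {meaning}" for code, meaning in info.codes.items()]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # All values below are made up for illustration.
    info = ReadmeInfo(
        title="Example battery cycling data set",
        size="12 CSV files, 35 MB",
        included="Charge/discharge cycles of all test cells",
        excluded="Cells that failed during the first cycle",
        collection_method="Cycling tests on a battery tester",
        processing_steps=["Removed duplicate rows", "Converted time stamps to seconds"],
        variables=[Variable("capacity", "Ah", "Integrated discharge current")],
        codes={"NaN": "value not recorded"},
    )
    print(render_readme(info))
```

Because the fields are required arguments, a readme cannot be generated with, for example, the provenance left out by accident.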

DISCUSSION: metadata scheme
+ What elements or fields should be included?
+ What label does each element get?
+ What data type does each element have?
+ Is each element mandatory, conditional or optional?
+ How many instances of each element are allowed?
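The discussion questions map onto properties of each element in a scheme: a label, a data type, an obligation (mandatory, conditional or optional) and a maximum number of instances. The sketch below encodes those properties and checks a metadata record against them. The element names, the battery-test condition and the function `validate` are hypothetical, chosen only to make the questions concrete; they are not taken from any existing battery metadata scheme.

```python
from collections.abc import Callable
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class Element:
    label: str
    dtype: type
    obligation: str  # "mandatory", "conditional" or "optional"
    max_occurs: Optional[int] = 1  # None means any number of instances
    # For conditional elements: a rule that says when the element is required.
    required_if: Optional[Callable[[dict], bool]] = None


def validate(record: dict[str, Any], scheme: list[Element]) -> list[str]:
    """Check a metadata record against a scheme and return a list of problems."""
    problems = []
    labels = {e.label for e in scheme}
    problems += [f"unknown element: {key}" for key in record if key not in labels]
    for e in scheme:
        values = record.get(e.label, [])
        if not isinstance(values, list):
            values = [values]  # a single instance may be given without a list
        required = e.obligation == "mandatory" or (
            e.obligation == "conditional"
            and e.required_if is not None
            and e.required_if(record)
        )
        if required and not values:
            problems.append(f"missing {e.obligation} element: {e.label}")
        if e.max_occurs is not None and len(values) > e.max_occurs:
            problems.append(f"too many instances of {e.label}: {len(values)} > {e.max_occurs}")
        for v in values:
            if not isinstance(v, e.dtype):
                problems.append(f"{e.label}: expected {e.dtype.__name__}, got {type(v).__name__}")
    return problems


# Hypothetical scheme for a battery test record.
SCHEME = [
    Element("cell_id", str, "mandatory"),
    Element("test_type", str, "mandatory"),
    Element("temperature_C", float, "mandatory"),
    Element("discharge_current_A", float, "conditional",
            required_if=lambda r: r.get("test_type") == "discharge"),
    Element("operator", str, "optional", max_occurs=None),
]

if __name__ == "__main__":
    print(validate({"cell_id": "cell-01", "test_type": "discharge", "temperature_C": 24.0}, SCHEME))
    # ['missing conditional element: discharge_current_A']
    print(validate({"cell_id": "cell-01", "test_type": "charge", "temperature_C": "24",
                    "operator": ["A", "B"]}, SCHEME))
    # ['temperature_C: expected float, got str']
```

A conditional element carries a rule stating when it becomes required: here `discharge_current_A` is only required when `test_type` is `discharge`, while `operator` is optional and may occur any number of times.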

Research data management: documentation and metadata #2

Battery data set: https://ti.arc.nasa.gov/tech/dash/pcoe/prognostic-data-repository/#battery

Battery data metadata scheme: http://mcscience.com/home/mcscience-web-logos/experimentors/modenjay-experiment-platform/test-recipe-3-metadata/

DISCUSSION: is this useful?
International Battery Seminar: http://www.internationalbatteryseminar.com/battery-research/

RDM desk: [email protected]

DMP support:
+ Sjef Öllers: [email protected]
+ Leon Osinski: [email protected]

Website Data Coach: http://www.tue.nl/datacoach (with information on funder policies; soon the RDM programme website)


Henri Rzepa: https://youtu.be/Ae205CNrk6w

URLs of mentioned and important webpages

1. Horizon 2020 participant portal online manual: open access and data management: http://ec.europa.eu/research/participants/docs/h2020-funding-guide/cross-cutting-issues/open-access-dissemination_en.htm
2. Horizon 2020 Guidelines on FAIR data management (version 3.0, 26-07-2016): http://ec.europa.eu/research/participants/data/ref/h2020/grants_manual/hi/oa_pilot/h2020-hi-oa-data-mgt_en.pdf
3. Horizon 2020 Guidelines on open access to scientific publications and research data (version 3.1, 25-08-2016): http://ec.europa.eu/research/participants/data/ref/h2020/grants_manual/hi/oa_pilot/h2020-hi-oa-pilot-guide_en.pdf
4. Expert group on turning FAIR data into reality: http://ec.europa.eu/transparency/regexpert/index.cfm?do=groupDetail.groupDetail&groupID=3464&NewSearch=1&NewSearch=1
5. Data management plan template Horizon 2020: https://dmponline.dcc.ac.uk/
6. Data management plan template 4TU.Centre for Research Data: http://researchdata.4tu.nl/en/planning-research/data-management-plan/
7. Examples of H2020 DMPs: http://www.dcc.ac.uk/resources/data-management-plans/guidance-examples
8. 4TU.Centre for Research Data: http://data.4tu.nl
9. Paper on FAIR data principles: http://dx.doi.org/10.3233/ISU-170824
10. Battery data set: https://ti.arc.nasa.gov/tech/dash/pcoe/prognostic-data-repository/#battery
11. Battery data metadata scheme: http://mcscience.com/home/mcscience-web-logos/experimentors/modenjay-experiment-platform/test-recipe-3-metadata/
12. TU/e Data Coach: http://www.tue.nl/datacoach
