FAIR data overview

Post on 12-Jan-2017



FAIR DATA OVERVIEW

Luiz Olavo Bonino - luiz.bonino@dtls.nl

SUMMARY

What is FAIR data?

The FAIR ecosystem

Plans and how to realise

(FAIR) DATA STEWARDSHIP

DATA STEWARDSHIP

Combination of all expertise to treat data well in a project:
■ Experiment design and data-design;
■ Re-use of existing data where possible;
■ Planning of the storage, networking and computing infrastructure;
■ Data acquisition and processing;
■ Data publishing in a format that allows functional interlinking of data(sets) as well as in a format suitable for long-term preservation.

FAIR DATA STEWARDSHIP

Combination of all expertise to treat data well in a project:
■ Experiment design and data-design;
■ Re-use of existing data where possible;
■ Planning of the storage, networking and computing infrastructure;
■ Data acquisition and processing;
■ Data publishing in a format that allows functional interlinking of data(sets) as well as in a format suitable for long-term preservation.

FAIR Data

DATA STEWARDSHIP – PROCESS VIEW

DATA STEWARDSHIP – SUSTAINABILITY VIEW

[Diagram, built up over several slides, with the two sides labelled Produces and Consumes.]

Concerns shown: storage, sustainability, maintenance, license, privacy, security, stewardship, access, ?

Formats and technologies shown: RDF, MIAPE, DBMS, Excel, API, SQL, SPARQL, Metadata, DICOM, MIRIAM, Semantics

Tasks shown: access, find, query, format, license, integrate

WHAT IS FAIR DATA?

FAIR Data aims to support existing communities in their attempts to enable valuable scientific data and knowledge to be published and utilised in a ‘FAIR’ manner.

Findable - (meta)data is uniquely and persistently identifiable. Should have basic machine readable descriptive metadata.

Accessible - data is reachable and accessible by humans and machines using standard formats and protocols.

Interoperable - (meta)data is machine readable and annotated with resolvable vocabularies/ontologies.

Reusable - (meta)data is sufficiently well-described to allow (semi)automated integration with other compatible data sources.
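As a rough illustration of the four facets, the sketch below runs one simple test per letter against a metadata record. The field names (`identifier`, `title`, `access_url`, `vocabulary_terms`, `license`, `provenance`) and the pass criteria are assumptions made for this example; they are not part of the FAIR Data Principles themselves, and the IRI check is purely syntactic.

```python
from urllib.parse import urlparse


def is_resolvable_iri(value):
    """True if value looks like an http(s) IRI (syntax only, no network call)."""
    parsed = urlparse(value or "")
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


def fair_checklist(record):
    """Return one boolean per FAIR facet for a metadata record (a plain dict).

    The criteria are illustrative assumptions, not an official test suite.
    """
    terms = record.get("vocabulary_terms", [])
    return {
        # Findable: a persistent-looking identifier plus basic descriptive metadata.
        "findable": is_resolvable_iri(record.get("identifier")) and bool(record.get("title")),
        # Accessible: reachable through a standard protocol.
        "accessible": is_resolvable_iri(record.get("access_url")),
        # Interoperable: annotated with terms that are themselves resolvable.
        "interoperable": bool(terms) and all(is_resolvable_iri(t) for t in terms),
        # Reusable: described well enough to judge reuse (license and provenance).
        "reusable": bool(record.get("license")) and bool(record.get("provenance")),
    }


# Hypothetical record; every IRI below is a placeholder.
record = {
    "identifier": "https://example.org/dataset/42",
    "title": "Example dataset",
    "access_url": "https://example.org/dataset/42/download",
    "vocabulary_terms": ["http://example.org/vocab/Gene"],
    "license": "https://creativecommons.org/licenses/by/4.0/",
    "provenance": "Exported from the lab database",
}

report = fair_checklist(record)
print(report)
```

A record that is missing any of these fields fails the corresponding facet, which makes gaps visible early.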

THE FAIR ECOSYSTEM

[Diagram: the ecosystem in three levels]

Normative - FAIR Data Principles, FAIR Data Protocol

Artefact - FAIR Data Resources

Software - FAIR Data Core Technologies, FAIR Data Systems/Tools

FAIR ECOSYSTEM - NORMATIVE LEVEL

FAIR Data Principles - general principles guiding FAIR data solutions;

FAIR Data Protocol - complying with the FAIR Data Principles, provides guidelines for implementing FAIR data solutions, e.g., standards, APIs, technologies, …;

FAIR DATA PROTOCOL

Findable - standards for describing the dataset with the relevant metadata;

Accessible - standards for representing and accessing the data according to the defined usage license;

Interoperable - standards for machine readable descriptions of the (meta)data and (semantic)annotation;

Reusable - standards for semantic annotation of the (meta)data supporting machine reasoning, and standards for defining data provenance and supporting citation;

The standards include technologies (e.g., RDF, nanopublications, JSON, OWL, etc.) as well as protocols and APIs.
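To make the idea of standards-based dataset metadata concrete, here is a minimal sketch that builds a JSON-LD description of a dataset. The choice of the W3C DCAT and Dublin Core Terms vocabularies is an assumption for illustration (the slides do not say which vocabularies the FAIR Data Protocol prescribes), and `describe_dataset` and all `example.org` IRIs are hypothetical.

```python
import json


def describe_dataset(identifier, title, license_iri, access_url, media_type):
    """Build a JSON-LD dataset description using DCAT and Dublin Core terms.

    Vocabulary choice is an assumption of this sketch, not a protocol requirement.
    """
    return {
        "@context": {
            "dcat": "http://www.w3.org/ns/dcat#",
            "dct": "http://purl.org/dc/terms/",
        },
        "@id": identifier,                       # Findable: persistent identifier
        "@type": "dcat:Dataset",
        "dct:title": title,                      # Findable: descriptive metadata
        "dct:license": {"@id": license_iri},     # Accessible/Reusable: usage license
        "dcat:distribution": {
            "@type": "dcat:Distribution",
            "dcat:accessURL": {"@id": access_url},  # Accessible: where to get it
            "dcat:mediaType": media_type,           # Interoperable: declared format
        },
    }


metadata = describe_dataset(
    identifier="https://example.org/dataset/42",
    title="Example dataset",
    license_iri="https://creativecommons.org/licenses/by/4.0/",
    access_url="https://example.org/dataset/42/download",
    media_type="text/turtle",
)
print(json.dumps(metadata, indent=2))
```

Because the description is plain JSON with a declared context, both people and software can read it, which is the point of the Interoperable facet.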

FAIR ECOSYSTEM - ARTEFACT LEVEL

FAIR Data Resource - datasets expressed using one of the prescribed standards of the FAIR Data Protocol and with metadata complying with the protocol.

Annotation Ontology - reference conceptual model used to provide semantics to elements of FAIR Data Resources through annotation.

Controlled vocabularies, dictionaries, etc.

FAIR DATA RESOURCE

Datasets expressed using one of the prescribed standards of the FAIR Data Protocol, with metadata complying with the protocol and license. The original dataset is transformed into a FAIR format and proper metadata and license are added to produce a FAIR Data Resource. The original and the FAIR version can co-exist, each one fulfilling its own purpose.
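The conversion described above can be sketched as follows: a small CSV table is turned into RDF triples (N-Triples syntax) and a license statement is added, while the original text is left untouched. This is a toy illustration under stated assumptions: the `http://example.org/` namespace, the dataset IRI, the helper names, and the sample rows are all invented for the example, and a real conversion would map columns to terms from an agreed ontology rather than minting ad hoc predicates.

```python
import csv
import io
from urllib.parse import quote

BASE = "http://example.org/"  # hypothetical namespace for this sketch
LICENSE_PREDICATE = "http://purl.org/dc/terms/license"


def escape_literal(text):
    """Escape a string for use inside an N-Triples literal."""
    return (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def rows_to_ntriples(csv_text, id_column, license_iri):
    """Convert CSV text to N-Triples and attach a license to the dataset."""
    dataset = "<" + BASE + "dataset/1>"  # hypothetical dataset IRI
    lines = ["{} <{}> <{}> .".format(dataset, LICENSE_PREDICATE, license_iri)]
    for row in csv.DictReader(io.StringIO(csv_text)):
        subject = "<" + BASE + "record/" + quote(row[id_column], safe="") + ">"
        for column, value in row.items():
            if column == id_column:
                continue  # the id is already encoded in the subject IRI
            predicate = "<" + BASE + "field/" + quote(column, safe="") + ">"
            lines.append('{} {} "{}" .'.format(subject, predicate, escape_literal(value)))
    return "\n".join(lines)


# Toy rows invented for illustration only.
original = "id,gene,phenotype\n1,HTT,chorea\n2,APP,memory loss\n"
fair = rows_to_ntriples(original, "id", "https://creativecommons.org/licenses/by/4.0/")
print(fair)
```

The function returns a new string and never modifies its input, mirroring the point that the original and the FAIR version can co-exist.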

[Diagram labels: FAIR Conversion, FAIR Data Resource]

FAIR DATA RESOURCE

[Diagram labels: Data Creation, FAIR Data Creation, FAIR Data Resource]

SHARING DATA

Researcher: “I would like to exploit common genotype-phenotype relations between Alzheimer’s Disease and Huntington’s Disease… I need to combine AD and HD data…”

Two data owners, each: “I can help with that!”

Source: Marcos Roos

SHARING DATA

Source: Marcos Roos

Researcher: “???”

Two data owners, each: “Here’s my data, have fun!”

SHARING LINKABLE DATA

Source: Marcos Roos

“I can go straight to answering my questions with data from multiple data owners!”

“Patients will be so pleased with this speed-up!”

Two data owners, each: “Here’s my Linked Data, have fun!”

[Diagram, built up over several slides. Labels shown in the final version:]

■ Raw data (many formats)
■ Initial transformation
■ Provenance
■ Processed data (primary storage format)
■ FAIR transformation
■ FAIR (meta)data (RDF, XML etc.)
■ FAIR download (in local format)
■ Analysis transformation
■ High-Performance Analysis
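One way to read the stages named in this diagram (raw data, initial transformation, processed data, FAIR transformation, provenance) is as a chain of steps that each leave a provenance entry behind. The sketch below is an assumption about how that could look, not the author's pipeline: the step functions are toy stand-ins, the JSON output merely substitutes for a real RDF or XML serialisation, and the provenance record layout is invented for the example.

```python
import hashlib
import json


def digest(data):
    """SHA-256 fingerprint of a text payload, used to link steps together."""
    return hashlib.sha256(data.encode("utf-8")).hexdigest()


def run_step(name, func, data, provenance):
    """Apply one transformation and append a provenance entry for it."""
    result = func(data)
    provenance.append(
        {"step": name, "input_sha256": digest(data), "output_sha256": digest(result)}
    )
    return result


def initial_transformation(raw):
    """Toy clean-up: trim and lowercase each non-empty line."""
    return "\n".join(line.strip().lower() for line in raw.splitlines() if line.strip())


def fair_transformation(processed):
    """Toy stand-in for a FAIR serialisation: wrap the lines in a JSON document."""
    return json.dumps({"records": processed.split("\n")}, sort_keys=True)


provenance = []
raw = "  Sample-A \n\nSample-B\n"  # invented raw input
processed = run_step("initial transformation", initial_transformation, raw, provenance)
fair = run_step("FAIR transformation", fair_transformation, processed, provenance)

print(fair)
print(json.dumps(provenance, indent=2))
```

Because each entry stores the fingerprint of its input and output, the output hash of one step matches the input hash of the next, so the chain from raw data to FAIR (meta)data can be checked afterwards.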

FAIR DATA APPLICATION ECOSYSTEM (NL APPROACH)

FAIR DATA RESOURCE

[Diagram labels: FAIR transformation, FAIR Data Resource]

BRING YOUR OWN DATA - BYOD

Goals:
■ Learn how to make data linkable “hands-on” with experts
■ Create a “telling story” to demonstrate its use

Composition:
■ Data owners – specialists on given datasets
■ Data interoperability experts
■ Domain experts

Source: Marcos Roos

BYOD

FAIRIFIER


FAIR DATA MODEL REGISTRY

FAIRIFIER AND FAIR DATA MODEL REGISTRY

FAIR DATA POINT

A particular class of FAIR Data System that provides access to published datasets. The datasets can be external or internal to the FAIR Data Point, and the source data can be a regular (non-FAIR) dataset or a FAIR Data Resource. If the source data is non-FAIR, the FAIR Data Point needs to make the necessary FAIR transformations on the fly.

■ A particular class of FAIR Data System that provides support for data interoperability;
■ Supports publication of and access to FAIR data;
■ Fosters an ecosystem of applications and services;
■ Federated architecture: different FAIRports (and other FAIR Data Systems) are interconnectable;
■ Supports citation of datasets and data items;
■ Provides metrics for data usage and citation.
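The behaviour described above (serving FAIR datasets as they are, converting non-FAIR sources on the fly, and keeping usage metrics) can be sketched with a small in-memory class. Everything here is hypothetical: `FairDataPoint`, its methods, and `toy_to_fair` are invented for illustration and do not reflect the API of any actual FAIR Data Point software.

```python
from collections import Counter


class FairDataPoint:
    """In-memory sketch of an access point for FAIR and non-FAIR datasets."""

    def __init__(self, to_fair):
        self._datasets = {}             # dataset_id -> (content, is_fair)
        self._to_fair = to_fair         # converter applied to non-FAIR sources
        self.access_counts = Counter()  # simple usage metric per dataset

    def register(self, dataset_id, content, is_fair):
        """Publish a dataset; non-FAIR content is stored as-is."""
        self._datasets[dataset_id] = (content, is_fair)

    def get(self, dataset_id):
        """Return the FAIR form of a dataset, converting on the fly if needed."""
        content, is_fair = self._datasets[dataset_id]
        self.access_counts[dataset_id] += 1
        return content if is_fair else self._to_fair(content)


def toy_to_fair(rows):
    """Toy converter: give every row an identifier (placeholder IRIs)."""
    return [
        {"@id": "http://example.org/record/{}".format(i), "value": row}
        for i, row in enumerate(rows)
    ]


point = FairDataPoint(to_fair=toy_to_fair)
point.register("legacy", ["a", "b"], is_fair=False)
point.register("native", [{"@id": "http://example.org/record/x", "value": "c"}], is_fair=True)

legacy_view = point.get("legacy")   # converted at request time
native_view = point.get("native")   # returned unchanged
point.get("legacy")
print(legacy_view)
print(dict(point.access_counts))
```

Converting at request time keeps the stored source untouched, at the cost of repeating the conversion on every access; a real system might cache the result.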

FAIR DATA PUBLICATION

FAIR DATA ACCESS

DISTRIBUTED ARCHITECTURE


FAIRPORT ECOSYSTEM

FAIRPORT

WORK ORGANISATION (NL APPROACH)

HOW TO REALISE

[Organisation chart. Labels shown:]

■ DTL
■ National FAIR data engineering team
■ Projects: BBMRI 2.0, Elixir, ODEX4ALL, ENPADASI, TRAIT, EATRIS, Other Project
■ FAIR Data IG (P.I.s with data)
■ Data SAC: G.M., B.M., M.G.
■ Data Core Team
■ FAIR Data Executive Team: L.B. (CTO), R.H., P.B., M.S., J.B., J.W.
■ Core FAIR Technology
■ FAIR Data V.T. (shown six times), with engineers in the FAIR Data virtual team
■ Local “DTLs” / projects

QUESTIONS?