FAIRer Research

Post on 12-Aug-2015

Professor Carole Goble CBE FREng FBCS

The University of Manchester, UK

carole.goble@manchester.ac.uk

STM Conference, London, 3rd Dec 2014

“An article about computational science in a scientific publication is not the scholarship itself, it is merely advertising of the scholarship. The actual scholarship is the complete software development environment, [the complete data] and the complete set of instructions which generated the figures.” David Donoho, “Wavelab and Reproducible Research,” 1995

datasets, data collections, standard operating procedures, software, algorithms, configurations, tools and apps, codes, workflows, scripts, code libraries, services, system software infrastructure, compilers, hardware

Systems Biology

Modelling Cycle

45 organisations 112 organisations 37 organisations

http://www.seek4science.org

Aggregated Commons

Sharing and interlinking methods, models, data, samples…

Multi-stewardship, multi-disciplinary, mixed

Standards

DCAT, FOAF

Data

Models

Articles

External Databases

Metadata

Yellow Pages

Investigations

Assays

Studies

Towards Interoperable Bioscience Data, Nature Genetics, 2012

Standards, Structure, Interlink

Just Enough Results Model for things produced and used in experiments

http://www.fair-dom.org

Findable, Accessible, Interoperable, Reusable

Data, SOPs, Models, Methods

Multi-tenant Commons Platform

http://datafairport.org

Data discovery

Data assembly, cleaning, and refinement

Modeling

Statistical analysis

Data collection

Insights

Scholarly Communication & Reporting

Material & Methods

BioSTIF

instruments and laboratory

Workflow Commons

"Mapping present and future predicted distribution patterns for a meso-grazer guild in the Baltic Sea" by Sonja Leidenberger et al.

• 35 kinds of annotations
• 5 Main Workflows
• 14 Nested Workflows
• 11 Configuration files
• 25 Scripts
• 10 Software dependencies
• 1 Workflow management system
• 1 Web Service
• Dataset: 90 galaxies observed in 3 bands

José Enrique Ruiz (IAA-CSIC)

Galaxy Luminosity Profiling

Dependencies

Components

Rinse and Repeat Research

• Sweep Datasets

• Sweep Variables

• Sweep Steps
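
The three sweeps above can be sketched in a few lines. This is only an illustration under assumptions, not the tooling used in the talk: the `sweep` function, the `scale` variable, and the toy datasets are all hypothetical. The idea it shows is re-running every combination of dataset, variable value, and interchangeable analysis step.

```python
from itertools import product
from statistics import mean, median


def sweep(datasets, variables, steps):
    """Re-run every step over every dataset and variable value."""
    results = []
    for (d_name, data), scale, (s_name, step) in product(
        datasets.items(), variables, steps.items()
    ):
        # Hypothetical "variable": a scale factor applied before the step.
        value = step([x * scale for x in data])
        results.append(
            {"dataset": d_name, "scale": scale, "step": s_name, "value": value}
        )
    return results


# Toy inputs, invented for illustration only.
datasets = {"a": [1, 2, 3], "b": [10, 20, 60]}
results = sweep(datasets, variables=[1, 2], steps={"mean": mean, "median": median})

for record in results:
    print(record)
```

Because each run is recorded with the dataset, variable, and step that produced it, any single result can be traced back and repeated on its own.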

SHARING SENSITIVITY

SENSITIVE SHARING

IMPLICATIONS FOR METRICS

BEING FAIR

scientific ego-system for open science

trust, reciprocity, competition

fame, competitive advantage

productivity, credit

adoption, kudos

for love

blame, scooped, uncredited, misinterpretation, scrutiny, shame, insecurity, cost/time/skills, distraction, responsibility, disruption, staff churn, inertia

Howard Ratner, STM Innovations Seminar 2012

was: Chair STM Future Labs Committee, CEO EVP Nature Publishing Group

now: Director of Development for CHORUS (Clearinghouse for the Open Research of the United States)

http://www.youtube.com/watch?v=p-W4iLjLTrQ&list=PLC44A300051D052E5

http://www.myexperiment.org/packs/196.html

http://www.researchobject.org/

Outputs are first class citizens to be managed, credited and tracked: data, software

A Framework to Bundle and Relate multi-hosted (digital) resources of a scientific experiment or investigation using standard mechanisms & uniform access protocols. Carriers of Research Context
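
The bundling idea can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the class names, the field names, and the example.org URLs are hypothetical and are not the Research Object model's actual vocabulary or serialisation. What the sketch shows is a manifest that aggregates resources by reference, wherever they are hosted, and relates context to them.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class Resource:
    uri: str   # location of the resource; it can live on any host
    kind: str  # illustrative label such as "data", "workflow" or "script"


@dataclass
class ResearchObject:
    identifier: str
    title: str
    aggregates: list = field(default_factory=list)
    annotations: list = field(default_factory=list)

    def aggregate(self, uri, kind):
        """Bundle a resource by reference; nothing is copied."""
        self.aggregates.append(Resource(uri, kind))

    def annotate(self, about, note):
        """Relate context to a resource that is already aggregated."""
        if about not in {r.uri for r in self.aggregates}:
            raise ValueError(f"{about} is not aggregated by {self.identifier}")
        self.annotations.append({"about": about, "note": note})

    def manifest(self):
        """Serialise the bundle description as JSON."""
        return json.dumps(asdict(self), indent=2, sort_keys=True)


# Hypothetical identifiers, for illustration only.
ro = ResearchObject("https://example.org/ro/1", "Example investigation")
ro.aggregate("https://example.org/data/input.csv", "data")
ro.aggregate("https://example.net/workflows/analysis.t2flow", "workflow")
ro.annotate("https://example.org/data/input.csv", "input consumed by the workflow")
print(ro.manifest())
```

Keeping only references in the manifest is what lets the resources stay multi-hosted while the bundle itself remains a single identifiable thing.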

Research Objects

What is the RO Framework?

• A framework of models and conventions

• Representations

• API specifications

• Implementations mapped into legacy / commodity platforms

Knowledge Turns

Unit of Scholarly Currency, RO Commons

Circulate in the Scholarly Ecosystem

Citation? Credit? Link to Publishers?

Goble, De Roure, Bechhofer, Accelerating Knowledge Turns, I3CK, 2013

Collaboration to support safe use of patient and research data for medical research

Farr Commons

Research Object packages codes, study, and metadata to exchange coded descriptions of clinical study cohorts

Search, Discover, Index, Harvest, Port

Schopf, Treating Data Like Software: A Case for Production Quality Data, JCDL 2012

Profile Focus

Body of knowledge around methods, workflows, software, data, person, rather than publication.

Citation, credit

Release Research

Evolution, Emergence, Discourse, Threaded

Comparison, Historical review, Anti-Salami

Forks, Merges, Fixivity, Citation? Credit?

Flow across groups, projects and articles

Reproduce Research

Repeat, Replicate, Recompute, Reuse…

Entropy, Citation? Credit?

icanhascheezburger.com

Zhao et al., Why workflows break - Understanding and combating decay in Taverna workflows, 8th Intl Conf e-Science, 2012

Can I repeat & defend my results?

Can I review, reproduce and compare my results/method with your results/method?

Can I review, replicate and certify your results?

Can I transfer your results into my research and reuse this method?

Hettne et al., Structuring research methods and data with the research object model: genomics workflows as a case study, 2014, http://www.jbiomedsem.com/content/pdf/2041-1480-5-41.pdf

Checklists aka Minimum Information Models, Reporting Guidelines

Minim Checklist Ontology, http://purl.org/net/mim/ns

Zhao et al., A Checklist-Based Approach for Quality Assessment of Scientific Information, 3rd Intl Workshop on Linked Science, 2013
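
A checklist of this kind can be thought of as a set of requirements evaluated against an object's metadata. The sketch below is a simplified assumption, not the Minim ontology itself: the `Requirement` class, the "must"/"should" levels as used here, and the field names are hypothetical choices made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Requirement:
    name: str   # metadata field the checklist asks for
    level: str  # "must" or "should" (illustrative levels)


def assess(metadata, checklist):
    """Report which requirements the metadata fails to meet."""
    missing = {"must": [], "should": []}
    for req in checklist:
        # A field counts as present only if it has a non-empty value.
        if not metadata.get(req.name):
            missing[req.level].append(req.name)
    return {
        "satisfied": not missing["must"],
        "missing_must": missing["must"],
        "missing_should": missing["should"],
    }


# Hypothetical checklist and metadata record.
checklist = [
    Requirement("title", "must"),
    Requirement("creator", "must"),
    Requirement("license", "should"),
]
report = assess({"title": "Example workflow", "creator": ""}, checklist)
print(report)
```

Separating "must" from "should" lets the same checklist serve both as a gate (is the minimum met?) and as guidance on what to describe next.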

Checklists

Versioning

Provenance

Dependencies

Progressive

Metadata Profiles

Depth: how deeply described

Coverage: how much is covered.

More specialised detail, fewer services

More Stakeholders & Services

Citation minimum

Library

Publishers

Experiments

Science

PROV, PAV, VoID, GIT

PAV, NISO-JATS

Docker

DC

EXPO, ISA, JERM, OBI

MIAME, SBML, SED-ML

wfdesc

MIM Ontology

wfprov

VIVO-ISF

PID

Standards

Machine-processable

Technology Independent

Multi-platform

Incremental

W3C OADM

DOIs

URIs, Handles

ORCID

OAI-ORE

RRIDs

host

service

Open Source/Store

Sci as a Service

Integrative frameworks

Virtual Machines

Portable Packaging

ReproZip

Workflows, makefiles

ProvStore

OMEX archive

bundle

Nanopub: represents structured data along with its provenance in a single publishable and citable entry

Galaxy workflows: re-enact the analysis

Research Object: aggregates the (digital) resources contributing to findings of (computational) research (results, data and software) as citable compound digital objects

http://isa-tools.github.io/soapdenovo2/

http://sandbox.wf4ever-project.org/portal/ro?ro=http://sandbox.wf4ever-project.org/rodl/ROs/SOAP2denovo2-Aureus/

[Alejandra Gonzalez-Beltran, Philippe Rocca-Serra]

• Id & Cite fluid things

• Uniform handling 1st class citizens

• Compound, multi-authored

• Mixed, leaky containers

• Span outcomes, evolve outputs, emergence

• Profiles

• Bridge researchers, platforms, resources

Bechhofer, Why linked data is not enough for scientists, DOI: 10.1016/j.future.2011.08.004

[Norman Morrison]

Focus on Personal Productivity not Public Good

Auto-magical

Stealthy not Sneaky

reduce the friction

instrumentation

Training Time

From made RO to born RO

• Open research is like Open software
• Multi-part, multi-contributor, updating
• Tardis & Commons
• Implications for metrics? publishing?
• Learning from open software development

http://www.force11.org

• Barend Mons
• Sean Bechhofer
• Philip Bourne
• Matthew Gamble
• Raul Palma
• Jun Zhao
• Alan Williams
• Stian Soiland-Reyes
• Paul Groth
• Tim Clark
• Juliana Freire
• Alejandra Gonzalez-Beltran
• Philippe Rocca-Serra
• Ian Cottam
• Susanna Sansone
• James Howison
• James Herbsleb
• Kristian Garza

All the members of the Wf4Ever team

• iSOCO: Intelligent Software Components S.A., Spain
• University of Manchester, School of Computer Science, Manchester, United Kingdom
• University of Oxford, Department of Zoology, Oxford, UK
• Poznan Supercomputing and Networking Center, Poznan, Poland
• IAA: Instituto de Astrofísica de Andalucía, Granada, Spain
• Leiden University Medical Centre, Centre for Human and Clinical Genetics, The Netherlands

Colleagues in Manchester’s Information Management Group

RO Advisory Board Members

http://www.researchobject.org

http://www.wf4ever-project.org

http://www.fair-dom.org

http://www.datafairport.org