Date post: | 19-Jul-2015 |
Upload: | alejandra-gonzalez-beltran |
Metadata challenges of reproducible research and re-usable data
BioSharing, ISA and STATO examples
Alejandra González-Beltrán, PhD, Oxford e-Research Centre, University of Oxford, [email protected], @alegonbel
OpenData & Reproducibility workshop: the Good Scientist in the Open Science era
21st April 2015 British Ecological Society, UK
Reproducible & Reusable Bioscience Research
Well-annotated & Structured Data
reasoning, analysis, exchange, integration, visualization, browsing, retrieval
Community Standards, Software Tools
A community mobilization to develop standards, e.g.:
Structural and operational differences:
• organization types (open, closed to members, society, WG etc.)
• standards development (how to formulate, conduct and maintain)
• adoption, uptake, outreach (links to journals, funders and commercial sector)
• funds (sponsors, memberships, grants, volunteering)
de jure, de facto
grass-roots groups
standards organizations
Nanotechnology Working Group
Types of reporting standards
• Minimum information reporting requirements, or checklists, to report the same core, essential information
• Controlled vocabularies, taxonomies, thesauri, ontologies etc., to use the same word and refer to the same ‘thing’
• Conceptual models or conceptual schemas, from which an exchange format is derived, to allow data to flow from one system to another
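The three kinds of reporting standard can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration: the checklist fields, the small organism vocabulary and the tab-separated layout are hypothetical and do not come from any published standard.

```python
# Hypothetical checklist: field names are illustrative only.
REQUIRED_FIELDS = ("organism", "sample_type", "collection_date")

# Hypothetical controlled vocabulary: free-text synonyms map to one preferred term.
ORGANISM_VOCABULARY = {
    "rat": "Rattus norvegicus",
    "rattus norvegicus": "Rattus norvegicus",
    "mouse": "Mus musculus",
    "mus musculus": "Mus musculus",
}


def missing_fields(record):
    """Checklist: return the required fields that are absent or empty."""
    return [name for name in REQUIRED_FIELDS
            if not str(record.get(name, "")).strip()]


def normalise_organism(value):
    """Terminology: map a label to its preferred term, or None if unknown."""
    return ORGANISM_VOCABULARY.get(value.strip().lower())


def to_exchange_rows(records):
    """Format: serialise records as tab-separated lines with a fixed column order."""
    header = "\t".join(REQUIRED_FIELDS)
    rows = ["\t".join(str(record.get(name, "")) for name in REQUIRED_FIELDS)
            for record in records]
    return [header] + rows


if __name__ == "__main__":
    sample = {"organism": "rat", "sample_type": "liver"}
    print(missing_fields(sample))                 # which core fields are unreported
    print(normalise_organism(sample["organism"]))  # the shared term for 'rat'
```

The point of the split is that each function answers a different question: is the core information there, does everyone use the same word for the same thing, and can another system read the result.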
BioSharing: a web-based, curated and searchable registry ensuring that standards and databases are registered, informative and discoverable; it also monitors the development and evolution of standards, their use in databases and the adoption of both in data policies.
Launched Jan 2011
Researchers, developers and curators lack support and guidance on how best to navigate and select content standards, understand their maturity, or find databases that implement them.
Funders, journals and librarians do not have enough information to make informed decisions on which content standards or databases to recommend in policies, fund or implement.
Goal: assist stakeholders to make informed decisions
The International Conference on Systems Biology (ICSB), 22-28 August, 2008 Susanna-Assunta Sansone www.ebi.ac.uk/net-project
Core functionalities:
• search and filtering, e.g. by funder
• submission forms to add new records
• “claim” functionality for existing records
• person’s profile (as maintainer of records) associated with the ORCID profile (for credit, as incentive)
• visualization and views of content
Search, filter, submit, claim, view and more
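The search, filter and claim functions listed above can be sketched over an in-memory list of records. This is only an illustration under stated assumptions: the `RegistryRecord` class, its fields and the functions below are hypothetical and are not the registry's actual data model or API.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class RegistryRecord:
    """Hypothetical registry entry for a standard or a database."""
    name: str
    kind: str                                   # "standard" or "database"
    funders: List[str] = field(default_factory=list)
    maintainer_orcid: Optional[str] = None      # set once the record is claimed


def search(records, text):
    """Case-insensitive substring search on record names."""
    needle = text.lower()
    return [r for r in records if needle in r.name.lower()]


def filter_by_funder(records, funder):
    """Keep only the records associated with the given funder."""
    return [r for r in records if funder in r.funders]


def claim(record, orcid):
    """Attach a maintainer's ORCID iD to an unclaimed record."""
    if record.maintainer_orcid is not None:
        raise ValueError(f"{record.name} already has a maintainer")
    record.maintainer_orcid = orcid
    return record


if __name__ == "__main__":
    records = [
        RegistryRecord("Example reporting checklist", "standard", ["Funder X"]),
        RegistryRecord("Example sequence database", "database", ["Funder Y"]),
    ]
    for hit in filter_by_funder(records, "Funder X"):
        print(hit.name)
    # "0000-0000-0000-0000" is a placeholder, not a real ORCID iD.
    claim(records[0], "0000-0000-0000-0000")
```

Tying the claim to a personal identifier is what makes the credit incentive work: the maintainer's contribution becomes attributable.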
The Investigation/Study/Assay (ISA) infrastructure
generic format for experimental description and data exchange
open source software tools
community engagement
[Diagram: investigation → study → assay(s) → data; data are external files in native or other formats, referenced by pointers to data file names/location]
investigation: high-level concept to link related studies
study: the central unit, containing information on the subject under study, its characteristics and any treatments applied. A study has associated assays.
assay: test performed either on material taken from the subject or on the whole initial subject, which produces qualitative or quantitative measurements (data)
• environmental health
• environmental genomics
• metabolomics
• metagenomics
• nanotechnology
• proteomics
• stem cell discovery
• systems biology
• transcriptomics
• toxicogenomics
• communities working to build a library of cellular signatures
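The investigation, study and assay hierarchy described above can be sketched as nested Python dataclasses. This is a minimal illustration, not the ISA tools API: the class fields, the `data_files` helper and the file names are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Assay:
    """A test on the subject or on material taken from it; yields data."""
    measurement: str
    data_files: List[str] = field(default_factory=list)  # pointers to file names/locations


@dataclass
class Study:
    """The central unit: subject, its characteristics, treatments, and assays."""
    title: str
    subject: str
    treatments: List[str] = field(default_factory=list)
    assays: List[Assay] = field(default_factory=list)


@dataclass
class Investigation:
    """High-level concept that links related studies."""
    title: str
    studies: List[Study] = field(default_factory=list)

    def data_files(self) -> List[str]:
        """Collect every data file pointer across all studies and assays."""
        return [name
                for study in self.studies
                for assay in study.assays
                for name in assay.data_files]


if __name__ == "__main__":
    # File names below are made up for the example.
    assay = Assay("metabolite profiling", ["run_01.raw", "run_02.raw"])
    study = Study("Example study", "rat", ["compound dosing"], [assay])
    investigation = Investigation("Example investigation", [study])
    print(investigation.data_files())
```

Note that the model stores only pointers to the data files; the files themselves stay external, in native or other formats.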
The experimental plan
experimental design, sample characteristic(s), experimental variable(s)
2-week systemic rat study using male Wistar rats (N=15 per dose group)
14 proprietary drug candidates from participating companies and 2 reference toxic compounds
InnoMed PredTox Project
The experimental plan (continued)
experimental design, sample characteristic(s), experimental variable(s), technology(s), measurement(s), protocol(s), data file(s), …
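The distinction between sample characteristics (fixed properties of the subjects) and experimental variables (what the design varies) can be sketched as a generated sample table. The column names, compound labels and dose labels below are hypothetical; only the strain, sex and group size echo the rat study described earlier, and the column headers are not official ISA-Tab headers.

```python
import csv
import io
from itertools import product


def build_sample_table(compounds, dose_groups, animals_per_group):
    """One row per animal: characteristics are constant, factors vary by group."""
    rows = []
    for compound, dose in product(compounds, dose_groups):
        for animal in range(1, animals_per_group + 1):
            rows.append({
                "sample_name": f"{compound}-{dose}-{animal:02d}",
                "characteristic_strain": "Wistar",   # sample characteristic
                "characteristic_sex": "male",        # sample characteristic
                "factor_compound": compound,         # experimental variable
                "factor_dose": dose,                 # experimental variable
            })
    return rows


def to_tsv(rows):
    """Serialise the table as tab-separated text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()),
                            delimiter="\t", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    # Compound and dose labels are placeholders, not taken from the project.
    table = build_sample_table(["compound_A"], ["low", "high"], 15)
    print(len(table))
    print(to_tsv(table[:2]))
```

Recording the design this explicitly is what lets a later reader rebuild the groups without consulting the original investigators.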
http://www.nature.com/search?journal=sdata&q=ecology
http://www.nature.com/articles/sdata201513
http://www.nature.com/articles/sdata20158
http://isa-tools.github.io/stato/
• General-purpose statistics ontology (formal logic-based representation)
• Coverage for processes (e.g. statistical tests and their conditions of application) and for the information needed by or resulting from statistical methods (e.g. probability distributions, variables, spread and variation metrics)
• STATO also benefits from: (i) extensive documentation, with textual and formal definitions; (ii) associated R code snippets, using the dedicated R-command metadata tag, aimed at facilitating teaching and learning while relying on the popular R language; (iii) documented query examples, highlighting how the ontology can be harnessed by reviewers, tutors and students alike.
Developed in collaboration with Dr Burke, Senior Statistician, Nuffield Department of Population Health, University of Oxford
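As a rough illustration of reporting a statistical test together with its conditions of application, the sketch below computes Welch's t statistic with the Python standard library and packages it with descriptive metadata. The record layout is hypothetical, and `STATO:PLACEHOLDER` is a stand-in, not a real ontology identifier; a real annotation would use the term looked up in the ontology itself.

```python
import math
import statistics


def welch_t_statistic(group_a, group_b):
    """Welch's t statistic for two independent samples (unequal variances)."""
    variance_a = statistics.variance(group_a)   # sample variance (n - 1)
    variance_b = statistics.variance(group_b)
    standard_error = math.sqrt(variance_a / len(group_a) +
                               variance_b / len(group_b))
    return (statistics.mean(group_a) - statistics.mean(group_b)) / standard_error


def describe_analysis(group_a, group_b):
    """Bundle the result with the metadata a reader needs to interpret it."""
    return {
        "test": "Welch's t-test",
        "ontology_term": "STATO:PLACEHOLDER",   # placeholder, not a real term ID
        "conditions_of_application": [
            "observations are independent",
            "each group is approximately normally distributed",
            "equal variances are not assumed",
        ],
        "sample_sizes": (len(group_a), len(group_b)),
        "statistic": welch_t_statistic(group_a, group_b),
    }


if __name__ == "__main__":
    # Made-up measurements, for illustration only.
    report = describe_analysis([1.0, 2.0, 3.0], [2.0, 3.0, 4.0])
    for key, value in report.items():
        print(key, value)
```

Keeping the conditions of application next to the statistic makes it possible to check, after the fact, whether the test was an appropriate choice.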
Questions? You can email us...
View our blog: http://isatools.wordpress.com
Follow us on Twitter: @isatools
View our websites
View our Git repo & contribute: http://github.com/ISA-tools
Thanks for your attention!