Date posted: 26-Jan-2015 | Category: Technology | Uploaded by: alejandra-gonzalez-beltran
Alejandra González-Beltrán, Ph.D
University of Oxford e-Research Centre, UK
From experimental planning to data publication: the ISA infrastructure and case studies in toxicology
OpenTox Europe - Mainz, Germany - 30th September, 2013
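The data workflow
Figure: the data scientist at the centre of the data workflow, whose stages are planning, data collection, data management, analysis, visualization and publication; the workflow starts either from existing data or from a new experiment.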
Figure: the same data workflow, with metadata produced at every stage and captured by a metadata tracking infrastructure.
Figure: tracking metadata across the workflow supports traceability, assessment, accountability, evidence, reusability, reproducibility, storage, mining and provenance.
Two aspects of metadata: semantics and structure.
Structure: investigation / study / assay.
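The investigation / study / assay structure is a nested hierarchy: an investigation groups studies, and each study groups the assays run on its samples. As a minimal sketch, assuming hypothetical class and field names (this is not the data model of the ISA software), the hierarchy can be written as Python dataclasses, and walking the tree recovers the context of every data file:

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Assay:
    # Hypothetical fields: what was measured, with which technology, and the files produced.
    measurement_type: str
    technology_type: str
    data_files: List[str] = field(default_factory=list)


@dataclass
class Study:
    # A study holds the samples of one experimental unit and the assays run on them.
    identifier: str
    samples: List[str] = field(default_factory=list)
    assays: List[Assay] = field(default_factory=list)


@dataclass
class Investigation:
    # The top level ties related studies together.
    identifier: str
    studies: List[Study] = field(default_factory=list)


def trace_data_files(inv: Investigation) -> List[Tuple[str, str, str, str]]:
    """Return (investigation, study, measurement type, file) for every data file."""
    rows = []
    for study in inv.studies:
        for assay in study.assays:
            for data_file in assay.data_files:
                rows.append((inv.identifier, study.identifier,
                             assay.measurement_type, data_file))
    return rows


if __name__ == "__main__":
    inv = Investigation("INV-1", [
        Study("STUDY-1", samples=["sample1", "sample2"], assays=[
            Assay("transcription profiling", "DNA microarray",
                  ["file1.cel", "file2.cel"]),
        ]),
    ])
    for row in trace_data_files(inv):
        print(row)
```

Because every file is reached through its study and investigation, the file never loses its experimental context.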
The ISA infrastructure:
• generic format for experimental description and data exchange
• open source software tools
• community engagement
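The generic exchange format is ISA-Tab, which stores experimental metadata as tab-delimited tables. A real ISA-Tab dataset has separate investigation, study and assay files and a much richer header vocabulary. As a minimal sketch, assuming a hypothetical, simplified study table (the `treatment` factor and all row values are invented for illustration), the following reads a tab-delimited table with the Python standard library and groups samples by a factor column:

```python
import csv
import io
from typing import Dict, List

# Hypothetical, simplified study table in the spirit of an ISA-Tab study file:
# tab-separated, one row per sample, with bracketed qualifiers in the headers.
STUDY_TABLE = (
    "Source Name\tCharacteristics[organism]\tProtocol REF\tSample Name\tFactor Value[treatment]\n"
    "plant1\tArabidopsis thaliana\tsample collection\tsample1\tcontrol\n"
    "plant2\tArabidopsis thaliana\tsample collection\tsample2\ttreated\n"
)


def read_tab_table(text: str) -> List[Dict[str, str]]:
    """Parse a tab-delimited table into one dict per row, keyed by column header."""
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    header = next(reader)
    return [dict(zip(header, row)) for row in reader if row]


def samples_by_factor(rows: List[Dict[str, str]], factor: str) -> Dict[str, List[str]]:
    """Group sample names by the value of one 'Factor Value[...]' column."""
    groups: Dict[str, List[str]] = {}
    for row in rows:
        groups.setdefault(row[f"Factor Value[{factor}]"], []).append(row["Sample Name"])
    return groups
```

Keying each row by header is a simplification. Real ISA-Tab tables repeat headers such as `Protocol REF`, so a faithful parser would keep column positions rather than a plain dict.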
Figure: an example experiment on Arabidopsis thaliana, from experiment design (treatment groups at 70%, 90% and 100%) through sample collection (samples 1 to 11) and assay runs producing data files, to analysis.
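The chain in the figure, from treatment group to sample to data file, is what metadata tracking has to preserve. As a minimal sketch, assuming two hypothetical lookup tables (file to sample, sample to treatment group) rather than any format used by the ISA tools, a data file can be traced back to its treatment group, and files whose chain is broken can be flagged:

```python
from typing import Dict, Iterable, List, Optional


def group_for_file(file_name: str,
                   file_to_sample: Dict[str, str],
                   sample_to_group: Dict[str, str]) -> Optional[str]:
    """Return the treatment group a data file derives from, or None if a link is missing."""
    sample = file_to_sample.get(file_name)
    if sample is None:
        return None
    return sample_to_group.get(sample)


def untraceable_files(files: Iterable[str],
                      file_to_sample: Dict[str, str],
                      sample_to_group: Dict[str, str]) -> List[str]:
    """List (sorted) the files that cannot be traced back to a treatment group."""
    return sorted(f for f in files
                  if group_for_file(f, file_to_sample, sample_to_group) is None)
```

A file that cannot be traced is exactly the data that loses its value for reuse and reproducibility.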
Risa R/Bioconductor package:
• Parses ISA-Tab datasets into R objects, allowing users to update them and save them after analysis.
• Bridges ISA-Tab metadata to analysis pipelines for specific assay types by building objects for use in downstream R packages; currently covered are mass spectrometry (xcms package, xcmsSet) and DNA microarrays (Biobase package, ExpressionSet).
• Suggests Bioconductor packages that may be relevant to an assay type, based on their biocViews annotations.
Gonzalez-Beltran et al. The Risa R/Bioconductor package: integrative data analysis from experimental metadata and back again. In press
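The bridging step amounts to dispatching on assay type: each technology is routed to the pipeline that knows how to analyse it. Risa itself is an R package, and the following is not its code. It is a minimal Python sketch of the dispatch idea only, in which the builder functions and their dictionary outputs are hypothetical stand-ins for the xcmsSet and ExpressionSet objects Risa builds:

```python
from typing import Callable, Dict, List


def build_ms_input(files: List[str]) -> dict:
    # Hypothetical stand-in for building an xcmsSet from mass spectrometry files.
    return {"pipeline": "mass spectrometry", "files": files}


def build_microarray_input(files: List[str]) -> dict:
    # Hypothetical stand-in for building an ExpressionSet from microarray files.
    return {"pipeline": "DNA microarray", "files": files}


# Registry mapping an assay's technology type to the builder for its pipeline.
BUILDERS: Dict[str, Callable[[List[str]], dict]] = {
    "mass spectrometry": build_ms_input,
    "DNA microarray": build_microarray_input,
}


def route_assay(technology_type: str, files: List[str]) -> dict:
    """Build the analysis input for an assay, or fail clearly if no pipeline exists."""
    try:
        builder = BUILDERS[technology_type]
    except KeyError:
        raise ValueError(
            f"no analysis pipeline registered for {technology_type!r}") from None
    return builder(files)
```

A registry keeps new assay types additive: supporting another technology means registering one more builder, without touching the routing code.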
Data Publication with Scientific Data (www.nature.com/scientificdata)
• New open-access, online-only publication for descriptions of scientifically valuable datasets
• Only content type: the Data Descriptor, combining narrative and structured parts
• Initially focused on the life, environmental and biomedical sciences
• Data Descriptors will complement traditional research journals and data repositories
• Designed to foster data sharing and reuse, and ultimately to accelerate scientific discovery
Data Publication with Scientific Data (http://www.nature.com/scientificdata/) and GigaScience (http://gigasciencejournal.com)
A growing ecosystem of over 30 public and internal resources uses the ISA metadata tracking framework (ISA-Tab and/or format) to facilitate standards-compliant collection, curation, management and reuse of investigations in an increasingly diverse set of life science domains, including:
• stem cell discovery
• systems biology
• transcriptomics
• toxicogenomics
• environmental health
• environmental genomics
• metabolomics
• metagenomics
• nanotechnology
• proteomics
The framework is also used by communities working to build a library of cellular signatures.
InnoMed PredTox Project
Goal: earlier pre-clinical safety evaluation by combining results from ‘omics technologies and conventional toxicology methods
Suter et al. 2011. EU Framework 6 Project: Predictive Toxicology (PredTox)--overview and outcome.
Boitier et al. 2011. A comparative integrated transcript analysis and functional characterization of differential mechanisms for induction of liver hypertrophy in the rat.
2-week systemic rat study using male Wistar rats (N=15 per dose group)
14 proprietary drug candidates from participating companies and 2 reference toxic compounds
http://www.ebi.ac.uk/bioinvindex/study.seam?studyId=BII-S-8
Data Infrastructure for Chemical Safety
http://www.dixa-fp7.eu/about
Kohonen et al. 2013 The ToxBank Data Warehouse: a research cluster of 7 EU FP7 Health systems toxicology and toxicogenomics projects.
Safety Evaluation Ultimately Replacing Animal Testing-1 (SEURAT-1): looking at improving safety assessment without the need for animal experiments
ToxBank: cross-cluster infrastructure project
http://toxbank.net
https://wiki.nci.nih.gov/display/ICR/ISA-TAB-Nano
Nanotechnology Informatics Working Group
Thomas et al. 2013 ISA-TAB-Nano: A specification for sharing nanomaterial research data in spreadsheet-based format
Baker et al. 2013 Standardizing data
ISA-TAB-Nano
Extension of the ISA-TAB format to represent nanomaterials, small molecules and biological specimens, along with their assay characterisation data
Questions?
You can email [email protected]
View our blog: http://isatools.wordpress.com
Follow us on Twitter: @isatools
View our website: http://www.isa-tools.org
View our Git repo & contribute: http://github.com/ISA-tools