Post on 19-Apr-2020
An exemplar for data integration in the biomedical domain driven by the ISA framework
Shannan Ho Sui
AMIA, March 19, 2013
http://stemcellcommons.org
This is a story about collaboration...
ISA
Disparate Stem Cell Resources
• Inconsistent data formats, experimental descriptions and results
The Stem Cell Commons
• A shared data and analytical resource
• Bioinformatics support for research at the HSCI
• A community
Data repository
Analysis system
Support/consults
Susanna-Assunta Sansone
isacommons.org
user community
General-purpose, configurable format, designed to support the use of several standards checklists, terminologies and conversions to (a growing number of) other metadata formats, used by public repositories, e.g.
MAGE-Tab
SRA-xml
SOFT
Pride-xml
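As a rough illustration of what a tab-delimited, ISA-Tab-style study table looks like to a program, here is a minimal sketch. The table contents are hypothetical and heavily simplified; column headers such as `Source Name`, `Characteristics[organism]` and `Sample Name` follow ISA-Tab naming conventions, but this is not a complete or validated ISA-Tab file, and `read_study_table` is an illustrative helper, not part of any ISA tool.

```python
import csv
import io

# Hypothetical, simplified study table in the spirit of ISA-Tab.
# Real ISA-Tab archives also contain investigation and assay files.
STUDY_TABLE = (
    "Source Name\tCharacteristics[organism]\tProtocol REF\tSample Name\n"
    "donor1\tHomo sapiens\tcell isolation\tsample1\n"
    "donor2\tMus musculus\tcell isolation\tsample2\n"
)


def read_study_table(text):
    """Parse a tab-delimited table into a list of row dictionaries."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return list(reader)


rows = read_study_table(STUDY_TABLE)

# Map each sample to its annotated organism, making the annotation explicit.
organisms = {row["Sample Name"]: row["Characteristics[organism]"] for row in rows}
```

Because the annotations live in named columns, a converter to another metadata format mostly needs a mapping from these column names to the target format's fields.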
Rationale for developing ISA
Capture all salient features of the experimental workflow
Make annotation explicit and discoverable
Support data provenance tracking
Use community standards
Manual merging process
53 studies, 1098 assays
87 studies, 1179 assays
Curator
148 studies, 2356 assays
Conversion driven by ISA-Tab
53 studies, 1098 assays
87 studies, 1179 assays
ISA-Tab
148 studies, 2356 assays
Data uploads and annotation
Current Data Statistics
Filtering data using metadata as search facets
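Faceted filtering over sample metadata can be sketched as follows. This is a minimal illustration, not the Stem Cell Commons implementation: the records, field names (`organism`, `cell_type`, `technology`) and helper functions are all hypothetical.

```python
from collections import Counter

# Hypothetical sample records; field names and values are illustrative only.
samples = [
    {"organism": "Homo sapiens", "cell_type": "HSC", "technology": "microarray"},
    {"organism": "Mus musculus", "cell_type": "HSC", "technology": "RNA-seq"},
    {"organism": "Mus musculus", "cell_type": "ESC", "technology": "microarray"},
]


def filter_by_facets(records, selected):
    """Keep records whose metadata match every selected facet.

    `selected` maps a field name to the set of accepted values.
    """
    return [
        r for r in records
        if all(r.get(field) in values for field, values in selected.items())
    ]


def facet_counts(records, field):
    """Count how many records carry each value of a facet field."""
    return Counter(r.get(field) for r in records)


hits = filter_by_facets(samples, {"organism": {"Mus musculus"}})
counts = facet_counts(hits, "technology")
```

Recomputing the counts on the filtered set is what lets a search interface show how many results remain under each facet value after every selection.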
Experiment description
Experimental protocols and data downloads
ISA-Tab metadata downloads and export
Linking data to the Galaxy workflow engine
Refinery: An analysis and visualization framework
In development
Viewing and selecting samples in list view
Viewing and selecting samples in matrix view
Initiating workflows
Monitoring progress
Integration with the IGV genome browser
Challenges
• Changing research culture(s) to recognize the value of data sharing
• Manually curating the data for consistency and completeness
• Managing large volumes of data
• Standardizing workflows
• Ensuring interoperability when integrating multiple systems and tools
• Technical complexity of software development effort
Refinery
Psalm Haseley
Nils Gehlenborg
Richard Park
Ilya Sytchev
Peter Park
Shannan Ho Sui
ISA Commons
Philippe Rocca-Serra
Eamonn Maguire
Susanna Sansone
Oxford e-Research Centre
A growing community that uses the ISA metadata tracking framework to facilitate standards-compliant collection, curation, management and reuse of datasets.
WikiPathways
Meet the Team
Center for Stem Cell Bioinformatics
Winston Hide, Program Leader
Shannan Ho Sui, Analytics
Oliver Hofmann, Core services
Ilya Sytchev, Bioinformatics Developer
John Hutchinson, HSCI Analyst
Sudeshna Das, Repository
Stéphane Corlosquet, Bioinformatics Engineer
Emily Merrill, Bioinformatics Analyst
Collaborators
• Nils Gehlenborg
• Richard Park
• Psalm Haseley
• Peter Park
• Eamonn Maguire
• Philippe Rocca-Serra
• Susanna Sansone