Post on 17-Jul-2015
Systems Biology & Bioinformatics Rostock
SEMS: Simulation Experiment Management System
M2CAT: Extracting reproducible simulation studies from model repositories using the CombineArchive Toolkit
Martin Scharm
Department of Systems Biology & Bioinformatics, University of Rostock
http://sems.uni-rostock.de
9th International CellML Workshop, Waiheke Island, New Zealand, 2015
April 13, 2015 M2CAT | Martin Scharm 1
Improving the Management of Simulation Studies in Computational Biology
Martin Scharm, Vivek Garg, Srijana Kayastha, Martin Peters, Dagmar Waltemath
Events
https://sems.uni-rostock.de
de.NBI Infrastructure
We will provide data management and support for systems biology projects, with a focus on provenance and reproducibility of experimental and modelling results. de.NBI:SYSBIO is part of a large German Network for Bioinformatics Infrastructure. WE ARE HIRING!
[Figure: SBGN diagram with the node labels p-cyclin cdc2-p, p-cyclin cdc2, cdc2k, p-cyclin, cdc2k-P, cyclin, and total cdc2]
SBGN-ED
SBGN is a markup language to describe models and exchange information about biological systems graphically. We will further develop methods and tools for SBGN-compliant visualisation of model-related information. WE ARE HIRING!
CombineArchive Toolkit
Sharing in silico experiments is essential for the advance of research in computational biology. The COMBINE archive is a digital container format that eases the management of numerous files and enables the exchange of reproducible modelling results. We developed the CombineArchive Toolkit, consisting of a library, a web interface and a desktop application. It supports scientists in creating, exploring, modifying, and sharing COMBINE archives.
2MT
2MT is our web-based platform to demonstrate the capabilities of SEMS-related tools. It exemplifies how our model management solutions can be used in existing tools.
Models as graphs
The increasing diversity of model-related data that is necessary to perform a simulation study leads to new challenges in model storage. We developed a concept for graph-based storage of models and model-related data. Graphs reflect the models' structure much better, enable linking of model-related data on the storage layer, and allow for an efficient search.
Masymos
Containing SBML and CellML models, linked semantic annotations (e.g., from bio-ontologies), simulation descriptions, graphical representations and other available types of model-related data, our graph database Masymos can now be queried for complete simulation experiments.
Morre
Our retrieval engine for models applies Information Retrieval techniques to retrieve relevant models from Masymos. The proposed ranking and retrieval techniques focus on the processing of model meta-information.
Ontology of Differences
Changes in model versions are manifold and appear on different layers. We develop an ontology of differences occurring in model versions. It will support researchers in analysing differences, discovering typical changes, summarising major changes and providing statistics.
Version Control for Computational Models
With thousands of models available, a framework to track the differences between models and their versions is essential to compare and combine models. Focusing on SBML and CellML, we developed an algorithm to accurately detect and describe differences between versions of a model with respect to (i) the models' encoding, (ii) the structure of biological networks, and (iii) mathematical expressions.
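The idea of describing differences between two model versions can be illustrated with a minimal Python sketch. It is not the algorithm referred to above: each version is reduced to a hypothetical mapping from element id to (parent id, content), and the element names, the sample versions, and the `diff_versions` helper are all invented for illustration.

```python
def diff_versions(old, new):
    """Compare two versions given as {element id: (parent id, content)}.

    Returns the ids that were inserted, deleted, moved (parent changed),
    or updated (content changed). Purely illustrative; real model files
    need a proper mapping between XML trees.
    """
    changes = {"insert": [], "delete": [], "move": [], "update": []}
    for key in sorted(old.keys() - new.keys()):
        changes["delete"].append(key)
    for key in sorted(new.keys() - old.keys()):
        changes["insert"].append(key)
    for key in sorted(old.keys() & new.keys()):
        old_parent, old_content = old[key]
        new_parent, new_content = new[key]
        if old_parent != new_parent:
            changes["move"].append(key)
        if old_content != new_content:
            changes["update"].append(key)
    return changes


# Hypothetical versions of a tiny model (not taken from any repository).
version_1 = {
    "A": ("cell", "species"),
    "B": ("cell", "species"),
    "r1": ("model", "k1*A"),
}
version_2 = {
    "A": ("nucleus", "species"),   # moved to another compartment
    "C": ("cell", "species"),      # newly inserted
    "r1": ("model", "k1*A*C"),     # kinetic law changed
}

changes = diff_versions(version_1, version_2)
print(changes)
```

Keeping the four change types separate mirrors the distinction between structural changes (insert, delete, move) and changes to mathematical expressions (update).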
[Figure: model elements labeled A to H shown for three consecutive versions: version x-1, version x, version x+1]
BiVeS
Armed with our method for difference detection, BiVeS is able to detect and communicate the differences in computational models. The differences are exported in several machine- and human-readable formats, ideally suited to be integrated in other tools.
BudHat
BudHat showcases how BiVeS improves the understanding of a model's changes. BudHat calls BiVeS to compare two versions of a computational model and displays the obtained results in the web browser.
VW Summer School, March 9-13, 2015
During the 2015 Whole Cell summer school we aim to develop a standard-compliant, open version of the whole-cell model. Eleven tutors and 48 students will hack and code, model and simulate, layout and annotate the whole-cell model using openly available software and COMBINE standards. This event is funded by the Volkswagen Stiftung.
HARMONY, April 19-23, 2015
HARMONY is a hackathon-type meeting of the COMBINE Community, with a focus on development of the standards, interoperability and infrastructure. Instead of general discussions or oral presentations, the time is devoted to hands-on hacking and interaction between people focused on practical development of software and standards. HARMONY 2015 takes place at the Leucorea Wittenberg and is hosted by the groups of Falk Schreiber and Dagmar Waltemath.
Workshop on Reproducible and Citable Data and Models, September 14-16, 2015
In a mixture of lectures and hands-on sessions, computational biologists and experimentalists will learn about standards and citable data, about how to make scientific results sustainable and available through open repositories, and about how to find and reuse other people's work. The workshop is funded by the ERASYS-APP program.
Ron Henkel
Dagmar Waltemath
Martin Scharm
Martin Peters
Vivek Garg
Srijana Kayastha
Status Quo
Model repositories host tons of data
[Figure: a model repository containing models, documentation, and simulation descriptions]
GAP
It is challenging to extract reproducible studies.
[Figure: a database of models and related data, next to the data necessary to reproduce a simulation study: three model files, a journal article, and a SED-ML file]
Reproduction is a CHALLENGE! Extracting the data is already a challenge! Understanding and using it is almost impossible.
Reproducibility is a challenge
The COMBINE Archive
one file to share them all
M2CAT: from Masymos to CAT
1. Query the database for annotations, persons, and simulation descriptions (example search term: ubiquitin)
2. Retrieve information about models, simulations, figures, and documentation
3. Export the simulation study as a COMBINE archive
4. Download the archive and open the study with your favourite simulation tool
5. Open the archive in CAT to modify its contents and to share it with others
Scharm et al. 2015: Extracting reproducible simulation studies from model repositories using the CombineArchive Toolkit, BTW 2015, Hamburg
M2CAT: from Masymos to CAT
1. Search for simulation studies in Masymos
2. Retrieve relevant results
3. Export the studies as COMBINE archives using the CombineArchive Toolkit
Search for simulation studies in Masymos
[Figure: the Masymos graph, linking models and model-related data]
• Document "Tyson1991 Cell Cycle 6 var" with the nodes C2, pM, Cell, Reaction3, and CP; annotations Uniprot:P04551, GO:0005623, Interpro:IPR006670, Pubmed:1831270, Kegg Pathway sce04111, and EC-Code: 3.1.3.16; relations isVersionOf, hasPart, is, asProduct, asReactant, isContainedIn, and isDescribedBy
• Document with the model "sodium channel" and the nodes sodium channel m gate, time, and envmt; variables time, v, and m; annotation Pubmed:12991237; relations has_annotation, is_connected, and is_mapped_to
• Document "SEDML" with the nodes Modelreference, Output, Datagenerator, Simulation, Task, and Variable
• Ontology "SBO" with the terms SBO:0000, SBO:544, SBO:236, SBO:231, SBO:064, SBO:545, SBO:004, and SBO:003; relation isA
Henkel et al. 2014, Combining computational models, semantic annotations and simulation experiments in a graph database, Database
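How linking models, annotations, and simulation descriptions in one graph allows queries for complete studies can be sketched in a few lines of Python. This is only an in-memory illustration, not the Masymos schema or its query language: the `Graph` class, the relation names `hasSpecies` and `references`, and the node `simulation-1` are assumptions; the model name, the `is` relation, and the UniProt identifier are taken from the figure.

```python
from collections import defaultdict


class Graph:
    """Tiny labeled, directed graph (illustrative stand-in for a graph database)."""

    def __init__(self):
        # target node -> list of (relation, source node)
        self.incoming = defaultdict(list)

    def add_edge(self, source, relation, target):
        self.incoming[target].append((relation, source))

    def sources(self, target, relation):
        """All nodes that point at `target` via `relation`."""
        return [s for r, s in self.incoming[target] if r == relation]


def complete_studies(graph, annotation):
    """Find (model, simulation description) pairs for an annotation term.

    Walks annotation <- entity <- model <- simulation description.
    """
    studies = []
    for entity in graph.sources(annotation, "is"):
        for model in graph.sources(entity, "hasSpecies"):
            for simulation in graph.sources(model, "references"):
                studies.append((model, simulation))
    return studies


g = Graph()
g.add_edge("C2", "is", "Uniprot:P04551")
g.add_edge("Tyson1991 Cell Cycle 6 var", "hasSpecies", "C2")         # hypothetical relation name
g.add_edge("Tyson1991 Cell Cycle 6 var", "isDescribedBy", "Pubmed:1831270")
g.add_edge("simulation-1", "references", "Tyson1991 Cell Cycle 6 var")  # hypothetical node

studies = complete_studies(g, "Uniprot:P04551")
print(studies)
```

Because the simulation description is just another node, a model without one yields no study, which is the distinction between finding a model and finding a reproducible simulation experiment.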
Search for simulation studies in Masymos
[Figure: the Masymos graph around the document "Tyson1991 Cell Cycle 6 var"; the query terms are labeled Person and Annotation]
Query: "Show me models by Tyson that describe the cell cycle and have cdc2!"
Ranked results:
1. (0.859) Tyson1991 - Cell Cycle 6 var
2. (0.854) Tyson2001_Cell_Cycle_Regulation
3. (0.477) Chen2004 - Cell Cycle Regulation
Henkel et al. 2010: Ranked retrieval of Computational Biology models, BMC Bioinformatics
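A rough Python sketch of ranked retrieval over model meta-information follows. It does not reproduce the ranking function of the cited work and will not yield the scores shown above: the field weights, the `ModelRecord` type, and the persons and annotations attached to each model are invented for illustration. Only the three model names come from the slide.

```python
from dataclasses import dataclass, field


@dataclass
class ModelRecord:
    name: str
    persons: set = field(default_factory=set)
    annotations: set = field(default_factory=set)


# Hypothetical weights: a match on a person counts twice as much as an annotation.
FIELD_WEIGHTS = {"persons": 2.0, "annotations": 1.0}


def score(record, query):
    """Weighted fraction of query terms found per field, normalised to [0, 1]."""
    total = 0.0
    max_total = 0.0
    for field_name, weight in FIELD_WEIGHTS.items():
        wanted = {term.lower() for term in query.get(field_name, ())}
        if not wanted:
            continue
        have = {term.lower() for term in getattr(record, field_name)}
        total += weight * len(wanted & have) / len(wanted)
        max_total += weight
    return total / max_total if max_total else 0.0


def search(records, query):
    """Return (score, name) pairs with a non-zero score, best match first."""
    scored = [(score(r, query), r.name) for r in records]
    return sorted((s for s in scored if s[0] > 0), key=lambda s: (-s[0], s[1]))


# Metadata below is made up for the example, not taken from a repository.
records = [
    ModelRecord("Chen2004 - Cell Cycle Regulation", {"Chen"}, {"cell cycle", "cdc2"}),
    ModelRecord("Tyson2001_Cell_Cycle_Regulation", {"Tyson"}, {"cell cycle"}),
    ModelRecord("Tyson1991 - Cell Cycle 6 var", {"Tyson"}, {"cell cycle", "cdc2"}),
]
query = {"persons": ["Tyson"], "annotations": ["cell cycle", "cdc2"]}

ranking = search(records, query)
for value, name in ranking:
    print(f"({value:.3f}) {name}")
```

Scoring each field separately is what lets a query distinguish "by Tyson" from "about cdc2", as in the example query above.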
Retrieve relevant data
[Figure: for the hit for Tyson, the model file, the simulation description, and additional information are collected]
Export COMBINE archives using CAT
CombineArchive Toolkit: Desktop Application and Web Interface, both built on the CombineArchive Library
[Figure labels: create, modify, explore, share; BioModels Database; CellML Model Repository]
Martin Scharm
Florian Wendland
Martin Peters
Dagmar Waltemath
Tom Theile
Markus Wolfien
Scharm et al. 2014: The CombineArchiveWeb application – A web based tool to handle files associated with modelling results, SWAT4LS, Berlin
ceur-ws.org/Vol-1320/paper_19.pdf
Export: explore COMBINE archives at Web CAT
• various files from different resources
• as much metadata as available
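To make the container idea concrete, here is a minimal Python sketch that bundles several files into a COMBINE archive using only the standard library. It does not use the CombineArchive Toolkit; it relies on the archive being a ZIP file with a `manifest.xml` at its root that lists each entry's location and format. The `build_archive` helper, the file names, and the placeholder file contents are assumptions, and no metadata file is written.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

OMEX_NS = "http://identifiers.org/combine.specifications/omex-manifest"
SPEC = "http://identifiers.org/combine.specifications/"


def build_archive(files):
    """Bundle files into COMBINE archive bytes.

    files: dict mapping archive path -> (format URI, content bytes).
    """
    ET.register_namespace("", OMEX_NS)
    root = ET.Element(f"{{{OMEX_NS}}}omexManifest")
    # The archive itself is listed as the first manifest entry.
    ET.SubElement(root, f"{{{OMEX_NS}}}content",
                  {"location": ".", "format": SPEC + "omex"})
    for path, (fmt, _) in files.items():
        ET.SubElement(root, f"{{{OMEX_NS}}}content",
                      {"location": "./" + path, "format": fmt})
    manifest = (b'<?xml version="1.0" encoding="UTF-8"?>\n'
                + ET.tostring(root, encoding="utf-8"))

    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("manifest.xml", manifest)
        for path, (_, data) in files.items():
            zf.writestr(path, data)
    return buffer.getvalue()


# Placeholder contents: these are not valid model or simulation files.
archive_bytes = build_archive({
    "model/tyson.xml": (SPEC + "sbml", b"<sbml/>"),
    "sim/tyson.sedml": (SPEC + "sed-ml", b"<sedML/>"),
})

with open("study.omex", "wb") as handle:  # hypothetical output file name
    handle.write(archive_bytes)
```

In practice the toolkit also takes care of the metadata attached to each entry, which this sketch leaves out.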
Download the archive and use it in other software
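What "using the archive in other software" involves can be sketched as follows: a consuming tool opens the ZIP container, reads `manifest.xml`, and picks the entries it understands by their format identifier. The Python below is a standard-library sketch under that assumption. The `list_entries` and `find_by_format` helpers and the demo archive built in memory are hypothetical, and the entry contents are placeholders.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

NS = {"omex": "http://identifiers.org/combine.specifications/omex-manifest"}


def list_entries(archive_bytes):
    """Return (location, format) for every entry listed in the manifest."""
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as zf:
        root = ET.fromstring(zf.read("manifest.xml"))
    return [(c.get("location"), c.get("format") or "")
            for c in root.findall("omex:content", NS)]


def find_by_format(entries, marker):
    """Locations whose format identifier contains `marker`."""
    return [location for location, fmt in entries if marker in fmt]


# Build a small demo archive in memory so the sketch is self-contained.
MANIFEST = """<?xml version="1.0" encoding="UTF-8"?>
<omexManifest xmlns="http://identifiers.org/combine.specifications/omex-manifest">
  <content location="." format="http://identifiers.org/combine.specifications/omex"/>
  <content location="./model/tyson.xml" format="http://identifiers.org/combine.specifications/sbml"/>
  <content location="./sim/tyson.sedml" format="http://identifiers.org/combine.specifications/sed-ml"/>
</omexManifest>
"""
_buffer = io.BytesIO()
with zipfile.ZipFile(_buffer, "w") as demo:
    demo.writestr("manifest.xml", MANIFEST)
    demo.writestr("model/tyson.xml", "<sbml/>")      # placeholder content
    demo.writestr("sim/tyson.sedml", "<sedML/>")     # placeholder content
demo_archive = _buffer.getvalue()

entries = list_entries(demo_archive)
simulations = find_by_format(entries, "combine.specifications/sed-ml")
models = find_by_format(entries, "combine.specifications/sbml")
print("simulation descriptions:", simulations)
print("models:", models)
```

A simulation tool would typically start from the simulation description it finds this way and resolve the referenced model file from the same archive.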
Summary
• M2CAT implements a workflow to extract reproducible simulation studies from model repositories
• It searches in Masymos: https://sems.uni-rostock.de/projects/masymos/
• It creates and displays COMBINE archives using the CombineArchive Toolkit: https://sems.uni-rostock.de/projects/combinearchive/
• Everything is available from our website: http://sems.uni-rostock.de
Thank you for your attention!
SEMS group
Dagmar Waltemath
Martin Peters
Vivek Garg
Srijana Kayastha
Olaf Wolkenhauer
@SemsProject
http://sems.uni-rostock.de