M2CAT: Extracting reproducible simulation studies from model repositories using the CombineArchive...

Date post: 08-Aug-2015
Upload: martin-scharm
Transcript
Page 1: M2CAT: Extracting reproducible simulation studies from model repositories using the CombineArchive Toolkit

SYSTEMS BIOLOGY BIOINFORMATICS ROSTOCK

SEMS simulation experiment management system

M2CAT
Extracting reproducible simulation studies from model repositories using the CombineArchive Toolkit

MARTIN SCHARM
Department of Systems Biology & Bioinformatics, University of Rostock

http://sems.uni-rostock.de

SBI Research Seminar
SS 2015

April 13, 2015 M2CAT | Martin Scharm 1

Page 2: M2CAT: Extracting reproducible simulation studies from model repositories using the CombineArchive Toolkit

Improving the Management of Simulation Studies in Computational Biology
Martin Scharm, Vivek Garg, Srijana Kayastha, Martin Peters, Dagmar Waltemath

Events

SEMS simulation experiment management system

https://sems.uni-rostock.de

de.NBI Infrastructure
We will provide data management and support for systems biology projects, with a focus on provenance and reproducibility of experimental and modelling results. de.NBI:SYSBIO is part of a large German Network for Bioinformatics Infrastructure. WE ARE HIRING!

[Figure: SBGN-style reaction diagram with the nodes cyclin, p-cyclin, cdc2k, cdc2k-P, p-cyclin cdc2, p-cyclin cdc2-p and total cdc2, plus source/sink symbols (Ø)]

SBGN-ED
SBGN is a markup language to describe models and exchange information about biological systems graphically. We will further develop methods and tools for SBGN-compliant visualisation of model-related information. WE ARE HIRING!

CombineArchive Toolkit
Sharing in silico experiments is essential for the advance of research in computational biology. The COMBINE archive is a digital container format to ease the management of numerous files and to enable the exchange of reproducible modelling results. We developed the CombineArchive Toolkit, consisting of a library, a web interface and a desktop application. It supports scientists in creating, exploring, modifying, and sharing COMBINE archives.

2MT
2MT is our web-based platform to demonstrate the capabilities of SEMS-related tools. It exemplifies how our model management solutions can be used in existing tools.

Models as graphs
The increasing diversity of model-related data that is necessary to perform a simulation study leads to new challenges in model storage. We developed a concept for graph-based storage of models and model-related data. Graphs reflect the models' structure much better, enable linking of model-related data on the storage layer, and allow for an efficient search.

Masymos
Containing SBML and CellML models, linked semantic annotations (e.g., from bio-ontologies), simulation descriptions, graphical representations and other available types of model-related data, our graph database Masymos can now be queried for complete simulation experiments.

Morre
Our retrieval engine for models applies Information Retrieval techniques to retrieve relevant models from MASYMOS. The proposed ranking and retrieval techniques focus on the processing of model meta-information.

Ontology of Differences
Changes in model versions are manifold and appear on different layers. We develop an ontology of differences occurring in model versions. It will support researchers in analysing differences, discovering typical changes, summarising major changes and providing statistics.

Version Control for Computational Models

With thousands of models available, a framework to track the differences between models and their versions is essential to compare and combine models. Focusing on SBML and CellML, we developed an algorithm to accurately detect and describe differences between versions of a model with respect to (i) the models' encoding, (ii) the structure of biological networks, and (iii) mathematical expressions.

[Figure: the same model graph shown in version x-1, version x and version x+1, built from nodes labelled A to H]

BiVeS
Armed with our method for difference detection, BiVeS is able to detect and communicate the differences in computational models. The differences are exported in several machine- and human-readable formats, ideally suited to be integrated in other tools.

BudHat
BudHat showcases how BiVeS improves the understanding of a model's changes. BudHat calls BiVeS for the comparison of two versions of a computational model and displays the obtained results in the web browser.
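To make the idea of difference detection between model versions more concrete, here is a deliberately simplified sketch. It is not the BiVeS algorithm: it only compares the sets of element ids in two XML documents and reports what was inserted or deleted. The function names and the two tiny model snippets are hypothetical and exist only for illustration.

```python
import xml.etree.ElementTree as ET


def ids_by_tag(xml_text):
    """Map each element id to its local tag name (namespace stripped)."""
    root = ET.fromstring(xml_text)
    return {
        element.get("id"): element.tag.rsplit("}", 1)[-1]
        for element in root.iter()
        if element.get("id")
    }


def diff_ids(old_xml, new_xml):
    """Report ids that were deleted, inserted, or kept between two versions."""
    old, new = ids_by_tag(old_xml), ids_by_tag(new_xml)
    return {
        "deleted": sorted(old.keys() - new.keys()),
        "inserted": sorted(new.keys() - old.keys()),
        "kept": sorted(old.keys() & new.keys()),
    }


# Hypothetical model versions; real SBML or CellML files are far richer.
VERSION_OLD = "<model><species id='A'/><species id='B'/><species id='C'/></model>"
VERSION_NEW = "<model><species id='A'/><species id='B'/><species id='D'/></model>"

if __name__ == "__main__":
    for kind, ids in diff_ids(VERSION_OLD, VERSION_NEW).items():
        print(kind, ids)
```

A comparison like this ignores moved elements, changed attributes and mathematical expressions, which are exactly the aspects the poster lists as handled by the actual method.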

VW Summer School, March 9-13, 2015
During the 2015 Whole Cell summer school we aim to develop a standard-compliant, open version of the whole-cell model. Eleven tutors and 48 students will hack and code, model and simulate, lay out and annotate the whole-cell model using openly available software and COMBINE standards. This event is funded by the Volkswagen Stiftung.

HARMONY, April 19-23, 2015
HARMONY is a hackathon-type meeting of the COMBINE Community, with a focus on development of the standards, interoperability and infrastructure. Instead of general discussions or oral presentations, the time is devoted to hands-on hacking and interaction between people focused on practical development of software and standards. HARMONY 2015 is located at the Leucorea Wittenberg and is hosted by the groups of Falk Schreiber and Dagmar Waltemath.


Workshop on Reproducible and Citable Data and Models, September 14-16, 2015
In a mixture of lectures and hands-on sessions, computational biologists and experimentalists will learn about standards and citable data, about how to make scientific results sustainable and available through open repositories, and about how to find and reuse other people's work. The workshop is funded by the ERASYS-APP program.

Ron Henkel

Dagmar Waltemath

Martin Scharm

Martin Peters

Vivek Garg

Srijana Kayastha

Page 3: M2CAT: Extracting reproducible simulation studies from model repositories using the CombineArchive Toolkit

Status Quo
Model repositories host tons of data

[Figure: a repository filled with many items of three kinds: models, documentation, and simulation descriptions]

Page 4: M2CAT: Extracting reproducible simulation studies from model repositories using the CombineArchive Toolkit

GAP
It is challenging to extract reproducible studies.

[Figure: on one side a database of models and related data; on the other side the data necessary to reproduce a simulation study: three model files, a journal article, and a SED-ML file]

Reproduction is a CHALLENGE!

Extracting the data is already a challenge!

Understanding and using it is almost impossible.

Page 5: M2CAT: Extracting reproducible simulation studies from model repositories using the CombineArchive Toolkit

Reproducibility
is a challenge

http://www.nature.com/nature/journal/v483/n7391/fig_tab/483531a_T1.html

Page 6: M2CAT: Extracting reproducible simulation studies from model repositories using the CombineArchive Toolkit

The COMBINE Archive
one file to share them all
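As a rough illustration of the "one file" idea: a COMBINE archive is a ZIP file whose `manifest.xml` lists every contained file together with a format identifier. The sketch below builds such an archive with the Python standard library only. It does not use the CombineArchive Toolkit; the helper name `build_archive` and the file names are made up, and the format identifiers follow the COMBINE archive (OMEX) specification as commonly documented, so they should be checked against the specification before being relied on.

```python
import io
import xml.etree.ElementTree as ET
import zipfile

OMEX_NS = "http://identifiers.org/combine.specifications/omex-manifest"
FORMAT_PREFIX = "http://identifiers.org/combine.specifications/"

# Serialise the manifest with OMEX as the default namespace.
ET.register_namespace("", OMEX_NS)


def build_archive(files):
    """Build a COMBINE archive in memory.

    files maps an archive path to a (format key, bytes content) pair,
    e.g. {"model.xml": ("sbml", b"<sbml/>")}.
    """
    manifest = ET.Element(f"{{{OMEX_NS}}}omexManifest")
    # The archive itself and its manifest are listed before the payload.
    entries = [(".", "omex"), ("./manifest.xml", "omex-manifest")]
    entries += [(f"./{path}", fmt) for path, (fmt, _) in files.items()]
    for location, fmt in entries:
        ET.SubElement(
            manifest,
            f"{{{OMEX_NS}}}content",
            {"location": location, "format": FORMAT_PREFIX + fmt},
        )

    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("manifest.xml", ET.tostring(manifest, encoding="utf-8"))
        for path, (_, content) in files.items():
            archive.writestr(path, content)
    return buffer.getvalue()


if __name__ == "__main__":
    # Hypothetical study: one model file plus one simulation description.
    data = build_archive({
        "model/tyson1991.xml": ("sbml", b"<sbml/>"),
        "experiment/tyson1991.sedml": ("sed-ml", b"<sedML/>"),
    })
    with open("study.omex", "wb") as handle:
        handle.write(data)
```

Keeping the manifest next to the payload is what lets a receiving tool find the model and the simulation description without guessing from file names.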

Page 7: M2CAT: Extracting reproducible simulation studies from model repositories using the CombineArchive Toolkit

M2CAT
from Masymos to CAT

[Figure: the M2CAT web interface with a search field (example query "ubiquitin") and a results list with an EXPORT button per hit]

Query database for annotations, persons, simulation descriptions

Retrieve information about models, simulations, figures, documentation

Export simulation study as COMBINE archive

Download archive and open the study with your favourite simulation tool

Open archive in CAT to modify its contents and to share it with others

Scharm et al. 2015: Extracting reproducible simulation studies from model repositories using the CombineArchive Toolkit, BTW 2015, Hamburg

Page 8: M2CAT: Extracting reproducible simulation studies from model repositories using the CombineArchive Toolkit

M2CAT
from Masymos to CAT

Search for simulation studies in Masymos

Retrieve relevant results

Export the studies as COMBINE archives using the CombineArchive Toolkit

Page 9: M2CAT: Extracting reproducible simulation studies from model repositories using the CombineArchive Toolkit

Search
for simulation studies in Masymos

[Figure: excerpt of the Masymos graph, in which models and model-related data are stored as linked nodes:
- a model document for "Tyson1991 Cell Cycle 6 var" with the entities C2, pM, Cell, Reaction3 and CP, connected by relations such as asProduct, asReactant and isContainedIn, and annotated with Uniprot:P04551, GO:0005623, Interpro:IPR006670, Pubmed:1831270, Kegg Pathway sce04111 and EC-Code 3.1.3.16 through qualifiers such as isVersionOf, isVersion, hasPart, is and isDescribedBy
- a model document for "sodium channel" with the components sodium channel, sodium channel m gate, time and envmt, variables such as time, v and m linked by is_connected and is_mapped_to, and has_annotation Pubmed:12991237
- a SED-ML document with Modelreference, Output, Datagenerator, Simulation, Task and Variable nodes
- the SBO ontology, with terms such as SBO:544, SBO:236, SBO:231, SBO:064, SBO:545, SBO:004 and SBO:003 linked by isA]

Henkel et al. 2014, Combining computational models, semantic annotations and simulation experiments in a graph database, Database
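The graph idea on this slide can be sketched with a small in-memory structure. This is an illustration only, not the Masymos schema or its query interface: the class `ModelGraph`, the relation names `hasSpecies` and `references`, the file name `tyson1991.sedml`, and the way the nodes are wired together are assumptions made for the example. Only the node labels and the qualifiers `isVersionOf` and `isDescribedBy` are taken from the slide.

```python
from collections import defaultdict, deque


class ModelGraph:
    """Tiny typed graph: nodes have a kind, edges have a relation label."""

    def __init__(self):
        self.kind = {}                     # node id -> kind of node
        self.incoming = defaultdict(list)  # target -> [(relation, source)]

    def add(self, node, kind):
        self.kind[node] = kind

    def link(self, source, relation, target):
        self.incoming[target].append((relation, source))

    def reachable_from(self, node, kind):
        """Walk edges backwards from node; collect nodes of the given kind."""
        seen, queue, found = {node}, deque([node]), set()
        while queue:
            current = queue.popleft()
            if self.kind.get(current) == kind:
                found.add(current)
            for _relation, source in self.incoming.get(current, []):
                if source not in seen:
                    seen.add(source)
                    queue.append(source)
        return found


# Assumed wiring, for illustration only.
graph = ModelGraph()
graph.add("Tyson1991 - Cell Cycle 6 var", "model")
graph.add("C2", "species")
graph.add("Uniprot:P04551", "annotation")
graph.add("Pubmed:1831270", "annotation")
graph.add("tyson1991.sedml", "simulation")

graph.link("Tyson1991 - Cell Cycle 6 var", "hasSpecies", "C2")
graph.link("C2", "isVersionOf", "Uniprot:P04551")
graph.link("Tyson1991 - Cell Cycle 6 var", "isDescribedBy", "Pubmed:1831270")
graph.link("tyson1991.sedml", "references", "Tyson1991 - Cell Cycle 6 var")

if __name__ == "__main__":
    # Which models, and which simulation descriptions, involve this protein?
    print(graph.reachable_from("Uniprot:P04551", "model"))
    print(graph.reachable_from("Uniprot:P04551", "simulation"))
```

Because annotations, model entities and simulation descriptions live in one graph, a single traversal from an annotation can reach both the model and the simulation description that uses it.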

Page 10: M2CAT: Extracting reproducible simulation studies from model repositories using the CombineArchive Toolkit

Search
for simulation studies in Masymos

[Figure: the graph excerpt around "Tyson1991 Cell Cycle 6 var" from the previous slide, queried by Person and Annotation]

Show me models by Tyson that describe the cell cycle and have cdc2!

1. (0.859) Tyson1991 - Cell Cycle 6 var
2. (0.854) Tyson2001_Cell_Cycle_Regulation
3. (0.477) Chen2004 - Cell Cycle Regulation

Henkel et al. 2010: Ranked retrieval of Computational Biology models, BMC Bioinformatics
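A ranked result list like the one above can be produced by many Information Retrieval schemes. The sketch below uses plain TF-IDF weighting with cosine similarity as a stand-in; it is not the ranking used by Morre or Masymos, and the short meta-information strings attached to each model name are invented for the example, so the scores it prints will not match those on the slide.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lower-case the text and split it into alphanumeric terms."""
    return [term for term in re.split(r"[^a-z0-9]+", text.lower()) if term]


def rank(query, documents):
    """Rank documents (name -> text) against a query.

    Returns (score, name) pairs, best match first.
    """
    counts = {name: Counter(tokenize(text)) for name, text in documents.items()}
    total = len(counts)
    doc_freq = Counter(term for c in counts.values() for term in c)
    # Smoothed inverse document frequency; unseen query terms get weight 0.
    idf = {t: math.log((1 + total) / (1 + df)) + 1.0 for t, df in doc_freq.items()}

    def vector(term_counts):
        return {t: n * idf.get(t, 0.0) for t, n in term_counts.items()}

    def cosine(a, b):
        dot = sum(w * b.get(t, 0.0) for t, w in a.items())
        norm_a = math.sqrt(sum(w * w for w in a.values()))
        norm_b = math.sqrt(sum(w * w for w in b.values()))
        return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

    query_vector = vector(Counter(tokenize(query)))
    scored = [(cosine(query_vector, vector(c)), name) for name, c in counts.items()]
    return sorted(scored, key=lambda pair: (-pair[0], pair[1]))


# Invented meta-information, for illustration only.
MODELS = {
    "Tyson1991 - Cell Cycle 6 var": "Tyson cell cycle cdc2 cyclin",
    "Chen2004 - Cell Cycle Regulation": "Chen cell cycle regulation budding yeast",
    "Sodium channel model": "sodium channel gate",
}

if __name__ == "__main__":
    for score, name in rank("Tyson cell cycle cdc2", MODELS):
        print(f"({score:.3f}) {name}")
```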

Page 11: M2CAT: Extracting reproducible simulation studies from model repositories using the CombineArchive Toolkit

Retrieve
relevant data

Hit for "Tyson":

Model file

Simulation description

+ Additional information

Page 12: M2CAT: Extracting reproducible simulation studies from model repositories using the CombineArchive Toolkit

Export
COMBINE archives using CAT

[Figure: the CombineArchive Toolkit. A Desktop Application and a Web Interface sit on top of the CombineArchive Library. Archives can be created, modified, explored and shared; the figure also names the BioModels Database and the CellML Model Repository.]

Martin Scharm

Florian Wendland

Martin Peters

Dagmar Waltemath

Tom Theile

Markus Wolfien

Scharm et al. 2014: The CombineArchiveWeb application – A web based tool to handle files associated with modelling results, SWAT4LS, Berlin

ceur-ws.org/Vol-1320/paper_19.pdf

Page 13: M2CAT: Extracting reproducible simulation studies from model repositories using the CombineArchive Toolkit

Export
explore COMBINE archives at Web CAT

various files from different resources

as much metadata as available

Page 14: M2CAT: Extracting reproducible simulation studies from model repositories using the CombineArchive Toolkit

Download the archive
and use it in other software
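Other software can use a downloaded archive because it is an ordinary ZIP file with a manifest. The following sketch, which relies only on the Python standard library and not on the CombineArchive Toolkit, reads the manifest and picks out the model and the simulation description. The helper names, the demo archive and its file names are hypothetical; the manifest namespace and format identifiers follow the COMBINE archive specification as commonly documented and should be verified there.

```python
import io
import xml.etree.ElementTree as ET
import zipfile

OMEX_NS = "http://identifiers.org/combine.specifications/omex-manifest"
SPEC = "http://identifiers.org/combine.specifications/"


def list_entries(archive_bytes):
    """Return (location, format) pairs from the manifest of an archive."""
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as archive:
        manifest = ET.fromstring(archive.read("manifest.xml"))
    return [
        (content.get("location"), content.get("format"))
        for content in manifest.iter(f"{{{OMEX_NS}}}content")
    ]


def find_by_format(entries, name):
    """Locations whose format identifier ends in name, e.g. 'sbml'.

    Also accepts versioned identifiers such as 'sbml.level-2.version-4'.
    """
    hits = []
    for location, fmt in entries:
        last = (fmt or "").rsplit("/", 1)[-1]
        if last == name or last.startswith(name + "."):
            hits.append(location)
    return hits


def demo_archive():
    """Create a tiny in-memory archive to stand in for a downloaded file."""
    manifest = f"""<?xml version="1.0" encoding="UTF-8"?>
<omexManifest xmlns="{OMEX_NS}">
  <content location="." format="{SPEC}omex"/>
  <content location="./manifest.xml" format="{SPEC}omex-manifest"/>
  <content location="./model/tyson1991.xml" format="{SPEC}sbml"/>
  <content location="./experiment/tyson1991.sedml" format="{SPEC}sed-ml"/>
</omexManifest>"""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("manifest.xml", manifest)
        archive.writestr("model/tyson1991.xml", "<sbml/>")
        archive.writestr("experiment/tyson1991.sedml", "<sedML/>")
    return buffer.getvalue()


if __name__ == "__main__":
    entries = list_entries(demo_archive())
    print("models:", find_by_format(entries, "sbml"))
    print("simulations:", find_by_format(entries, "sed-ml"))
```

For a real download, replace `demo_archive()` with the bytes of the exported file, for example `open(path, "rb").read()`.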

Page 15: M2CAT: Extracting reproducible simulation studies from model repositories using the CombineArchive Toolkit

Summary

• M2CAT implements a workflow to extract reproducible simulation studies from model repositories

• It searches in Masymos https://sems.uni-rostock.de/projects/masymos/

• It creates and displays COMBINE archives using the CombineArchive Toolkit https://sems.uni-rostock.de/projects/combinearchive/

• Everything is available from our website: http://sems.uni-rostock.de

Page 16: M2CAT: Extracting reproducible simulation studies from model repositories using the CombineArchive Toolkit

Thank you for your attention!

SEMS group

Dagmar Waltemath, Martin Peters, Vivek Garg, Srijana Kayastha, Olaf Wolkenhauer

@SemsProject

http://sems.uni-rostock.de

