Page 1: Taverna the story from up-above Antoon Goderis The University of Manchester, UK   DART workshop, Brisbane,

Taverna
the story from up-above
Antoon Goderis

The University of Manchester, UK
http://www.mygrid.org.uk/taverna
http://www.omii.ac.uk

DART workshop, Brisbane, Australia, 14 December 2006


2

Overview
The situation in –omics
Creating new biology using Taverna
Taverna
  Key traits
  Features on the OMII roadmap
    Including today’s release


3

Bioinformaticians & co.


4

Open environment
Data, Data, Data

EBI
SeqHound
SRS
National Center for Biotechnology Information (USA)
Cambridge, UK
Tokyo, Japan


5

12181 acatttctac caacagtgga tgaggttgtt ggtctatgtt ctcaccaaat ttggtgttgt
12241 cagtctttta aattttaacc tttagagaag agtcatacag tcaatagcct tttttagctt
12301 gaccatccta atagatacac agtggtgtct cactgtgatt ttaatttgca ttttcctgct
12361 gactaattat gttgagcttg ttaccattta gacaacttca ttagagaagt gtctaatatt
12421 taggtgactt gcctgttttt ttttaattgg gatcttaatt tttttaaatt attgatttgt
12481 aggagctatt tatatattct ggatacaagt tctttatcag atacacagtt tgtgactatt
12541 ttcttataag tctgtggttt ttatattaat gtttttattg atgactgttt tttacaattg
12601 tggttaagta tacatgacat aaaacggatt atcttaacca ttttaaaatg taaaattcga
12661 tggcattaag tacatccaca atattgtgca actatcacca ctatcatact ccaaaagggc
12721 atccaatacc cattaagctg tcactcccca atctcccatt ttcccacccc tgacaatcaa
12781 taacccattt tctgtctcta tggatttgcc tgttctggat attcatatta atagaatcaa


6

The situation in {genomics, transcriptomics, proteomics, metabolomics ..}
Lots of data
Lots of parameters to choose
An analysis takes a long time
The analysis services are unreliable
Lots of analysis steps
Need to record and explain your steps


7

Enter workflows
Lots of data [high throughput]
Lots of parameters to choose [best practice]
An analysis takes a long time [long running]
The analysis services are unreliable [fault tolerance]
Lots of analysis steps [data and control flow]
Need to record and explain your steps [provenance]


8

12181 acatttctac caacagtgga tgaggttgtt ggtctatgtt ctcaccaaat ttggtgttgt
12241 cagtctttta aattttaacc tttagagaag agtcatacag tcaatagcct tttttagctt
12301 gaccatccta atagatacac agtggtgtct cactgtgatt ttaatttgca ttttcctgct
12361 gactaattat gttgagcttg ttaccattta gacaacttca ttagagaagt gtctaatatt
12421 taggtgactt gcctgttttt ttttaattgg

Workflow-based middleware


9

myGrid
http://www.mygrid.org.uk
UK e-Science pilot project since 2001
Part of the Open Middleware Infrastructure Institute UK
Build middleware for Life Scientists that enables them to undertake in silico experiments and share those experiments and their results.
Individual scientists, in under-resourced labs, who use other people’s applications.
Open source.
Workflows & Semantic Technologies for metadata management.
Data flows.
Ad hoc & exploratory


10

Overview
The situation in -omics
Creating new biology using Taverna
Taverna
  Key traits
  Features on the OMII roadmap
    Including today’s release


11

?200

Microarray + QTL

Genes captured in microarray experiment and present in QTL region

Phenotypic response investigated using microarray in form of expressed genes or evidence provided through QTL mapping

Genotype Phenotype

[Andy Brass, Steve Kemp, Paul Fisher, 2006]


12

Key:

A – Retrieve genes in QTL region

B – Annotate genes with external database IDs

C – Cross-reference IDs with KEGG gene IDs

D – Retrieve microarray data from MaxD database

E – For each KEGG gene get the pathways it’s involved in

F – For each pathway get a description of what it does

G – For each KEGG gene get a description of what it does

[Andy Brass, Steve Kemp, Paul Fisher, 2006]


13

Result
Captured the pathways returned by QTL and Microarray workflows over the MaxD microarray database

Identified a pathway whose correlating gene (Daxx) is believed to play a role in trypanosomiasis resistance.

Manual analysis of the microarray and QTL data had failed to identify this gene as a candidate.

[Andy Brass, Steve Kemp, Paul Fisher, 2006]


14

Trichuris muris (mouse whipworm) infection

Identified the biological pathways involved in sex dependence in the mouse model, previously believed to be involved in the ability of mice to expel the parasite.

Manual experimentation: two-year study of candidate genes, processes unidentified

Workflows: the trypanosomiasis cattle experiment workflow was reused without change.

Analysis of the data by a biologist found the processes in a couple of days.

[Joanne Pennock, Paul Fisher, 2006]


15

Changing scientific practice
Systematic and comprehensive automation.
  Eliminated user bias and premature filtering of datasets and results leading to single-sided, expert-driven hypotheses
Dry people hypothesise, wet people validate.
  “make sense of this data” -> “does this make sense?”
Workflow factories.
  Different dataset, different result
Accurate provenance.


16

Overview
The situation in -omics
Creating new biology using Taverna
Taverna
  Key traits
  Features on the OMII roadmap
    Including today’s release


17

User Uptake
~25000 downloads
Systems biology
Proteomics
Gene/protein annotation
Microarray data analysis
Medical image analysis
Heart simulations
High throughput screening
Phenotypical studies
Plants, Mouse, Human
Astronomy
Dilbert Cartoons


18

Finding and Sharing Tools

[Architecture diagram. Labels: Taverna Workbench; 3rd Party Applications and Portals; Clients; Workflow Enactor; Service Management; Results Management; Provenance log; Metadata; Default Data Store; Custom Store; LSIDs; DAS; KAVE; BAKLAVA; Feta; myExperiment; Utopia]


19

Taverna workbench


20

3000+ services
Open domain services and resources, Third party.
Enforce NO common data model.
No common typing, Missing metadata.

Soaplab
InstantSoap


21

Services Landscape


22

User Interaction
Allows a workflow to call out to an expert human user

E.g. Used to embed the Artemis annotation editor within an otherwise automated genome annotation pipeline

[University of Bergen]


23

Tools, Tools, Tools

Feta Search tool

Pedro Annotation tool


24

Capture and Curation Effort

Ontology and Annotation Curation Team

Franck Tanoh and Katy Wolstencroft

Community Service Providers

Community Scientists


25

Scufl Model

[Architecture diagram. Labels: Taverna Workbench; Application; Workflow Execution; Workflow enactor; Shielding & Extensible plug-ins; one Processor per service type: Plain Web Service, Soaplab, Local Java App, WF Enactor, BioMOBY, SeqHound, BioMART, WSRF, Beanshell]

Simple Conceptual Unified Flow Language

Nested workflows, Automatic iterations, Best guess data type handling
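
The "automatic iterations" trait can be pictured with a small Python sketch. It assumes that a processor written for single items is applied element by element when it is handed a list, and that nested lists are handled the same way. The names `implicit_iterate` and `reverse_complement` are illustrative only; they are not part of Taverna or Scufl.

```python
from typing import Any, Callable


def implicit_iterate(processor: Callable[[Any], Any], value: Any) -> Any:
    """Apply a single-item processor to a value of any list depth.

    Assumption: a list input means "run the processor once per element".
    """
    if isinstance(value, list):
        return [implicit_iterate(processor, item) for item in value]
    return processor(value)


def reverse_complement(seq: str) -> str:
    """Hypothetical single-item processor: reverse-complement a DNA string."""
    pairs = {"a": "t", "t": "a", "c": "g", "g": "c"}
    return "".join(pairs[base] for base in reversed(seq))


# The same processor handles one sequence or a list of sequences.
single = implicit_iterate(reverse_complement, "acat")
batch = implicit_iterate(reverse_complement, ["acat", "ttg"])
```

The workflow author wires the processor once; the list handling stays outside the processor itself.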


26

Service incompatibility
Fix up the services to be compatible or….
Shims – libraries of adapters.
Automated data type matching using reasoning over a mismatch and service ontology

Duncan Hull, myGrid
Khalid Belhajjame, ISPIDER
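
A shim library can be sketched as a lookup from a (produced type, expected type) pair to an adapter function. This is a minimal illustration under stated assumptions: the type labels `fasta` and `plain_sequence`, the table `SHIMS` and the function `connect` are all hypothetical, and a plain dictionary lookup stands in for the ontology reasoning mentioned on the slide.

```python
from typing import Callable, Dict, Tuple

Shim = Callable[[str], str]


def fasta_to_plain(fasta: str) -> str:
    """Drop FASTA header lines and join the remaining sequence lines."""
    lines = [line.strip() for line in fasta.splitlines()]
    return "".join(line for line in lines if line and not line.startswith(">"))


def plain_to_fasta(seq: str) -> str:
    """Wrap a bare sequence in a minimal FASTA record."""
    return ">unnamed\n" + seq


# Hypothetical shim library: (produced type, expected type) -> adapter.
SHIMS: Dict[Tuple[str, str], Shim] = {
    ("fasta", "plain_sequence"): fasta_to_plain,
    ("plain_sequence", "fasta"): plain_to_fasta,
}


def connect(produced: str, expected: str, data: str) -> str:
    """Pass data through unchanged when types match, else apply a shim."""
    if produced == expected:
        return data
    try:
        shim = SHIMS[(produced, expected)]
    except KeyError:
        raise TypeError(f"no shim from {produced} to {expected}") from None
    return shim(data)


sequence = connect("fasta", "plain_sequence", ">x\nacgt\nttaa\n")
```

Mismatch detection corresponds to the type comparison in `connect`; shim identification corresponds to the table lookup.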


27

Shim identification

Mismatch detection


28

Service failure?
Most services are owned by other people
No control over service failure
Some are research level

Workflows only as good as the services they connect.
Notify failures
Instigate retries
Set criticality
Substitute services
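
The four responses to service failure (notify, retry, criticality, substitution) can be combined in one small Python sketch. It is an illustration under assumptions, not Taverna's enactor logic: `invoke_with_fallback`, its parameters and the two toy services are hypothetical names.

```python
import logging
import time
from typing import Callable, Optional, Sequence


def invoke_with_fallback(
    services: Sequence[Callable[[str], str]],
    payload: str,
    retries: int = 2,
    delay_s: float = 0.0,
    critical: bool = True,
) -> Optional[str]:
    """Try each service in order, retrying each one before substituting.

    A critical step raises when every service fails; a non-critical
    step returns None so the rest of the workflow can continue.
    """
    for service in services:
        for attempt in range(1 + retries):
            try:
                return service(payload)
            except Exception as exc:
                # Notify: record which service failed and on which attempt.
                logging.warning("%s attempt %d failed: %s",
                                service.__name__, attempt + 1, exc)
                time.sleep(delay_s)
    if critical:
        raise RuntimeError("all services failed")
    return None


def primary(_: str) -> str:
    """Toy service that is always down."""
    raise ConnectionError("service unavailable")


def mirror(payload: str) -> str:
    """Toy substitute service."""
    return payload.upper()


result = invoke_with_fallback([primary, mirror], "acgt", retries=1)
```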


29

Provenance Collection
  Observes events from the workflow engine
  Populates an RDF triple store with information from these events

Browse interface
  Simple browser replicates Taverna’s existing result and status browser
  Graphical browser

ProQA Query API

[Figure: provenance graph. Data items (urn:data1, urn:data2, urn:data12, urn:data:3, urn:data:f1, urn:data:f2, urn:hit1…, urn:hit2…, urn:hit5…, urn:hit8…, urn:hit10…, urn:hit50…) and service invocations (urn:BlastNInvocation3, urn:compareinvocation3, urn:invocation5) are linked by the properties [input], [output], [directlyDerivedFrom], [distantlyDerivedFrom], [instanceOf], [hasHits], [contains], [similar_sequence_to], [performsTask], [hasName] and [type]. Concepts shown: Blast_report, SwissProt_seq, Sequence_hit, LSDatum, DatumCollection. Task shown: Find similar sequence. Annotations: New sequence, Missed sequence. Legend: Data generated by services/workflows; Concepts; Services; Properties; literals.]

[Zhao et al 07 provenance challenge paper]
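
Provenance collection, as described on this slide, turns workflow engine events into triples. The Python sketch below shows the idea with a plain set of tuples standing in for an RDF triple store. The event shape `InvocationEvent` and the function `triples_for` are assumptions, and the predicate names are borrowed from the figure labels without any claim that they are used this way in the real schema.

```python
from typing import List, NamedTuple, Set, Tuple

Triple = Tuple[str, str, str]


class InvocationEvent(NamedTuple):
    """Hypothetical event emitted when a service invocation completes."""
    invocation: str
    service: str
    inputs: List[str]
    outputs: List[str]


def triples_for(event: InvocationEvent) -> Set[Triple]:
    """Translate one invocation event into subject-predicate-object triples."""
    triples: Set[Triple] = {(event.invocation, "instanceOf", event.service)}
    for item in event.inputs:
        triples.add((event.invocation, "input", item))
    for out in event.outputs:
        triples.add((event.invocation, "output", out))
        # Assumption: every output is directly derived from every input.
        for item in event.inputs:
            triples.add((out, "directlyDerivedFrom", item))
    return triples


# A set of tuples stands in for the triple store.
store: Set[Triple] = set()
store |= triples_for(
    InvocationEvent("urn:BlastNInvocation3", "BlastN", ["urn:data1"], ["urn:data2"])
)
```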


30


31

Provenance Tracking

From which Ensembl gene does pathway mmu004620 come?


32

Workflows over Results

[Diagram: the identifiers Pathway_id, KEGG_id, Uniprot, Ensembl_gene_id and Entrez, connected by four edges labelled dF]

Automatically backtrack through the data provenance graph
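
Backtracking can be sketched as a graph walk over derivation edges, starting from a result and collecting the ancestors of a wanted kind. In this Python illustration the edge table `DERIVED_FROM` is hand-written and the gene and protein identifiers are placeholders, not real database entries; only the pathway identifier comes from the preceding slide. The shape of the graph is also an assumption.

```python
from typing import Dict, List, Set

# Hypothetical "derived from" edges: item -> items it was derived from.
DERIVED_FROM: Dict[str, List[str]] = {
    "pathway:mmu004620": ["kegg:gene_A"],
    "kegg:gene_A": ["uniprot:protein_A", "entrez:gene_A"],
    "uniprot:protein_A": ["ensembl:gene_A"],
    "entrez:gene_A": ["ensembl:gene_A"],
}


def backtrack(item: str, prefix: str) -> Set[str]:
    """Collect every ancestor of item whose identifier starts with prefix."""
    found: Set[str] = set()
    seen: Set[str] = set()
    stack = [item]
    while stack:
        current = stack.pop()
        for parent in DERIVED_FROM.get(current, []):
            if parent in seen:
                continue  # already visited via another derivation path
            seen.add(parent)
            if parent.startswith(prefix):
                found.add(parent)
            stack.append(parent)
    return found


origin = backtrack("pathway:mmu004620", "ensembl:")
```

The `seen` set keeps the walk finite when two derivation paths lead to the same ancestor.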


33

A workflow marketplace


34

webTaverna GUI - main


35

Overview
The situation in -omics
Creating new biology using Taverna
Taverna
  Key traits
  Features on the OMII roadmap
    Including today’s release


36

[Process diagram. Labels: Ingest; Pioneers; Early adopters; Conservatives; myGrid Pre-release; myGrid Release; OMII-UK Release; Software Engineering (XP); Software Engineering; OMII Software Engineering; Quality & Test; Evaluation; Prioritise & Plan; Production; Applications & Professional Services; myGrid Alliance; Source-forge community]


37

Who are the OMII Users?

Increasing variation in requirements with the scientific domain.

Different scientific/research domains

End Users

Application Developers

Service and Middleware Developers

Middleware Deployers

Different activities

Systems Administrators


38

Taverna is now part of OMII-UK
Taverna 1.5 – Today!
Taverna 1.6
myExperiment


39

Integrated provenance
Raven release mechanism to simplify updates for the user
+/- 300 semantic annotations for core services
Patterns for using proxies for bulk data transactions
Redeveloped plug-in and enactor framework, improved iteration events, data management

Taverna 1.5


40

Integrated provenance

Taverna 1.5


41

Integrated provenance
Raven release mechanism to simplify updates for the user

Taverna 1.5


42

Integrated provenance
Raven release mechanism to simplify updates for the user
+/- 300 semantic annotations for core services

Add_ncbi_to_string: beanshell script, need to ask Paul for more details
Input:
Output:

Kegg_gene_ids_all_species (bconv): converts external IDs to KEGG IDs [mapping]
string: External ID, e.g. NCBI ID [Genebank_GI]
return: KEGG gene ID [KEGG_record_id]

Get_pathways_by_genes: Search all pathways which include all the given genes [Searching]
Input: List of KEGG genes id [KEGG_gene_id]
Output: Return a list of pathway_id of specified KEGG genes_id

Merge_pathways
String list
Concatenated

This workflow takes in Entrez gene ids then adds the string "ncbi-geneid:" to the start of each gene id. These gene ids are then cross-referenced to KEGG gene ids. Each KEGG gene id is then sent to the KEGG pathway database and its relevant pathways returned.
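
The data flow described above can be traced in a few lines of Python. This is a sketch under assumptions: the two dictionaries are made-up stand-ins for the KEGG services, every identifier in them is a placeholder, and the function names only echo the processor names on the slide. Only the `ncbi-geneid:` prefix comes from the workflow description.

```python
from typing import Dict, Iterable, List

# Stand-in lookup tables with placeholder IDs; a real run would call
# the KEGG services instead.
KEGG_ID_FOR: Dict[str, str] = {
    "ncbi-geneid:1001": "kegg:gene_A",
    "ncbi-geneid:1002": "kegg:gene_B",
}
PATHWAYS_FOR: Dict[str, List[str]] = {
    "kegg:gene_A": ["path:P1", "path:P2"],
    "kegg:gene_B": ["path:P2"],
}


def add_ncbi_to_string(gene_ids: Iterable[str]) -> List[str]:
    """Prefix each Entrez gene id with the string "ncbi-geneid:"."""
    return ["ncbi-geneid:" + gene_id for gene_id in gene_ids]


def to_kegg_ids(external_ids: Iterable[str]) -> List[str]:
    """Cross-reference external ids to KEGG gene ids, skipping unknown ones."""
    return [KEGG_ID_FOR[ext] for ext in external_ids if ext in KEGG_ID_FOR]


def pathways_by_gene(kegg_ids: Iterable[str]) -> List[List[str]]:
    """Look up the pathways for each KEGG gene id, one list per gene."""
    return [PATHWAYS_FOR.get(kegg_id, []) for kegg_id in kegg_ids]


def merge_pathways(per_gene: Iterable[List[str]]) -> List[str]:
    """Concatenate the per-gene pathway lists into one flat list."""
    return [pathway for pathways in per_gene for pathway in pathways]


def run(entrez_ids: Iterable[str]) -> List[str]:
    """Chain the four steps in the order the description gives them."""
    return merge_pathways(pathways_by_gene(to_kegg_ids(add_ncbi_to_string(entrez_ids))))


pathways = run(["1001", "1002"])
```

Duplicates are kept in the merged list here; whether the real Merge_pathways step removes them is not stated on the slide.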

Taverna 1.5


44

Taverna 1.6
Due out Summer 2007

Revised enactment core
Native support for long running workflows
Data proxy to deal with bulk data transactions
Improved service discovery and provenance management


46

Obtaining Taverna
Taverna is available under the LGPL from our project site on Sourceforge.net
http://taverna.sourceforge.net
Win32, Solaris / Linux & OS-X
Includes online and downloadable user manual, examples etc.
Support via project mailing lists


47

Conclusions
See plans for Taverna 2.0 on myGrid wiki
Taverna development is user-driven
Please keep in touch and tell us what you would like to see via the myGrid mailing lists: Taverna Users, Taverna Hackers

Taverna http://taverna.sourceforge.net
myGrid http://www.mygrid.org.uk
OMII-UK http://www.omii.ac.uk


48

Phase1 myGrid researchers, Phase2 OMII-UK, myGrid Research Team

Peter Li, Paul Fisher, Andy Brass, Robert Stevens, Mark Wilkinson

EPSRC, Wellcome Foundation, EU

Acknowledgements

