VBI Web Services Workshop 26-27 May 2005
Performing In silico Experiments in a Service Based Architecture: Solutions and Issues
Chris Wroe, Phillip Lord, Robert Stevens & Carole Goble
The University of Manchester, UK
http://www.mygrid.org.uk
EPSRC funded UK eScience Program Pilot Project
Thanks to the other members of the Taverna project, http://taverna.sf.net
Core
• Matthew Addis, Nedim Alpdemir, Tim Carver, Rich Cawley, Neil Davis, Alvaro Fernandes, Justin Ferris, Robert Gaizaukaus, Kevin Glover, Carole Goble, Chris Greenhalgh, Mark Greenwood, Yikun Guo, Jan Humble, Ananth Krishna, Peter Li, Phillip Lord, Darren Marvin, Simon Miles, Luc Moreau, Arijit Mukherjee, Tom Oinn, Juri Papay, Savas Parastatidis, Norman Paton, Terry Payne, Matthew Pocock, Milena Radenkovic, Stefan Rennick-Egglestone, Peter Rice, Ian Roberts, Martin Senger, Nick Sharman, Robert Stevens, Victor Tan, Anil Wipat, Paul Watson, Jimi Worthington and Chris Wroe.
Users
• Simon Pearce and Claire Jennings, Institute of Human Genetics, School of Clinical Medical Sciences, University of Newcastle, UK
• Hannah Tipney, May Tassabehji, Andy Brass, St Mary’s Hospital, Manchester, UK
• Steve Kemp, Liverpool, UK
Postgraduates
• Martin Szomszor, Duncan Hull, Jun Zhao, Pinar Alper, Keith Flanagan, Antoon Goderis, Tracy Craddock, Alastair Hampshire
Industrial
• Dennis Quan, Sean Martin, Michael Niemi, Syd Chapman (IBM)
• Robin McEntire (GSK)
Collaborators
• Keith Decker
Bioinformatics Services
• A typical HAD environment
– Distributed, Autonomous and very, very Heterogeneous
• No standard API or calling mechanisms
• Complex types are often implicit – everything is String
• No domain typing – everything is String
• Numerous Services and growing
• Close the world – controlled, but constrained
• Open the world – uncontrolled, but versatile
In silico Bioinformatics
• Bioinformatics experiments use 1, 2 up to N services chained together
• Ultimate result is the goal and some or all intermediates are part of the goal
• Intermediates are necessary for evidence gathering
• Often need to be repeated
• Often need to be re-purposed
• Workflows offer a suitable model for bioinformatics experiments
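The idea of chaining services while keeping every intermediate as evidence can be sketched as follows. This is a minimal illustration only: `run_pipeline`, `mask_low_complexity` and `to_upper` are hypothetical names, and the two local functions merely stand in for remote services.

```python
from typing import Callable, Dict, List, Tuple

# A step is a (name, service) pair; each service maps a string to a string,
# mirroring the "everything is String" situation described on the slides.
Step = Tuple[str, Callable[[str], str]]


def run_pipeline(steps: List[Step], initial: str) -> Tuple[str, Dict[str, str]]:
    """Run the steps in order, feeding each output into the next step.

    Returns the final result plus every intermediate, keyed by step name,
    so the intermediates remain available for evidence gathering.
    """
    intermediates: Dict[str, str] = {}
    value = initial
    for name, service in steps:
        value = service(value)
        intermediates[name] = value
    return value, intermediates


# Hypothetical stand-ins for remote services.
def mask_low_complexity(seq: str) -> str:
    return seq.replace("tttttt", "nnnnnn")


def to_upper(seq: str) -> str:
    return seq.upper()


steps = [("mask", mask_low_complexity), ("normalise", to_upper)]
final, evidence = run_pipeline(steps, "acgttttttacg")
```

Because the steps are plain data, the same list can be re-run on a new input or edited to re-purpose the experiment.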
Williams-Beuren Syndrome
• Contiguous sporadic gene deletion disorder
• 1/20,000 live births, caused by unequal crossover (homologous recombination) during meiosis
• Haploinsufficiency of the region results in the phenotype
[Figure: physical map of Chr 7 (~155 Mb) showing the ~1.5 Mb region at 7q11.23, the WBS and SVAS patient deletions, clones CTA-315H11 and CTB-51J22, and the ‘Gap’]
1. Identify new, overlapping sequence of interest
2. Characterise the new sequence at nucleotide and amino acid level
Cutting and pasting between numerous web-based services, e.g. BLAST, InterProScan etc.
12181 acatttctac caacagtgga tgaggttgtt ggtctatgtt ctcaccaaat ttggtgttgt
12241 cagtctttta aattttaacc tttagagaag agtcatacag tcaatagcct tttttagctt
12301 gaccatccta atagatacac agtggtgtct cactgtgatt ttaatttgca ttttcctgct
12361 gactaattat gttgagcttg ttaccattta gacaacttca ttagagaagt gtctaatatt
12421 taggtgactt gcctgttttt ttttaattgg gatcttaatt tttttaaatt attgatttgt
12481 aggagctatt tatatattct ggatacaagt tctttatcag atacacagtt tgtgactatt
12541 ttcttataag tctgtggttt ttatattaat gtttttattg atgactgttt tttacaattg
12601 tggttaagta tacatgacat aaaacggatt atcttaacca ttttaaaatg taaaattcga
12661 tggcattaag tacatccaca atattgtgca actatcacca ctatcatact ccaaaagggc
12721 atccaatacc cattaagctg tcactcccca atctcccatt ttcccacccc tgacaatcaa
12781 taacccattt tctgtctcta tggatttgcc tgttctggat attcatatta atagaatcaa
• The individual scientist doodling
• Workflows & distributed queries to link up your own and others’ resources
• Data intensive, upstream pipelines
• Reuse - sharing and adapting workflows & resources, and their outcomes
• Semantic descriptions for discovery, validation & linkage
• Whole experiment lifecycle, including logging provenance
Middleware for data intensive in silico biology by bioinformaticians
[Figure: experiment lifecycle. Labels: Forming experiments; Executing & monitoring experiments; Managing lifecycle, provenance and results; Sharing services & experiments; Discovering and reusing experiments and resources; Personalisation]
An Open World
• Open source
• Open domain services and resources
• Open community
• Open application
– Nothing specific to biology but oriented to
• Open model and open data
– No prescribed typing or domain data model
– A layered information model
• Open architecture
– Service Oriented Architecture
– Loosely coupled
– Web services based
– Assemble your own components
– Designed to work together
[Figure: myGrid components. Labels: Taverna; Freefluo; Grimoire Registry; Event Notification; mIR; Pedro Annotation; Feta Discovery; Info. Model; Soaplab; Gowlab; BioNanny; Mediator; Portal; LSIDs; KAVE; DQP]
Stakeholders
• Biologists
• Bioinformaticians
• Service Providers
Activation Energy
• Jam today
• Important for take up and community building.
• Take up leads to much better understanding.
• Energy of bioinformaticians and service providers
• Dealing with lots of legacy remote services
• Incorporating my bits and pieces
• Networking effects
• Added value with added effort
[Figure: Cost versus Benefit]
• Scufl: Simple Conceptual Unified Flow Language
• Taverna: Writing, running workflows & examining results
• SOAPLAB: Makes applications available
• Freefluo: Workflow engine to run workflows
[Figure labels: Taverna; Freefluo; SOAPLAB Web Service; Any Application; Web Service e.g. DDBJ BLAST; SeqHound Service; Special processor]
http://taverna.sourceforge.net/
[Screenshot labels: Viewer plug-ins; Service failure protocol]
[Figure: UML class diagram of the information model.
Classes and attributes as they appear: <<Resource>> Operations.Operation; Annotation.SemanticConcept; Subject; Object; Resources.Resource (+getId:URIString); ProgrammeResource (+name:String); <<Resource>> Study (+name:String, +description:String, +startTime:DateTime, +endTime:DateTime, +status:String); Agent; ExperimentInstance; Investigation; <<Resource>> ExperimentDesign; Programme; LabBookView (+name:String, +rule:String).
Association labels: uses; contains; method; has instances; researchFocus; acts in; initiates; episodes; labBooks; scmInvestigator; has participants; participates in; selected studies.]
Life Science Identifiers
Model Driven Approach
OWL & RDFS Ontologies: to annotate and classify entities with a common vocabulary based on a common understanding.
RDF Knowledge Added Value to Experiment
Information Repository and Common Information model for e-Science
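Life Science Identifiers are URNs of the form urn:lsid:authority:namespace:object, with an optional trailing revision. A minimal parsing sketch follows; the `LSID` tuple and `parse_lsid` function are hypothetical helper names, and the example identifier uses a made-up authority rather than any identifier issued by the project.

```python
from typing import NamedTuple, Optional


class LSID(NamedTuple):
    authority: str
    namespace: str
    object_id: str
    revision: Optional[str]


def parse_lsid(text: str) -> LSID:
    """Split an LSID into its components, rejecting malformed input."""
    parts = text.split(":")
    # Expect: urn, lsid, authority, namespace, object, and optionally revision.
    if len(parts) not in (5, 6):
        raise ValueError(f"not an LSID: {text!r}")
    if parts[0].lower() != "urn" or parts[1].lower() != "lsid":
        raise ValueError(f"not an LSID: {text!r}")
    if any(part == "" for part in parts[2:5]):
        raise ValueError(f"empty LSID component: {text!r}")
    revision = parts[5] if len(parts) == 6 and parts[5] else None
    return LSID(parts[2], parts[3], parts[4], revision)


# Illustrative identifier with a made-up authority.
example = parse_lsid("urn:lsid:example.org:sequences:12345:1")
```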
Williams-Beuren Workflows
Characterisation of nucleotide sequence
Identification of overlapping sequence
Characterisation of protein sequence
WBS Workflow Experience
• Correct and biologically meaningful results: found all expected results, plus an unnoticed pseudogene
• Automation: saved time, increased productivity
• Sharing: other people have used and want to develop the workflows, notably for mouse and chicken
Graves Disease
Autoimmune disease of the thyroid in which the immune system of an individual attacks cells in the thyroid gland, resulting in hyperthyroidism
Gene annotation pipelines
Microarray analysis pipelines
Find differentially expressed genes, e.g. NF-kappa B inhibitor protein
Reuse: adapting and sharing best practice and know-how across a community
• Trypanosomiasis in cattle
• Chicken genome
• Mouse genome
Chris Wroe, Carole Goble, Antoon Goderis, Phillip Lord, Simon Miles, Juri Papay, Pinar Alper, Luc Moreau. Recycling workflows and services through discovery and reuse. Concurrency and Computation: Practice and Experience, accepted for publication.
[Figure: myGrid architecture.
Layers: Applications; Core Services; External Services.
Service & workflow discovery: Feta semantic discovery; GRIMOIRES registry.
Workflow enactment: Taverna-Freefluo workflow engine.
Metadata management: KAVE metadata store; ProQA provenance manager; myGrid ontology.
Data management: mIR myGrid information repository.
Other labels: Third-party tools; Utopia; Haystack; LSID Launchpad; myGrid information model; Web portals; Taverna e-Science workbench; Soaplab; Gowlab; Termino; Lexical mark-up; Legacy applications; Web Services; OGSA-DAI databases; Web Sites; OGSA-DQP service; e-Science coordination; e-Science mediator; e-Science process patterns; e-Science events; LSID support; Web Service (Grid Service) communication fabric; Notification service; Pedro semantic publication; Java applications; Executable codes with an IDL; Custom databases.]
First, catch your service
• Taverna currently ships with access to over 1000 services
• But it wasn’t always the case!
• Lack of available services, at least at first
• A lot of activation energy needed, which hopefully gets less as services get pooled
• Service partnerships and network effects
• If your service ain’t there, that’s an obstacle.
Service Bootstrapping
• Soaplab and Gowlab wrappers
• http://industry.ebi.ac.uk/soaplab/
• WSDL scavenging
• Processor abstraction over stereotypical invocation patterns of service families
• Many services are not plain WSDL
• API consumer in Taverna 1.1
API Consumer Interface
• Interoperate existing APIs with SOAP services, SoapLab, BioMoby, SeqHound, caBIG, BioJava, etc.
• Refine complex APIs to sets of task centric functionality
• Take advantage of myGrid infrastructure (monitoring, result browsing, provenance etc.) and apply it to your APIs
• Taverna 1.1 onwards; download API consumer and toolset at http://taverna.sf.net
[Screenshot annotations: Classes and Interfaces presented here; User selects appropriate methods to be exposed within Taverna]
Import into Taverna
Previously created API definition is imported – methods and constructors appear as components alongside other services.
Processors: Invocation Heterogeneity
• WSDL - single Web Service operation described in a WSDL file.
• Local Java or Beanshell function
• Soaplab - CORBA-like stateful protocol of the Web Service operations
• Nested workflow - implemented by a Scufl workflow.
• BioMOBY processor.
• SeqHound - a Representational State Transfer style interface
• BioMart - directly accesses queries over a relational database.
• Styx - executes a workflow subgraph containing streamed services using P2P data transfer based on Styx Grid service protocol.
[Figure: two BLAST services behind one processor. SOAPLAB BLAST service: createJob(), setProgram(), setDatabase(), setE_value(), run(), getResults(). IBM Life Sciences BLAST service: blastQuery()]
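The processor idea, one uniform invocation interface over services with different calling conventions, can be sketched as below. This is an illustration under stated assumptions, not Taverna's code: the method names createJob, setProgram, setDatabase, run, getResults and blastQuery come from the slide, but their signatures, the `Processor` interface and both in-memory `Fake...` services are hypothetical.

```python
from typing import Dict, Protocol


class Processor(Protocol):
    """Uniform interface seen by the workflow engine (hypothetical)."""

    def invoke(self, inputs: Dict[str, str]) -> Dict[str, str]: ...


class FakeStatefulBlast:
    """In-memory stand-in for a stateful, job-style service (assumed signatures)."""

    def __init__(self) -> None:
        self._jobs: Dict[int, Dict[str, str]] = {}

    def createJob(self) -> int:
        job_id = len(self._jobs) + 1
        self._jobs[job_id] = {}
        return job_id

    def setProgram(self, job_id: int, program: str) -> None:
        self._jobs[job_id]["program"] = program

    def setDatabase(self, job_id: int, database: str) -> None:
        self._jobs[job_id]["database"] = database

    def run(self, job_id: int, sequence: str) -> None:
        job = self._jobs[job_id]
        job["report"] = f"{job['program']} vs {job['database']}: {len(sequence)} bases"

    def getResults(self, job_id: int) -> str:
        return self._jobs[job_id]["report"]


class FakeSingleCallBlast:
    """In-memory stand-in for a single-operation service (assumed signature)."""

    def blastQuery(self, program: str, database: str, sequence: str) -> str:
        return f"{program} vs {database}: {len(sequence)} bases"


class StatefulJobProcessor:
    """Hides the create/set/run/fetch conversation behind one invoke() call."""

    def __init__(self, client: FakeStatefulBlast) -> None:
        self._client = client

    def invoke(self, inputs: Dict[str, str]) -> Dict[str, str]:
        job_id = self._client.createJob()
        self._client.setProgram(job_id, inputs["program"])
        self._client.setDatabase(job_id, inputs["database"])
        self._client.run(job_id, inputs["sequence"])
        return {"report": self._client.getResults(job_id)}


class SingleCallProcessor:
    """Adapts a one-shot operation to the same invoke() interface."""

    def __init__(self, client: FakeSingleCallBlast) -> None:
        self._client = client

    def invoke(self, inputs: Dict[str, str]) -> Dict[str, str]:
        report = self._client.blastQuery(
            inputs["program"], inputs["database"], inputs["sequence"]
        )
        return {"report": report}


inputs = {"program": "blastn", "database": "embl", "sequence": "acgtacgt"}
stateful_result = StatefulJobProcessor(FakeStatefulBlast()).invoke(inputs)
single_result = SingleCallProcessor(FakeSingleCallBlast()).invoke(inputs)
```

The workflow author sees the same inputs and outputs in both cases; only the processor knows how many calls the underlying service needs.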
Workflow Execution
Three tiered abstraction
• Application data flow layer: Scufl graph + service introspection
• Execution flow layer: List management; implicit iteration mechanism; MIME & semantic type decoration; fault management; service alternates
• Processor invocation layer
[Figure labels: Taverna Workbench; Scufl + Workflow Object Model; Freefluo Workflow enactor; Enactor; Processors: Plain Web Service, Soaplab, Local App, BioMOBY, SeqHound, BioMART]
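The implicit iteration mentioned in the execution flow layer can be illustrated with a small sketch: when a processor that expects a single item receives a list, the engine applies it to each element. The function names and the `expected_depth` parameter are assumptions made for this example; it shows the general idea only and does not reproduce Taverna's actual iteration strategies.

```python
from typing import Any, Callable


def list_depth(value: Any) -> int:
    """Nesting depth of a value: 0 for a single item, 1 for a flat list, and so on."""
    depth = 0
    while isinstance(value, list):
        depth += 1
        if not value:
            break
        value = value[0]
    return depth


def invoke_with_implicit_iteration(
    processor: Callable[[Any], Any], value: Any, expected_depth: int = 0
) -> Any:
    """Call the processor directly when the input has the depth it expects;
    otherwise iterate over the outer list and preserve the list structure."""
    if list_depth(value) > expected_depth:
        return [
            invoke_with_implicit_iteration(processor, item, expected_depth)
            for item in value
        ]
    return processor(value)


single = invoke_with_implicit_iteration(str.upper, "acgt")
flat = invoke_with_implicit_iteration(str.upper, ["acgt", "ttga"])
nested = invoke_with_implicit_iteration(str.upper, [["a"], ["b", "c"]])
counts = invoke_with_implicit_iteration(len, [["a", "b"], ["c"]], expected_depth=1)
```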
Architecture Confusagram
Tom Oinn, Mark Greenwood, Matthew Addis, M. Nedim Alpdemir, Justin Ferris, Kevin Glover, Carole Goble, Duncan Hull, Darren Marvin, Peter Li, Phillip Lord, Matthew R. Pocock, Martin Senger, Robert Stevens, Anil Wipat and Chris Wroe. Taverna: Lessons in creating a workflow environment for the life sciences. Concurrency and Computation: Practice and Experience, in press.
[Screenshot labels: Soaplab Service; WSDL Web Service; BioMOBY Service; Local Java Service]
Workflows are not the only game
[Figure labels: Workflows; OGSA-DQP; Applications; e-Science coordination; e-Science mediator; e-Science process patterns; e-Science events; Notification service; Mediator; Protein Phosphatases]
So many services, so poorly described
• How to select among 1000+ services?
• Mostly inputs & outputs are “string”
• Domain specific descriptions of capabilities
• Selection is part of workflow assembly by bioinformaticians
• Selection of alternates for failure also generally user defined, and usually replicas, but need not be.
Semantic discovery
• Publish and find services (and workflows) with description using an ontology (in OWL/RDF)
• Define domain types for objects passed around and a set of dimensions with which service capabilities can be defined using processor abstraction
• Bootstrapping descriptions
• Mining and maintaining descriptions
• The Expert Annotator
• GRIMOIRE / WebDAV directory
• Tie into BioMOBY central
• http://phoebus.cs.man.ac.uk:8100/feta-beta/mygrid/descriptions/
Phillip Lord, Pinar Alper, Chris Wroe, and Carole Goble. Feta: A light-weight architecture for user oriented semantic service discovery. In Proc. of 2nd European Semantic Web Conference, Crete, June 2005.
Semantic Web Services: Layered model
[Figure labels: Web Interface; Processor API; Generic Schema for Service (part of Information model); Specific Application Ontology e.g. caCORE]
• We don’t describe WSDL, we describe operations and processors
• We are classifying for people not machines, so don’t be too clever!
Wroe C, Goble CA, Greenwood M, Lord P, Miles S, Papay J, Payne T, Moreau L. Automating Experiments Using Semantic Data on a Bioinformatics Grid. IEEE Intelligent Systems, Jan/Feb 2004.
[Figure: service description schema.
Service: name, description, author, organisation.
Operation: name, description, task, method, resource, application.
Parameter: name, description, semantic type, format, transport type, collection type, collection format.
Relations: hasInput; hasOutput.
Subclasses: WSDL based Web service; WSDL based operation; Soaplab service; bioMoby service; workflow; Local Java code.]
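A cut-down version of this description schema, and a discovery query over it, might look like the sketch below. It is an assumption-laden illustration rather than Feta's implementation: the `Parameter` and `Operation` classes keep only a few of the listed fields, and the toy `IS_A` hierarchy, the operation names and `find_operations` are all invented for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Parameter:
    name: str
    semantic_type: str          # a term from the (toy) ontology below
    format: str = "string"      # on the wire, everything is still a string


@dataclass
class Operation:
    name: str
    task: str
    inputs: List[Parameter] = field(default_factory=list)
    outputs: List[Parameter] = field(default_factory=list)


# Toy is-a hierarchy (child -> parent); a stand-in for a real ontology.
IS_A: Dict[str, str] = {
    "nucleotide_sequence": "sequence",
    "protein_sequence": "sequence",
    "sequence": "data",
}


def is_subtype(term: Optional[str], ancestor: str) -> bool:
    """True when term equals ancestor or lies below it in the hierarchy."""
    while term is not None:
        if term == ancestor:
            return True
        term = IS_A.get(term)
    return False


def find_operations(registry: List[Operation], accepts: str) -> List[str]:
    """Names of operations with an input that can take data of type `accepts`."""
    return [
        op.name
        for op in registry
        if any(is_subtype(accepts, p.semantic_type) for p in op.inputs)
    ]


registry = [
    Operation("blastn", "similarity_search",
              inputs=[Parameter("query", "nucleotide_sequence")]),
    Operation("interproscan", "domain_search",
              inputs=[Parameter("query", "protein_sequence")]),
    Operation("reverse", "transformation",
              inputs=[Parameter("seq", "sequence")]),
]

protein_ops = find_operations(registry, "protein_sequence")
nucleotide_ops = find_operations(registry, "nucleotide_sequence")
```

The semantic type does the work that the string-valued `format` cannot: two operations with identical string signatures are distinguished by what their inputs mean.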
Service hassles
• Workflows are only as good as the services they link together.
• Licensing models
• Instability and unreliability
• BioNanny + QoS registry description
• Configurable fault tolerance and fail over strategies for graceful failure
• Few alternates and genuine replica services
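A configurable retry-then-alternate strategy of the kind listed above can be sketched as follows. The function `invoke_with_failover`, the `retries` parameter and the two stand-in services are hypothetical; this is one plausible shape for such a policy, not the enactor's actual fault handling.

```python
from typing import Callable, List, Sequence, Tuple


class AllServicesFailed(Exception):
    """Raised when the primary service and every alternate have failed."""


def invoke_with_failover(
    services: Sequence[Callable[[str], str]], payload: str, retries: int = 1
) -> Tuple[str, List[str]]:
    """Try each service in order, retrying each one `retries` extra times.

    Returns the first successful result together with a log of attempts,
    so that failures are recorded rather than silently swallowed.
    """
    log: List[str] = []
    for service in services:
        for _attempt in range(1 + retries):
            try:
                result = service(payload)
            except Exception as exc:
                log.append(f"{service.__name__}: failed ({exc})")
            else:
                log.append(f"{service.__name__}: ok")
                return result, log
    raise AllServicesFailed("; ".join(log))


# Hypothetical stand-ins: an unreliable primary and a working replica.
def primary_blast(sequence: str) -> str:
    raise ConnectionError("timeout")


def replica_blast(sequence: str) -> str:
    return f"hit:{sequence}"


result, attempts = invoke_with_failover([primary_blast, replica_blast], "acgt")
```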
Type management: Shims
[Figure: workflow fragment with shims. Goal: ‘I want to identify new sequences which overlap with my query sequence and determine if they are useful’. Labels: Sequence, i.e. last known 3000bp; Mask; BLAST (identify new sequences and determine their degree of identity); Old BLAST result; Simplify and Compare; Lister; Retrieve; Sequence database entry; Fasta format sequence; Genbank format sequence; BLAST2 (alignment of full query sequence v full ‘new’ sequence)]
• The fiddly bits necessitated by not having a common type system or object model, or building elaborate wrappers
• Adding functionality to Web Services
• Shim libraries; Automatic deployment at workflow assembly
• Beanshell scripts for quick and dirty scripting
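A typical shim is a small format converter placed between two services. The sketch below turns GenBank-style numbered sequence lines into FASTA; it is written in Python for illustration, whereas the slides mention Beanshell for this kind of scripting. The function name, the `identifier` argument and the 60-column wrap are assumptions of the example.

```python
def genbank_origin_to_fasta(origin_text: str, identifier: str, width: int = 60) -> str:
    """Convert numbered, space-separated GenBank sequence lines to FASTA.

    Position numbers and spacing are dropped; the bases are concatenated
    and re-wrapped at `width` characters under a single header line.
    """
    bases = []
    for line in origin_text.splitlines():
        for token in line.split():
            if not token.isdigit():   # skip the leading position number
                bases.append(token)
    sequence = "".join(bases)
    wrapped = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return "\n".join([f">{identifier}"] + wrapped)


origin = "12181 acatttctac caacagtgga\n12241 cagtctttta"
fasta = genbank_origin_to_fasta(origin, "query")
```

Shims like this carry no science of their own, which is why they are good candidates for shared libraries rather than per-workflow scripts.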
Workflow Practices
• Put the workflow together to duplicate how they did the linking without duplicating how they did the on-the-fly integration
• Post hoc analysis: you don’t analyse data piece by piece, you receive all the data at once
• Service interoperability but fragmented results
• Because integration needs smarter workflows and smart thinking about data types.
• Close the world with Shims or services and build domain objects.
• Smarter ways of visualising and linking intermediate results using provenance graphs
• Custom visualisation application
[Figure labels: Input; Provenance Record; Result (five instances)]
Gene annotation pipeline workflow
Integration and visualisation of GD annotation workflow results
[Figure labels: Input; Provenance Record; Result; Custom Data Model; Integrated results]
Integration and interoperation
• Information model is a container for domain semantics
• Linking stuff together is Integration Lite
[Figure: two parallel stacks, each labelled e-Science Semantics; Configuration; Invocation model; Interface; Data format; Domain Semantics; Syntax; Data identity; Ontologies; Custom Data Objects; Shims. Labels between the stacks: Provenance Annotation; Service & Data Annotation; App & Shim Services; Information Model; LSID; Workflows; Processors]
Take Homes
• Our apps are providing real scientific results – or at least the hypotheses…
• The problem is not really gathering and coordinating services, but gathering and coordinating the results
• Are you interoperating or integrating?
• Careful thought has to go into the abstractions we apply to services for finding them and running them
• Activation energy vs reusability of service: ROI and altruism
• We need more services, more replicas of services, better service interfaces and better reliability and stability
• Most of our services turn out not to be vanilla WSDL
• Light touch vs added value