Architectural issues in a bioinformatics Grid
http://www.mygrid.org.uk
Luc Moreau, University of Southampton, UK
myGrid
Overview
Bioinformatics background
myGrid facts
Service oriented architecture
Architectural issues: notification service, Grid component model, service directory
Conclusions
Bioinformatics & Genomics
Large amounts of data
Highly heterogeneous: data types, data forms, community
Highly complex and inter-related
Volatile
Bioinformatics Data
Descriptive as well as numeric
Literature
Analogy/knowledge-based
Text extraction
Bioinformatics Analysis
Different algorithms: BLAST, FASTA, pSW
Different implementations: WU-BLAST, NCBI-BLAST
Different service providers: NCBI, EBI, DDBJ
The Human Genome Project
The HGP will make available potentially thousands of targets for:
Understanding biology & genetics
Drug discovery
Diagnostics
Many genes will be linked with diseases
Cancer HIV Parkinson’s Asthma Malaria Autoimmune (arthritis) Cardiovascular Antibacterial & antifungal
Drug Discovery
In silico experimentation
Discovery of resources and tools, staging of operations, sharing of results
Process is as important as outcome
Science is dynamic – change happens
Scientific discovery is personal & global
Provenance and history
myGrid
EPSRC funded pilot project
Generic middleware within application setting
36 months in a 42-month performance period
Start 1st October
16 full-time post docs altogether
6 DTA studentships
1 technical project manager
1 system manager
1 secretarial post
myGrid consortium
Scientific Team: Biologists and Bioinformaticians
GSK, AZ, Merck KGaA, Manchester, EBI
Technical Team: Manchester, Southampton, Newcastle, Sheffield, EBI, Nottingham
IBM, SUN, GeneticXchange, Network Inference, Epistemics Ltd
myGrid outcomes
e-Scientists: Bioinformatics demonstrator (on cold carp)
Developers: myGrid-in-a-Box developers kit
Integrating some existing bioinformatics tools with myGrid
Architectural Issues
Notification service
Vision
Asynchronous delivery and persistence of messages
Topics can be created and discovered on the fly
Subscribers can subscribe to topics, publishers can publish messages on a given topic
Peer to peer network of notification services
Topology can be re-organized to enhance reliability
Subscribers and publishers can negotiate over QoS
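The first three points of this vision can be pictured with a small in-memory sketch. It is only an illustration, not the myGrid implementation: the class `NotificationService`, its method names and the `replay` flag are all hypothetical, and "persistence" is reduced to keeping a per-topic log in memory.

```python
import queue
from collections import defaultdict


class NotificationService:
    """Toy notification service; every name here is hypothetical."""

    def __init__(self):
        self.log = defaultdict(list)        # topic -> all messages published so far
        self.mailboxes = defaultdict(list)  # topic -> one queue per subscriber

    def create_topic(self, topic):
        # Topics are created on the fly; re-creating an existing topic is harmless.
        self.log[topic]
        self.mailboxes[topic]

    def topics(self):
        # Discovery: list the topics currently known to the service.
        return sorted(self.log)

    def subscribe(self, topic, replay=False):
        # Each subscriber reads from its own queue, so delivery is asynchronous.
        self.create_topic(topic)
        mailbox = queue.Queue()
        if replay:
            # Stored messages let a late subscriber catch up.
            for message in self.log[topic]:
                mailbox.put(message)
        self.mailboxes[topic].append(mailbox)
        return mailbox

    def publish(self, topic, message):
        self.create_topic(topic)
        self.log[topic].append(message)
        for mailbox in self.mailboxes[topic]:
            mailbox.put(message)
```

A subscriber would call `subscribe("some/topic")` and later drain the returned queue at its own pace; a subscriber arriving late can pass `replay=True` to receive the messages it missed.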
A notification service instance
[Figure: a notification service instance, showing subscriber, subscriber stub, subscriber delegator, publisher, publisher stub, publisher delegator, QoS and notifications]
Federated notification services
Strong communication links between hubs
Efficient data replication
Simple notification routing
[Figure: three hubs (Hub-1, Hub-2, Hub-3) connecting notification services (labelled NS-x-y), with publishers (P-x-y-z) and subscribers (S-x-y-z)]
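A minimal sketch of what "simple notification routing" through hubs could look like is given below. It rests on assumptions that the slides do not state: every hub is linked directly to every other hub, so one forwarding hop is enough, and subscribers are plain callbacks. The classes `Hub` and `LocalService` are hypothetical names.

```python
class Hub:
    """Hypothetical hub: knows its local services and its peer hubs."""

    def __init__(self, name):
        self.name = name
        self.peers = []     # hubs reachable over a direct link
        self.services = []  # notification services attached to this hub

    def link(self, other):
        self.peers.append(other)
        other.peers.append(self)

    def deliver_local(self, topic, message, skip=None):
        for service in self.services:
            if service is not skip:
                service.deliver(topic, message)

    def route(self, topic, message, origin):
        # Deliver to the other local services, then forward once to each peer hub.
        self.deliver_local(topic, message, skip=origin)
        for peer in self.peers:
            peer.deliver_local(topic, message)


class LocalService:
    """Hypothetical notification service attached to exactly one hub."""

    def __init__(self, name, hub):
        self.name = name
        self.hub = hub
        self.subscribers = {}  # topic -> list of callbacks
        hub.services.append(self)

    def subscribe(self, topic, callback):
        self.subscribers.setdefault(topic, []).append(callback)

    def deliver(self, topic, message):
        for callback in self.subscribers.get(topic, []):
            callback(message)

    def publish(self, topic, message):
        # Serve local subscribers first, then hand the message to the hub.
        self.deliver(topic, message)
        self.hub.route(topic, message, origin=self)
```

Because peers only deliver locally and never forward again, a message cannot loop; the price is that the sketch only works for a fully connected set of hubs.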
QoS Negotiation Protocol
Current status
Push and pull messaging
Topic, message and publisher filter
WSDL interface
Workflow interaction
Integration with MySQL, OpenJMS, Tomcat and Axis
Federated service (undergraduate project)
QoS negotiation (PhD work underway)
OGSA compliance
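The first two status items (push and pull messaging; topic, message and publisher filters) can be illustrated as follows. This is a sketch under assumptions, not the service's WSDL interface: `Notification`, `Subscription` and their fields are hypothetical, and a subscription with no `push` callback is treated as a pull subscription.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Notification:
    topic: str
    publisher: str
    body: str


@dataclass
class Subscription:
    # A filter left as None accepts everything.
    topic: Optional[str] = None
    publisher: Optional[str] = None
    message_filter: Optional[Callable[[str], bool]] = None
    # push=None means the subscriber pulls; otherwise the callback is invoked.
    push: Optional[Callable[[Notification], None]] = None
    pending: List[Notification] = field(default_factory=list)

    def matches(self, n: Notification) -> bool:
        if self.topic is not None and n.topic != self.topic:
            return False
        if self.publisher is not None and n.publisher != self.publisher:
            return False
        if self.message_filter is not None and not self.message_filter(n.body):
            return False
        return True

    def offer(self, n: Notification) -> None:
        if not self.matches(n):
            return
        if self.push is not None:
            self.push(n)            # push: deliver immediately
        else:
            self.pending.append(n)  # pull: hold until the subscriber asks

    def pull(self) -> List[Notification]:
        batch, self.pending = self.pending, []
        return batch
```

The service would call `offer` on every subscription for each incoming notification; a pull subscriber then collects its backlog with `pull()`.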
Experimentation
Windows and Unix platforms with Tomcat 4.0.5, Axis beta 3.0, OpenJMS 0.7.2 and mySQL 3.23.51
Aggregation test with 500 topics, 2,000 subscribers, 2,000 publishers and 10,000 registered subscriptions, 10,000 notifications
72 hours non-stop subscribing/publishing with the above populations
Architectural Issues
Notification service
Grid component model
Grid Component Model
The myGrid framework is a component model for flexible, simple and future-proof deployment and use of services on the Grid.
Problems Addressed
For service developers and deployers:
Ease of development of sophisticated services by separation of concerns and re-use of third party functionality.
Consistent distribution of functionality over a set of services, e.g. access control, support for fault-tolerance.
Application of solutions to the above to services deployed using technologies such as OGSA Grid Services, Web Services and Enterprise JavaBeans.
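One way to picture separation of concerns and consistent distribution of functionality is a container that wraps a service and adds a single concern around every call, with containers nested inside one another. The sketch below is an assumption-laden illustration, not the myGrid framework: `Container`, `ServiceComponent`, `AccessControlContainer` and `RetryContainer` are hypothetical names, and fault tolerance is reduced to retrying on `ConnectionError`.

```python
class ServiceComponent:
    """Innermost component: holds the actual operations (hypothetical)."""

    def __init__(self, operations):
        self.operations = operations  # operation name -> callable

    def invoke(self, operation, caller, *args):
        return self.operations[operation](*args)


class Container:
    """Base container: forwards every call to the component it wraps."""

    def __init__(self, inner):
        self.inner = inner

    def invoke(self, operation, caller, *args):
        return self.inner.invoke(operation, caller, *args)


class AccessControlContainer(Container):
    """Adds access control without touching the wrapped service."""

    def __init__(self, inner, allowed):
        super().__init__(inner)
        self.allowed = set(allowed)

    def invoke(self, operation, caller, *args):
        if caller not in self.allowed:
            raise PermissionError(f"{caller} may not call {operation}")
        return super().invoke(operation, caller, *args)


class RetryContainer(Container):
    """Adds a crude form of fault tolerance; assumes attempts >= 1."""

    def __init__(self, inner, attempts=3):
        super().__init__(inner)
        self.attempts = attempts

    def invoke(self, operation, caller, *args):
        last_error = None
        for _ in range(self.attempts):
            try:
                return super().invoke(operation, caller, *args)
            except ConnectionError as error:
                last_error = error
        raise last_error
```

Nesting is then plain composition, for example `AccessControlContainer(RetryContainer(ServiceComponent(ops)), allowed={"alice"})`, and the same containers can be reused unchanged around any other service.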
Problems Addressed
For service clients:
Development of service clients that are not limited by the range of standards known at deployment time.
Control over how service operations are invoked, so that they can make use of the most suitable protocols supported by a service.
Provision of a standard client interface hiding the differences in deployment philosophy that each middleware technology brings.
Application of solutions to the above to services deployed using technologies such as OGSA Grid Services, Web Services and Enterprise JavaBeans.
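The client-side goals can be sketched as a single client interface that chooses, at call time, among the protocols a service supports, and that accepts new protocol handlers after deployment. Everything in this sketch is hypothetical: the class `ClientStub`, the handler signature, the protocol names and the example endpoints are assumptions made for illustration.

```python
class ClientStub:
    """Hypothetical uniform client interface over several invocation protocols."""

    def __init__(self, supported):
        # Protocol name -> endpoint description advertised by the service.
        self.supported = dict(supported)
        # Protocol name -> handler(endpoint, operation, args); can grow later.
        self.handlers = {}

    def register(self, protocol, handler):
        # New protocols can be plugged in after the client was deployed.
        self.handlers[protocol] = handler

    def invoke(self, operation, *args, prefer=()):
        # Try the caller's preferred protocols first, then any remaining one.
        candidates = list(prefer) + [p for p in self.supported if p not in prefer]
        for protocol in candidates:
            if protocol in self.supported and protocol in self.handlers:
                endpoint = self.supported[protocol]
                return self.handlers[protocol](endpoint, operation, args)
        raise LookupError(f"no usable protocol for {operation}")
```

A caller writes `stub.invoke("search", sequence, prefer=["iiop"])` regardless of how the service was deployed; if no handler for the preferred protocol is installed yet, the call falls back to another protocol the service supports.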
Nested Component Model
Framework
Current Status
Startpoints for Web Services
Deployment within nested containers
Facades for exposing EJBs as Web Services
Performance tests
Current Implementation
Current Work
Automated deployment in nested containers
Definition of containers for deployment-time configuration
Using containers to provide minimal functionality of OGSA Grid Services
Startpoints for EJBs, Grid Services
Experimentation
Our experiments have shown that nesting in our containers is not costly compared to method invocation and nested inner classes
The cost of calling EJBs via the Web Service façade comes mostly from the use of SOAP, and the consequential requirement for conversion to/from objects
Architectural Issues
Notification service
Grid component model
Service directory
Service Directory Views
Multiple service directories will co-exist (IBM, Microsoft, EBI, local institutions)
Need to attach metadata to service directory entries
Metadata is personal to the scientist: trust, perceived QoS, ontological description
Need for a mechanism to allow scientists to add their metadata and to make it available to other users as a “regular service directory”.
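The idea of a view can be sketched as an object that layers one scientist's metadata over existing directories while offering the same lookup interface as a directory. Since the design is still open, this is purely an illustration: `ServiceDirectory`, `View`, `annotate` and `find` are hypothetical names, and the sample entries in the usage note are made-up data.

```python
class ServiceDirectory:
    """Hypothetical plain directory: service name -> description."""

    def __init__(self, entries):
        self.entries = dict(entries)

    def find(self, predicate=lambda name, entry: True):
        return {n: dict(e) for n, e in self.entries.items() if predicate(n, e)}


class View:
    """Hypothetical view: same find() interface, plus personal annotations."""

    def __init__(self, directories, owner):
        self.directories = list(directories)  # directories or other views
        self.owner = owner
        self.annotations = {}                 # service name -> metadata

    def annotate(self, name, **metadata):
        # Metadata lives in the view; the underlying directories stay untouched.
        self.annotations.setdefault(name, {}).update(metadata)

    def find(self, predicate=lambda name, entry: True):
        results = {}
        for directory in self.directories:
            for name, entry in directory.find().items():
                merged = {**entry, **self.annotations.get(name, {})}
                if predicate(name, merged):
                    results[name] = merged
        return results
```

For example, a scientist could build `View([public], owner="alice")`, call `annotate("wu-blast", trust="high")`, and query with `find(lambda name, entry: entry.get("trust") == "high")`. Because a view answers `find` like a directory, another user can place a view on top of it and see those annotations.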
Views: status
Currently in design phase
Use cases in the process of being finalized
Preliminary specification of interfaces
More work is needed on policy languages
Design to be finalized by end of January
First prototype of core functionality 4 months later
Conclusions
More architectural issues being addressed
Security (GSI, RBAC), but where is the community going?
Fault tolerance
Workflow enactment
WSFL compatible enactment engine
Support for fault tolerance, checkpointing, migration
Editor
Conclusions
4-month development cycle with “integration fest”
Our roadmap is based on a layered organisation of functionality
myGrid in Southampton
Luc Moreau, Michael Luck, David DeRoure
Terry Payne, Keith Decker
Simon Miles, Juri Papay, John Dickman, Xiaojian Liu, Claudia di Napoli, Vijay Dialani, Richard Lawley
www.mygrid.org.uk