Portals and myGrid
Stefan Rennick Egglestone
Mixed Reality Laboratory
University of Nottingham
Introduction to myGrid
• a computer science pilot project working in the field of bioinformatics
• a consortium of the European Bioinformatics Institute, IT Innovations, 5 universities and some industrial partners
• ends June 2005; other projects will develop the infrastructure further
Presentation aims
• Introduce myGrid
• Introduce bioinformatics
• Introduce portal work in myGrid
• Show some screenshots of portlets
Introduction to bioinformatics
• how to store, process and publish large volumes of biological data
• large databases, access and analysis services
• composite processes involve multiple databases and services
• Automation through workflows
Data in bioinformatics
• Commonly genetic sequences
  – DNA: GCGCATAGCGATGA
  – Protein: MAHPLGPHGVANA
• Meta information
  – Species, chromosome
  – Interesting features
  – Equipment used
  – First published paper referring to sequence
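As a rough illustration of the two kinds of data on this slide, a sequence and its meta information could be held together in one record. This is only a sketch: the class name, field names and validity check are assumptions made for illustration and are not taken from myGrid or from any of the databases.

```python
from dataclasses import dataclass, field
from typing import List, Optional

DNA_ALPHABET = set("ACGT")
PROTEIN_ALPHABET = set("ACDEFGHIKLMNPQRSTVWY")  # the 20 standard amino acids


@dataclass
class SequenceRecord:
    """Hypothetical container for a sequence plus its meta information."""
    sequence: str
    kind: str  # "dna" or "protein"
    species: Optional[str] = None
    chromosome: Optional[str] = None
    features: List[str] = field(default_factory=list)
    equipment: Optional[str] = None
    first_paper: Optional[str] = None  # e.g. an identifier of the first paper

    def is_valid(self) -> bool:
        """Check that the sequence uses only letters from its alphabet."""
        alphabet = DNA_ALPHABET if self.kind == "dna" else PROTEIN_ALPHABET
        return bool(self.sequence) and set(self.sequence) <= alphabet


# The two example sequences from the slide.
dna = SequenceRecord("GCGCATAGCGATGA", "dna")
protein = SequenceRecord("MAHPLGPHGVANA", "protein")
```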
Data storage
• 3 international databases aim to store all DNA sequences (EMBL, GenBank, DDBJ)
• Protein sequences in SwissProt
• Journals require submission before publication
• Smaller databases hold specialist information
Using bioinformatics data
• Database access services
  – Fetch sequence for given ID
  – Fetch similar sequences
• Sequence analysis
  – Look for interesting regions of sequence
• Sequence prediction
  – Predict proteins generated by DNA sequence
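The simplest form of sequence prediction is translating DNA into protein with the standard genetic code. The sketch below assumes the reading frame starts at the first base and ignores everything a real prediction service would handle (gene finding, introns, alternative frames). The function name `translate` is a hypothetical helper, not part of any myGrid service.

```python
from itertools import product

# Standard genetic code, with codons ordered by base T, C, A, G in each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate DNA in frame 0; '*' marks a stop codon, trailing bases are dropped."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


print(translate("GCGCATAGCGATGA"))  # AHSD
```

The input here is the example DNA sequence from the earlier slide; its last two bases do not form a complete codon, so they are ignored.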
Service interface types
• Web-page
• Command-line tool set
• Programming language library client
• SOAP web-service with WSDL interface
Using services
• Often need to combine services with different interface types
• Cut-and-paste from web-page to file and run command-line tool
• Repetitive and time-consuming
• Can be automated using scripts
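A script that replaces the cut-and-paste step might look roughly like the sketch below. Both ends are stand-ins so that the example runs anywhere: `fetch_sequence` returns a fixed FASTA record instead of contacting a database web page, and the "command-line tool" is a one-line Python program that prints the sequence length. All names are hypothetical.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def fetch_sequence(seq_id: str) -> str:
    """Stand-in for copying a FASTA record from a database web page."""
    return f">{seq_id} example record\nGCGCATAGCGATGA\n"


def run_length_tool(fasta_path: Path) -> int:
    """Run a stand-in command-line tool that prints the sequence length."""
    tool = (
        "import sys;"
        "lines = open(sys.argv[1]).read().splitlines();"
        "print(sum(len(l) for l in lines if not l.startswith('>')))"
    )
    result = subprocess.run(
        [sys.executable, "-c", tool, str(fasta_path)],
        capture_output=True, text=True, check=True,
    )
    return int(result.stdout.strip())


def sequence_length(seq_id: str) -> int:
    """Fetch a record, save it to a file, and run the tool on that file."""
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / f"{seq_id}.fasta"
        path.write_text(fetch_sequence(seq_id))
        return run_length_tool(path)
```

Calling `sequence_length("X00001")` with any identifier runs the whole fetch, save and analyse chain in one step, which is the part that is repetitive when done by hand.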
Workflows
myGrid workflow technology
• Freefluo workflow enactor
• Taverna – graphical workbench allowing users to
  – Author workflows
  – Enact and browse results
• myGrid Information Repository
Authoring a workflow
Enacting a workflow
Browsing results
Including services in workflows
• Service invocation done by processor
• Generic processor for SOAP/WSDL web-services
• Custom processor can wrap custom client
• SOAPlab exposes command-line tools as web-service
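The idea that each service is invoked through a processor, and that an enactor runs the processors in turn, can be sketched in a few lines. This is not the Freefluo or Taverna API. The `Processor` type, the `enact` function and the two example processors are assumptions made purely for illustration, and the sketch only handles a linear chain of steps.

```python
from typing import Any, Callable, Dict, List, Tuple

# A processor takes the data produced so far and returns its new outputs.
Processor = Callable[[Dict[str, Any]], Dict[str, Any]]
Workflow = List[Tuple[str, Processor]]


def fetch_processor(data: Dict[str, Any]) -> Dict[str, Any]:
    """Stand-in for a database access service: returns a fixed sequence."""
    return {"sequence": "GCGCATAGCGATGA"}


def gc_count_processor(data: Dict[str, Any]) -> Dict[str, Any]:
    """Stand-in for an analysis service: counts G and C bases."""
    sequence = data["sequence"]
    return {"gc_count": sequence.count("G") + sequence.count("C")}


def enact(workflow: Workflow, inputs: Dict[str, Any]):
    """Run each processor in order, keeping a per-step trace for browsing."""
    data = dict(inputs)
    trace = []
    for name, processor in workflow:
        outputs = processor(data)
        data.update(outputs)
        trace.append((name, outputs))
    return data, trace


workflow: Workflow = [("fetch", fetch_processor), ("gc_count", gc_count_processor)]
results, trace = enact(workflow, {"id": "X00001"})
print(results["gc_count"])  # 8
```

A custom processor wrapping a custom client would simply be another function of the same shape, which is why the enactor does not need to know what kind of interface sits behind each step.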
Portal in myGrid
• Taverna/Freefluo is production workflow system, so interface can’t be hacked around with
• Some interface limitations
  – Difficult to start new workflow running using results of enactment
  – Complex interface, so takes time to master
Text services work
• If enactment of a workflow produces a SwissProt protein sequence record, can extract from this PubMed ID of first paper referring to this protein
• Add extra workflow stages which look up related papers
• Might like to re-run these stages as a separate workflow on any new papers found
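The extraction step could be sketched as below, assuming the record is in the SwissProt flat-file format and that reference cross-references appear on `RX` lines in the form `PubMed=<id>;`. The sample record is a made-up fragment with placeholder identifiers, not a real database entry, and the function names are hypothetical.

```python
import re
from typing import List, Optional

PUBMED_PATTERN = re.compile(r"PubMed=(\d+)")


def pubmed_ids(record: str) -> List[str]:
    """Return PubMed IDs found on RX lines, in the order they appear."""
    ids: List[str] = []
    for line in record.splitlines():
        if line.startswith("RX"):
            ids.extend(PUBMED_PATTERN.findall(line))
    return ids


def first_pubmed_id(record: str) -> Optional[str]:
    """Return the PubMed ID from the first reference, or None if absent."""
    ids = pubmed_ids(record)
    return ids[0] if ids else None


# Made-up fragment with placeholder IDs, for illustration only.
SAMPLE_RECORD = """\
RN   [1]
RX   PubMed=11111111;
RN   [2]
RX   PubMed=22222222;
"""

print(first_pubmed_id(SAMPLE_RECORD))  # 11111111
```

The resulting ID would then be the input to the extra workflow stages that look up related papers.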
Input form
Monitoring progress
Results
MIR portal work
• Taverna/Freefluo/MIR interface caters for expert user
• Large numbers of users who won’t write workflows but might enact them
• Provide a simpler workflow enactment interface
• Portal useful – all biologists have browser on their desk
Collections of workflows
View workflow
View workflow results
View individual output param
Further details
• www.mygrid.org.uk
• Twiki.mygrid.org.uk
• Stefan Rennick Egglestone ([email protected])
• Ian Roberts ([email protected])
• Presentation and notes will be at www.mrl.nott.ac.uk/~sre