Portals and myGrid. Stefan Rennick Egglestone, Mixed Reality Laboratory, University of Nottingham.

Post on 18-Jan-2018

Portals and myGrid

Stefan Rennick Egglestone, Mixed Reality Laboratory, University of Nottingham

Introduction to myGrid

• a computer science pilot project working in the field of bioinformatics

• a consortium of the European Bioinformatics Institute, IT Innovations, 5 universities and some industrial partners

• ends June 2005; other projects will develop the infrastructure further

Presentation aims

• Introduce myGrid

• Introduce bioinformatics

• Introduce portal work in myGrid

• Show some screenshots of portlets

Introduction to bioinformatics

• how to store, process and publish large volumes of biological data

• large databases, access and analysis services

• composite processes involve multiple databases and services

• Automation through workflows

Data in bioinformatics

• Commonly genetic sequences
  – DNA: GCGCATAGCGATGA
  – Protein: MAHPLGPHGVANA

• Meta information
  – Species, chromosome
  – Interesting features
  – Equipment used
  – First published paper referring to sequence
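As a rough illustration of the slide above, a sequence and its meta information could be held together in one record. The sketch below is only an assumption about how such a record might look: the class name `SequenceRecord`, its field names, and the `is_valid` check are all hypothetical and are not taken from myGrid or from any of the databases mentioned in this talk.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Alphabets used for a basic sanity check (unambiguous letters only).
DNA_ALPHABET = set("ACGT")
PROTEIN_ALPHABET = set("ACDEFGHIKLMNPQRSTVWY")


@dataclass
class SequenceRecord:
    """Hypothetical container pairing a sequence with its meta information."""
    sequence: str
    kind: str                                   # "DNA" or "protein"
    species: Optional[str] = None
    chromosome: Optional[str] = None
    features: List[str] = field(default_factory=list)
    equipment: Optional[str] = None
    first_paper: Optional[str] = None           # e.g. a literature identifier

    def is_valid(self) -> bool:
        """Return True if the sequence is non-empty and uses only known letters."""
        alphabet = DNA_ALPHABET if self.kind == "DNA" else PROTEIN_ALPHABET
        return bool(self.sequence) and set(self.sequence) <= alphabet


# The two example sequences from the slide; meta information left unset.
dna = SequenceRecord(sequence="GCGCATAGCGATGA", kind="DNA")
protein = SequenceRecord(sequence="MAHPLGPHGVANA", kind="protein")
print(dna.is_valid(), protein.is_valid())
```

Real database records carry far more fields than this; the point is only that sequence and meta information travel together.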

Data storage

• 3 international databases aim to store all DNA sequences (EMBL, GenBank, DDBJ)

• Protein sequences in SwissProt

• Journals require submission before publication

• Smaller databases hold specialist information

Using bioinformatics data

• Database access services
  – Fetch sequence for given ID
  – Fetch similar sequences

• Sequence analysis
  – Look for interesting regions of sequence

• Sequence prediction
  – Predict proteins generated by DNA sequence
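The simplest ingredient of sequence prediction is translating a DNA sequence into amino acids, three bases at a time, using the standard genetic code. The sketch below shows only that step, under the assumption that the reading frame starts at the first base; real prediction services also have to locate genes and choose reading frames. The function name `translate` and its `stop_at_stop` parameter are illustrative, not part of any service named in this talk.

```python
from itertools import product

# Standard genetic code, listed in the conventional T, C, A, G base order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str, stop_at_stop: bool = True) -> str:
    """Translate DNA into a protein string, reading codons from position 0.

    Trailing bases that do not form a full codon are ignored. A stop codon
    ends translation when stop_at_stop is True, otherwise it is shown as '*'.
    """
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3].upper()]
        if amino_acid == "*" and stop_at_stop:
            break
        protein.append(amino_acid)
    return "".join(protein)


# The DNA example from the earlier slide: GCG CAT AGC GAT (+ 2 spare bases).
print(translate("GCGCATAGCGATGA"))  # AHSD
```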

Service interface types

• Web-page

• Command-line tool set

• Programming language library client

• SOAP web-service with WSDL interface

Using services

• Often need to combine services with different interface types

• Cut-and-paste from web-page to file and run command-line tool

• Repetitive and time-consuming

• Can be automated using scripts
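The manual cut-and-paste step described above can be replaced by a short script that fetches a sequence and pipes it into a command-line tool. The sketch below is runnable without any network access or installed bioinformatics tools because both ends are stand-ins: `fetch_sequence` reads from an in-memory dictionary instead of a web page, and the "tool" is a Python one-liner that counts G and C bases. Both are assumptions made for illustration only.

```python
import subprocess
import sys

# Stand-in for a web-page or database lookup (hypothetical ID and store).
FAKE_DATABASE = {"demo1": "GCGCATAGCGATGA"}

# Stand-in for a command-line analysis tool: reads a sequence on stdin
# and prints the number of G and C bases it contains.
TOOL_COMMAND = [
    sys.executable,
    "-c",
    "import sys; s = sys.stdin.read().strip(); "
    "print(sum(base in 'GC' for base in s))",
]


def fetch_sequence(sequence_id: str) -> str:
    """Return the sequence stored under sequence_id in the stand-in database."""
    return FAKE_DATABASE[sequence_id]


def run_tool(sequence: str) -> str:
    """Pipe the sequence into the stand-in tool and return its output."""
    result = subprocess.run(
        TOOL_COMMAND, input=sequence, capture_output=True, text=True, check=True
    )
    return result.stdout.strip()


if __name__ == "__main__":
    # Fetch, then analyse: the two steps that were previously done by hand.
    print(run_tool(fetch_sequence("demo1")))
```

Scripts like this work but have to be rewritten for each new combination of services, which is part of the motivation for workflows.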

Workflows

myGrid workflow technology

• Freefluo workflow enactor

• Taverna – graphical workbench allowing users to
  – Author workflows
  – Enact and browse results

• myGrid Information Repository

Authoring a workflow

Enacting a workflow

Browsing results

Including services in workflows

• Service invocation done by processor

• Generic processor for SOAP/WSDL web-services

• Custom processor can wrap custom client

• SOAPlab exposes command-line tools as web-service
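The processor idea above can be pictured as a common interface that hides how each service is actually called, so the enactor only needs to pass data from one processor to the next. The sketch below is a toy model of that idea and is not the Taverna or Freefluo API: the names `Processor`, `CustomProcessor`, and `enact` are hypothetical, and the two "services" are plain Python functions standing in for real clients.

```python
from typing import Callable, Dict, List


class Processor:
    """Common interface: take input data, invoke a service, return its output."""

    def __init__(self, name: str) -> None:
        self.name = name

    def invoke(self, data: str) -> str:
        raise NotImplementedError


class CustomProcessor(Processor):
    """Wraps an arbitrary client callable behind the Processor interface."""

    def __init__(self, name: str, client: Callable[[str], str]) -> None:
        super().__init__(name)
        self.client = client

    def invoke(self, data: str) -> str:
        return self.client(data)


def enact(processors: List[Processor], initial_input: str) -> Dict[str, str]:
    """Run processors in order, feeding each output into the next processor.

    Returns every intermediate result keyed by processor name, so results
    can be browsed after enactment.
    """
    results: Dict[str, str] = {}
    data = initial_input
    for processor in processors:
        data = processor.invoke(data)
        results[processor.name] = data
    return results


workflow = [
    CustomProcessor("uppercase", str.upper),
    CustomProcessor("reverse", lambda sequence: sequence[::-1]),
]
print(enact(workflow, "gcgcat"))
```

A generic SOAP/WSDL processor would be another subclass of the same interface; it is omitted here because it would need a real service to call.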

Portal in myGrid

• Taverna/Freefluo is production workflow system, so interface can’t be hacked around with

• Some interface limitations
  – Difficult to start new workflow running using results of enactment
  – Complex interface, so takes time to master

Text services work

• If enactment of a workflow produces a SwissProt protein sequence record, can extract from it the PubMed ID of the first paper referring to this protein

• Add extra workflow stages which look up related papers

• Might like to re-run these stages as a separate workflow on any new papers found
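The extraction step described above might look like the sketch below. It assumes the record is in the SwissProt flat-file layout, where literature cross-references appear on `RX` lines in the form `PubMed=<digits>;`. The sample record text is invented for illustration (its identifiers are placeholders, not real entries), and the function name `first_pubmed_id` is hypothetical.

```python
import re
from typing import Optional

# Matches an RX line and captures the digits after "PubMed=".
PUBMED_PATTERN = re.compile(r"^RX\s+.*?PubMed=(\d+);", re.MULTILINE)


def first_pubmed_id(record_text: str) -> Optional[str]:
    """Return the PubMed ID on the first RX line, or None if there is none."""
    match = PUBMED_PATTERN.search(record_text)
    return match.group(1) if match else None


# Invented excerpt in flat-file style; all identifiers are placeholders.
SAMPLE_RECORD = """\
ID   EXAMPLE_PROT            Reviewed;          13 AA.
RN   [1]
RX   PubMed=1111111; DOI=10.0000/example;
RN   [2]
RX   PubMed=2222222;
"""

print(first_pubmed_id(SAMPLE_RECORD))
```

The returned ID could then be handed to the extra workflow stages that look up related papers.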

Input form

Monitoring progress

Results

MIR portal work

• Taverna/Freefluo/MIR interface caters for expert user

• Large numbers of users who won’t write workflows but might enact them

• Provide a simpler workflow enactment interface

• Portal useful – all biologists have browser on their desk

Collections of workflows

View workflow

View workflow results

View individual output param

Further details

• www.mygrid.org.uk

• Twiki.mygrid.org.uk

• Stefan Rennick Egglestone (sre@cs.nott.ac.uk)

• Ian Roberts (i.roberts@dcs.sheff.ac.uk)

• Presentation and notes will be at www.mrl.nott.ac.uk/~sre