
1

A myGrid Project Tutorial

Dr Mark Greenwood

University of Manchester

With considerable help from Justin Ferris, Peter Li, Phil Lord, Chris Wroe, Carole Goble and the rest of the myGrid team.


2

• Open Source Upper Middleware for Bioinformatics
• (Web) Service-based architecture
• Targeted at Tool Developers, Bioinformaticians and Service Providers

(Figure: map of partner sites: Newcastle, Nottingham, Manchester, Southampton, Hinxton, Sheffield)


3

myGrid People
Core
• Matthew Addis, Nedim Alpdemir, Tim Carver, Rich Cawley, Neil Davis, Alvaro Fernandes, Justin Ferris, Robert Gaizauskas, Kevin Glover, Carole Goble, Chris Greenhalgh, Mark Greenwood, Yikun Guo, Ananth Krishna, Peter Li, Phillip Lord, Darren Marvin, Simon Miles, Luc Moreau, Arijit Mukherjee, Tom Oinn, Juri Papay, Savas Parastatidis, Norman Paton, Terry Payne, Matthew Pocock, Milena Radenkovic, Stefan Rennick-Egglestone, Peter Rice, Martin Senger, Nick Sharman, Robert Stevens, Victor Tan, Anil Wipat, Paul Watson and Chris Wroe.
Users
• Simon Pearce and Claire Jennings, Institute of Human Genetics, School of Clinical Medical Sciences, University of Newcastle, UK
• Hannah Tipney, May Tassabehji, Andy Brass, St Mary’s Hospital, Manchester, UK
Postgraduates
• Martin Szomszor, Duncan Hull, Jun Zhao, Pinar Alper, John Dickman, Keith Flanagan, Antoon Goderis, Tracy Craddock, Alastair Hampshire
Industrial
• Dennis Quan, Sean Martin, Michael Niemi, Syd Chapman (IBM)
• Robin McEntire (GSK)
Collaborators
• Keith Decker


4

Roadmap - start

(Figure: roadmap diagram; labels: services, data)


6

Tenet I
• High-level middleware services for data-intensive resource interoperation for bioinformatics
– Information Grid, not computational Grid
• Exploratory, ad hoc
• For individuals
• In silico experiment as workflow
• Distributed query processing
• Information management


7

Tenet II
• High-level services for e-Science experimental management
– Provenance
– Event notification
– Personalisation
• Sharing knowledge and sharing components
– Scientific discovery is personal & global.
– Federated third party registries for workflows and services
– Workflow and service discovery for reuse and repurposing

(Figure: registry cycle: Register, Find, Annotate)
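The Register / Find / Annotate cycle around a registry can be illustrated with a toy in-memory registry. This is a hypothetical sketch: the class names, endpoints and annotation terms are invented for illustration, and it does not model the federated, UDDI-based registry mentioned elsewhere in these slides. It only shows why separating annotation from registration lets third parties add metadata that later drives discovery.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    name: str
    endpoint: str  # e.g. a WSDL location (hypothetical)
    annotations: set[str] = field(default_factory=set)


class Registry:
    """Toy in-memory registry: providers register services or workflows,
    third parties annotate them, and users find entries by annotation terms."""

    def __init__(self) -> None:
        self._entries: dict[str, Entry] = {}

    def register(self, name: str, endpoint: str) -> None:
        self._entries[name] = Entry(name, endpoint)

    def annotate(self, name: str, term: str) -> None:
        # Annotation is separate from registration, so anyone can add metadata later.
        self._entries[name].annotations.add(term)

    def find(self, *terms: str) -> list[str]:
        """Names of entries carrying every requested annotation term."""
        wanted = set(terms)
        return sorted(e.name for e in self._entries.values() if wanted <= e.annotations)


reg = Registry()
reg.register("ncbiBlastWrapper", "http://example.org/blast?wsdl")  # hypothetical URL
reg.register("transeq", "http://example.org/transeq?wsdl")         # hypothetical URL
reg.annotate("ncbiBlastWrapper", "similarity search")
reg.annotate("ncbiBlastWrapper", "nucleotide sequence")
reg.annotate("transeq", "nucleotide sequence")
print(reg.find("nucleotide sequence"))                       # ['ncbiBlastWrapper', 'transeq']
print(reg.find("nucleotide sequence", "similarity search"))  # ['ncbiBlastWrapper']
```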


8

Tenet III
• Open Source and Open Services
– No control or influence over service providers
• Open to third party metadata and services
• Open extensible architecture
– Assemble your own components
– Designed to work together
– Toolkit

(Figure: myGrid toolkit components: Freefluo WfEE, Taverna, View, UDDI registry, Event Notification, mIR, Pedro, Semantic Discovery, Info. Model, Soaplab, Gateway & Portal, LSID, Haystack Provenance Browser)


9

Tenet IV
• (Web) Service architecture
– Publication, discovery, interoperation, composition, decommissioning of myGrid services
– WS-I -> OGSA / WSRF
• Metadata driven
– Ontologies
– Common information model
– Semantic Web technologies: RDF, OWL


10

Tenet V

Middleware for
• Tool Developers
• Bioinformaticians
• Service Providers
• Biologists are indirectly supported by the portals and apps these develop.


11

Roadmap

(Figure: roadmap diagram; labels: services, workflows, data, discover services, run workflows, data management)


12

Data-intensive bioinformatics

    ID   MURA_BACSU     STANDARD;      PRT;   429 AA.
    DE   PROBABLE UDP-N-ACETYLGLUCOSAMINE 1-CARBOXYVINYLTRANSFERASE
    DE   (EC 2.5.1.7) (ENOYLPYRUVATE TRANSFERASE) (UDP-N-ACETYLGLUCOSAMINE
    DE   ENOLPYRUVYL TRANSFERASE) (EPT).
    GN   MURA OR MURZ.
    OS   BACILLUS SUBTILIS.
    OC   BACTERIA; FIRMICUTES; BACILLUS/CLOSTRIDIUM GROUP; BACILLACEAE;
    OC   BACILLUS.
    KW   PEPTIDOGLYCAN SYNTHESIS; CELL WALL; TRANSFERASE.
    FT   ACT_SITE    116    116       BINDS PEP (BY SIMILARITY).
    FT   CONFLICT    374    374       S -> A (IN REF. 3).
    SQ   SEQUENCE   429 AA;  46016 MW;  02018C5C CRC32;
         MEKLNIAGGD SLNGTVHISG AKNSAVALIP ATILANSEVT IEGLPEISDI ETLRDLLKEI
         GGNVHFENGE MVVDPTSMIS MPLPNGKVKK LRASYYLMGA MLGRFKQAVI GLPGGCHLGP
         RPIDQHIKGF EALGAEVTNE QGAIYLRAER LRGARIYLDV VSVGATINIM LAAVLAEGKT
         IIENAAKEPE IIDVATLLTS MGAKIKGAGT NVIRIDGVKE LHGCKHTIIP DRIEAGTFMI
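The record above is in the SWISS-PROT flat-file layout: each line starts with a two-letter code (ID, DE, GN, OS, OC, KW, FT, SQ) followed by data, sequence lines are indented, and a full entry ends with `//`. A minimal sketch of grouping such a record by line code is below; the function name `parse_swissprot` and the `"SEQ"` key are hypothetical, and a real pipeline would more likely use an established parser such as Biopython's `Bio.SwissProt` module.

```python
from collections import defaultdict


def parse_swissprot(record: str) -> dict[str, list[str]]:
    """Group the lines of one SWISS-PROT flat-file entry by two-letter line code.

    Sequence lines start with blanks instead of a code; they are collected
    under the hypothetical key "SEQ" with the grouping spaces removed.
    """
    fields: dict[str, list[str]] = defaultdict(list)
    for line in record.splitlines():
        if not line.strip() or line.startswith("//"):
            continue  # skip blank lines and the end-of-entry marker
        if line.startswith(" "):
            fields["SEQ"].append(line.replace(" ", ""))
        else:
            fields[line[:2]].append(line[2:].strip())
    return dict(fields)


# Excerpt of the MURA_BACSU entry (only the first sequence line is included).
record = """\
ID   MURA_BACSU     STANDARD;      PRT;   429 AA.
DE   PROBABLE UDP-N-ACETYLGLUCOSAMINE 1-CARBOXYVINYLTRANSFERASE
DE   (EC 2.5.1.7) (ENOYLPYRUVATE TRANSFERASE) (UDP-N-ACETYLGLUCOSAMINE
DE   ENOLPYRUVYL TRANSFERASE) (EPT).
GN   MURA OR MURZ.
OS   BACILLUS SUBTILIS.
SQ   SEQUENCE   429 AA;  46016 MW;  02018C5C CRC32;
     MEKLNIAGGD SLNGTVHISG AKNSAVALIP ATILANSEVT IEGLPEISDI ETLRDLLKEI
//
"""

entry = parse_swissprot(record)
print(entry["ID"][0].split()[0])  # MURA_BACSU
print(" ".join(entry["DE"]))
print("".join(entry["SEQ"]))      # first 60 residues of the excerpt
```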


13

Use Scenarios

Graves’ Disease
• Autoimmune disease of the thyroid
• Simon Pearce and Claire Jennings, Institute of Human Genetics, School of Clinical Medical Sciences, University of Newcastle
• Discover all you can about a gene
• Annotation pipelines and gene expression analysis
• Services from Japan, Hong Kong, various sites in UK

Williams-Beuren Syndrome
• Microdeletion of 1.55 Mbases on Chromosome 7
• Hannah Tipney, May Tassabehji, Andy Brass, St Mary’s Hospital, Manchester, UK
• Characterise an unknown gene
• Annotation pipelines and gene expression analysis
• Services from USA, Japan, various sites in UK


14

Manually filling a genomic gap

Two major steps:
• Extend into the gap: similarity searches; RepeatMasker, BLAST
• Characterise the new sequence: NIX, InterPro, etc.

• Numerous web-based services (e.g. BLAST, RepeatMasker)
• Cutting and pasting between screens
• Large number of steps
• Frequently repeated – info now rapidly added to public databases
• Don’t always get results
• Time consuming
• Huge amount of interrelated data is produced – handled in lab book and files saved to local hard drive
• Mundane
• Much knowledge remains undocumented
• Bioinformatician does the analysis


15

WBS Workflows

(Figure: workflow diagram. Input chain: GenBank Accession No -> GenBank Entry -> Seqret -> Nucleotide seq (FASTA). Services and outputs shown:)
• GenScan: coding sequence, ORFs
• sixpack: 6 ORFs
• prettyseq: translation/sequence file, good for records and publications
• restrict: restriction enzyme map
• cpgreport: CpG island locations and %
• RepeatMasker: repetitive elements
• ncbiBlastWrapper: BLASTN vs nr, est databases
• transeq: amino acid translation
• epestfind: identifies PEST sequences
• pscan: identifies PRINTS fingerprints
• pepstats: MW, length, charge, pI, etc.
• pepcoil: predicts coiled-coil regions
• Pepwindow? Octanol?: hydrophobic regions
• SignalP, TargetP, PSORTII: predict cellular location
• InterPro, PFAM, Prosite, SMART: identify functional and structural domains/motifs
• ncbiBlastWrapper: TBLASTN vs nr, est, est_mouse, est_human databases; BLASTP vs nr
Other labels: URL inc. GB identifier; RepeatMasker; query nucleotide sequence; ncbiBlastWrapper; sort for appropriate sequences only.
Key: Pink: outputs/inputs of a service; Purple: tailor-made services; Green: EMBOSS Soaplab services; Yellow: Manchester Soaplab services; Grey: unknowns.
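A workflow like the WBS one is a dependency graph: a service can run once every service whose output it consumes has finished, and independent branches can run side by side. The sketch below turns such a graph into batches of runnable steps using Python's standard `graphlib` (3.9+). The graph is a hypothetical, simplified fragment of the diagram, not the actual WBS workflow definition, and this is not how the FreeFluo enactor is implemented.

```python
from graphlib import TopologicalSorter

# Hypothetical, simplified fragment of the WBS workflow:
# each key is a step, each value is the set of steps whose output it consumes.
wbs_steps = {
    "GenBank entry": {"GenBank accession"},
    "seqret": {"GenBank entry"},
    "restrict": {"seqret"},
    "cpgreport": {"seqret"},
    "RepeatMasker": {"seqret"},
    "transeq": {"seqret"},
    "pepstats": {"transeq"},
    "pscan": {"transeq"},
}


def execution_batches(graph):
    """Yield groups of steps whose inputs are all available; steps within
    one group do not depend on each other and could be invoked concurrently."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        yield ready
        sorter.done(*ready)


for batch in execution_batches(wbs_steps):
    print(batch)
# ['GenBank accession']
# ['GenBank entry']
# ['seqret']
# ['RepeatMasker', 'cpgreport', 'restrict', 'transeq']
# ['pepstats', 'pscan']
```

Batching by readiness, rather than producing one flat order, keeps the parallelism that makes a workflow faster than running the same services by hand one after another.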


16

Graves’ Disease Bioinformatics

(Figure: three linked systems working from a candidate gene pool; other labels: Gene ID, SNP, Talisman.)
• Annotation Pipeline: What is known about my candidate gene? Sources: Medline, OMIM, GO, BLAST, EMBL; DQP query.
• Genotype Assay Design System: Is this SNP present in my samples? Primer design with the EMBOSS Eprimer application in SoapLab; selection of restriction enzyme with EMBOSS Restrict in SoapLab; Restriction Fragment Length Polymorphism experiment, using primers designed by myGrid to amplify the region flanking the SNP on the gene.
• 3D Protein Structure: What is the structure of the protein product encoded by my candidate gene? Query PDB & display protein structure; obtain information about the protein & extract information about the active site (Swiss-Prot, AMBIT, InterPro); determine whether a coding SNP affects the active site of the protein (AMBIT).

Peter Li¹, Claire Jennings², Simon Pearce² and Anil Wipat¹ (2003). ¹School of Computing Science and ²Institute of Human Genetics, University of Newcastle-upon-Tyne.
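The "Selection of restriction enzyme" step rests on a simple idea: a SNP can be genotyped by RFLP if one allele contains an enzyme's recognition site and the other does not, so digestion gives different fragment lengths. The sketch below shows only that idea; it is not the EMBOSS restrict program. The three-enzyme table is a small illustrative subset, `diagnostic_enzymes` is a hypothetical name, and only the forward strand is scanned, which is adequate here because these three sites are palindromic.

```python
# Recognition sites of a few common type II restriction enzymes (illustrative subset).
ENZYMES = {"EcoRI": "GAATTC", "BamHI": "GGATCC", "HindIII": "AAGCTT"}


def diagnostic_enzymes(seq: str, pos: int, alt: str, enzymes=ENZYMES) -> list[str]:
    """Return enzymes whose site is present in exactly one of the two alleles
    at 0-based SNP position `pos`, i.e. enzymes usable for an RFLP assay."""
    ref_seq = seq.upper()
    alt_seq = ref_seq[:pos] + alt.upper() + ref_seq[pos + 1:]
    hits = []
    for name, site in enzymes.items():
        k = len(site)
        # Any occurrence overlapping the SNP starts in [pos - k + 1, pos],
        # so it lies entirely inside this window.
        lo = max(0, pos - k + 1)
        in_ref = site in ref_seq[lo:pos + k]
        in_alt = site in alt_seq[lo:pos + k]
        if in_ref != in_alt:  # site created or destroyed by the SNP
            hits.append(name)
    return hits


print(diagnostic_enzymes("TTGAATTCTT", 4, "G"))  # ['EcoRI']   (site destroyed)
print(diagnostic_enzymes("CAAGCATG", 5, "T"))    # ['HindIII'] (site created)
print(diagnostic_enzymes("TTGAATTCTT", 0, "A"))  # []          (no site affected)
```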


17

Experiment life cycle

(Figure: life-cycle diagram with stages:)
• Discovering and reusing experiments and resources
• Managing lifecycle, provenance and results of experiments
• Sharing services & experiments
• Personalisation
• Forming experiments
• Executing and monitoring experiments


18

(e-)Scientists…
• …Experiment
– Can workflow be used as an experimental method?
– How many times has this experiment been run?
• …Analyse
– How do we manage the results to draw conclusions from them?
– How reliable are these results?
• …Collaborate
– Can we share workflows, results, metadata etc.?
• …Publish
– Can we link to these workflows and results from our papers?
• …Review
– Can I find, comprehend and review your work?
– How was that result derived?


19

Collections of Tasks

(Figure: tasks: Finding, Description, Service Discovery, Building Workflow, Enactment, Provenance, Data Storage, Data Management, Querying, Domain Tasks; roles: Service Providers, Bioinformaticians, Scientists, Annotation providers)


20

(Figure: myGrid components and their users. Components: Registry, mIR, Discovery View, Haystack Provenance Browser, FreeFluo Enactor, Taverna WF Builder, Pedro Annotation tool, Ontology Store, Others; service interfaces: WSDL, Soap-lab. Interactions: interface description; annotation/description; query & retrieve; workflow execution; store data/knowledge; invoking; querying/sharing/federating/registering; data descriptions; vocabulary. Users: Annotation providers, Scientists, Bioinformaticians, Service Providers.)


21

(Figure: myGrid Service Stack. Layers: Applications, Core services, External services, over a Web Service (Grid Service) communication fabric; side labels: Bioinformaticians, Tool Providers, Service Providers. Components: Work bench, Taverna, Talisman, Web Portal, Views, AMBIT Text Extraction Service, Provenance, Personalisation, Event Notification, Gateway, Service and Workflow Discovery, myGrid Information Repository, Ontology Mgt, Metadata Mgt, FreeFluo Workflow Enactment Engine, OGSA-DQP Distributed Query Processor, Native Web Services, SoapLab, GowLab, Legacy apps, Registries, Ontologies.)


22

Two+ Paths

Core functionality
• Services – Soaplab and Gowlab
• Workflow enactment engine – Freefluo
• Workflow workbench – Taverna
• Data integration – OGSA-DQP
• Information model & management

Innovative work
• Service and workflow registration
• Semantic discovery
• Provenance management
• Text mining

In between
• Event notification
• Gateway


23

(Figure: myGrid Service Stack, as on slide 21.)


24


25

Run the Workflow

Viewing intermediate results


26

Run the Workflow


27

Drilling Down: myGrid and Semantics

• Workflow and service discovery
– Prior to and during enactment
– Semantic registration
• Workflow assembly
– Semantic service typing of inputs and outputs
• Provenance of workflows and other entities
• Experimental metadata glue
• Use of RDF, RDFS, DAML+OIL/OWL
– Instance store, ontology server, reasoner
– Materialised vs. at-point-of-delivery reasoning
• myGrid Information Model
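Semantic typing of service inputs and outputs lets a workbench check whether one service's output can feed another's input: the connection is valid when the output's concept is the same as, or more specific than, the concept the input expects. A minimal sketch using a hand-written single-parent is-a hierarchy is below. The concept names are hypothetical; the ontologies referred to here were expressed in DAML+OIL/OWL and queried through a reasoner, which this simple walk up a tree only approximates.

```python
# Hypothetical fragment of a data-type ontology, as child -> parent links.
IS_A = {
    "fasta_nucleotide_sequence": "nucleotide_sequence",
    "nucleotide_sequence": "biological_sequence",
    "protein_sequence": "biological_sequence",
    "biological_sequence": "data",
}


def subsumed_by(concept, ancestor):
    """True if `concept` equals `ancestor` or is a (transitive) subclass of it."""
    while concept is not None:
        if concept == ancestor:
            return True
        concept = IS_A.get(concept)
    return False


def can_connect(output_type, input_type):
    """An output may feed an input if its semantic type is the same as,
    or more specific than, the type the input expects."""
    return subsumed_by(output_type, input_type)


print(can_connect("fasta_nucleotide_sequence", "nucleotide_sequence"))  # True
print(can_connect("protein_sequence", "nucleotide_sequence"))           # False
print(can_connect("biological_sequence", "nucleotide_sequence"))        # False: too general
```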


28

Semantic Discovery

(Screenshot callouts: View annotations on workflow; Pedro data capture tool)

• Drag a workflow entry into the explorer pane and the workflow loads.
• Drag a service/workflow to the scavenger window for inclusion into the workflow.


29

Tutorial focus

Core functionality
• Services – Soaplab and Gowlab
• Workflow enactment engine – Freefluo
• Workflow workbench – Taverna
• Data integration – OGSA-DQP
• Information model & management

Innovative work
• Service and workflow registration
• Semantic discovery
• Provenance management
• Text mining

In between
• Event notification
• Gateway


30

Roadmap

1. Describe services
2. Discover services
3. Write & run workflows
4. Provenance & data management

(Figure: roadmap diagram; labels: Registry, Taverna workbench, LSID authorities; services, workflows, data)


31

Sessions on Details
• Workflows - hands-on with Taverna
• Semantics
• Timetable - split sessions
– Session 1
• Group 1 - hands-on (Swanson)
• Group 2 - semantics (Newhaven)
– Tea break (short)
– Session 2
• Group 1 - semantics (Newhaven)
• Group 2 - hands-on (Swanson)
– Discussions and Conclusions

