Photo taken by flickr/people/mfsarwar

Post on 12-Jan-2016

Photo taken by http://flickr.com/people/mfsarwar/

Interoperability With BioMoby 1.0

It’s Better Than Sharing Your Toothbrush!

A brief history of BioMoby

• Model Organism Bring Your own Database Interface Conference, Sept, 2001 (MOBY-DIC)

• May 21, 2002 – Genome Canada Platform Award

• May 25, 2002 – API Version 0.1 deployed, including object ontology serialization into XML

• July 18, 2002 – First Moby Client (Gbrowse Moby)

• June 9, 2003 – API Version 0.5 deployed

• 2006 – Genome Canada Platform Award

• 2007 - Version 1.0 API submitted for publication

MOBY-DIC Chapter VII

7th Model Organism Bring Your-own Database Interface Conference

Vancouver, BC, June 2007.

The Core Ahab’s

Wendy

Richard

Mylah

Martin

Eddie

Andreas

Paul

Ivan

Mark’s Screen…

The BioMoby Plan

• Create an ontology of bioinformatics data-types

• Define a serialization of this ontology (data syntax)

• Create an open API over this ontology

• Define Web Service inputs and outputs v.v. Ontology

• Register Services in an ontology-aware Registry

• Machines can find an appropriate service

• Machines can execute that service unattended

• Ontology is community-extensible

Overview of BioMoby Transactions

[Figure: transaction diagram. Labels: Gene names; MOBY Central; MOBY hosts & services; data labels Sequence, Alignment, Express., Protein, Alleles…; service labels Align, Phylogeny, Primers]

Object ontology

What is a sequence? A sequence is a ___ that has these features ___

Discovery of services that consume things LIKE sequences!

This is SCUFL – Simple Conceptual Unified Flow Language

It is a complete record of everything you just did, and it can be saved for use in the Taverna workflow application that we will look at later…

Pipeline discovery “on the fly”

• No explicit coordination between providers

• Dynamic discovery of ~appropriate Services

• Automated execution of services

Some BioMoby statistics

Moby: Breadth

• Namespaces (data types): 418

• Objects (data syntaxes): >561

• Service Types (analytical categories): 112

• Providers: ~50 active

• Service Instances: ~1200 currently “alive”
– In main Moby Central server in Canada
– Others in “boutique” Moby registries serving specialized communities worldwide

Moby: Clients

• Gbrowse_moby (M Wilkinson)

• PlaNet Locus_View (H Schoof, R Ernst)

• Blue-Jay (P Gordon)

• Taverna (T Oinn, M Senger, E Kawas)

• MOWserv (INB, Spain)

• Remora (S Carrere, J Gouzy, INRA)

• MOBYLE (B Néron, P Tufféry, C Letondal, Pasteur Inst.)

• SeaHawk (P Gordon)

BioMoby in detail

• MOBY Data typing system: Semantic Type

• MOBY Data typing system: Syntactic Type

• Moby Registry Queries

Moby Namespaces

• A “Namespace” is a category of identifiers
– NCBI has gi numbers (gi Namespace)
– GO Terms have accession numbers (GO Namespace)

• Namespaces indicate data’s semantic type.
– GO:0003476 a Gene Ontology Term
– gi|163483 a GenBank record

• Though we are using the word “Namespace” correctly, it causes confusion!
– “Namespace” in XML is tightly associated with an XML document and/or its syntax
– In Moby, we are ONLY talking about data entities, NOT THEIR SYNTAX

BioMoby in detail

• MOBY Data typing system: Semantic Type

• MOBY Data typing system: Syntactic Type

• Moby Registry Queries

The MOBY Object Ontology

• Syntactic types are defined by a GO-like ontology
– Class name at each node
– Edges define the relationships between Classes
– GO used as a model because of its familiarity in the community

• Edges define one of three relationships
– ISA: inheritance relationship; all properties of the parent are present in the child
– HASA: container relationship of ‘exactly 1’
– HAS: container relationship with ‘1 or more’
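The three edge types can be sketched as a small in-memory data structure. This is only an illustration, not the Moby Central implementation: the `MobyClass` record, the `ONTOLOGY` dictionary and the helper functions are hypothetical names, single-parent inheritance is an assumption, and the class names are borrowed from the sequence example used in this deck.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class MobyClass:
    """One node of the object ontology (hypothetical record layout)."""
    name: str
    isa: Optional[str] = None  # ISA edge: assumed single parent
    # HASA edges: (articleName, class) pairs, exactly one instance each
    hasa: List[Tuple[str, str]] = field(default_factory=list)
    # HAS edges: (articleName, class) pairs, one or more instances each
    has: List[Tuple[str, str]] = field(default_factory=list)


ONTOLOGY: Dict[str, MobyClass] = {}


def register(cls: MobyClass) -> None:
    ONTOLOGY[cls.name] = cls


register(MobyClass("Object"))
register(MobyClass("Integer", isa="Object"))
register(MobyClass("String", isa="Object"))
register(MobyClass("VirtualSequence", isa="Object",
                   hasa=[("length", "Integer")]))
register(MobyClass("GenericSequence", isa="VirtualSequence",
                   hasa=[("SequenceString", "String")]))
register(MobyClass("DNASequence", isa="GenericSequence"))


def lineage(name: str) -> List[str]:
    """Return the class followed by all of its ISA ancestors."""
    chain: List[str] = []
    current: Optional[str] = name
    while current is not None:
        chain.append(current)
        current = ONTOLOGY[current].isa
    return chain


def is_a(child: str, ancestor: str) -> bool:
    return ancestor in lineage(child)


def members(name: str) -> List[Tuple[str, str]]:
    """All HASA members of a class, inherited ones first."""
    result: List[Tuple[str, str]] = []
    for cls_name in reversed(lineage(name)):
        result.extend(ONTOLOGY[cls_name].hasa)
    return result


print(lineage("DNASequence"))
print(members("DNASequence"))
```

Because a child carries every property of its parent, `members("DNASequence")` includes the `length` and `SequenceString` members even though `DNASequence` declares none of its own.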

The Simplest Moby Data-Type

<Object namespace='NCBI_gi' id='111076'/>

Object

The combination of a namespace and an identifier within that namespace uniquely identify a data entity, not its location(s), nor its representation
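A minimal sketch of that idea, using Python's standard `xml.etree.ElementTree`. The helper names `moby_object` and `entity_key` are made up for illustration; only the element name and the two attributes come from the slide.

```python
import xml.etree.ElementTree as ET
from typing import Tuple


def moby_object(namespace: str, identifier: str) -> ET.Element:
    """Build a bare Moby Object: a namespace and an id, with no payload."""
    return ET.Element("Object", {"namespace": namespace, "id": identifier})


def entity_key(element: ET.Element) -> Tuple[str, str]:
    """The (namespace, id) pair identifies the entity, not its location or format."""
    return (element.get("namespace", ""), element.get("id", ""))


obj = moby_object("NCBI_gi", "111076")
print(ET.tostring(obj, encoding="unicode"))
print(entity_key(obj))
```

Two objects with the same key refer to the same data entity, whatever class they are serialized as.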

Moby Primitives

• Integer ISA Object

• String ISA Object

• Float ISA Object

• DateTime ISA Object

<Integer namespace='' id=''>38</Integer>

A Derived Data-Type

• Integer ISA Object

• String ISA Object

• VirtualSequence ISA Object

• VirtualSequence HASA Integer

<VirtualSequence namespace='NCBI_gi' id='111076'>
  <Integer namespace='' id='' articleName="length">38</Integer>
</VirtualSequence>

The articleName describes the semantic relationship between the Integer and the VirtualSequence

A Derived Data-Type

• Integer ISA Object

• String ISA Object

• VirtualSequence ISA Object

• VirtualSequence HASA Integer

• GenericSequence ISA VirtualSequence

• GenericSequence HASA String

<GenericSequence namespace='NCBI_gi' id='111076'>
  <Integer namespace='' id='' articleName="length">38</Integer>
  <String namespace='' id='' articleName="SequenceString">
    ATGATGATAGATAGAGGGCCCGGCGCGCGCGCGCGC
  </String>
</GenericSequence>

A Derived Data-Type

• Integer ISA Object

• String ISA Object

• VirtualSequence ISA Object

• VirtualSequence HASA Integer

• GenericSequence ISA VirtualSequence

• GenericSequence HASA String

• DNASequence ISA GenericSequence

<DNASequence namespace='NCBI_gi' id='111076'>
  <Integer namespace='' id='' articleName="length">38</Integer>
  <String namespace='' id='' articleName="SequenceString">
    ATGATGATAGATAGAGGGCCCGGCGCGCGCGCGCGC
  </String>
</DNASequence>
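Since DNASequence ISA GenericSequence, a consumer written for GenericSequence should be able to read a DNASequence by looking members up through their articleName. The sketch below assumes exactly that; `member` and `read_generic_sequence` are hypothetical helpers, not part of any Moby library, and the XML is the slide's example with plain quotes.

```python
import xml.etree.ElementTree as ET

DNA_XML = """
<DNASequence namespace='NCBI_gi' id='111076'>
  <Integer namespace='' id='' articleName="length">38</Integer>
  <String namespace='' id='' articleName="SequenceString">
    ATGATGATAGATAGAGGGCCCGGCGCGCGCGCGCGC
  </String>
</DNASequence>
"""


def member(element: ET.Element, article_name: str) -> str:
    """Return the text of the child whose articleName matches."""
    for child in element:
        if child.get("articleName") == article_name:
            return (child.text or "").strip()
    raise KeyError(article_name)


def read_generic_sequence(xml_text: str) -> dict:
    """Read any object that carries the GenericSequence members.

    The element name is not checked: a DNASequence inherits 'length'
    and 'SequenceString', so the same lookup works on it.
    """
    root = ET.fromstring(xml_text.strip())
    return {
        "type": root.tag,
        "namespace": root.get("namespace"),
        "id": root.get("id"),
        "length": int(member(root, "length")),
        "sequence": member(root, "SequenceString"),
    }


record = read_generic_sequence(DNA_XML)
print(record["type"], record["namespace"], record["id"])
```

A stricter consumer would also confirm against the ontology that the element's class really descends from GenericSequence; that check is left out here.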

Legacy file formats

<NCBI_Blast_Report namespace='NCBI_gi' id='115325'>
<String namespace='' id='' articleName='content'>

TBLASTN 2.0.4 [Feb-24-1998]

Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schäffer, Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997), "Gapped BLAST and PSI-BLAST: a new generation of protein database search programs", Nucleic Acids Res. 25:3389-3402.

Query= gi|1401126 (504 letters)

Database: Non-redundant GenBank+EMBL+DDBJ+PDB sequences
336,723 sequences; 677,679,054 total letters

Searching...done

Sequences producing significant alignments:  Score (bits)  E Value

gb|U49928|HSU49928 Homo sapiens TAK1 binding protein (TAB1) mRNA...  1009  0.0
emb|Z36985|PTPP2CMR P.tetraurelia mRNA for protein phosphatase t...  58  4e-07
emb|X77116|ATMRABI1 A.thaliana mRNA for ABI1 protein  53  1e-05

</String>
</NCBI_Blast_Report>

• Containing “String” allows ontological classes to represent legacy data types

Binaries – pictures, movies

<base64_encoded_jpeg namespace='TAIR_image' id='3343532'>
<String namespace='' id='' articleName='content'>
MIAGCSqGSIb3DQEHAqCAMIACAQExCzAJBgUrDgMCGgUAMIAGCSqGSIb3DQEHAQAAoIIJQDCCAv4wggJnoAMCAQICAwhH9jANBgkqhkiG9w0BAQQFADCBkjELMAkGA1UEBhMCWkExFTATBgNVMIAGCSqGSIb3DQEHAqCAMIACAQExCzAJBgUrDgMCGgUAMIAGCSqGSIb3DQEHAQAAoIIJQDCCAv4wggJnoAMCAQICAwhH9jANBgkqhkiG9w0BAQQFADCBkjELMAkGA1UEBhMCWkExFTATBgNVBAgTDFdlc3Rlcm4gQ2FwZTESMBAGA1UEBxMJQ2FwZSBUb3duMQ8wDQYDVQQKEwZUaGF3dGUxHTAbBgNVBAsTFENlcnRpZmljYXRlIFNlcnZpY2VzMSgwJgYDVQQDEx9QZXJzb25hbCBGcmVlbWFpbCBSU0EgMjAwMC44LjMwMB4XDTAyMDkxNTIxMDkwMVoXDTAzMDkxNTIxMDkwMVowQjEfMB0GA1UEAxMWVGhhd3RlIEZyZWVtYWlsIE1lbWJlcjEfMB0GCSqGSIb3DQEJARYQamprM0Bt
</String>
</base64_encoded_jpeg>

• Text-base64 is a Class that contains String

• Binaries are base64 encoded and passed in classes that inherit from text-base64

• base64_encoded_jpeg ISA text-base64 ISA text-plain HASA String

• With legacy data-types defined, we can extend them as we see fit
– annotated_jpeg ISA base64_encoded_jpeg
– annotated_jpeg HASA 2D_Coordinate_set
– annotated_jpeg HASA Description

<annotated_jpeg namespace='TAIR_Image' id='3343532'>
  <2D_Coordinate_set namespace='' id='' articleName="pixelCoordinates">
    <Integer namespace='' id='' articleName="x_coordinate">3554</Integer>
    <Integer namespace='' id='' articleName="y_coordinate">663</Integer>
  </2D_Coordinate_set>
  <String namespace='' id='' articleName="Description">
    This is the phenotype of a ufo-1 mutant under long daylength, 16°C
  </String>
  <String namespace='' id='' articleName="content">
    MIAGCSqGSIb3DQEHAqCAMIACAQExCzAJBgUrDgMCGgUAMIAGCSqGSIb3DQEHAQAAoIIJQDCC
    Av4wggJnoAMCAQICAwhH9jANBgkqhkiG9w0BAQQFADCBkjELMAkGA1UEBhMCWkExFTATBgNV
  </String>
</annotated_jpeg>
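A rough sketch of how a binary could be packed into, and recovered from, an object that inherits from text-base64, following the XML shown above. The function names `wrap_binary` and `unwrap_binary` are hypothetical and the payload is a stand-in byte string, not a real image.

```python
import base64
import xml.etree.ElementTree as ET


def wrap_binary(class_name: str, namespace: str, identifier: str,
                payload: bytes) -> ET.Element:
    """Base64-encode payload into a 'content' String inside the given class."""
    obj = ET.Element(class_name, {"namespace": namespace, "id": identifier})
    content = ET.SubElement(
        obj, "String", {"namespace": "", "id": "", "articleName": "content"}
    )
    content.text = base64.b64encode(payload).decode("ascii")
    return obj


def unwrap_binary(obj: ET.Element) -> bytes:
    """Find the 'content' String and decode it back to raw bytes."""
    for child in obj:
        if child.tag == "String" and child.get("articleName") == "content":
            return base64.b64decode((child.text or "").strip())
    raise ValueError("no 'content' String found")


# Stand-in bytes; a real client would read them from an image file.
fake_jpeg = b"\xff\xd8\xff\xe0 not a real image \xff\xd9"
image_obj = wrap_binary("base64_encoded_jpeg", "TAIR_image", "3343532", fake_jpeg)
print(ET.tostring(image_obj, encoding="unicode"))
```

Because the decoder only looks for the `content` member, the same `unwrap_binary` would also work on a child class such as annotated_jpeg.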

Extending legacy datatypes

The same object…

<annotated_jpeg namespace='TAIR_Image' id='3343532'>
  <2D_Coordinate_set namespace='' id='' articleName="pixelCoordinates">
    <Integer namespace='' id='' articleName="x_coordinate">3554</Integer>
    <Integer namespace='' id='' articleName="y_coordinate">663</Integer>
  </2D_Coordinate_set>
  <String namespace='' id='' articleName="Description">
    This is the phenotype of a ufo-1 mutant under long daylength, 16°C
  </String>
  <String namespace='' id='' articleName="content">
    MIAGCSqGSIb3DQEHAqCAMIACAQExCzAJBgUrDgMCGgUAMIAGCSqGSIb3Av4wggJnoAMCAQICAwhH9jANBgkqhkiG9w0BAQQFADCBkjELMAkGA1U
  </String>
</annotated_jpeg>

<CrossReference>
  <Object namespace="TAIR_Allele" id="ufo-1"/>
</CrossReference>

<CrossReference>
  <Object namespace='TAIR_Tissue' id='122'/>
</CrossReference>

annotated_jpeg ISA base64_encoded_jpeg HASA 2D_Coordinate_set HASA Description

Cross reference types

• Simple
– A MOBY Object

<Object namespace='foo' id='12345'/>

• Rich
– Takes the form:

<Xref namespace='' id='' authURI='' serviceName='' evidenceCode='' xrefType=''>
  ... Textual Description ...
</Xref>

– …Incidentally, this avoids the problem of reification that is experienced in RDF

XML Schema?

The Object Ontology allows new data-types WITHOUT new flatfile formats, and without having to understand e.g. XML Schema

Minimize future heterogeneity

Improve interoperability without requiring schema-to-schema mapping

• Object Ontology terms have semantically rich names, but this is primarily for human intuition
– DNA Sequence
– Annotated_GIF

• Object Ontology does not define the meaning of an object to the machine
– No machine-readable semantics

• It does define the representation – SYNTAX

A portion of the MOBY-S Object Ontology

…community-built!

BioMoby in detail

• MOBY Data typing system: Semantic Type

• MOBY Data typing system: Syntactic Type

• Moby Registry Queries

A Moby Central Query

• Give me:

– Services that consume THIS data-type in THIS syntax…

– …do SOMETHING LIKE THIS to it…

– …and provide me THAT data-type in response

Example

• Find me services that
– consume FASTA sequence data,
– do a BLAST with it,
– and provide me lists of GenBank GI numbers in return.

• Query can be any or all of the above criteria
– Also limit by service provider and service description keyword

Remember!!

Moby Registry Query

INPUT TYPE||

TRANSFORMATION TYPE||

OUTPUT TYPE
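The three-part query can be sketched as a filter over service records. Everything here is an assumption made for illustration: the `Service` record, the toy registry, the parent tables (in particular where Blast sits under Service) and the matching rules, which accept a service whose input is the queried type or one of its ancestors and whose service type is the queried type or one of its descendants. Output matching is simplified to exact equality.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# Toy ISA tables (child -> parent); the real ontologies are far larger.
OBJECT_PARENT: Dict[str, Optional[str]] = {
    "DNASequence": "GenericSequence",
    "GenericSequence": "VirtualSequence",
    "VirtualSequence": "Object",
    "NCBI_Blast_Report": "Object",
    "Object": None,
}
SERVICE_PARENT: Dict[str, Optional[str]] = {
    "NCBI_Blast": "Blast",
    "WU_Blast": "Blast",
    "Blast": "Analysis",          # assumed placement
    "Parse_NCBI_Blast": "Parsing",
    "Analysis": "Service",
    "Parsing": "Service",
    "Service": None,
}


def ancestors(term: Optional[str], parents: Dict[str, Optional[str]]) -> List[str]:
    """Return term plus every ancestor reachable through ISA edges."""
    chain: List[str] = []
    while term is not None:
        chain.append(term)
        term = parents.get(term)
    return chain


@dataclass(frozen=True)
class Service:
    name: str
    input_type: str
    service_type: str
    output_type: str


# Hypothetical registrations, not real Moby Central entries.
REGISTRY = [
    Service("blastAnySequence", "GenericSequence", "NCBI_Blast", "NCBI_Blast_Report"),
    Service("parseBlastReport", "NCBI_Blast_Report", "Parse_NCBI_Blast", "Object"),
]


def find_services(registry: List[Service],
                  input_type: Optional[str] = None,
                  service_type: Optional[str] = None,
                  output_type: Optional[str] = None) -> List[Service]:
    """Apply any or all of the three criteria; None means 'do not filter'."""
    hits = []
    for svc in registry:
        if input_type is not None and \
                svc.input_type not in ancestors(input_type, OBJECT_PARENT):
            continue
        if service_type is not None and \
                service_type not in ancestors(svc.service_type, SERVICE_PARENT):
            continue
        if output_type is not None and svc.output_type != output_type:
            continue
        hits.append(svc)
    return hits


for svc in find_services(REGISTRY, input_type="DNASequence", service_type="Blast"):
    print(svc.name)
```

A DNASequence query finds the service registered for GenericSequence because of the ISA chain, which is the "things LIKE sequences" behaviour described earlier.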

A weakness of MOBY

Service discovery is horribly flawed due to insufficiently rich semantics…

The problem with Moby

Chickens go in; pies come out!

What sort o’ pies?

Apple!

The MOBY-S Service Ontology

• A simple ISA hierarchy… – too simple!

• Primitive types include:
– Analysis
– Parsing
– Registration
– Retrieval
– Resolution
– Conversion
– Rendering

A slice of the Service Ontology

[Figure: ISA hierarchy under Service, with nodes Analysis, Alignment, Blast, NCBI_Blast, WU_Blast, Parsing, Parse_NCBI_Blast, Parse_WU_Blast]

“The Exploding Bicycle”- A. Rector, U Manchester

Summary so far

• BioMoby uses ontologies to describe both data types and data syntaxes
– This is where the interoperability comes from
– These are used to match consumers with providers during service discovery

• BioMoby uses a simple ontology to describe bioinformatics operations
– This ontology is only marginally useful

Seahawk

• Highlight data in your browser and drag/drop it into Moby

• What could be easier than that?!

Paul MK Gordon and Christoph W Sensen BMC Bioinformatics 2007, 8:208

BMC Bioinformatics, in press

Seahawk: A New Moby Client for Biologists

Drag ‘n’ drop, highlight existing data for use with MOBY Services

Paul Gordon & Christoph Sensen

Seahawk looks like a browser

How do I load data?

• Use the “open” button:
– Text file (e.g. FASTA sequences)
– HTML page (e.g. NCBI Entrez Web page)
– RTF document (e.g. conference abstract)
– MOBY XML document

• Drag ‘n’ Drop
– Web links and desktop files
– Highlighted text from open documents or Web pages

Under the Hood (Beneath the Bonnet?)

• Data has to be converted into Moby XML format to be used by Moby

• Moby data has to be converted back to human-readable text for presentation to the biologist

Again: How do I load data?

How do I Find Services?

• Right-click MOB rules are invoked

• Resulting Moby XML is used for service search

How do I run a service?

• Click it!

• If necessary, a service’s extra parameters can be set

• Control+click submits using default params

How do I run a service?

• If required inputs are missing, the missing ones must be dragged into place.

• Unrecognized data will be rejected

How do I collate data?

• Seahawk clipboard lets you build collections of objects

• Seahawk “knows” the type of collection and will suggest appropriate Moby services

Seahawk Summary

• Seahawk integrates Moby Web Service discovery and execution into the biologist’s day-to-day “Web Surfing” activity

• It uses Regular Expressions and XSLT to move normal web or hard-drive-file data into and out of BioMoby
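The regular-expression half of that idea might look like the sketch below: patterns recognise identifiers in ordinary text and turn each match into a Moby Object. The rule table is invented for illustration and is not Seahawk's actual rule format; the two identifier styles are the ones from the Namespaces slide, and dropping the "GO:" prefix from the id is an assumption. The XSLT direction (Moby XML back to readable text) is not shown, since the Python standard library has no XSLT processor.

```python
import re
import xml.etree.ElementTree as ET
from typing import List

# Hypothetical rules: (pattern with one capture group for the id, namespace).
RULES = [
    (re.compile(r"\bgi\|(\d+)"), "NCBI_gi"),
    (re.compile(r"\bGO:(\d{7})\b"), "GO"),
]


def text_to_moby(text: str) -> List[ET.Element]:
    """Scan free text and build one Moby Object per recognised identifier."""
    objects = []
    for pattern, namespace in RULES:
        for match in pattern.finditer(text):
            objects.append(
                ET.Element("Object", {"namespace": namespace, "id": match.group(1)})
            )
    return objects


highlighted = "The record gi|163483 is annotated with GO:0003476."
for found in text_to_moby(highlighted):
    print(ET.tostring(found, encoding="unicode"))
```

Text that matches no rule simply yields no objects, which mirrors the "unrecognized data will be rejected" behaviour described above.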

Why doesn’t Moby Use RDF/OWL?

Timeline of Moby/W3C Activities

[Figure: timeline spanning 2000 to 2006, with these events marked]

• RDF Candidate Spec

• RDF Schema Candidate Spec

• W3C Launches Semantic Web (SW) Activity Group

• BioMoby Project Established

• BioMoby XML Finalized

• BioMoby Stable 0.85 API Published (>400 services)

• RDF/OWL Formal W3C Recommendations

• BioMoby Stable 1.0 API Published

• Extensive SW tool building…

Moby 2.0

Getting it right, the second time!

What BioMoby Already Does

Sequence Data → BLAST SERVER → Blast Hit

Sequence Data → givesBlastResult → Blast Hit

Not “Biologically” Meaningful

Sequence Data → hasHomologyTo → Blast Hit

…looks a lot like…

URI → hasHomologyTo → URI

Which is effectively just an RDF triple.

Now think in reverse…

(in case you forgot…)

Moby Registry Query

INPUT TYPE||

TRANSFORMATION TYPE||

OUTPUT TYPE

Moby 2.0

Query: What does Sequence Data have homology to?

hasHomologyTo maps to BLAST SERVICE

Send data → BLAST SERVICE → Blast Hit

FIND SERVICES THAT

Consume Sequence Data||

Provide hasHomologyTo Property||

Attached to other Sequence Data

SPARQL

• A Semantic Web query language

• Queries “look like” graphs

Find “X” with predicate “Y” attached to “Z”

Moby 2.0 extends the SPARQL query language

• SPARQL queries contain concepts and the relationships between them (subject, predicate, object)

• We simply map RDF predicates onto Moby services capable of generating that relationship

• Registry query: “What Moby service consumes [subject] and generates the [predicate] relationship type?”
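The predicate-to-service mapping can be pictured with a small sketch. It is not the Moby 2.0 implementation: `PREDICATE_SERVICES`, `resolve` and `fake_blast` are hypothetical names, and `fake_blast` returns canned identifiers (taken from the BLAST report shown earlier) instead of calling a remote service.

```python
from typing import Callable, Dict, List, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


def fake_blast(sequence_id: str) -> List[str]:
    """Stand-in for a remote BLAST service; returns canned hits only."""
    canned = {"gi|1401126": ["gb|U49928", "emb|Z36985"]}
    return canned.get(sequence_id, [])


# Each RDF predicate maps onto a service able to generate that relationship.
PREDICATE_SERVICES: Dict[str, Callable[[str], List[str]]] = {
    "hasHomologyTo": fake_blast,
}


def resolve(subject: str, predicate: str) -> List[Triple]:
    """Answer 'subject predicate ?x' by running the mapped service."""
    service = PREDICATE_SERVICES.get(predicate)
    if service is None:
        return []  # no registered service can generate this relationship
    return [(subject, predicate, obj) for obj in service(subject)]


for triple in resolve("gi|1401126", "hasHomologyTo"):
    print(triple)
```

In this picture the triples do not need to exist in any database beforehand; they are produced on demand when the query mentions the predicate.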

But wait, there’s more!

Exploit knowledge in OWL ontologies to enhance query

• Subject, Predicate: look up and execute a Moby service that consumes proteins and generates functional annotation info

• Subject, Predicate: look up and execute a Moby service that consumes STK or proteins and looks up inhibitor molecules

• Evaluate Query Expression

This SPARQL query could be posed on a database of RAW, UNANNOTATED protein sequences, and be answered by Moby 2.0 (a.k.a. CardioSHARE)

Credits

• Genome Canada/Genome Alberta

• myGrid – Carole Goble in particular

• Spanish National Institute for Bioinformatics (INB) through Fundación Genoma España

• Generation Challenge Programme (GCP) of the Consultative Group for International Agricultural Research (CGIAR)

• Heart and Stroke Foundation of BC and Yukon (CardioSHARE)

• Microsoft Research (CardioSHARE)