Date post: 16-Dec-2015
Mark Wilkinson UBC (Lead PI) Michel Dumontier Carleton (Co-PI) Christopher J. O. Baker UNBSJ (Co-PI) C-BRASS Canadian Bioinformatics Resources as Semantic Services
Mark Wilkinson UBC (Lead PI)
Michel Dumontier Carleton (Co-PI)
Christopher J. O. Baker UNBSJ (Co-PI)

C-BRASS
Canadian Bioinformatics Resources as Semantic Services

Mandate

• Expose Canadian bioinformatics Web resources in a unified and automatable manner using a Semantic Web Services framework.

• Bioinformatics data and tools will be easier to discover, utilize, and integrate, hastening discovery.

• First widespread deployment of a grid-framework where the messages are “meaningful” to the machine, and can be interpreted/re-interpreted under a wide range of scenarios.

Goals

• Utilize novel SWS technologies to expose Canadian informatics resources on the emergent Semantic Web

• Create toolkits for semantically “lifting” legacy resources into a SWS framework

• Create prototype applications demonstrating a variety of ways of constructing, utilizing, visualizing, and interpreting the services, analytical pipelines, and resulting semantically-enriched datasets.

Web Service Adoption

The low uptake of modern Web integration frameworks by the bioinformatics community stems from two primary factors:

• Challenges in implementing these solutions

• A gap between the abilities of existing technologies and the needs and skills of the target end-user.

SOAP

• Simple Object Access Protocol (SOAP) messaging has only been successful within well-defined, often project-specific situations.

• Lack of "semantics" in the Web Service interface descriptions precludes the automated discovery of appropriate services and the automated pipelining of data between those services.

Semantic Web Services (SWS)

• SWS have achieved only a modest level of automated interoperability, due to limitations in the way the semantics of Web Services are modeled:

• SWS frameworks are implemented to support legacy data representation frameworks, in particular XML and XML Schema.

• SWS frameworks annotate XML Schema components, describing services based on the "meaning" of their various input and output fields.

Semantic Web Services (SWS)

• Automating workflow construction and semantically validating the "sensibility" of the connections between services (often referred to as schema mapping)

• XML Schema is semantically opaque, and applying semantics to it through annotation is extremely limited;
  – a semantically-annotated XML tag can have only one interpretation

SWS Frameworks describe:

• Input and output data-structures
• Operations of a Web Service.
• BioMoby Service Type ontology
  – a vocabulary describing analytical operations.
• OWL-S and WSMO/WSML Process Model
  – Before and After
  – Transformations during that state-change.

• Single-term semantics: too simplistic
• Process Models: too complex, no adoption

In transition

• Data on the Semantic Web is encoded in RDF, while data in most Web Service frameworks is encoded in XML

• From XML/Schema-based to OWL/RDF-based data representation

• SAWSDL became a W3C Recommendation in 2007
  – inputs and outputs of Web Services can be described in terms of ontological models.
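
As a rough illustration of the SAWSDL approach, the sketch below reads the sawsdl:modelReference attribute from XML Schema elements and maps each annotated element to the ontology concepts it points at. The schema fragment, the element names, and the example.org ontology URI are invented for illustration; only the SAWSDL namespace and the attribute name come from the W3C Recommendation.

```python
import xml.etree.ElementTree as ET

XSD_NS = "http://www.w3.org/2001/XMLSchema"
SAWSDL_NS = "http://www.w3.org/ns/sawsdl"

# Hypothetical schema fragment: element names and the ontology URI are made up.
SCHEMA = """
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:sawsdl="http://www.w3.org/ns/sawsdl">
  <xs:element name="sequence" type="xs:string"
              sawsdl:modelReference="http://example.org/onto#ProteinSequence"/>
  <xs:element name="comment" type="xs:string"/>
</xs:schema>
"""

def model_references(schema_text):
    """Map each annotated element name to its list of ontology concept URIs."""
    root = ET.fromstring(schema_text)
    refs = {}
    for elem in root.iter(f"{{{XSD_NS}}}element"):
        value = elem.get(f"{{{SAWSDL_NS}}}modelReference")
        if value:
            # modelReference holds a whitespace-separated list of URIs.
            refs[elem.get("name")] = value.split()
    return refs

annotations = model_references(SCHEMA)
print(annotations)
```

Only the annotated element appears in the result; the un-annotated "comment" element carries no machine-readable meaning, which is the gap the annotation is meant to close.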

User Communities (I)

• End-user community does not usually have a "process model" or "business model" in mind when searching for a Service.

• Biologists execute a BLAST alignment
  – NOT because they wish to run a sequence similarity matrix over their input data;
  – BUT because they are interested in finding sequences that are related to their input sequence by homology.

• Key is the relationships between the input and output data.

Bioinformatics Community

Needs:

• New metadata, i.e. Bioinformatics Web Service annotations that describe the biological properties, between input and output, that are generated by that Web Service.

• SADI facilitates novel data discovery, interoperability, and integrative behaviours that closely mirror the needs and expectations of our end-user community, simply by indexing services based on the predicate that relates each service's input to its output.

• Semantic Web data vs data derived from Web Service.

• SADI simply comprises a set of standards-compliant conventions and suggested best-practices for data representation and exchange between Web Services that fully utilize Semantic Web technologies.

• SADI mandates the inclusion of a single required annotation in the Web Service metadata that describes the biological relationship ("predicate") that is created between the input and output data of that Service.
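
One way to picture indexing services by predicate is the small in-memory sketch below. It is not the SADI registry: the class name, its methods, the service URLs, and the input class name are all hypothetical, chosen only to show that a single predicate annotation per service is enough to support lookup.

```python
from collections import defaultdict

class PredicateRegistry:
    """Toy index of services keyed by the predicate they attach to their input."""

    def __init__(self):
        self._by_predicate = defaultdict(list)

    def register(self, predicate, service_url, input_class):
        # One required annotation per service: the predicate it generates.
        self._by_predicate[predicate].append(
            {"url": service_url, "input_class": input_class}
        )

    def find(self, predicate):
        # Return a copy so callers cannot mutate the index.
        return list(self._by_predicate.get(predicate, []))

# Hypothetical registrations; the URLs and class name are placeholders.
registry = PredicateRegistry()
registry.register("hasProteinSequence",
                  "http://example.org/services/getSequence", "UniProtRecord")
registry.register("hasGOTerm",
                  "http://example.org/services/getGOTerms", "UniProtRecord")

matches = registry.find("hasProteinSequence")
print(matches)
```

A client that knows only the relationship it is interested in can find candidate services without knowing their names or process models.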

SADI Web Service Discovery

hasProteinSequence

Predicate-based web service invocation. Using the hasProteinSequence predicate in a query automatically invokes a web service capable of obtaining the amino acid sequence for UniProt entry P04637.

SADI: Standards-compliant recommendations for implementation

• SADI consists of several bioinformatics services
• SADI Services are stateless and atomic.
• SADI Services consume and provide data via HTTP POST and GET.
• SADI Services consume and produce data in RDF format.
• SADI Service interfaces are defined in terms of OWL-DL classes;
  – the property restrictions on these OWL classes define what specific data elements are required by the Service and what data will be provided by the Service, respectively.
• Input RDF data
  – data that is compliant with / classifies into the Input OWL Class is "decorated" or "annotated" by the service provider to include new properties reflecting activities performed by the Web Service.
• Output RDF data
  – is an instance of the OWL Class that defines the output of the service.
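
The "decoration" pattern described above can be sketched with plain Python tuples standing in for RDF triples. This is a minimal sketch, not SADI code: the example.org URIs, the hasSequenceLength property, the service function, and the sequence string are all invented, and a real service would exchange RDF over HTTP and use OWL classes to define its interface.

```python
EX = "http://example.org/"
HAS_SEQUENCE = EX + "hasProteinSequence"   # property required on the input
HAS_LENGTH = EX + "hasSequenceLength"      # hypothetical property added by the service

def sequence_length_service(input_triples):
    """Stateless toy service: for every subject carrying HAS_SEQUENCE,
    emit one HAS_LENGTH triple about that same subject."""
    output = set()
    for subject, predicate, obj in input_triples:
        if predicate == HAS_SEQUENCE:
            output.add((subject, HAS_LENGTH, len(obj)))
    return output

# Placeholder input: the subject URI and the sequence string are made up.
request = {(EX + "protein1", HAS_SEQUENCE, "MKTAYIAK")}
response = sequence_length_service(request)

# The output describes the same subject, so merging is a plain set union.
merged = request | response
print(sorted(merged, key=str))
```

Because the output triples share their subject with the input, a client can merge the response into its own data without any schema mapping.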

SADI Registry Predicate Map

What can it do?

• SADI provides the functionality to automatically and dynamically discover, access, and integrate relevant data from distributed, non-uniform data-sources using disparate ontologies. These are key promises of the Semantic Web!

• SHARE implementation allows users to query over data that might not exist at the time they pose their query. A query-specific database is dynamically generated as a query is being processed; effectively, the database required to answer the question is automatically generated as a result of the question being posed.
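
The idea of a database that is built while the query runs can be sketched as follows. This is a deliberately simplified illustration and not SHARE's actual algorithm: the resolve function, the toy services, and all identifiers and data are hypothetical, and the loop handles only a simple chain of triple patterns rather than general SPARQL.

```python
# Hypothetical data behind two toy services; identifiers are placeholders.
GO_TERMS = {"proteinA": ["term1", "term2"], "proteinB": ["term2"]}
TERM_NAMES = {"term1": "name of term1", "term2": "name of term2"}

# Services are discovered by the predicate they generate.
SERVICES = {
    "hasGOTerm": lambda protein: GO_TERMS.get(protein, []),
    "hasTermName": lambda term: [TERM_NAMES[term]] if term in TERM_NAMES else [],
}

def resolve(patterns, start_bindings, services):
    """Walk a chain of (subject_var, predicate, object_var) patterns,
    calling a service per predicate and storing every triple it returns."""
    store = set()  # the query-specific "database", empty before the query
    bindings = {var: set(values) for var, values in start_bindings.items()}
    for s_var, predicate, o_var in patterns:
        service = services[predicate]
        found = set()
        for subject in sorted(bindings[s_var]):
            for obj in service(subject):
                store.add((subject, predicate, obj))
                found.add(obj)
        bindings[o_var] = found
    return store, bindings

patterns = [
    ("?protein", "hasGOTerm", "?term"),
    ("?term", "hasTermName", "?name"),
]
store, bindings = resolve(patterns, {"?protein": {"proteinA", "proteinB"}}, SERVICES)
print(len(store), sorted(bindings["?name"]))
```

None of the triples in the store existed locally before the query was posed; each was fetched because a pattern in the query asked for its predicate.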

Find Gene Ontology terms (biological process, cellular component, and molecular function annotations) for proteins associated with Parkinson's disease:

PREFIX pred: <http://es-01.chibi.ubc.ca/~benv/predicates.owl#>
PREFIX ont: <http://ontology.dumontierlab.com/>
PREFIX keyword: <http://biordf.net/moby/Global_Keyword/>

SELECT ?term ?name
WHERE {
    ?protein ont:hasTag keyword:parkinson .
    ?protein pred:hasGOTerm ?term .
    ?term pred:hasTermName ?name
}

SHARE connects the SADI middleware to the Pellet SPARQL query engine and DL reasoner.

Semantic Health And Research Environment (SHARE) prototype.

SADI Toolkit

"RDFizing"
• Virtuoso Sponger
• Bio2RDF

Native Service Provision and "Wrapping" legacy CGI and WSDL
• Seahawk
• Dashboard

Core SADI Service Codebase
• SADI::Service::Core
• jSADI

Quality of Service Testing
• myGrid/Moby unit-Test and the Testing Agent

Ontology Development Tools
• Protege 4 and Top Braid Composer

Client Applications
• Taverna
• SHARE
• IO Informatics Sentient Knowledge Explorer plug-in

SADI Training Course Curriculum

Target Audience: the target audience for the training sessions includes primary or secondary data / service providers as well as the full spectrum of bioinformatics students and professionals from academia and industry.

• Syntactic Web vs. Semantic Web
• Interoperability
• Knowledge representation Standards
• RDF 101
• OWL 101
• Ontology Editors and Ontology Design
• Inference and Reasoning
• Reasoning Engines
• Web Service Description Languages
• Web Service Registries and Service Discovery
• Service Ontologies
• Workflow composition
• SAWSDL
• MyGrid
• SADI 101
• Bioinformatics Web Service Requirements
• SADI Enabled services
• SADI toolkit

Action Plan

• Tier 1 involves active, hands-on migration of native resources to a Semantically-enabled Service.

• Tier 2 involves “wrapping” resources from non-participating providers via Services hosted on C-BRASS servers.

• Tier 3 involves on-site training in Semantic Web Service technologies, and support for providers' self-directed resource migration.

Success Criteria

• Number of Services created/migrated (minimum 400 in Canada), and their use by consumers worldwide;

• Number of software tools created, and their use by third-parties;

• Number of Canadian highly qualified personnel (HQP) trained in construction of Semantic Web Services.

Deliverables

• A fully-documented definition of the SADI Semantic Web Service framework, including submission of this to an appropriate standards body (e.g. OASIS or OMG)

• A set of core ontologies describing properties and relationships for entities in the biomedical domain

• A costing-model, for use by future Semantic Web Service providers, outlining the establishment and maintenance costs for the migration from legacy Web or Web Service resources to a Semantic Web Service framework.
