Using DAML+OIL Ontologies for Service Discovery in myGrid

Chris Wroe, Robert Stevens, Carole Goble, Angus Roberts, Mark Greenwood

http://www.mygrid.org.uk

UK eScience Programme All Hands Meeting, Sheffield, UK, 2-4 September 2002

Context

• myGrid: Personalised extensible environments for data-intensive in silico experiments in biology

• Higher level services: workflow, databases, knowledge management, provenance…

• Bioinformatics services are published as Web services (and soon Grid Services)

• http://www.ebi.ac.uk/collab/mygrid/service0/axis/index.html

[Figure: An in silico experiment as a workflow. Labels in the diagram: Protein name, Fetch, WF, Similar sequences, Structure modelling, Fetch, View, RASMOL.]

Service Discovery

• Find appropriate type of services

– sequence alignment

• Find appropriate instances of that service

– BLAST (an algorithm for sequence alignment), as delivered by NCBI

• Assist in forming an appropriate assembly of discovered services.

• Find, select and execute instances of services while the workflow is being enacted.

Knowledge in the head of an expert bioinformatician

Metadata

• Metadata – computationally accessible data about the services

• Ontologies – the shared and common understanding of a domain

– A vocabulary of terms

– Definition of what those terms mean.

– A shared understanding for people and machines

Metadata Classification

• Domain metadata

– the domain coverage of the service, or its function.

– BLASTn is a tool for computing sequence homology that uses the BLAST algorithm over nucleotides;

• Business metadata

– data quality, quality of service, cost, geographical location, authorisation, provenance of data and so on.

– BLASTn service offered by the NCBI is 80% reliable.

Four-tiered service descriptions

1. Class of service:

• a protein sequence alignment, a protein sequence database.

2. Specific example of an abstract service:

• BLAST, SWISS-PROT.

3. Instance service description of a specific service:

• BLAST, SWISS-PROT as offered by the EBI.

4. Invoked instance service description:

• BLAST as offered by the EBI on a particular date, with particular parameters when a service was actually enacted.

Domain “semantic”

Business “operational”
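The four tiers can be pictured as a chain of records, each one narrowing the one before. The following is a minimal sketch under that assumption; the class names, field names, date and parameter values are all hypothetical and are not taken from the myGrid ontologies.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class ServiceClass:
    """Tier 1: a class of service, e.g. protein sequence alignment."""
    name: str


@dataclass(frozen=True)
class AbstractService:
    """Tier 2: a specific example of an abstract service, e.g. BLAST."""
    name: str
    service_class: ServiceClass


@dataclass(frozen=True)
class ServiceInstance:
    """Tier 3: a specific service as offered by one provider."""
    service: AbstractService
    provider: str


@dataclass(frozen=True)
class InvokedInstance:
    """Tier 4: one enactment of an instance, with its date and parameters."""
    instance: ServiceInstance
    date: str
    parameters: Tuple[Tuple[str, str], ...]


def tiers(invoked: InvokedInstance) -> List[str]:
    """Return one human-readable line per tier, most general first."""
    inst = invoked.instance
    svc = inst.service
    return [
        svc.service_class.name,
        svc.name,
        f"{svc.name} as offered by the {inst.provider}",
        f"{svc.name} as offered by the {inst.provider} on {invoked.date}",
    ]


alignment = ServiceClass("protein sequence alignment")
blast = AbstractService("BLAST", alignment)
blast_at_ebi = ServiceInstance(blast, "EBI")
# The date and the parameter below are made-up illustration values.
run = InvokedInstance(blast_at_ebi, "2002-09-02", (("database", "SWISS-PROT"),))

for line in tiers(run):
    print(line)
```

Keeping each tier as a separate immutable record means that many invocations can share one instance description, and many instances one abstract service.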

DAML+OIL/OWL

• DAML+OIL is designed to describe ontologies

• Ontologies incorporate information about classes, properties, and individuals, each of which can have an ID which is a URI reference.

• Equivalent to the expressive Description Logic SHIQ

• Automated reasoning for inferring the classification lattice and checking that concepts are consistent

• OWL Web Ontology Language 1.0 Reference

• W3C Working Draft 29 July 2002

• http://www.w3.org/TR/owl-ref/

Ontology editing: OilEd

http://oiled.man.ac.uk/

Reasoning in DAML+OIL

• Consistency: check if knowledge is meaningful

• Subsumption: structure knowledge, compute classification

• Equivalence: check if two classes denote the same set of instances

• Instantiation: check if individual i is an instance of class C

• Retrieval: retrieve the set of individuals that instantiate C
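The five reasoning tasks listed for DAML+OIL can be illustrated on a deliberately tiny model. The sketch below assumes that a class is just a conjunction of atomic features and that some feature pairs are declared disjoint. That is far weaker than DAML+OIL and is not how a description logic reasoner such as FaCT works. All class, feature and individual names are hypothetical.

```python
from typing import Dict, FrozenSet, Set, Tuple

Features = FrozenSet[str]

# Assumption: pairs of features that can never hold together.
DISJOINT: Set[Tuple[str, str]] = {("protein", "nucleic_acid")}

# Assumption: each class is the conjunction of the features listed.
CLASSES: Dict[str, Features] = {
    "alignment_service": frozenset({"service", "aligning"}),
    "pairwise_alignment_service": frozenset({"service", "aligning", "pairwise"}),
    "pairwise_aligner": frozenset({"pairwise", "aligning", "service"}),
    "impossible_sequence": frozenset({"protein", "nucleic_acid"}),
}

# Assumption: each individual is described by the features known to hold.
INDIVIDUALS: Dict[str, Features] = {
    "blast_at_ncbi": frozenset({"service", "aligning", "pairwise", "local"}),
    "swissprot_at_ebi": frozenset({"service", "database"}),
}


def consistent(c: str) -> bool:
    """Consistency: the class does not require two disjoint features."""
    fs = CLASSES[c]
    return not any(a in fs and b in fs for a, b in DISJOINT)


def subsumes(general: str, specific: str) -> bool:
    """Subsumption: everything `general` requires, `specific` requires too."""
    return CLASSES[general] <= CLASSES[specific]


def equivalent(c: str, d: str) -> bool:
    """Equivalence: each class subsumes the other."""
    return subsumes(c, d) and subsumes(d, c)


def instance_of(i: str, c: str) -> bool:
    """Instantiation: the individual has every feature the class requires."""
    return CLASSES[c] <= INDIVIDUALS[i]


def retrieve(c: str) -> Set[str]:
    """Retrieval: all known individuals that instantiate the class."""
    return {i for i in INDIVIDUALS if instance_of(i, c)}


print(consistent("impossible_sequence"))
print(subsumes("alignment_service", "pairwise_alignment_service"))
print(equivalent("pairwise_alignment_service", "pairwise_aligner"))
print(retrieve("alignment_service"))
```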

class-def defined BLAST-n_service_operation
  subclass-of atomic_service_operation
  has_Class performs_task (aligning has_Class has_feature local has_Class has_feature pairwise)
  has_Class produces_result (report has_Class is_report_of sequence_alignment)
  has_Class uses_resource (database has_Class contains (data has_Class encodes (sequence has_Class is_sequence_of nucleic_acid_molecule)))
  has_Class requires_input (data has_Class encodes (sequence has_Class is_sequence_of nucleic_acid_molecule))
  has_Class is_function_of (BLAST_application)

class-def defined pairwise_sequence_alignment_service
  subclass-of atomic_service_operation
  has_Class performs_task (aligning has_Class has_feature local has_Class has_feature pairwise)
  has_Class produces_result (report has_Class is_report_of sequence_alignment)
  has_Class uses_resource (database has_Class contains (data has_Class encodes (sequence has_Class is_sequence_of nucleic_acid_molecule)))
  has_Class requires_input (data has_Class encodes (sequence has_Class is_sequence_of nucleic_acid_molecule))
  has_Class is_function_of (BLAST_application)
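To make the shape of these definitions concrete, the BLAST-n description can be written as a nested structure and compared with a partial query by simple structural subsumption. This is a sketch only: it handles conjunctions of atomic names and has_Class restrictions, compares atomic names by equality (no class hierarchy), and is not the algorithm used by FaCT. The `Desc` type, the helper `desc`, and the name `protein` in the second query are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass(frozen=True)
class Desc:
    """A conjunction of atomic class names and has_Class restrictions."""
    atoms: frozenset = frozenset()
    restrictions: Tuple[Tuple[str, "Desc"], ...] = ()


def desc(atoms: Iterable[str] = (), restrictions: Iterable = ()) -> Desc:
    """Convenience constructor (hypothetical helper)."""
    return Desc(frozenset(atoms), tuple(restrictions))


def subsumes(general: Desc, specific: Desc) -> bool:
    """True if every requirement of `general` is met by `specific`."""
    if not general.atoms <= specific.atoms:
        return False
    # Each restriction in `general` needs a matching, at least as specific,
    # restriction on the same property in `specific`.
    return all(
        any(p == q and subsumes(g, s) for q, s in specific.restrictions)
        for p, g in general.restrictions
    )


nucleotide_data = desc(["data"], [
    ("encodes", desc(["sequence"], [
        ("is_sequence_of", desc(["nucleic_acid_molecule"])),
    ])),
])

blast_n = desc(["atomic_service_operation"], [
    ("performs_task", desc(["aligning"], [
        ("has_feature", desc(["local"])),
        ("has_feature", desc(["pairwise"])),
    ])),
    ("produces_result", desc(["report"], [
        ("is_report_of", desc(["sequence_alignment"])),
    ])),
    ("uses_resource", desc(["database"], [("contains", nucleotide_data)])),
    ("requires_input", nucleotide_data),
    ("is_function_of", desc(["BLAST_application"])),
])

# A partial request: any atomic operation that does pairwise aligning.
pairwise_query = desc(["atomic_service_operation"], [
    ("performs_task", desc(["aligning"], [("has_feature", desc(["pairwise"]))])),
])

# A request for protein input; "protein" is a hypothetical class name.
protein_query = desc([], [
    ("requires_input", desc(["data"], [
        ("encodes", desc(["sequence"], [("is_sequence_of", desc(["protein"]))])),
    ])),
])

print(subsumes(pairwise_query, blast_n))
print(subsumes(protein_query, blast_n))
```

A partial description subsumes every fuller description that meets all of its requirements, which is why an under-specified request can still find the service.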

Multiple Roles

• Services organised, queried and matched using subsumption reasoning.

• Descriptions controlled by concept satisfiability reasoning.

[Figure: diagram relating Service Description, Classification and Constraints. Edge labels: requires, controls, organises, controls, drives, uses.]

DAML+OIL ontologies

• Service classifications;

• A vocabulary for expressing service descriptions

• A reasoning process to manage:

– coherency of the classifications and the descriptions when they are created,

– the service discovery, matching and composition when they are deployed.

Suite

[Figure: the suite of ontologies. Boxes: Upper level ontology, Informatics ontology, Molecular biology ontology, Bioinformatics ontology, Organisation ontology, Publishing ontology, Task ontology, Web service ontology. Arrow legend: "Specialises. All concepts are subclassed from those in the more general ontology." and "Contributes concepts to form definitions." The figure appears twice; the second copy is annotated with the properties: parameters (input, output, precondition, effect), performs_task, uses-resource, is_function_of.]

Suite’s Coverage

Ontology               Classes (primitive/defined)   Properties   Vocabulary size   Individuals
Biology                112 (66/46)                   22           -                 -
Publishing             6 (6/0)                       -            -                 -
Service                117 (1/115)                   8            124               -
Informatics            96 (48/48)                    7            -                 -
Bioinformatics         75 (31/44)                    9            -                 -
Upper level ontology   50 (40/10)                    7            -                 -
Organisation           1 (1/0)                       0            -                 8

Vocabulary size: size of the vocabulary used to form concept descriptions.
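The class counts in the coverage table can be cross-checked with a few lines of arithmetic. The sketch below transcribes the figures as they appear in the table, sums the class totals, and lists any row where the primitive and defined counts do not add up to the stated total.

```python
# Rows transcribed from the coverage table:
# (ontology, total classes, primitive classes, defined classes)
ROWS = [
    ("Biology", 112, 66, 46),
    ("Publishing", 6, 6, 0),
    ("Service", 117, 1, 115),
    ("Informatics", 96, 48, 48),
    ("Bioinformatics", 75, 31, 44),
    ("Upper level ontology", 50, 40, 10),
    ("Organisation", 1, 1, 0),
]

# Total number of classes across the suite, using the stated totals.
total_classes = sum(total for _, total, _, _ in ROWS)

# Rows whose primitive + defined counts differ from the stated total.
mismatched = [
    name for name, total, primitive, defined in ROWS
    if primitive + defined != total
]

print(total_classes)  # 457
print(mismatched)     # ['Service']
```

As transcribed, the Service row gives 1 + 115 = 116 against a stated total of 117. The slides do not say which of the three figures is off.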

[Figure: myGrid.version0 architecture. Components shown: Portal; Client framework with Repository Client, Ontology Client and Workflow Client; Personal Repository (Meta Data); Workflow Repository (Meta Data); Ontology Server; Service Type Directory; Service instance directory; DAML+OIL Reasoner (FaCT); Matcher and Ranker; Workflow enactment; Bioinformatics services.]

1. The user selects values from a drop-down list to create a property-based description of their required service. Values are constrained to provide only sensible alternatives.

2. Once the user has entered a partial description, they submit it for matching. The results are displayed below.

3. The user adds the operation to the growing workflow.

4. The workflow specification is complete and ready to match against those in the workflow repository.
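The first two steps above, constrained value selection followed by matching of a partial description, can be sketched with flat property/value pairs. This is a simplification: the real system is described as using DAML+OIL reasoning, whereas here "sensible" just means "still satisfied by at least one advertised service". The ranking rule is an assumption too. The directory contents, property values and the two service names ending in `_x` and `_y` are hypothetical.

```python
from typing import Dict, List

Description = Dict[str, str]

# Hypothetical service directory; the entries are illustrative only.
DIRECTORY: Dict[str, Description] = {
    "BLASTn": {
        "performs_task": "aligning",
        "requires_input": "nucleotide_sequence",
        "has_feature": "pairwise",
    },
    "protein_aligner_x": {
        "performs_task": "aligning",
        "requires_input": "protein_sequence",
        "has_feature": "pairwise",
    },
    "multiple_aligner_y": {
        "performs_task": "aligning",
        "requires_input": "protein_sequence",
        "has_feature": "multiple",
    },
    "RASMOL": {
        "performs_task": "viewing",
        "requires_input": "protein_structure",
    },
}


def matches(query: Description, advert: Description) -> bool:
    """A partial query matches if the advert agrees on every stated property."""
    return all(advert.get(prop) == value for prop, value in query.items())


def match_and_rank(query: Description) -> List[str]:
    """Return matching services, those with the fewest extra properties first."""
    hits = [name for name, advert in DIRECTORY.items() if matches(query, advert)]
    return sorted(hits, key=lambda n: (len(DIRECTORY[n]) - len(query), n))


def sensible_values(partial: Description, prop: str) -> List[str]:
    """Values of `prop` that keep the partial description matchable."""
    return sorted({
        advert[prop]
        for advert in DIRECTORY.values()
        if prop in advert and matches(partial, advert)
    })


partial = {"performs_task": "aligning"}
print(sensible_values(partial, "requires_input"))
print(match_and_rank({**partial, "requires_input": "protein_sequence"}))
```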

Ontology grounds out

• Link ontology to WSDL and UDDI

[Figure: WSDL elements (types, messages, portType, operation, binding, service; XML Schema) set against UDDI structures (businessEntity, businessService, bindingTemplate, tModel).]
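One way to picture the link between the ontology and WSDL and UDDI is a lookup table from an ontology concept to the concrete elements that realise it. The sketch below assumes such a table exists. The slides do not describe the actual mechanism, and the record layout, portType name, operation name and tModel key are all hypothetical placeholders.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Grounding:
    """Links one ontology concept to WSDL and UDDI identifiers."""
    concept: str          # ontology class name
    wsdl_port_type: str   # hypothetical WSDL portType name
    wsdl_operation: str   # hypothetical WSDL operation name
    uddi_tmodel_key: str  # hypothetical UDDI tModel key


# Hypothetical grounding table with a single illustrative entry.
GROUNDINGS: List[Grounding] = [
    Grounding(
        concept="BLAST-n_service_operation",
        wsdl_port_type="ExamplePortType",
        wsdl_operation="exampleOperation",
        uddi_tmodel_key="uuid:EXAMPLE-TMODEL-KEY",
    ),
]


def ground(concept: str) -> Optional[Grounding]:
    """Return the grounding for a concept, or None if it has none."""
    for g in GROUNDINGS:
        if g.concept == concept:
            return g
    return None


print(ground("BLAST-n_service_operation"))
print(ground("unknown_concept"))
```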

Other uses of ontology

• Labelling data items in databases

– Semantic typing for controlling inputs and outputs

– Use by distributed query processing

• Linking & browsing XML-based myGrid information components

– COHSE

• Work to link with the Life Science Identifier (I3C)

• Generate BioMOBY Central service classification
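Semantic typing of inputs and outputs, the first use in the list above, can be sketched as a compatibility check between the output of one workflow step and the input of the next. The sketch assumes a simple single-parent type hierarchy. The type names and the hierarchy are hypothetical and are not taken from the myGrid ontologies.

```python
from typing import Dict, Optional

# Hypothetical fragment of a data-type hierarchy: child -> parent.
PARENT: Dict[str, str] = {
    "nucleotide_sequence": "sequence",
    "protein_sequence": "sequence",
    "sequence": "data",
    "sequence_alignment_report": "report",
    "report": "data",
}


def is_a(semantic_type: str, ancestor: str) -> bool:
    """True if `semantic_type` equals `ancestor` or descends from it."""
    current: Optional[str] = semantic_type
    while current is not None:
        if current == ancestor:
            return True
        current = PARENT.get(current)
    return False


def can_connect(output_type: str, input_type: str) -> bool:
    """An output may feed an input whose type is the same or more general."""
    return is_a(output_type, input_type)


print(can_connect("protein_sequence", "sequence"))
print(can_connect("sequence", "protein_sequence"))
print(can_connect("report", "sequence"))
```

The check is deliberately one-directional: a step that accepts any sequence can take a protein sequence, but a step that needs a protein sequence cannot be handed an arbitrary sequence.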

Summary

• Description-based ontology approach: rich, flexible, but a paradigm shift

• Simple interfaces for publishing and localised extensions

• Need other means of finding services – part of a solution, not the whole solution.

• Ontology tools essential

– OilEd, FaCT reasoner, Ontology server

Downloads

• All tools & ontology available from: http://www.mygrid.org.uk

• Forthcoming publication: A suite of DAML+OIL Ontologies to Describe Bioinformatics Web Services and Data. Chris Wroe, Robert Stevens, Carole Goble, Angus Roberts, Mark Greenwood. To appear in International Journal of Cooperative Information Systems.