Using DAML+OIL Ontologies for Service Discovery in myGrid
Chris Wroe, Robert Stevens, Carole Goble, Angus Roberts, Mark Greenwood
http://www.mygrid.org.uk
UK eScience Programme All Hands meeting
Sheffield, UK 2-4th September 2002
Context
• myGrid: Personalised extensible environments for data-intensive in silico experiments in biology
• Higher level services: workflow, databases, knowledge management, provenance…
• Bioinformatics services are published as Web services (and soon Grid Services)
• http://www.ebi.ac.uk/collab/mygrid/service0/axis/index.html
[Figure: An in silico experiment as a workflow. Labels in the diagram: Protein name, Fetch, WF, Similar sequences, Structure modelling, Fetch, View, RASMOL.]
Service Discovery
• Find appropriate type of services
– sequence alignment
• Find appropriate instances of that service
– BLAST (an algorithm for sequence alignment), as delivered by NCBI
• Assist in forming an appropriate assembly of discovered services.
• Find, select and execute instances of services while the workflow is being enacted.
Knowledge in the head of expert bioinformatician
Metadata
• Metadata – computationally accessible data about the services
• Ontologies – the shared and common understanding of a domain
– A vocabulary of terms
– Definition of what those terms mean.
– A shared understanding for people and machines
Metadata Classification
• Domain metadata
– the domain coverage of the service, or its function.
– BLASTn is a tool for computing sequence homology that uses the BLAST algorithm over nucleotides.
• Business metadata
– data quality, quality of service, cost, geographical location, authorisation, provenance of data and so on.
– BLASTn service offered by the NCBI is 80% reliable.
Four tiered service descriptions
1. Class of service:
• a protein sequence alignment, a protein sequence database.
2. Specific example of an abstract service:
• BLAST, SWISS-PROT.
3. Instance service description of a specific service:
• BLAST, SWISS-PROT as offered by the EBI.
4. Invoked instance service description:
• BLAST as offered by the EBI on a particular date, with particular parameters when a service was actually enacted.
Domain “semantic”
Business “operational”
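The four tiers can be read as a chain of records, each pointing at the more general tier above it. The following is a minimal sketch of that reading, not the myGrid data model: the class names, the tiers helper, the date and the parameter value are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ServiceClass:
    """Tier 1: a class of service."""
    name: str


@dataclass
class AbstractService:
    """Tier 2: a specific example of an abstract service."""
    name: str
    service_class: ServiceClass


@dataclass
class ServiceInstance:
    """Tier 3: a specific service as offered by a provider."""
    service: AbstractService
    provider: str


@dataclass
class InvokedInstance:
    """Tier 4: one enactment of an instance, with its date and parameters."""
    instance: ServiceInstance
    invoked_on: str
    parameters: Dict[str, str] = field(default_factory=dict)


def tiers(invocation: InvokedInstance) -> List[str]:
    """Return a label for each of the four tiers, most general first."""
    inst = invocation.instance
    offered = f"{inst.service.name} as offered by the {inst.provider}"
    return [
        inst.service.service_class.name,
        inst.service.name,
        offered,
        f"{offered} on {invocation.invoked_on}",
    ]


alignment = ServiceClass("protein sequence alignment")
blast = AbstractService("BLAST", alignment)
blast_at_ebi = ServiceInstance(blast, "EBI")
# The date and the parameter below are made-up illustration values.
run = InvokedInstance(blast_at_ebi, "2002-09-02", {"database": "SWISS-PROT"})

for label in tiers(run):
    print(label)
```

Keeping each tier as a separate record means a query can stop at whichever level it needs, for example asking only for the class of service when choosing a workflow step.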
DAML+OIL/OWL
• DAML+OIL is designed to describe ontologies
• Ontologies incorporate information about classes, properties, and individuals, each of which can have an ID which is a URI reference.
• Equivalent to the expressive Description Logic SHIQ
• Automated reasoning for inferring classification lattice and checking concepts are consistent
• OWL Web Ontology Language 1.0 Reference
• W3C Working Draft 29 July 2002
• http://www.w3.org/TR/owl-ref/
Ontology editing: OilEd
http://oiled.man.ac.uk/
• Consistency: check if knowledge is meaningful
• Subsumption: structure knowledge, compute classification
• Equivalence: check if two classes denote the same set of instances
• Instantiation: check if individual i is an instance of class C
• Retrieval: retrieve the set of individuals that instantiate C
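Four of the reasoning tasks listed above (subsumption, equivalence, instantiation, retrieval) can be illustrated over a hand-written class hierarchy. This is only a toy sketch: it walks explicitly stated parent links and does not infer anything from class definitions, which is what a description logic reasoner such as FaCT does. The class and individual names are hypothetical.

```python
from typing import Dict, List, Set

# Hypothetical told hierarchy: class -> its direct parent classes.
PARENTS: Dict[str, Set[str]] = {
    "service_operation": set(),
    "sequence_alignment_service": {"service_operation"},
    "BLAST_service": {"sequence_alignment_service"},
}

# Hypothetical individuals: individual -> the class it is asserted to belong to.
INDIVIDUALS: Dict[str, str] = {
    "blast_at_ncbi": "BLAST_service",
    "generic_aligner": "sequence_alignment_service",
}


def subsumes(general: str, specific: str) -> bool:
    """Subsumption: `general` is `specific` itself or one of its ancestors."""
    stack, seen = [specific], set()
    while stack:
        current = stack.pop()
        if current == general:
            return True
        if current not in seen:
            seen.add(current)
            stack.extend(PARENTS.get(current, ()))
    return False


def equivalent(a: str, b: str) -> bool:
    """Equivalence: each class subsumes the other."""
    return subsumes(a, b) and subsumes(b, a)


def is_instance(individual: str, cls: str) -> bool:
    """Instantiation: the individual's asserted class is subsumed by `cls`."""
    return subsumes(cls, INDIVIDUALS[individual])


def retrieve(cls: str) -> List[str]:
    """Retrieval: all individuals that instantiate `cls`, sorted by name."""
    return sorted(i for i in INDIVIDUALS if is_instance(i, cls))


print(retrieve("sequence_alignment_service"))
```

Consistency checking is left out because it needs the class definitions themselves, not just the hierarchy.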
Reasoning in DAML+OIL
class-def defined BLAST-n_service_operation
  subclass-of atomic_service_operation
  has_Class performs_task (aligning has_Class has_feature local has_Class has_feature pairwise)
  has_Class produces_result (report has_Class is_report_of sequence_alignment)
  has_Class uses_resource (database has_Class contains (data has_Class encodes (sequence has_Class is_sequence_of nucleic_acid_molecule)))
  has_Class requires_input (data has_Class encodes (sequence has_Class is_sequence_of nucleic_acid_molecule))
  has_Class is_function_of (BLAST_application)

class-def defined pairwise_sequence_alignment_service
  subclass-of atomic_service_operation
  has_Class performs_task (aligning has_Class has_feature local has_Class has_feature pairwise)
  has_Class produces_result (report has_Class is_report_of sequence_alignment)
  has_Class uses_resource (database has_Class contains (data has_Class encodes (sequence has_Class is_sequence_of nucleic_acid_molecule)))
  has_Class requires_input (data has_Class encodes (sequence has_Class is_sequence_of nucleic_acid_molecule))
  has_Class is_function_of (BLAST_application)
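To show how property-based definitions like these can be compared, the sketch below treats a description as a set of (property, filler) pairs and tests whether one description carries every restriction of another. It is a deliberately simplified structural check, not the reasoning FaCT performs: nested fillers are flattened to plain strings (an assumption of this sketch), and the query description is hypothetical.

```python
from typing import FrozenSet, Tuple

Restriction = Tuple[str, str]        # (property, flattened filler)
Description = FrozenSet[Restriction]


def description(*restrictions: Restriction) -> Description:
    """Build a description from its property restrictions."""
    return frozenset(restrictions)


def subsumed_by(specific: Description, general: Description) -> bool:
    """Toy test: every restriction of `general` also appears in `specific`."""
    return general <= specific


# Fillers are flattened to strings here; a real reasoner compares them structurally.
NUCLEOTIDE_DATA = "data encoding a nucleic_acid_molecule sequence"

blast_n = description(
    ("performs_task", "local pairwise aligning"),
    ("produces_result", "report of sequence_alignment"),
    ("uses_resource", "database containing " + NUCLEOTIDE_DATA),
    ("requires_input", NUCLEOTIDE_DATA),
    ("is_function_of", "BLAST_application"),
)

# Hypothetical partial description a user might submit.
query = description(
    ("performs_task", "local pairwise aligning"),
    ("requires_input", NUCLEOTIDE_DATA),
)

print(subsumed_by(blast_n, query))   # the BLAST-n description satisfies the query
print(subsumed_by(query, blast_n))   # the query alone is more general
```

Under this toy check, two descriptions with identical restriction sets subsume each other, so the two definitions as they appear above would be treated as equivalent.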
Multiple Roles
• Services organised, queried and matched using subsumption reasoning.
• Descriptions controlled by concept satisfiability reasoning.
[Figure: roles of the ontology. Node labels: Service Description, Classification, Constraints. Edge labels: requires, controls, organises, controls, drives, uses.]
DAML+OIL ontologies
• Service classifications;
• A vocabulary for expressing service descriptions
• A reasoning process to manage:
– coherency of the classifications and the descriptions when they are created,
– the service discovery, matching and composition when they are deployed.
Suite
[Figure: the ontology suite. Ontologies shown: Upper level ontology, Informatics ontology, Molecular biology ontology, Bioinformatics ontology, Organisation ontology, Publishing ontology, Task ontology, Web service ontology.
Legend:
• Specialises. All concepts are subclassed from those in the more general ontology.
• Contributes concepts to form definitions.
Properties shown: parameters (input, output, precondition, effect), performs_task, uses-resource, is_function_of.]
Suite’s Coverage

Ontology               No. of classes         No. of        Size of vocabulary used to    Individuals
                       (primitive/defined)    properties    form concept descriptions
Biology                112 (66/46)            22            -                             -
Publishing             6 (6/0)                -             -                             -
Service                117 (1/115)            8             124                           -
Informatics            96 (48/48)             7             -                             -
Bioinformatics         75 (31/44)             9             -                             -
Upper level ontology   50 (40/10)             7             -                             -
Organisation           1 (1/0)                0             -                             8
myGrid.version0
[Figure: architecture diagram. Components shown: Personal Repository (Meta Data), Ontology Server, Workflow Repository (Meta Data), Service Type Directory, Repository Client, Ontology Client, Workflow Client, Portal, Workflow enactment, Bioinformatics services, Service instance directory, DAML+OIL Reasoner (FaCT), Matcher and Ranker, Client framework.]
1. The user selects values from a drop-down list to create a property-based description of their required service. Values are constrained to provide only sensible alternatives.
2. Once the user has entered a partial description they submit it for matching. The results are displayed below.
3. The user adds the operation to the growing workflow.
4. The workflow specification is complete and ready to match against those in the workflow repository.
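The first two steps above (constrained value selection, then matching a partial description) can be sketched as follows. The vocabulary, the advertised services other than BLASTn, and the ranking rule are all assumptions of this sketch; the slides do not say how the Matcher and Ranker orders its results.

```python
from typing import Dict, List, Set

# Hypothetical vocabulary: property -> values offered in the drop-down list.
ALLOWED: Dict[str, Set[str]] = {
    "performs_task": {"aligning", "retrieving"},
    "requires_input": {"nucleotide_sequence", "protein_sequence"},
}

# Hypothetical advertised services and their property values.
SERVICES: Dict[str, Dict[str, str]] = {
    "BLASTn": {"performs_task": "aligning", "requires_input": "nucleotide_sequence"},
    "BLASTp": {"performs_task": "aligning", "requires_input": "protein_sequence"},
    "fetch_entry": {"performs_task": "retrieving"},
}


def choices(prop: str) -> List[str]:
    """Step 1: the only values the user may pick for a property."""
    return sorted(ALLOWED[prop])


def match(partial: Dict[str, str]) -> List[str]:
    """Step 2: services consistent with a partial description, closest first."""
    for prop, value in partial.items():
        if value not in ALLOWED.get(prop, set()):
            raise ValueError(f"{value!r} is not a permitted value for {prop!r}")
    hits = [
        name
        for name, desc in SERVICES.items()
        if all(desc.get(p) == v for p, v in partial.items())
    ]
    # Assumed ranking: fewer properties beyond those requested ranks higher,
    # with ties broken by name.
    return sorted(hits, key=lambda n: (len(SERVICES[n]) - len(partial), n))


print(choices("performs_task"))
print(match({"performs_task": "aligning"}))
```

Rejecting values outside the vocabulary mirrors the constraint in step 1: the user cannot submit a description the ontology would consider meaningless.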
Ontology grounds out
• Link ontology to WSDL and UDDI
[Figure: elements the ontology links to.
WSDL: types, messages, portType, operation, binding, service
XML Schema
UDDI: businessEntity, businessService, bindingTemplate, tModel]
Other uses of ontology
• Labelling data items in databases
– Semantic typing for controlling inputs and outputs
– Use by distributed query processing
• Linking & browsing XML-based myGrid information components
– COHSE
• Work to link with the Life Science Identifier (I3C)
• Generate BioMOBY Central service classification
Summary
• Description-based ontology approach: rich, flexible, but a paradigm shift
• Simple interfaces for publishing and localised extensions
• Need other means of finding services – part of a solution not the whole solution.
• Ontology tools essential
– OilEd, FaCT reasoner, Ontology server
Downloads
• All tools & ontology available from: http://www.mygrid.org.uk
• Forthcoming publication: A suite of DAML+OIL Ontologies to Describe Bioinformatics Web Services and Data. Chris Wroe, Robert Stevens, Carole Goble, Angus Roberts, Mark Greenwood. To appear in International Journal of Cooperative Information Systems.