Fourth Annual Bio-Ontologies Meeting


Sharing Experiences and Spreading Best Practice

26 July 2001

This, the fourth annual Bio-Ontologies meeting, has been sponsored by GlaxoSmithKline Pharmaceuticals, for whose support we are grateful.

Organised by Carole Goble and Robert Stevens (co-chairs); Peter Karp, Pat Hayes, Robin McEntire, Richard Chen and Eric Neumann

About the Workshop

We would like to invite you to the fourth Annual Bio-Ontologies Meeting (Bio-Ontologies 2001), on July 26th in Copenhagen, Denmark. This is immediately after ISMB-01 (July 21-25), also in Copenhagen.

The programme for the Bio-Ontologies meeting is now available. We have a keynote speaker, Alan Flett from Semantic Edge, who will describe experience of using ontologies in the business world. We have three invited speakers, who will discuss the roles of ontologies in medical informatics, text analysis and the analysis of gene transcription data. We have ten short talks selected from abstracts submitted to the meeting. An innovation this year is a panel discussion with ontologists from industry, biology, computer science...

The Bio-Ontologies meeting is co-located with ISMB in Tivoli Gardens and is held in NIMB (in the room called Lumbysalen). All registrants receive by mail a map of Tivoli on which NIMB is clearly shown. Most registrants who are also attending ISMB will already have their Bio-Ontologies meeting badges through their ISMB registration; they need to wear these on the day. Registrants should come through the NIMB entrance on the 26th. Those attending only Bio-Ontologies can also register there.

The goal of this consortium is the identification and promotion of a practical set of technologies that will aid in the knowledge management and exchange of concepts and representations in the life sciences. The first meeting took place in Montreal in 1998 and made clear the general interest in, and support for, ontologies in the life sciences. The following year, in Heidelberg, we discussed ontology exchange and presented ontologies then under development. The third meeting, in La Jolla, California, USA, continued this theme, reporting several new ontologies and presenting updates on existing ones.

Many in the group have been active since our last meeting. The community now has considerable experience in the development and deployment of ontologies in the life sciences, so it is appropriate for us to take stock and reflect. The theme for this year's meeting is therefore Sharing Experiences and Spreading Best Practice. The idea is that we share not only the results of our labours but also how we got there, and what we wish we had known while we were doing it.

http://www.cs.man.ac.uk/~stevensr/workshop01/ (1 of 3) [10/05/2004 14:50:03]

Topics that will be discussed include:

● Shared experiences in using ontology tools, development methodologies, comparing ontologies, and reusing other people's ontologies;

● The latest in ontology languages and ontology exchange languages, including an update on the Ontology Inference Layer (OIL);

● Ontologies produced by members of various genomics and life-science efforts;

● Specific uses of ontologies in research and drug discovery, especially pre-competitive ontologies for the industry;

● Updates on ontology development in general.

Venue

The meeting will be held on July 26th in Tivoli Gardens, Copenhagen. Attendees are expected to make their own arrangements for travel and accommodation (we assume most will simply extend their bookings for ISMB). For more information on hotels with special ISMB conference rates, see ISMB Hotels.

Workshop Details

The day-long seminar will be divided into two sections:

● A keynote address, followed by invited speakers on the theme of the workshop;

● A series of 20-minute talks selected from responses to this call for abstracts, covering a range of topics that may be wider than the theme of the workshop.

The intended results of the fourth meeting are to:

❍ Establish a user community for sharing experiences with designing and building ontologies for the Life Sciences;

❍ Enlist support from the Knowledge Management community for tools and methodologies to aid our ontology efforts;

❍ Create a permanent portal for the exchange of ontologies and ontology-building tools, and for relationships with other organizations engaged in similar ontology-building tasks;

❍ Establish a consortium for promoting and sharing open-source ontologies in the Life Sciences.

Registration is online, and the registration page is now available. It will also be possible to register at the ISMB conference and on the day of the meeting itself. All registrants who are also registering for ISMB will be registered for the satellite meeting on the 26th at the same time. Those attending only on the day will be able to register on site. ISMB itself has reached registration capacity, but the Bio-Ontologies meeting has up to 200 places, of which approximately 140 have been filled.

Deadlines

1 June 2001: please submit abstracts to [email protected]

This page maintained by Robert Stevens. Last altered on 25 June 2001.


Google Search: Bio-ontologies


Web Results 1 - 10 of about 675 for Bio-ontologies. (0.43 seconds)

The Bio-ontologies Working Group: ... Meeting History: Bio-Ontologies '00 Meeting, August 5, 1999. The latest Bio-Ontologies workshop took place at ISMB '00 in San Diego, CA. ... smi-web.stanford.edu/projects/bio-ontology/ - 7k

Fourth Annual Bio-Ontologies Meeting: Fourth Annual Bio-Ontologies Meeting. Sharing Experiences and Spreading Best Practice. ... the programme is now available for the Bio-Ontologies meeting. ... img.cs.man.ac.uk/stevens/workshop01/ - 7k

[PPT] Bio-Ontologies: Bio-Ontologies: Their Creation and Design (File Format: Microsoft Powerpoint 97). ... Bio-Ontologies: Their Creation and Design. Dr. Peter Karp. SRI, http://www.ai.sri.com/~pkarp/. ... Advertisement. The Fourth Annual Bio-Ontologies Meeting. ... img.cs.man.ac.uk/stevens/tutorial01/master.ppt

go-2001: Fourth Bio-Ontologies Meeting -- 2001: Fourth Bio-Ontologies Meeting -- 2001. From: Robert Stevens ([email protected]) Date: Tue May 15 2001 - 08:27:36 PDT. ... Stevens Chair Bio-Ontologies 2001. ... www.geneontology.org/email-go/go-arc/go-2001/0364.html - 8k

go-2001: Re: Fourth Bio-Ontologies Meeting -- 2001: ... Reply: Midori Harris: "Re: Fourth Bio-Ontologies Meeting -- 2001 ... at ISMB, and, now, will submit a abstract for a presentation at the BioOntologies meeting as ... www.geneontology.org/email-go/go-arc/go-2001/0370.html - 9k

Plant Ontology Mailing List Archive: [Fwd: The Seventh Annual Bio ... [Fwd: The Seventh Annual Bio-ontologies Meeting Call For Submissions]. ... Submission details at http://bio-ontologies.man.ac.uk. -- This ... www.plantontology.org/mailarch/2004/0042.html - 13k

Bio-Ontology: Bio-Ontologies: A List of links. What is an Ontology? "An ontology is an explicit specification of some topic. For our purposes, it ... anil.cchmc.org/Bio-Ontologies.html - 10k

http://www.google.co.uk/search?q=Bio-ontologies&ie=ISO-8859-1&oe=ISO-8859-1&hl=en&meta=on (1 of 2) [10/05/2004 15:14:44]

Bio–Ontologies: Tools, Techniques, and Examples: Bio–Ontologies: Tools, Techniques, and Examples. Carole Goble ... A Survey of Current Bio–Ontologies. The bio ... www.iscb.org/ismb2000/tutorials/goble.html - 12k

Sklyar, Nataliya: Survey of existing Bio-ontologies: ... 7 4. Bio-ontologies The use of ontologies within bioinformatics is relatively recent and ... In this section, we present a review of some selected bioontologies. ... dol.uni-leipzig.de/pub/2001-30/en - 57k

Christopher Thomas - bio-Ontologies: bio-Ontologies, Databases and Dictionaries. ... lsdis.cs.uga.edu/~cthomas/bio_ontologies.html - 11k


The Bio-ontologies Working Group

The Molecular Biology Ontology Working Group

Updated 11-21-00

About this site: This web site was set up to follow up on the Ontologies for Molecular Biology Workshop, which took place at ISMB '98 in Montreal. The site and mailing list were set up primarily for the 50-odd attendees, to address the need for continued dialogue on standards and goals for a group effort on ontologies for molecular biology.

Meeting History:

Bio-Ontologies '00 Meeting: The latest Bio-Ontologies workshop took place at ISMB '00 in San Diego, CA.

Bio-ontologies Working Group meeting, August 5, 1999: the 1999 meeting of the Bio-ontologies working group, at which the two documents below were introduced.

Two documents introduced at the 1999 meeting are available for download: the XOL Specification (an OKBC-friendly, XML-based ontology exchange language) and the working group's first report, An Evaluation of Ontology Exchange Languages for Bioinformatics. Both documents are in Microsoft Word 97 format.

Core Development Group Meeting, March 29, 1999: Following the meeting at the Computational Genomics conference (see below), the core group of active participants has worked to evaluate the primary representation languages/environments, Ontolingua and OML (selected after the last meeting). A meeting was held on March 29 at Pangea to share results and discuss next steps for the group. Meeting minutes are available.

Conference Meeting, Nov. 4th, 1998

Last year, in response to the call for participation in the Ontologies for Molecular Biology Working Group, the core group met after the Second Annual Conference on Computational Genomics (http://www.tigr.org/cet/gss/cg/index.html) on Nov. 4th, 1998.

http://smi-web.stanford.edu/projects/bio-ontology/ (1 of 3) [10/05/2004 15:14:56]

Resources:

FTP Site: An FTP site running at ftp://smi.stanford.edu/pub/bio-ontology/ currently contains the XOL specification and the ontology exchange evaluation report listed above. Further content is pending submission from other members of this group. Material should be deposited in the "incoming" subdirectory (for security reasons). Please submit zipped files containing the directory structure appropriate to the component files; they will then be moved into the bio-ontology directory. All material should be supplied by its owner and accompanied by a README giving property and usage information.

Mailing List: The mailing list membership is composed primarily of the original ontology workshop attendees, but the list is not closed. Only members can post to the group. If you are interested in this topic and would like to be added to the list, please send a request to [email protected].

Links:

The ImMunoGeneTics Database

http://imgt.cnusc.fr:8104

The Organelle Genome Database (GOBASE)

http://megasun.bch.umontreal.ca/gobase/

Ongoing Ontology Projects

http://www.cs.utexas.edu/users/mfkb/related.html


OML Bioinformatics support page (includes example ontologies in OML from the working group)

http://wave.eecs.wsu.edu/Bioinformatics/Bioinformatics.html

Robert Stevens Bio-ontologies Page

http://img.cs.man.ac.uk/stevens/ontology.html

Please feel free to submit link suggestions or files to add to the ftp site.


Stanford KSL Network Services


Welcome to Stanford KSL Network Services, home of the Ontolingua Ontology Editor.


http://www-ksl-svc.stanford.edu:5915/&service=frame-editor (1 of 2) [10/05/2004 15:15:16]


The KSL gratefully acknowledges the support of the following funding agencies for this work:

1. Defense Advanced Research Projects Agency (DARPA) and the Department of the Navy, Rapid Knowledge Formation (RKF) program, under contract N66001-00-C-8027-P00001.

2. Defense Advanced Research Projects Agency (DARPA) and the Department of the Navy, High Performance Knowledge Bases (HPKB) program, under contract N66001-96-C-8622-P00001.

3. DARPA and the National Institute of Standards and Technology (NIST), Rapid Development, Exploration and Optimization (RaDEO) program, under cooperative agreement 70NANB6H0075.

4. DARPA and the Defense Logistics Agency under the Advanced Logistics Program (ALP).

5. DARPA/ETO, U.S. Army Research Laboratory, Lockheed Martin Advanced Technology Laboratories, and Sandpiper Software Corporation, Rapid Prototyping of Application-Specific Signal Processors (RASSP), contract DAAL01-93-C-3380.

6. CommerceNet under contract CN-1094 (TRP #F33615-94-4413).

7. Boeing Corporation.

This application copyright (c) 1995, 1996, 1997, 1998, 1999, 2000, 2001 Stanford University Knowledge Systems Laboratory.

Contact us at [email protected]


Sklyar, Nataliya: Survey of existing Bio-ontologies

Available via: http://lips.informatik.uni-leipzig.de:80/pub/2001-30/en

Submitted on: 28th of September 2001

Author: Sklyar, Nataliya

Title: Survey of existing Bio-ontologies

Date of publication: 2001

Citation: Sklyar, Nataliya. Survey of existing Bio-ontologies, Techn. Report 5/2001, Dept. of Comp. Science, Univ. of Leipzig

Number of pages: 23

Language: English

Organization: The Institute of Computer Science

Type: Technical Report

Subject group: Computer Science, Data Processing

Abstract: Ontologies have become popular in the fields of intelligent information integration, cooperative information systems, electronic commerce and knowledge management. Recently, ontologies for knowledge representation in the biological domain have also appeared. This survey is intended to provide a brief state-of-the-art introduction to ontology-based biological systems. This work aims to investigate the role of ontologies and the possibilities opened up by the use of ontologies in bioinformatics. Several selected bio-ontologies are described. For every ontology, the structure, scope, area of application, characteristics and representation language are considered. A comparison of the described ontologies and future possibilities for the use of bio-ontologies are discussed.

Keywords: Ontology, bio-ontologies, bioinformatics

Contact address: Nataliya Sklyar, Universität Leipzig, Institut für Informatik, Augustusplatz 10 - 11, 04109 Leipzig

http://dol.uni-leipzig.de/pub/2001-30/en (1 of 2) [10/05/2004 15:15:56]


Sponsored by: DFG

Copyright: Copyright remains with the author(s). Readers may make personal copies for scientific or non-commercial purposes. Any further use requires the express prior written permission of the author(s).

Source(s):

● Postscript (ps, ps.gz [440709 bytes], ps.zip)

● PDF (pdf, pdf.gz [247235 bytes], pdf.zip)

● Plain text (text, text.gz [55899 bytes], text.zip)

● Pagewise preview

Management of the document by: [email protected]

Leipzig University Multimedia Publication Server


Institut für Informatik

Survey of existing bio-ontologies

Nataliya Sklyar

Report. Nr.

September 2001


Survey of existing Bio-ontologies

Nataliya Sklyar

Department of Computer Science, University of Leipzig Augustusplatz 10/11, D-04109 Leipzig, Germany

e-mail: [email protected]

September 2001

Abstract. Ontologies have become popular in the fields of intelligent information integration, cooperative information systems, electronic commerce and knowledge management. Recently, ontologies for knowledge representation in the biological domain have also appeared. This survey is intended to provide a brief state-of-the-art introduction to ontology-based biological systems. This work aims to investigate the role of ontologies and the possibilities opened up by the use of ontologies in bioinformatics. Several selected bio-ontologies are described. For every ontology, the structure, scope, area of application, characteristics, representation language, etc. are considered. A comparison of the described ontologies and future possibilities for the use of bio-ontologies are also discussed.

1. Introduction

With recent advances in biotechnology and bioinformatics, numerous genome databases from various biological communities have been developed to assist in genomic research. Hundreds of genomic and biological databases are openly accessible on the World Wide Web. They cover various types of biologically relevant information, such as genomic databases for different organisms, pathway databases, and genetic sequence and molecular databases. Genome-related databases can be divided into two major groups: generalized and specialized databases [11]. Generalized databases include such well-known databases as the GenBank, EMBL and DDBJ archives of nucleic acid sequences, and the SwissProt and PIR polypeptide sequence databases. These databases capture and represent information on particular classes of molecules without any phylogenetic or functional exclusion. In contrast, specialized databases are organized around a specific model organism or a specific biological function (Fig. 1). Biological databases have been created by different biological communities, at different stages in the development of biological knowledge, for different purposes, and cover different aspects. All of this leads to a high level of structural and semantic heterogeneity and to the autonomy of biological databases. Structural heterogeneity refers to differences in the DBMS and in data models. A large amount of biological data is stored in flat files, while other biological databases are relational or object-oriented. Semantic heterogeneity concerns the content of databases and the meaning of database categories.

In general, molecular biology has a so-called “communication problem” [36]. Even the meaning of important high-level fundamental concepts is often ambiguous. For example, the use of the term “gene” differs from one biological system to another; some proteins and genes share the same name, or the same protein or gene has different names in different sources. In genomic databases, the functions of genes and gene products are often expressed in natural language. This makes it very difficult to obtain such important information as, for example, the set of proteins with a certain function.

Figure 1. Classification of genomic databases. [Diagram: generalized databases, covering nucleic acid sequences (DNA, RNA), e.g. GenBank and EMBL, and polypeptide sequences (proteins), e.g. SwissPROT and PIR; specialized databases, either organism-specific (e.g. FlyBase, MGI) or biological-function-specific, covering pathways and enzymatic reactions (e.g. EcoCyc) and structure (e.g. NDB, PDB).]

To compare results and to infer and test new hypotheses, biologists need to be able to pose complex questions and to analyze data from different information sources and different experiments. For example, in microarray analysis it would be useful to have information on the functional classification of every gene on the chip, on the pathways in which it works, and on the details of any protein-protein interactions in which it participates. All of this requires the integration of biological information.

A number of approaches to biological data integration and interoperation have been proposed. Hyperlink navigation is the simplest and the most common. It allows a user to navigate interactively from the presentation of an entry in one database to an entry in another database. However, the types of links between objects are implicit, so a user moving from one database to another can lose the context of the original query.

Several well-known techniques, such as mediator systems [17] and federated databases [25], have been developed to provide uniform interfaces and common query languages for heterogeneous databases. SRS (Sequence Retrieval System) technology [37, 44] was developed specifically to integrate heterogeneous biological data sources behind a single interface. It currently provides a powerful unified interface to over 400 different scientific databases [44]. However, all these approaches aim to overcome structural heterogeneity; they do not address the integration of the data contents of the underlying sources.
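The mediator idea can be made concrete with a minimal sketch that assumes two hypothetical sources, a flat file and a relational table. Each source gets a wrapper that answers the same query in its own way, and the mediator merges the answers behind one function. The source contents, field names and `find_by_gene` are illustrative assumptions. They do not describe SRS or any of the cited systems.

```python
import sqlite3
from typing import Dict, List

# Hypothetical flat-file source: records separated by "//", one "KEY: value" per line.
FLAT_FILE = """ID: geneA
DESC: kinase
//
ID: geneB
DESC: transporter
//"""


def flat_file_wrapper(gene_id: str) -> List[Dict[str, str]]:
    """Parse the flat file and return records whose ID matches gene_id."""
    results = []
    for block in FLAT_FILE.split("//"):
        fields = {}
        for line in block.strip().splitlines():
            key, _, value = line.partition(": ")
            fields[key] = value
        if fields.get("ID") == gene_id:
            results.append({"gene": fields["ID"], "description": fields["DESC"], "source": "flat"})
    return results


# Hypothetical relational source kept in an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE genes (gene_id TEXT, note TEXT)")
conn.executemany("INSERT INTO genes VALUES (?, ?)", [("geneA", "cell growth"), ("geneC", "unknown")])


def relational_wrapper(gene_id: str) -> List[Dict[str, str]]:
    """Answer the same query with SQL against the relational source."""
    rows = conn.execute("SELECT gene_id, note FROM genes WHERE gene_id = ?", (gene_id,))
    return [{"gene": g, "description": n, "source": "sql"} for g, n in rows]


def find_by_gene(gene_id: str) -> List[Dict[str, str]]:
    """Mediator: send one query to every wrapper and merge the answers."""
    merged: List[Dict[str, str]] = []
    for wrapper in (flat_file_wrapper, relational_wrapper):
        merged.extend(wrapper(gene_id))
    return merged
```

`find_by_gene("geneA")` returns one record from each source. Their descriptions ("kinase" and "cell growth") come back side by side, with nothing that relates their meanings. This is the limitation noted above: the mediator hides structural differences but does nothing about semantic ones.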

When using more than one data store or analysis tool, a biologist needs to be sure that the knowledge in one resource can be reliably compared with that in another. Information integration, in general and in biology in particular, requires a consistent shared understanding of the meaning of that information. Ontologies provide a shared, common structure for a domain, thus giving a common understanding of it, and can be used to overcome semantic heterogeneity.
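One way to picture how an ontology helps with semantic heterogeneity is a sketch in which each source maps its own local terms onto shared concept identifiers. A query phrased in terms of a concept then reaches every source. The concept identifiers, local terms and gene names below are hypothetical. Real systems would need curated mappings rather than a hand-written dictionary.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical shared ontology: concept identifier -> preferred label.
ONTOLOGY: Dict[str, str] = {
    "C1": "protein kinase activity",
    "C2": "cell growth",
}

# Hypothetical mappings from each source's local terms to shared concepts.
SOURCE_MAPPINGS: Dict[str, Dict[str, str]] = {
    "db_one": {"kinase": "C1", "growth regulation": "C2"},
    "db_two": {"protein kinase": "C1", "cell proliferation/growth": "C2"},
}

# Hypothetical annotations as each source stores them: (gene, local term).
ANNOTATIONS: Dict[str, List[Tuple[str, str]]] = {
    "db_one": [("geneA", "kinase"), ("geneB", "growth regulation")],
    "db_two": [("geneX", "protein kinase"), ("geneY", "unmapped term")],
}


def genes_with_concept(concept_id: str) -> Set[str]:
    """Return genes from all sources whose local term maps to concept_id."""
    if concept_id not in ONTOLOGY:
        raise KeyError(f"unknown concept {concept_id}")
    hits: Set[str] = set()
    for source, rows in ANNOTATIONS.items():
        mapping = SOURCE_MAPPINGS[source]
        for gene, local_term in rows:
            # Unmapped local terms are skipped rather than guessed.
            if mapping.get(local_term) == concept_id:
                hits.add(gene)
    return hits
```

`genes_with_concept("C1")` collects `geneA` and `geneX`, even though the two sources never use the same string. `geneY` is left out because its local term has no mapping.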

This survey gives an overview of ontologies that have been developed for molecular biology. It aims to investigate the role of ontologies and the possibilities opened up by their use in bioinformatics.

The survey consists of several sections. In the next section, we define ontology and its basic components. Section 3 pays particular attention to ontology representation languages. In Section 4, we describe several currently existing bio-ontologies. For each ontology, the structure, scope, area of application, characteristics, representation language, etc. are considered. In the discussion section, the described ontologies are compared and future possibilities for the use of bio-ontologies are discussed.

2. Ontology: Concept, components and applications

The literature discusses a number of ontology definitions and their intended purposes.

Ontologies were developed in AI for the following purposes: knowledge sharing and reuse, data exchange among programs, unification of disparate data and knowledge representations, knowledge-based services, etc.

The most commonly used definition of ontology, given by Gruber, is “ontology is a formal, explicit specification of a shared conceptualization” [15]. A conceptualization is an abstract, simplified view of the area of interest that we wish to represent.

The main ontology components are concepts, relations, instances, and axioms [39]. However, depending on the complexity, expressiveness and the representation, an ontology may contain only some of these components.

Concepts represent sets of objects within a domain that have certain common properties. For instance, “protein” is a concept within the domain of molecular biology. Relations describe interactions between concepts. There are two kinds of relations: hierarchical and associative. Hierarchical relations organize concepts into tree structures of sub- and super-concepts. Associative relations link concepts independently of their hierarchical arrangement. Relations may also have properties that provide further knowledge about them.

Instances are the objects represented by a concept. Strictly speaking, ontologies should not contain any instances, because they are supposed to be a conceptualization of a domain rather than a description of a particular set of instances. The latter belongs in databases.

Axioms are used to constrain the values of concepts and instances. In this sense, relation properties are a kind of axiom, although axioms may also express more general rules.
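The four components can be illustrated with a small sketch that assumes a plain Python representation. Concepts are names, hierarchical relations are stored as is-a links, and associative relations are stored as (concept, relation, concept) triples. One simple kind of axiom, a domain/range constraint on a relation, is checked whenever a relation is added. Following the point about instances, the sketch holds no instances at all. The class `Ontology`, the relation `has_function` and its constraint are assumptions for illustration, based loosely on the concepts in Fig. 2.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Set, Tuple


@dataclass
class Ontology:
    concepts: Set[str] = field(default_factory=set)
    # Hierarchical relations: concept -> its direct super-concepts (is-a).
    is_a: Dict[str, Set[str]] = field(default_factory=dict)
    # Associative relations: (concept, relation name, concept) triples.
    associations: Set[Tuple[str, str, str]] = field(default_factory=set)
    # Axioms, here only domain/range constraints: relation -> (domain, range).
    signatures: Dict[str, Tuple[str, str]] = field(default_factory=dict)

    def add_concept(self, name: str, parents: Iterable[str] = ()) -> None:
        self.concepts.add(name)
        self.is_a.setdefault(name, set()).update(parents)

    def ancestors(self, name: str) -> Set[str]:
        """All super-concepts reachable through is-a links."""
        seen: Set[str] = set()
        stack = list(self.is_a.get(name, ()))
        while stack:
            parent = stack.pop()
            if parent not in seen:
                seen.add(parent)
                stack.extend(self.is_a.get(parent, ()))
        return seen

    def subsumed_by(self, name: str, other: str) -> bool:
        return name == other or other in self.ancestors(name)

    def relate(self, source: str, relation: str, target: str) -> None:
        """Add an associative relation after checking its domain/range axiom."""
        domain, range_ = self.signatures[relation]
        if not (self.subsumed_by(source, domain) and self.subsumed_by(target, range_)):
            raise ValueError(f"{source} {relation} {target} violates the {relation} axiom")
        self.associations.add((source, relation, target))


def example_ontology() -> Ontology:
    """Hypothetical example built from the concept names used in Fig. 2."""
    onto = Ontology(signatures={"has_function": ("Protein", "Molecular function")})
    onto.add_concept("Biomolecule")
    onto.add_concept("DNA", ["Biomolecule"])
    onto.add_concept("Protein", ["Biomolecule"])
    onto.add_concept("Molecular function")
    onto.add_concept("Cell growth", ["Molecular function"])
    onto.add_concept("Enzyme", ["Molecular function"])
    onto.relate("Protein", "has_function", "Cell growth")
    return onto
```

Under the assumed axiom, calling `relate("DNA", "has_function", "Enzyme")` raises `ValueError`, because DNA is not subsumed by Protein.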


Because its definition is very general, the term “ontology” is interpreted in a number of different ways by researchers. This also influences the representation and application of a particular ontology. Most commonly, ontologies are used as a common vocabulary, a database schema, a meta-level specification, or the structure of a knowledge base.

Since the purpose of ontologies is to provide an explicit conceptualization that describes the semantics of data, they have functions similar to those of databases, but differ from them in the following respects (see [10] for relevant discussion):

– Languages for describing ontologies are syntactically and semantically richer than those commonly used for databases;

– Information that is captured by ontologies may consist of semi-structured natural language texts rather than only structured data;

– Ontologies are intended to be reusable, while database schemas are usually designed for a single application.

Many applications of ontologies address the issue of interoperability [43]. In nearly all ontology-based approaches, ontologies are used to describe the semantics of the information sources explicitly.

3. Ontology representation languages

To be used in applications, ontologies must be represented in a language that those applications can understand. A variety of languages based on different knowledge representation (KR) models have been proposed, for example Ontolingua, OML/CKML (Ontology Markup Language/Conceptual Knowledge Markup Language), Frame-Logic, UML, XOL (Ontology exchange language), and OIL (Ontology Interchange Language).

3.1. Conceptual representation

Current bio-ontologies use approaches of differing complexity for their conceptual representation: (i) controlled vocabularies, defined in natural language; (ii) taxonomies; (iii) object-based knowledge representation languages, such as frames and UML; and (iv) approaches based on predicates expressed in logic, such as Description Logics (DL) [20, 39]. Fig. 2 gives examples of the different forms of ontology representation.

A controlled vocabulary is an ontology representation that is simply a list of terms defined in natural language (Fig. 2a). A number of biological and molecular biology vocabularies are available on the WWW [4, 8, 18, 28]. These are the least complex ontologies, so we will not discuss them further.
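Even at this lowest level of complexity, a controlled vocabulary can be used mechanically to accept or reject terms. The following is a minimal sketch that assumes a dictionary from normalized term to natural-language definition. The definitions are illustrative wording and are not quoted from Fig. 2a or from any published vocabulary.

```python
from typing import Dict, Optional

# Illustrative entries: normalized term -> natural-language definition.
VOCABULARY: Dict[str, str] = {
    "biomolecule": "A molecule produced by a living organism.",
    "protein": "A large molecule made of one or more chains of amino acids.",
    "molecular function": "An activity performed by a gene product at the molecular level.",
}


def normalize(term: str) -> str:
    """Lower-case and collapse whitespace so 'Molecular  Function' matches."""
    return " ".join(term.lower().split())


def is_allowed(term: str) -> bool:
    """True if the term belongs to the controlled vocabulary."""
    return normalize(term) in VOCABULARY


def define(term: str) -> Optional[str]:
    """Definition of an allowed term, or None for terms outside the vocabulary."""
    return VOCABULARY.get(normalize(term))
```

The sketch also shows what a vocabulary cannot do. The definitions are prose, so nothing here lets a program conclude that a protein is a biomolecule; capturing that needs the hierarchical representations discussed next.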

A taxonomy is a set of concepts arranged into a generalization-specialization hierarchy (Fig. 2b). Taxonomies are usually handcrafted ontologies with simple tree-like inheritance structures, but multiple inheritance is also possible. Taxonomies define neither the web of relations among concepts that would provide the semantics of terms, nor the attributes of these terms.
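A taxonomy with multiple inheritance can be sketched as a mapping from each concept to its direct parents, with generalization computed by walking up the hierarchy. The concepts below are illustrative assumptions. `Catalyst`, `Enzyme` and `Ribozyme` are included only to show concepts with two parents. As the paragraph notes, nothing in the sketch records attributes or any relation other than is-a.

```python
from collections import deque
from typing import Dict, List, Set

# Illustrative taxonomy: concept -> direct parents (generalizations).
TAXONOMY: Dict[str, List[str]] = {
    "Biomolecule": [],
    "Nucleic acid": ["Biomolecule"],
    "DNA": ["Nucleic acid"],
    "RNA": ["Nucleic acid"],
    "Protein": ["Biomolecule"],
    "Catalyst": [],
    "Enzyme": ["Protein", "Catalyst"],    # multiple inheritance
    "Ribozyme": ["RNA", "Catalyst"],      # multiple inheritance
}


def ancestors(concept: str) -> Set[str]:
    """Breadth-first walk up the is-a links; shared ancestors are visited once."""
    if concept not in TAXONOMY:
        raise KeyError(concept)
    found: Set[str] = set()
    queue = deque(TAXONOMY[concept])
    while queue:
        parent = queue.popleft()
        if parent not in found:
            found.add(parent)
            queue.extend(TAXONOMY[parent])
    return found


def is_a(concept: str, general: str) -> bool:
    """True if `general` is the concept itself or one of its ancestors."""
    return concept == general or general in ancestors(concept)
```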

Frame-based systems use the object-oriented data model, in which objects are called frames (Fig. 2c). There are two types of frames: classes (concepts) and instances. Class frames represent generic types of objects, such as the class of biomolecules (Class_Biomolecule) and the class of proteins (Class_Protein). Classes can be arranged in class-subclass hierarchies. Instances represent particular elements of a class, for example protein_LARD. Every class has an associated set of slots, which define its attributes and relations.
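The class/instance distinction and slot inheritance can be sketched with two small Python classes. The sketch assumes that a subclass inherits all slots of its superclass and that an instance may only fill slots its class defines. `Class_Biomolecule`, `Class_Protein` and `protein_LARD` follow the names in the text, and the slot names follow Fig. 2c. The extra `Function` slot and the slot-checking rule are assumptions, not features of a particular frame system.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ClassFrame:
    """A class (concept) frame: a name, its own slots and an optional superclass."""
    name: str
    slots: List[str]
    superclass: Optional["ClassFrame"] = None

    def all_slots(self) -> List[str]:
        """Inherited slots first, then slots declared on this class."""
        inherited = self.superclass.all_slots() if self.superclass else []
        return inherited + [s for s in self.slots if s not in inherited]


@dataclass
class InstanceFrame:
    """An instance frame: a particular element of a class with filled-in slots."""
    name: str
    frame_class: ClassFrame
    values: Dict[str, object] = field(default_factory=dict)

    def fill(self, slot: str, value: object) -> None:
        # Assumed rule: an instance may only fill slots defined for its class.
        if slot not in self.frame_class.all_slots():
            raise KeyError(f"{slot!r} is not a slot of {self.frame_class.name}")
        self.values[slot] = value


Class_Biomolecule = ClassFrame("Class_Biomolecule", ["Name", "Chemical composition", "Reference"])
# "Function" is a hypothetical extra slot declared only on the subclass.
Class_Protein = ClassFrame("Class_Protein", ["Function"], superclass=Class_Biomolecule)

protein_LARD = InstanceFrame("protein_LARD", Class_Protein)
protein_LARD.fill("Name", "LARD")
```

`protein_LARD` can fill `Name` because `Class_Protein` inherits it from `Class_Biomolecule`. An undeclared slot such as `Length` is rejected.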

a) Controlled Vocabulary

Biomolecule – is a...Protein – is a large molecule...Molecular function – ...

c) FramesBiomolecule

DNA Protein

Name: ...Chemical composition: ...Reference: ......

......

d) Description Logics

Biomolecule

DNA Protein

Molecular function

Cell growth Enzymeis-a is-a

Can have:"Has function"

Protein with cell growth function

Biomolecule

DNA Protein

Molecular function

Cell growth Enzymeis-a is-a is-a is-a

b) Taxonomies

Figure 2. Examples of ontology representations in different languages

Description Logics (DLs) provide a language for capturing declarative knowledge about a domain, and support reasoning on that knowledge [3]. A DL models an application domain in terms of concepts, roles (relations) and individuals (instances). The domain is a set of individuals, and a concept is a description of a group of individuals that share common characteristics (Fig. 2d). Roles model relations between individuals, or attributes of them. Information captured with DLs is classified into a rich hierarchical lattice of concepts and their inter-relations. A DL supplies a number of reasoning services, which allow the construction of classification hierarchies and the verification of the consistency of descriptions. This enables the automated development of a taxonomy for defined concepts. The TAMBIS ontology, described below, is based on a DL.
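To make the classification service concrete, the Python sketch below implements structural subsumption for a deliberately tiny DL fragment: conjunctions of atomic concepts and existential role restrictions over an acyclic hierarchy of primitives (roughly the EL family). It is not the reasoner used by TAMBIS, and the concept and role names (e.g. `hasFunction`, after the "Has function" link of Fig. 2d) are illustrative. Given only the definitions, it places "Protein with cell growth function" under both Protein and a more general "biomolecule with some function" concept, i.e. it derives taxonomy links that were never stated explicitly.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Tuple

# Told hierarchy of primitive (atomic) concepts: name -> direct parents. Assumed acyclic.
PRIMITIVES: Dict[str, Tuple[str, ...]] = {
    "Biomolecule": (),
    "Protein": ("Biomolecule",),
    "DNA": ("Biomolecule",),
    "MolecularFunction": (),
    "CellGrowth": ("MolecularFunction",),
    "Enzyme": ("MolecularFunction",),
}


@dataclass(frozen=True)
class Concept:
    """Conjunction of atomic concepts and existential restrictions (role, filler)."""
    atoms: FrozenSet[str]
    exists: Tuple[Tuple[str, "Concept"], ...] = ()


def atom_below(a: str, b: str) -> bool:
    """True if primitive a is b or a told descendant of b."""
    return a == b or any(atom_below(p, b) for p in PRIMITIVES[a])


def subsumes(d: Concept, c: Concept) -> bool:
    """Structural test for c ⊑ d: every conjunct of d is matched by a more specific conjunct of c."""
    atoms_ok = all(any(atom_below(a, b) for a in c.atoms) for b in d.atoms)
    roles_ok = all(
        any(r == s and subsumes(d_filler, c_filler) for s, c_filler in c.exists)
        for r, d_filler in d.exists
    )
    return atoms_ok and roles_ok


def classify(defs: Dict[str, Concept]) -> Dict[str, List[str]]:
    """For each named concept, list the other named concepts that subsume it."""
    return {
        name: sorted(other for other, d in defs.items() if other != name and subsumes(d, c))
        for name, c in defs.items()
    }


def atom(name: str) -> Concept:
    return Concept(frozenset({name}))


DEFS = {
    "Protein": atom("Protein"),
    "BiomoleculeWithFunction": Concept(
        frozenset({"Biomolecule"}), (("hasFunction", atom("MolecularFunction")),)),
    "ProteinWithCellGrowthFunction": Concept(
        frozenset({"Protein"}), (("hasFunction", atom("CellGrowth")),)),
}
print(classify(DEFS)["ProteinWithCellGrowthFunction"])
# ['BiomoleculeWithFunction', 'Protein']
```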

3.2. Ontology-exchange languages

Since information sharing is one of the central problems in computer science, and in bioinformatics in particular, there have been a number of efforts to define the requirements for an ontology-exchange language and to develop such a language.

In 1999, an evaluation of a number of ontology-exchange languages was reported [20] at the Bio-Ontologies Working Group Meeting [2]. The goal of the evaluation was to analyze how well the languages satisfy the requirements of bioinformatics, and hence to recommend one or more languages for use within the bioinformatics community. Fig. 3 places some ontology languages according to their semantics and syntax.

Ontolingua [32] (frame-based, with LISP syntax) and OML/CKML [31] (based on conceptual graphs, with XML syntax) were identified as the most expressive languages at that time. However, it was judged that only frame systems provide the representational constructs needed to model ontologies for molecular biology, and that a bioinformatics ontology-exchange language should use an XML-based syntax, since such a language is the most likely to see widespread acceptance. The main conclusion of the evaluation was that “the language that the bioinformatics community needs for the exchange of ontologies should be based on frame-based semantics with an XML expression”.

A language that satisfies these recommendations was proposed by P. Karp: XOL (XML-Based Ontology Exchange Language) [22]. The ontology definitions that XOL is designed to encode include both schema information (meta-data), such as class definitions from object databases, and non-schema information (ground facts), such as object definitions from object databases. The syntax of XOL is based on XML, and its semantics are based on OKBC-Lite, a simplified form of the OKBC knowledge model [29]. XOL provides a mechanism for encoding ontologies within a flat file that can easily be published on the WWW for exchange among a set of application developers. However, some high-level ontologies are based not only on frames but also on other, logic-based representation models; the TAMBIS ontology, for example, is based on a DL. XOL is restricted to a frame view of ontologies and excludes logic-based ontologies.
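The idea of keeping schema information and ground facts together in one flat, web-publishable file can be sketched with Python's standard `xml.etree.ElementTree`. The element names used below (module, class, subclass-of, slot, domain, individual, instance-of) are modelled loosely on XOL and are an assumption; the XOL specification [22] defines the actual document structure. The module name and the slot are hypothetical.

```python
import xml.etree.ElementTree as ET


def named(parent: ET.Element, tag: str, name: str) -> ET.Element:
    """Append <tag><name>name</name></tag> to parent and return the new element."""
    element = ET.SubElement(parent, tag)
    ET.SubElement(element, "name").text = name
    return element


def build_module() -> ET.Element:
    module = ET.Element("module")
    ET.SubElement(module, "name").text = "bio-example"  # hypothetical module name

    # Schema information (meta-data): class definitions and a slot definition.
    named(module, "class", "biomolecule")
    protein = named(module, "class", "protein")
    ET.SubElement(protein, "subclass-of").text = "biomolecule"
    slot = named(module, "slot", "chemical-composition")
    ET.SubElement(slot, "domain").text = "biomolecule"

    # Non-schema information (ground facts): an individual of class protein.
    lard = named(module, "individual", "protein_LARD")
    ET.SubElement(lard, "instance-of").text = "protein"
    return module


xml_text = ET.tostring(build_module(), encoding="unicode")
print(xml_text)
```

Because the whole ontology is a single XML document, any party can parse it back with a standard XML parser, which is the exchange property that motivated an XML syntax.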

[Figure 3: ontology languages placed by semantics (conceptual graph, frame representation, description logic) and syntax (XML/RDF, LISP): OML/CKML, OIL, XOL, Ontolingua.]

Figure 3. Diagram of the semantic and syntactic components of the ontology languages

An international consortium of European and US researchers has developed the Ontology Inference Layer (OIL, also known as the Ontology Interchange Language), a proposal for a web-based representation of ontologies that combines the widely used modeling primitives of frame-based languages with the formal semantics and reasoning services provided by DLs. It is also compatible with RDF Schema [30]. OIL is closely related to XOL and can be seen as an extension of it: OIL extends XOL to make it more suitable for capturing ontologies defined using a logic-based approach. OIL unifies three important aspects provided by different communities (see fig. 3): formal semantics and efficient reasoning support as provided by DLs, rich modeling primitives as provided by the frame community, and a standard proposal for syntactic exchange notations as provided by the Web community.

OIL has currently reached the stage of a decidable core language. It still has certain limitations, although it is designed to be extensible. OIL is now being proposed as the next-generation RDF and as a standard for ontology representation in e-Commerce [30].
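The way OIL layers frame-style modelling primitives over DL semantics can be illustrated by mapping a frame-like class definition onto a DL axiom. The Python sketch below assumes the usual reading of OIL slot constraints, `has-value` as an existential and `value-type` as a universal restriction, with defined classes giving equivalence and primitive classes giving subsumption. It is a simplified illustration rather than the normative OIL semantics, and the class and slot names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class ClassDef:
    """Frame-style class definition in the spirit of OIL's class-def."""
    name: str
    superclasses: List[str]
    # (slot, kind, filler); kind is "has-value" (some filler) or "value-type" (only fillers)
    slot_constraints: List[Tuple[str, str, str]] = field(default_factory=list)
    defined: bool = False  # defined => necessary and sufficient; otherwise primitive


def to_dl(c: ClassDef) -> str:
    """Render the class definition as a DL axiom."""
    conjuncts = list(c.superclasses)
    for slot, kind, filler in c.slot_constraints:
        if kind == "has-value":
            conjuncts.append(f"∃{slot}.{filler}")
        elif kind == "value-type":
            conjuncts.append(f"∀{slot}.{filler}")
        else:
            raise ValueError(f"unknown slot-constraint kind: {kind}")
    operator = "≡" if c.defined else "⊑"
    return f"{c.name} {operator} " + " ⊓ ".join(conjuncts)


enzyme = ClassDef("Enzyme", ["Protein"], [("catalyses", "has-value", "Reaction")])
dna_binding = ClassDef("DNABindingProtein", ["Protein"], [("binds", "has-value", "DNA")], defined=True)
print(to_dl(enzyme))       # Enzyme ⊑ Protein ⊓ ∃catalyses.Reaction
print(to_dl(dna_binding))  # DNABindingProtein ≡ Protein ⊓ ∃binds.DNA
```

Once definitions are in this form, a DL reasoner can classify them automatically, which is the benefit OIL adds on top of a purely frame-based language such as XOL.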


4. Bio-ontologies

The use of ontologies within bioinformatics is relatively recent, and consequently the number of ontologies is limited. In this section we review some selected bio-ontologies.

4.1. TAMBIS

The Transparent Access to Multiple Bioinformatics Information Sources (TAMBIS) project is intended to give users maximum transparency when accessing diverse bioinformatics data sources, shielding them from the details of those sources [1, 38, 42]. For the end user it provides the illusion of a single query language, a single data model and a single data location. This is achieved using the following principles:

− Conceptual representation of biological concepts and terminology, namely the TAMBIS Ontology (TaO), against which the user can formulate queries;

− Mapping from terms of the conceptual representation onto terms in external sources, to enable queries to databases based on different schemas.

Thus, TAMBIS provides a level of indirection between the user and the external sources. The focus of the project has been to demonstrate the feasibility of an intelligent, ontology-based retrieval and integration system. The data sources that can be queried by TAMBIS include the Swiss-Prot, CATH, Prosite, Enzyme, and BLAST databases.

As well as the TAMBIS Ontology itself, the complete TAMBIS system that implements this ontology is of interest, since its approach to database integration via an ontology may be used in further research.

Architecture

TAMBIS has five major components organized into a classical three-layer mediator/wrapper architecture (Fig. 4). The presentation layer provides an intelligent user interface, in which the user combines concepts from the knowledge base to form declarative, source-independent queries; it consists of a knowledge-driven query formulation interface. The mediation layer identifies the appropriate sources to satisfy a query and rewrites the query into an ordered list of source-dependent procedures. It also manages the query translation process, transforming a declarative conceptual query expression into an ordered collection of source-specific calls. The wrapper layer manages the external sources, which are wrapped to provide a common interface that affords communication, format and network transparency; it includes a wrapper service for retrieving data from external sources. In addition, a terminology server that handles the TaO is linked to both the presentation and the mediation layers.
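The mediation step can be sketched in Python as a rewrite from a source-independent list of concepts into an ordered list of source-specific calls. The concept-to-source table below stands in for the ontology function mapping and cost information of Fig. 4; apart from the source names, every operation name and cost is a hypothetical placeholder, and a real mediator would also handle joins, coercions and the data flow between calls.

```python
from typing import Dict, List, Tuple

# Hypothetical ontology-to-source mapping: concept -> candidate (source, operation, cost).
# Source names come from the text; operation names and costs are invented for illustration.
MAPPING: Dict[str, List[Tuple[str, str, int]]] = {
    "Protein": [("Swiss-Prot", "scan_flat_file", 9), ("Swiss-Prot", "fetch_by_keyword", 2)],
    "Motif": [("Prosite", "scan_sequence", 3)],
    "ProteinStructure": [("CATH", "fetch_domains", 4)],
}


def mediate(query: List[str]) -> List[Tuple[str, str]]:
    """Rewrite a source-independent query (an ordered list of concepts) into an
    ordered list of source-specific calls, choosing the cheapest option per concept."""
    plan = []
    for concept in query:
        options = MAPPING.get(concept)
        if not options:
            raise LookupError(f"no source can answer {concept!r}")
        source, operation, _cost = min(options, key=lambda option: option[2])
        plan.append((source, operation))
    return plan


print(mediate(["Protein", "Motif"]))
# [('Swiss-Prot', 'fetch_by_keyword'), ('Prosite', 'scan_sequence')]
```

The user never sees the table: the presentation layer only deals with concepts, and only the mediator knows which source, and which access path, answers each one.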

TAMBIS Ontology

The role of the TaO in TAMBIS [1] is:

– to describe the knowledge that can be queried for, and the schemas of the underlying data sources;

– to link conceptual terms to their actual representations in the data sources;

– to mediate between equivalent or near-equivalent concepts in the different data sources by means of the knowledge-base reasoning services;

– to guide the user in the formulation of biologically correct queries.

[Figure 4: TAMBIS architecture. User interface; presentation layer (query formulation dialogs, ontology browsers, query formulation); mediation layer (sources and services, CPL functions, costs and coercions, ontology function mapping, rewrite rules, query transformation); wrapper layer (wrapper service, wrappers); terminology server (TaO, linguistic model).]

Figure 4. TAMBIS architecture

The TaO is based on a DL. Since new concepts can be constructed from existing ones within a DL, the TaO is a dynamic ontology. This means that it can grow without the need to conceptualize or encode new knowledge. The TaO uses ontology axioms to govern how concepts can be linked to other concepts via relations to form new concepts. Only biologically reasonable concepts can be formed, because the knowledge that defines the rules for composing new concepts is captured in constraints upon relations.
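The effect of constraints on relations can be sketched as follows. This minimal Python illustration assumes a simple table of permitted (domain, relation, range) triples rather than the actual GRAIL or FaCT machinery: a relation may only connect concepts that fall under a permitted domain and range, so "Protein which has function EnzymaticFunction" is accepted while "Protein which is a component of Motif" is rejected. The hierarchy, the permitted triples and the GRAIL-like output syntax are all hypothetical.

```python
from typing import Dict, Optional, Set, Tuple

# Illustrative single-parent hierarchy (child -> parent); names are hypothetical.
PARENT: Dict[str, Optional[str]] = {
    "TopThing": None,
    "Biomolecule": "TopThing",
    "Protein": "Biomolecule",
    "DNA": "Biomolecule",
    "Motif": "TopThing",
    "BiologicalFunction": "TopThing",
    "EnzymaticFunction": "BiologicalFunction",
}

# Permitted compositions: (domain, relation, range).
SANCTIONS: Set[Tuple[str, str, str]] = {
    ("Protein", "hasFunction", "BiologicalFunction"),
    ("Motif", "isComponentOf", "Biomolecule"),
}


def under(concept: str, ancestor: str) -> bool:
    """True if concept is ancestor or lies below it in PARENT."""
    current: Optional[str] = concept
    while current is not None:
        if current == ancestor:
            return True
        current = PARENT[current]
    return False


def compose(base: str, relation: str, filler: str) -> str:
    """Form a new concept only if some permitted triple covers it (GRAIL-like output)."""
    allowed = any(
        rel == relation and under(base, dom) and under(filler, rng)
        for dom, rel, rng in SANCTIONS
    )
    if not allowed:
        raise ValueError(f"{base} <{relation} {filler}> is not permitted")
    return f"({base} which <{relation} {filler}>)"


print(compose("Protein", "hasFunction", "EnzymaticFunction"))
# (Protein which <hasFunction EnzymaticFunction>)
```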

The basic relations in the TaO include “is a component of”, “has name”, “has function”, and “is homologous to”.

The previous version of the TaO used the GRAIL language [33]. The current version has also been represented in FaCT [16], a more powerful and optimized tool.

The TaO can be divided into high- and low-level parts. The high-level divisions are taken from the models developed in the GALEN project and described in [33]. This general foundation has been extended in TAMBIS with the lower-level concepts needed to represent users' descriptions in the biological domain (see fig. 5).

The reusability of the TaO rests on the following facts. The TaO uses a DL-based representation language, and it has a partly modular structure that allows modules to be reused in other applications without detailed knowledge of their internal structure. The TaO currently contains around 1500 biological concepts and their relations, and is capable of inferring many more by virtue of its knowledge representation scheme. Its coverage includes proteins and their components, motifs, protein structures, enzyme functions, enzyme and metabolic pathways, expressed sequence tags, nucleic acids and their component motifs, gene function and expression, sequence homology, and taxonomy of species.

[Figure 5: TaO concept hierarchy rooted at TopThing, with high-level concepts such as DomainCategory, DomainAttribute, GeneralisedStructure, GeneralisedSubstance, GeneralisedProcess and GeneralisedFunction, and attribute classes such as FunctionalAttribute, LocativeAttribute and StructuralAttribute; low-level biological concepts include Biological Function (e.g. Binding, Transport, Enzymatic Function, Signal Transduction, Cellular Growth and Proliferation), Biological Process, Body Structure, Cellular Substance, Abstract Structure and Physical Structure. Biological concepts and roles (relations) are shown separately.]

Figure 5. TaO concept hierarchy (the grayed rectangles indicate the high-level concepts, the transparent rectangles indicate the low-level concepts)

4.2. Gene Ontology

The intention of the Gene Ontology (GO) Consortium [12, 13] is to create a shared biological resource that would enable the community to describe gene products¹ using a common vocabulary and semantics. The GO consortium was initiated in 1998; it is currently a collaboration among five database projects (FlyBase, MGI, SGD, TAIR, and WormDB) and covers over 5000 concepts. The GO is not intended to deal with all the molecular biology knowledge captured in the community databases. Instead, it captures information about the role of gene products within an organism, because knowledge of the biological role of proteins in one organism can often be transferred to other organisms [5]. On this basis, the GO provides controlled vocabularies for three independent ontologies: the “molecular function”, “biological process”, and “cellular component” of a gene product. The names of these ontologies correspond to attributes of the gene product, which in principle enables uniform querying of the collaborating databases for information on gene products. These attributes are defined as follows.

Molecular function describes the tasks performed by individual gene products. It is a capability that a physical gene product (or gene product group) carries as a potential: it describes only what the product can do, without specifying where or when this usage actually occurs. Examples of broad functional terms are “enzyme”, “transporter” or “ligand”.

Biological process is accomplished via one or more ordered assemblies of molecular functions. It often involves transformation, in the sense that something goes into a process and something different comes out of it. Examples of broad biological process terms are "cell growth and maintenance" or "signal transduction". A biological process is not equivalent to a pathway, and the GO does not capture any of the dynamics or dependencies that would be required to describe a pathway.

¹ A gene product is a biochemical material, either RNA or protein, resulting from the expression of a gene.

Cellular component encompasses sub-cellular structures, locations, and macromolecular complexes, for example the nucleus, the ribosome, and the proteasome.

A gene product has one or more molecular functions and is used in one or more biological processes. It may be associated with one or more cellular components. Thus the relations between gene products and molecular functions, biological processes and cellular components are all many-to-many.
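Because each of these associations is many-to-many, the simplest faithful representation is a set of (gene product, term) pairs indexed in both directions. The Python sketch below uses hypothetical gene products (gp1, gp2) and GO-style terms borrowed from the examples in this section; it is not the GO's own annotation file format.

```python
from collections import defaultdict
from typing import DefaultDict, Iterable, Set, Tuple


def build_indexes(annotations: Iterable[Tuple[str, str]]):
    """Index (gene product, term) pairs in both directions."""
    terms_of: DefaultDict[str, Set[str]] = defaultdict(set)
    products_of: DefaultDict[str, Set[str]] = defaultdict(set)
    for product, term in annotations:
        terms_of[product].add(term)
        products_of[term].add(product)
    return terms_of, products_of


# Hypothetical gene products gp1 and gp2, annotated with example terms from this section.
ANNOTATIONS = [
    ("gp1", "enzyme"), ("gp1", "signal transduction"), ("gp1", "nucleus"),
    ("gp2", "enzyme"), ("gp2", "cell growth and maintenance"), ("gp2", "nucleus"),
]
terms_of, products_of = build_indexes(ANNOTATIONS)
print(sorted(products_of["nucleus"]))  # ['gp1', 'gp2']
```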

[Figure 6: GO cellular component terms (Cellular component, Extracellular, Membrane, Cell wall, Cell wall (sensu Bacteria), Cell wall inner membrane, Type II protein (sec) secretion system complex [GO:0015627]) linked by “is a” and “part of” edges.]

Figure 6. Sample GO inheritance scheme

The ontologies of molecular function, biological process and cellular component are represented as directed acyclic graphs (DAGs), or networks. A DAG allows multiple inheritance, either through the “is-a” relationship, where a child term may be an "instance" of its parent, or through the “part-of” relationship, where a child is a component of its parent term. A child term may have different kinds of relationships with its different parents (fig. 6). Nevertheless, most relationships in the GO are of the “is-a” kind and mainly implement single inheritance. Thus, the GO ontologies are built in the form of taxonomies.
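Traversing such a DAG differs from walking a tree only in that a term may be reached through several parents and through edges of different types. The Python sketch below uses the terms of Fig. 6, but the exact edges drawn in the figure cannot be recovered from the text, so the edge list is an assumption made for illustration. It collects all ancestors of a term over both "is-a" and "part-of" edges and uses Kahn's topological-sort algorithm to check that the graph is indeed acyclic.

```python
from collections import defaultdict, deque
from typing import Dict, List, Set, Tuple

# (child, relation, parent). Terms follow Fig. 6; the edges themselves are assumed.
EDGES: List[Tuple[str, str, str]] = [
    ("extracellular", "is_a", "cellular component"),
    ("membrane", "is_a", "cellular component"),
    ("cell wall", "is_a", "extracellular"),
    ("cell wall (sensu Bacteria)", "is_a", "cell wall"),
    ("cell wall inner membrane", "is_a", "membrane"),
    ("cell wall inner membrane", "part_of", "cell wall (sensu Bacteria)"),
    ("type II protein (sec) secretion system complex", "part_of", "cell wall inner membrane"),
]

parents: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
for child, relation, parent in EDGES:
    parents[child].append((relation, parent))


def ancestors(term: str) -> Set[str]:
    """All terms reachable via is_a or part_of edges; a term may have several parents."""
    seen: Set[str] = set()
    queue = deque([term])
    while queue:
        for _relation, parent in parents[queue.popleft()]:
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


def is_acyclic(edges: List[Tuple[str, str, str]]) -> bool:
    """Kahn's algorithm: a graph is a DAG iff all nodes can be removed in topological order."""
    nodes = {n for c, _, p in edges for n in (c, p)}
    indegree = {n: 0 for n in nodes}
    children: Dict[str, List[str]] = defaultdict(list)
    for child, _relation, parent in edges:  # edge direction: parent -> child
        indegree[child] += 1
        children[parent].append(child)
    ready = deque(n for n in nodes if indegree[n] == 0)
    removed = 0
    while ready:
        node = ready.popleft()
        removed += 1
        for child in children[node]:
            indegree[child] -= 1
            if indegree[child] == 0:
                ready.append(child)
    return removed == len(nodes)


print(sorted(ancestors("type II protein (sec) secretion system complex")))
print(is_acyclic(EDGES))  # True
```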

The GO has a well-detailed conceptualization level but lacks any upper-level ontology organization. For instance, it contains 9 hierarchy levels in the molecular function ontology and 12 in the biological process ontology [34], but no terms above the molecular function, biological process and cellular component concepts.

The GO is distributed as text or XML files. In principle, this allows the GO to be used by external databases that were not initially part of the GO project. An external database may collaborate with the GO by making cross-links between GO terms and the objects of the database (typically gene products, or their surrogates, genes). It can also support queries that use GO terms, or contribute to the development of the GO by expanding the vocabularies and refining the terms. The collaborating databases are provided with links to the GO files. The number of links from different databases (as of 19.02.2001) is given in Table 1.

Table 1. The number of links to GO from the SGD, FlyBase, and MGI databases

                                  SGD      FlyBase   MGI
Biological Process                5,603    624       3,418
Molecular Function                5,710    5,277     4,529
Cellular Component                2,206    479       3,477
Total Gene Products Associated    6,312    5,413     5,518

4.3. MBO

The Ontology for Molecular Biology (MBO) is intended to provide clarity and communication within the molecular biology database community. It was one of the first attempts to create an ontology as “a mean to provide a semantic repository to semantically order relevant concepts in molecular biology and to bridge the different notations in various databases by explicitly specifying the meaning of and relation between fundamental concepts in an application domain” [35]. This means that either the different databases would agree on the common MBO definitions (and change their annotations accordingly), or the differences between the databases' conceptualizations would be mapped in terms of the MBO.

The MBO contains the concepts and relationships required to describe biological objects, experimental procedures and computational aspects of molecular biology. It covers a very wide range of biological concepts and contains over 1200 nodes.

The MBO has an upper-level organizing ontology (Fig. 7) and also includes upper-level common-sense ontologies such as µKosmos and Cyc.

[Figure 7: upper-level MBO concepts rooted at Being, including Object, Event and Temporal extent; abstract, physical, mental and worldly objects and events; properties such as Attribute, Relation, Identifier and Descriptor; and event kinds such as Human activity and Natural process.]

Figure 7. Upper-level MBO ontology

The MBO also includes a number of biologically specific ontologies, e.g. for genes, reactions, pathways, and compounds. The pathway ontology is shown as an example in fig. 8. However, the upper-level organizing ontologies have no direct relationships to the biological concepts and processes of the biologically specific ontologies.

Although the MBO captures a wide range of biological concepts in its ontologies, these ontologies are largely independent of each other and have different levels of detail: for example, the compound ontology consists of instances, whereas the pathway ontology ends at quite coarse-grained concepts. The MBO can be considered a taxonomy: the primary relationship used is “is a”, and the concepts in the MBO are in general given no attributes.

[Figure 8: MBO pathway hierarchy rooted at Pathways, with subclasses including Biosynthesis (e.g. Amino-Acid-Biosynthesis, Carbo-Biosynthesis, Cell-Structure-Biosynthesis, Cofactor-Biosynthesis, Lipid-Biosynthesis, nucleotide biosynthesis), Degradation (e.g. Amino-Acid-Degradation, Carbon-Degradation, Fatty-Acid-Degradation), Energy-Metabolism and Intermediary-Metabolism, and metabolism classes such as Central-, Nucleotide-, Nitrogen- and Sulfur-Metabolism.]

Figure 8. Pathways ontology in MBO

A Java Ontology Browser and an Ontology Editor [27, 36] have been developed in the MBO project.

4.4. EcoCyc

The EcoCyc ontology underlies an organism-specific Pathway/Genome Database that describes the metabolic and signal transduction pathways of Escherichia coli, its enzymes and its transport proteins. The EcoCyc DB describes the known genes of Escherichia coli, the enzymes of small-molecule metabolism that are encoded by these genes, the reactions catalyzed by each enzyme, and the organization of these reactions into metabolic pathways [21, 23].

The ontology is used to encode the functions of metabolic enzymes, signal-transduction proteins, transporters, and DNA-binding repressor proteins. EcoCyc uses a frame-based language both for the ontology representation and for encoding its data. The frames are arranged in a class hierarchy, shown in fig. 9 [23].

To describe all of the distinct molecular species in a pathway, a frame has been created in the EcoCyc DB for every species. There is also a frame for every substrate and for every enzyme in a pathway, and every reaction is represented as a distinct frame. Separating the representation of a biological entity from the representation of its function has a number of advantages: there is a many-to-many mapping between entities and functions, the representation is more normalized and therefore less redundant, etc. [24]
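The separation of entities from reactions, together with the inverse slots listed for class Reactions in Fig. 9, can be sketched as a small frame store that keeps each slot and its inverse in step. The Python below is an illustrative assumption, not EcoCyc's implementation: the frame identifiers RXN-ISOCITRATE-DH, ENZRXN-ICD and TCA-CYCLE are hypothetical, and the pairing of enzymatic-reaction with reaction and of in-pathway with reaction-list is assumed. The compounds and slot names follow Fig. 9, and the reaction is the NADP-dependent isocitrate dehydrogenase reaction (EC 1.1.1.42).

```python
from collections import defaultdict
from typing import DefaultDict, Dict, List


class FrameKB:
    """Minimal frame store that keeps inverse slots consistent (a sketch, not EcoCyc's code)."""

    def __init__(self, inverses: Dict[str, str]):
        self.frames: DefaultDict[str, DefaultDict[str, List[str]]] = defaultdict(lambda: defaultdict(list))
        # Register each inverse pair in both directions.
        self.inverses = {**inverses, **{v: k for k, v in inverses.items()}}

    def add_value(self, frame: str, slot: str, value: str) -> None:
        if value not in self.frames[frame][slot]:
            self.frames[frame][slot].append(value)
        inverse = self.inverses.get(slot)
        if inverse is not None and frame not in self.frames[value][inverse]:
            self.frames[value][inverse].append(frame)

    def get(self, frame: str, slot: str) -> List[str]:
        return list(self.frames[frame][slot])


kb = FrameKB({"enzymatic-reaction": "reaction", "in-pathway": "reaction-list"})
rxn = "RXN-ISOCITRATE-DH"  # hypothetical frame identifier
for compound in ("isocitrate", "NADP"):
    kb.add_value(rxn, "left", compound)
for compound in ("2-oxoglutarate", "carbon-dioxide", "NADPH"):
    kb.add_value(rxn, "right", compound)
kb.add_value(rxn, "ec-number", "1.1.1.42")
kb.add_value(rxn, "enzymatic-reaction", "ENZRXN-ICD")
kb.add_value(rxn, "in-pathway", "TCA-CYCLE")

print(kb.get("ENZRXN-ICD", "reaction"))  # ['RXN-ISOCITRATE-DH']
```

Because the reaction is a frame in its own right, several enzymatic-reaction frames can point at the same reaction and one enzyme can be linked to several reactions, which is the many-to-many mapping described above.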

EcoCyc is a single-organism DB intended to capture the genome and the full biochemical network of Escherichia coli. EcoCyc currently includes 139 metabolic and 20 signaling pathways, 946 reactions, 629 enzymes, and 4390 genes.

[Figure 9: EcoCyc class hierarchy rooted at Thing, with classes including Chemicals (Elements; Macromolecules such as Polynucleotides, DNA, RNA, Proteins and Protein-Complexes; Small-Molecules), Organisms, Generalized-Reactions, Enzymatic-Reactions, Reactions and Pathways; example objects include the gene icdA, val-tRNAs, NADP, NADPH, proton, isocitrate, 2-oxoglutarate and carbon-dioxide. Slots of class Reactions, with value types: left, right, substrates (Chemicals); spontaneous? (boolean); ec-number (string); deltag0, keq (number); enzymatic-reaction (Enzymatic-Reactions); in-pathway (Pathways); species-distribution (string). Inverse slots listed: reaction, reaction-list.]

Figure 9. EcoCyc ontology class hierarchy: classes are marked in bold and placed in rectangular boxes, while objects are given in normal text without outline. The slots of the class Reactions are also shown.

4.5. Cell signaling ontology

Recently the Human Genome Center of the University of Tokyo started to develop the cell signaling ontology (SIGNAL-ONTOLOGY) [6, 41]. This ontology is based on knowledge from the database for cell signaling networks (CSNDB) [40]. The purpose of the project is to extract the common nature of cell signaling in the model species and to find out "what cell signaling is" and "how we can reconstruct the cell signaling system in the computer". SIGNAL-ONTOLOGY is intended to be used as a controlled vocabulary for the cell signaling system and also as a common reference for database developers.

The authors consider the cell signaling ontology to be a kind of object-oriented database system. The ontology features a flow diagram of signal transduction and a conceptual hierarchy of the biochemical attributes of signaling molecules. These two aspects are integrated into the object-oriented model as the methods and the types of objects. Currently the ontology provides the following conceptual classes.

Signal Module is a signal-processing class of the eukaryote model species. Every signaling cascade can be reconstructed from a set of Signal Module instances through messages passed between Input and Output Signals (see fig. 10).
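Reconstructing a cascade from Signal Module instances amounts to following messages from one module's output to the next module's input. The Python sketch below assumes at most one module per input signal (no branching) and paraphrases the module labels of Fig. 10; the signal names and the chaining itself are illustrative, not taken from CSNDB.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class SignalModule:
    name: str
    input_signal: str
    output_signal: str


def reconstruct_cascade(modules: Iterable[SignalModule], start: str, max_steps: int = 50) -> List[str]:
    """Chain modules by passing each output signal on as the next module's input.
    Assumes at most one module consumes a given signal; max_steps guards against cycles."""
    by_input = {m.input_signal: m for m in modules}
    cascade: List[str] = []
    signal = start
    while signal in by_input and len(cascade) < max_steps:
        module = by_input[signal]
        cascade.append(module.name)
        signal = module.output_signal
    return cascade


MODULES = [
    SignalModule("receptor activation", "ligand", "G-protein switching"),
    SignalModule("G-protein switching -> kinase", "G-protein switching", "kinase activation"),
    SignalModule("kinase cascade", "kinase activation", "nuclear localisation"),
    SignalModule("target gene expression", "nuclear localisation", "transcription"),
]
print(reconstruct_cascade(MODULES, "ligand"))
# ['receptor activation', 'G-protein switching -> kinase', 'kinase cascade', 'target gene expression']
```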

Reaction is a class representing information on the biochemical reactions that transfer biological signals. It includes molecular interaction motifs, effects of the signals, components of the reactions, and properties of biological signals.

[Figure 10: a signaling cascade drawn as a chain of Signal Modules spanning the cell membrane, the cell and the nucleus, each module with an input and an output (e.g. ligand -> G-protein-coupled receptor -> G-protein switching; G-protein switching -> kinase; kinase cascade; ligand -> transmembrane signaling -> phosphorylation -> clustering; phospholipid second messenger; second messenger -> kinase; conformation change and release of signal peptide -> nuclear localisation; target gene expression). Legend: input signal, output signal, pointer to “Molecular function” concept, pointer to “Cellular function” concept.]

Figure 10. Signaling cascade consisting of a set of Signal Modules in SIGNAL-ONTOLOGY

The Molecular Function class is intended to store information on the biochemical properties of molecules that relate to cell signaling.

Cellular Function is a class representing the biological phenomenon to which a Signal Module contributes. This is a biological response performed by a series of Molecular Functions.

The following classes are also defined: Tissue, for the set of tissues where biological processes take place; Cell, to represent hierarchically structured cell components; and Molecule, to represent a list of molecule types. Every class represents a hierarchy of corresponding concepts. Pointers between components of the conceptual classes have been introduced to interrelate them: Signal Module has pointers to Molecular Functions, every Molecular Function component has pointers to genes and proteins, and Cellular Function has pointers to the Signal Module concept, genes and proteins. The pointer information is stored in a link table. Currently all ontology classes except Signal Module are available on the WWW as HTML and XML documents.

SIGNAL-ONTOLOGY does not contain any upper-level ontology organization. In many cases it captures the same concepts as the GO, but with far fewer hierarchy levels.


6. Discussion

6.1. Comparison of bio-ontologies and conclusions

The bio-ontologies described above are not the only ones that deal with biomolecular data. In general, existing ontologies have many more differences than similarities: they differ in their intention, structure, coverage, and level of detail. The following characteristics may be used to compare ontologies:

(i) Application scenarios: the ways in which ontologies are used by applications.

(ii) Although all the ontologies considered are intended for molecular biology, they cover different parts of this domain and have different levels of detail. In this respect, an ontology may consist of one or both of the following components:

– Domain-oriented component, which includes domain-specific concepts (e.g. genes and processes specific to one particular organism) and domain generalizations (e.g. gene function, gene structure);

– Generic component, which captures common high-level concepts such as Thing, Physical, Abstract, Structure, etc. This component can be especially useful for ontology reuse, as it allows concepts to be placed correctly or more reliably (e.g. the concept Process can be a parent of concepts such as body process, cellular process, and chemical process).

Even ontologies that cover the same parts of the same domain can differ in their level of detail, which determines how deeply and widely they capture lower-level concepts (e.g. different types of proteins, enzymatic reactions, cellular processes).

(iii) One of the essential characteristics of an ontology is its conceptual representation. The different kinds of ontology representation were described above, in section 3.1.

(iv) As additional characteristics, we consider the tasks provided by ontology-based applications, and the physical representation of the ontology, i.e. its storage and the means to access it.

Table 2 summarizes the content, structure and representation of the surveyed bio-ontologies (it is partly based on the results presented by Stevens in [39]).

Most of the ontologies share some core concepts of molecular biology, such as Gene, Protein and the related Biological Functions and Biological Processes. However, they differ extensively in both the content and the notation of their knowledge. This is primarily due to the wide range of tasks to which ontologies are applied. In general, these ontologies fit the demands of their particular application quite well, but using them within other applications requires a lot of additional effort, and in some cases seems impossible. For instance, the EcoCyc ontology has been used to support the DB schema, but using it directly as a controlled vocabulary for DB annotation seems inconvenient.

Three ontologies, the GO, MBO and Signal Ontology, have been developed for almost the same purpose: to be used as a controlled vocabulary and/or community reference that bridges different notations and thus reduces the “communication problem”. These ontologies capture different but partially overlapping domains, describing them with different notations and classifications while each remains consistent and correct. This makes it difficult to integrate them or to define mappings between them.

The GO, being used for database annotation, contains a fine level of detail, whereas the TaO is quite shallow; however, owing to its DL representation, the TaO can easily be extended to provide more complex and detailed concepts.

All of the above allows us to draw the following conclusions about the current state of the art in bio-ontologies:

– The bio-ontologies have been developed to capture different, sometimes overlapping knowledge domains and for different purposes: common ontology-based access and search, DB schemas, controlled vocabularies for DB annotation, etc. In general, no existing ontology can be substituted by another existing ontology.

– Conceptualizations of the same domain may differ without either of them providing incorrect knowledge.

– Currently there is no ontology that captures the whole range of concepts in the molecular biology domain.

– Applications use only a specific, narrow part of the knowledge, so they would use only subsets of a single global ontology, if one were ever created. It is therefore more important to develop a set of comprehensive and detailed ontologies for different domains than to create a global bio-ontology.

– Most existing bio-ontologies lack reusability, because they were built from scratch at a time when there were no ontologies in the biological domain on which they could be based.

– Although some progress has been achieved in the use of ontologies for the biological domain, a number of challenging problems still have to be solved.

6.2. Open problems

The following open problems in the field of bio-ontologies can be derived from the above. We also discuss possible ways to solve them and how existing bio-ontologies can contribute to their solution.

Integration of heterogeneous biological resources. The use of ontologies can help to overcome interoperability problems. To achieve interoperability, many ontology-based approaches to information integration have been developed in different fields [43]. In bioinformatics, however, this problem remains open. On the one hand, biologists need to be able to analyze a wide range of data and to pose complex queries over different resources [14, 26]. On the other hand, existing biological databases are encoded in different and incompatible formats, and they have different data models, from flat files to object-oriented databases. There are also no naming conventions between databases. Pathway databases usually present their data in the form of images, and it is difficult to link proteins on a pathway diagram to the genes for which accession numbers are given. Most pathway databases do not store references to known genes in other databases, and hence the name given in a pathway database might not resemble any name in the sequence databases.

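Table 2. The summary of the content, structure and representation of the surveyed bio-ontologies

TaO
  Application scenario: ontology-based search
  Domain-oriented component: proteins, enzymes, motifs, secondary and tertiary structure, functions and processes, sub-cellular structure and chemicals
  Generic component: yes
  Detail level: high (due to the possibility of dynamic extension)
  Conceptual representation: DL
  Host application tasks: ontology-based user interface, terminology server
  Storage and access: storage ?; query processor (Java)
  Development years: 1995-1998

GO
  Application scenario: controlled vocabulary for DB annotation
  Domain-oriented component: Drosophila, mouse and yeast gene and gene product function, process and cellular location
  Generic component: no
  Detail level: high
  Conceptual representation: taxonomies
  Host application tasks: none
  Storage and access: text, XML files; Java browser
  Development years: 1998-

EcoCyc
  Application scenario: DB schema
  Domain-oriented component: E. coli genes, metabolism, regulation, signal transduction and metabolic pathways
  Generic component: yes
  Detail level: high
  Conceptual representation: frames
  Host application tasks: visualization of biochemical reactions and layout of genes with chromosomes
  Storage and access: storage in DB; query processor (LISP)
  Development years: 1997-1999

MBO
  Application scenario: community reference
  Domain-oriented component: genes, pathways, reactions (shallow)
  Generic component: yes
  Detail level: low
  Conceptual representation: taxonomies
  Host application tasks: none
  Storage and access: Java browser
  Development years: 1997-1998

Signal ontology
  Application scenario: controlled vocabulary, community reference
  Domain-oriented component: molecular function, cellular function, reaction (shallow)
  Generic component: no
  Detail level: low
  Conceptual representation: taxonomies (current version)
  Host application tasks: none
  Storage and access: text, XML files
  Development years: 2000-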


The TAMBIS project [42] was one of the first attempts to integrate several different biological resources by supporting uniform queries across them. TAMBIS manages heterogeneity through a mapping between its ontology and the underlying resources. However, the TaO does not seem usable as a semantic repository for the community, since it is built into the specific application for which it was initially intended and captures a wide but shallow range of biological concepts. GO could be used as a common semantic repository or controlled vocabulary. However, it has not yet been used by applications for integration, although it provides a high level of detail and covers important molecular-biological fields such as molecular function and biological process.
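The general idea of managing heterogeneity through a mapping between an ontology and the underlying resources can be illustrated with a minimal Python sketch. It is not how TAMBIS itself is implemented. The two sources, their field names and the concept "Protein" with the attributes name and species are all hypothetical. A query is expressed once against the ontology, translated to each source's local fields, and the results are returned in the ontology's vocabulary.

```python
from typing import Dict, List

# Two hypothetical sources holding protein records under different field names.
SOURCES: Dict[str, List[Dict[str, str]]] = {
    "A": [{"prot_name": "lysozyme", "organism": "Gallus gallus"},
          {"prot_name": "myoglobin", "organism": "Physeter catodon"}],
    "B": [{"NAME": "lysozyme C", "OS": "Homo sapiens"}],
}

# Hypothetical mappings: attribute of the ontology concept "Protein"
# -> field name used by each source.
MAPPINGS: Dict[str, Dict[str, str]] = {
    "A": {"name": "prot_name", "species": "organism"},
    "B": {"name": "NAME", "species": "OS"},
}


def query_proteins(name_contains: str) -> List[Dict[str, str]]:
    """Answer "proteins whose name contains X" over every source.

    The query is phrased once, in ontology terms; the mapping translates
    it to each source's fields, and results are renamed back to the
    shared vocabulary so callers never see source-specific field names.
    """
    results: List[Dict[str, str]] = []
    for source, records in SOURCES.items():
        fields = MAPPINGS[source]
        for record in records:
            if name_contains in record[fields["name"]]:
                row = {"source": source}
                row.update({attr: record[col] for attr, col in fields.items()})
                results.append(row)
    return results


for row in query_proteins("lysozyme"):
    print(row)
```

Adding a further source in this design means writing one more mapping entry rather than changing the query, which is the main appeal of putting an ontology between the user and the resources.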

Thus, ontology-based approaches to biological resource integration still need to be developed. This can be done by creating new bio-ontologies, by reusing and integrating existing ones, and by extending and adopting appropriate integration approaches from other fields.

(ii) Integration and reuse of bio-ontologies. Currently there are only a few reusable bio-ontologies. This is partly because of the diversity of their representation forms, the differing explicitness of their semantics and the range of applications they address. Moreover, there are still no approaches for integrating bio-ontologies. However, when a new application is developed for integrating biological data for different tasks (such as data warehousing or mediator systems for querying distributed heterogeneous sources), the ontology underlying that application should not be designed from scratch; rather, it should integrate all or some modules of existing ontologies, since building an ontology is a costly process. To this end, approaches for ontology integration and exchange, unified languages for ontology representation (DAML+OIL [7] being very promising in this respect), and semantic vocabularies and catalogues for different domains of biology should be developed. All this requires very close collaboration between the biology and computer science communities.
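Integrating modules of existing ontologies can be sketched, in its simplest form, as importing one is-a hierarchy into another under a curator-supplied list of equivalent terms. The minimal Python sketch below assumes two hypothetical modules and one hypothetical equivalence; it is only an illustration of the idea, not an existing integration method or tool.

```python
from typing import Dict, Set

Hierarchy = Dict[str, Set[str]]  # term -> set of is-a parents

# Two hypothetical ontology modules developed independently.
MODULE_X: Hierarchy = {"protein": set(), "enzyme": {"protein"}, "kinase": {"enzyme"}}
MODULE_Y: Hierarchy = {"Catalyst": set(), "Kinase": {"Catalyst"}, "Hexokinase": {"Kinase"}}

# Hypothetical, curator-supplied equivalences from MODULE_Y terms to MODULE_X terms.
EQUIVALENCES: Dict[str, str] = {"Kinase": "kinase"}


def merge(base: Hierarchy, other: Hierarchy, equiv: Dict[str, str]) -> Hierarchy:
    """Import `other` into a copy of `base`, renaming equivalent terms.

    Terms declared equivalent are unified, so their parent sets are
    combined; all other terms from `other` are added unchanged.
    """
    merged: Hierarchy = {term: set(parents) for term, parents in base.items()}
    for term, parents in other.items():
        target = equiv.get(term, term)
        merged.setdefault(target, set()).update(equiv.get(p, p) for p in parents)
    return merged


result = merge(MODULE_X, MODULE_Y, EQUIVALENCES)
print(sorted(result["kinase"]))      # parents gathered from both modules
print(sorted(result["Hexokinase"]))  # now attached to the base term "kinase"
```

A practical integration tool would also have to detect conflicts that a mapping can introduce, such as cycles or contradictory definitions; this sketch does not attempt that.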

(iii) Ontology-based annotation. Functional annotation of genes and gene products, that is, the association of functional data with a gene product (sequence annotation), is one of the key tasks in bioinformatics. Current functional classification schemes are simple hierarchies that define a function in very general terms at the top and become increasingly specific further down. In fact, functional classes of genes do not form a strict tree-like hierarchy, since many genes have multiple functions; rather, their class structure forms a directed acyclic graph. Hence the most effective functional scheme could be a multi-dimensional one, which allows gene products to be positioned accurately in the functional space [34].
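The difference between a tree and a directed acyclic graph can be made concrete with a small Python sketch. The terms and gene names below are hypothetical (loosely GO-style, not taken from GO), chosen only so that one term has two parents. A query for a general function then has to follow every parent path, so a gene product annotated with the specific term is retrieved under both branches.

```python
from typing import Dict, List, Set

# Hypothetical function terms: each term maps to its parent terms.
# "DNA helicase activity" has two parents, so the scheme is a directed
# acyclic graph rather than a tree.
PARENTS: Dict[str, List[str]] = {
    "function": [],
    "catalytic activity": ["function"],
    "nucleic acid binding": ["function"],
    "helicase activity": ["catalytic activity"],
    "DNA binding": ["nucleic acid binding"],
    "DNA helicase activity": ["helicase activity", "DNA binding"],
}

# Hypothetical annotations: gene product -> most specific terms assigned.
ANNOTATIONS: Dict[str, List[str]] = {
    "geneX": ["DNA helicase activity"],
    "geneY": ["DNA binding"],
}


def is_tree(parents: Dict[str, List[str]]) -> bool:
    """A strict tree allows at most one parent per term."""
    return all(len(p) <= 1 for p in parents.values())


def ancestors(term: str) -> Set[str]:
    """All terms reachable upward from `term` (iterative depth-first search;
    an ancestor reachable by several paths is recorded only once)."""
    seen: Set[str] = set()
    stack = list(PARENTS[term])
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(PARENTS[current])
    return seen


def genes_with(term: str) -> Set[str]:
    """Gene products annotated with `term` or with any of its descendants."""
    return {
        gene
        for gene, terms in ANNOTATIONS.items()
        if any(t == term or term in ancestors(t) for t in terms)
    }


print(is_tree(PARENTS))
print(sorted(genes_with("catalytic activity")))
print(sorted(genes_with("DNA binding")))
```

In a strict tree, "geneX" could be filed under only one of the two branches, and a query on the other branch would miss it.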

Using an ontology as the means of sequence annotation would allow consistent and rigorous annotation. A newly submitted sequence is described in terms taken from the ontology and is hence correctly placed in the hierarchy. Such an annotation would be meaningful and consistent, so ontologies would allow more effective information retrieval and analysis (e.g. sequence comparison to discover the functions of a new sequence). GO could be considered a representative of the “next generation” of functional classification schemes. However, there is currently a large gap between the simple, tree-like classification schemes used in existing databases and GO itself. Thus, mapping tools are needed.
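In its simplest form, a mapping tool of the kind called for above might translate legacy category codes into ontology terms and refuse any annotation the ontology cannot express. The Python sketch below assumes hypothetical legacy codes and term identifiers; they are not real GO identifiers or codes from any existing classification scheme.

```python
from typing import Dict, List, Tuple

# Hypothetical ontology vocabulary: term identifier -> label.
ONTOLOGY: Dict[str, str] = {
    "T:1": "catalytic activity",
    "T:2": "transport",
    "T:3": "DNA binding",
}

# Hypothetical mapping from category codes of a legacy tree-like scheme
# to ontology terms; one legacy category may map to several terms.
LEGACY_TO_ONTOLOGY: Dict[str, List[str]] = {
    "01.02": ["T:1"],
    "07.10": ["T:2"],
    "03.04": ["T:1", "T:3"],
    "05.01": ["T:9"],  # stale entry pointing at a term not in ONTOLOGY
}


def translate(legacy: Dict[str, str]) -> Tuple[Dict[str, List[str]], List[str]]:
    """Convert gene -> legacy code into gene -> ontology terms.

    Genes whose code has no mapping, or maps to a term missing from the
    ontology, are returned separately for manual curation instead of
    being annotated with terms the ontology does not define.
    """
    mapped: Dict[str, List[str]] = {}
    needs_curation: List[str] = []
    for gene, code in legacy.items():
        terms = LEGACY_TO_ONTOLOGY.get(code, [])
        if terms and all(t in ONTOLOGY for t in terms):
            mapped[gene] = terms
        else:
            needs_curation.append(gene)
    return mapped, needs_curation


mapped, todo = translate({"geneA": "01.02", "geneB": "03.04",
                          "geneC": "99.99", "geneD": "05.01"})
print(mapped)
print(todo)
```

Checking every term against the ontology before accepting it is what makes the resulting annotation consistent: nothing enters the database that the ontology does not define.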

In addition, bio-ontologies face the same problems as ontologies in general, namely the creation of ontology development tools (editors) and ontology libraries, and the development of methodologies supporting the development and use of ontologies.

Acknowledgements: The author gratefully acknowledges Prof. Rahm and Do Hong Hai for helpful discussions, Borys Omelayenko (Division of Mathematics and Computer Science, Vrije Universiteit Amsterdam) for his useful comments, and the Graduiertenkolleg “Wissensrepresentation” (DFG) for financial support in carrying out this research.


References

1. P.G. Baker, C.A. Goble, S. Bechhofer, N.W. Paton, R. Stevens, and A. Brass. An Ontology for Bioinformatics Applications. Bioinformatics, 15(6): 510-520 (1999).

2. The Molecular Biology Ontology Working Group WWW resources: http://smi-web.stanford.edu/projects/bio-ontology/

3. A. Borgida. Description Logics in Data Management. IEEE Trans. Knowledge and Data Engineering, 7(5): 671-782 (1995).

4. CBIL's Controlled Vocabularies: http://www.cbil.upenn.edu/anatomy.php3

5. M. Cherry. A Report on the Status of the Gene Ontology Consortium. ISMB (2000).

6. CSO ontology WWW resources: http://ontology.ims.u-tokyo.ac.jp/signalontology/

7. DAML+OIL: www.daml.org

8. Enzyme Nomenclature: http://www.chem.qmw.ac.uk/iubmb/enzyme/

9. D. Fensel et al. OIL in a nutshell. In: Knowledge Acquisition, Modeling, and Management, Proceedings of the European Knowledge Acquisition Conference (EKAW-2000), R. Dieng et al. (eds.), LNAI, Springer-Verlag (2000).

10. D. Fensel. Ontologies: Silver Bullet for Knowledge Management and Electronic Commerce. Springer-Verlag, Berlin (2001).

11. W. M. Gelbart. Databases in genomic research. Science, Oct 23; 282(5389): 659-61 (1998).

12. Gene Ontology WWW resources: http://www.geneontology.org

13. Gene Ontology Consortium. Gene Ontology: Tool for the Unification of Biology. Nature Genetics, 25:25-29 (2000).

14. M. Gerstein. Integrative database analysis in structural genomics. Nat. Struct. Biol. 7 Suppl: 960-3 (2000).

15. T. R. Gruber. Towards Principles for the Design of Ontologies used for Knowledge Sharing. International Journal of Human-Computer Studies, 43, 907-928 (1995).

16. I. Horrocks. Using an expressive description logic: FaCT or fiction? In A. G. Cohn, L. Schubert, and S. C. Shapiro, editors. Principles of Knowledge Representation and Reasoning. Proceedings of the Sixth International Conference (KR'98), pages 636-647. Morgan Kaufmann Publishers, San Francisco, California (June 1998).

17. Hasan M. Jamil. Achieving Interoperability of Genome Databases Through Intelligent Web Mediator. In the Proceedings of the IEEE International Symposium on Bio-Informatics and Biomedical Engineering (BIBE 2000), November 8-10, 2000, Washington DC, USA.

18. HUGO Gene Nomenclature Committee: http://www.gene.ucl.ac.uk/nomenclature/

19. R. Jasper, M. Uschold. A Framework for Understanding and Classifying Ontology Applications. KAW'99 (1999). Published on-line: http://sern.ucalgary.ca/KSI/KAW/KAW99

20. P. Karp, N. Abernethy et al. An Evaluation of Ontology Exchange Languages for Bioinformatics, Robin McEntire (1999): http://smi-web.stanford.edu/projects/bio-ontology/

21. P.D. Karp, M. Riley, M. Saier, I.T. Paulsen, S. Paley, A. Pellegrini-Toole. The EcoCyc and MetaCyc Databases. Nucleic Acids Research 28(1):56-59 (2000).


22. P. Karp, V. K. Chaudhri and J. Thomere. XOL: An XML-Based Ontology Exchange Language (1999). http://smi-web.stanford.edu/projects/bio-ontology/

23. P.D. Karp. An Ontology for Biological Function Based on Molecular Interactions. Bioinformatics 16(3) 269-85 (2000).

24. P.D. Karp. EcoCyc: The Resource and the Lessons Learned. In Bioinformatics Databases and Systems, S. Letovsky, ed., Kluwer Academic Publishers, 47-62 (1999).

25. U. Leser. Designing a Global Information Resource for Molecular biology,

26. N. Luscombe, D. Greenbaum, M. Gerstein. What is bioinformatics? An introduction and overview. IMIA (2001, in press).

27. MBO Java ontology browser: http://igd.rz-berlin.mpg.de/~www/oe/mbo.html

28. The NCBI Taxonomy Homepage: http://www.ncbi.nlm.nih.gov/Taxonomy/taxonomyhome.html/

29. Open Knowledge Base Connectivity Standard WWW resources: http://www.ai.sri.com/~okbc/

30. Ontology Interchange Language (OIL) WWW resources: http://www.ontoknowledge.org/oil/

31. Ontology Markup Language (OML/CKML): http://www.ontologos.org/OML/

32. Ontolingua: http://www-ksl-svc.stanford.edu:5915/doc/frame-editor/index.html

33. Rector A.L., Bechhofer S., Goble C.A., Horrocks I., Nowlan W.A., and Solomon W.D. The GRAIL Concept Modelling Language for Medical Terminology. Journal of Artificial Intelligence in Medicine, Kluwer Publishing, Vol 9, 139-171 (1997).

34. Rison S., Hodgman T.C., Thornton J.M., Comparison of functional annotation schemes for genomes. Functional Integrative Genomics, Springer-Verlag, Vol 1, 56-59 (2000) .

35. S. Schulze-Kremer. Ontologies for Molecular Biology. Proceedings of the Third Pacific Symposium on Biocomputing, Hawaii, World Scientific Publishers, Singapore, pp. 693-704 (1998).

36. S. Schulze-Kremer. Integrating and Exploiting Large-Scale, Heterogeneous and Autonomous Databases with an Ontology for Molecular Biology. In: Molecular Bioinformatics, Sequence Analysis - The Human Genome Project (R. Hofestaedt and H. Lim eds). Shaker Verlag, Aachen, pp. 43-56 (1997).

37. SRS6: http://srs6.ebi.ac.uk/

38. R. Stevens, P. Baker, S. Bechhofer, G. Ng, A. Jacoby, N.W. Paton, C.A. Goble, and A. Brass. TAMBIS: Transparent Access to Multiple Bioinformatics Information Sources. Bioinformatics, 16(2):184-186 (2000).

39. R. Stevens, C.A. Goble, and S. Bechhofer. Ontology-based Knowledge Representation for Bioinformatics. Briefings in Bioinformatics (2000).

40. T. Takai-Igarashi, Y. Nadaoka, and T. Kaminuma. A database for cell signaling networks. J. Comp. Biol., 5(4), 747 (1998).

41. T. Takai-Igarashi, T. Takagi. Cell Signaling Ontology. ISMB BioOntology Workshop, August 24, San Diego (2000).

42. TAMBIS Project WWW resources: http://img.cs.man.ac.uk/tambis/


43. H. Wache, T. Vögele, U. Visser, H. Stuckenschmidt, G. Schuster, H. Neumann and S. Hübner. Ontology-Based Integration of Information - A Survey of Existing Approaches. Submitted to IJCAI 2001 Workshop: Ontologies and Information Sharing.

44. Zdobnov E., Lopez R., Apweiler R., Etzold T. "The EBI SRS server - recent developments." In: Proceedings of the German Conference on Bioinformatics (GCB'00), Bornberg-Bauer E., Rost U., Stoye J., Vingron M. (eds.), pp. 139-147, Logos Verlag, Berlin, Germany (2000).

