Fourth Annual Bio-Ontologies Meeting
Sharing Experiences and Spreading Best Practice
26 July 2001
This, the fourth annual Bio-Ontologies meeting, has been sponsored by GlaxoSmithKline Pharmaceuticals, for whose support we are grateful.
Organised by Carole Goble and Robert Stevens (co-chairs); Peter Karp, Pat Hayes, Robin McEntire, Richard Chen and Eric Neumann
About the Workshop
We would like to invite you to the fourth Annual Bio-Ontologies Meeting (Bio-Ontologies 2001), on July 26th in Copenhagen, Denmark. This is immediately after ISMB-01 (July 21-25), also in Copenhagen.
The programme is now available for the Bio-Ontologies meeting. We have a key-note speaker, Alan Flett from Semantic Edge, who will describe experience of the use of ontology in the business world. We have three invited speakers, who will discuss the roles of ontologies in medical informatics, text analysis, and the analysis of gene transcription data. We have ten short talks selected from abstracts submitted to the meeting. An innovation for this year is a panel discussion with ontologists from industry, biology, computer science...
The Bio-Ontologies meeting is co-located with ISMB in Tivoli Gardens. The Bio-Ontologies meeting is in NIMB (in the room called Lumbysalen). All registrants receive by mail a map of Tivoli on which NIMB is clearly shown. Most registrants who are attending ISMB will already have their Bio-Ontologies meeting badges through their ISMB registration; they need to wear these on the day. Registrants should come through the NIMB entrance on the 26th. Those attending only Bio-Ontologies can also register there.
The goal of this consortium is the identification and promotion of a practical set of technologies that will aid in the knowledge management and exchange of concepts and representations in the life sciences. The first meeting took place in Montreal in 1998, and made clear the general interest in and support for ontologies in the life sciences. The following year in Heidelberg we discussed ontology exchange and presented ontologies currently under development. The third meeting, in La Jolla, California, USA, continued this theme, reporting several new ontologies and presenting updates on existing ontologies.
Many in the group have been active since our last meeting. The community now has considerable experience in the development and deployment of ontologies in the life sciences, so it is appropriate for us to take stock and reflect. So the theme for this year's meeting is Sharing Experiences and Spreading Best Practice. The idea is that we share not only the results of our labours but how we got there, and what we wished we had known while we did it.
http://www.cs.man.ac.uk/~stevensr/workshop01/ (1 of 3) [10/05/2004 14:50:03]
Topics that will be discussed include:
● Shared experiences in using ontology tools, development methodologies, comparing ontologies, and reusing other people's ontologies;
● The latest in ontology languages and ontology exchange languages, including an update on the Ontology Inference Layer (OIL);
● Ontologies produced by members from various genomics and life-science efforts;
● Specific uses of ontologies in research and drug discovery, especially pre-competitive ontologies for the industry;
● Updates on ontology development in general.
Venue
The meeting will be held on July 26th in Tivoli Gardens, Copenhagen. Attendees are expected to make their own arrangements for travel and accommodation (we assume most will simply extend their bookings for ISMB). For more information on hotels with special ISMB conference rates, see ISMB Hotels.
Workshop Details
The day-long seminar will be divided into two sections:
● A key-note address followed by invited speakers on the theme of the workshop, followed by
● A series of 20 minute talks selected from respondents to this call for abstracts, covering a range of topics that may be wider than the theme of the workshop.
Some intended results of the fourth meeting will be:
❍ Establish a user community for sharing experiences with designing and building ontologies for the Life Sciences;
❍ Enlist support from the Knowledge Management community for tools and methodologies to aid our ontology efforts;
❍ Create a permanent portal for the exchange of ontologies, ontology building tools and relationships with other organizations engaged in similar ontology-building tasks;
❍ Establish a consortium for promoting and sharing open-source ontologies in the Life Sciences.
Registration is online and the registration page is now available. It will be possible to register at the ISMB conference and on the day of the meeting itself. All registrants who are also registering for ISMB will be registered for the satellite meeting on the 26th at the same time. Those who are only attending on the day will be able to register onsite. ISMB itself has reached capacity registration, but the Bio-Ontologies meeting has up to 200 places, of which approximately 140 have been filled.
Deadlines
1 June 2001: Please submit abstracts to [email protected]
This page maintained by Robert Stevens. Last altered on 25 June 2001.
Google Search: Bio-ontologies
Web Results 1 - 10 of about 675 for Bio-ontologies. (0.43 seconds)
The Bio-ontologies Working Group... Meeting History: Bio-Ontologies '00 Meeting, August 5,1999 The latestBio-Ontologies workshop took place at ISMB '00 in San Diego, CA. ... smi-web.stanford.edu/projects/bio-ontology/ - 7k - Cached - Similar pages
Fourth Annual Bio-Ontologies MeetingFourth Annual Bio-Ontologies Meeting. Sharing Experiences and Spreading BestPractice. ... the programme is now available for the Bio-Ontologies meeting. ... img.cs.man.ac.uk/stevens/workshop01/ - 7k - Cached - Similar pages
[PPT] Bio-Ontologies: Bio-Ontologies: Their Creation and DesignFile Format: Microsoft Powerpoint 97 - View as HTML-Ontologies: Bio-Ontologies: Their Creation and Design. Dr. Peter Karp. SRI, http://www.ai.sri.com/~pkarp/. ... Advertisement.The Fourth Annual Bio-Ontologies Meeting. ... img.cs.man.ac.uk/stevens/tutorial01/master.ppt - Similar pages[ More results from img.cs.man.ac.uk ]
go-2001: Fourth Bio-Ontologies Meeting -- 2001Fourth Bio-Ontologies Meeting -- 2001. From: Robert Stevens ([email protected])Date: Tue May 15 2001 - 08:27:36 PDT. ... Stevens Chair Bio-Ontologies 2001. ... www.geneontology.org/email-go/go-arc/go-2001/0364.html - 8k - Cached - Similar pages
go-2001: Re: Fourth Bio-Ontologies Meeting -- 2001... Reply: Midori Harris: "Re: Fourth Bio-Ontologies Meeting -- 2001 ... at ISMB, and, now,will submit a abstract for a presentation at the BioOntologies meeting as ... www.geneontology.org/email-go/go-arc/go-2001/0370.html - 9k - Cached - Similar pages[ More results from www.geneontology.org ]
Plant Ontology Mailing List Archive: [Fwd: The Seventh Annual Bio ...[Fwd: The Seventh Annual Bio-ontologies Meeting Call For Submissions]. ... Submissiondetails at http://bio-ontologies.man.ac.uk. -- This ... www.plantontology.org/mailarch/2004/0042.html - 13k - Cached - Similar pages
Bio-Ontology
http://www.google.co.uk/search?q=Bio-ontologies&ie=ISO-8859-1&oe=ISO-8859-1&hl=en&meta=on (1 of 2) [10/05/2004 15:14:44]
Bio-Ontologies: A List of links. What is an Ontology? "An ontologyis an explicit specification of some topic. For our purposes, it ... anil.cchmc.org/Bio-Ontologies.html - 10k - Cached - Similar pages
Bio–Ontologies: Tools, Techniques, and Examplesback to Tutorial Program Bio–Ontologies: Tools, Techniques, and Examples.Carole Goble ... A Survey of Current Bio–Ontologies. The bio ... www.iscb.org/ismb2000/tutorials/goble.html - 12k - Cached - Similar pages
Sklyar, Nataliya: Survey of existing Bio-ontologies... 7 4. Bio-ontologies The use of ontologies within bioinformatics is relatively recentand ... In this section, we present a review of some selected bioontologies. ... dol.uni-leipzig.de/pub/2001-30/en - 57k - Cached - Similar pages
Christopher Thomas - bio-Ontologiesbio-Ontologies, Databases and Dictionaries. ... lsdis.cs.uga.edu/~cthomas/bio_ontologies.html - 11k - Cached - Similar pages
The Bio-ontologies Working Group
The Molecular Biology Ontology Working Group
Updated 11-21-00
About this site: This web site was put in place to follow up from the Ontologies for Molecular Biology Workshop, which took place at ISMB '98 in Montreal. The site and mailing list have been set up primarily for the 50-odd attendees, to address the need for continued dialogue on standards and goals for a group effort on ontologies for molecular biology.
Meeting History:
Bio-Ontologies '00 Meeting, August 5, 1999
The latest Bio-Ontologies workshop took place at ISMB '00 in San Diego, CA.
Bio-ontologies Working Group meeting, August 5, 1999
Click here for information on the 1999 meeting of the Bio-ontologies working group.
Two documents introduced at the 1999 meeting are available for download: the XOL Specification (an OKBC-friendly, XML-based ontology exchange language), and the working group's first report, An Evaluation of Ontology Exchange Languages for Bioinformatics. Both documents are in Microsoft Word 97 format.
Core Development Group Meeting, March 29, 1999
Following the meeting at the Computational Genomics conference (see above), the core of active group participants have worked to evaluate the primary representation languages/environments, Ontolingua and OML (selected after the last meeting). A meeting was held on March 29 at Pangea to share results and discuss next steps for the group. Click here to see the meeting minutes.
Conference Meeting, Nov. 4th, 1998
Last year, in response to the call for participation in the Ontologies for Molecular Biology Working Group, the core group met after the Second Annual Conference on Computational Genomics (http://www.tigr.org/cet/gss/cg/index.html) on Nov. 4th, 1998. Details...
http://smi-web.stanford.edu/projects/bio-ontology/ (1 of 3) [10/05/2004 15:14:56]
Resources:
FTP Site: An FTP site running at ftp://smi.stanford.edu/pub/bio-ontology/ currently contains the XOL spec and Ontology Exchange documents listed above. Further content is pending submission from other individuals in this group. Material should be deposited in the subdirectory "incoming" (for security reasons). Please submit zipped files containing the directory structure appropriate to the component files. They will then be placed into the bio-ontology directory. All material should be supplied by the owner, and accompanied by a README with property and usage information.
Mailing List: The mailing list membership is composed primarily of the original ontology workshop attendees, but is not closed. Posts to the group can only be made by members. If you have an interest in this topic and would like to be added to the list, please send a request to [email protected].
Links:
The ImMunoGeneTics Database
http://imgt.cnusc.fr:8104
The Organelle Genome Database (GOBASE)
http://megasun.bch.umontreal.ca/gobase/
Ongoing Ontology Projects
http://www.cs.utexas.edu/users/mfkb/related.html
OML Bioinformatics support page (includes example ontologies in OML from the working group)
http://wave.eecs.wsu.edu/Bioinformatics/Bioinformatics.html
Robert Stevens Bio-ontologies Page
http://img.cs.man.ac.uk/stevens/ontology.html
Please feel free to submit link suggestions or files to add to the ftp site.
Stanford KSL Network Services
Warning: You are using Netscape 3.0 on the PC. Versions after 3.0b5 on the PC have a known bug in URL redirection from GET form submission that makes it not work properly with this service (lots of blank pages and "transfer interrupted" messages). In fact, this release of Netscape is so broken that the rest of this page might not appear without you having to hit the reload button (possibly numerous times!). Please back off to Netscape 3.0b5 or 2.0, or flame them until they fix it. We have reported this bug.
Welcome to Stanford KSL Network Services. Home of the Ontolingua Ontology Editor
http://www-ksl-svc.stanford.edu:5915/&service=frame-editor (1 of 2) [10/05/2004 15:15:16]
There are 7 other users currently connected. Requests handled since server boot at 26 April 2004, 11:08:37: 930,636 Total requests handled: 10,646,350 OKBC Requests handled since server boot at 26 April 2004, 11:08:37: 236 Total OKBC requests handled: 5,704,981
The KSL gratefully acknowledges the support of the following funding agencies for this work:
1. Defense Advanced Research Projects Agency (DARPA) and the Department of the Navy, Rapid Knowledge Formation (RKF) program, under contract N66001-00-C-8027-P00001.
2. Defense Advanced Research Projects Agency (DARPA) and the Department of the Navy, High Performance Knowledge Bases (HPKB) program, under contract N66001-96-C-8622-P00001.
3. DARPA and the National Institute for Standards and Technology (NIST), Rapid Development, Exploration and Optimization (RaDEO) program, under cooperative agreement 70NANB6H0075.
4. DARPA and the Defense Logistics Agency under the Advanced Logistics Program (ALP).
5. DARPA/ETO, U.S. Army Research Laboratory, Lockheed Martin Advanced Technology Laboratories, and Sandpiper Software Corporation, Rapid Prototyping of Application-Specific Signal Processors (RASSP), contract DAAL01-93-C-3380.
6. CommerceNet under contract CN-1094 (TRP #F33615-94-4413).
7. Boeing Corporation.
This application copyright (c) 1995, 1996, 1997, 1998, 1999, 2000, 2001 Stanford University Knowledge Systems Laboratory.
Contact us at [email protected]
Sklyar, Nataliya: Survey of existing Bio-ontologies
Available via: http://lips.informatik.uni-leipzig.de:80/pub/2001-30/en
Submitted on: 28th of September 2001
Author: Sklyar, Nataliya
Title: Survey of existing Bio-ontologies
Date of publication: 2001
Citation: Sklyar, Nataliya. Survey of existing Bio-ontologies, Techn. Report 5/2001, Dept. of Comp. Science, Univ. of Leipzig
Number of pages: 23
Language: English
Organization: The Institute of Computer Science
Type: Technical Report
Subject group: Computer Science, Data Processing
Abstract: Ontologies have become popular in the fields of intelligent information integration, cooperative information systems, electronic commerce and knowledge management. Recently, ontologies for knowledge representation in the biological domain have also appeared. This survey is intended to provide a brief state-of-art introduction into ontology-based biological systems. This work aims to investigate the role of ontologies and the possibilities opened by the use of ontologies in bioinformatics. The description of several selected bio-ontologies is given. For every ontology, the structure, scope, area of application, characteristics, representation language are considered. A comparison of described ontologies and future possibilities for the use of bio-ontologies are discussed.
Keywords: Ontology, bio-ontologies, bioinformatics
Contact address: Nataliya Sklyar, Universität Leipzig, Institut für Informatik, Augustusplatz 10 - 11, 04109 Leipzig
http://dol.uni-leipzig.de/pub/2001-30/en (1 of 2) [10/05/2004 15:15:56]
Sponsored by: DFG
Copyright: Copyright remains with the author(s). The reader is entitled to make personal copies for scientific or non-commercial purposes. Any further use requires the express prior written permission of the author(s).
Source(s):
● Postscript (ps, ps.gz [440709 bytes], ps.zip)
● PDF (pdf, pdf.gz [247235 bytes], pdf.zip)
● Plain text (text, text.gz [55899 bytes], text.zip)
● Pagewise preview
Management of the document by: Leipzig University Multimedia Publication Server
Institut für Informatik
Survey of existing bio-ontologies
Nataliya Sklyar
Report. Nr.
September 2001
Survey of existing Bio-ontologies
Nataliya Sklyar
Department of Computer Science, University of Leipzig Augustusplatz 10/11, D-04109 Leipzig, Germany
e-mail: [email protected]
September 2001
Abstract. Ontologies have become popular in the fields of intelligent information integration, cooperative information systems, electronic commerce and knowledge management. Recently, ontologies for knowledge representation in the biological domain have also appeared. This survey is intended to provide a brief state-of-art introduction into ontology-based biological systems. This work aims to investigate the role of ontologies and the possibilities opened by the use of ontologies in bioinformatics. The description of several selected bio-ontologies is given. For every ontology, the structure, scope, area of application, characteristics, representation language etc. are considered. A comparison of described ontologies and also future possibilities for the use of bio-ontologies are discussed.
1. Introduction
With the recent advance in bio-technology and bioinformatics, numerous genome databases
from various biological communities have been developed to assist in genomic research. There are hundreds of genomic and biological databases open for common access throughout the World Wide Web. They cover various types of biologically relevant information, such as genomic databases for different organisms, pathway databases, and genetic sequence and molecular databases. Genome-related databases can be divided into two major groups: generalized and specialized databases [11]. Generalized databases include such well-known databases as the GenBank, EMBL and DDBJ archives of nucleic acid sequences, and the SwissProt and PIR polypeptide sequence databases. These databases capture and represent information on particular classes of molecules without any phylogenetic or functional exclusion. In contrast, specialized databases are organized around a specific model organism or around a specific biological function (Fig. 1). Biological databases have been created by different biological communities, at different stages in the development of biological knowledge, for different purposes, and cover different aspects. All this leads to a high level of structural and semantic heterogeneity and to the autonomy of biological databases. Structural heterogeneity refers to differences in the DBMS and in data models. A large amount of biological data is stored in flat files, while there are also relational and object-oriented biological databases. Semantic heterogeneity concerns the content of databases and the meaning of database categories. In general, molecular biology has a so-called “communication problem” [36]. Even the meaning of important high-level fundamental concepts is often ambiguous. For example, the use of the term “gene” differs from one biological system to another; some proteins and genes have the same names, or the same protein or gene has different names in different sources. In genomic databases, functions of genes and gene products are often expressed in natural language. This makes it very difficult to obtain important information such as, for example, sets of proteins with a certain function.
[Figure 1: tree diagram. Genomic databases are divided into generalized databases (nucleic acid sequences (DNA, RNA): GenBank, EMBL; polypeptide sequences (proteins): SwissPROT, PIR) and specialized databases (organism specific: FlyBase, MGI; biological function specific: pathways and enzymatic reactions: EcoCyc; structure: NDB, PDB).]
Figure 1. Classification of genomic databases
To be able to compare results and to infer and test new hypotheses, biologists need to be able to pose complex questions and to analyze data from different information sources and different experiments. For example, in microarray analysis it would be useful to have information on the functional classification of every gene on the chip, on the pathways in which it works, and on the details of any protein-protein interactions in which it participates. All of this requires integration of biological information.
A number of approaches for biological data integration and interoperation have been proposed. Hyperlink navigation is the simplest and the most common. It allows a user to navigate interactively from a presentation of an entry in one DB to an entry in another database. However, the types of links between objects are implicit, so a user can be dropped from one DB into another, losing the context of his or her initial intentions.
Several well-known techniques, such as mediator systems [17] and federated databases [25], have been developed to provide uniform interfaces and common query languages for heterogeneous DBs. SRS (Sequence Retrieval System) technology [37, 44] has been specially developed to integrate heterogeneous biological data sources behind a single interface. Currently it provides a powerful unified interface to over 400 different scientific databases [44]. However, all these approaches aim to overcome structural heterogeneity and do not address the integration of the data contents of the underlying sources.
When using more than one data store or analysis tool, a biologist needs to be sure that the knowledge within one resource can be reliably compared with that in another. Information integration in general, and in biology in particular, requires a consistent shared understanding of the meaning of that information. Ontologies provide a shared and common structure of a domain, thus giving a common understanding of this domain, and may be used for overcoming semantic heterogeneity.
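As a rough illustration of how a shared vocabulary can reduce semantic heterogeneity, the following sketch maps the free-text function labels of two sources onto common terms before merging them. It is a minimal sketch under stated assumptions: the labels, protein names and term identifiers (`T:0001`, `P1`, ...) are invented for illustration and do not come from any real database or ontology.

```python
from collections import defaultdict

# Shared vocabulary: each source-specific label maps to one agreed term.
# All labels and term identifiers here are invented for illustration.
SHARED_TERMS = {
    "kinase activity": "T:0001",
    "phosphotransferase": "T:0001",
    "cell growth": "T:0002",
    "growth of cell": "T:0002",
}


def normalise(label):
    """Return the shared term for a free-text label, or None if unmapped."""
    return SHARED_TERMS.get(label.strip().lower())


def integrate(*sources):
    """Group protein names from several sources under shared function terms."""
    by_term = defaultdict(set)
    for records in sources:
        for protein, label in records:
            term = normalise(label)
            if term is not None:
                by_term[term].add(protein)
    return dict(by_term)


# Two hypothetical sources describing functions in their own words.
source_a = [("P1", "Kinase activity"), ("P2", "cell growth")]
source_b = [("P3", "phosphotransferase"), ("P2", "Growth of cell")]
merged = integrate(source_a, source_b)
```

With the mapping in place, a question such as "which proteins have function T:0001" can be answered across both sources, which is not possible by string comparison of the original labels.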
This survey gives an overview of ontologies that have been developed for molecular biology. This work aims to investigate the role of ontologies and the possibilities opened by the use of ontologies in bioinformatics.
The survey consists of several sections. In the next section we give a definition of ontology and its basic components. Some attention is then paid to the languages for ontology representation. In section 4, we describe several currently existing bio-ontologies. For each ontology, the structure, scope, area of application, characteristics, representation language etc. are considered. In the discussion section, the described ontologies are compared and future possibilities for the use of bio-ontologies are discussed.
2. Ontology: Concept, components and applications
In the literature, a number of ontology definitions and their intended purposes are discussed.
Ontologies were developed in AI for the following purposes: knowledge sharing and reuse, data exchange among programs, unification of disparate data and knowledge representations, knowledge-based services etc.
The commonly used ontology definition, given by Gruber, is “ontology is a formal, explicit specification of a shared conceptualization” [15]. A conceptualization is an abstract, simplified view of the area of interest that we wish to represent.
The main ontology components are concepts, relations, instances, and axioms [39]. However, depending on the complexity, expressiveness and the representation, an ontology may contain only some of these components.
Concepts represent sets of objects within a domain that have certain common properties. For instance, “protein” is a concept within the domain of molecular biology. Relations describe interactions between concepts. There are two kinds of relations: hierarchical and associative. Hierarchical relations organize concepts into sub- and super-concept tree structures. Associative relations allow additional links between concepts, independent of their hierarchical arrangement. Relations may also have properties that provide further knowledge about them.
Instances are objects represented by a concept. Strictly speaking, ontologies should not contain any instances, because they are supposed to be a conceptualization of a domain and not a description of a particular set of instances. The latter belongs in databases.
Axioms are used to constrain values of concepts and instances. In this sense, the properties of relations are a kind of axiom, though axioms may include more general rules.
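The four components described above can be sketched as a small data structure. This is only an illustrative sketch: the class name `Ontology`, its field names, and the example axiom `declared_only` are assumptions introduced here, not part of any ontology language discussed in this survey.

```python
from dataclasses import dataclass, field


@dataclass
class Ontology:
    concepts: set = field(default_factory=set)
    is_a: set = field(default_factory=set)          # hierarchical: (child, parent)
    associations: set = field(default_factory=set)  # associative: (subject, relation, object)
    instances: dict = field(default_factory=dict)   # instance name -> concept
    axioms: list = field(default_factory=list)      # callables: Ontology -> bool

    def check(self):
        """Return True if every axiom holds for the current content."""
        return all(axiom(self) for axiom in self.axioms)


def declared_only(onto):
    """Example axiom: hierarchy and instances may only mention declared concepts."""
    used = {concept for pair in onto.is_a for concept in pair}
    used |= set(onto.instances.values())
    return used <= onto.concepts


onto = Ontology()
onto.concepts |= {"Biomolecule", "Protein", "Molecular function"}
onto.is_a.add(("Protein", "Biomolecule"))
onto.associations.add(("Protein", "has_function", "Molecular function"))
onto.instances["protein_LARD"] = "Protein"
onto.axioms.append(declared_only)
consistent = onto.check()
```

Keeping `instances` as a separate mapping reflects the point made above that instances are, strictly speaking, database content rather than part of the conceptualization.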
Because the definition is very general, the term “ontology” is treated in a number of different ways by researchers. This also influences the representation and application of a particular ontology. Most commonly, ontologies tend to be used as a common vocabulary, a database schema, a meta-level specification, or the structure of a knowledge base.
Since the purpose of ontologies is to provide an explicit conceptualization that describes the semantics of data, they have functions similar to those of databases, but differ from them in the following respects (see [10] for relevant discussion):
– Languages for describing ontologies are syntactically and semantically richer than those commonly used for databases;
– Information that is captured by ontologies may consist of semi-structured natural language texts rather than only structured data;
– Ontologies are intended to be reusable, while database schemas are usually applied only to a single application.
Many applications of ontologies address the issue of interoperability [43]. In nearly all ontology-based approaches, ontologies are used for the explicit description of the information source semantics.
3. Ontology representation languages
To implement ontologies in applications, they must be represented in a language that the particular applications can understand. A variety of languages, based on different knowledge representation (KR) models, have been proposed, for example Ontolingua, OML/CKML (Ontology Markup Language/Conceptual Knowledge Markup Language), Frame-Logic, UML, XOL (Ontology exchange language), and OIL (Ontology Interchange Language).
3.1. Conceptual representation
Current bio-ontologies use approaches of different complexity for their conceptual representation: (i) controlled vocabularies, defined in natural language, (ii) taxonomies, (iii) object-based knowledge representation languages, such as frames and UML, and (iv) logic-based approaches, such as Description Logics (DL) [20, 39]. Fig. 2 provides examples of the different forms of ontology representation.
Controlled Vocabulary is an ontology representation that is simply a list of terms, defined in natural language (Fig. 2a). A number of biological and molecular biology vocabularies are available over the WWW [4, 8, 18, 28]. These are the ontologies with the lowest level of complexity and, thus, we will not provide any more details on them.
Taxonomy is a set of concepts that are arranged into a generalization-specialization hierarchy (Fig. 2b). Taxonomies usually support the creation of handcrafted ontologies with simple tree-like inheritance structures, but multiple inheritance may also be realized. Taxonomies define neither the web of relations between concepts that provides the semantics of the terms, nor the attributes of these terms.
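A taxonomy of this kind can be sketched as a mapping from each concept to its parents. The sketch below is illustrative only: the concept names follow the example of Fig. 2b, while the dictionary layout and the helper names `ancestors` and `is_kind_of` are assumptions made here.

```python
# Each concept maps to a list of parents, so multiple inheritance is possible.
IS_A = {
    "Biomolecule": [],
    "DNA": ["Biomolecule"],
    "Protein": ["Biomolecule"],
    "Molecular function": [],
    "Cell growth": ["Molecular function"],
    "Enzyme": ["Molecular function"],
}


def ancestors(concept, is_a=IS_A):
    """Return all direct and indirect parents of a concept."""
    seen = []
    stack = list(is_a.get(concept, []))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.append(parent)
            stack.extend(is_a.get(parent, []))
    return seen


def is_kind_of(concept, other):
    """True if concept is the same as, or a specialization of, other."""
    return concept == other or other in ancestors(concept)
```

Note that the structure records only is-a links: there is no place to state that a protein has a function, which is exactly the limitation of taxonomies described above.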
Frame-based systems use the object-oriented data model, where objects are named frames (Fig. 2c). There are frames of two types: classes (concepts) and instances. Class frames represent generic types of objects, such as the class of biomolecules (Class_Biomolecule) and the class of proteins (Class_Protein). Classes can be arranged in class-subclass hierarchies. Instances represent particular elements of a class, for example protein_LARD. A set of slots, which define the attributes and relations of a class, is associated with every class.
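The frame model described above can be sketched with a small class in which slot look-up falls back to the parent frame. This is a minimal sketch, not the data model of any particular frame system: the `Frame` class, its `get` method and all slot values are assumptions introduced for illustration; only the frame names come from the text.

```python
class Frame:
    """A class frame or instance frame with named slots."""

    def __init__(self, name, parent=None, is_instance=False, slots=None):
        self.name = name
        self.parent = parent            # superclass, or the class of an instance
        self.is_instance = is_instance
        self.slots = dict(slots or {})

    def get(self, slot):
        """Look up a slot value, inheriting from parent frames if needed."""
        frame = self
        while frame is not None:
            if slot in frame.slots:
                return frame.slots[slot]
            frame = frame.parent
        raise KeyError(slot)


# Slot values below are placeholders, not biological facts.
biomolecule = Frame("Class_Biomolecule", slots={"reference": "none recorded"})
protein = Frame(
    "Class_Protein",
    parent=biomolecule,
    slots={"chemical_composition": "amino acids"},
)
lard = Frame(
    "protein_LARD",
    parent=protein,
    is_instance=True,
    slots={"name": "LARD"},
)
```

Here `lard.get("chemical_composition")` is resolved on the class frame and `lard.get("reference")` on its superclass, which shows how slots defined once on a class are shared by its subclasses and instances.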
[Figure 2: four panels. a) Controlled Vocabulary: a list of terms with natural-language definitions (Biomolecule, Protein, Molecular function). b) Taxonomies: is-a hierarchies with DNA and Protein under Biomolecule, and Cell growth and Enzyme under Molecular function. c) Frames: Biomolecule with subclasses DNA and Protein, with slots such as Name, Chemical composition and Reference. d) Description Logics: the same hierarchies with a "Has function" relation between them and the derived concept "Protein with cell growth function".]
Figure 2. Examples of ontology representations in different languages
Description Logics (DLs) provide a language for capturing declarative knowledge about a domain, and support reasoning on that knowledge [3]. A DL models an application domain in terms of concepts, roles (relations) and individuals (instances). The domain is a set of individuals, and a concept is a description of a group of individuals that share common characteristics (Fig. 2d). Roles model relations between, or attributes of, individuals. Information captured with DLs is classified into a rich hierarchical lattice of concepts and their inter-relations. A DL supplies a number of reasoning services, which allow classification hierarchies to be constructed and the consistency of descriptions to be verified. This enables automated development of a taxonomy for defined concepts. The TAMBIS ontology, described below, is based on a DL.
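The idea of automated classification can be illustrated with a deliberately tiny subsumption check. This is a sketch under strong assumptions, not a DL reasoner: a description is restricted to a conjunction of named concepts plus "has some" role restrictions with named fillers, and the function names `supers` and `subsumes` and the role name `has_function` are invented here.

```python
# Primitive is-a links between named concepts (names follow Fig. 2).
PRIMITIVE_IS_A = {
    "Protein": {"Biomolecule"},
    "DNA": {"Biomolecule"},
    "Cell growth": {"Molecular function"},
    "Enzyme": {"Molecular function"},
}


def supers(name):
    """Return the concept itself plus all of its primitive ancestors."""
    result = {name}
    for parent in PRIMITIVE_IS_A.get(name, ()):
        result |= supers(parent)
    return result


def subsumes(general, specific):
    """True if every individual fitting `specific` must also fit `general`.

    A description is a pair (named_concepts, restrictions), where each
    restriction is a (role, filler) pair read as "has some filler via role".
    """
    g_names, g_roles = general
    s_names, s_roles = specific
    s_all_names = set().union(*(supers(n) for n in s_names))
    names_ok = g_names <= s_all_names
    roles_ok = all(
        any(r == role and filler in supers(f) for (r, f) in s_roles)
        for (role, filler) in g_roles
    )
    return names_ok and roles_ok


protein_with_function = ({"Protein"}, {("has_function", "Molecular function")})
growth_protein = ({"Protein"}, {("has_function", "Cell growth")})
```

Nobody states that `growth_protein` is a kind of `protein_with_function`; the check derives it from the two descriptions, which is the sense in which a DL can build the taxonomy of defined concepts automatically.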
3.2. Ontology-exchange languages
Since information sharing is one of the central problems in computer science, and in bioinformatics in particular, there have been a number of efforts to define requirements for an ontology-exchange language and to develop such a language.
In 1999, at the Bio-ontologies Working Group Meeting [2], an evaluation of a number of ontology-exchange languages was reported [20]. The goal of the evaluation was to analyze how well the languages satisfy the requirements of bioinformatics and hence to recommend one or more languages for use within the bioinformatics community. A diagram of some ontology languages, arranged by their semantics and syntax, is given in Fig. 3.
Ontolingua [32] (frame-based, with LISP syntax) and OML/CKML [31] (conceptual graph based, with XML syntax) were found to be the most expressive languages at that time. However, it is believed that only frame systems provide the representational constructs needed to model ontologies for molecular biology; at the same time, an XML-based syntax must be used for a bioinformatics ontology exchange language if the language is to see widespread acceptance. The main conclusion of this evaluation was that “the language that the bioinformatics community needs for the exchange of ontologies should be based on frame-based semantics with an XML expression”.
A language that satisfies these recommendations was proposed by P. Karp and is called XOL (XML-Based Ontology Exchange Language) [22]. The ontology definitions that XOL is designed to encode include both schema information (meta-data), such as class definitions from object databases, and non-schema information (ground facts), such as object definitions from object databases. The syntax of XOL is based on XML and the semantics of XOL are based on OKBC-Lite, which is a simplified form of the knowledge model for OKBC [29]. The XOL language provides a mechanism for encoding ontologies within a flat file that may be easily published on the WWW for exchange among a set of application developers. However, there are high-level ontologies that are based not only on frames but also on other, logic-based representation models, for example the TAMBIS ontology, which is based on DL. XOL is restricted to a frame view of ontology, and excludes logic-based ontologies.
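The idea of encoding both class definitions (schema) and individuals (ground facts) in one XML file can be sketched with Python's standard `xml.etree.ElementTree` module. The element names used below (`ontology`, `class`, `subclass-of`, `individual`, `instance-of`) are chosen for illustration and have not been checked against the XOL specification, so this should be read as an XOL-like sketch rather than valid XOL.

```python
import xml.etree.ElementTree as ET


def encode(classes, individuals):
    """Serialize (name, parent) classes and (name, class) individuals as XML."""
    root = ET.Element("ontology")
    for name, parent in classes:
        cls = ET.SubElement(root, "class")
        ET.SubElement(cls, "name").text = name
        if parent:
            ET.SubElement(cls, "subclass-of").text = parent
    for name, class_name in individuals:
        ind = ET.SubElement(root, "individual")
        ET.SubElement(ind, "name").text = name
        ET.SubElement(ind, "instance-of").text = class_name
    return ET.tostring(root, encoding="unicode")


# Schema information (classes) and a ground fact (one individual).
xml_text = encode(
    classes=[("Biomolecule", None), ("Protein", "Biomolecule")],
    individuals=[("protein_LARD", "Protein")],
)

# Any XML-aware consumer can read the file back without a custom parser.
parsed = ET.fromstring(xml_text)
class_names = [c.findtext("name") for c in parsed.findall("class")]
```

The round trip at the end illustrates the practical argument for an XML syntax: the receiving application needs only a generic XML parser to recover the frames.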
[Figure 3 diagram: the languages OML/CKML, OIL, XOL and Ontolingua placed by semantics (conceptual graph, frame representation, description logic) and by syntax (XML, RDF, LISP).]
Figure 3. Diagram of the semantic and syntax components of the ontology languages
An international consortium of European and US researchers has developed the Ontology Inference Layer, OIL (also known as the Ontology Interchange Language), a proposal for a web-based representation of ontologies that combines the widely used modeling primitives of frame-based languages with the formal semantics and reasoning services provided by DLs. It is also compatible with RDF Schema [30]. OIL is closely related to XOL and can be seen as an extension of it that is more suitable for capturing ontologies defined using a logic-based approach. OIL unifies three important aspects provided by different communities (see fig. 5): formal semantics and efficient reasoning support as provided by DL, rich modeling primitives as provided by the frame community, and a standard proposal for syntactical exchange notations as provided by the Web community.
OIL has currently reached the state of a decidable core language. It still has certain limitations, though it is intended to be extensible. OIL is now being suggested as the next-generation RDF and as a standard for ontology representation in e-Commerce [30].
4. Bio-ontologies
The use of ontologies within bioinformatics is relatively recent, and consequently the number of ontologies is limited. In this section, we present a review of some selected bio-ontologies.
4.1. TAMBIS
The Transparent Access to Multiple Bioinformatics Information Sources (TAMBIS) project is intended to provide users with maximum transparency when accessing diverse bioinformatics data sources, shielding users from the details of those sources [1, 38, 42]. For the end user it provides the illusion of a single query language, a single data model and a single data location. This is achieved by using the following principles.
− Conceptual representation of biological concepts and terminology, namely TAMBIS Ontology (TaO), against which the user can formulate queries;
− Mapping from terms of the conceptual representation onto terms in the external sources, which enables queries against databases based on different schemas.
Thus, TAMBIS provides a level of indirection between the user and the external sources. The focus of the project has been to demonstrate that an intelligent retrieval and integration system can be developed using an ontology-based approach. The data sources that can be queried by TAMBIS include Swiss-Prot, CATH, Prosite, Enzyme, and BLAST databases.
As well as the TAMBIS Ontology itself, the complete TAMBIS system that implements this ontology is of interest since the ideas of database integration via ontology may be used for further research.
Architecture
TAMBIS has five major components organized into a classical three-layer mediator/wrapper architecture (Fig. 4). The presentation layer uses an intelligent user interface, in which the user combines concepts from the knowledge base to form declarative, source-independent queries. This layer consists of a knowledge-driven query formulation interface. The mediation layer identifies the appropriate sources to satisfy a query and rewrites the query into an ordered list of source-dependent procedures. The mediation layer also manages the query translation process, transforming a query expression from a declarative conceptual one into an ordered collection of source-specific calls. The wrapper layer manages the external sources, which are wrapped to provide a common interface that affords communication, format and network transparency. It includes a wrapper service for retrieving data from the external sources. In addition, there is a terminology server for handling the TaO, which is linked to both the presentation and mediation layers.
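The division of labour between the three layers can be sketched in a few lines of Python. This is a minimal illustration of a generic mediator/wrapper pattern, not the TAMBIS implementation: the class names, the source names `source_a` and `source_b`, the source terms and the records are all hypothetical, and the in-memory lists stand in for remote databases.

```python
class Wrapper:
    """Wrapper layer: hides how one external source is accessed."""
    def __init__(self, name, records):
        self.name = name
        self._records = records  # stand-in for a remote database

    def fetch(self, source_term):
        return [r for r in self._records if r["kind"] == source_term]

class Mediator:
    """Mediation layer: picks sources and rewrites a conceptual query."""
    def __init__(self, wrappers, term_mapping):
        self.wrappers = wrappers          # source name -> Wrapper
        self.term_mapping = term_mapping  # concept -> [(source name, source term)]

    def plan(self, concept):
        # Rewrite the declarative query into an ordered list of source calls.
        return list(self.term_mapping.get(concept, []))

    def execute(self, concept):
        results = []
        for source_name, source_term in self.plan(concept):
            results.extend(self.wrappers[source_name].fetch(source_term))
        return results

def ask(mediator, concept):
    """Presentation layer: the user names a concept, never a source."""
    return [r["id"] for r in mediator.execute(concept)]

wrappers = {
    "source_a": Wrapper("source_a", [{"id": "A1", "kind": "PROT"},
                                     {"id": "A2", "kind": "DNA"}]),
    "source_b": Wrapper("source_b", [{"id": "B7", "kind": "protein_entry"}]),
}
mapping = {"Protein": [("source_a", "PROT"), ("source_b", "protein_entry")]}
mediator = Mediator(wrappers, mapping)
```

Calling `ask(mediator, "Protein")` collects protein records from both sources although the two sources label them differently; the mapping from the concept to the source terms is the only place where that difference is visible.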
TAMBIS Ontology
The role of the TaO in TAMBIS [1] is
– to describe the knowledge that can be queried for, and the schemas of the underlying data sources;
– to link conceptual terms to their actual representations in the data sources;
– to mediate between equivalent or near equivalent concepts in the different data sources by means of the knowledge-base reasoning services;
– to guide the user in the formulation of biologically correct queries.
[Figure 4 diagram, components shown: User Interface; Terminology Server (TaO, Linguistic Model); Presentation layer (query formulation dialogs, ontology browsers, query formulation); Mediation layer (sources and services, CPL functions, costs and coercions, ontology function mapping, rewrite rules, query transformation); Wrapper layer (wrapper service, wrappers).]
Figure 4. TAMBIS architecture
The TaO is based on a DL. Since within a DL new concepts can be constructed from existing ones, the TaO is a dynamic ontology. This means that it can grow without the need for either conceptualizing or encoding new knowledge. The TaO uses ontology axioms to govern how concepts can be linked to other concepts via relations to form new concepts. Only biologically reasonable concepts can be formed, because the knowledge that defines the rules for composing new concepts is captured in constraints upon the relations.
The basic relations in the TaO include “is a component of”, “has name”, “has function”, and “is homologous to”.
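The idea that constraints on relations decide which new concepts may be formed can be illustrated with a small Python sketch. It is a deliberate simplification: a DL reasoner such as FaCT classifies concept expressions, whereas this sketch only checks a domain and a range for each relation. The concept names, the subsumption links and the constraints are hypothetical and are not taken from the TaO.

```python
# Hypothetical subsumption links between base concepts (child -> parent).
PARENTS = {"Enzyme": "Protein", "Protein": "Biomolecule", "Motif": "Structure"}

# Hypothetical constraints on relations: (domain concept, range concept).
RELATIONS = {
    "has function": ("Protein", "Function"),
    "is a component of": ("Motif", "Protein"),
}

def is_a(concept, ancestor):
    """True if `concept` equals `ancestor` or lies below it in PARENTS."""
    while concept is not None:
        if concept == ancestor:
            return True
        concept = PARENTS.get(concept)
    return False

def compose(base, relation, filler):
    """Form the concept 'base <relation> filler' if the constraints allow it."""
    domain, range_ = RELATIONS[relation]
    if not is_a(base, domain) or not is_a(filler, range_):
        raise ValueError(f"{base} {relation} {filler} is not a sensible concept")
    return (base, relation, filler)
```

With these assumed constraints, `compose("Motif", "is a component of", "Enzyme")` succeeds because an enzyme is a protein, while `compose("Function", "is a component of", "Protein")` is rejected. No new concept had to be encoded by hand in either case.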
The previous version of the TaO was expressed in the GRAIL language [33]. The current version has also been represented using the more powerful and optimized tool FaCT [16].
The TaO can be divided into high-level and low-level parts. The high-level divisions are taken from the models developed in the GALEN project and described in [33]. This general foundation has been extended in TAMBIS with the lower-level concepts necessary to represent users' descriptions in the biological domain (see fig. 5).
The reusability of the TaO rests on the following facts. The TaO uses a representation language based on a DL, and it has a partly modular structure that allows modules to be reused in other applications without knowing their internal structure in detail. The TaO currently contains around 1500 biological concepts and their relations, and is capable of inferring many more by virtue of its knowledge representation scheme. Its coverage includes proteins and their components, motifs, protein structures, enzyme functions, enzyme and metabolic pathways, expressed sequence tags, nucleic acids, their component motifs, gene function and expression, sequence homology, and taxonomy of species.
[Figure 5 diagram: concept hierarchy under TopThing. Concept labels include DomainCategory, DomainAttribute, PhenomenonModifier, ValueType, GeneralisedStructure, GeneralisedSubstance, GeneralisedProcess, GeneralisedFunction, Aspect, Collection, Unit, Feature, Selector and State, together with biological concepts such as Biological Function, Biological Process, Body Structure, Cellular Substance, Abstract Structure and Physical Structure. Role (relation) labels include CollectionAttribute, FunctionalAttribute, LocativeAttribute, StructuralAttribute, SelectorAttribute, MultipleAttribute and PartitiveAttribute.]
Figure 5. TaO concept hierarchy (the grayed rectangles indicate the high-level concepts, the transparent rectangles indicate the low-level concepts)
4.2. Gene Ontology
The intention of the Gene Ontology Consortium [12, 13] is to create a shared biological resource that enables the community to describe gene products (footnote 1) using a common vocabulary and semantics. The GO consortium was initiated in 1998 and is currently a collaboration among five database projects (FlyBase, MGI, SGD, TAIR, and WormDB); the GO covers over 5000 concepts. The GO is not intended to deal with the whole of the molecular biology knowledge captured in the community databases. Instead it captures information about the role of gene products within an organism, because knowledge of the biological role of proteins in one organism can often be transferred to other organisms [5]. On this basis, the GO provides controlled vocabularies organized as three independent ontologies: “molecular function”, “biological process”, and “cellular component” of a gene product. The names of these ontologies represent the corresponding attributes of the gene product, thus in principle enabling uniform querying of the collaborating databases for information on gene products. These attributes are defined as follows:
Molecular function describes the tasks performed by individual gene products. It is a capability that a physical gene product (or gene product group) carries as a potential. It describes only what the gene product can do, without specifying where or when this usage actually occurs. Examples of broad functional terms are “enzyme”, “transporter” or “ligand”.
Biological process is accomplished via one or more ordered assemblies of molecular functions. It often involves transformation, in the sense that something goes into a process and something different comes out of it. Examples of broad biological process terms are "cell growth and maintenance" or "signal transduction". A biological process is not equivalent to a pathway, and the GO does not capture any of the dynamics or dependencies that would be required to describe a pathway.
Footnote 1: A gene product is a biochemical material, either RNA or protein, resulting from the expression of a gene.
Cellular component encompasses sub-cellular structures, locations, and macromolecular complexes, such as the nucleus, ribosome, and proteasome.
A gene product has one or more molecular functions and is used in one or more biological processes. It may be associated with one or more cellular components. Thus the relations between the gene product and molecular functions, biological processes and cellular components are all many-to-many.
[Figure 6 diagram: a fragment of the cellular component ontology with the terms Cellular component, Cell wall, Cell wall (sensu Bacteria), Extracellular, Membrane, Cell wall inner membrane and Type II protein (sec) secretion system complex [GO:0015627], linked by “is a” and “part of” relationships.]
Figure 6. Sample GO inheritance scheme
The ontologies of molecular function, biological process and cellular component are represented as directed acyclic graphs (DAGs), or networks. A DAG allows multiple inheritance: a child term may be linked to a parent by the “is-a” relationship, when the child is an "instance" of its parent, or by the “part-of” relationship, when the child is a component of its parent term. A child term may have relationships of different classes with its different parents (fig. 6). Nevertheless, most relationships in the GO are of the “is-a” kind and mainly implement single inheritance. Thus, the GO ontologies are built in the form of taxonomies.
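A DAG with typed edges is easy to sketch in Python. In the minimal example below each term maps to a list of (relationship, parent) pairs, and a traversal collects all ancestors of a term. The term names and edges are hypothetical and do not reproduce the real GO content or the exact structure of fig. 6.

```python
# Each term maps to a list of (relationship, parent) pairs. A term may have
# several parents, which makes the structure a DAG rather than a tree.
# The edges are hypothetical and do not reproduce real GO content.
EDGES = {
    "cellular component": [],
    "cell wall": [("is-a", "cellular component")],
    "membrane": [("is-a", "cellular component")],
    "inner membrane": [("is-a", "membrane"), ("part-of", "cell wall")],
    "secretion complex": [("part-of", "inner membrane")],
}

def ancestors(term, relations=("is-a", "part-of")):
    """Collect every term reachable upwards from `term` via the given relationships."""
    found = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for rel, parent in EDGES[current]:
            if rel in relations and parent not in found:
                found.add(parent)
                stack.append(parent)
    return found
```

Here "inner membrane" has two parents with relationships of different classes. Restricting the traversal to `relations=("is-a",)` gives the purely taxonomic view, which is the dominant one in the GO according to the text above.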
The GO has a well-detailed conceptualization level but lacks any upper-level ontology organization. For instance, it contains 9 hierarchy levels in the molecular function ontology and 12 hierarchy levels in the biological process ontology [34], but it does not contain any terms above the level of the molecular function, biological process or cellular component concepts.
The GO is represented in the form of text or XML files. In principle this enables the use of the GO by external databases that were not initially included in the GO project. An external database may collaborate with the GO by making cross-links between the GO terms and the objects of the database (typically gene products, or their surrogates, genes). It can also support queries that use the GO terms, or it can contribute to the development of the GO by expanding the vocabularies and refining the terms. The collaborating databases are provided with links to the GO files. The number of links from different databases is given in table 1 (as of 19.02.2001).
Table 1. The number of links to GO from SGD, FlyBase, and MGI databases.
4.3. MBO
The Ontology for Molecular Biology (MBO) is intended to provide clarity and communication within the molecular biology database community. It was one of the first attempts to create an ontology as “a mean to provide a semantic repository to semantically order relevant concepts in molecular biology and to bridge the different notations in various databases by explicitly specifying the meaning of and relation between fundamental concepts in an application domain” [35]. This means that either the different databases would agree on the common MBO definition (and their annotation would be changed accordingly) or the differences between each database's conceptualization could be mapped in terms of the MBO.
The MBO contains concepts and relationships that are required to describe biological objects, experimental procedures and computational aspects of molecular biology. It captures a very wide range of biological concepts and contains over 1200 nodes.
The MBO has an upper-level organizing ontology (Fig. 7) and also includes upper-level common-sense ontologies such as µKosmos and Cyc.
[Figure 7 diagram: upper-level hierarchy under Being. Node labels include Object, Event, Individual object, Property, Abstract object, Physical object, Mental object, Worldly object, Energy, Matter, Attribute, Relation, Identifier, Descriptor, Primary property, Secondary property, Occurrence, Time, Abstract event, Physical event, Mental event, Worldly event, Human activity, Natural process, Future and Past. Further labels shown: Temporal extent, Self-contentment, Physicality, Mentality, Mass content, Arity, Information content, Objectivity, Activity content, Human cause and Direction.]
Figure 7. Upper-level MBO ontology
                                SGD     FlyBase   MGI
Biological Process              5,603   624       3,418
Molecular Function              5,710   5,277     4,529
Cellular Component              2,206   479       3,477
Total Gene Products Associated  6,312   5,413     5,518
The MBO also includes a number of biologically specific ontologies, e.g. for genes, reactions, pathways, and compounds. An example, the pathway ontology, is shown in fig. 8. However, the upper-level organizing ontologies do not have direct relationships to the biological concepts and processes of the biologically specific ontologies.
Although the MBO captures a wide range of biological concepts in its ontologies, these ontologies mostly do not intersect with each other and have different levels of detail; for example, the compound ontology goes down to instances, whereas the pathway ontology ends at quite coarse-grained concepts. The MBO can be considered a taxonomy: the primary relationship used is the “is a” relationship, and the concepts in the MBO are in general given no attributes.
[Figure 8 diagram: hierarchy under Pathways. Node labels include Biosynthesis, Amino-Acid-Biosynthesis, Amino-Acid-Family-Syn, Ind-Amino-Acid-Syn, Carbo-Biosynthesis, Cell-Structure-Biosynthesis, Murein-Biosynthesis, Surface-Structure-Biosynthesis, Cofactor-Biosynthesis, Lipid-Biosynthesis, Nucleotid-Biosynthesis, Deoxyribonucleotide-Biosynthesis, Pur-And-Pyr-Syn, Ribonucleotide-Biosynthesis, Degradation, Amino-Acid-Degradation, Carbon-Degradation, Fatty-Acid-Degradation, Other-Degradation, Phosphorous-Compounds, Energy-Metabolism, Intermediary-Metabolism, Central-Metabolism, Nucleotide-Metabolism, Nitrogen-Metabolism and Sulfur-Metabolism.]
Figure 8. Pathways ontology in MBO
In the MBO project, a Java Ontology Browser and an Ontology Editor [27, 36] have been developed.
4.4. EcoCyc
The EcoCyc ontology underlies an organism-specific Pathway/Genome Database that describes the metabolic and signal transduction pathways of Escherichia coli, its enzymes and its transport proteins. The EcoCyc DB describes the known genes of Escherichia coli, the enzymes of small-molecule metabolism that are encoded by these genes, the reactions catalyzed by each enzyme, and the organization of these reactions into metabolic pathways [21, 23].
The ontology is employed to encode the functions of metabolic enzymes, signal-transduction proteins, transporters, and DNA-binding repressor proteins. EcoCyc uses a frame-based language for the ontology representation and for encoding its data. The frames are arranged in a class hierarchy, shown in fig. 9 [23].
To describe all of the distinct molecular species in a pathway, frames for every species have been created in the EcoCyc DB. There is also a frame for every substrate and for every enzyme in a pathway. Every reaction is represented as a distinct frame. Such separation of the representation of a biological entity from the representation of its function has a number of advantages: there is a many-to-many mapping between entities and functions, the representation is more normalized and therefore less redundant, etc. [24].
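The benefit of keeping entity frames and function frames apart can be sketched as follows. This is a minimal Python illustration under stated assumptions: the `Frame` class, the frame names, the slot values and the `catalyzes` link set are hypothetical and do not reproduce the EcoCyc schema; only the slot names `left` and `right` are borrowed from the slots of the class Reactions in fig. 9.

```python
from dataclasses import dataclass, field

@dataclass
class Frame:
    """A frame: a named object of some class with slot values."""
    name: str
    cls: str
    slots: dict = field(default_factory=dict)

# Entities and functions live in separate frames (hypothetical content).
frames = {
    "enzyme-X": Frame("enzyme-X", "Proteins"),
    "enzyme-Y": Frame("enzyme-Y", "Proteins"),
    "rxn-1": Frame("rxn-1", "Reactions", {"left": ["A"], "right": ["B"]}),
    "rxn-2": Frame("rxn-2", "Reactions", {"left": ["B"], "right": ["C"]}),
}

# The many-to-many link is kept as a set of (enzyme, reaction) pairs,
# so neither kind of frame has to duplicate the other's data.
catalyzes = {("enzyme-X", "rxn-1"), ("enzyme-X", "rxn-2"), ("enzyme-Y", "rxn-2")}

def reactions_of(enzyme):
    """All reactions linked to one enzyme frame."""
    return sorted(r for e, r in catalyzes if e == enzyme)

def enzymes_of(reaction):
    """All enzymes linked to one reaction frame."""
    return sorted(e for e, r in catalyzes if r == reaction)
```

One enzyme is linked to two reactions and one reaction to two enzymes, yet each reaction's substrates are stored exactly once. This is the normalization advantage mentioned above.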
EcoCyc is a single-organism DB intended to capture the genome and the full biochemical network of Escherichia coli. Currently EcoCyc includes 139 metabolic and 20 signaling pathways, 946 reactions, 629 enzymes, and 4390 genes.
[Figure 9 diagram: class hierarchy under Thing. Class labels include Chemicals, Organisms, Generalized-Reactions, Enzymatic-Reactions, Pathways, Reactions, Elements, Macromolecules, Polynucleotides, DNA, Genetic-Elements, Chromosomes, Plasmids, DNA-Segments, Genes, Promoters, DNA-Binding-Sites, Operons, RNA, tRNAs, Charged-tRNAs, Small-Molecules, Amino-Acids, Coenzymes, Ions, Anions, Cations, Porphyrins, Vitamins, Complexes, Proteins, Protein-Complexes and Polypeptides. Object labels include icdA, NADP, NADPH, proton, isocitrate, 2-oxoglutarate, carbon-dioxide and isocithase-cplx.
Slots of the class Reactions with their value types: left (Chemicals), right (Chemicals), substrates (Chemicals), spontaneous? (boolean), ec-number (string), deltag0 (number), keq (number), enzymatic-reaction (Enzym.-Reactions), in-pathway (Pathways), species-distribution (string). Inverse slot names shown: reaction, reaction-list.]
Figure 9. EcoCyc ontology class hierarchy: classes are marked in bold and placed in rectangular boxes, while objects are given in normal text without an outline. The slots of the class Reactions are also shown.
4.5. Cell signaling ontology
Recently the Human Genome Center of Tokyo University started to develop the Cell signaling ontology (SIGNAL-ONTOLOGY) [6, 41]. This ontology is based on knowledge from the database for cell signaling networks (CSNDB) [40]. The purpose of the project is to extract the common nature of cell signaling in the model species and to find "what the cell signaling is" and "how we can reconstruct the cell signaling system in the computer". SIGNAL-ONTOLOGY is created to be used as a controlled vocabulary of the cell signaling system and also as a common reference to which database developers can refer.
The authors consider the cell signaling ontology as a kind of object-oriented database system. The ontology features a flow diagram of signal transduction and a conceptual hierarchy of the biochemical attributes of signaling molecules. These two aspects are integrated into the object-oriented model as the method and the type of the object. Currently the ontology provides the following conceptual classes.
Signal Module is a signal processing class of the eukaryote model species. Every signaling cascade can be reconstructed from a set of Signal Module instances through the messages passed between Input and Output Signals (see fig. 10).
Reaction is a class representing information on the biochemical reactions that transfer biological signals. It includes molecular interaction motifs, effects of the signals, components of the reactions, and properties of biological signals.
[Figure 10 diagram: a signaling cascade between the cell membrane and the cell nucleus, drawn as a chain of signal modules, each with an input and an output signal. Module labels include: ligand -> G-protein-coupled receptor -> G-protein switching; G-protein switching -> kinase; kinase cascade; conformation change and release of signal peptide -> nuclear localisation; ligand -> transmembrane signaling -> phosphorylation -> clustering; phospholipid second messenger; second messenger -> kinase; target gene expression. Legend: input signal, output signal, pointer to “Molecular function” concept, pointer to “Cellular function” concept.]
Figure 10. Signaling cascade, consisting of a set of Signaling Modules in SIGNAL-ONTOLOGY
The Molecular Function class is intended to store the information on biochemical properties of molecules, which relate to the cell signaling.
Cellular Function is a class for the representation of the biological phenomenon that a Signal Module contributes to. This is a biological response performed by a series of Molecular Functions.
The following classes are also defined: Tissue, for a set of tissues where biological processes take place; Cell, to represent the structural components of the cell hierarchically; and Molecule, to represent a list of molecule types. Every class represents a hierarchy of corresponding concepts. Pointers between components of the conceptual classes have been introduced to interrelate them. A Signal Module has pointers to Molecular Functions, every Molecular Function component has pointers to genes and proteins, and Cellular Function also has pointers to the Signal Module concept, genes and proteins. The pointing information is written in the link table. Currently, the ontology classes, except the Signal Module class, are available on the WWW in the form of HTML and XML documents.
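The reconstruction of a cascade from Signal Module instances, and the pointer from a module to a Molecular Function concept, can be sketched in Python as follows. This is an illustration of the idea only, not the SIGNAL-ONTOLOGY data model: the class layout, the signal names and the function labels are hypothetical, and message passing is reduced to matching one module's output signal to the next module's input signal.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SignalModule:
    name: str
    input_signal: str
    output_signal: str
    molecular_function: str  # pointer into the Molecular Function hierarchy

def build_cascade(modules, first_input):
    """Order modules so that each one's input is the previous one's output."""
    by_input = {m.input_signal: m for m in modules}
    cascade, signal = [], first_input
    while signal in by_input:
        module = by_input.pop(signal)  # pop guards against cycles
        cascade.append(module)
        signal = module.output_signal
    return cascade

# Hypothetical modules, deliberately listed out of order.
modules = [
    SignalModule("kinase cascade", "kinase active", "gene expression", "kinase"),
    SignalModule("receptor step", "ligand", "G-protein switched", "receptor"),
    SignalModule("G-protein step", "G-protein switched", "kinase active", "G-protein"),
]
cascade = build_cascade(modules, "ligand")
```

Starting from the signal "ligand", the three modules are chained in the order receptor step, G-protein step, kinase cascade, regardless of the order in which they were stored.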
The SIGNAL-ONTOLOGY does not contain any upper-level ontology organization. In many cases it captures the same concepts as the GO but has far fewer levels of hierarchy.
6. Discussion
6.1. Comparison of bio-ontologies and conclusions
The bio-ontologies described above are not the only ones that deal with bio-molecular data. Existing ontologies, in general, have many more differences than similarities. They differ in their intention, structure, coverage, and detail level. The following characteristics may be used for the ontology comparison:
(i) Application scenarios, i.e. the ways ontologies are used by applications.
(ii) Although all the ontologies considered are intended for molecular biology, they cover different parts of this domain and have different detail levels. In this respect, an ontology may consist of one or both of the following components:
– Domain-oriented component, which includes domain specific component (e.g. genes, processes specific for one particular organism) and domain generalization components (e.g. gene function, gene structure);
– Generic component, which captures common high-level concepts, such as Thing, Physical, Abstract, Structure etc. This component can be especially useful for the ontology reuse, as it allows concepts to be correctly or more reliably placed (e.g. concept Process can be a parent for such concepts as body process, cellular process, and chemical process).
Even the ontologies that cover the same parts of the same domain can differ in their detail level, which determines how deep and wide they capture the lower level concepts (e.g. different types of proteins, enzymatic reactions, cellular processes).
(iii) One of the essential ontology characteristics is its conceptual representation. Different kinds of ontology representation have been described above, in section 3.1.
(iv) As additional points for characterizing ontologies we consider the tasks that ontology-based applications perform, and the physical ontology representation, i.e. storage together with the means to access the ontology.
Table 2 summarizes the content, structure and representation of the surveyed bio-ontologies (partly it is based on the results presented by Stevens in [39]).
Most of the ontologies share some core concepts of molecular biology, such as Gene, Protein and the related Biological Functions and Biological Processes. However, they differ extensively in both the content and the notation of their knowledge. This is primarily due to the wide range of tasks to which the ontologies are applied. In general, these ontologies fit the demands of a particular application quite well, but their use within other applications requires a lot of additional effort or in some cases seems to be impossible. For instance, the EcoCyc ontology has been used to support the DB schema, but its direct use as a controlled vocabulary for DB annotation seems inconvenient.
Three ontologies, the GO, the MBO and the Signal Ontology, have been developed for almost the same purpose: to be used as a controlled vocabulary and/or community reference providing a bridge between different notations and thus reducing the “communication problem”. These ontologies capture different but partially intersecting domains, describing them with different notations and classifications while remaining consistent and correct. This makes their integration or the definition of mappings between them difficult.
The GO, being used for database annotation, contains a fine level of detail, whereas the TaO is quite shallow, but due to its DL representation it can be easily extended to provide more complex and detailed concepts.
All the above allows us to draw the following conclusions about the current state of the art in bio-ontologies:
– The bio-ontologies have been developed to capture different, sometimes intersecting knowledge domains and for different purposes: common access ontology-based search, DB schema, controlled vocabulary for DB annotation etc. In general, none of the existing ontologies can be substituted by another existing ontology.
– Conceptualizations of the same domain may differ without providing incorrect knowledge.
– Currently there is no ontology that captures the whole range of concepts in the molecular biology domain.
– Applications use only a specific, narrow part of the knowledge; thus they would use only subsets of a single global ontology if the latter were ever created. It is therefore more important to develop a set of comprehensive and detailed ontologies for different domains than to create a global bio-ontology.
– There is a lack of reusability in most of the existing bio-ontologies because they were built from scratch, at a time when there were no ontologies in the biological domain on which they could be based.
– Although certain progress in the use of ontologies for the biological domain has been achieved, a number of challenging problems still have to be solved.
6.2. Open problems
From all the above, the following open problems in the field of bio-ontologies can be derived. Possible ways to solve them, and how the existing bio-ontologies can contribute, are also discussed.
(i) Integration of heterogeneous biological resources.
The use of ontologies can help to overcome interoperability problems. In order to achieve interoperability, many ontology-based approaches to information integration have been developed in different fields [43]. In bioinformatics this problem still remains open. On the one hand, a biologist needs to be able to analyze a wide range of data and to pose complex queries over different resources [14, 26]. On the other hand, existing biological databases are encoded in different and incompatible formats; they have different data models, from flat files to object-oriented databases. There are also no naming conventions shared between databases. Pathway databases usually present their data in the form of images, and it is difficult to make links between proteins on the pathway diagram and genes for which accession numbers are given. Most pathway databases do not store references to known genes in the databases, and hence the name given in the pathway database might not resemble any name in the sequence databases.
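Table 2. The summary of the content, structure and representation of the surveyed bio-ontologies

TaO
  Application scenario: ontology-based search
  Domain-oriented component: proteins, enzymes, motifs, secondary and tertiary structure, functions and processes, sub-cellular structure and chemicals
  Generic component: yes
  Detail level: high (due to possibility of dynamic extension)
  Conceptual representation: DL
  Host application tasks: ontology-based user interface, terminology server
  Storage and access: storage - ?; query processor (Java)
  Development years: 1995-1998

GO
  Application scenario: controlled vocabulary for DB annotation
  Domain-oriented component: Drosophila, mouse and yeast gene and gene product function, process and cellular location
  Generic component: no
  Detail level: high
  Conceptual representation: taxonomies
  Host application tasks: none
  Storage and access: text, XML files; Java browser
  Development years: 1998 -

EcoCyc
  Application scenario: DB schema
  Domain-oriented component: E. coli genes, metabolism, regulation, signal transduction and metabolic pathways
  Generic component: yes
  Detail level: high
  Conceptual representation: frames
  Host application tasks: visualization of biochemical reactions and layout of genes with chromosomes
  Storage and access: storage - DB; query processor (LISP)
  Development years: 1997-1999

MBO
  Application scenario: community reference
  Domain-oriented component: genes, pathways, reactions (shallow)
  Generic component: yes
  Detail level: low
  Conceptual representation: taxonomies
  Host application tasks: none
  Storage and access: Java browser
  Development years: 1997-1998

Signal ontology
  Application scenario: controlled vocabulary, community reference
  Domain-oriented component: molecular function, cellular function, reaction (shallow)
  Generic component: no
  Detail level: low
  Conceptual representation: taxonomies (current version)
  Host application tasks: none
  Storage and access: text, XML files
  Development years: 2000 -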
The TAMBIS project [42] was one of the first attempts to integrate several different biological resources by supporting uniform queries across such sources. TAMBIS manages the heterogeneity through the mapping between its ontology and the real resources. However, it seems that the TaO cannot be used as a semantic repository for the community, since the TaO is built into the specific application for which it was initially intended, and it captures a wide but shallow range of biological concepts. The GO could be used as a common semantic repository or controlled vocabulary. However, it has not yet been used by applications for integration, although it provides a high level of detail and captures important fields of molecular biology, such as molecular function and biological process.
Thus, ontology-based approaches to biological resource integration have to be developed. This can be done by creating new bio-ontologies, by reusing and integrating existing ones, and by extending and adopting appropriate integration approaches from other fields.
(ii) Integration and reuse of bio-ontologies.
Currently there are only a few reusable bio-ontologies. This is partially because of the diversity of their representation forms, the explicitness of their semantics and the range of applications they address. Moreover, there are still no approaches for the integration of bio-ontologies. However, it is obvious that when developing a new application for the integration of biological data for different tasks (such as data warehousing or mediator systems for querying distributed heterogeneous sources), the ontology underlying such an application should not be designed from scratch; rather, it should integrate all or some modules of existing ontologies, since ontology building is a high-cost process. To this end, approaches for ontology integration and exchange, unified languages for ontology representation (DAML+OIL [7] being very promising here), and semantic vocabularies and catalogues for different domains of biology should be developed. All this requires very close collaboration between people from the biology and computer science communities.
(iii) Ontology-based annotation.
Functional annotation of genes and gene products, that is, the association of functional data with a gene product (sequence annotation), is one of the key tasks in bioinformatics. Currently, functional classification schemes are just simple hierarchies, which start by defining a function in very general terms and become increasingly specific as one progresses down the hierarchy. In fact, functional classes of genes do not form a strict tree-like hierarchy, since many genes have multiple functions; rather, their class structure forms a directed acyclic graph. Hence the most effective functional scheme could be a multi-dimensional one, which allows accurate positioning of gene products in the functional space [34].
The use of an ontology as the means of sequence annotation would allow consistent and rigorous annotation. A newly submitted sequence is described in terms taken from the ontology and is hence correctly classified into a hierarchy. Ontologies would therefore allow more effective information retrieval and analysis (e.g. a sequence comparison for discovering the functions of a new sequence). GO could be considered a representative of the “next generation” of functional classification schemes. At present, however, there is a big gap between the simple low-level tree-like classification schemes used in existing databases and GO itself. Tools for mapping between them are therefore needed.
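What such a mapping tool has to do at its simplest can be sketched as follows. This is only an illustration under stated assumptions: the legacy category paths, the `ONT:` identifiers and the function name `map_annotations` are all hypothetical, and the identifiers are placeholders rather than real GO accessions.

```python
# Hypothetical mapping from legacy tree-style category paths to ontology
# terms. The "ONT:" identifiers are placeholders, not real GO accessions.
LEGACY_TO_ONTOLOGY = {
    "metabolism/enzymes": "ONT:0001",
    "transport/ion channels": "ONT:0002",
}


def map_annotations(records, mapping=LEGACY_TO_ONTOLOGY):
    """Translate (gene, legacy category) pairs into ontology terms.

    Returns a dict from gene to its set of ontology terms, plus a list of
    the pairs whose legacy category has no mapping and needs curation.
    """
    mapped = {}
    unmapped = []
    for gene, category in records:
        term = mapping.get(category)
        if term is None:
            unmapped.append((gene, category))
        else:
            mapped.setdefault(gene, set()).add(term)
    return mapped, unmapped
```

Keeping the unmapped pairs separate, rather than silently dropping them, reflects the gap described above: categories without an ontology counterpart are exactly the cases a curator would need to review.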
Additionally, bio-ontologies face the same problems as ontologies in general, namely the creation of ontology development tools (editors) and ontology libraries, and the development of methodologies supporting the development and use of ontologies.
Acknowledgements: The author gratefully acknowledges Prof. Rahm and Do Hong Hai for helpful discussions, Borys Omelayenko (Division of Mathematics and Computer Science, Vrije Universiteit Amsterdam) for his useful comments, and the Graduiertenkolleg “Wissensrepresentation” (DFG) for financial support in carrying out this research.
References
1. P.G. Baker, C.A. Goble, S. Bechhofer, N.W. Paton, R. Stevens, and A. Brass. An Ontology for Bioinformatics Applications. Bioinformatics, 15(6): 510-520 (1999).
2. The Molecular Biology Ontology Working Group WWW resources: http://smi-web.stanford.edu/projects/bio-ontology/
3. A. Borgida. Description Logics in Data Management. IEEE Trans. Knowledge and Data Engineering, 7(5): 671-682 (1995).
4. CBIL's Controlled Vocabularies: http://www.cbil.upenn.edu/anatomy.php3
5. M. Cherry. A Report on the Status of the Gene Ontology Consortium. ISMB (2000).
6. CSO ontology WWW resources: http://ontology.ims.u-tokyo.ac.jp/signalontology/
7. DAML+OIL: www.daml.org
8. Enzyme Nomenclature: http://www.chem.qmw.ac.uk/iubmb/enzyme/
9. D. Fensel et al. OIL in a nutshell. In: Knowledge Acquisition, Modeling, and Management, Proceedings of the European Knowledge Acquisition Conference (EKAW-2000), R. Dieng et al. (eds.), LNAI, Springer-Verlag (2000).
10. D. Fensel. Ontologies: Silver Bullet for Knowledge Management and Electronic Commerce. Springer-Verlag, Berlin (2001).
11. W. M. Gelbart. Databases in genomic research. Science, 282(5389): 659-661 (1998).
12. Gene Ontology WWW resources: http://www.geneontology.org
13. Gene Ontology Consortium. Gene Ontology: Tool for the Unification of Biology. Nature Genetics, 25: 25-29 (2000).
14. M. Gerstein. Integrative database analysis in structural genomics. Nat. Struct. Biol., 7 Suppl: 960-963 (2000).
15. T. R. Gruber. Towards Principles for the Design of Ontologies used for Knowledge Sharing. International Journal of Human-Computer Studies, 43: 907-928 (1995).
16. I. Horrocks. Using an expressive description logic: FaCT or fiction? In A. G. Cohn, L. Schubert, and S. C. Shapiro, editors, Principles of Knowledge Representation and Reasoning: Proceedings of the Sixth International Conference (KR'98), pages 636-647. Morgan Kaufmann Publishers, San Francisco, California (June 1998).
17. Hasan M. Jamil. Achieving Interoperability of Genome Databases Through Intelligent Web Mediator. In: Proceedings of the IEEE International Symposium on Bio-Informatics and Biomedical Engineering (BIBE 2000), November 8-10, 2000, Washington DC, USA.
18. HUGO Gene Nomenclature Committee: http://www.gene.ucl.ac.uk/nomenclature/
19. R. Jasper, M. Uschold. A Framework for Understanding and Classifying Ontology Applications. KAW'99 (1999). Published on-line: http://sern.ucalgary.ca/KSI/KAW/KAW99
20. P. Karp, N. Abernethy et al. An Evaluation of Ontology Exchange Languages for Bioinformatics, Robin McEntire (1999): http://smi-web.stanford.edu/projects/bio-ontology/
21. P.D. Karp, M. Riley, M. Saier, I.T. Paulsen, S. Paley, A. Pellegrini-Toole. The EcoCyc and MetaCyc Databases. Nucleic Acids Research 28(1):56-59 (2000).
22. P. Karp, V. K. Chaudhri, and J. Thomere. XOL: An XML-Based Ontology Exchange Language (1999): http://smi-web.stanford.edu/projects/bio-ontology/
23. P.D. Karp. An Ontology for Biological Function Based on Molecular Interactions. Bioinformatics, 16(3): 269-285 (2000).
24. P.D. Karp. EcoCyc: The Resource and the Lessons Learned. In Bioinformatics Databases and Systems, S. Letovsky, ed., Kluwer Academic Publishers, 47-62 (1999).
25. U. Leser. Designing a Global Information Resource for Molecular biology,
26. N. Luscombe, D. Greenbaum, M. Gerstein. What is bioinformatics? An introduction and overview. IMIA (2001, in press).
27. MBO Java ontology browser: http://igd.rz-berlin.mpg.de/~www/oe/mbo.html
28. The NCBI Taxonomy Homepage: http://www.ncbi.nlm.nih.gov/Taxonomy/taxonomyhome.html/
29. Open Knowledge Base Connectivity Standard WWW resources: http://www.ai.sri.com/~okbc/
30. Ontology Interchange Language (OIL) WWW resources: http://www.ontoknowledge.org/oil/
31. Ontology Markup Language (OML/CKML): http://www.ontologos.org/OML/
32. Ontolingua: http://www-ksl-svc.stanford.edu:5915/doc/frame-editor/index.html
33. Rector A.L., Bechhofer S., Goble C.A., Horrocks I., Nowlan W.A., and Solomon W.D. The GRAIL Concept Modelling Language for Medical Terminology. Journal of Artificial Intelligence in Medicine, Kluwer Publishing, Vol 9, 139-171 (1997).
34. Rison S., Hodgman T.C., Thornton J.M. Comparison of functional annotation schemes for genomes. Functional Integrative Genomics, Springer-Verlag, Vol 1, 56-59 (2000).
35. S. Schulze-Kremer. Ontologies for Molecular Biology. Proceedings of the Third Pacific Symposium on Biocomputing, Hawaii, World Scientific Publishers, Singapore, pp. 693-704 (1998).
36. S. Schulze-Kremer. Integrating and Exploiting Large-Scale, Heterogeneous and Autonomous Databases with an Ontology for Molecular Biology. In: Molecular Bioinformatics, Sequence Analysis - The Human Genome Project (R. Hofestaedt and H. Lim eds). Shaker Verlag, Aachen, pp. 43-56 (1997).
37. SRS6: http://srs6.ebi.ac.uk/
38. R. Stevens, P. Baker, S. Bechhofer, G. Ng, A. Jacoby, N.W. Paton, C.A. Goble, and A. Brass. TAMBIS: Transparent Access to Multiple Bioinformatics Information Sources. Bioinformatics, 16(2): 184-186 (2000).
39. R. Stevens, C.A. Goble, and S. Bechhofer. Ontology-based Knowledge Representation for Bioinformatics. Briefings in Bioinformatics (2000).
40. T. Takai-Igarashi, Y. Nadaoka, and T. Kaminuma. A database for cell signaling networks. J. Comp. Biol., 5(4), 747 (1998).
41. T. Takai-Igarashi, T. Takagi. Cell Signaling Ontology. ISMB BioOntology Workshop, August 24, San Diego (2000).
42. TAMBIS Project WWW resources: http://img.cs.man.ac.uk/tambis/
43. H. Wache, T. Vögele, U. Visser, H. Stuckenschmidt, G. Schuster, H. Neumann and S. Hübner. Ontology-Based Integration of Information - A Survey of Existing Approaches. Submitted to IJCAI 2001 Workshop: Ontologies and Information Sharing.
44. Zdobnov E., Lopez R., Apweiler R., Etzold T. The EBI SRS server - recent developments. In: Proceedings of the German Conference on Bioinformatics (GCB'00), Bornberg-Bauer E., Rost U., Stoye J., Vingron M. (eds.), pp. 139-147, Logos Verlag, Berlin, Germany (2000).