Date posted: 10-Mar-2016
Uploaded by: raya-agency
• AVAILABLE ONTOLOGIES FOR YOUR BIOMEDICAL TEXT MINING: proteins, genes, chemicals, diseases, cell lines, general species, plants, anatomy, physiological effects, cosmetology, geopolitical regions, authors, relationships...
• WE CREATE CUSTOM ONTOLOGIES WITH OUR UNIQUE AND POWERFUL SOFTWARE TOOLS
Ontologies provide the basis for identifying concepts in text mining technologies. Subsequent extraction of facts and relationships between these
concepts enables data mining and provides the foundation for novel “in silico” knowledge discovery methods. OntoChem uses ontologies
to extract implicit, previously unknown and useful information from databases and document collections such as patents or scientific literature.
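Concept identification of this kind is often done by dictionary lookup: every ontology term (preferred name or synonym) points to a concept ID, and the text is scanned for the longest matching term. The Python sketch below illustrates only that general idea, not OCMiner's algorithm; the concept IDs and the five-entry dictionary are hypothetical.

```python
import re

# Hypothetical mini-dictionary: lower-cased term -> concept ID.
TERMS = {
    "salix alba": "SPEC:0001",
    "white willow": "SPEC:0001",   # synonym, same concept ID
    "salix": "SPEC:0002",
    "salicin": "CHEM:0001",
    "rheumatic fever": "DIS:0001",
}
MAX_LEN = max(len(t.split()) for t in TERMS)  # longest term, in tokens

def annotate(text):
    """Return (start, end, surface, concept_id) for the longest dictionary matches."""
    tokens = [(m.start(), m.end(), m.group()) for m in re.finditer(r"\w+", text)]
    hits, i = [], 0
    while i < len(tokens):
        # Try the widest token window first so "Salix alba" wins over "Salix".
        for n in range(min(MAX_LEN, len(tokens) - i), 0, -1):
            window = tokens[i:i + n]
            term = " ".join(tok for _, _, tok in window).lower()
            if term in TERMS:
                start, end = window[0][0], window[-1][1]
                hits.append((start, end, text[start:end], TERMS[term]))
                i += n
                break
        else:
            i += 1  # no term starts at this token
    return hits

print(annotate("Salicin was isolated from Salix alba, the white willow."))
# → [(0, 7, 'Salicin', 'CHEM:0001'), (26, 36, 'Salix alba', 'SPEC:0001'), (42, 54, 'white willow', 'SPEC:0001')]
```

Preferring the longest match is what keeps “Salix alba” from being annotated as the genus “Salix”; a production annotator would also need the spelling, case and abbreviation handling described further down.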
ONTOCHEM ONTOLOGIES
[Figure: Example of interlinked OntoChem ontologies (chemistry, effects, diseases, species, world regions). Nodes include 6-membered heterocycles, aromatic compounds, anti-inflammatory agent, D(-)-Salicin, Rheumatic Fever, Salicaceae, Salix, Salix alba, Filipendula, Filipendula ulmaria, Europe, Northern Africa and Africa; edges are labelled “is a”, “is found in”, “treats”, “range of distribution” and “is part of”.]
Ontology (derived from onto-, from the Greek ὤν, ὄντος “being;
that which is”, present participle of the verb εἰμί “be”, and
-λογία, -logia: science, study, theory) is the philosophical
study of the nature of being, existence, or reality as such,
as well as the basic categories of being and their relations.
In computer science, an ontology formally represents knowledge
as a set of concepts within a domain and the relationships
between those concepts, enabling semantic data integration,
data mining and knowledge generation. Ontologies are explicit
specifications of a topic, including a vocabulary of terms and
concepts with defined logical relationships to each other.
http://en.wikipedia.org/wiki/Ontology_(information_science)
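Reduced to a data structure, the definition above is a set of concepts plus typed, directed relationships between them. A minimal sketch, assuming the relationships are stored as (subject, relation, object) triples; the concept and relation names are borrowed loosely from the white willow figure, and the triples themselves are illustrative:

```python
# Illustrative (subject, relation, object) triples.
TRIPLES = [
    ("Salix alba", "is a", "Salix"),
    ("Salix", "is a", "Salicaceae"),
    ("Filipendula ulmaria", "is a", "Filipendula"),
    ("D(-)-Salicin", "is a", "aromatic compounds"),
    ("D(-)-Salicin", "is found in", "Salix alba"),
    ("Northern Africa", "is part of", "Africa"),
]

def build_index(triples):
    """Map (subject, relation) -> set of objects for fast lookup."""
    index = {}
    for s, r, o in triples:
        index.setdefault((s, r), set()).add(o)
    return index

def ancestors(index, concept, relation="is a"):
    """Every concept reachable from `concept` by following `relation` transitively."""
    seen, stack = set(), [concept]
    while stack:
        for target in index.get((stack.pop(), relation), ()):
            if target not in seen:
                seen.add(target)
                stack.append(target)
    return seen

idx = build_index(TRIPLES)
print(sorted(ancestors(idx, "Salix alba")))                     # → ['Salicaceae', 'Salix']
print(sorted(ancestors(idx, "Northern Africa", "is part of")))  # → ['Africa']
```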
● Finding specific relationships between domains, e.g.
which compounds have been isolated from plants:
information that was previously available only from
manually curated databases is now generated on the fly
● Similarity search and ranking of documents based on
ontology concept metrics. This gives more relevant
results than conventional technologies such as word
frequencies or keywords.
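The advantage of concept metrics over word frequencies can be shown with the same similarity measure applied twice: once to words, once to concept IDs. The sketch assumes the documents are already annotated; the mapping `CONCEPT` and its IDs are hypothetical, and cosine similarity is simply one common choice, not necessarily the metric OCMiner uses.

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity of two bags (Counters) of words or concept IDs."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values()) * sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# Hypothetical annotation result: surface word -> concept ID (synonyms share an ID).
CONCEPT = {"aspirin": "CHEM:0002", "ASA": "CHEM:0002",
           "willow": "SPEC:0001", "bark": "ANAT:0001"}

doc1 = "aspirin and willow bark".split()
doc2 = "ASA and willow bark".split()

by_words = cosine(Counter(doc1), Counter(doc2))
by_concepts = cosine(Counter(CONCEPT[w] for w in doc1 if w in CONCEPT),
                     Counter(CONCEPT[w] for w in doc2 if w in CONCEPT))
print(round(by_words, 2), round(by_concepts, 2))  # → 0.75 1.0
```

Because “aspirin” and its abbreviation “ASA” map to the same concept, the two keyword bags are identical at the concept level, while word counting sees only a partial overlap.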
OntoChem develops ontologies in the areas of chemistry,
species, diseases, anatomy, cell lines, proteins, pharmacological
effects, languages, geopolitical and climate zones,
company information for business intelligence, and others.
EXAMPLE
Ontologies, together with heuristic and linguistic methods,
are applied for the semantic processing of unstructured
information sources such as scientific articles, patents and others.
Using, for example, our species, chemistry and geographical
ontologies, one can retrieve relationships for the white
willow (Salix alba) such as those shown in the ontology graph above.
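Once relationships have been extracted into triples, retrieving everything known about one concept is a simple filter over both ends of each triple. A sketch with illustrative triples that mirror the relation types in the figure (“is a”, “is found in”, “treats”, “range of distribution”, “is part of”); the function name `relationships` is hypothetical:

```python
# Illustrative triples; in practice these would come from text annotation.
TRIPLES = [
    ("D(-)-Salicin", "is found in", "Salix alba"),
    ("D(-)-Salicin", "treats", "Rheumatic Fever"),
    ("Salix alba", "is a", "Salix"),
    ("Salix alba", "range of distribution", "Europe"),
    ("Salix alba", "range of distribution", "Northern Africa"),
    ("Northern Africa", "is part of", "Africa"),
]

def relationships(concept, triples):
    """Outgoing and incoming relations of `concept`, as readable strings."""
    outgoing = [f"{concept} --{r}--> {o}" for s, r, o in triples if s == concept]
    incoming = [f"{s} --{r}--> {concept}" for s, r, o in triples if o == concept]
    return outgoing + incoming

for line in relationships("Salix alba", TRIPLES):
    print(line)
```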
INTEGRATED APPROACH
OntoChem has an integrated approach, from custom-made
novel tools and algorithms up to ready-to-use ontologies
and text annotation with OCMiner®. We build, update, validate
and merge general, chemical and biological ontologies for
biomedical data mining applications. OntoChem’s ontology
approach provides stable concept IDs, which makes updates
easier and keeps past annotations interpretable. Our modular
software enables quick assembly of quality-checked derived
meta-ontologies. Another of OntoChem’s unique selling points
is the scalability of its patented methods for high-performance
text processing, which allow ontologies with up to a billion
terms for annotation and very fast text annotation.
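Stable IDs matter most when concepts are merged or retired: annotations made with an old ID can still be resolved by following a redirect to the current concept. The sketch assumes an OBO-style `replaced_by` mapping; the IDs are hypothetical and this is not a description of OntoChem's update mechanism.

```python
# Hypothetical current concepts, and retired IDs pointing to their successors.
CURRENT = {"DIS:0001": "Rheumatic fever", "DIS:0002": "Rheumatoid arthritis"}
REPLACED_BY = {"DIS:9001": "DIS:0001", "DIS:9002": "DIS:9001"}  # a chained merge

def resolve(concept_id, max_hops=10):
    """Follow replaced_by links until a current concept ID is reached."""
    for _ in range(max_hops):
        if concept_id in CURRENT:
            return concept_id
        if concept_id not in REPLACED_BY:
            raise KeyError(f"unknown concept ID: {concept_id}")
        concept_id = REPLACED_BY[concept_id]
    raise ValueError("replaced_by chain too long (possible cycle)")

# An annotation stored before two ontology updates is still interpretable:
print(resolve("DIS:9002"), CURRENT[resolve("DIS:9002")])  # → DIS:0001 Rheumatic fever
```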
USE OF ONTOLOGIES
Our data and knowledge extraction technology OCMiner®
uses ontologies for a variety of information retrieval tasks:
● Classification of entities, for example assigning
specific compounds to compound classes, relating
physiological symptoms to a disease, or defining
specific relationship types using a custom-developed
regular expression syntax language
● Ontology-aware search engines, such as our demo
server www.ocminer.com, allow searching for concepts:
for example, the search term “plants” will return
documents mentioning specific plants such as “salix”
or “Filipendula ulmaria”
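The “plants” example is a case of query expansion: the search concept is expanded to itself plus all of its descendants in the “is a” hierarchy, and a document matches if it is annotated with any of them. A minimal sketch, assuming documents are already annotated with concept names; the hierarchy and documents are illustrative:

```python
from collections import defaultdict

# child -> parent ("is a") edges of a tiny, illustrative species ontology.
IS_A = {
    "Salix": "plants",
    "Salix alba": "Salix",
    "Filipendula ulmaria": "plants",
    "Homo sapiens": "animals",
}

def descendants(root):
    """The root concept plus every concept that is transitively 'is a' root."""
    children = defaultdict(list)
    for child, parent in IS_A.items():
        children[parent].append(child)
    result, stack = {root}, [root]
    while stack:
        for child in children[stack.pop()]:
            if child not in result:
                result.add(child)
                stack.append(child)
    return result

# Document ID -> concepts annotated in it (hypothetical).
DOCS = {"doc1": {"Salix alba"}, "doc2": {"Homo sapiens"}, "doc3": {"Filipendula ulmaria"}}

def search(concept):
    """IDs of documents annotated with the concept or any of its descendants."""
    wanted = descendants(concept)
    return sorted(doc for doc, concepts in DOCS.items() if concepts & wanted)

print(search("plants"))  # → ['doc1', 'doc3']
```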
AVAILABLE ONTOLOGIES
OntoChem has implemented technologies to build dictionaries,
controlled vocabularies, taxonomies or ontologies comprising
more than 100 million terms from various domains. Examples
are our ontologies for general species, plants and fungi, cell
lines, general anatomy, plant and fungal anatomy, diseases,
pharmacological and physiological effects, cosmetology,
proteins, genes, chemistry, languages, geopolitical and
climate zones, company information for business intelligence,
and domain-specific relationship ontologies.
Each ontology concept contains further data, such as relationships
to other concepts, links to external sources, language
information, its synonyms and related update information.
OntoChem’s ontologies can be stored and used in various
formats such as OBO, CSV, XML (using specific flavors such as
RDF, OWL, CML, SBML or others), SKOS, etc.
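A concept record like the one described above maps naturally onto an OBO `[Term]` stanza, whose `id`, `name`, `synonym`, `xref` and `is_a` tags are part of the OBO flat file format. The sketch below serializes a hypothetical record to OBO and to a simple CSV layout; the `Concept` class, the column layout and the `EXT:0001` cross-reference are assumptions for illustration:

```python
import csv
import io
from dataclasses import dataclass, field

@dataclass
class Concept:
    """Hypothetical concept record; field names are illustrative."""
    id: str
    name: str
    synonyms: list = field(default_factory=list)
    is_a: list = field(default_factory=list)    # parent concept IDs
    xrefs: list = field(default_factory=list)   # links to external sources

    def to_obo(self, names):
        """OBO [Term] stanza; `names` maps parent IDs to names for the '!' comment."""
        lines = ["[Term]", f"id: {self.id}", f"name: {self.name}"]
        for syn in self.synonyms:
            quoted = syn.replace("\\", "\\\\").replace('"', '\\"')  # escape for OBO strings
            lines.append(f'synonym: "{quoted}" EXACT []')
        lines += [f"xref: {x}" for x in self.xrefs]
        lines += [f"is_a: {p} ! {names.get(p, p)}" for p in self.is_a]
        return "\n".join(lines)

def to_csv(concepts):
    """One CSV row per concept; multi-valued fields are joined with '|'."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["id", "name", "synonyms", "is_a", "xrefs"])
    for c in concepts:
        writer.writerow([c.id, c.name, "|".join(c.synonyms), "|".join(c.is_a), "|".join(c.xrefs)])
    return buf.getvalue()

salicin = Concept("CHEM:0001", "salicin", synonyms=["D(-)-Salicin"],
                  is_a=["CHEM:0100"], xrefs=["EXT:0001"])
print(salicin.to_obo({"CHEM:0100": "aromatic compounds"}))
print(to_csv([salicin]))
```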
When ontologies are used for text mining, we have specific
modules that enhance the value of ontologies, either by
generating an enriched ontology with additional terms or
by applying these modules at annotation time:
● Spelling variations (e.g. British/American English,
plural forms)
● Handling of diacritic characters, spaces, hyphens and apostrophes
● Ontology-dependent conditional black and white lists,
case-sensitive annotations
● Automated detection of acronyms and abbreviations
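Several of these modules amount to normalizing terms and generating variants before matching. The sketch below is only an approximation of the idea, not OntoChem's modules: it strips diacritics with the standard `unicodedata` module, treats hyphens and spaces alike, drops apostrophes, and adds British/American and plural variants from a tiny illustrative word list.

```python
import itertools
import re
import unicodedata

# Illustrative British <-> American token pairs; real rule sets are far larger.
_PAIRS = {"tumour": "tumor", "oedema": "edema", "anaemia": "anemia"}
SPELLING = {**_PAIRS, **{am: br for br, am in _PAIRS.items()}}

def normalize(term):
    """Strip diacritics and apostrophes, treat hyphens as spaces, lower-case."""
    decomposed = unicodedata.normalize("NFKD", term)
    text = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    text = text.replace("\u2019", "").replace("'", "")
    text = re.sub(r"[\s\-\u2010\u2011]+", " ", text)  # spaces and hyphen variants
    return text.strip().lower()

def variants(term):
    """Normalized term plus British/American and naive '-s' plural variants."""
    options = [{tok, SPELLING.get(tok, tok)} for tok in normalize(term).split()]
    if not options:
        return set()
    out = set()
    for combo in itertools.product(*options):
        phrase = " ".join(combo)
        out.add(phrase)
        if not phrase.endswith("s"):
            out.add(phrase + "s")  # naive plural; irregular plurals need a lexicon
    return out

print(normalize("Sjögren’s syndrome"))   # → sjogrens syndrome
print(sorted(variants("brain tumour")))  # → ['brain tumor', 'brain tumors', 'brain tumour', 'brain tumours']
```

Acronym detection is not covered here; it usually needs context, such as a long form followed by its abbreviation in parentheses.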
A unique ontology format has been developed to extract
relationships between named entities (NE) in text. Domain-specific
relationship ontologies are used together with the
related ontologies and a new regular expression syntax
language to extract relationships with high precision and recall.
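OntoChem's relationship syntax itself is not shown in this document. As a rough stand-in, the sketch below applies Python's `re` module to text in which named entities have already been annotated; the `[TYPE:ID|surface]` markup, the trigger phrases and the IDs are hypothetical. Typing each slot (chemical, species, disease) is what keeps a pattern from firing on the wrong entity pair.

```python
import re

# Text after annotation; entities are wrapped as [TYPE:ID|surface] (hypothetical markup).
ANNOTATED = ("[CHEM:0001|Salicin] was isolated from [SPEC:0001|Salix alba]. "
             "[CHEM:0001|Salicin] is used to treat [DIS:0001|rheumatic fever].")

def entity(group, etype):
    """Regex for one annotated entity of type `etype`, capturing its concept ID."""
    return rf"\[(?P<{group}>{etype}:\d+)\|[^\]]+\]"

# relation name -> pattern with a typed subject slot, trigger phrase and typed object slot
PATTERNS = {
    "is found in": re.compile(entity("s", "CHEM") + r" (?:was|were) isolated from " + entity("o", "SPEC")),
    "treats": re.compile(entity("s", "CHEM") + r" (?:is|are) used to treat " + entity("o", "DIS")),
}

def extract(text):
    """Return (subject_id, relation, object_id) triples found by any pattern."""
    triples = []
    for relation, pattern in PATTERNS.items():
        for m in pattern.finditer(text):
            triples.append((m.group("s"), relation, m.group("o")))
    return triples

print(extract(ANNOTATED))
# → [('CHEM:0001', 'is found in', 'SPEC:0001'), ('CHEM:0001', 'treats', 'DIS:0001')]
```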
ONTOLOGY TOOLS
To create, manage, update and validate ontologies we have
developed a range of different software tools.
Chemistry ontology editor: We have developed the first
specialized chemistry ontology editor, SODIAC (structure
ontology development and individual assignment center),
to support the development of chemical ontologies. Using
the OBO format, it combines the familiar functionality of an
ontology editor with a chemical structure editor
that allows structure-based addition, editing and ontology
checks. SODIAC can be used to annotate conventional structure
files or chemical databases, assigning each compound
to its chemical structure classes.
Using SODIAC, we have developed chemical ontologies
that comprise structure-based classifications but also biology-related
classifications of chemical compounds. Particular
emphasis has been given to natural products, for example
steroids or sugars, but also to all classes of heterocycles and
compound classes that are of interest for biomedical research.
In addition, classifications such as vitamins, food and flavor,
cosmetics, drugs and FEMA compounds can be assigned.
OntViewer is designed to display, review and check very large
ontologies of up to several gigabytes, such as the
chemistry or the proteins/genes ontologies. It also performs
logical, statistical and consistency checks on the ontology.
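Two basic logical checks are easy to illustrate: every parent referenced by an “is a” link must exist, and the “is a” hierarchy must be free of cycles. A minimal sketch of both, plus one statistic (maximum depth), assuming the ontology fits into a child → parents dictionary; OntViewer's actual checks are not documented here.

```python
def check_ontology(parents):
    """Return (dangling_links, nodes_where_a_cycle_closed, max_depth)."""
    dangling = sorted({(c, p) for c, ps in parents.items() for p in ps if p not in parents})
    WHITE, GREY, BLACK = 0, 1, 2
    color = {c: WHITE for c in parents}
    depth, cyclic = {}, set()

    def visit(node):
        # Depth-first search; reaching a GREY node again means a cycle was closed.
        color[node] = GREY
        d = 0
        for p in parents[node]:
            if p not in parents:
                continue  # already reported as dangling
            if color[p] == GREY:
                cyclic.add(p)
            elif color[p] == WHITE:
                visit(p)
            if p in depth:
                d = max(d, depth[p] + 1)
        color[node] = BLACK
        depth[node] = d

    for node in parents:
        if color[node] == WHITE:
            visit(node)
    return dangling, sorted(cyclic), max(depth.values(), default=0)

# Illustrative child -> parents mapping with one dangling parent and one cycle.
ONTO = {
    "chemical entity": [],
    "heterocycles": ["chemical entity"],
    "6 membered heterocycles": ["heterocycles"],
    "pyranoses": ["6 membered heterocycles", "sugars"],  # "sugars" is missing
    "A": ["B"], "B": ["A"],                              # deliberate cycle
}
print(check_ontology(ONTO))  # → ([('pyranoses', 'sugars')], ['A'], 3)
```

The recursion keeps the sketch short; for very deep hierarchies an iterative traversal would avoid Python's recursion limit.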
Screenshot of SODIAC, our specialized ontology editor for general and chemical ontologies
Screenshot of OntViewer, showing the ontology tree of ChEBI with different relationships.
OntoChem has also developed a series of custom-built
command line tools that help create, update and
validate ontologies:
● Searching and proposing candidate synonym concept
terms in document collections
● Automated generation of spelling variations
● Checking and correcting homonyms or logical errors
within or between ontologies
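A homonym check can start by inverting the term lists: any term that points to more than one concept ID, within one ontology or across several, is a candidate for review. A minimal sketch with hypothetical ontologies and IDs, comparing terms case-insensitively:

```python
from collections import defaultdict

# ontology -> {concept ID: [terms]}; all IDs and terms are illustrative.
ONTOLOGIES = {
    "diseases": {"DIS:0002": ["multiple sclerosis", "MS"],
                 "DIS:0003": ["common cold", "cold"]},
    "methods": {"MET:0001": ["mass spectrometry", "MS"]},
    "climate zones": {"CLIM:0001": ["cold climate", "cold"]},
}

def find_homonyms(ontologies):
    """Lower-cased terms that map to more than one concept ID, with those IDs."""
    term_to_ids = defaultdict(set)
    for concepts in ontologies.values():
        for concept_id, terms in concepts.items():
            for term in terms:
                term_to_ids[term.lower()].add(concept_id)
    return {t: sorted(ids) for t, ids in term_to_ids.items() if len(ids) > 1}

print(find_homonyms(ONTOLOGIES))
# → {'ms': ['DIS:0002', 'MET:0001'], 'cold': ['CLIM:0001', 'DIS:0003']}
```

Whether such a clash is an error or a genuine ambiguity still needs a human decision; case-sensitive annotation and conditional black and white lists, as listed above, are two ways of resolving it.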
Together, our technologies provide a straightforward and
comprehensive toolbox for various tasks when working with
ontologies.
ADVANTAGES
OntoChem’s ontologies, together with OCMiner®, are ideally
suited for high-speed, high-quality annotation and search
of large data volumes. For example, with PubMed abstracts
annotated in the demo application www.ocminer.com, the chemistry
search term “heterocyclic compounds” retrieves 3,124,129 hit documents,
while a native PubMed search (http://www.ncbi.nlm.nih.gov/pubmed?term=heterocyclic)
finds 24,524 hit documents.
Similarly, using the cell line “SKMEL-28” as a search term retrieves
296 documents, while the native PubMed search
(http://www.ncbi.nlm.nih.gov/pubmed?term=skmel-28)
delivers 26 hit documents.
Screenshot of HugeEdit, showing a large data file of compounds and their names.
HugeEdit is a simple and fast text editor for displaying, searching
and editing very large data files of up to several gigabytes and
millions of lines without needing to hold the complete data
in memory. It is especially suited to working with column-separated
data too large to be edited in standard spreadsheet editors.
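HugeEdit's internals are not described here, but a common technique for browsing files larger than memory is to scan the file once, record the byte offset at which each line starts, and later seek directly to any line. A minimal sketch of that technique (function names are illustrative):

```python
import os
import tempfile
from array import array

def build_line_index(path):
    """Byte offset of the start of every line, collected in one streaming pass."""
    offsets = array("q", [0])      # 8 bytes per line start, not the lines themselves
    with open(path, "rb") as f:
        for line in f:             # binary files iterate line by line
            offsets.append(offsets[-1] + len(line))
    offsets.pop()                  # the final value is the file size, not a line start
    return offsets

def read_line(path, offsets, n):
    """Return line n (0-based) by seeking straight to it."""
    with open(path, "rb") as f:
        f.seek(offsets[n])
        return f.readline().decode("utf-8").rstrip("\r\n")

# Usage with a small generated tab-separated file standing in for a multi-GB one.
with tempfile.NamedTemporaryFile("w", suffix=".tsv", delete=False, encoding="utf-8") as tmp:
    tmp.write("id\tname\n")
    for i in range(1000):
        tmp.write(f"CHEM:{i:04d}\tcompound {i}\n")
idx = build_line_index(tmp.name)
print(len(idx), read_line(tmp.name, idx, 501).split("\t"))  # → 1001 ['CHEM:0500', 'compound 500']
os.remove(tmp.name)
```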
OntoChem GmbH, Heinrich-Damerow-Str. 4, 06120 Halle, Germany