Olivier Bodenreider
Lister Hill National Center for Biomedical Communications, Bethesda, Maryland - USA
Experiences in visualizing and navigating biomedical ontologies and knowledge bases
ISMB 2002, Fifth Annual Bio-Ontologies Meeting
August 8, 2002
Introduction (1)
Biomedical knowledge
  Terminologies (names)
  Ontologies (objects)
  Knowledge bases (facts)
Common features
  Terms / Concepts
  Inter-concept relationships
    Hierarchical
    Associative
Introduction (2)
Challenges
  Volume of information
    10^4 to 10^6 concepts
    10^5 to 10^7 relationships
  Orientation
    Mapping to concepts
    Visualizing concept spaces
    Navigating concept spaces
[Figure labels: term, knowledge]
Introduction (3)
SemNav
  UMLS browser
  Entry point: biomedical term
  Display related concepts
  Display properties of inter-concept relationships
  Allow navigation among concepts
GenNav
  GO browser
  Entry point: GO term or gene product name/symbol
  Display related GO terms and gene products
  Display properties of term/term and term/gene product relationships
  Allow navigation between GO terms and gene products
Outline
Background
  Unified Medical Language System (UMLS)
  Gene Ontology
Overview of the browsers
  SemNav
  GenNav
Common features
Differences
UMLS and GO
Unified Medical Language System
Developed at NLM since 1990
13th edition in 2002
Integrates some 60 terminological resources
  Clinical vocabularies (including specialties)
  Core terminologies (anatomy, drugs, med. devices)
  Administrative terminologies, standards
Integration
  Synonymous terms are clustered in a concept
  Hierarchies (trees) are combined in a graph structure
Terminology integration: Terms

Term                                            Source vocabularies
Duchenne muscular dystrophy                     MeSH, SNOMED, CTV3, Jablonski, CRISP, DxPlain, MedDRA, LOINC
pseudohypertrophic muscular dystrophy           MeSH, CTV3, SNOMED
X-linked recessive muscular dystrophy           Jablonski
Duchenne de Boulogne muscular dystrophy         Jablonski
Duchenne’s muscular dystrophy                   COSTAR
severe generalized familial muscular dystrophy  SNOMED
Duchenne type progressive muscular dystrophy    SNOMED
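
As a rough illustration of clustering synonymous terms in a concept, the Python sketch below groups the terms listed on this slide under one concept identifier and collects the vocabularies that contribute them. The identifier `C_DMD` and the function name `cluster_terms` are hypothetical, and the assertion that all seven terms name the same concept is taken from the slide; this is not the actual Metathesaurus data model.

```python
from collections import defaultdict

# Term -> source vocabularies, as listed on the slide.
TERM_SOURCES = {
    "Duchenne muscular dystrophy": ["MeSH", "SNOMED", "CTV3", "Jablonski",
                                    "CRISP", "DxPlain", "MedDRA", "LOINC"],
    "pseudohypertrophic muscular dystrophy": ["MeSH", "CTV3", "SNOMED"],
    "X-linked recessive muscular dystrophy": ["Jablonski"],
    "Duchenne de Boulogne muscular dystrophy": ["Jablonski"],
    "Duchenne's muscular dystrophy": ["COSTAR"],
    "severe generalized familial muscular dystrophy": ["SNOMED"],
    "Duchenne type progressive muscular dystrophy": ["SNOMED"],
}

# Hypothetical concept identifier: every term above names the same concept.
TERM_TO_CONCEPT = {term: "C_DMD" for term in TERM_SOURCES}


def cluster_terms(term_sources, term_to_concept):
    """Group terms, and the vocabularies they come from, by concept."""
    concepts = defaultdict(lambda: {"terms": set(), "sources": set()})
    for term, sources in term_sources.items():
        concept_id = term_to_concept[term]
        concepts[concept_id]["terms"].add(term)
        concepts[concept_id]["sources"].update(sources)
    return dict(concepts)


clusters = cluster_terms(TERM_SOURCES, TERM_TO_CONCEPT)
```

With this layout, a query on any of the seven terms leads to the same concept, and the concept records which vocabularies contribute at least one of its names.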
Terminology integration: Relationships
[Figure: the concepts Adrenal Gland Diseases, Adrenal Cortex Diseases, Adrenal Gland Hypofunction, Adrenal cortical hypofunction, Hypoadrenalism and Addison’s Disease, with relationships from SNOMED, MeSH, AOD and Read Codes combined in the UMLS]
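
The idea of combining source hierarchies (trees) into a single graph can be sketched in Python as follows. The per-source edges below are illustrative assumptions only: the slide text does not preserve which vocabulary asserts which relationship, and the function name `merge_hierarchies` is hypothetical.

```python
from collections import defaultdict

# ASSUMPTION: hypothetical (parent, child) edges per source vocabulary.
# The real per-source hierarchies are not recoverable from the slide text.
SOURCE_TREES = {
    "MeSH": [
        ("Adrenal Gland Diseases", "Adrenal Gland Hypofunction"),
        ("Adrenal Gland Hypofunction", "Addison's Disease"),
    ],
    "SNOMED": [
        ("Adrenal Cortex Diseases", "Adrenal cortical hypofunction"),
        ("Adrenal cortical hypofunction", "Addison's Disease"),
    ],
    "Read Codes": [
        ("Hypoadrenalism", "Addison's Disease"),
    ],
}


def merge_hierarchies(source_trees):
    """Union per-source trees into one graph: child -> {parent: {sources}}."""
    parents = defaultdict(lambda: defaultdict(set))
    for source, edges in source_trees.items():
        for parent, child in edges:
            parents[child][parent].add(source)
    return parents


graph = merge_hierarchies(SOURCE_TREES)

# Within one source tree a concept has a single parent;
# in the merged graph it can have several.
for parent, sources in sorted(graph["Addison's Disease"].items()):
    print(parent, sorted(sources))
```

Keeping the contributing source on each edge makes it possible to restrict the display later to the view of one vocabulary.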
UMLS
Two-level structure
  Semantic Network
    134 Semantic Types (STs)
    54 types of relationships among STs
  Metathesaurus
    800,000 concepts
    ~10 M inter-concept relationships
  Link = categorization
[Figure: a Concept (Metathesaurus) linked to a Semantic Type (Semantic Network) by categorization]
[Figure: Metathesaurus concepts (Heart, Esophagus, Left Phrenic Nerve, Heart Valves, Fetal Heart, Mediastinum, Saccular Viscus, Angina Pectoris, Cardiotonic Agents, Tissue Donors) categorized by Semantic Network semantic types (Anatomical Structure; Fully Formed Anatomical Structure; Embryonic Structure; Body Part, Organ or Organ Component; Pharmacologic Substance; Disease or Syndrome; Population Group)]
Gene Ontology
Developed by the GO Consortium
Several components
  Ontology (~11,000 concepts)
    Molecular functions
    Cellular components
    Biological processes
  Gene products (~125,000)
  Associations between gene products and GO concepts (~357,000)
SemNav
MeSH Browser
SemNav: Visualization options
SemNav: Relationships
[Figure: the concepts Dystrophin and Muscular Dystrophy, Duchenne, shown with the semantic types Amino Acid, Peptide or Protein; Biologically Active Substance; Disease or Syndrome]
GenNav
Material and Methods
Common features and differences
Mapping query terms
Mapping terms to concepts
  Matching criteria (exact, approximate)
  Normalization techniques
    work well on clinical terms
    less applicable to gene names
Query disambiguation
  With semantic type in SemNav
  With species in GenNav
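
The combination of normalization with exact and approximate matching can be sketched in Python as below. This is a simplified stand-in, not the normalization actually used by SemNav or GenNav: the rules in `normalize`, the similarity cutoff, and the concept identifiers `C_DMD` and `C_ADDISON` are all assumptions made for illustration.

```python
import difflib
import re


def normalize(term):
    """Simplified normalization: lowercase, drop possessives and
    punctuation, then sort the words."""
    term = term.lower()
    term = re.sub(r"['’]s\b", "", term)       # Duchenne's -> duchenne
    term = re.sub(r"[^a-z0-9\s]", " ", term)   # punctuation -> space
    return " ".join(sorted(term.split()))


def build_index(terms_to_concepts):
    """Index concepts by the normalized form of their terms."""
    return {normalize(term): cid for term, cid in terms_to_concepts.items()}


def map_query(query, index, cutoff=0.8):
    """Try an exact match on the normalized form, then an approximate one."""
    key = normalize(query)
    if key in index:
        return index[key], "exact"
    close = difflib.get_close_matches(key, list(index), n=1, cutoff=cutoff)
    if close:
        return index[close[0]], "approximate"
    return None, "none"


# Hypothetical concept identifiers, for illustration only.
INDEX = build_index({
    "Muscular Dystrophy, Duchenne": "C_DMD",
    "Addison's Disease": "C_ADDISON",
})

print(map_query("Duchenne's muscular dystrophy", INDEX))
print(map_query("Adison disease", INDEX))
```

Rules of this kind are tuned to clinical phrases, where case, word order and punctuation rarely carry meaning. In gene names and symbols they often do, which is one plausible reading of why the slide calls these techniques less applicable to gene names.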
Visualization
Graph vs. Trees (Forest)
  Multiple inheritance is better visualized by graphs than by trees
  Off-the-shelf, freely available graph visualization packages exist (GraphViz)
Need to reduce complexity
  Transitive reduction on complex graphs
  Feature selection
    e.g., a given vocabulary in SemNav
    e.g., a given species in GenNav
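
Transitive reduction removes every edge that is implied by a longer path, so the drawn graph keeps the same reachability with fewer arrows. The Python sketch below shows the idea on a small acyclic graph and emits GraphViz DOT text; the function names are hypothetical, the direct edge to "Anatomical Structure" is added here as a deliberately redundant example, and nothing below is taken from the browsers' own code. GraphViz also ships a `tred` filter for this purpose; whether the browsers used it is not stated.

```python
def reachable(graph, start):
    """Nodes reachable from start by following one or more edges."""
    seen, stack = set(), list(graph.get(start, ()))
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(graph.get(node, ()))
    return seen


def transitive_reduction(graph):
    """Drop edge u -> v when v is reachable through another successor of u.
    Assumes the graph is acyclic."""
    reduced = {}
    for u, successors in graph.items():
        keep = set(successors)
        for w in successors:
            keep -= reachable(graph, w)   # v reachable via w: u -> v is redundant
        reduced[u] = keep
    return reduced


def to_dot(graph):
    """Serialize the graph as GraphViz DOT text."""
    lines = ["digraph G {"]
    for u in sorted(graph):
        for v in sorted(graph[u]):
            lines.append(f'  "{u}" -> "{v}";')
    lines.append("}")
    return "\n".join(lines)


# Edges point from child to parent.
# ASSUMPTION: the direct edge to "Anatomical Structure" is a made-up redundancy.
DAG = {
    "Body Part, Organ or Organ Component": {
        "Fully Formed Anatomical Structure",
        "Anatomical Structure",
    },
    "Fully Formed Anatomical Structure": {"Anatomical Structure"},
    "Anatomical Structure": set(),
}

reduced = transitive_reduction(DAG)
print(to_dot(reduced))
```

Feature selection composes naturally with this step: filtering edges by vocabulary (SemNav) or associations by species (GenNav) before the reduction shrinks the graph further.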
Navigation
Tool for exploration
  Navigation among concepts (SemNav and GenNav)
  Navigation between two poles (gene products and GO concepts in GenNav)
Self-contained (SemNav) or open to external resources (GenNav)
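
Navigation between two poles can be pictured as a two-way index over associations: from a gene product to its GO concepts, and from a GO concept back to its gene products. The Python sketch below is a minimal illustration under that assumption; the class name `AssociationIndex` and the placeholder identifiers are hypothetical and do not describe GenNav's implementation.

```python
from collections import defaultdict


class AssociationIndex:
    """Two-way index over (gene product, GO concept) associations."""

    def __init__(self, associations):
        self.concepts_of = defaultdict(set)   # gene product -> GO concepts
        self.products_of = defaultdict(set)   # GO concept -> gene products
        for product, concept in associations:
            self.concepts_of[product].add(concept)
            self.products_of[concept].add(product)

    def related_products(self, product):
        """Gene products sharing at least one GO concept with `product`."""
        related = set()
        for concept in self.concepts_of.get(product, ()):
            related |= self.products_of[concept]
        related.discard(product)
        return related


# Placeholder identifiers, not real gene products or GO concepts.
index = AssociationIndex([
    ("product_A", "concept_X"),
    ("product_A", "concept_Y"),
    ("product_B", "concept_X"),
    ("product_C", "concept_Z"),
])

print(sorted(index.concepts_of["product_A"]))
print(sorted(index.related_products("product_A")))
```

Each hop in the browser then corresponds to one dictionary lookup, alternating between the two poles.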
Conclusions
Most of the lessons learned while developing SemNav (for browsing general biomedical knowledge) were applicable to GenNav (for browsing molecular biology knowledge)
The lexical techniques suitable for mapping text to clinical terminologies require adaptation to the specificity of molecular biology terminologies
Contact: [email protected]@nlm.nih.gov
Olivier Bodenreider
Lister Hill National Center for Biomedical Communications, Bethesda, Maryland - USA
SemNav: http://umlsks.nlm.nih.gov (*) ► Resources ► Semantic Navigator
(*) free UMLS registration required
GenNav: http://etbsun2.nlm.nih.gov:8000/perl/gennav.pl