Finding Bugs in People: Developing an Entomology Ontology from the UMLS
Indra Neil Sarkar, PhD
Lewis B. & Dorothy Cullman Bioinformatics Associate
Division of Invertebrate Zoology
American Museum of Natural History
NKOS Workshop 10 June 2005
© 2005 Indra Neil Sarkar, PhD
Total Evidence Tree of Life
Sequence Data
Structural Data
Phenotypes
Morphology
Statements of Homology
Sequence Data: Multiple Sequence Alignments (CLUSTAL, T-COFFEE, MUSCLE)
Non-sequence Data: Ontologies
Ontologies
“White” “Blanc” “Weiss”
White Blue Red
Color
Ogden-Richards Semiotic Triangle
“White” “Weiss” “Blanc” XVFD
Symbols
Thought/Reference
Referent
Ontology Development
Protégé http://protege.stanford.edu “Frame-based”
Ontologies in Phylogenetics
“Wing” “Aile” “Flügel”
Wing Arm Foreleg
Forelimb
Ontologies in Phylogenetics
Wing Arm Foreleg
Forelimb
Character                               CAT       BAT       BIRD
Forelimb                                1         1         1
Forelimb: Wing(2), Arm(3), Foreleg(1)   1         3         2
Gene 1                                  [Gene 1]  [Gene 1]  [Gene 1]
Gene 2                                  [Gene 2]  [Gene 2]  [Gene 2]
…                                       …         …         …
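The coding on this slide can be sketched in a few lines. This is an illustration only: the dictionary names and the `code_forelimb` helper are hypothetical, and the free-text description assigned to each taxon is invented so that the result reproduces the slide's coding (CAT = 1, BAT = 3, BIRD = 2).

```python
# Surface terms (in any language) -> ontology concept, as on the slide.
TERM_TO_CONCEPT = {
    "wing": "Wing", "aile": "Wing", "flügel": "Wing",
    "arm": "Arm",
    "foreleg": "Foreleg",
}

# Concept -> state of the parent character "Forelimb" (state numbers from the slide).
FORELIMB_STATE = {"Foreleg": 1, "Wing": 2, "Arm": 3}


def code_forelimb(description):
    """Return the Forelimb state for a free-text term, or '?' if it is unknown."""
    concept = TERM_TO_CONCEPT.get(description.strip().lower())
    return FORELIMB_STATE.get(concept, "?")


# Hypothetical descriptions; only the resulting codes come from the slide.
descriptions = {"CAT": "Foreleg", "BAT": "Arm", "BIRD": "Aile"}
matrix = {taxon: code_forelimb(text) for taxon, text in descriptions.items()}
print(matrix)
```

Because the terms resolve to concepts before coding, a description written as "Aile" or "Flügel" receives the same state as "Wing", and every coded taxon still shares the parent character Forelimb.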
Ontologies in Phylogenetics
Genetic Information: 99% of Earth’s biota are extinct!
Morphological Information: Fossil record; Morphological studies from extant organisms
Ontologies in Phylogenetics
Ontology Development: Web Ontology Language (OWL); Structured Descriptive Data (SDD)
  Can be exported to NEXUS, DELTA, Lucid
Ontology Acquisition and Markup: Archival Resources; Natural Language Processing
Unified Medical Language System (UMLS)
Metathesaurus: One Million Concepts; 100+ Biomedical Terminologies/Ontologies
Semantic Network: 135 Semantic Types; 15 Coarse Semantic Groups
SPECIALIST Lexicon: English + Biomedical Words
Torre-Bueno Glossary of Entomology (TBGE)
Common Entomology Phrases; 300 Primary Sources; 15,010 Terms/Phrases
TBGE to UMLS
Question 1: Is Entomology Language Different than Biomedical Language?
  TBGE to SPECIALIST
Question 2: Can UMLS Be Used to Seed an Ontology for Entomology?
  TBGE to UMLS Metathesaurus
  Organize Results According to Semantic Network
Q1: Is Entomology a Unique Language?
“Look-up” Individual Word Atoms in SPECIALIST
Complete Look-up: 48% Coverage
Partial Look-up: 66% Coverage
Not found: 34% Not covered
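The look-up above can be sketched as follows. This is a minimal illustration, not the procedure used in the study: the small `lexicon` set stands in for the SPECIALIST Lexicon, the sample terms are invented, and the reading of "complete" (every word atom found) and "partial" (at least one word atom found, complete matches included) is an assumption suggested by the reported percentages, since 66% and 34% sum to 100%.

```python
def classify_term(term, lexicon):
    """Return 'complete', 'partial', or 'not found' for one glossary term."""
    atoms = term.lower().split()
    found = [atom in lexicon for atom in atoms]
    if atoms and all(found):
        return "complete"
    if any(found):
        return "partial"
    return "not found"


def coverage(terms, lexicon):
    """Percentage of terms with complete, partial (cumulative), and no coverage."""
    counts = {"complete": 0, "partial": 0, "not found": 0}
    for term in terms:
        counts[classify_term(term, lexicon)] += 1
    total = len(terms)
    return {
        "complete": 100.0 * counts["complete"] / total,
        # Assumption: partial coverage also counts the complete matches.
        "partial": 100.0 * (counts["complete"] + counts["partial"]) / total,
        "not found": 100.0 * counts["not found"] / total,
    }


# Hypothetical stand-ins for the SPECIALIST Lexicon and for TBGE entries.
lexicon = {"anterior", "wing", "vein", "abdominal", "segment"}
terms = ["anterior wing", "wing vein", "abdominal tergite", "elytron"]
print(coverage(terms, lexicon))
```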
Q2: Can UMLS Be Used to Seed Entomology Ontology?
Three-Tiered Mapping Approach
Tier 1: Direct Mapping
  Exact & Normalized String Matching
Tier 2: Direct Mapping after Demodification
  Remove nominal and adjectival modifiers
  Exact & Normalized String Matching
Tier 3: Approximate Matching
  MetaMap Application
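The three tiers can be sketched as a fall-through look-up. Everything here is a simplified stand-in under stated assumptions: `normalize` is a crude substitute for UMLS string normalization, demodification is approximated by dropping leading words (assuming modifiers precede the head noun), Python's `difflib` replaces MetaMap for the approximate tier, and the `concepts` list is a hypothetical miniature of the Metathesaurus.

```python
import difflib


def normalize(text):
    """Lowercase, strip punctuation, and sort words (a crude normalized form)."""
    cleaned = "".join(ch if ch.isalnum() or ch.isspace() else " " for ch in text.lower())
    return " ".join(sorted(cleaned.split()))


def map_term(term, concepts, cutoff=0.8):
    """Try the three tiers in order; return (tier, concept) or (None, None)."""
    index = {normalize(name): name for name in concepts}

    # Tier 1: exact or normalized string match on the whole term.
    key = normalize(term)
    if key in index:
        return 1, index[key]

    # Tier 2: demodification, approximated by dropping leading words one at a time.
    words = term.split()
    for start in range(1, len(words)):
        key = normalize(" ".join(words[start:]))
        if key in index:
            return 2, index[key]

    # Tier 3: approximate matching; difflib is a stand-in for MetaMap.
    close = difflib.get_close_matches(normalize(term), list(index), n=1, cutoff=cutoff)
    if close:
        return 3, index[close[0]]
    return None, None


# Hypothetical concept names standing in for the UMLS Metathesaurus.
concepts = ["Abdomen", "Wing", "Antenna", "Thorax"]
for term in ["abdomen", "membranous wing", "antennae", "pronotum"]:
    print(term, "->", map_term(term, concepts))
```

Trying the strictest tier first means each term is credited to the most precise method that succeeds, which is what makes per-tier accuracy figures comparable.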
Q2: Can UMLS Be Used to Seed Entomology Ontology?
Tier (Approach)   % Mapped            % Accuracy
                  Method   Overall    Method   Overall
1 (Direct)        20       20         86       86
2 (Demod)         37       49         74       78
3 (Approx)        23       61         41       71
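One reading of these figures reproduces the Overall columns from the Method columns: each tier's "% Mapped" is taken as a share of the terms left unmapped by the earlier tiers, and overall accuracy is the share of all mapped terms that are correct. This interpretation is an assumption, not something stated on the slide; the sketch below only checks that it agrees with the reported values to within about one percentage point, which is the slack that rounded inputs allow.

```python
def cumulative(method_mapped, method_accuracy):
    """Combine per-tier rates into running overall rates (all values in percent)."""
    remaining = 100.0   # percent of all terms still unmapped
    mapped = 0.0        # percent of all terms mapped so far
    correct = 0.0       # percent of all terms mapped correctly so far
    rows = []
    for rate, accuracy in zip(method_mapped, method_accuracy):
        newly = remaining * rate / 100.0      # terms mapped by this tier
        mapped += newly
        correct += newly * accuracy / 100.0
        remaining -= newly
        rows.append((mapped, 100.0 * correct / mapped))
    return rows


# Per-method figures from the table: Direct, Demod, Approx.
rows = cumulative([20, 37, 23], [86, 74, 41])
for tier, (mapped, accuracy) in enumerate(rows, start=1):
    print(f"Tier {tier}: overall mapped {mapped:.1f}%, overall accuracy {accuracy:.1f}%")
```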
Q2: Can UMLS Be Used to Seed Entomology Ontology?
[Figure: Correct Mappings. Number of Mappings per Semantic Type (ACTI, ANAT, CHEM, CONC, DEVI, DISO, GENE, GEOG, LIVB, OBJC, OCCU, ORGA, PHEN, PHYS, PROC), broken down by tier (Direct, Demod, Approx).]
[Figure: Incorrect Mappings. Number of Mappings per Semantic Type (same categories), broken down by tier (Direct, Demod, Approx).]
Q2: Can UMLS Be Used to Seed Entomology Ontology?
[Figure: Number of Correct and Incorrect mappings per Source Terminology: SNOMED CT, MeSH, Other (30), UWDA, NCBI, MDR, LCH, CSP, GO.]
TBGE-UMLS Implications
UMLS Semantic Network is a good Seed Ontology for Biological Domain Ontologies
Best Term-Concept Mappings into Anatomy
Bottom-Up vs. Top-Down
In Summary…
Ontologies are Needed for Phylogenetics
Existing Biomedical Ontologies Are Useful for New Domain Ontologies (especially UMLS)
Top-Down Strategy using UMLS is Tractable
End Goal
Sequence Data
Structural Data
Phenotypes
Morphology
OWL SDD
Next Steps
Represent Seed Entomology Ontology in OWL
Link OWL Representation to SDD for use in Taxonomic Descriptions
Involve Team of Experts for Validation
Go Beyond Morphology: Location, Biodiversity Data, etc.
Acknowledgements
Tom Moritz, Rob DeSalle, Mark Siddall, David Figurski, Susan Perkins, Paul Planet,
Gloria Coruzzi, Olivier Bodenreider, Carol Friedman, Jim Cimino, Bob Morris, Mark Musen
National Institutes of Health
National Science Foundation
American Museum of Natural History
Thank You
Indra Neil Sarkar, Cullman Bioinformatics Associate
American Museum of Natural History
http://www.GenomeCurator.org/people/sarkar