Post on 23-Feb-2016
Slide 1
Grand Ontology Strategy
Barry Smith
http://ontology.buffalo.edu/smith

DSC Cloud: part of the Army's Distributed Common Ground System (DCGS-A)

Semantic Enhancement Strategy
Make data retrievable, and support analytics, by using Ontology Modules to tag data
http://www.w3.org/People/Ivan/CorePresentations/HighLevelIntro/

Agenda Day 1
9:00 Grand Ontology Strategy
10:45 Military intelligence: an overview of the domain from a data integration perspective
12:15 Lunch
13:00 Interactive session: building sample ontologies
  Human Physical Property Ontology
  Geospatial Ontology
  Target Ontology
  Event Ontology
  Video Ontology
  INTEL Product Ontology

Agenda Day 2
9:00 Survey of some existing approaches to ontology-based military/intelligence informatics
10:45 Break
11:00 Distinction of Relations between Ontologies and Data Models
11:30 A strategy to ensure consistency of ontology development across multiple domains
  Rules for coordination of ontology development
  Creating a system of orthogonals
  Potential partners: establishing a division of labor
  Establishing the scope of a suite of Interoperable Military Intelligence Ontologies
  The role of Joint Doctrine

External Participants
Albert Baker: Army Emerging Web Technologies (semantic solution to global force data management)
Bill Barnhill: Army Data Management Implementation for Army SEC, in support of Army CIO/G-6
Dan Carey: OUSD Personnel & Readiness Information Management: HRM Domain Ontology development
Cliff Joslyn: DoE
Kevin Gupton: Naval Sea Systems Command (NAVSEA), Modeling & Simulation Information Management
Richard Lee: Digital Integrated Air Defense System / DSB (METS PMO)
Peter Morosoff: Electronic Mapping Systems, Inc.; military doctrine SME

http://www.w3.org/People/Ivan/CorePresentations/HighLevelIntro/

The roots of Semantic Technology
1. Make your data available in a standard way on the Web
2. Use controlled vocabularies (ontologies) to capture common meanings, in ways understandable to both humans and computers: Web Ontology Language (OWL)
3. Build links among the datasets to create a web of data
http://www.w3.org/People/Ivan/CorePresentations/HighLevelIntro/

Controlled vocabularies for tagging (annotating) data
Hardware changes rapidly
Organizations are rapidly forming and disbanding
Data is exploding
But: meanings of common words change slowly
Use web architecture to annotate exploding data stores, using ontologies to capture these common meanings in a stable way
Separate enhanced data from software
(Ivan Herman)

The hope underlying the strategy of Semantic Enhancement
Build ontologies using formal languages (OWL, or something better) to enhance the different bodies of exploding data in a consistent fashion that would enable these data to be more easily retrieved, integrated, analyzed, and reasoned over

The problem: the more Semantic Technology is successful, the more it fails
The original idea was to break down silos via common controlled vocabularies for the tagging of data
The very success of this idea leads to the creation of ever new controlled vocabularies ("semantic silos") as ever more ontologies are created in ad hoc ways
The Semantic Web framework as currently conceived and governed by the W3C yields minimal standardization
Multiplying (meta)data registries and ontology repositories are creating semantic cemeteries, where data goes home to die

Some of the reasons for this effect
Low incentives for reuse of existing ontologies
Each organization wants its own ontology ("We have been describing our data in this way for 30 years; we are not going to change now")
Poor licensing regime, poor standards, poor training

Why should you care?
When there are many ad hoc systems, average quality will be low
Constant need for ad hoc repair through manual effort
DoD alone spends $6 billion per annum on this problem
Regulatory agencies are recognizing the need for common controlled vocabularies

Some people think that the problem of multiplying ontologies can be solved by mappings between ontologies
Annotation = tagging data with ontologies
Mapping between ontologies = like a warehouse with multiple inventories = a waste of resources

Ontologies are not Sufficient for Interoperable Data
[Chart: count of articles per year (1997 to 2011) returned by a search on Google Scholar, with two series: "Ontology" in article title and "Ontology Mapping" in article title]
With thanks to Ron Rudnicki, IARPA AIRS (Actionable Information Retrieval System) project
CUBRC - Proprietary

Some people think we can rely on luck: The Infinite Monkey (Fortuitous Interoperability) Strategy

A better solution
Find out what makes ontologies stable and useful, and create an evolutionary process whereby good ontologies will thrive and bad ontologies will die
Being a good ontology means not only being good; it also means being aggressively used in annotations

How to do it right?
How to create an incremental, evolutionary process, where what is good survives?
How to bring about ontology death?

A success story from biology
To find out what makes ontologies stable and useful, look at the world's most successful ontology: the Gene Ontology (GO)

What makes GO successful?
Built by SMEs (constant feedback loop)
Coherent architecture
Improves over time through application of best practices learned through use and simple feedback from users to developers

Old biology data
MKVSDRRKFEKANFDEFESALNNKNDLVHCPSITLFESIPTEVRSFYEDEKSGLIKVVKFRTGAMDRKRSFEKVVISVMVGKNVKKFLTFVEDEPDFQGGPISKYLIPKKINLMVYTLFQVHTLKFNRKDYDTLSLFYLNRGYYNELSFRVLERCHEIASARPNDSSTMRTFTDFVSGAPIVRSLQKSTIRKYGYNLAPYMFLLLHVDELSIFSAYQASLPGEKKVDTERLKRDLCPRKPIEIKYFSQICNDMMNKKDRLGDILHIILRACALNFGAGPRGGAGDEEDRSITNEEPIIPSVDEHGLKVCKLRSPNTPRRLRKTLDAVKALLVSSCACTARDLDIFDDNNGVAMWKWIKILYHEVAQETTLKDSYRITLVPSSDGISLLAFAGPQRNVYVDDTTRRIQLYTDYNKNGSSEPRLKTLDGLTSDYVFYFVTVLRQMQICALGNSYDAFNHDPWMDVVGFEDPNQVTNRDISRIVLYSYMFLNTAKGCLVEYATFRQYMRELPKNAPQKLNFREMRQGLIALGRHCVGSRFETDLYESATSELMANHSVQTGRNIYGVDFSLTSVSGTTATLLQERASERWIQWLGLESDYHCSFSSTRNAEDVDISRIVLYSYMFLNTAKGCLVEYATFRQYMRELPKNAPQKLNFREMRQGLIALGRHCVGSRFETDLYESATSELMANHSVQTGRNIYGVDFSLTSVSGTTATLLQERASERWIQWLGLESDYHCSFSSTRNAEDV

New biology data
http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=nuccore&id=116006492
Sequence of X chromosome in baker's yeast

How to link the kinds of phenomena represented here
20/MKVSDRRKFEKANFDEFESALNNKNDLVHCPSITLFESIPTEVRSFYEDEKSGLIKVVKFRTGAMDRKRSFEKVVISVMVGKNVKKFLTFVEDEPDFQGGPIPSKYLIPKKINLMVYTLFQVHTLKFNRKDYDTLSLFYLNRGYYNELSFRVLERCHEIASARPNDSSTMRTFTDFVSGAPIVRSLQKSTIRKYGYNLAPYMFLLLHVDELSIFSAYQASLPGEKKVDTERLKRDLCPRKPIEIKYFSQICNDMMNKKDRLGDILHIILRACALNFGAGPRGGAGDEEDRSITNEEPIIPSVDEHGLKVCKLRSPNTPRRLRKTLDAVKALLVSSCACTARDLDIFDDNNGVAMWKWIKILYHEVAQETTLKDSYRITLVPSSDGISLLAFAGPQRNVYVDDTTRRIQLYTDYNKNGSSEPRLKTLDGLTSDYVFYFVTVLRQMQICALGNSYDAFNHDPWMDVVGFEDPNQVTNRDISRIVLYSYMFLNTAKGCLVEYATFRQYMRELPKNAPQKLNFREMRQGLIALGRHCVGSRFETDLYESATSELMANHSVQTGRNIYGVDSFSLTSVSGTTATLLQERASERWIQWLGLESDYHCSFSSTRNAEDVVAGEAASSNHHQKISRVTRKRPREPKSTNDILVAGQKLFGSSFEFRDLHQLRLCYEIYMADTPSVAVQAPPGYGKTELFHLPLIALASKGDVEYVSFLFVPYTVLLANCMIRLGRRGCLNVAPVRNFIEEGYDGVTDLYVGIYDDLASTNFTDRIAAWENIVECTFRTNNVKLGYLIVDEFHNFETEVYRQSQFGGITNLDFDAFEKAIFLSGTAPEAVADAALQRIGLTGLAKKSMDINELKRSEDLSRGLSSYPTRMFNLIKEKSEVPLGHVHKIRKKVESQPEEALKLLLALFESEPESKAIVVASTTNEVEELACSWRKYFRVVWIHGKLGAAEKVSRTKEFVTDGSMQVLIGTKLVTEGIDIKQLMMVIMLDNRLNIIELIQGVGRLRDGGLCYLLSRKNSWAARNRKGELPPKEGCITEQVREFYGLESKKGKKGQHVGCCGSRTDLSADTVELIERMDRLAEKQATASMSIVALPSSFQESNSSDRYRKYCSSDEDSNTCIHGSANASTNASTNAITTASTNVRTNATTNASTNATTNASTNASTNATTNASTNATTNSSTNATTTASTNVRTSATTTASINVRTSATTTESTNSSTNATTTESTNSSTNATTTESTNSNTSATTTASINVRTSATTTESTNSSTSATTTASINVRTSATTTKSINSSTNATTTESTNSNTNATTTESTNSSTNATTTESTNSSTNATTTESTNSNTSAATTESTNSNTSATTTESTNASAKEDANKDGNAEDNRFHPVTDINKESYKRKGSQMVLLERKKLKAQFPNTSENMNVLQFLGFRSDEIKHLFLYGIDIYFCPEGVFTQYGLCKGCQKMFELCVCWAGQKVSYRRIAWEALAVERMLRNDEEYKEYLEDIEPYHGDPVGYLKYFSVKRREIYSQIQRNYAWYLAITRRRETISVLDSTRGKQGSQVFRMSGRQIKELYFKVWSNLRESKTEVLQYFLNWDEKKCQEEWEAKDDTVVVEALEKGGVFQRLRSMTSAGLQGPQYVKLQFSRHHRQLRSRYELSLGMHLRDQIALGVTPSKVPHWTAFLSMLIGLFYNKTFRQKLEYLLEQISEVWLLPHWLDLANVEVLAADDTRVPLYMLMVAVHKELDSDDVPDGRFDILLCRDSSREVGELIGLFYNKTFRQKLEYLLEQISEVWLLPHWLDLANVEVLAADDTRVPLYMLMVAVHKELDSDDVPDGRFDILLCRDSSREVGELIGLFYNKTFRQKLEYLLEQISEVWLLPHWLDLANVEVLAADDTRVPLYMLMVAVHKELDSDDVPDGRFDILLCRDSSREVGE
to this?
http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?db=nuccore&id=116006492

Answer: through annotation of data with terms from the GO controlled vocabulary
  sphingolipid transporter activity
  Holliday junction helicase complex
dir.niehs.nih.gov/microarray/datamining/

Why is GO successful?
Built by bench biologists
Multi-species, multi-disciplinary, open source
Compare the use of kilograms, meters, seconds in formulating experimental results
Natural language and logical definitions for all terms
Initially low-tech to ensure aggressive use and testing

If controlled vocabularies are to serve to remove silos
They have to be respected by many owners of data as resources that ensure accurate description of their data (GO is maintained not by computer scientists but by biologists)
They have to be willingly used in annotations by many owners of data
They have to be maintained by persons who are trained in common principles of ontology maintenance
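
The idea of annotating data with terms from a controlled vocabulary can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two labels are taken from the slide, but the term identifiers (TERM:0001, TERM:0002), the record names, and the helper functions are hypothetical and are not real GO accessions or any GO tooling.

```python
from collections import defaultdict

# Hypothetical controlled vocabulary: term ID -> label. The labels come from
# the slide; the IDs are placeholders, not real GO accessions.
VOCABULARY = {
    "TERM:0001": "sphingolipid transporter activity",
    "TERM:0002": "Holliday junction helicase complex",
}

def annotate(annotations, record_id, term_id):
    """Tag a data record with a term, rejecting terms outside the vocabulary."""
    if term_id not in VOCABULARY:
        raise KeyError(f"{term_id} is not in the controlled vocabulary")
    annotations[term_id].add(record_id)

def records_for(annotations, label):
    """Retrieve all records annotated with the term carrying this label."""
    found = set()
    for term_id, name in VOCABULARY.items():
        if name == label:
            found |= annotations[term_id]
    return sorted(found)

annotations = defaultdict(set)
annotate(annotations, "sequence-A", "TERM:0001")
annotate(annotations, "sequence-B", "TERM:0001")
annotate(annotations, "sequence-B", "TERM:0002")
print(records_for(annotations, "sphingolipid transporter activity"))
```

Because every data owner tags against the same vocabulary, one query retrieves records from all of them; free-text tags would not be rejected by `annotate` and would recreate the silo problem.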

Success of GO measured by the fact that
It has created a community consensus
It has implemented a web of feedback loops where users of the GO can easily report errors and gaps
It has identified and applied principles for successful ontology management

GO is limited in its scope
It covers only generic biological entities of three sorts:
  cellular components
  molecular functions
  biological processes
No diseases, symptoms, disease biomarkers, protein interactions, experimental processes

Thus it was necessary to extend the GO methodology to other domains of biology and medicine, and to provide and test rules for such extension

OBO (Open Biomedical Ontology) Foundry proposal (Gene Ontology in yellow)
Columns, by RELATION TO TIME: CONTINUANT (INDEPENDENT, DEPENDENT) and OCCURRENT. Rows, by GRANULARITY.
ORGAN AND ORGANISM
  Independent continuant: Organism (NCBI Taxonomy); Anatomical Entity (FMA, CARO)
  Dependent continuant: Organ Function (FMP, CPRO); Phenotypic Quality (PaTO)
  Occurrent: Biological Process (GO)
CELL AND CELLULAR COMPONENT
  Independent continuant: Cell (CL); Cellular Component (FMA, GO)
  Dependent continuant: Cellular Function (GO)
MOLECULE
  Independent continuant: Molecule (ChEBI, SO, RnaO, PrO)
  Dependent continuant: Molecular Function (GO)
  Occurrent: Molecular Process (GO)

The strategy of orthogonal modules
[Same granularity by relation-to-time table as on the previous slide]

How to recreate the success of the GO in other areas
Create a portal for sharing of information about existing controlled vocabularies, needs and institutions operating in a given area
Create a library of ontologies in this area
Create a consortium of developers of these ontologies who agree to pool their efforts to create a single set of non-overlapping ontology modules: one ontology for each sub-area

NextGen Ontology Portal
Ontology Portal
Two-Tiered Registry
  NextGen Ontology: consists of vetted ontologies
  Ontology Library: open to the wider community
Ontology Metadata
  Ontology owner, domain, and location
Ontology Search
  Support ontology discovery

Redundant efforts defeat the purposes of ontology-based data integration
The ontologies are of lower quality
Multiple ontologies lead to siloing of data
Multiple ontologies block user commitment

The General Strategy
(Applied to Army intelligence, but with a view to generalizing to other military agencies)
Identify all Army intelligence ontology projects
Lock leaders of these projects in a room, with government personnel
Pool information
Thrash out a strategy for creating a single non-redundant suite of interoperable ontologies, with a division of labor and a division of responsibility

Some rules to serve as a starting point for negotiations in the room
Developers commit to collaborating with developers of ontologies in adjacent domains
Developers commit to ensuring that, for each domain, there is convergence on a single ontology
See http://obofoundry.org

Why do this now?

Fixing Intel
www.militaryontology.com
Integrate information collected by:
  Civil Affairs Officers
  PRTs
  Atmospherics Teams
  Afghan Liaison Officers
  Female Engagement Teams
  Non-Governmental Organizations
  Development Organizations
  United Nations Officials
  Psychological Operations Teams
  Human Terrain Teams
  Infantry Battalions

Basic principles of ontology development
for formulating definitions
of modularity
of user feedback for error correction and gap identification
for ensuring compatibility between modules
for using ontologies to annotate legacy data
for using ontologies to create new data
for developing user-specific views

Principle of two-part definitions
Each ontology term A will have a unique immediate parent B within the asserted hierarchy
The definition of A will state what it is about certain Bs which makes them As. Thus it will be of the form:
A =def. a B which Cs
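
A minimal sketch of how the two-part form "A =def. a B which Cs" might be represented in code. The `Term` class, its field names, and the example terms and wording of the differentia are all illustrative assumptions, not part of any ontology described in the slides.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Term:
    name: str
    parent: Optional["Term"]  # the unique immediate parent B (None only for a root)
    differentia: str = ""     # the "which Cs" part of the definition

    def definition(self) -> str:
        """Render the two-part definition: A =def. a B which Cs."""
        if self.parent is None:
            return f"{self.name} (root term, no two-part definition)"
        return f"{self.name} =def. a {self.parent.name} which {self.differentia}"

# Hypothetical example terms, chosen only to show the pattern.
military_aircraft = Term("military aircraft", None)
jet_fighter = Term(
    "jet fighter",
    military_aircraft,
    "is jet-propelled and designed for air-to-air combat",
)
print(jet_fighter.definition())
```

Holding exactly one `parent` per term makes the "unique immediate parent" requirement a property of the data structure rather than a convention that editors must remember.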

Types of SE Ontologies
1. Upper-level = BFO, plus small extensions of BFO covering terms used in almost all lower-level ontologies, such as person, group, datum, meeting, ...
2. Low-level ontologies = small ontologies for single domains, for example:
  ontologies of qualities such as hair color, eye color (close to flat lists)
  ontology of INTEL disciplines
  ontology of INTEL products
3. Mid-level ontologies of two sorts:
  reference ontologies, created through downward population from BFO, to cover broad domains comprehending multiple lower-level ontologies, for example:
    geospatial
    information artifact
    military operation
  application ontologies, created for specific purposes by merging components from other ontologies, or by introducing data-source-specific terms

The Semantic Enhancement Approach
Create a small set of plug-and-play ontologies as stable monohierarchies with a high likelihood of being reused
Create ontologies incrementally
Reuse existing ontology resources
Use these ontologies incrementally in annotating heterogeneous data
Annotating = arm's-length approach; the data and data models themselves remain as they are
Annotations can be associated with metadata concerning provenance (compare GO Evidence Codes)
Annotations in common ontologies allow data to be shared across different communities
The common architecture and logical structure of the ontologies brings benefits in
  querying
  search
  analytics
  reasoning
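
The arm's-length approach can be illustrated as follows: annotations live in their own structure and point at records by key, so the underlying stores are never modified, and a query against a common is_a hierarchy spans several sources at once. Everything here (the hierarchy, source names, record keys, and the `evidence` field, which is only loosely analogous to a GO evidence code) is a hypothetical sketch.

```python
from dataclasses import dataclass

# Hypothetical is_a hierarchy (child -> parent), kept as a monohierarchy.
IS_A = {"sniper rifle": "rifle", "rifle": "weapon", "mortar": "weapon"}

@dataclass(frozen=True)
class Annotation:
    source: str     # which data store the record lives in
    record_id: str  # key of the untouched record in that store
    term: str       # ontology term used as the tag
    evidence: str   # provenance note for the annotation

def ancestors_or_self(term):
    """Yield the term followed by each of its ancestors up to the root."""
    while term is not None:
        yield term
        term = IS_A.get(term)

def query(annotations, term):
    """Find records tagged with the term or with any of its descendants."""
    return [a for a in annotations if term in ancestors_or_self(a.term)]

annotations = [
    Annotation("report_db", "r17", "sniper rifle", "analyst assertion"),
    Annotation("sensor_feed", "s03", "mortar", "automated extraction"),
    Annotation("report_db", "r22", "rifle", "analyst assertion"),
]
print([(a.source, a.record_id) for a in query(annotations, "rifle")])
print([(a.source, a.record_id) for a in query(annotations, "weapon")])
```

Querying for "weapon" returns records from both stores even though neither store uses that word, which is the kind of benefit in querying and search that the slide attributes to a common logical structure.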

Benefits of Modularity
Brings a clean division of labor amongst domain experts, who can manage governance aspects pertaining to their own domains
Automatic consistency of the results of the distributed efforts: no room for contradiction
Additivity of annotations even when multiple independently developed ontologies are used
Lessons learned in developing and using one module can be used by the developers and users of later modules
Increased likelihood of reuse, since potential users will be aware that they are investing in the results of an authoritative coordinated approach of proven reliability
Increased value and portability of training in any given module
Incentivization of those responsible for individual modules
All of those involved can more easily inspect and criticize the results of others' work
Creates a collaborative environment for ontology development that serves as a platform for innovations which can be easily propagated throughout the whole system
Developing and using ontologies in a consistent fashion brings a number of network effects: the value of existing annotations increases as new annotations are added

Dealing with vocabulary conflicts across COIs
The goal is: one agreed, authoritative representation for each domain
To achieve agreement we need:
  coordinating board, change management
  border treaty negotiations
  community-specific views of the terminology (using exact synonyms)

Governance
Common governance (coordinating editors, change board)
Common training
Robust versioning
Common architecture
Strategy of downward population
How much can we embed governance into software?

Logical standards can be only part of the solution
OWL brings benefits primarily on the side of syntax (language)
What we need are standards on the semantics (content) side, including standards for
  top-level ontologies
  common relations (part_of)
  relation of lower-level ontologies to each other and to the higher levels

BFO, DOLCE, SUMO
All exist in FOL and OWL versions
All have been tested in use
BFO: very small, truly domain-neutral
DOLCE: largely extends BFO, but built to support linguistic and cognitive engineering
SUMO: has its own tiny mathematics, tiny physics, tiny biology (body-covering, fruit-Or-vegetable, ...)
A special case: Cyc
  Allows inconsistent microtheories (so: chaos)
  Has received a lot of funding, but does not perform well in use

120+ ontology projects using BFO
http://www.ifomis.org/bfo/
Open Biomedical Ontologies Foundry
eagle-I, VIVO, CTSAconnect
AstraZeneca
Elsevier
Nuclear, Nanotechnology

How a common upper-level ontology can help resist ontology chaos
Something to teach
Training (expertise) is portable
Each new ontology you confront will be more easily understood at the level of content, and more easily criticized and error-checked
Provides a starting point for domain-ontology development
Provides a platform for tool-building and innovations
Lessons learned in building and using one ontology can potentially benefit other ontologies
Promotes shareability of data across disciplinary and other boundaries

OBO Foundry Modular Organization
top level
  Basic Formal Ontology (BFO)
mid-level
  Information Artifact Ontology (IAO)
  Ontology for Biomedical Investigations (OBI)
  Ontology of General Medical Science (OGMS)
domain level
  Anatomy Ontology (FMA*, CARO)
  Environment Ontology (EnvO)
  Infectious Disease Ontology (IDO*)
  Biological Process Ontology (GO*)
  Cell Ontology (CL)
  Cellular Component Ontology (FMA*, GO*)
  Phenotypic Quality Ontology (PaTO)
  Subcellular Anatomy Ontology (SAO)
  Sequence Ontology (SO*)
  Molecular Function (GO*)
  Protein Ontology (PRO*)
* = dedicated NIH funding

UCore 2.0 / UCore SL
Extension Strategy
  top level
  mid-level
  domain level
Military domain ontologies as extensions of the Universal Core Semantic Layer
http://1105govinfoevents.com/EA/Presentations/EA09_2-2_Robinson.pdf

BFO
A simple top-level ontology to support information integration in scientific research
No overlap with domain ontologies (organism, person, society, information, ...)
Based on realism
No abstracta
Tested in many natural science domains

Basic Formal Ontology
Continuant
  Independent Continuant: entity
  Dependent Continuant: property
Occurrent: process, event
property depends on bearer

depends_on
[Same diagram as the previous slide]
event depends on participant

roles, qualities
Continuant
  Independent Continuant
  Dependent Continuant: Quality, Role
Occurrent: process, event

instance_of
[Diagram: the BFO types (top) linked by instance_of to instances (bottom)]

Catalog vs. inventory
A 515287 DC3300 Dust Collector Fan
B 521683 Gilmer Belt
C 521682 Motor Drive Belt

types vs. instances
names of instances
names of types

instance_of
[Diagram repeated: the BFO types (top) linked by instance_of to instances (bottom)]
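
Rationale of OBO Foundry coverage
Columns, by RELATION TO TIME: CONTINUANT (INDEPENDENT, DEPENDENT) and OCCURRENT. Rows, by GRANULARITY.
ORGAN AND ORGANISM
  Independent continuant: Organism (NCBI Taxonomy); Anatomical Entity (FMA, CARO)
  Dependent continuant: Organ Function (FMP, CPRO); Phenotypic Quality (PaTO)
  Occurrent: Organism-Level Process (GO)
CELL AND CELLULAR COMPONENT
  Independent continuant: Cell (CL); Cellular Component (FMA, GO)
  Dependent continuant: Cellular Function (GO)
  Occurrent: Cellular Process (GO)
MOLECULE
  Independent continuant: Molecule (ChEBI, SO, RNAO, PRO)
  Dependent continuant: Molecular Function (GO)
  Occurrent: Molecular Process (GO)

Example: The Cell Ontology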
BFO 2.0
to be released spring 2012
http://groups.google.com/group/bfo-discuss?pli=1

Principle of single inheritance
Terms never have more than one parent
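
The single-inheritance rule lends itself to a mechanical check. The sketch below assumes the asserted hierarchy is available as a list of (child, parent) is_a pairs; the function name and the example terms are hypothetical.

```python
from collections import defaultdict

def multiple_parent_terms(is_a_edges):
    """Return terms asserted with more than one parent, with their parents."""
    parents = defaultdict(set)
    for child, parent in is_a_edges:
        parents[child].add(parent)
    return {c: sorted(p) for c, p in parents.items() if len(p) > 1}

edges = [
    ("jet fighter", "aircraft"),
    ("helicopter", "aircraft"),
    ("helicopter", "rotorcraft"),  # second asserted parent: violates the rule
]
print(multiple_parent_terms(edges))
```

An empty result means the asserted hierarchy is a monohierarchy; any entry points to a term whose extra parent should be dropped or restated as a different relation.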

On Classifying Material Entities in Basic Formal Ontology

BFO 2.0
Independent continuants
Things (such as molecules, organisms, planets)
  continue to exist through time
  can gain and lose qualities over time
  can gain and lose parts over time

Specifically dependent continuants
  the quality of whiteness of this cheese
  your role as lecturer
  the disposition of this patient to experience diarrhea
  the function of your heart to pump blood

the particular case of redness (of a particular fly eye) instantiates the universal red
an instance of an eye (in a particular fly) instantiates the universal eye
the particular case of redness depends_on the instance of an eye

depends_on
Continuant
  Independent Continuant: thing
  Dependent Continuant: quality
Occurrent: process
temperature depends on bearer

Qualities
temperature, blood pressure, mass are continuants
they exist through time while undergoing changes

A chart representing how John's temperature changes

BFO: The Very Top
continuant
  independent continuant
  dependent continuant
    quality
      temperature
occurrent

Blinding Flash of the Obvious
types
  independent continuant > organism
  dependent continuant > quality > temperature
instances
  John
  John's temperature

types
  independent continuant > organism
  dependent continuant > quality > temperature
  occurrent > process > course of temperature changes
instances
  John
  John's temperature
  John's temperature history

Advantages of Genus-Species Definitions
Work on formulating definitions provides a check on the correctness of the backbone is_a hierarchy
Every definition logically encapsulates all the definitions of all higher terms within the relevant single branch
This simple traffic rule (always use genus-species definitions) contributes to coordination of the ontology development effort

ontology =def. a representational artifact whose representational units (which may be drawn from a natural or from some formalized language) are intended to represent
1. types in reality
2. those relations between these types which obtain universally (= for all instances)
  F16 is_a jet fighter
  jet fighter has_part wing

How to build an ontology
Import BFO into an ontology editor
Work with domain experts to create an initial mid-level classification
Find the ~50 most commonly used terms corresponding to types in reality
Arrange these terms into an informal is_a hierarchy according to this universality principle:
  A is_a B: every instance of A is an instance of B
Fill in missing terms to give a complete hierarchy
Work with domain experts to populate the lower levels of the hierarchy
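
The universality principle (if A is_a B, every instance of A is an instance of B) can be sketched as a walk up the is_a hierarchy from an instance's asserted type. The "F16 is_a jet fighter" assertion appears in the slides; the "aircraft" parent, the instance name "aircraft-001", and the function name are assumptions made for illustration.

```python
# Asserted is_a hierarchy (child -> parent). "aircraft" is a hypothetical parent.
IS_A = {"F16": "jet fighter", "jet fighter": "aircraft"}

# Asserted instance_of links (instance -> most specific type). Hypothetical data.
INSTANCE_OF = {"aircraft-001": "F16"}

def types_of(instance):
    """List every type the instance belongs to, most specific first.

    Because is_a holds universally, an instance of A is also an instance
    of each ancestor of A in the hierarchy.
    """
    types = []
    term = INSTANCE_OF.get(instance)
    while term is not None:
        types.append(term)
        term = IS_A.get(term)
    return types

print(types_of("aircraft-001"))
```

If a proposed is_a link would make this inference false for even one real instance, the link does not satisfy the universality principle and should not be asserted.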

Universality
Ontologies are graphs, whose nodes are singular nouns representing types, and whose edges are relational assertions which hold universally.
Often, order will matter. We can assert
  adult transformation_of child
but not
  child transforms_into adult

3 levels: distinguish things from ideas from words
First-order reality: reality as it is prior to any cognitive agent's perception or belief
Cognitive representations of this reality, embodied in observations and interpretations on the part of cognitive agents
Publicly accessible concretizations of these cognitive representations: artifacts representing first-order reality (including ontologies, terminologies, data repositories)
Smith B, Kusnierczyk W, Schober D, Ceusters W. Towards a Reference Terminology for Ontology Research and Development in the Biomedical Domain. Proceedings of KR-MED 2006, November 8, 2006, Baltimore MD, USA.

Principle of objectivity
Which types exist in reality is not a function of our knowledge.
Terms such as
  unknown
  unclassified
  unlocalized
  weapon not otherwise specified
do not designate types in reality.
There is no biological species: unknown rabbit. See discussion below.
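
One way the principle of objectivity might be enforced during ontology editing is a simple lint over term labels. The phrase list is taken from the slide; the function name and sample labels are hypothetical, and a real review would still need human judgment.

```python
import re

# Phrases the slide gives as examples of labels that do not designate types.
SUSPECT = ("unknown", "unclassified", "unlocalized", "not otherwise specified")

def suspect_labels(labels):
    """Return labels containing one of the suspect phrases (case-insensitive)."""
    pattern = re.compile("|".join(re.escape(p) for p in SUSPECT), re.IGNORECASE)
    return [label for label in labels if pattern.search(label)]

labels = ["rabbit", "unknown rabbit", "weapon not otherwise specified", "rifle"]
print(suspect_labels(labels))
```

Flagged labels describe the state of our knowledge about an entity rather than a type in reality, so that information belongs in the annotation or its metadata, not in the ontology.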
[Embedded chart data: Time 1: 91.4; Time 2: 94.4; Time 3: 91.8; Time 4: 92.8]