Ontologies for the Modern Age
Deborah L. McGuinness
Tetherless World Senior Constellation Chair
Professor of Computer, Cognitive, and Web Science
Director RPI Web Science Research Center
RPI Institute for Data Exploration and Application Health Informatics Lead
[email protected], @dlmcguinness
We have come a long way
since 2001
Tracks:
• Ontology and Ontology Maintenance
• Interoperability, Integration, &
Composition
• Web Services & Applications
• Needed to add tutorial / demo / BOF
track to handle large preregistration
numbers
Sponsors:
VerticalNet, Nokia, Spiritsoft, Enigmatic,
Empolis, Connotate, Mondeca, L&C, SC4,
Network Inference, Ontoprise, Inria, KSL,
NSF, DARPA
From http://swsa.semanticweb.org:
245 Attendees | 35/58 Papers Accepted | 3
Tutorials | 0 Workshops
and 2 co-located events, plus BOF/DEMO
• Kicked off Semantic Web Science
Association (SWSA) and
• ISWC conference series (2002)
• Background for Web Science / Web
Science Trust (2006)
McGuinness ISWC 10/23/2017
Themes continue and expand
Co-located and track themes valid then
and expanding now
Tutorials - 7
Workshop explosion – 18
– some of which are vibrant communities
and have been running for many years or
evolved (e.g. Linked Science -> Enabling
Open Semantic Science)
- Some continued themes – Ontologies
come of age (again) 2011
- Some newer themes (e.g., benchmarking
linked data, semantic web for x: IoT,
biodiversity, etc.)
Ontologies
An ontology specifies a rich description of the
• Terminology, concepts, nomenclature
• Relationships among concepts and individuals
• Sentences distinguishing concepts, refining
definitions & relationships
relevant to a particular domain or area of interest.
* Based on AAAI '99 Ontologies Panel: McGuinness, Welty, Uschold, Gruninger, Lehmann
McGuinness 6/7/2017
• "Pull" for Ontologies. Invited
talk. Semantics for the Web.
Dagstuhl, Germany, 2000.
• Ontologies Come of Age.
Fensel, Hendler, Lieberman,
Wahlster, eds. Spinning the
Semantic Web: Bringing the
World Wide Web to Its Full
Potential. MIT Press, 2003.
Ontology-Enabled Application
Configurator Example
McGuinness, Resnick, Isbell. Description Logic in
Practice: A CLASSIC Application. IJCAI, 1995.
Web-based configurator.
KR-literate designer and
maintainer
Tools like CLASSIC,
Protégé, Ontolingua,
Chimaera, PROMPT, …all
benefit by having a
knowledge representation
expert project owner /
maintainer with domain
expert access
Applications of the day lived
reasonably* well with limited
expressivity
Building and Evolving Ontologies: Past vs. Present
Design
  Past: Knowledge Representation (KR) expert with domain expert access
  Present: KR expert(s) paired with domain experts AND community
Population
  Past: KR expert learns domain and builds ontology with some external reuse
  Present: KR and domain experts determine seed vocabularies and HEAVILY leverage them
Evolution
  Past: KR expert heavily involved
  Present: KR expert involved in building / customizing tools that domain experts use; input may include output of automatic techniques (e.g., extraction)
Tool Users
  Past: Trained in Computer Science
  Present: Trained in Domain Sciences
Application Users
  Past: Targeted, well-understood user base
  Present: Diverse and evolving user base
Reuse
  Past: Well thought out
  Present: Expect the unexpected
●Limited data integration without controlled
vocabulary
●Limited reproducibility without shared
definitions
●Difficulty in reuse without provenance
Ontologies can enhance integration,
communication, reuse, and research impact
Ontology “Pull”: Browsing / Configuration
to Interoperability / Transparency
Data Life Cycle
Consistent
terminology
and
meaning
Ontology-enhanced
Search and
organization
Data management
Image: J. Crabtree, with permission, NIEHS 50 yr FEST
Ontology-enabled
interpretation & integration
Ontology-enabled
integrity checking
Provenance
annotations for trust
and reuse
Computer understandable specifications of meaning
(semantics) support enhanced lifespan & impact of data
Child Health Exposure
Analysis Repository
Stingone, Mervish, Kovatch, McGuinness, Gennings, and Teitelbaum. Big and Disparate Data: Considerations for Pediatric Consortia. Current Opinion in Pediatrics, 29(2):231-239, April 2017. Funding: NIH/NIEHS 0255-0236-4609 / 1U2CES026555-01.
Ontology Development Process
Use Cases
Existing Ontologies
& Vocabularies
Expert Interviews
Labkey,
Ontology
Fragments
Ontology
Curation
(ongoing)
Reviewers & Curators
* Ontology Development Team
* Domain collaborators
* Invited experts
* "Consumers" (data analysts)
Knowledge Graph
Integration
* Linking data and
metadata content to
domain terms
* Linking workflows
based on semantic
descriptions
Repository
Integration
* Source Datasets
* Analytics source
code
* Results
* Publications
Knowledge-
Enhanced
Search
Finding what
is there that
might be of
use
Semantic
Extract,
Transform,
Load
(SETLr)
Expert
Guidance
Sources
Data Reporting
Templates
Data Dictionaries /
Codebooks
Foundational
Ontologies/Vocabularies
Human Aware
Data Acquisition
Framework
Ontology
Browser
Generated
Ontology
* domain concepts
* authoritative
vocabularies
* vetted definitions
* supporting citations
Erickson, McGuinness, McCusker, Chastain, Pinheiro, Rashid, Liang, Liu, Stingone, …
Exemplified by
• Ontology support for mapping and integration (e.g., education level)
• Ontology informs decisions about variables that may be combined, serve as proxies, or be used to derive desired info (e.g., birth outcomes)
• Ontology Integrity constraints may help flag errors (e.g., APGAR > 10)
• Ontology helps expose implicit information and find links
Fenton Z-Score
Sex
Birth weight
Gest Age
Mother’s Highest Education Level
Val
Did not attend school 0
Elementary school 1
Technical post-primary 2
Middle school 3
Technical post-middle school 4
Highschool or junior college 5
Technical post-junior college 6
College 7
Graduate 8
Doesn’t know 9
Mother Education
Val
Less than High School 0
High School Graduate or More 1
Support Browsing, Searching, Pooling,
Deriving Values, Verification, …
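The pooling and verification ideas on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the CHEAR implementation: the function names are hypothetical, and the choice of which source codes count as "High School Graduate or More" (codes 5 to 8) and the treatment of "Doesn't know" as missing are assumptions made only for this example. The APGAR range of 0 to 10 is the standard scoring range.

```python
from typing import Optional

# Source coding copied from the slide: mother's highest education level.
SOURCE_EDUCATION = {
    0: "Did not attend school",
    1: "Elementary school",
    2: "Technical post-primary",
    3: "Middle school",
    4: "Technical post-middle school",
    5: "Highschool or junior college",
    6: "Technical post-junior college",
    7: "College",
    8: "Graduate",
    9: "Doesn't know",
}

# ASSUMPTION (not stated on the slide): codes 5-8 map to
# "High School Graduate or More"; code 9 is treated as missing.
HIGH_SCHOOL_OR_MORE = {5, 6, 7, 8}
UNKNOWN = {9}


def harmonize_education(code: int) -> Optional[int]:
    """Map the 10-level source code to the 2-level target coding.

    Returns 0 (Less than High School), 1 (High School Graduate or More),
    or None when the source value is "Doesn't know".
    """
    if code not in SOURCE_EDUCATION:
        raise ValueError(f"unknown education code: {code}")
    if code in UNKNOWN:
        return None
    return 1 if code in HIGH_SCHOOL_OR_MORE else 0


def apgar_is_valid(score: int) -> bool:
    """Integrity check: an APGAR score must lie between 0 and 10."""
    return 0 <= score <= 10
```

In an ontology-backed system the mapping table and the range constraint would come from the ontology rather than being hard-coded as they are here.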
McGuinness, McCusker, Pinheiro, Stingone, et al. Funding: NIH/NIEHS 0255-0236-4609 / 1U2CES026555-01
Laboratory Information Mgmt
System (LIMS)-based
backend integrated with
Ontology.
Includes automatic ingest,
access control, data
governance, download, ...
Supports search by study,
sample, subject, ...
Enables statisticians to ask for
content to support their
studies, e.g., find
Child: Birth Weight, Gender,
Gestational Age at Birth
Mother: Age, BMI “early in
pregnancy based on
inclusion criterion for the
particular study”, Parity,
Education
Metals: As, Cd, Mn, Mo, Pb
CHEAR Human Aware Data
Acquisition Framework
Pinheiro, Liang, Rashid, Liu, Chastain, Santos, McCusker, McGuinness
Partially supported by: NIH/NIEHS 0255-0236-4609 / 1U2CES026555-01
CHEAR Ontology infrastructure
Thousands of instances of CHEAR & foundational
ontologies (e.g., subjects, samples, lab capabilities)
Thousands of concepts and
relationships from
foundational ontologies
Hundreds of concepts from the CHEAR ontology
258 Analytes (incl. 36 metals, from lab spec)
176 Epidemiological Attributes (from pilots)
28 Sample types (incl. 3 pregnancy, from lab spec)
42 Assay types (from Lab Capabilities)
122 Instrument types (from Lab Capabilities)
We use:
●Labkey to create, curate
and maintain CHEAR
concepts (ontology)
●Labkey to create and
maintain CHEAR instances
(knowledge graph)
●SETLr to build and publish
the CHEAR ontology from
CHEAR concepts
●HADatAc to connect
CHEAR/foundational
concepts and instances to
CHEAR data
●HADatAc to
browse/select/retrieve
CHEAR data from CHEAR
vocabulary
Funding: NIH/NIEHS 0255-0236-4609 / 1U2CES026555-01
Disease Ontology
UBERON (Anatomy)
Units Ontology
CHEBI (Chemicals)
RefMet* (Metabolites)
ENVO* (Environment)
UniProt* (Proteins)
HAScO (Instruments/methods)
SIO (Semanticscience Integrated Ontology)
PROV (Provenance)
Example Ontology and
infrastructure (CHEAR)
Content keeps expanding…
Metabolomics
Targeted Analytes and RefMet
RefMet main classes and
subclasses are mapped to
CHEBI classes where
available.
CHEAR targeted analyte
classes and superclasses
are also aligned to CHEBI.
When including the CHEBI
hierarchy as well, the
following main classes and
subclasses in RefMet have
targeted analyte
subclasses (count in
parentheses).
Funding: NIH/NIEHS 0255-0236-4609 / 1U2CES026555-01
Mapping Data to Meaning:
Semantic Data Dictionaries
Rashid, Chastain, Stingone, McGuinness, McCusker. The Semantic Data Dictionary
Approach to Data Annotation and Integration. Enabling Open Semantic Science, Oct 21, 2017
Partially supported by: NIH/NIEHS 0255-0236-4609 / 1U2CES026555-01
Columns
id
race
age
edu
bmi
weight
height
smoker
pb_1
pb_2
ga
birthwt
Semantic Data Dictionaries
Describe a Bigger Picture
id(sio:Identifier)
race(sio:Race)
age(sio:Age)
edu(chear:EducationLevel)
bmi(chear:BMI)
weight(sio:Mass)
height(sio:Height)
smoker(chear:SmokingStatus)
pb_1(sio:Concentration)
pb_2(sio:Concentration)
ga(chear:GestationalAge)
birthwt(chear:Weight)
??mother(sio:Human)
??child(sio:Human)
??birth(chear:Birth)
??pregnancy(chear:Pregnancy)
??sample1(Serum)
??sample2(Serum)
??pb_1(Pb)
??pb_2(Pb)
hasAttribute
??visit1(chear:Visit)
??visit2(chear:Visit)
hasAttribute
existsAt
wasDerivedFrom
hasPart hasAttribute
inRelationTo
existsAt
existsAt
inRelationTo
(chear:Mother)
inRelationTo
hasRole
Plus Units of
Measure
(not shown)
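A semantic data dictionary of this kind can be approximated as a lookup from column names to ontology classes. The sketch below copies the column-to-class pairs from the slide; the function name and the example row are hypothetical, and the implicit entities (??mother, ??child, and so on) and their relations are deliberately left out because their exact wiring is not recoverable from the slide text.

```python
from typing import Dict, List, Tuple

# Column -> class annotations as listed on the slide.
COLUMN_CLASSES: Dict[str, str] = {
    "id": "sio:Identifier",
    "race": "sio:Race",
    "age": "sio:Age",
    "edu": "chear:EducationLevel",
    "bmi": "chear:BMI",
    "weight": "sio:Mass",
    "height": "sio:Height",
    "smoker": "chear:SmokingStatus",
    "pb_1": "sio:Concentration",
    "pb_2": "sio:Concentration",
    "ga": "chear:GestationalAge",
    "birthwt": "chear:Weight",
}


def annotate_row(
    row: Dict[str, str],
) -> Tuple[List[Tuple[str, str, str]], List[str]]:
    """Attach a class to every known column of one data row.

    Returns (annotated, unmapped): annotated is a list of
    (column, class, value) tuples; unmapped lists columns that the
    dictionary does not cover and that a curator would need to review.
    """
    annotated: List[Tuple[str, str, str]] = []
    unmapped: List[str] = []
    for column, value in row.items():
        cls = COLUMN_CLASSES.get(column)
        if cls is None:
            unmapped.append(column)
        else:
            annotated.append((column, cls, value))
    return annotated, unmapped
```

Reporting unmapped columns explicitly, rather than dropping them, keeps the gap visible to whoever maintains the dictionary.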
Partially supported by: NIH/NIEHS 0255-0236-4609 / 1U2CES026555-01
Epidemiological Measurements
Concepts /
relationships from
foundational
ontologies
Examples
of terms
from the
CHEAR
ontology
Examples
of terms
from the
CHEAR
ontology
Examples
of concepts
from the
CHEAR
ontology
Instance of
foundational
ontology
term
Ontology and Knowledge
Graph (Behind the Scenes)
Concepts /
relationships from
foundational
ontologies
Partially supported by: NIH/NIEHS 0255-0236-4609 / 1U2CES026555-01
CHEAR Study-based Evolution
Strategy
Identify terms
that can be
mapped to
existing ontology
Identify terms
to be added
to ontology
Describe new
terms w/
definitions and
location within
existing ontology
Mappings (e.g.
variable names)
incorporated into
knowledge graph
Data into
knowledge graph
after embargo
period
Incorporate new
terms into
existing ontology
Review and
revise updates
with stakeholders
Data Structures
& Standards
Working Group
Compile new
terms across
multiple studies
(e.g. Quarterly)
Data Center
New
version
Ontology
X
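The first two steps of this strategy, separating terms that map to the existing ontology from terms that must be added, can be sketched as below. This is a simplification under stated assumptions: matching is by case-insensitive exact label only, the function name and sample terms are hypothetical, and in practice the review by stakeholders and the working group decides the outcome.

```python
from typing import Dict, Iterable, List, Tuple


def triage_terms(
    study_terms: Iterable[str],
    ontology_labels: Iterable[str],
) -> Tuple[Dict[str, str], List[str]]:
    """Split study terms into those already in the ontology and new ones.

    ASSUMPTION: a term "maps" when it equals an ontology label after
    trimming whitespace and ignoring case. Real curation would also use
    synonyms, definitions, and expert judgment.
    """
    known = {label.strip().lower(): label for label in ontology_labels}
    mapped: Dict[str, str] = {}
    to_add: List[str] = []
    for term in study_terms:
        key = term.strip().lower()
        if key in known:
            mapped[term] = known[key]   # candidate mapping for the knowledge graph
        else:
            to_add.append(term)         # candidate new term for the next ontology version
    return mapped, to_add
```

The `to_add` list corresponds to the terms that would be compiled across studies and reviewed before a new ontology version is released.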
Partially supported by: NIH/NIEHS 0255-0236-4609 / 1U2CES026555-01
Ontology-Enabled Study Search
Blood Biomarkers for Children’s Health (Study 1)
Institution:
Principal Investigator(s):
Number of Subjects:
Number of Samples:
Study Description:
Keywords:
Urine Biomarkers for Children’s Health (Study 2)
Institution:
Principal Investigator(s):
Number of Subjects:
Number of Samples:
Study Description:
Keywords:
Metabolomic Biomarkers for Children’s Health (Study 3)
Institution:
Principal Investigator(s):
Number of Subjects:
Number of Samples:
Study Description:
Keywords:
Partially supported by: NIH/NIEHS 0255-0236-4609 / 1U2CES026555-01
Data Search expanded view
Ontology-Enabled Data Search
Partially supported by: NIH/NIEHS 0255-0236-4609 / 1U2CES026555-01
• Domain scientists are asking for ontologies to address Findable, Accessible, Interoperable, Reusable (FAIR) issues
• With tooling and processes, domain scientists can help build and maintain ontologies and ontology-enabled applications, e.g., epidemiologists are doing this
• While classical ontology considerations remain important (e.g., expressive enough for the use case), ecosystem considerations, including maintainability and longevity, dominate many decisions
• The data center content in CHEAR with the human aware data collection framework front end provides some of the infrastructure I envision in an open* knowledge network
Evolving Reflections
McGuinness, McCusker, Pinheiro, Stingone, et al.
Cognitive Computing in Health
Health Empowerment by Analytics,
Learning, and Semantics
• How can we enhance population and
individual health using information found
inside and outside the traditional (E)HR?
• How can we develop precision medicine
across the many levels of research from
Genome to Phenotype to Population
Health?
• How can we use IBM’s Watson
Technology, augmented with Rensselaer’s
semantics, learning, and analytics
expertise to achieve these goals?
Partially supported through IBM Cognitive Network funding
Health Empowerment
• Knowledge as Medicine: cognitive agent technology will enhance the ability to
• Explain relationships found in the data and link these to
appropriate scientific literature,
• Put meaningful labels on clusters and connections derived from
the analytic process,
• Provide inputs to cognitive systems based on the data found in
databases and medical literature
• Generate and/or test hypotheses about health and medicine to
the level of the individual (precision medicine and health)
Partially supported through IBM Cognitive Network
Semantics-enabled Framework
Ontologies are an important piece; but are part of a larger integrated framework
Semanalytics / SemNExt RPI team: McGuinness, Bennett (PIs) with McCusker, Erickson, Seneviratne and the extended research groups including Rashid here, along with input from IBM HEALS collaborators and motivation from MANY projects, particularly from NIEHS / Mount Sinai.
Partially supported through IBM Cognitive Network
Probability-Aware
Knowledge Exploration
• Knowledge imported from drug, protein, and disease interaction databases.
• Each interaction given an evidence-driven probability.
• Find drugs that could affect melanoma, filtered by interaction probability.
• The best hypotheses were generated using the highest probabilities.
McCusker, Dumontier, Yan, He, Dordick, McGuinness. Finding Melanoma Drugs through a Probabilistic Knowledge Graph. PeerJ 4:e2007, 2016.
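The idea of filtering hypotheses by interaction probability can be illustrated with a small two-hop search (drug to protein to disease). This is a sketch under stated assumptions, not the method of the cited paper: it assumes edge probabilities are independent and combines them by multiplication, and the function name and any data passed to it are hypothetical.

```python
from typing import Dict, List, Tuple

Edge = Tuple[str, str, float]  # (source, target, probability)


def rank_drugs(
    drug_protein: List[Edge],
    protein_disease: List[Edge],
    disease: str,
    min_probability: float,
) -> List[Tuple[str, float]]:
    """Rank drugs that reach `disease` through one protein.

    ASSUMPTION: the probability of a path is the product of its edge
    probabilities (independence). A drug keeps its best-scoring path,
    and paths below `min_probability` are discarded.
    """
    best: Dict[str, float] = {}
    for drug, protein, p_drug in drug_protein:
        for other_protein, target, p_disease in protein_disease:
            if protein != other_protein or target != disease:
                continue
            p_path = p_drug * p_disease
            if p_path >= min_probability:
                best[drug] = max(best.get(drug, 0.0), p_path)
    # Highest-probability hypotheses first.
    return sorted(best.items(), key=lambda item: item[1], reverse=True)
```

Raising `min_probability` trades recall for confidence, which mirrors the slide's point that the best hypotheses came from the highest probabilities.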
Semantic Extract, Transform, Load
for Knowledge Graphs
XML
CSV
JSON
JSON-LD
Templating
Script
RDF
Graphs
Satoru
E T L
HTML
Entrez
McCusker, Rashid, Liang, Liu, Chastain, Pinheiro, Stingone,
McGuinness. Broad, Interdisciplinary Science In Tela: An Exposure
and Child Health Ontology. Web Science, 2017
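The extract, transform, load flow in the diagram (tabular input, a templating step, RDF output) can be sketched with the Python standard library. This does not use or reproduce the SETLr API: the function name, the `http://example.org/` namespace, and the one-predicate-per-column template are assumptions made for illustration, and all values are emitted as plain string literals.

```python
import csv
import io
from typing import List

BASE = "http://example.org/"  # hypothetical namespace, not from the slides


def csv_to_ntriples(csv_text: str, id_column: str) -> List[str]:
    """Turn each CSV cell into one N-Triples statement.

    Extract: parse the CSV text. Transform: apply a fixed template
    (subject from the id column, predicate from the column name).
    Load: return the statements; a real pipeline would write them to
    a triple store.
    """
    triples: List[str] = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        subject = f"<{BASE}subject/{row[id_column]}>"
        for column, value in row.items():
            # Skip the id itself, empty cells, and malformed extra cells.
            if column is None or column == id_column or not value:
                continue
            predicate = f"<{BASE}property/{column}>"
            literal = value.replace("\\", "\\\\").replace('"', '\\"')
            triples.append(f'{subject} {predicate} "{literal}" .')
    return triples
```

A semantic ETL tool would replace the fixed template with one driven by ontology terms, so that predicates and classes come from shared vocabularies rather than raw column names.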
Cancer Data: Comprehensive
omics, epidemiology & patient care
Initial data analysis is aimed at the
following public datasets:
TCGA: RNA expression, tumor mutation, protein
expression, and clinical attributes (including staging,
treatment, risk, and survival) on 32 cancer types in >
14,000 patients
NHANES: Cross-sectional biennial survey of the health and nutrition
of the US population, including illness, environmental exposures, and
risk exposures.
MIMIC (Multiparameter Intelligent Monitoring in Intensive
Care): Longitudinal patient records from patients who
stayed in the intensive care units at Beth Israel
Deaconess Medical Center
Additional analysis will include deidentified data in
cancer topics
Partially supported through IBM Cognitive Network
Building the Knowledge graph:
Reusing Knowledge Sources to Bridge
Abstractions
• Already done: COSMIC Gene Census, OMIM, DrugBank,
iRefIndex
• Pathway data: KEGG, Reactome (small molecule
interactions, curated interactions)
• Gene Ontology: protein localization in cell types and
tissues, protein functions, biological process involvement
• UniProt: Protein families, including common binding sites
• CAP Protocols: Current cancer staging standards,
NCCN… many of these evolve, e.g., breast cancer
staging guidelines
• Vocabularies: SNOMED, NCI Thesaurus, NCI
Metathesaurus, etc.
Discussion topics
• Old style ontologies along with their
considerations are still important…
Expressiveness is still an issue… and may
be a growing issue.
• But old style, old processes, old
ecosystems will not make the impact we
want them to without buy-in from a diverse
community of developers and users
• Ecosystems matter! With respect to
process, infrastructure, community, ….
• Goble’s point from Semantic Science –
need community, driver, tools
• Modern age ontologies are just one piece
– an important piece – but one part of the
puzzle – without the other puzzle pieces
we will not change the world
Value Propositions Matter to
Get and Keep Collaborators
What will we be able to do that is hard or impossible today? One set of topics from
an applied mathematician collaborator (Bennett)
• How to merge data from heterogeneous data sources for analysis
• What types of data are available for analysis
• What interesting analysis questions we are capable of asking
• Is a potential analysis question too broad or imprecise for the data
• Which adjustment covariates should be used for a given analysis question
• Which statistical and machine learning methods and workflows are appropriate
• What background information might be relevant for an analysis question
• Whether measurements are plausible and can be trusted
• Are there explanations of derived results/hypotheses in the literature
• Are results similar to those of prior analyses
• What are appropriate ways to visualize and present results to the user
• Should changes in data trigger a reanalysis/new analysis of questions of interest
Partially supported through IBM Cognitive Network
Preliminary Study
Bennett, Erickson, McCusker, McGuinness et al
Hypothesis:
Does a factor increase the odds
of disease?
User Specifies:
Data (NHANES Cohorts)
Disease Definition
Confounders (age, BMI)
Factors (pesticides)
Agent dynamically applies
standard risk analysis
workflow based on log-odds
Applicable to any risk
problem and data sets
User
Specify Data
Specify Model
Ingest and Clean Data
Conduct Modeling
Analyze and Visualize Results
Knowledge Graph
Semantic Browser
Analytics Agent
WORKFLOW
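The log-odds step of such a risk analysis can be illustrated with the textbook odds ratio for a 2x2 table. This sketch is unadjusted: it ignores the confounders (age, BMI) the slide mentions, which would normally be handled with logistic regression, and it is not the agent's actual workflow. The function name and the example counts are hypothetical.

```python
import math
from typing import Tuple


def odds_ratio_with_ci(
    exposed_cases: int,
    exposed_controls: int,
    unexposed_cases: int,
    unexposed_controls: int,
    z: float = 1.96,
) -> Tuple[float, float, float]:
    """Unadjusted odds ratio with a Wald confidence interval.

    Works on the log-odds scale: log(OR) = log(a*d / (b*c)) with
    standard error sqrt(1/a + 1/b + 1/c + 1/d). z = 1.96 gives an
    approximate 95% interval.
    """
    a, b, c, d = exposed_cases, exposed_controls, unexposed_cases, unexposed_controls
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cell counts must be positive")
    log_or = math.log((a * d) / (b * c))
    std_err = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(log_or - z * std_err)
    upper = math.exp(log_or + z * std_err)
    return math.exp(log_or), lower, upper
```

An interval that lies entirely above 1 would suggest the factor is associated with increased odds of disease in the unadjusted data; the adjusted analysis could still differ.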
Semantic-analytics framework
to support precision health
Obtain goals from all stakeholders
One analyst’s Goals:
• Integrate analytics with knowledge graphs to select
germane data, discover relevant patterns, predict
outcomes, and provide interpretations in response to
queries from users or cognitive agents.
• Design and demonstrate semantic analytics workflows
across the knowledge graph to support precision health
inquiries
• Discover new patterns and predict outcomes to create new
knowledge and insights from the knowledge graph with the
assistance of a cognitive computing agent.
Bennett, Erickson, McCusker, McGuinness, et al
Some Observations
• Ontologies are coming of age again…. But in some different
ways and as part of much larger ecosystems
• Champions are emerging out of a number of fields: (e.g., Bio, Env
Health, Biostatisticians, Earth Scientists, Nano materials scientists, etc.)
• Ontologies can support question formation, validation, and
answer generation in new ways
• Ontologies can support movement across abstraction levels
• Ontologies should not be done alone – community requested,
developed, & maintained resources are the future
• Ontology engineering is evolving to be more community-centric
• Building for longevity is now also an early consideration (the wine
ontology taught me early lessons)
• Ontologies can help change the world when viewed as part of
ecosystems…. Let's change the world together!
Questions?
• Contact:
[email protected]
• Thanks to many: RPI Tetherless World
team particularly McCusker, Erickson,
Hendler, Pinheiro, Rashid, Liang, Liu,
Chastain; RPI: Bennett, Dyson,
Seneviratne; Mount Sinai particularly
Teitelbaum, Stingone, Mervish,
Gennings, Kovatch; IBM, particularly
Das, Chen, Chang, Brown, ….
• Funding: NIEHS 0255-0236-4609 /
1U2CES026555-01, DARPA HR0011-
16-2-0030, IBM-RPI HEALS, NSF
ACI-1640840
• Forthcoming book: Ontology
Engineering with Kendall