
Ontologies For the Modern Age - McGuinness' Keynote at ISWC 2017

Date post: 23-Jan-2018
Upload: deborah-mcguinness
Transcript
Page 1: Ontologies For the Modern Age - McGuinness' Keynote at ISWC 2017

Ontologies for the Modern Age

Deborah L. McGuinness

Tetherless World Senior Constellation Chair

Professor of Computer, Cognitive, and Web Science

Director RPI Web Science Research Center

RPI Institute for Data Exploration and Application Health Informatics Lead

[email protected], @dlmcguinness

Page 2: Ontologies For the Modern Age - McGuinness' Keynote at ISWC 2017

We have come a long way since 2001

Tracks:

• Ontology and Ontology Maintenance

• Interoperability, Integration, & Composition

• Web Services & Applications

• Needed to add tutorial / demo / BOF track to handle large preregistration numbers

Sponsors: VerticalNet, Nokia, Spiritsoft, Enigmatic, Empolis, Connotate, Mondeca, L&C, SC4, Network Inference, Ontoprise, Inria, KSL, NSF, DARPA

From http://swsa.semanticweb.org: 245 Attendees | 35/58 Papers Accepted | 3 Tutorials | 0 Workshops and 2 co-located events, plus BOF/DEMO

• Kicked off Semantic Web Science Association (SWSA) and ISWC conference series (2002)

• Background for Web Science / Web Science Trust (2006)

McGuinness ISWC 10/23/2017

Page 3: Ontologies For the Modern Age - McGuinness' Keynote at ISWC 2017

Themes continue and expand

Co-located and track themes valid then and expanding now

Tutorials - 7

Workshop explosion - 18

- Some of which are vibrant communities and have been running for many years or evolved (e.g., Linked Science -> Enabling Open Semantic Science)

- Some continued themes: Ontologies come of age (again) 2011

- Some newer themes (e.g., benchmarking linked data, semantic web for x: IoT, biodiversity, etc.)


Page 4: Ontologies For the Modern Age - McGuinness' Keynote at ISWC 2017

Ontologies

An ontology specifies a rich description of the

• Terminology, concepts, nomenclature

• Relationships among concepts and individuals

• Sentences distinguishing concepts, refining definitions & relationships

relevant to a particular domain or area of interest.

* Based on AAAI '99 Ontologies Panel: McGuinness, Welty, Uschold, Gruninger, Lehmann


• "Pull" for Ontologies. Invited talk. Semantics for the Web. Dagstuhl, Germany, 2000.

• Ontologies Come of Age. In Fensel, Hendler, Lieberman, Wahlster, eds. Spinning the Semantic Web: Bringing the World Wide Web to Its Full Potential. MIT Press, 2003.


Page 5: Ontologies For the Modern Age - McGuinness' Keynote at ISWC 2017

Ontology-Enabled Application

Configurator Example

McGuinness, Resnick, Isbell. Description Logic in Practice: A CLASSIC Application. IJCAI, 1995.

Web-based configurator. KR-literate designer and maintainer

Tools like CLASSIC, Protégé, Ontolingua, Chimaera, PROMPT, … all benefit by having a knowledge representation expert project owner / maintainer with domain expert access

Applications of the day lived reasonably* well with limited expressivity


Page 6: Ontologies For the Modern Age - McGuinness' Keynote at ISWC 2017

Building and Evolving Ontologies

Aspect | Past | Present

Design | Knowledge Representation (KR) expert with domain expert access | KR expert(s) paired with domain experts AND community

Population | KR expert learns domain and builds ontology with some external reuse | KR and domain experts determine seed vocabularies and HEAVILY leverage them

Evolution | KR expert heavily involved | KR expert involved in building / customizing tools that domain experts use; input may include automatic techniques output (e.g., extraction)

Tool Users | Trained in Computer Science | Trained in Domain Sciences

Application Users | Targeted well understood user base | Diverse and evolving user base

Reuse | Well thought out | Expect the unexpected

Page 7: Ontologies For the Modern Age - McGuinness' Keynote at ISWC 2017

Ontology “Pull”: Browsing / Configuration to Interoperability / Transparency

● Limited data integration without controlled vocabulary

● Limited reproducibility without shared definitions

● Difficulty in reuse without provenance

Ontologies can enhance integration, communication, reuse, and research impact


Page 8: Ontologies For the Modern Age - McGuinness' Keynote at ISWC 2017

Data Life Cycle

[Figure: data life cycle. Image: J. Crabtree, with permission, NIEHS 50 yr FEST]

Figure labels: Consistent terminology and meaning; Ontology-enhanced search and organization; Data management; Ontology-enabled interpretation & integration; Ontology-enabled integrity checking; Provenance annotations for trust and reuse

Computer understandable specifications of meaning (semantics) support enhanced lifespan & impact of data


Page 9: Ontologies For the Modern Age - McGuinness' Keynote at ISWC 2017

Child Health Exposure

Analysis Repository

Stingone, Mervish, Kovatch, McGuinness, Gennings, and Teitelbaum. Big and Disparate Data: Considerations for Pediatric Consortia. Current Opinion in Pediatrics 29(2):231-239, April 2017.

Funding: NIH/NIEHS 0255-0236-4609 / 1U2CES026555-01.


Page 10: Ontologies For the Modern Age - McGuinness' Keynote at ISWC 2017

Ontology Development Process

[Figure: ontology development process diagram. Labels follow.]

Use Cases; Existing Ontologies & Vocabularies; Expert Interviews

Labkey, Ontology Fragments

Ontology Curation (ongoing)

Reviewers & Curators:

* Ontology Development Team

* Domain collaborators

* Invited experts

* "Consumers" (data analysts)

Knowledge Graph Integration:

* Linking data and metadata content to domain terms

* Linking workflows based on semantic descriptions

Repository Integration:

* Source Datasets

* Analytics source code

* Results

* Publications

Knowledge-Enhanced Search: Finding what is there that might be of use

Semantic Extract, Transform, Load (SETLr)

Expert Guidance

Sources; Data Reporting Templates; Data Dictionaries / Codebooks; Foundational Ontologies/Vocabularies

Human Aware Data Acquisition Framework

Ontology Browser

Generated Ontology:

* domain concepts

* authoritative vocabularies

* vetted definitions

* supporting citations

Erickson, McGuinness, McCusker, Chastain, Pinheiro, Rashid, Liang, Liu, Stingone, …

Exemplified by


Page 11: Ontologies For the Modern Age - McGuinness' Keynote at ISWC 2017


• Ontology support for mapping and integration (e.g., education level)

• Ontology informs decisions about variables that may be combined, serve as proxy, or used to derive desired info (e.g., birth outcomes)

• Ontology Integrity constraints may help flag errors (e.g., APGAR > 10)

• Ontology helps expose implicit information and find links

Fenton Z-Score (from Sex, Birth weight, Gest Age)

Mother’s Highest Education Level | Val

Did not attend school | 0

Elementary school | 1

Technical post-primary | 2

Middle school | 3

Technical post-middle school | 4

Highschool or junior college | 5

Technical post-junior college | 6

College | 7

Graduate | 8

Doesn’t know | 9

Mother Education | Val

Less than High School | 0

High School Graduate or More | 1

Support Browsing, Searching, Pooling, Deriving Values, Verification, …
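The two education scales and the APGAR example on this slide can be illustrated with a minimal Python sketch. It is not taken from CHEAR. The cut-off between "Less than High School" and "High School Graduate or More" is an assumption, since the slide does not say which detailed codes fall on which side, and the function names are hypothetical.

```python
from typing import Optional

# Detailed scale from the slide: code -> label.
DETAILED_EDUCATION = {
    0: "Did not attend school",
    1: "Elementary school",
    2: "Technical post-primary",
    3: "Middle school",
    4: "Technical post-middle school",
    5: "Highschool or junior college",
    6: "Technical post-junior college",
    7: "College",
    8: "Graduate",
    9: "Doesn't know",
}

# ASSUMPTION: codes 5-8 count as "High School Graduate or More" (1) and
# codes 0-4 as "Less than High School" (0). The slide gives no cut-off.
HIGH_SCHOOL_OR_MORE = {5, 6, 7, 8}


def to_binary_education(code: int) -> Optional[int]:
    """Map a detailed code onto the two-level scale; None when unknown."""
    if code not in DETAILED_EDUCATION:
        raise ValueError(f"unknown education code: {code}")
    if code == 9:  # "Doesn't know" has no counterpart on the binary scale
        return None
    return 1 if code in HIGH_SCHOOL_OR_MORE else 0


def apgar_is_valid(score: int) -> bool:
    """APGAR scores range from 0 to 10; anything else should be flagged."""
    return 0 <= score <= 10
```

With a mapping like this, studies that recorded the detailed scale can be pooled with studies that only recorded the binary one, and out-of-range values such as an APGAR of 11 are caught before analysis.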

McGuinness, McCusker, Pinheiro, Stingone, et al. Funding: NIH/NIEHS 0255-0236-4609 / 1U2CES026555-01


Page 12: Ontologies For the Modern Age - McGuinness' Keynote at ISWC 2017

CHEAR Human Aware Data Acquisition Framework

Laboratory Information Mgmt System (LIMS)-based backend integrated with Ontology.

Includes automatic ingest, access control, data governance, download, ...

Supports search: study, sample, subject, ...

Enables statisticians to ask for content to support their studies, e.g., find

Child: Birth Weight, Gender, Gestational Age at Birth

Mother: Age, BMI “early in pregnancy based on inclusion criterion for the particular study”, Parity, Education

Metals: As, Cd, Mn, Mo, Pb

Pinheiro, Liang, Rashid, Liu, Chastain, Santos, McCusker, McGuinness

Partially supported by: NIH/NIEHS 0255-0236-4609 / 1U2CES026555-01

Page 13: Ontologies For the Modern Age - McGuinness' Keynote at ISWC 2017

CHEAR Ontology infrastructure


Thousands of instances of CHEAR & foundational ontologies (e.g., subjects, samples, lab capabilities)

Thousands of concepts and relationships from foundational ontologies

Hundreds of concepts from the CHEAR ontology

258 Analytes (incl. 36 metals, from lab spec)

176 Epidemiological Attributes (from pilots)

28 Sample types (incl. 3 pregnancy, from lab spec)

42 Assay types (from Lab Capabilities)

122 Instrument types (from Lab Capabilities)

We use:

● Labkey to create, curate and maintain CHEAR concepts (ontology)

● Labkey to create and maintain CHEAR instances (knowledge graph)

● SETLr to build and publish the CHEAR ontology from CHEAR concepts

● HADatAc to connect CHEAR/foundational concepts and instances to CHEAR data

● HADatAc to browse/select/retrieve CHEAR data from CHEAR vocabulary

Funding: NIH/NIEHS 0255-0236-4609 / 1U2CES026555-01

Disease Ontology

UBERON (Anatomy)

Units Ontology

CHEBI (Chemicals)

RefMet* (Metabolites)

ENVO* (Environment)

UniProt* (Proteins)

HAScO (Instruments/methods)

SIO (Semantic Science Int Ont)

PROV (Provenance)

Example Ontology and

infrastructure (CHEAR)

Page 14: Ontologies For the Modern Age - McGuinness' Keynote at ISWC 2017

Content keeps expanding…

Metabolomics Targeted Analytes and RefMet

RefMet main classes and subclasses are mapped to CHEBI classes where available.

CHEAR targeted analyte classes and superclasses are also aligned to CHEBI.

When including the CHEBI hierarchy as well, the following main classes and subclasses in RefMet have targeted analyte subclasses (count in parentheses).

Funding: NIH/NIEHS 0255-0236-4609 / 1U2CES026555-01

Page 15: Ontologies For the Modern Age - McGuinness' Keynote at ISWC 2017

Mapping Data to Meaning:

Semantic Data Dictionaries

Rashid, Chastain, Stingone, McGuinness, McCusker. The Semantic Data Dictionary

Approach to Data Annotation and Integration. Enabling Open Semantic Science, Oct 21, 2017

Partially supported by: NIH/NIEHS 0255-0236-4609 / 1U2CES026555-01

Page 16: Ontologies For the Modern Age - McGuinness' Keynote at ISWC 2017

Semantic Data Dictionaries Describe a Bigger Picture

Columns: id, race, age, edu, bmi, weight, height, smoker, pb_1, pb_2, ga, birthwt

Annotated columns: id(sio:Identifier), race(sio:Race), age(sio:Age), edu(chear:EducationLevel), bmi(chear:BMI), weight(sio:Mass), height(sio:Height), smoker(chear:SmokingStatus), pb_1(sio:Concentration), pb_2(sio:Concentration), ga(chear:GestationalAge), birthwt(chear:Weight)

Entities not present as columns (prefixed ??): ??mother(sio:Human), ??child(sio:Human), ??birth(chear:Birth), ??pregnancy(chear:Pregnancy), ??sample1(Serum), ??sample2(Serum), ??pb_1(Pb), ??pb_2(Pb), ??visit1(chear:Visit), ??visit2(chear:Visit)

Relationship labels in the figure: hasAttribute, existsAt, wasDerivedFrom, hasPart, inRelationTo, hasRole (chear:Mother)

Plus Units of Measure (not shown)
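The column annotations above can be sketched as a small Python structure that turns one data row into triples. This is only an illustration of the idea, not the Semantic Data Dictionary tooling itself. The column-to-class pairs come from the slide, but the `ATTRIBUTE_OF` links, the `ex:` prefix, and the property labels `sio:hasValue` / `sio:hasAttribute` are assumptions, because the transcript does not preserve which arrow connects which nodes.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]

# Column name -> ontology class, as listed on the slide.
COLUMN_CLASSES: Dict[str, str] = {
    "id": "sio:Identifier", "race": "sio:Race", "age": "sio:Age",
    "edu": "chear:EducationLevel", "bmi": "chear:BMI",
    "weight": "sio:Mass", "height": "sio:Height",
    "smoker": "chear:SmokingStatus", "pb_1": "sio:Concentration",
    "pb_2": "sio:Concentration", "ga": "chear:GestationalAge",
    "birthwt": "chear:Weight",
}

# Implicit entities and their classes, as listed on the slide.
IMPLICIT_CLASSES: Dict[str, str] = {"??mother": "sio:Human", "??child": "sio:Human"}

# ASSUMPTION: which implicit entity a column describes (hypothetical links).
ATTRIBUTE_OF: Dict[str, str] = {"age": "??mother", "birthwt": "??child"}


def annotate_row(row: Dict[str, object]) -> List[Triple]:
    """Emit (subject, predicate, object) triples for one tabular row."""
    row_id = row["id"]
    triples: List[Triple] = []
    for column, value in row.items():
        cls = COLUMN_CLASSES.get(column)
        if cls is None:  # column not covered by the dictionary: skip it
            continue
        node = f"ex:{column}-{row_id}"
        triples.append((node, "rdf:type", cls))
        triples.append((node, "sio:hasValue", str(value)))
        owner = ATTRIBUTE_OF.get(column)
        if owner is not None:
            owner_node = f"ex:{owner.lstrip('?')}-{row_id}"
            triples.append((owner_node, "rdf:type", IMPLICIT_CLASSES[owner]))
            triples.append((owner_node, "sio:hasAttribute", node))
    return triples
```

The point of the design is that the row never mentions a mother or a child explicitly; the dictionary supplies those entities, which is the "bigger picture" the slide title refers to.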

Partially supported by: NIH/NIEHS 0255-0236-4609 / 1U2CES026555-01

Page 17: Ontologies For the Modern Age - McGuinness' Keynote at ISWC 2017

Ontology and Knowledge Graph (Behind the Scenes)

[Figure labels: Epidemiological Measurements; Concepts / relationships from foundational ontologies; Examples of terms from the CHEAR ontology; Examples of concepts from the CHEAR ontology; Instance of foundational ontology term]

Partially supported by: NIH/NIEHS 0255-0236-4609 / 1U2CES026555-01

Page 18: Ontologies For the Modern Age - McGuinness' Keynote at ISWC 2017

CHEAR Study-based Evolution Strategy

Identify terms that can be mapped to existing ontology

Identify terms to be added to ontology

Describe new terms w/ definitions and location within existing ontology

Mappings (e.g., variable names) incorporated into knowledge graph

Data into knowledge graph after embargo period

Incorporate new terms into existing ontology

Review and revise updates with stakeholders

Data Structures & Standards Working Group

Compile new terms across multiple studies (e.g., quarterly)

Data Center

New version

Ontology

X

Partially supported by: NIH/NIEHS 0255-0236-4609 / 1U2CES026555-01

Page 19: Ontologies For the Modern Age - McGuinness' Keynote at ISWC 2017

Ontology-Enabled Study Search

Blood Biomarkers for Children’s Health (Study 1)

Institution:

Principal Investigator(s):

Number of Subjects:

Number of Samples:

Study Description:

Keywords:

Urine Biomarkers for Children’s Health (Study 2)

Institution:

Principal Investigator(s):

Number of Subjects:

Number of Samples:

Study Description:

Keywords:

Metabolomic Biomarkers for Children’s Health (Study 3)

Institution:

Principal Investigator(s):

Number of Subjects:

Number of Samples:

Study Description:

Keywords:

Partially supported by: NIH/NIEHS 0255-0236-4609 / 1U2CES026555-01

Page 20: Ontologies For the Modern Age - McGuinness' Keynote at ISWC 2017

Data Search expanded view


Ontology-Enabled Data Search

Partially supported by: NIH/NIEHS 0255-0236-4609 / 1U2CES026555-01

Page 21: Ontologies For the Modern Age - McGuinness' Keynote at ISWC 2017


Evolving Reflections

• Domain science is asking for ontologies to address Findable, Accessible, Interoperable, Reusable (FAIR) issues

• With tooling and processes, domain scientists can help build and maintain ontologies and ontology-enabled applications; e.g., epidemiologists are doing this

• While classical ontology considerations remain important (e.g., expressive enough for use case), ecosystem considerations, including maintainability and longevity, dominate many decisions

• The data center content in CHEAR with the human aware data collection framework front end provides some of the infrastructure I envision in an open* knowledge network

McGuinness, McCusker, Pinheiro, Stingone, et al.


Page 22: Ontologies For the Modern Age - McGuinness' Keynote at ISWC 2017

Cognitive Computing in Health

Page 23: Ontologies For the Modern Age - McGuinness' Keynote at ISWC 2017

Health Empowerment by Analytics, Learning, and Semantics

• How can we enhance population and individual health using information found inside and outside the traditional (E)HR?

• How can we develop precision medicine across the many levels of research from Genome to Phenotype to Population Health?

• How can we use IBM’s Watson Technology, augmented with Rensselaer’s semantics, learning, and analytics expertise to achieve these goals?

Partially supported through IBM Cognitive Network funding

Page 24: Ontologies For the Modern Age - McGuinness' Keynote at ISWC 2017

Health Empowerment

• Knowledge as Medicine: Cognitive agent technology will enhance ability to

- Explain relationships found in the data and link these to appropriate scientific literature,

- Put meaningful labels on clusters and connections derived from the analytic process,

- Provide inputs to cognitive systems based on the data found in databases and medical literature

- Generate and/or test hypotheses about health and medicine to the level of the individual (precision medicine and health)

Partially supported through IBM Cognitive Network

Page 25: Ontologies For the Modern Age - McGuinness' Keynote at ISWC 2017

Semantics-enabled Framework


Ontologies are an important piece; but are part of a larger integrated framework

Semanalytics / SemNExt RPI team: McGuinness and Bennett (PIs) with McCusker, Erickson, Seneviratne, and the extended research groups including Rashid here, along with input from IBM HEALS collaborators and motivation from MANY projects, particularly from NIEHS / Mount Sinai

Partially supported through IBM Cognitive Network

Page 26: Ontologies For the Modern Age - McGuinness' Keynote at ISWC 2017

Probability-Aware

Knowledge Exploration

• Knowledge imported from drug, protein, and disease interaction databases.

• Each interaction given an evidence-driven probability.

• Find drugs that could affect melanoma, filtered by interaction probability.

• The best hypotheses were generated using the highest probabilities.
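The filtering step in the bullets above can be sketched in a few lines of Python. This is a toy illustration, not the method from the cited paper. The edges and probabilities are hypothetical, and multiplying the two edge probabilities assumes the interactions are independent, which the slide does not state.

```python
from typing import Dict, List, Tuple

# HYPOTHETICAL toy edges: (source, target) -> evidence-driven probability.
DRUG_TARGETS: Dict[Tuple[str, str], float] = {
    ("drugA", "protein1"): 0.9,
    ("drugB", "protein1"): 0.4,
    ("drugC", "protein2"): 0.8,
}
PROTEIN_DISEASE: Dict[Tuple[str, str], float] = {
    ("protein1", "melanoma"): 0.8,
    ("protein2", "melanoma"): 0.3,
}


def candidate_drugs(disease: str, min_probability: float) -> List[Tuple[str, float]]:
    """Rank drugs linked to a disease via a protein, filtered by path probability."""
    best: Dict[str, float] = {}
    for (drug, protein), p_drug_protein in DRUG_TARGETS.items():
        p_protein_disease = PROTEIN_DISEASE.get((protein, disease))
        if p_protein_disease is None:
            continue  # this protein has no recorded link to the disease
        # ASSUMPTION: edges are independent, so the path probability is the product.
        joint = p_drug_protein * p_protein_disease
        if joint >= min_probability:
            best[drug] = max(best.get(drug, 0.0), joint)
    # Highest-probability hypotheses first.
    return sorted(best.items(), key=lambda item: item[1], reverse=True)


ranked = candidate_drugs("melanoma", min_probability=0.25)
```

Raising `min_probability` trades recall for confidence: fewer candidate drugs come back, but each is backed by stronger evidence along its path.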

McCusker, Dumontier, Yan, He, Dordick, McGuinness. Finding Melanoma Drugs through a Probabilistic Knowledge Graph. PeerJ4L 32007, 2016.


Page 27: Ontologies For the Modern Age - McGuinness' Keynote at ISWC 2017

Semantic Extract, Transform, Load

for Knowledge Graphs

[Figure: Semantic ETL (Extract, Transform, Load) pipeline. Labels: XML, CSV, JSON, HTML, Entrez; JSON-LD; Templating Script; RDF Graphs; Satoru]
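The extract, transform, load flow named on this slide can be sketched with the Python standard library alone. This is not SETLr's API or its templating format. The `http://example.org/` namespace, the function names, and the sample CSV are all hypothetical, and the output is a simplified N-Triples rendering with plain string literals.

```python
import csv
import io
from typing import Dict, Iterable, Iterator, List

BASE = "http://example.org/"  # HYPOTHETICAL namespace for illustration


def extract(csv_text: str) -> Iterator[Dict[str, str]]:
    """Extract: read tabular rows from CSV text."""
    return csv.DictReader(io.StringIO(csv_text))


def transform(row: Dict[str, str]) -> List[str]:
    """Transform: turn one row into N-Triples lines, one per non-empty cell."""
    subject = f"<{BASE}subject/{row['id']}>"
    lines: List[str] = []
    for column, value in row.items():
        if column == "id" or value == "":
            continue  # the id names the subject; empty cells yield no triple
        literal = value.replace("\\", "\\\\").replace('"', '\\"')
        lines.append(f'{subject} <{BASE}{column}> "{literal}" .')
    return lines


def load(rows: Iterable[Dict[str, str]]) -> str:
    """Load: here, simply serialize all triples into one N-Triples document."""
    return "\n".join(line for row in rows for line in transform(row))


SAMPLE = "id,age,bmi\n001,31,22.5\n002,28,\n"
ntriples = load(extract(SAMPLE))
```

In a real pipeline the load step would write to a triple store rather than return a string, and the transform step would use ontology terms instead of column names as predicates.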

McCusker, Rashid, Liang, Liu, Chastain, Pinheiro, Stingone, McGuinness. Broad, Interdisciplinary Science In Tela: An Exposure and Child Health Ontology. Web Science, 2017


Page 28: Ontologies For the Modern Age - McGuinness' Keynote at ISWC 2017

Cancer Data: Comprehensive omics, epidemiology & patient care

Data analysis aimed initially at the following public datasets:

TCGA: RNA expression, tumor mutation, protein expression, and clinical attributes (including staging, treatment, risk, and survival) on 32 cancer types in > 14,000 patients

NHANES: Cross-sectional biennial survey of the health and nutrition of the US population, including illness, environmental exposures, and risk exposures.

Multiparameter Intelligent Monitoring in Intensive Care (MIMIC): Longitudinal patient records from patients who stayed in the intensive care units at Beth Israel Deaconess Medical Center

Additional analysis will include deidentified data in cancer topics

Partially supported through IBM Cognitive Network

Page 29: Ontologies For the Modern Age - McGuinness' Keynote at ISWC 2017

Building the Knowledge Graph: Reusing Knowledge Sources to Bridge Abstractions

• Already done: COSMIC Gene Census, OMIM, DrugBank, iRefIndex

• Pathway data: KEGG, Reactome (small molecule interactions, curated interactions)

• Gene Ontology: protein localization in cell types and tissues, protein functions, biological process involvement

• UniProt: Protein families, including common binding sites

• CAP Protocols: Current cancer staging standards, NCCN… many of these evolve, e.g., breast cancer staging guidelines

• Vocabularies: SNOMED, NCI Thesaurus, NCI Metathesaurus, etc.


Page 30: Ontologies For the Modern Age - McGuinness' Keynote at ISWC 2017

Discussion topics

• Old style ontologies along with their considerations are still important… Expressiveness is still an issue… and may be a growing issue.

• But old style, old processes, old ecosystems will not make the impact we want them to without buy-in from a diverse community of developers and users

• Ecosystems matter! With respect to process, infrastructure, community, ….

• Goble’s point from Semantic Science: need community, driver, tools

• Modern age ontologies are just one piece, an important piece, but one part of the puzzle; without the other puzzle pieces we will not change the world


Page 31: Ontologies For the Modern Age - McGuinness' Keynote at ISWC 2017

Value Propositions Matter to Get and Keep Collaborators

What will we be able to do that is hard or impossible today? One set of topics from an applied mathematician collaborator (Bennett)

• How to merge data from heterogeneous data sources for analysis

• What types of data are available for analysis

• What interesting analysis questions we are capable of asking

• Is a potential analysis question too broad or imprecise for the data

• Which adjustment covariates should be used for a given analysis question

• Which statistical and machine learning methods and workflows are appropriate

• What background information might be relevant for an analysis question

• If measurements are plausible and can be trusted

• Are there explanations of derived results/hypotheses in literature

• Are results similar to those of prior analyses

• What are appropriate ways to visualize and present results to user

• Should changes in data trigger a reanalysis/new analysis of questions of interest

Partially supported through IBM Cognitive Network

Page 32: Ontologies For the Modern Age - McGuinness' Keynote at ISWC 2017

Preliminary Study

Bennett, Erickson, McCusker, McGuinness et al


Hypothesis: Does factor increase odds of disease?

User Specifies:

Data (NHANES Cohorts)

Disease Definition

Confounders (age, BMI)

Factors (pesticides)

Agent dynamically applies standard risk analysis workflow based on log-odds

Applicable to any risk problem and data sets

[Figure: WORKFLOW. Labels: User; Specify Data; Specify Model; Ingest and Clean Data; Conduct Modeling; Analyze and Visualize Results; Knowledge Graph; Semantic Browser; Analytics Agent]
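As a rough illustration of what a log-odds risk analysis computes, here is a minimal Python sketch of an unadjusted odds ratio from a 2x2 table of counts. It is not the agent's actual workflow. It ignores the confounders (age, BMI) that the slide lists, which would normally be handled with a logistic regression model, and the function names and counts are hypothetical.

```python
import math
from typing import Tuple


def log_odds_ratio(exposed_cases: int, exposed_controls: int,
                   unexposed_cases: int, unexposed_controls: int) -> Tuple[float, float]:
    """Return the log odds ratio and its standard error for a 2x2 table."""
    a, b, c, d = exposed_cases, exposed_controls, unexposed_cases, unexposed_controls
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cell counts must be positive")
    log_or = math.log((a * d) / (b * c))
    # Standard large-sample (Wald) standard error of the log odds ratio.
    std_err = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return log_or, std_err


def odds_ratio_ci(exposed_cases: int, exposed_controls: int,
                  unexposed_cases: int, unexposed_controls: int,
                  z: float = 1.96) -> Tuple[float, float, float]:
    """Return (odds ratio, lower bound, upper bound); z=1.96 gives about 95%."""
    log_or, std_err = log_odds_ratio(
        exposed_cases, exposed_controls, unexposed_cases, unexposed_controls)
    return (math.exp(log_or),
            math.exp(log_or - z * std_err),
            math.exp(log_or + z * std_err))


# HYPOTHETICAL counts: 20 of 100 exposed and 10 of 100 unexposed have the disease.
estimate, lower, upper = odds_ratio_ci(20, 80, 10, 90)
```

An odds ratio above 1 with a confidence interval that excludes 1 would support the hypothesis that the factor increases the odds of disease; an interval that includes 1 would not.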

Page 33: Ontologies For the Modern Age - McGuinness' Keynote at ISWC 2017

Semantic-analytics framework to support precision health

Obtain goals from all stakeholders

One analyst’s Goals:

• Integrate analytics with knowledge graphs to select germane data, discover relevant patterns, predict outcomes, and provide interpretations in response to queries from users or cognitive agents.

• Design and demonstrate semantic analytics workflows across the knowledge graph to support precision health inquiries

• Discover new patterns and predict outcomes to create new knowledge and insights from the knowledge graph with the assistance of a cognitive computing agent.

Bennett, Erickson, McCusker, McGuinness, et al


Page 34: Ontologies For the Modern Age - McGuinness' Keynote at ISWC 2017

Some Observations

• Ontologies are coming of age again…. But in some different ways and as part of much larger ecosystems

• Champions are emerging out of a number of fields (e.g., Bio, Env Health, Biostatisticians, Earth Scientists, Nano materials scientists, etc.)

• Ontologies can support question formation, validation, and answer generation in new ways

• Ontologies can support movement across abstraction levels

• Ontologies should not be done alone: community requested, developed, & maintained resources are the future

• Ontology engineering is evolving to be more community-centric

• Building for longevity is now also an early consideration (wine ontology taught me early lessons)

• Ontologies can help change the world when viewed as part of ecosystems…. Let's change the world together!


Page 35: Ontologies For the Modern Age - McGuinness' Keynote at ISWC 2017

Questions?

• Contact: [email protected]

• Thanks to many: RPI Tetherless World team, particularly McCusker, Erickson, Hendler, Pinheiro, Rashid, Liang, Liu, Chastain; RPI: Bennett, Dyson, Seneviratne; Mount Sinai, particularly Teitelbaum, Stingone, Mervish, Gennings, Kovatch; IBM, particularly Das, Chen, Chang, Brown, ….

• Funding: NIEHS 0255-0236-4609 / 1U2CES026555-01, DARPA HR0011-16-2-0030, IBM-RPI HEALS, NSF ACI-1640840

• Forthcoming book: Ontology Engineering with Kendall


