Knowledge Subproject
Deana Pennington, Natalia Villanueva Rosales, and Paulo Pinheiro da Silva
University of Texas at El Paso
2/16/2012 Board of Advisers 2012
Outline
Change in personnel
Highlights of first five years
Plans for the next five years
Dr. Paulo Pinheiro da Silva => Pacific Northwest National Lab
New co-leads:
Dr. Deana Pennington
Dr. Natalia Villanueva Rosales
Knowledge about events: time awareness
Natalia (www.natalia-villanueva.com)
Natalia is Mexican. All Mexicans love Mexican food.
"I am giving a lecture in Cyber-ShARE, UTEP" (11/12/2011, 14:00 hrs)
Natalia is presenting an Event (today) happening at Cyber-ShARE, UTEP.
Natalia is currently at UTEP.
Vision: First five years
[Figure: diagram labels recovered from the slide]
Scientific Discovery
Experimental Science
Theoretical Science
Computational Science
Data Exploration Science
Data, product understanding and acceptance (trust and uncertainty management)
Data curation
Data and product attribution
Product derivation understanding
Data and product discovery/reuse
Process understanding
Information integration
Domain ontologies
Task ontologies
Provenance
General-purpose ontologies
Knowledge of data, product derivation and assertion
Process knowledge
Domain knowledge
Common knowledge
Accomplishments: Development of CI-Miner Infrastructure
Organized (Connected) Data
Abstract Workflows: Crustal Modeling SAW
Domain Ontologies: Crustal Modeling WDO, Seismology WDO
Provenance: Hole’s Code Tomography, South NM Tomography, British Columbia Tomography
General-Purpose Ontologies: OWL Time ontology, SWEET ontology, PML ontology
Accomplishments: Development of CI-Miner methodology
Development of three complementary integrated languages:
Workflow Driven Ontologies
Semantic Abstract Workflows
Proof Markup Language
Capable of enhancing scientific processes with:
Semantic search
Knowledge-based visualization
Information management services
Students
Two Computer Science PhD students graduating this spring/early summer:
Nick Del Rio – knowledge-based visualization
Leo Salayandia – semantic abstract workflows
Research presentations to follow
Three more over the next year:
Aida Gandara – semantically enabled collaboration
Jitin Arora – triple stores
Hugo Porras – visualizing provenance
Antonio – representing human processes
Building on the past
Continue work on ontologies, provenance, and other knowledge representation tools
Extend work to incorporate new areas of expertise with Natalia Villanueva Rosales
Consider new directions in human factors and the semantics of human processes
Semantic abstract workflows capture human processes
Other mechanisms to capture semantics (Deriva)
Semantic collaboration focuses on supporting human collaboration
Deana Pennington’s research on collaboration and knowledge synthesis
Consider new directions linking visualization and semantics
Semantic Web for retrieving and integrating scientific knowledge for pharmacogenomics
Pharmacogenomics ontology
PharmGKB database: genes, gene variants, SNPs, drugs, measures and outcomes, interactions, treatments.
Manual augmentation with literature-curated pharmacogenomics knowledge of depression.
- Effective drug treatment?
- Favorable outcome?
- Possible side effects?
- Do gene variants affect therapeutic outcomes?
[Ferres et al., 2009, JWS]
Statistical graphs query answering, same methodology, different data sources
Retrieve all time series graphs.
Class: TimeSeriesGraph
    EquivalentTo: Graph and (hasPart some TimeSeries)
E.g.
series1 hasPart datapoint7,
datapoint7 hasPart x7,
x7 type SecondQuarter,
SecondQuarter subClassOf Quarter,
Quarter subClassOf TimeInterval,
graph1 hasPart series1
Using Protégé 4 alpha (build 53), FaCT++ DL reasoner and Manchester Syntax. Across graphs.
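The retrieval of time series graphs can be imitated in a few lines of Python, without a DL reasoner, to make the inference chain concrete. This is a closed-world toy sketch and not how FaCT++ works. It assumes (the slide does not state this) that a TimeSeries is anything with a direct or indirect part typed by a subclass of TimeInterval, that hasPart is transitive, and that graph1 is asserted to be a Graph. The individuals graph2 and bar1 are hypothetical and exist only to show a graph that is not retrieved.

```python
# Facts copied from the slide's example, plus hypothetical individuals
# (graph2, bar1) that should NOT be retrieved.
HAS_PART = {
    ("graph1", "series1"),
    ("series1", "datapoint7"),
    ("datapoint7", "x7"),
    ("graph2", "bar1"),      # hypothetical
}
TYPE = {
    "x7": "SecondQuarter",
    "graph1": "Graph",       # assumed assertion, not on the slide
    "graph2": "Graph",       # hypothetical
    "bar1": "Bar",           # hypothetical
}
SUBCLASS_OF = {
    "SecondQuarter": "Quarter",
    "Quarter": "TimeInterval",
}


def is_subclass(cls, ancestor):
    """Walk the subClassOf chain upward from cls."""
    while cls is not None:
        if cls == ancestor:
            return True
        cls = SUBCLASS_OF.get(cls)
    return False


def parts_of(individual):
    """All direct and indirect parts (hasPart treated as transitive)."""
    found, stack = set(), [individual]
    while stack:
        current = stack.pop()
        for whole, part in HAS_PART:
            if whole == current and part not in found:
                found.add(part)
                stack.append(part)
    return found


def is_time_series(individual):
    # Assumption: a TimeSeries has some part typed by a TimeInterval subclass.
    return any(
        is_subclass(TYPE.get(part), "TimeInterval")
        for part in parts_of(individual)
    )


def time_series_graphs():
    """Evaluate 'Graph and (hasPart some TimeSeries)' over the toy facts."""
    graphs = [ind for ind, cls in TYPE.items() if is_subclass(cls, "Graph")]
    return sorted(
        g for g in graphs
        if any(is_time_series(part) for part in parts_of(g))
    )


print(time_series_graphs())  # ['graph1']
```

A reasoner works under the open-world assumption and derives class membership from the ontology's axioms, so this lookup only mirrors the shape of the query, not its semantics.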
[Dumontier & Villanueva Rosales, 2009, BiB]
Summary of first approach: manual creation of ontologies
Ontologies:
- Functional groups [Villanueva-Rosales & Dumontier, 2007, OWLED]
- yOWL [Villanueva-Rosales & Dumontier, 2009, JBI]
- Pharmacogenomics
- Statistical graphs
Involved:
- Developing customized parsers to obtain data.
- Analysis of database, tab files structure, web services definition.
Disadvantages:
- Not very scalable approach (time consuming).
- Parsers hard to reuse or maintain.
Can we automate the process of creating ontologies from relational databases?
Goal
Automatically create an expressive OWL ontology using a set of rule-based heuristics represented in Semantic Web Languages over a normalized relational database.
Represent, create and execute mappings between relational databases and ontologies.
[Diagram: Database => Ontology]
DBOwlizer enables the creation and execution of mappings between DB and OWL ontologies
[Diagram labels: Employees schema; Relational-model; Relational-to-ontology-mapping; Heuristics; DBOwlizer; Ontology (Employee hasName string; Employee worksfor Department)]
Employees database schema diagram (excluding views) auto-generated by MySQL Workbench ver. 5.1.18
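To make the schema-to-ontology idea concrete, here is a minimal Python sketch that reads a relational schema and emits Manchester-syntax frames. It is not DBOwlizer and does not reproduce its heuristics. The three rules used (table to class, column to data property, foreign key to object property), the `schema_to_manchester` function, the type table and the two-table schema are all assumptions made for illustration. SQLite stands in for MySQL so that the example is self-contained. Property names follow the `has<Table>.<column>` pattern that appears elsewhere in the deck.

```python
import sqlite3

# Hypothetical schema, loosely modeled on the Employee/Department example.
SCHEMA = """
CREATE TABLE department (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE employee (
    id INTEGER PRIMARY KEY,
    name TEXT,
    department_id INTEGER REFERENCES department(id)
);
"""

# Assumed mapping from SQLite column types to XSD datatypes.
XSD = {"INTEGER": "xsd:integer", "TEXT": "xsd:string", "REAL": "xsd:double"}


def schema_to_manchester(conn):
    """Illustrative heuristics: table -> Class, column -> DataProperty,
    foreign key -> ObjectProperty. Primary keys are skipped."""
    frames = []
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()]
    for table in tables:
        cls = table.capitalize()
        frames.append(f"Class: {cls}")
        # PRAGMA foreign_key_list rows: (id, seq, table, from, to, ...)
        fks = {row[3]: row[2] for row in
               conn.execute(f"PRAGMA foreign_key_list({table})").fetchall()}
        # PRAGMA table_info rows: (cid, name, type, notnull, dflt_value, pk)
        for _cid, col, col_type, _notnull, _default, pk in conn.execute(
                f"PRAGMA table_info({table})").fetchall():
            prop = f"has{cls}.{col}"
            if col in fks:
                target = fks[col].capitalize()
                frames.append(
                    f"ObjectProperty: {prop}\n    Domain: {cls}\n    Range: {target}")
            elif not pk:
                rng = XSD.get(col_type.upper(), "rdfs:Literal")
                frames.append(
                    f"DataProperty: {prop}\n    Domain: {cls}\n    Range: {rng}")
    return "\n".join(frames)


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
ontology = schema_to_manchester(conn)
print(ontology)
```

A readable property name such as worksfor cannot be derived from the column name department_id alone, which is one reason a mapping layer between the schema and the ontology is useful.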
DBOwlizer maps information in views (queries)
[Diagram: DB View => Ontology: Employee with hasEmployee.salary >= 80,000]
CREATE VIEW `employees`.`high_salary_employee_salary` AS
SELECT
    `employees`.`employee`.`name` AS `name`,
    `employees`.`employee`.`salary` AS `salary`
FROM `employees`.`employee`
WHERE (`employees`.`employee`.`salary` >= 80000)
Class: dl:High_salary_employee_salary
    EquivalentTo:
        (Employee
         and (dl:hasEmployee.salary some xsd:double[>= 80000]))
        and (dl:hasEmployee.salary some rdfs:Literal)
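The step from a view's WHERE clause to a class restriction can be sketched in Python as follows. This is an illustration only: the `view_to_class` function and its regular expression are hypothetical, handle a single numeric comparison, and say nothing about how DBOwlizer actually analyzes views. A real translator would need an SQL parser. The naming pattern `dl:has<Table>.<column>` is copied from the slide.

```python
import re

# Matches a single numeric comparison such as "salary >= 80000".
PREDICATE = re.compile(r"^\s*(\w+)\s*(>=|<=|>|<)\s*(\d+(?:\.\d+)?)\s*$")


def view_to_class(view_name, base_table, where):
    """Translate one numeric WHERE predicate into a Manchester class frame."""
    match = PREDICATE.match(where)
    if match is None:
        raise ValueError(f"unsupported predicate: {where!r}")
    column, op, value = match.groups()
    base_class = base_table.capitalize()
    prop = f"dl:has{base_class}.{column}"
    # The comparison operator and value are passed through unchanged
    # as a datatype facet restriction.
    return (
        f"Class: dl:{view_name.capitalize()}\n"
        f"    EquivalentTo:\n"
        f"        {base_class} and ({prop} some xsd:double[{op} {value}])"
    )


frame = view_to_class("high_salary_employee_salary", "employee", "salary >= 80000")
print(frame)
```

Because the operator is copied verbatim, the view and the generated class describe the same set of rows only if the SQL predicate and the facet use the same comparison.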
Bottom-up (semi-)automated methodology
Better capture of intended semantics of the data than other approaches.
Exposing contents + mappings on the semantic web (deep web).
Enables query of database(s) with terminology from domain ontologies (with mapping).
[Diagram: Geospatial ontology linking three queries]
Query about temperature in specific locations
Query about C fluxes in a specific region
Query about temperature and C fluxes in a specific region to identify correlations
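Querying databases with domain-ontology terminology can be illustrated with a small Python sketch. Everything in it is hypothetical: the ontology terms Temperature and CarbonFlux, the tables `station_temp` and `flux_obs`, their columns, the sample rows and the `average_by_region` helper. It assumes that each ontology term maps to exactly one table column and that both tables share a region column, which a geospatial ontology would normally be needed to establish.

```python
import sqlite3

# Hypothetical mapping from domain-ontology terms to (table, column).
MAPPING = {
    "Temperature": ("station_temp", "temp_c"),
    "CarbonFlux": ("flux_obs", "c_flux"),
}


def average_by_region(conn, term, region):
    """Answer 'query about <term> in <region>' via the term's mapped column."""
    # Table and column names come from the fixed mapping, not from user input;
    # only the region value is passed as a bound parameter.
    table, column = MAPPING[term]
    row = conn.execute(
        f"SELECT AVG({column}) FROM {table} WHERE region = ?", (region,)
    ).fetchone()
    return row[0]


conn = sqlite3.connect(":memory:")
# Made-up sample data, for illustration only.
conn.executescript("""
CREATE TABLE station_temp (region TEXT, temp_c REAL);
CREATE TABLE flux_obs (region TEXT, c_flux REAL);
INSERT INTO station_temp VALUES ('north', 10.0), ('north', 14.0), ('south', 25.0);
INSERT INTO flux_obs VALUES ('north', 1.5), ('south', 3.0), ('south', 5.0);
""")

# Combined question: temperature and C flux side by side for one region.
combined = {term: average_by_region(conn, term, "north") for term in MAPPING}
print(combined)  # {'Temperature': 12.0, 'CarbonFlux': 1.5}
```

The caller never names a table or column. Swapping in a different data source would only require a new entry in the mapping.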
Future directions
Bottom-up approach in scientific domains of knowledge (e.g. environmental sciences, geo)
- Develop use cases for data integration, exchange and question answering.
- Mapping extracted ontologies to domain ontologies.
Improve scalability and robustness.
- Computer science research questions (e.g. complexity, expressivity, optimizations).
Map and contribute to the W3C RDB2RDF Working Group (Semantic Web community)
- Benchmark for Relational Databases to Ontology
Include human factor (Deana’s work)
New directions: Human factors
Ontologies <=> Data/Process
Connect (Top down)
Generate (Bottom up)
Dataset: Data driven (Natalia)
Documents: Document driven (IBM Watson; PNNL)
Human (actions, tags, links): Socially driven (Deana)
Analysis: Workflow driven (Semantic abstract workflows)
Grudin 1994 Groupware challenges (8)
- Work vs benefit (perceived usefulness)
- Critical mass
- Disrupt social processes
- Failure of intuition
- Difficult adoption process
Bada (2004)
- Clear goals
- Limited scope
- Simple, intuitive
Human reasoning about scientific data, processes, and documents
- Semantic-enabled collaboration
Boundary Negotiating Objects (Lee, 2010; Pennington, 2010)
- How can we semantically-enable this process?
Analysis driven
Semantic Workflow Negotiation ???
Semantic Abstract Workflow
Executable Scientific Workflow
Bottom-up (semi-)automated methodology in Cyber-ShARE
[Diagram: Geospatial ontology linking three queries]
Query about temperature in specific locations
Query about C fluxes in a specific region
Query about temperature and C fluxes in a specific region to identify correlations
Negotiation artifact driven: Initial Model (Pennington 2010)
[Figure: diagram labels recovered from the slide]
Boundary Negotiating Objects
Method Alignment & Standardization
Boundary Concepts
Conceptual Linkages
Boundary Objects (Star & Griesemer 1989)
Standardized Concepts & Processes => Ontologies
Support negotiating purpose/scope via boundary negotiating objects
Standardization
Alignment
Clear goals
Limited scope
Simple
Intuitive to the scientist
Potential Research Objectives
1. Model boundary negotiation process (Deana)
2. Knowledge representation approaches (Natalia)
3. New approaches to bridge gaps between
4. Evaluate usefulness
Questions?