Date post: | 13-Jan-2016 |
Category: |
Documents |
Upload: | owen-hensley |
Semantic Technology in Oracle Database
Data Interoperability Challenges
• Data locked into schemas, formats, software systems
• Semantic technology seen as a possible solution
• Specialty RDF data management engines are isolated from the data to be integrated
• In addition there are high training costs, systems admin costs, management costs.
• Tightly coupling semantics (RDF/OWL) functionality to the data storage infrastructure will facilitate data integration using semantics
Diagram labels: Business Apps, Business Data, Enterprise Data Server; Semantic Apps, RDF/OWL Triples, RDF Data Server
Adding advanced RDF services to Oracle Database
• Database features and queries can be enhanced using semantics
• Hybrid queries between enterprise data and semantic data possible
• Databases are part of infrastructure in several categories of applications that use semantics for data integration
• Biosurveillance, Social Networks, Telcos, Utilities, Text, Life Sciences, GeoSpatial
• All database benefits become available for semantic applications
• Scalability: Manage datasets 10X larger than specialized RDF/OWL stores (billions of triples), no scalability boundaries
• Billions of nodes, large graphs, parallel loading, query, indexing
• Security, transaction control, availability, backup and recovery, lifecycle management, etc.
• Can combine multiple datatypes (geospatial, sensor, etc.) with semantic data
Oracle 10g RDF Approach
• Provide an open and persisted RDF data model and analysis platform for semantic applications
• RDF Data Model with inferencing (RDFS and user-defined rules)
• Inferencing based on forward-chaining
• Perform SQL-based access to triples and inferred data
• Combine SQL queries on business data with RDF graphs and ontologies
• Support large graphs (billion+ triples)
• Easily extensible by 3rd party tools/apps
Use Case: Knowledge Mining Solutions
Information Extraction
Categorization, Feature/term Extraction
Web Resources
News, Email, RSS
Content Mgmt. Systems
Processed Document Collection
RDF/OWL
Knowledge Mining & Analysis
• Text Indexing using Oracle Text
• Non-Obvious Relationship Discovery
• Pattern Discovery
• Text Mining
• Faceted Search
Analyst: Browsing, Presentation, Reporting, Visualization, Query
SQL/SPARQL Query
Explore
Domain Specific Knowledge Base
OWL Ontologies
Ontology Engineering Modeling Process
Geospatial Semantic Search
Schemas:
• Persisted RDF/OWL data
• Persisted spatial data
• Persisted business data
• Persisted text data
GeoSemantic Processes:
• Text Extraction
• Semantic Modeling
• Rules/Policy Mgmt.
• Geospatial Analysis
• Map Visualization
• Semantic Search
Diagram labels: Oracle 10g RDBMS holding RDF Models, Spatial Data, Business Data, Text Data
Semantic Solutions on the Web
Deploying on a SOA Infrastructure
• Simple Features
• GeoRaster
• Topology
• Networks
• Spatial Data Mining
• Geocoding
• Routing
• Versioning
• DBMS Rules
• J2EE Container
• SOAP Web services
• Orchestration & Workflow
• Security
• Policy based resource mgmt
• Workload scaling
• Portal
• Wireless & Sensor
Core Software Infrastructure
Semantic-Enabled Tools, Applications & Services
• Business Logic
• Entity Extraction
• Visualization
• Ontology Modeling
• Faceted Search
• Link/Graph Analysis
• Advanced Inference
• Metadata Repository
• Entity Categorization
• Relationship analysis
National Security
Financial Risk Analysis
Regulatory Compliance
Life Sciences Drug Discovery
Health Science BioSurveillance
Manufacturing Configuration Management
Semantic Technology Stack
Based on Standards
• Our implementation is entirely based on W3C standards (RDF, RDFS, OWL)
• SPARQL support is planned
• We are members of:
• W3C DAWG (WG responsible for SPARQL)
• W3C SWEO Interest group
• W3C HCLS Interest group
• W3C Multimedia Semantics Incubator group
• Soon to be formed W3C OWL 1.1 Working group
Technical Features
• Database storage model for data represented in RDF
• SQL-based query of RDF data
• Combining RDF queries with relational queries
• Native inferencing engine to infer new relationships from RDF data
Technical Overview
Diagram labels: RDF/OWL data and ontologies; Enterprise (Relational) data
• STORE: Batch-Load; Incr. Load and DML
• INFER: RDF/S; User def. rules
• QUERY: Query RDF/OWL data and ontologies; Combining relational queries with RDF/OWL queries
Storage: Highlights
• Stores <subject, predicate, object> triples
• Set of triples form an RDF/OWL graph (model)
• Optimized storage structure: repeated values stored only once (uses normalization)
• Scales to very large datasets
• No limits to amount of data that can be stored
• Current users: 600 million+ triples (UTH)
• Can handle multiple lexical forms of the same value
• Ex: “0010”^^xsd:decimal and “010”^^xsd:decimal
• Maintains fidelity (user-specified lexical form)
• Supports long literal values
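The storage points above (each repeated value stored once, lexical form preserved, different lexical forms of one value still comparable) can be illustrated with a small Python sketch. This is a toy in-memory model and not Oracle's storage layout; the `TripleStore` class, the `canonical_key` helper and the `:item1`/`:item2`/`:price`/`:Mary` terms are all hypothetical names invented for the illustration.

```python
from decimal import Decimal

DECIMAL_SUFFIX = "^^xsd:decimal"

def canonical_key(term):
    """Comparison key for a term; xsd:decimal literals compare by numeric value."""
    if term.endswith(DECIMAL_SUFFIX):
        lexical = term[: -len(DECIMAL_SUFFIX)].strip('"')
        return ("xsd:decimal", Decimal(lexical))
    return ("term", term)

class TripleStore:
    """Toy normalized store (an assumption, not Oracle's design):
    each distinct term text is kept once and triples hold integer IDs."""

    def __init__(self):
        self.ids = {}      # exact lexical form -> value ID
        self.terms = []    # value ID -> exact lexical form as the user wrote it
        self.triples = []  # rows of (subject ID, predicate ID, object ID)

    def _intern(self, term):
        # Store the text only the first time it is seen.
        if term not in self.ids:
            self.ids[term] = len(self.terms)
            self.terms.append(term)
        return self.ids[term]

    def add(self, s, p, o):
        self.triples.append((self._intern(s), self._intern(p), self._intern(o)))

    def subjects_with_value(self, p, o):
        """Subjects whose object for predicate p equals o by value."""
        key = canonical_key(o)
        return [self.terms[si] for si, pi, oi in self.triples
                if self.terms[pi] == p and canonical_key(self.terms[oi]) == key]

store = TripleStore()
store.add(":John", ":employeeOf", ":Oracle")
store.add(":Mary", ":employeeOf", ":Oracle")          # hypothetical second employee
store.add(":item1", ":price", '"0010"^^xsd:decimal')  # hypothetical data
store.add(":item2", ":price", '"010"^^xsd:decimal')   # same value, other lexical form
```

Four triples reference twelve term positions, but only nine distinct texts are stored, and a lookup by value finds both `:price` triples while each keeps the lexical form it was loaded with.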
Example triple: John :employeeOf Oracle
Semantic Data Storage
Diagram labels:
• Application table 1, Application table 2: columns ID (number), TRIPLE (SDO_RDF_TRIPLE_S), plus optional columns for related enterprise data
• Internal Semantic Store: one Model per application table
• Application table links to model in internal semantic store
Query RDF Data
• SPARQL-like graph pattern embedded in SQL query
• Matches RDF/OWL graph patterns with patterns in stored data
• Returns a table of results
• Can use SQL operators/functions to process results
• Avoids staging when combined with queries on relational data
• Scales: millisecond query times for large data sets (10M+ triples)
SELECT …
FROM …, TABLE (
SDO_RDF_MATCH invocation ) t, …
WHERE …
SDO_RDF_MATCH( '(?x rdf:type :Person)', -- pattern: all persons
SDO_RDF_Models('family'), -- RDF data models
SDO_RDF_Rulebases('RDFS'), -- rulebases
SDO_RDF_Aliases(…), -- aliases
null -- no filter condition
)
Query Example: Family Data
select x, y, name from
TABLE(SDO_RDF_MATCH(
  '(:Tom :hasParent ?x)
   (?x :hasFather ?y)
   (?y :name ?name)',
  SDO_RDF_Models('family'),
  .., .., ..));
Returns the name of Tom’s grandfather
Graph nodes in the example: :Jack, :Tom, :Janice, :John, :Suzie, :Matt, and the literal “John D”
X Y NAME
Matt John “John D”
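To show how a graph pattern turns into a table of variable bindings, here is a minimal Python sketch of conjunctive triple-pattern matching. It is not how SDO_RDF_MATCH is implemented; the `match` function is hypothetical, and the triple set is an assumption reconstructed from the result row above, plus one invented `:Tom :hasParent :Suzie` edge that leads to no match.

```python
def match(patterns, triples):
    """Return one binding dict per way all patterns match the triples.
    Terms starting with '?' are variables; anything else must match exactly."""
    results = [{}]
    for pattern in patterns:
        extended = []
        for binding in results:
            for triple in triples:
                candidate = dict(binding)
                for term, value in zip(pattern, triple):
                    if term.startswith("?"):
                        # Bind the variable, or check it against an earlier binding.
                        if candidate.setdefault(term, value) != value:
                            break
                    elif term != value:
                        break
                else:
                    extended.append(candidate)
        results = extended
    return results

# Assumed family data; only the first three triples are implied by the slide.
family = [
    (":Tom", ":hasParent", ":Matt"),
    (":Matt", ":hasFather", ":John"),
    (":John", ":name", "John D"),
    (":Tom", ":hasParent", ":Suzie"),  # hypothetical edge with no father recorded
]

rows = match(
    [(":Tom", ":hasParent", "?x"),
     ("?x", ":hasFather", "?y"),
     ("?y", ":name", "?name")],
    family,
)
```

Each pattern narrows or extends the bindings produced by the previous one, so `rows` ends up holding a single binding with `?x` as `:Matt`, `?y` as `:John` and `?name` as `John D`, mirroring the X, Y, NAME row.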
Combining RDF Queries with Relational Queries
• Find salary and hiredate of Tom’s grandfather(s)
SELECT emp.name, emp.salary, emp.hiredate
FROM emp, TABLE(SDO_RDF_MATCH(
  '(:Tom :hasParent ?y)
   (?y :hasFather ?x)
   (?x :name ?name)',
  SDO_RDF_Models('family'), …)) t
WHERE emp.name = t.name;
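The join between match results and a relational table can be tried out with Python's built-in `sqlite3` module. This is only a sketch of the join semantics: SQLite has no RDF table function, so the rows that SDO_RDF_MATCH would return are placed in a temporary table `t` (the staging step the Oracle approach is said to avoid). The `emp` rows, salaries and hire dates are invented for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical relational data.
conn.execute("CREATE TABLE emp (name TEXT, salary INTEGER, hiredate TEXT)")
conn.executemany(
    "INSERT INTO emp VALUES (?, ?, ?)",
    [("John D", 5000, "1990-04-01"), ("Ann B", 6000, "2001-09-15")],
)

# Stand-in for the rows of TABLE(SDO_RDF_MATCH(...)) t:
# ?y is Tom's parent, ?x is that parent's father, ?name is the father's name.
conn.execute("CREATE TEMP TABLE t (x TEXT, y TEXT, name TEXT)")
conn.execute("INSERT INTO t VALUES (':John', ':Matt', 'John D')")

# Same join condition as the slide: emp.name = t.name.
result = conn.execute(
    "SELECT emp.name, emp.salary, emp.hiredate "
    "FROM emp JOIN t ON emp.name = t.name"
).fetchall()
conn.close()
```

Only the employee whose name equals the `?name` binding survives the join, which is the hybrid-query behavior the slide describes.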
Inference: Overview
• Native inferencing in the database for
• RDF, RDFS
• User-defined rules
• Rules are stored in rulebases in the database
• RDF graph is entailed (new triples are inferred) by applying rules in rulebase/s to model/s
• Inferencing is based on forward chaining: new triples are inferred and stored ahead of query time
• Minimizes on-the-fly computation and results in fast query times
Inferencing
• RDFS Example:
A rdf:type B, B rdfs:subClassOf C
=> A rdf:type C
Ex: Matt rdf:type Father, Father rdfs:subClassOf Parent
=> Matt rdf:type Parent
• User-defined Rules Example:
A :hasParent B, B :hasParent C
=> A :hasGrandParent C
Ex: Tom :hasParent Matt, Matt :hasParent John
=> Tom :hasGrandParent John
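The two rules above can be run as a small forward-chaining loop in Python. This is a sketch of the general technique, not Oracle's inference engine: the function names are hypothetical, rules are plain Python functions instead of stored rulebases, and the colon-prefixed terms follow the query slides.

```python
def rdfs_subclass(known):
    """A rdf:type B, B rdfs:subClassOf C  =>  A rdf:type C"""
    return {(a, "rdf:type", c)
            for a, p1, b in known if p1 == "rdf:type"
            for b2, p2, c in known if p2 == "rdfs:subClassOf" and b2 == b}

def grandparent(known):
    """A :hasParent B, B :hasParent C  =>  A :hasGrandParent C"""
    return {(a, ":hasGrandParent", c)
            for a, p1, b in known if p1 == ":hasParent"
            for b2, p2, c in known if p2 == ":hasParent" and b2 == b}

def forward_chain(triples, rules):
    """Apply every rule repeatedly until no new triple appears.
    Returns the asserted triples plus everything inferred from them."""
    known = set(triples)
    while True:
        new = set()
        for rule in rules:
            new |= rule(known) - known
        if not new:
            return known
        known |= new

facts = {
    (":Matt", "rdf:type", ":Father"),
    (":Father", "rdfs:subClassOf", ":Parent"),
    (":Tom", ":hasParent", ":Matt"),
    (":Matt", ":hasParent", ":John"),
}

closure = forward_chain(facts, [rdfs_subclass, grandparent])
inferred = closure - facts
```

Because the closure is computed once and kept, a later query for `:hasGrandParent` is a plain lookup over stored triples, which is the trade-off forward chaining makes: more work and storage at load time in exchange for less computation at query time.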
Query Example: Family Data
select y, name from TABLE(SDO_RDF_MATCH(
  '(:Tom :hasGrandParent ?y)
   (?y :name ?name)
   (?y rdf:type :Male)',
  SDO_RDF_Models('family'),
  SDO_RDF_Rulebases('family_rb'),
  .., ..));
Returns the name of Tom’s grandfather
Y NAME
John “John D”
Graph nodes in the example: :Jack, :Tom, :Janice, :John, :Suzie, :Matt, the literal “John D” and the class Male
Data Integration in the Life Sciences
“Find all pieces of information associated with a specific target”
• Data integration of multiple datasets
• Across multiple representation formats, granularity of representation, and access mechanisms
• Across in-house and public sets (Gene Ontology, UniProt, NCI thesaurus, etc.)
• Standardized and machine-understandable data format with an open data access model is necessary to enable integration
• Data-warehousing approach represents all data to be integrated in RDF/OWL
• Semantic metadata layer approach links metadata from various sources and maps data access tool to relevant source
• Ability to combine RDF/OWL queries with relational queries is a big benefit
• Lilly and Pfizer are using semantic technology to solve data integration problems
Use Case: SenseLab Overview
Courtesy, SenseLab, Yale University
Relational to Ontological Mapping
Diagram classes: Drug, Neuron, PathologicalAgent, Receptor, Channel, Agent, NeuronalProperty, PathologicalChange, Compartment
Diagram relations: inhibits, involves, has, is_located_in
Courtesy, SenseLab, Yale University
Semantic Technology Plans for the Next Release
Safe Harbor Statement & Confidentiality
The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle’s products remains at the sole discretion of Oracle.
Plans for the Next Release
• Fast bulk-load RDF/OWL data into the database
• Several times faster than 10.2.0.2 batch load
• Infer new triples with native OWL inferencing
• Faster query of RDF/OWL data and ontologies
• Ontology-Assisted Query of relational data
Overview
Diagram labels: RDF/OWL data and ontologies; Enterprise (Relational) data
• STORE: Batch-Load; Bulk-Load; Incr. DML
• INFER: RDF/S; User-def.; OWL subsets
• QUERY: Query RDF/OWL data and ontologies; Ontology-Assisted Query of Enterprise Data
Technical Overview Summary
• Semantic Technology support in the database
• Store RDF/OWL data and ontologies
• Infer new RDF/OWL triples via native inferencing
• Query RDF/OWL data and ontologies
• Ontology-Assisted Query of relational data