Contents
• Introduction
• Different Architectures
  • Implications
• An Example: Jena SDB
• Evaluations
  • Evaluations using LUBM/DBpedia
• Open Research Issues
• Which RDF Store to Choose for a Particular Application?
• Possible System Diagram for Phenotype Annotations
Introduction
What is an RDF store?
A system that provides persistent storage of, and access to, RDF graphs.
Potential application areas: plenty! For example, a backend for Protege, BioPortal, or phenotype annotations.
Different Architectures
Based on their implementation, RDF stores can be divided into three broad categories: in-memory, native, and non-native non-memory.
• In-memory: the RDF graph is stored as triples in main memory, e.g. an RDF graph stored using the Jena API or the Sesame API.
• Native: persistent storage systems with their own database implementation, e.g. Sesame Native, Virtuoso, AllegroGraph, Oracle 11g.
• Non-native non-memory: persistent storage systems set up to run on third-party databases, e.g. Jena SDB.
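To make the in-memory category concrete, here is a minimal sketch of a toy triple store that keeps every triple in main memory and answers simple triple patterns. It is a hypothetical illustration, not the Jena or Sesame API. The class name, the `ex:` terms, and the choice of subject and predicate indexes are assumptions made for the example.

```python
from collections import defaultdict


class InMemoryTripleStore:
    """Toy in-memory RDF store: every triple lives in main memory as a tuple.

    Hypothetical illustration; this is not the Jena or Sesame API.
    """

    def __init__(self):
        self.triples = set()
        # Secondary indexes so subject- or predicate-bound patterns avoid a full scan.
        self.by_subject = defaultdict(set)
        self.by_predicate = defaultdict(set)

    def add(self, s, p, o):
        t = (s, p, o)
        self.triples.add(t)
        self.by_subject[s].add(t)
        self.by_predicate[p].add(t)

    def match(self, s=None, p=None, o=None):
        """Return matching triples in sorted order; None acts as a wildcard."""
        if s is not None:
            candidates = self.by_subject.get(s, set())
        elif p is not None:
            candidates = self.by_predicate.get(p, set())
        else:
            candidates = self.triples
        return sorted(
            t for t in candidates
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)
        )


store = InMemoryTripleStore()
store.add("ex:alice", "rdf:type", "ex:Student")
store.add("ex:alice", "ex:memberOf", "ex:dept1")
store.add("ex:bob", "rdf:type", "ex:Student")
print(store.match(p="rdf:type", o="ex:Student"))
# [('ex:alice', 'rdf:type', 'ex:Student'), ('ex:bob', 'rdf:type', 'ex:Student')]
```

Nothing in this store survives a restart. That lack of persistence is the trade-off separating this category from the native and non-native stores.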
Implications
• Scalability.
• Different query languages are supported to varying degrees.
  • Sesame: SeRQL; Oracle 11g: its own query language.
• Different levels of inferencing.
  • Sesame: RDFS; AllegroGraph: RDFS++; Oracle 11g: RDFS++ and OWL Prime.
• Lack of interoperability and portability.
  • More pronounced in native stores.
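To make "level of inferencing" concrete, here is a minimal sketch of naive forward chaining over a subset of the RDFS entailment rules: rdfs2 (domain), rdfs3 (range), rdfs5 (subPropertyOf transitivity), rdfs7 (subPropertyOf), rdfs9 (subClassOf typing) and rdfs11 (subClassOf transitivity). The rules are applied repeatedly until no new triples appear. This is not how Sesame, AllegroGraph or Oracle implement inference. RDFS++ and OWL Prime add further rules on top of RDFS. The `ex:` terms are hypothetical.

```python
SUBCLASS = "rdfs:subClassOf"
SUBPROP = "rdfs:subPropertyOf"
DOMAIN = "rdfs:domain"
RANGE = "rdfs:range"
TYPE = "rdf:type"


def rdfs_closure(triples):
    """Naive forward chaining over a subset of RDFS rules until a fixpoint is reached."""
    closure = set(triples)
    while True:
        new = set()
        for (s, p, o) in closure:
            for (s2, p2, o2) in closure:
                # rdfs11: subClassOf is transitive
                if p == SUBCLASS and p2 == SUBCLASS and o == s2:
                    new.add((s, SUBCLASS, o2))
                # rdfs5: subPropertyOf is transitive
                if p == SUBPROP and p2 == SUBPROP and o == s2:
                    new.add((s, SUBPROP, o2))
                # rdfs9: an instance of a subclass is an instance of the superclass
                if p == TYPE and p2 == SUBCLASS and o == s2:
                    new.add((s, TYPE, o2))
                # rdfs7: a statement using a subproperty implies the superproperty
                if p2 == SUBPROP and p == s2:
                    new.add((s, o2, o))
                # rdfs2 / rdfs3: typing from property domain and range
                if p2 == DOMAIN and p == s2:
                    new.add((s, TYPE, o2))
                if p2 == RANGE and p == s2:
                    new.add((o, TYPE, o2))
        if new <= closure:  # nothing new: fixpoint reached
            return closure
        closure |= new


data = {
    ("ex:GraduateStudent", SUBCLASS, "ex:Student"),
    ("ex:Student", SUBCLASS, "ex:Person"),
    ("ex:takesCourse", DOMAIN, "ex:Student"),
    ("ex:alice", TYPE, "ex:GraduateStudent"),
    ("ex:bob", "ex:takesCourse", "ex:course1"),
}
for t in sorted(rdfs_closure(data) - data):
    print(t)
```

Each extra rule enlarges the closure the store must compute and keep. This is one reason the stores differ in which inference level they offer.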
Jena SDB
• SDB is essentially a Java loader.
• Multiple databases supported: MySQL, PostgreSQL, Oracle, DB2.
• Takes incoming triples and breaks them down into components ready for the database.
• Multiple layouts.
• Integration with the Joseki server.
• SPARQL supported.
(Non-)interest declaration: I was previously an intern at HP Labs with the Jena team.
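The idea of breaking incoming triples into components ready for a relational database can be sketched as follows. Each RDF term goes into a node table once, and each triple becomes a row of three node ids. A triple pattern then turns into an SQL join. The sketch assumes a simplified layout with hypothetical table and column names. It is loosely in the spirit of a node-table layout but is not SDB's actual schema. Python's `sqlite3` stands in for MySQL or PostgreSQL. A real layout would also record whether a term is a URI, a blank node or a literal.

```python
import sqlite3


def create_schema(conn):
    # Hypothetical simplified layout: a node table plus a triple table of node ids.
    conn.execute("CREATE TABLE nodes (id INTEGER PRIMARY KEY, lex TEXT UNIQUE NOT NULL)")
    conn.execute(
        "CREATE TABLE triples (s INTEGER NOT NULL, p INTEGER NOT NULL, "
        "o INTEGER NOT NULL, PRIMARY KEY (s, p, o))"
    )


def node_id(conn, lex):
    """Return the id of an RDF term, inserting it into the node table if it is new."""
    conn.execute("INSERT OR IGNORE INTO nodes (lex) VALUES (?)", (lex,))
    (nid,) = conn.execute("SELECT id FROM nodes WHERE lex = ?", (lex,)).fetchone()
    return nid


def load_triples(conn, triples):
    """Break each incoming triple into its components and store them as ids."""
    for s, p, o in triples:
        ids = (node_id(conn, s), node_id(conn, p), node_id(conn, o))
        conn.execute("INSERT OR IGNORE INTO triples (s, p, o) VALUES (?, ?, ?)", ids)
    conn.commit()


def objects_of(conn, s, p):
    """Translate the pattern (s, p, ?o) into a join over the two tables."""
    rows = conn.execute(
        """
        SELECT obj.lex FROM triples t
        JOIN nodes subj ON t.s = subj.id
        JOIN nodes pred ON t.p = pred.id
        JOIN nodes obj ON t.o = obj.id
        WHERE subj.lex = ? AND pred.lex = ?
        ORDER BY obj.lex
        """,
        (s, p),
    ).fetchall()
    return [r[0] for r in rows]


conn = sqlite3.connect(":memory:")
create_schema(conn)
load_triples(conn, [
    ("ex:alice", "ex:memberOf", "ex:dept1"),
    ("ex:alice", "ex:takesCourse", "ex:course1"),
    ("ex:alice", "ex:takesCourse", "ex:course2"),
])
print(objects_of(conn, "ex:alice", "ex:takesCourse"))  # ['ex:course1', 'ex:course2']
```

Even this one-pattern query needs three joins against the node table. That overhead comes from running on a general-purpose database.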
Evaluations
• Third-party evaluations of Sesame, Jena SDB, and Virtuoso.
• Company evaluations of Oracle 11g.
• Methodology:
  • LUBM (Lehigh University Benchmark)
  • DBpedia
  • Multiple queries
  • Load times
Evaluations
• DBpedia: a database of structured information extracted from Wikipedia, covering places, persons, music albums, and films [2].
• LUBM: synthetically generated RDF data describing universities, departments, students, etc. [1]
• Dataset sizes:
  • Dataset 1: 15,472,624 triples; 2.1 GB
  • Dataset 2: LUBM 50 (2.75 million) and LUBM 1000 (55.09 million)
• 3 queries
Oracle 11g: Dataset 2
Ontology (size)      | RDFS: triples | RDFS: time | OWL Prime: triples | OWL Prime: time
LUBM-50 (6.8 M)      | 2.75 M        | 12.14 min  | 3.05 M             | 8.01 min
LUBM-1000 (133.6 M)  | 55.09 M       | 7 h 19 min | 65.25 M            | 7 h 12 min
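The arithmetic below shows how these Oracle 11g figures scale. It divides the Triples column by the Time column and compares growth between LUBM-50 and LUBM-1000. It assumes that the time reported is the time taken to produce the listed triples. The table does not state exactly what was timed.

```python
# Figures copied from the Oracle 11g table (triples in millions, time in minutes).
rows = {
    ("LUBM-50", "RDFS"): (2.75, 12.14),
    ("LUBM-50", "OWL Prime"): (3.05, 8.01),
    ("LUBM-1000", "RDFS"): (55.09, 7 * 60 + 19),
    ("LUBM-1000", "OWL Prime"): (65.25, 7 * 60 + 12),
}


def rate(millions, minutes):
    """Triples per minute."""
    return millions * 1_000_000 / minutes


for (dataset, level), (m, t) in rows.items():
    print(f"{dataset:10} {level:10} {rate(m, t):>10,.0f} triples/min")

# Growth factors from LUBM-50 to LUBM-1000 for each inference level.
for level in ("RDFS", "OWL Prime"):
    m50, t50 = rows[("LUBM-50", level)]
    m1000, t1000 = rows[("LUBM-1000", level)]
    print(f"{level}: triples x{m1000 / m50:.1f}, time x{t1000 / t50:.1f}")
```

Under RDFS this gives roughly 226,500 triples/min for LUBM-50 against about 125,500 for LUBM-1000. About 20 times as many triples took about 36 times as long. For OWL Prime the factors are about 21.4 and 53.9. Times that grow faster than the data are relevant to the linear-load-time criterion discussed later.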
Observations
• Native stores perform better than systems built on third-party databases.
  • Optimizations are possible.
• Each system uses a different database layout.
  • Virtuoso: OGPS, POGS, PSOG, SOPG
  • SDB: SPO, GSPO
• Hashing in SDB performs very poorly.
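Layouts matter partly because an ordered (B-tree style) index can be used efficiently only when the query binds a prefix of its key order. The sketch below is a hypothetical helper, not code from Virtuoso or SDB. It scores each index order listed above against the positions a triple pattern binds (S, P, O, G). Real query planners also use statistics. This shows only the prefix rule.

```python
# Index orders as listed on the slide (letters: S, P, O, G).
INDEXES = {
    "Virtuoso": ["OGPS", "POGS", "PSOG", "SOPG"],
    "SDB": ["SPO", "GSPO"],
}


def usable_prefix(index, bound):
    """Length of the leading run of index positions that the pattern binds."""
    n = 0
    for pos in index:
        if pos not in bound:
            break
        n += 1
    return n


def choose_index(indexes, bound):
    """Pick the index whose key prefix covers the most bound positions.

    `bound` is the set of positions fixed by the pattern, e.g. {"P", "O"}
    for (?s rdf:type ex:Student). A score of 0 means a full scan.
    """
    return max(indexes, key=lambda idx: usable_prefix(idx, bound))


pattern = {"P", "O"}  # e.g. (?s rdf:type ex:Student)
for system, idxs in INDEXES.items():
    best = choose_index(idxs, pattern)
    print(system, best, usable_prefix(best, pattern))
```

For a predicate-and-object pattern, Virtuoso's POGS order covers both bound positions. Neither of the two SDB orders covers any of them.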
Open Research Issues: Inferencing [4]
• Present common implementations:
  • Make a number of small queries to propagate the effects of rule firing.
  • Each of these queries is a separate interaction with the database.
  • Not very efficient.
• Approaches:
  • Snapshot the contents of the database-backed model into RAM for the duration of processing by the inference engine.
  • Perform inferencing in-stream.
    • Precompute the inference closure of the ontology, analyze the incoming data streams, and add triples to them based on that closure.
    • Assumes a rigid separation of the RDF data (A-box) and the ontology data (T-box).
    • Even this may not work for very large ontologies, e.g. biomedical ontologies.
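Here is a minimal sketch of the in-stream approach, restricted to class hierarchies. The superclass closure of the T-box is computed once. Each incoming A-box triple is then passed through, and the type triples implied by that closure are added without querying the database. The function names and `ex:` terms are hypothetical. Only `rdfs:subClassOf` is handled, and this is not the pipeline described in [4].

```python
from collections import defaultdict

TYPE = "rdf:type"


def superclass_closure(tbox):
    """Precompute, for each class, all of its transitive superclasses.

    `tbox` is an iterable of (subclass, superclass) pairs.
    """
    parents = defaultdict(set)
    for sub, sup in tbox:
        parents[sub].add(sup)
    closure = {}
    for cls in list(parents):
        seen, stack = set(), [cls]
        while stack:  # depth-first walk up the hierarchy; `seen` guards against cycles
            for sup in parents.get(stack.pop(), ()):
                if sup not in seen:
                    seen.add(sup)
                    stack.append(sup)
        closure[cls] = seen
    return closure


def infer_in_stream(abox_stream, closure):
    """Yield each incoming A-box triple plus the type triples implied by the closure."""
    for s, p, o in abox_stream:
        yield (s, p, o)
        if p == TYPE:
            for sup in sorted(closure.get(o, ())):
                yield (s, TYPE, sup)


tbox = [("ex:GraduateStudent", "ex:Student"), ("ex:Student", "ex:Person")]
closure = superclass_closure(tbox)
stream = [("ex:alice", TYPE, "ex:GraduateStudent"), ("ex:alice", "ex:memberOf", "ex:dept1")]
for t in infer_in_stream(stream, closure):
    print(t)
```

The closure table is fixed before the data arrives, so this relies on the A-box/T-box separation. It also grows with the ontology, which is where very large biomedical ontologies become a problem.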
Open Research Issues: Query Optimization
• Third-party stores undo any optimization done at the API level.
• The better performance of native stores points in that direction.
• Some work has been done on optimizing SPARQL queries for in-memory stores.
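The sketch below shows one common heuristic for ordering the triple patterns of a SPARQL basic graph pattern in an in-memory store. It is not necessarily the approach taken in the work mentioned above. It estimates each pattern's selectivity by counting matches and picks the most selective pattern first. After that it prefers patterns that share a variable with those already chosen, which avoids cross products. The data and names are hypothetical.

```python
def is_var(term):
    return term.startswith("?")


def estimate(pattern, triples):
    """Crude selectivity estimate: number of triples matching the pattern's constants."""
    return sum(
        all(is_var(q) or q == t for q, t in zip(pattern, triple))
        for triple in triples
    )


def order_patterns(patterns, triples):
    """Greedy join order: connected patterns first, then fewest estimated matches."""
    remaining = list(patterns)
    ordered, bound_vars = [], set()
    while remaining:
        def cost(p):
            connected = any(is_var(x) and x in bound_vars for x in p)
            # False sorts before True, so patterns joined to earlier ones win.
            return (bool(ordered) and not connected, estimate(p, triples))
        best = min(remaining, key=cost)
        remaining.remove(best)
        ordered.append(best)
        bound_vars.update(x for x in best if is_var(x))
    return ordered


triples = [
    ("ex:alice", "rdf:type", "ex:Student"),
    ("ex:bob", "rdf:type", "ex:Student"),
    ("ex:carol", "rdf:type", "ex:Student"),
    ("ex:alice", "ex:takesCourse", "ex:course1"),
    ("ex:bob", "ex:takesCourse", "ex:course2"),
    ("ex:course1", "ex:name", "Databases"),
]
query = [
    ("?s", "rdf:type", "ex:Student"),
    ("?s", "ex:takesCourse", "?c"),
    ("?c", "ex:name", "Databases"),
]
for p in order_patterns(query, triples):
    print(p)
```

The query is evaluated starting from the single course named "Databases" rather than from every student. This kind of reordering is lost when the store hands the query to a third-party database that plans it differently.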
Which RDF Store to Choose for an App?
• Frequency of loads that the application would perform.
• Single scaling factor and linear load times.
• Level of inferencing.
• Which query languages are supported; support for W3C recommendations.
• Special system needs, e.g. AllegroGraph needs a 64-bit processor.
Phenotype Annotations
[Figure: possible system diagram. Recoverable labels: set of ontologies required for phenotype annotations (e.g. PATO, Fly); Jena Model; Jena API; SDB; MySQL/Virtuoso; Inferencing.]
References
[1] http://esw.w3.org/topic/RdfStoreBenchmarking
[2] http://www4.wiwiss.fu-berlin.de/benchmarks-200801/
[3] Kurt Rohloff et al. An Evaluation of Triple-Store Technologies for Large Data Stores (comparing Sesame, Jena and AllegroGraph). 2007.
[4] N. Bhatia, A. Seaborne. "Ingestion pipeline for RDF."