Post on 12-Apr-2017
A Performance Evaluation and Optimization of the Rule Materialization Process in OWL Databases
Presented by Hicham Berrada
Supervised by Dr. Harroud
Outline
• Introduction
• Background study
• A new approach for the materialization process
• Performance evaluation and optimization
• Conclusion
• Future work
• Problems encountered
Introduction
• The materialization process physically stores inferred (discovered) data to ensure rapid query answering.
• The purpose of this project is to enhance this process in OWL databases.
• This family of knowledge representation languages is used in the Semantic Web.
Today’s Web Limitations
Weaknesses
• High recall, low precision.
• Results are highly sensitive to vocabulary.
• Results are single Web pages.
• Human involvement is necessary to interpret and combine results.
• Results of Web searches are not readily accessible by other software tools.
• The meaning of Web content is not machine-accessible.
Semantic Web
Represent the Web content in a form that is more easily machine-processable:
• A way to specify data and data relationships
• Reason over explicitly declared or defined knowledge
• Infer new, implicit knowledge (the materialization process)
Semantic Web Applications: Search engines
The Semantic Web Stack
Related Work
Urbani: Presents materialization and backward-chaining as different modes of performing inference.
◦ Materialization:
+ Very efficient responses at query time
- Expensive up-front closure computation, which needs to be redone every time the knowledge base changes.
◦ Backward-chaining:
+ No expensive and change-sensitive pre-computation, so it suits more frequently changing knowledge bases
- Has to perform more computation at query time.
Presents a hybrid algorithm to perform efficient backward-chaining reasoning on very large datasets expressed in the OWL Horst fragment.
Related Work
Jiménez: Described a Prolog library for OWL RL.
◦ This library is implemented under the SWI-Prolog interpreter and is based on the RDF library provided by the SWI-Prolog environment, so that OWL triples are computed and stored in secondary memory.
Improving Rule Materialization
• I proposed adding a new parameter to the materialization process: the strategy for applying rules. So, S' = closure(S, R, Strategy).
• Weight functions determine the order in which rules are applied.
• The order in which rules are applied matters and can have a great impact on machine resources, much as query planning affects the processing cost of SQL queries.
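The formula S' = closure(S, R, Strategy) can be read as a fixpoint loop in which the strategy only decides the firing order. The following is a minimal sketch under that reading, not the implementation used in the project: the `Rule` and `Strategy` types and the two toy rules are hypothetical, and rules are modeled as plain Python functions rather than SPARQL queries.

```python
from typing import Callable, List, Set, Tuple

Triple = Tuple[str, str, str]
Rule = Callable[[Set[Triple]], Set[Triple]]    # returns the triples a rule infers
Strategy = Callable[[List[Rule]], List[Rule]]  # returns the rules in firing order


def closure(store: Set[Triple], rules: List[Rule], strategy: Strategy) -> Set[Triple]:
    """Fire the rules in the strategy's order until no new triple is inferred."""
    closed = set(store)
    ordered = strategy(rules)
    changed = True
    while changed:
        changed = False
        for rule in ordered:
            inferred = rule(closed) - closed
            if inferred:
                closed |= inferred
                changed = True
    return closed


# Two toy rules (hypothetical, not the OWL RL rule set used in the experiments).
def symmetric_knows(triples: Set[Triple]) -> Set[Triple]:
    return {(o, p, s) for (s, p, o) in triples if p == "knows"}


def transitive_ancestor(triples: Set[Triple]) -> Set[Triple]:
    links = [(s, o) for (s, p, o) in triples if p == "ancestorOf"]
    return {(a, "ancestorOf", d) for (a, b) in links for (c, d) in links if b == c}


store = {("alice", "knows", "bob"), ("a", "ancestorOf", "b"), ("b", "ancestorOf", "c")}
rules = [symmetric_knows, transitive_ancestor]
as_given = closure(store, rules, lambda rs: list(rs))
reversed_order = closure(store, rules, lambda rs: list(reversed(rs)))
```

Both orders reach the same closure here; in this sketch the strategy affects only how much work is done on the way, which is the cost the experiments measure.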
Selected OWL Profile
I have chosen a subset of the OWL RL profile.
OWL Rules
Rule dependency
• Each rule R1 has a premise and a consequent.
• The result (consequent) of a given rule R1 may be used as a premise of another rule R2.
• This implies a dependency between R1 and R2.
Rule dependency graph
• The nodes of the dependency graph are rules.
• The dependency R1 → R2 means that the results of firing rule R1 can make rule R2 fireable again.
Rule Dependency Matrix
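One way to derive the dependency graph and its matrix form is to compare what each rule produces with what every other rule consumes. This is a simplified sketch: `RuleSignature`, `build_rdg`, `to_matrix` and the abstract pattern labels "A", "B", "C" are hypothetical stand-ins for real OWL RL premises and consequents.

```python
from typing import Dict, FrozenSet, List, NamedTuple, Set


class RuleSignature(NamedTuple):
    premises: FrozenSet[str]     # triple patterns the rule reads
    consequents: FrozenSet[str]  # triple patterns the rule produces


def build_rdg(rules: Dict[str, RuleSignature]) -> Dict[str, Set[str]]:
    """Add an edge R1 -> R2 when a consequent of R1 matches a premise of R2."""
    graph: Dict[str, Set[str]] = {name: set() for name in rules}
    for r1, sig1 in rules.items():
        for r2, sig2 in rules.items():
            if sig1.consequents & sig2.premises:
                graph[r1].add(r2)
    return graph


def to_matrix(graph: Dict[str, Set[str]], order: List[str]) -> List[List[int]]:
    """Row i, column j is 1 when rule i can make rule j fireable."""
    return [[1 if r2 in graph[r1] else 0 for r2 in order] for r1 in order]


rules = {
    "R1": RuleSignature(frozenset({"A"}), frozenset({"B"})),
    "R2": RuleSignature(frozenset({"B"}), frozenset({"C"})),
    "R3": RuleSignature(frozenset({"C"}), frozenset({"C"})),  # depends on itself
}
rdg = build_rdg(rules)
matrix = to_matrix(rdg, ["R1", "R2", "R3"])
```

A rule whose consequent feeds its own premise, like R3 here, shows up as a 1 on the diagonal of the matrix.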
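OWL rules implementation
OWL rules were implemented as SPARQL CONSTRUCT queries, as in the following rule for symmetric properties (rdf: and owl: are the standard RDF and OWL namespace prefixes, which must be declared in the query):
  CONSTRUCT { ?u ?p ?v . }
  WHERE {
    ?p rdf:type owl:SymmetricProperty .
    ?v ?p ?u .
  }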
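To make the effect of the symmetric-property query concrete, here is the same rule written over an in-memory set of triples. It is a plain-Python illustration, not Sesame code; `apply_symmetric_rule` and the sample triples are hypothetical.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]

RDF_TYPE = "rdf:type"
OWL_SYMMETRIC = "owl:SymmetricProperty"


def apply_symmetric_rule(triples: Set[Triple]) -> Set[Triple]:
    """For every (v, p, u) where p is declared symmetric, construct (u, p, v)."""
    symmetric_props = {
        s for (s, p, o) in triples if p == RDF_TYPE and o == OWL_SYMMETRIC
    }
    return {(o, p, s) for (s, p, o) in triples if p in symmetric_props}


triples = {
    ("knows", RDF_TYPE, OWL_SYMMETRIC),
    ("alice", "knows", "bob"),
    ("alice", "likes", "tea"),  # "likes" is not symmetric, so nothing is inferred
}
inferred = apply_symmetric_rule(triples)
```

Like the CONSTRUCT query, the function returns only the constructed triples; adding them to the store is a separate step.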
Sesame
• A framework for storage, querying and inferencing of RDF and RDF Schema
• A Java library for handling RDF
How do strategies work?
• Associate with each rule (node) in the rule dependency graph (RDG) a weight that represents the rule’s priority during the materialization process.
• Highest weight means highest priority.
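Turning weights into a firing order is then a sort in descending weight. In this small sketch the tie-break by rule name is an assumption, since the slides do not say how equal weights are ordered; `order_by_weight` is a hypothetical helper.

```python
from typing import Dict, List


def order_by_weight(weights: Dict[str, float]) -> List[str]:
    """Highest weight first; ties broken by rule name (an assumption)."""
    return sorted(weights, key=lambda rule: (-weights[rule], rule))


firing_order = order_by_weight({"R1": 2, "R2": 5, "R3": 2})
```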
Strategies: In-degree weight (IW)
Maps each node to the number of edges having that node as their terminal node, and fires the rules with more IN edges first.
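A sketch of the in-degree weight on a toy graph, assuming the RDG is stored as a dictionary from each rule to the set of rules it can make fireable (the graph below is hypothetical):

```python
from typing import Dict, Set


def in_degree_weights(graph: Dict[str, Set[str]]) -> Dict[str, int]:
    """Count, for each node, the edges that end at that node."""
    weights = {node: 0 for node in graph}
    for targets in graph.values():
        for target in targets:
            weights[target] = weights.get(target, 0) + 1
    return weights


rdg = {"R1": {"R2", "R3"}, "R2": {"R3"}, "R3": {"R1"}}
iw = in_degree_weights(rdg)
```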
Strategies: Out-degree weight (OW)
Assigns to each node the number of edges having that node as their initial node. Fires the rules with more OUT edges first.
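The out-degree weight is the mirror image of the in-degree weight. The sketch below uses the same assumed dictionary representation and a hypothetical toy graph:

```python
from typing import Dict, Set


def out_degree_weights(graph: Dict[str, Set[str]]) -> Dict[str, int]:
    """Count, for each node, the edges that start at that node."""
    return {node: len(targets) for node, targets in graph.items()}


rdg = {"R1": {"R2", "R3"}, "R2": {"R3"}, "R3": {"R1"}}
ow = out_degree_weights(rdg)
```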
Strategies: Reachable sub graph weight (SW)
Associates with each node the number of edges in the sub graph that is reachable from that node.
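One possible reading of this weight: collect every node reachable from the start node, then count the edges leaving those nodes. Treating the start node as part of its own sub graph is an assumption of this sketch, and the toy graph is hypothetical.

```python
from typing import Dict, Set


def reachable_nodes(graph: Dict[str, Set[str]], start: str) -> Set[str]:
    """Depth-first search; the start node is included in the result."""
    seen = {start}
    stack = [start]
    while stack:
        node = stack.pop()
        for target in graph.get(node, ()):
            if target not in seen:
                seen.add(target)
                stack.append(target)
    return seen


def sw_weight(graph: Dict[str, Set[str]], start: str) -> int:
    """Number of edges whose source lies in the sub graph reachable from start."""
    return sum(len(graph.get(node, ())) for node in reachable_nodes(graph, start))


rdg = {"R1": {"R2"}, "R2": {"R3"}, "R3": {"R2"}, "R4": set()}
sw = {rule: sw_weight(rdg, rule) for rule in rdg}
```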
Strategies: Reachable rule weight (RW)
Associates with each node the number of nodes reachable from that node.
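A sketch of the reachable rule weight. It assumes that a node counts itself only when it lies on a cycle, which the slides leave open; the toy graph is hypothetical.

```python
from typing import Dict, Set


def reachable_from(graph: Dict[str, Set[str]], start: str) -> Set[str]:
    """Nodes reachable by following at least one edge from start."""
    seen: Set[str] = set()
    stack = list(graph.get(start, ()))
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(graph.get(node, ()))
    return seen


def rw_weight(graph: Dict[str, Set[str]], start: str) -> int:
    return len(reachable_from(graph, start))


rdg = {"R1": {"R2"}, "R2": {"R3"}, "R3": {"R2"}, "R4": set()}
rw = {rule: rw_weight(rdg, rule) for rule in rdg}
```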
Strategies: Reachable sub graph weight with attenuation (SWA)
• The further rule1 is from rule2 in the RDG, the less likely rule1 is to make rule2 fireable.
• This weight function reflects the attenuation of dependency with distance.
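The slides do not give the attenuation formula, so the sketch below picks one purely for illustration: each edge of the reachable sub graph contributes `attenuation ** d`, where `d` is the shortest distance from the start node to the edge's source. Both the formula and the default factor of 0.5 are assumptions, as is the toy graph.

```python
from collections import deque
from typing import Dict, Set


def swa_weight(graph: Dict[str, Set[str]], start: str, attenuation: float = 0.5) -> float:
    """Sum edge contributions that shrink with distance from the start node."""
    distance = {start: 0}
    queue = deque([start])
    weight = 0.0
    while queue:
        node = queue.popleft()  # each reachable node is dequeued exactly once
        for target in graph.get(node, ()):
            weight += attenuation ** distance[node]
            if target not in distance:
                distance[target] = distance[node] + 1
                queue.append(target)
    return weight


rdg = {"R1": {"R2"}, "R2": {"R3"}, "R3": set()}
swa = {rule: swa_weight(rdg, rule) for rule in rdg}
```

With `attenuation=1.0` this reduces to the plain edge count of the reachable sub graph, so SW is the special case without attenuation.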
Strategies: Cyclomatic complexity weight (CCW)
Counts the number of cycles + 1 in a given sub graph.
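A sketch of this weight using McCabe's formula E - N + 2 on the sub graph reachable from a node, which equals the number of independent cycles plus one for a connected graph. Reading "number of cycles" as independent cycles (ignoring edge direction) is an assumption, and the toy graph is hypothetical.

```python
from typing import Dict, Set


def reachable_nodes(graph: Dict[str, Set[str]], start: str) -> Set[str]:
    """Depth-first search; the start node is included in the result."""
    seen = {start}
    stack = [start]
    while stack:
        node = stack.pop()
        for target in graph.get(node, ()):
            if target not in seen:
                seen.add(target)
                stack.append(target)
    return seen


def ccw_weight(graph: Dict[str, Set[str]], start: str) -> int:
    """Cyclomatic complexity E - N + 2 of the sub graph reachable from start."""
    nodes = reachable_nodes(graph, start)
    edges = sum(len(graph.get(node, ())) for node in nodes)
    return edges - len(nodes) + 2


rdg = {"R1": {"R2"}, "R2": {"R3"}, "R3": {"R2"}, "R4": set()}
ccw = {rule: ccw_weight(rdg, rule) for rule in rdg}
```

R4 has no outgoing edges and no cycle, so its weight is the minimum value of 1.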
Experiment 1: Experimenting with strategies
Finding the rule execution order for each strategy
The rule orders calculated by these different strategies are provided below.
Strategies SW, RW and CCW give the same order of applying rules, so I will represent them by SW in my experiments.
IW 14 17 16 13 20 7 15 6 10 18 12 11 8 3 4 9 5 2 1 19 22 21
OW 3 4 8 9 12 20 18 19 15 21 22 13 14 11 10 17 16 2 1 7 6 5
SW 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22
RW 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22
SWA 3 4 8 9 12 20 19 21 22 15 18 11 10 13 14 17 16 2 1 7 6 5
CCW 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22
Reasoning time per Rule for each strategy
Execution time for the four strategies on a store with 100000 triples.
[Figure: Execution time for OWL rules in ms, per rule (Rule 1 to Rule 22), for strategies IW, OW, SW and SWA; y-axis from 0 to 160000]
Reasoning time per store for all strategies
[Figure: Reasoning time per store in ms with different strategies (IW, OW, SW, SWA) for stores of 1K, 10K and 100K triples; y-axis from 0 to 1800000]
Second Experiment: Using a much larger store
[Figure: Performance comparison for all strategies (IW, OW, SW, SWA) in different stores (1K, 10K, 100K, 1000k); y-axis from 0 to 60000000]
Performance comparison on a 1000k store
We notice that SWA is now the best option for larger stores
[Figure: Bar chart for IW, OW, SW and SWA on the 1000k store; y-axis from 41000000 to 49000000]
Execution time improvement on a 1000k store (compared to IW)
SW 1.0416
OW 1.0857
SWA 1.1089
Third Experiment: Optimizing the strategies
Dynamic Approach algorithm
Comparison of classical and dynamic strategies on a 1K store
• All the dynamic strategies seem to do better than the normal ones.
• SW is no longer the best strategy; in this case the best are SWA dynamic, then OW dynamic.
[Figure: Bar chart for IW, IW Dynamic, OW, OW Dynamic, SW, SW Dynamic, SWA and SWA Dynamic; y-axis from 0 to 18000]
Comparison of classical and dynamic strategies on a 10K store
• Most dynamic strategies do better than the normal ones, except SW dynamic.
• OW dynamic and SWA dynamic seem to be the best strategies for this size of database.
[Figure: Bar chart for IW, IW Dynamic, OW, OW Dynamic, SW, SW Dynamic, SWA and SWA Dynamic; y-axis from 0 to 180000]
Comparison of classical and dynamic strategies on a 100K store
• The two dynamic strategies OW and SWA do slightly worse than SW.
• => I cannot generalize and say that dynamic strategies will always do better than classical ones.
[Figure: Bar chart for IW, IW Dynamic, OW, OW Dynamic, SW, SW Dynamic, SWA and SWA Dynamic; y-axis from 0 to 2000000]
Comparison of classical and dynamic strategies on a 1000k store
• All the dynamic strategies perform better than the normal ones.
• OW dynamic and SWA dynamic do a lot better than any other strategy.
[Figure: Bar chart for IW, IW Dynamic, OW, OW Dynamic, SW, SW Dynamic, SWA and SWA Dynamic; y-axis from 0 to 60000000]
Execution time improvement on a 1000k store (compared to IW)
SW 1.0416
OW 1.0857
SWA 1.1089
IW Dynamic 1.3672
SW Dynamic 2.3717
OW Dynamic 51.7776
SWA Dynamic 54.3297
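The improvement values can be converted into the share of execution time saved. This sketch assumes each value is the ratio of IW's execution time to the strategy's execution time, which the chart title suggests but does not state; the function names are hypothetical.

```python
from typing import Dict


def improvement_factor(baseline_ms: float, strategy_ms: float) -> float:
    """Ratio of the baseline (IW) time to the strategy time; above 1 means faster."""
    return baseline_ms / strategy_ms


def time_reduction_percent(factor: float) -> float:
    """Share of the baseline execution time saved, in percent."""
    return (1.0 - 1.0 / factor) * 100.0


# Improvement values read from the chart for the 1000k store.
factors: Dict[str, float] = {
    "SW": 1.04160828616711,
    "OW": 1.0856861531755,
    "SWA": 1.10890198330886,
    "IW Dynamic": 1.36724430233728,
    "SW Dynamic": 2.37168570850941,
    "OW Dynamic": 51.7775893881754,
    "SWA Dynamic": 54.3297100385185,
}
reductions = {name: round(time_reduction_percent(f), 2) for name, f in factors.items()}
```

Under this assumption, a factor of about 54 for SWA Dynamic corresponds to roughly a 98% reduction in execution time, while the classical strategies save under 10%.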
[Figure: Comparison of classical and dynamic strategies on different stores (1K, 10K, 100K, 1000k); series: IW, IW Dynamic, OW, OW Dynamic, SW, SW Dynamic, SWA, SWA Dynamic; y-axis from 0 to 60000000]
=> Use SW for stores smaller than 1000k and SWA dynamic for stores of 1000k and more.
Conclusion
• Application of the materialization process in OWL databases for a subset of the OWL RL profile
• Support of rule subsets (scalability)
• Demonstration of the impact of rule order on the materialization process
• Improvement of the materialization process by a factor of about 54.33 (SWA dynamic compared to IW on the 1000k store)
Future work
• Rule dependencies in the database used could have a significant impact on the performance of the materialization process.
◦ Add a new metric to estimate the complexity between rules in the dependency graph.
◦ Study the impact of the new metric on the materialization process.
• Study the performance of my approach on other OWL profiles.
Problems encountered
• Time (weeks of execution)
• Computing power (server)
• Application compatibility with Linux
• Looking for a larger OWL database:
◦ No Sesame parser for N-Quads (new database)
◦ Errors in the new database (several crashes)
References
M. El Koutbi, A. Salah, I. Khriss (2012). Strategies for Applying Rules in OWL Entailment Regimes.
G. Antoniou, F. van Harmelen (2003). A Semantic Web Primer. Massachusetts Institute of Technology.
D. Fensel et al. (n.d.). Semantic Web Application Areas. Retrieved September 12th from ebscohost.
D. Fensel et al. (2002). On-To-Knowledge: Semantic Web Enabled Knowledge Management. Retrieved September 19th from ebscohost.
J. Heflin (n.d.). An Introduction to the OWL Web Ontology Language. Retrieved October 12th from ebscohost.
F. Baader et al. (2002). The Description Logic Handbook: Theory, Implementation and Applications. Cambridge: Cambridge University Press.
P. Patel-Schneider, I. Horrocks, F. van Harmelen (2002). Reviewing the Design of DAML+OIL: An Ontology Language for the Semantic Web. In Proceedings of the 18th National Conference on Artificial Intelligence (AAAI02). <http://www.cs.vu.nl/frankh/abstracts/AAAI02.html>
O. Lassila, R. R. Swick (1999). Resource Description Framework (RDF) Model and Syntax Specification. W3C Recommendation, 22 February 1999. Retrieved October 30th.
C. Bizer, T. Heath, T. Berners-Lee (2009). Linked Data - The Story So Far. International Journal on Semantic Web and Information Systems, 5(3):1–22.
[Sparql, 2012] http://www.w3.org/TR/sparql11-query/
G. Ianni, T. Krennwallner, A. Martello, A. Polleres (2009). A Rule System for Querying Persistent RDFS Data. ESWC 2009: 857-862.
J. Urbani, S. Kotoulas, J. Maassen, F. van Harmelen, H. Bal (2010). OWL reasoning with WebPIE: calculating the closure of 100 billion triples. In Proceedings of the ESWC '10.
Almendros-Jiménez (2011). A Prolog Library for OWL RL. In Proceedings of the Logic in Databases, LID’2011, EDBT/VLDB. ACM.
B. Bishop, S. Bojanov (2011). Implementing OWL 2 RL and OWL 2 QL rule-sets for OWLIM. In M. Dumontier, M. Courtot, Proc. of the OWL: Experiences and Directions Workshop (OWLED 2011), Volume 796 of CEUR WS Proceedings.
M. Krötzsch, A. Mehdi, S. Rudolph (2010). Orel: Database-Driven Reasoning for OWL 2 Profiles. Description Logics, Ontario, Canada.
Hogan et al. (2011). Scalable OWL 2 Reasoning for Linked Data. Reasoning Web: 250-325. Galway, Ireland.