Rules validation - Copy


A Performance Evaluation and Optimization of the Rule Materialization Process in OWL Databases

Presented by Hicham Berrada

Supervised by Dr. Harroud

Outline

• Introduction
• Background study
• A new approach for the materialization process
• Performance evaluation and optimization
• Conclusion
• Future work
• Problems encountered

Introduction

The materialization process aims at physically storing inferred (discovered) data to ensure rapid query answering.

The purpose of this project is to enhance this process in OWL databases.

This family of knowledge representation languages is used in the Semantic Web.

Today’s Web Limitations

Weaknesses

• High recall, low precision.
• Results are highly sensitive to vocabulary.
• Results are single Web pages.
• Human involvement is necessary to interpret and combine results.
• Results of Web searches are not readily accessible by other software.
• The meaning of Web content is not machine-accessible.

Semantic Web

Represent the Web content in a form that is more easily machine-processable.

• A way to specify data and data relationships
• Reason over explicitly declared or defined knowledge
• Infer new, implicit knowledge

Materialization process

Semantic Web Applications: Search engines

The Semantic Web Stack

Related Work

Urbani: Presents materialization and backward-chaining as different modes of performing inference.
◦ Materialization:
+ Very efficient responses at query time
- Expensive up-front closure computation, which needs to be redone every time the knowledge base changes.
◦ Backward-chaining:
+ No expensive and change-sensitive pre-computation => suitable for more frequently changing knowledge bases
- Has to perform more computation at query time.

Presents a hybrid algorithm to perform efficient backward-chaining reasoning on very large datasets expressed in the OWL Horst fragment.

Related Work

Jiménez: Describes a Prolog library for OWL RL.
◦ This library has been implemented under the SWI-Prolog interpreter and is based on the RDF library provided by the SWI-Prolog environment, in such a way that OWL triples are computed and stored in secondary memory.

Improving Rule Materialization

I proposed to add a new parameter to the materialization process: the strategy of applying rules. So, S’ = closure(S, R, Strategy).

Weight functions determine the order of applying rules.

The order in which these rules are applied is important and can have a great impact on machine resources, much like the impact of query planning on the processing cost of SQL queries.
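
To make S’ = closure(S, R, Strategy) concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the implementation used in the project: the two toy rules, the function names and the `passes` counter are all hypothetical, and the strategy is modelled simply as a function that returns the firing order.

```python
def subclass_transitivity(store):
    """Toy rule: (a subClassOf b) and (b subClassOf c) give (a subClassOf c)."""
    sub = {(s, o) for (s, p, o) in store if p == "rdfs:subClassOf"}
    return {(a, "rdfs:subClassOf", c) for (a, b) in sub for (b2, c) in sub if b == b2}


def type_inheritance(store):
    """Toy rule: (x type c) and (c subClassOf d) give (x type d)."""
    sub = {(s, o) for (s, p, o) in store if p == "rdfs:subClassOf"}
    return {(x, "rdf:type", d)
            for (x, p, c) in store if p == "rdf:type"
            for (c2, d) in sub if c == c2}


def closure(store, rules, strategy):
    """Fire rules in the order given by `strategy` until no new triple appears."""
    result = set(store)
    order = strategy(rules)      # the strategy only decides the firing order
    passes = 0
    changed = True
    while changed:
        changed = False
        passes += 1
        for name in order:
            new = rules[name](result) - result
            if new:
                result |= new
                changed = True
    return result, passes


store = {
    ("A", "rdfs:subClassOf", "B"),
    ("B", "rdfs:subClassOf", "C"),
    ("x", "rdf:type", "A"),
}
rules = {"trans": subclass_transitivity, "inherit": type_inheritance}

# Two hypothetical strategies that differ only in the order they return.
s1, passes1 = closure(store, rules, lambda r: ["trans", "inherit"])
s2, passes2 = closure(store, rules, lambda r: ["inherit", "trans"])
```

Both orders reach the same closure, but on this toy store the first order needs fewer passes over the rules, which is the kind of effect the strategy parameter is meant to exploit.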

Selected OWL Profile

I have chosen a subset of the OWL RL profile.

OWL Rules

Rule dependency

Each rule R1 has a premise and a consequent. The result of a given rule R1 (its consequent) can be used as a premise of another rule R2.

This implies a dependency between R1 and R2.

Rule dependency graph

The nodes of the dependency graph are rules.

The dependency R1 → R2 means that the results of firing the rule R1 can make the rule R2 fireable again.
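
As a rough sketch of how such a graph could be derived, the Python below compares rules at the predicate level only: R1 → R2 is added whenever a predicate that R1 can produce may match a predicate that R2 needs. The rule descriptions, the `VAR` wildcard and the function names are assumptions made for illustration; the slides do not say how the actual dependency graph was computed.

```python
VAR = "?"  # hypothetical marker for a variable predicate, which can match anything

# Hypothetical rule descriptions: predicates used in premises and consequents.
RULES = {
    "symmetric": {"premises": ["rdf:type", VAR], "consequents": [VAR]},
    "subclass": {"premises": ["rdfs:subClassOf", "rdf:type"], "consequents": ["rdf:type"]},
    "subclass_trans": {"premises": ["rdfs:subClassOf"], "consequents": ["rdfs:subClassOf"]},
}


def may_match(produced, needed):
    """A produced predicate may feed a needed one if either is a variable or they are equal."""
    return produced == VAR or needed == VAR or produced == needed


def dependency_graph(rules):
    """Map each rule to the set of rules it can make fireable (edge R1 -> R2)."""
    graph = {name: set() for name in rules}
    for r1, d1 in rules.items():
        for r2, d2 in rules.items():
            if any(may_match(c, p) for c in d1["consequents"] for p in d2["premises"]):
                graph[r1].add(r2)
    return graph


rdg = dependency_graph(RULES)
```

This is a deliberate over-approximation: a rule with a variable predicate in its consequent is treated as able to trigger every other rule.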

Rule Dependency Matrix

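OWL rules implementation

OWL rules were implemented using SPARQL CONSTRUCT queries, as in the following rule for symmetric properties:

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
CONSTRUCT { ?u ?p ?v . }
WHERE { ?p rdf:type owl:SymmetricProperty . ?v ?p ?u . }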
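
To show what this CONSTRUCT query computes, the following Python sketch applies the same symmetric-property rule to an in-memory set of triples. It does not use Sesame or any SPARQL engine; the `ex:` names and the function name are hypothetical.

```python
RDF_TYPE = "rdf:type"
OWL_SYMMETRIC = "owl:SymmetricProperty"


def apply_symmetric_rule(triples):
    """Return (u p v) for every (v p u) where p is declared an owl:SymmetricProperty."""
    symmetric_props = {s for (s, p, o) in triples
                       if p == RDF_TYPE and o == OWL_SYMMETRIC}
    return {(u, p, v) for (v, p, u) in triples if p in symmetric_props}


# Hypothetical example data.
store = {
    ("ex:marriedTo", RDF_TYPE, OWL_SYMMETRIC),
    ("ex:alice", "ex:marriedTo", "ex:bob"),
    ("ex:alice", "ex:worksFor", "ex:acme"),
}
constructed = apply_symmetric_rule(store)
```

Only the `ex:marriedTo` triple is mirrored, because `ex:worksFor` is not declared symmetric.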

Sesame

• A framework for storage, querying and inferencing of RDF and RDF Schema
• A Java library for handling RDF

How do strategies work?

Each strategy associates with each rule (node) in the rule dependency graph (RDG) a weight that represents the rule’s priority during the materialization process.

The highest weight means the highest priority.

Strategies: In-degree weight (IW)

Assigns to each node the number of edges having that node as their terminal node. Fires the rules with more IN edges first.

Strategies: Out-degree weight (OW)

Assigns to each node the number of edges having that node as their initial node. Fires the rules with more OUT edges first.
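
The two degree-based weights can be sketched in Python as follows. The small graph is hypothetical, and breaking ties by rule name is an assumption; the slides do not say how equal weights are ordered.

```python
# Hypothetical RDG: each rule maps to the rules it can make fireable.
graph = {
    "R1": {"R2", "R3"},
    "R2": {"R3"},
    "R3": {"R1"},
    "R4": {"R3"},
}


def out_degree_weight(graph):
    """OW: number of edges leaving each node."""
    return {node: len(successors) for node, successors in graph.items()}


def in_degree_weight(graph):
    """IW: number of edges arriving at each node."""
    weights = {node: 0 for node in graph}
    for successors in graph.values():
        for target in successors:
            weights[target] += 1
    return weights


def firing_order(weights):
    """Highest weight first; ties broken by rule name (an assumption)."""
    return sorted(weights, key=lambda node: (-weights[node], node))


iw_order = firing_order(in_degree_weight(graph))
ow_order = firing_order(out_degree_weight(graph))
```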

Strategies: Reachable subgraph weight (SW)

Associates with each node the number of edges in the subgraph that is reachable from that node.

Strategies: Reachable rule weight (RW)

Associates with each node the number of nodes reachable from that node.
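
A possible reading of the two reachability weights, sketched in Python on a hypothetical graph. Two details are assumptions: a node counts as reachable only if at least one edge leads to it, and the reachable subgraph used for SW includes the start node's own outgoing edges.

```python
# Hypothetical RDG: each rule maps to the rules it can make fireable.
graph = {
    "R1": {"R2"},
    "R2": {"R3"},
    "R3": {"R2"},
    "R4": set(),
}


def reachable(graph, start):
    """Nodes reachable from `start` by following at least one edge."""
    seen = set()
    stack = list(graph[start])
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(graph[node])
    return seen


def reachable_rule_weight(graph):
    """RW: number of nodes reachable from each node."""
    return {node: len(reachable(graph, node)) for node in graph}


def reachable_subgraph_weight(graph):
    """SW: number of edges in the subgraph reachable from each node."""
    weights = {}
    for node in graph:
        nodes = reachable(graph, node) | {node}
        weights[node] = sum(len(graph[m]) for m in nodes)
    return weights


rw = reachable_rule_weight(graph)
sw = reachable_subgraph_weight(graph)
```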

Strategies: Reachable subgraph weight with attenuation (SWA)

The further rule1 is from rule2 in the RDG, the less likely rule1 will make rule2 fireable.

This weight function reflects the attenuation of dependency with distance.
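
The slides do not give the attenuation formula, so the sketch below is only one plausible form, not the one used in the experiments: every edge in the reachable subgraph contributes `alpha ** d`, where `d` is the shortest distance from the start node to the edge's source. The factor `alpha = 0.5`, the graph and the function name are all assumptions.

```python
from collections import deque

# Hypothetical RDG: each rule maps to the rules it can make fireable.
graph = {
    "R1": {"R2"},
    "R2": {"R3"},
    "R3": {"R2"},
    "R4": set(),
}


def attenuated_weight(graph, start, alpha=0.5):
    """Sum of alpha ** distance over all edges reachable from `start` (assumed formula)."""
    distance = {start: 0}
    queue = deque([start])
    while queue:                      # breadth-first search gives shortest distances
        node = queue.popleft()
        for nxt in graph[node]:
            if nxt not in distance:
                distance[nxt] = distance[node] + 1
                queue.append(nxt)
    return sum(len(graph[node]) * alpha ** d for node, d in distance.items())


swa = {node: attenuated_weight(graph, node) for node in graph}
```

With `alpha = 1` this reduces to the plain reachable subgraph weight, so the attenuation factor is the only difference between SW and SWA in this sketch.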

Strategies: Cyclomatic complexity weight (CCW)

Counts the number of cycles + 1 in a given subgraph.
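
One way to compute such a weight is McCabe's formula E - N + 2 applied to the subgraph reachable from a node, which equals the number of independent cycles plus one for a connected graph. Whether the project counted cycles this way is not stated, so treat the sketch below, including its graph and function names, as an assumption.

```python
# Hypothetical RDG: each rule maps to the rules it can make fireable.
graph = {
    "R1": {"R2"},
    "R2": {"R3"},
    "R3": {"R2"},
    "R4": set(),
}


def reachable_subgraph(graph, start):
    """Nodes of the subgraph reachable from `start`, including `start` itself."""
    nodes, stack = {start}, [start]
    while stack:
        for nxt in graph[stack.pop()]:
            if nxt not in nodes:
                nodes.add(nxt)
                stack.append(nxt)
    return nodes


def cyclomatic_weight(graph, start):
    """E - N + 2 on the reachable subgraph (independent cycles + 1)."""
    nodes = reachable_subgraph(graph, start)
    edges = sum(len(graph[n]) for n in nodes)   # every edge stays inside the subgraph
    return edges - len(nodes) + 2


ccw = {node: cyclomatic_weight(graph, node) for node in graph}
```

Note that E - N + 1 counts independent cycles of the underlying undirected graph, so two parallel directed paths between the same rules would also count as a cycle here.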

Experiment 1: Experimenting with strategies

Finding the rule execution order for each strategy

The rule orders calculated by these different strategies are provided below.

Strategies SW, RW and CCW produce the same order of applying rules, so I will represent them by SW in my experiments.

IW 14 17 16 13 20 7 15 6 10 18 12 11 8 3 4 9 5 2 1 19 22 21

OW 3 4 8 9 12 20 18 19 15 21 22 13 14 11 10 17 16 2 1 7 6 5

SW 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22

RW 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22

SWA 3 4 8 9 12 20 19 21 22 15 18 11 10 13 14 17 16 2 1 7 6 5

CCW 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22

Reasoning time per Rule for each strategy

Execution time for the four strategies on a store with 100000 triples.

[Figure: Execution time for OWL rules in ms, per rule, for IW, OW, SW and SWA]

Reasoning time per store for all strategies

[Figure: Reasoning time per store in ms with different strategies; stores: 1K, 10K, 100K; series: IW, OW, SW, SWA]

Second Experiment: Using a much larger store

[Figure: Performance comparison for all strategies in different stores; stores: 1K, 10K, 100K, 1000k; series: IW, OW, SW, SWA]

Performance comparison on a 1000k store

We notice that SWA is now the best option for larger stores

[Figure: Execution time on the 1000k store for IW, OW, SW and SWA]

Execution time improvement

[Figure: Execution time improvement on a 1000k store (compared to IW); plotted values: 1.0416, 1.0857, 1.1089]

Third Experiment: Optimizing the strategies

Dynamic Approach algorithm

• All the dynamic strategies seem to do better than the normal ones.

• SW is no longer the best strategy to use; in this case SWA dynamic is best, followed by OW dynamic.

[Figure: Comparison of classical and dynamic strategies on a 1K store; series: IW, IW Dynamic, OW, OW Dynamic, SW, SW Dynamic, SWA, SWA Dynamic]

Most dynamic strategies do better than the normal ones except SW dynamic.

OW dynamic and SWA dynamic seem to be the best strategies to use for this size of database.


The two dynamic strategies OW and SWA do slightly worse than SW.

=> I cannot generalize and say that dynamic strategies will always do better than classical ones.

[Figure: Comparison of classical and dynamic strategies on 100K triples stores]

All the dynamic strategies perform better than the normal ones.

OW dynamic and SWA dynamic do a lot better than any other strategy.


Execution time improvement

[Figure: Execution time improvement on a 1000k store (compared to IW); series labels: IW Dynamic, SW Dynamic, OW Dynamic, SWA Dynamic; plotted values: 1.0416, 1.0857, 1.1089, 1.3672, 2.3717, 51.7776, 54.3297]

[Figure: Comparison of classical and dynamic strategies on different stores (1K, 10K, 100K, 1000k); legend entries recovered: SWA Dynamic, OW Dynamic, SW Dynamic, IW Dynamic]

=> Use SW for stores smaller than 1000k, and use SWA dynamic for stores of 1000k and more.

Conclusion

• Application of the materialization process in OWL databases for a subset of the OWL RL profile
• Support of rule subsets (scalability)
• Demonstration of the impact of rule order on the materialization process
• Improvement of the materialization process by 54.32%

Future work

• Rule dependencies in the database used could have a significant impact on the performance of the materialization process.
◦ Add a new metric to estimate the complexity between rules in the dependency graph.
• Study the impact of the new metric on the materialization process.
• Study the performance of my approach on other OWL profiles.

Problems encountered

• Time (weeks of execution)
• Computing power (server)
• Application compatibility with Linux
• Looking for a larger OWL database
• No Sesame parser for N-Quads (new database)
• Errors in the new database (several crashes)

References

• M. El Koutbi, A. Salah, I. Khriss (2012). Strategies for Applying Rules in OWL Entailment Regimes.
• G. Antoniou, F. van Harmelen (2003). A Semantic Web Primer. Massachusetts Institute of Technology.
• D. Fensel et al. (n.d.). Semantic Web Application Areas. Retrieved September 12th from ebscohost.
• D. Fensel et al. (2002). On-To-Knowledge: Semantic Web Enabled Knowledge Management. Retrieved September 19th from ebscohost.
• J. Heflin (n.d.). An Introduction to the OWL Web Ontology Language. Retrieved October 12th from ebscohost.
• F. Baader et al. (2002). The Description Logic Handbook: Theory, Implementation and Applications. Cambridge: Cambridge University Press.
• P. Patel-Schneider, I. Horrocks, F. van Harmelen (2002). Reviewing the Design of DAML+OIL: An Ontology Language for the Semantic Web. In Proceedings of the 18th National Conference on Artificial Intelligence (AAAI02). <http://www.cs.vu.nl/frankh/abstracts/AAAI02.html>.
• O. Lassila, R. R. Swick (1999). Resource Description Framework (RDF) Model and Syntax Specification. W3C Recommendation of 22 February 1999. Retrieved October 30th.
• C. Bizer, T. Heath, T. Berners-Lee (2009). Linked Data - The Story So Far. International Journal on Semantic Web and Information Systems, 5(3):1–22.
• [Sparql, 2012] SPARQL 1.1 Query Language. http://www.w3.org/TR/sparql11-query/
• G. Ianni, T. Krennwallner, A. Martello, A. Polleres (2009). A Rule System for Querying Persistent RDFS Data. ESWC 2009: 857-862.
• J. Urbani, S. Kotoulas, J. Maassen, F. van Harmelen, H. Bal (2010). OWL reasoning with WebPIE: calculating the closure of 100 billion triples. In Proceedings of the ESWC '10.
• Almendros-Jiménez (2011). A Prolog Library for OWL RL. In Proceedings of the Logic in Databases, LID’2011, EDBT/VLDB. ACM.
• B. Bishop, S. Bojanov (2011). Implementing OWL 2 RL and OWL 2 QL rule-sets for OWLIM. In M. Dumontier, M. Courtot, Proc. of the OWL: Experiences and Directions Workshop (OWLED 2011), Volume 796 of CEUR WS Proceedings.
• M. Krötzsch, A. Mehdi, S. Rudolph (2010). Orel: Database-Driven Reasoning for OWL 2 Profiles. Description Logics, Ontario, Canada.
• Hogan et al. (2011). Scalable OWL 2 Reasoning for Linked Data. Reasoning Web: 250-325. Galway, Ireland.
