Chang Liu 1, Guilin Qi 2 1 Shanghai Jiao Tong University 2 Southeast University, China.

Date post: 14-Dec-2015
Toward Scalable Reasoning over Annotated RDF Data Using MapReduce

Chang Liu (1), Guilin Qi (2)

(1) Shanghai Jiao Tong University
(2) Southeast University, China

Motivation

- Growing interest in representing additional information on top of RDF
  - Time, uncertainty, trust, and provenance
  - => Annotated RDF
- Large amounts of data
  - YAGO2
- Problem: large-scale reasoning

Motivation (cont’d)

- Recent work on scalable reasoning using MapReduce
  - WebPIE (ISWC '09, ESWC '10)
  - Fuzzy pD* (ISWC '11)
- Our idea
  - Large-scale annotated RDF reasoner using MapReduce

Background: Annotated RDF

- Syntax:
- Deductive rules: Subproperty, Subclass, Domain, Range, Generalization
- Example: Subproperty (a)

Zimmermann et al.: A general framework for representing, reasoning and querying with annotated Semantic Web data. Journal of Web Semantics 11, 72-95 (2012)

Background: MapReduce

Naïve Implementation

Subproperty (a): (X, P, Y): λ_1, (P, sp, Q): λ_2 ⇒ (X, Q, Y): λ_1 ⊗ λ_2

[Figure: three mappers feeding three reducers]

Key   Value
P     1, X, Y
P     2, Q
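
The key/value layout above can be sketched in plain Python without a Hadoop cluster. This is only an illustration, not the authors' code: the in-memory `run` function stands in for the shuffle phase, the annotation is carried inside the value, and `combine` assumes a fuzzy annotation domain in which ⊗ is the minimum of two degrees. All function names are hypothetical.

```python
from collections import defaultdict

SP = "sp"  # subproperty predicate, written as on the slide


def combine(a, b):
    # Assumption: fuzzy annotations, with the operator ⊗ taken as min.
    return min(a, b)


def map_triple(triple, annotation):
    """Emit (key, value) pairs keyed on the property P."""
    s, p, o = triple
    if p == SP:
        # Schema triple (P, sp, Q): key P, value flagged with 2.
        yield s, (2, o, annotation)
    else:
        # Instance triple (X, P, Y): key P, value flagged with 1.
        yield p, (1, s, o, annotation)


def reduce_property(key, values):
    """Join instance and schema values that share the key P."""
    instances = [v[1:] for v in values if v[0] == 1]
    supers = [v[1:] for v in values if v[0] == 2]
    for x, y, ann1 in instances:
        for q, ann2 in supers:
            yield (x, q, y), combine(ann1, ann2)


def run(annotated_triples):
    """Simulate map, shuffle, and reduce in memory."""
    groups = defaultdict(list)
    for triple, annotation in annotated_triples:
        for key, value in map_triple(triple, annotation):
            groups[key].append(value)
    derived = []
    for key, values in groups.items():
        derived.extend(reduce_property(key, values))
    return derived
```

Calling `run` on an instance triple and a matching `sp` triple yields one derived triple whose annotation is the combination of the two inputs; triples whose property has no schema entry produce nothing.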

Challenges and solutions

Generalization Rule
- Deletes triples from the data set
- Large data reconstruction cost

Solution
- Only perform it at the beginning and at the end
- Combine the Generalization Rule with other rules
  - E.g. when a reducer generates the same triple with two different annotations, it generates a single triple with the combined annotation instead.
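
One way to fold the generalization step into a reducer is to merge duplicates before emitting. The sketch below is hypothetical and assumes fuzzy annotations, where merging two annotations of the same triple keeps the larger degree; other annotation domains would need a different `join`.

```python
def join(a, b):
    # Assumption: fuzzy degrees, so generalization keeps the larger value.
    return max(a, b)


def emit_generalized(derived):
    """Collapse duplicate triples produced within one reducer call.

    `derived` is an iterable of (triple, annotation) pairs. Each distinct
    triple is returned once, carrying the join of all its annotations.
    """
    best = {}
    for triple, annotation in derived:
        if triple in best:
            best[triple] = join(best[triple], annotation)
        else:
            best[triple] = annotation
    return list(best.items())
```

Because duplicates never leave the reducer, no separate pass is needed to delete the weaker copies from the data set between iterations.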

Challenges and solutions (cont’d)

Unnecessary Derivation
- E.g.
- Wastes a lot of computation time

Solution
- Incorporate the annotation into the mapped key
- E.g.
  - (s, p, o): [1,2] maps to ((t1, p), (1, s, o, [1,2]))
  - (p, sp, q): [3,4] maps to ((t3, p), (2, q, [3,4]))
  - They will not be grouped together!
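
The effect of putting the annotation into the key can be illustrated with temporal intervals. The slides do not say how the key components t1 and t3 are computed, so the bucketing below (one slot per integer time point covered by the interval) is purely an assumption, as are the function names.

```python
SP = "sp"  # subproperty predicate, written as on the slide


def time_slots(interval):
    # Hypothetical bucketing: one slot per integer time point in [start, end].
    start, end = interval
    return range(start, end + 1)


def map_temporal(triple, interval):
    """Emit ((slot, P), value) pairs so that only overlapping intervals meet."""
    s, p, o = triple
    for t in time_slots(interval):
        if p == SP:
            # Schema triple (P, sp, Q): key (t, P), value flagged with 2.
            yield (t, s), (2, o, interval)
        else:
            # Instance triple (X, P, Y): key (t, P), value flagged with 1.
            yield (t, p), (1, s, o, interval)


def shared_keys(pairs_a, pairs_b):
    """Keys under which the two map outputs would reach the same reducer."""
    return {k for k, _ in pairs_a} & {k for k, _ in pairs_b}
```

With this bucketing, a triple annotated [1,2] and a schema triple annotated [3,4] share no key, so no reducer ever tries to join them. Overlapping intervals can share more than one slot, so a real implementation would also need to remove the resulting duplicate derivations.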

Challenges and solutions (cont’d)

Fixpoint Calculation
- Subproperty/subclass rules require fixpoint iteration

Solution
- Load subproperty/subclass schema triples into memory
- Calculate the closure
  - Shortest path calculation
  - Floyd-Warshall style algorithm

(x_1, sp, x_2): λ_1, (x_2, sp, x_3): λ_2, …, (x_n, sp, x_{n+1}): λ_n ⇒ (x_1, sp, x_{n+1}): λ_1 ⊗ … ⊗ λ_n

[Figure: path x_1 → x_2 → … → x_{n+1}, labelled "Shortest" path]
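
A Floyd-Warshall style closure over the in-memory schema triples might look like the following sketch. It assumes fuzzy annotations, so the annotation along a path is the minimum of its edge degrees and competing paths are merged with the maximum; both operators are parameters, and the function name is hypothetical.

```python
def sp_closure(edges, combine=min, join=max):
    """Compute the annotated transitive closure of `sp` (or `sc`) edges.

    `edges` maps (x, y) to the annotation of the triple (x, sp, y).
    `combine` plays the role of ⊗ along a path; `join` merges alternative
    paths between the same pair. Both defaults assume fuzzy degrees.
    """
    nodes = sorted({n for pair in edges for n in pair})
    best = dict(edges)
    for k in nodes:  # allow k as an intermediate node
        for i in nodes:
            if (i, k) not in best:
                continue
            for j in nodes:
                if (k, j) not in best:
                    continue
                via_k = combine(best[(i, k)], best[(k, j)])
                if (i, j) in best:
                    best[(i, j)] = join(best[(i, j)], via_k)
                else:
                    best[(i, j)] = via_k
    return best
```

For example, given edges a→b (0.9), b→c (0.6), a→c (0.5), and c→d (0.8), the closure raises (a, sp, c) to 0.6 via b and adds (a, sp, d) and (b, sp, d) with degree 0.6. Computing this once in memory avoids running a MapReduce job per fixpoint iteration for the schema triples.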

Experiment setup

Dataset
- Fuzzified DBPedia core ontology
- fpdLUBM 1000, 2000, 4000, 8000

Cluster
- 25 machines with 75 mapper/reducer slots

Liu et al.: Reasoning with Large Scale Ontologies in Fuzzy pD* Using MapReduce. Computational Intelligence Magazine, IEEE 7(2), 54-66 (2012)

Experiment result - fuzzy DBPedia

Dataset: fuzzified DBPedia core ontology

Results:

#units        128      64       32       16       8        4        2
Time (sec.)   122.653  136.861  146.393  170.859  282.802  446.917  822.269
Speedup       6.70     6.01     5.62     4.81     2.91     1.84     1.00
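
The speedup figures appear to be the running time on the smallest configuration (2 units) divided by the running time on each larger one. A short check under that assumption, with a hypothetical helper name:

```python
def speedups(times_by_units):
    """Speedup relative to the run with the fewest units, rounded to 2 places."""
    baseline = times_by_units[min(times_by_units)]
    return {units: round(baseline / t, 2) for units, t in times_by_units.items()}


# Running times in seconds, copied from the fuzzy DBPedia results.
dbpedia_seconds = {
    128: 122.653, 64: 136.861, 32: 146.393, 16: 170.859,
    8: 282.802, 4: 446.917, 2: 822.269,
}

print(speedups(dbpedia_seconds))
```

The computed values agree with the reported speedup figures, which shows that going from 2 to 128 units (64 times as many) shortens the run by a factor of about 6.7.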

Experiment result - fpdLUBM

Experimental results of FuzzyPD and WebPIE

Number of universities   Time of FuzzyPD (minutes)   Time of WebPIE (minutes)
1000                     38.8                        41.32
2000                     66.97                       74.57
4000                     110.40                      130.87
8000                     215.48                      210.01

Experiment result - fpdLUBM (cont’d)

Scalability over number of units

Number of units   Time (minutes)   Speedup
128               38.80            4.01
64                53.15            2.93
32                91.58            1.70
16                155.47           1.00

Experiment result - fpdLUBM (cont’d)

[Figure: scalability over number of units]

Experiment result - fpdLUBM (cont’d)

Scalability over data volume

Number of universities   Input (Mtriples)   Output (Mtriples)   Time (minutes)   Throughput (Ktriples/second)
1000                     155.51             92.01               38.8             39.52
2000                     310.71             185.97              66.97            46.28
4000                     621.46             380.06              110.40           57.37
8000                     1243.20            792.54              215.50           61.29
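
The slides do not define throughput, but the reported figures are consistent with output triples divided by elapsed seconds. The check below rests on that reading; the helper name is hypothetical.

```python
def throughput_ktriples_per_sec(output_mtriples, minutes):
    """Thousands of output triples produced per second of running time."""
    return output_mtriples * 1000 / (minutes * 60)


# (universities, output Mtriples, minutes, reported Ktriples/second)
rows = [
    (1000, 92.01, 38.8, 39.52),
    (2000, 185.97, 66.97, 46.28),
    (4000, 380.06, 110.40, 57.37),
    (8000, 792.54, 215.50, 61.29),
]

for universities, output, minutes, reported in rows:
    computed = throughput_ktriples_per_sec(output, minutes)
    print(universities, round(computed, 2), reported)
```

Each computed value lies within 0.01 of the reported one. Throughput rises with data volume, which suggests that fixed per-job overhead is amortized better on larger inputs.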

Conclusion and Future work

- We show how to design MapReduce algorithms to achieve scalable annotated RDFS reasoning
- Several challenges along with solutions
- Future work
  - More experiments on annotated RDFS ontologies
  - Annotated OWL 2 RL

Q&A

