Toward Scalable Reasoning over Annotated RDF Data Using MapReduce
Chang Liu (1), Guilin Qi (2)
(1) Shanghai Jiao Tong University; (2) Southeast University, China
Motivation
  Growing interest in representing additional information on top of RDF
    Time, uncertainty, trust, and provenance => Annotated RDF
  Large amounts of data
    YAGO2
  Problem: large-scale reasoning
Motivation (cont’d)
  Recent work on scalable reasoning using MapReduce
    WebPIE (ISWC ‘09, ESWC ‘10)
    Fuzzy pD* (ISWC ‘11)
  Our idea
    Large-scale annotated RDF reasoner using MapReduce
Background: Annotated RDF
  Syntax:
  Deductive rules:
    Subproperty, Subclass, Domain, Range, Generalization
  Example:
    Subproperty (a)
  Zimmermann et al.: A general framework for representing, reasoning and querying with annotated Semantic Web data. Journal of Web Semantics 11, 72-95 (2012)
Background: MapReduce
Naïve Implementation
  Subproperty (a)
  Figure: three mappers feeding three reducers.
    Inputs: (X, P, Y) : and (P, sp, Q) :
    Output: (X, Q, Y) :
  Key   Value
  P     (1, X, Y)
        (2, Q)
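The key/value layout above can be sketched as a small in-memory simulation. This is a minimal sketch, not the authors' implementation: the helper names (`map_triple`, `reduce_group`, `run`), the sample triples, and the choice of fuzzy degrees combined with `min` as the annotation operator are all assumptions made for illustration.

```python
from collections import defaultdict


def otimes(a, b):
    # Assumption: annotations are fuzzy degrees and the combination
    # operator is min. Other annotation domains use other operators.
    return min(a, b)


def map_triple(triple):
    """Emit (key, value) pairs following the Key/Value layout of the slide."""
    (s, p, o), ann = triple
    if p == "sp":
        # Schema triple (P, sp, Q): key on P, flag 2, carry Q.
        yield s, (2, o, ann)
    else:
        # Data triple (X, P, Y): key on P, flag 1, carry X and Y.
        yield p, (1, s, o, ann)


def reduce_group(values):
    """Join data triples with schema triples that share the same key P."""
    data = [v[1:] for v in values if v[0] == 1]
    supers = [v[1:] for v in values if v[0] == 2]
    for s, o, a1 in data:
        for q, a2 in supers:
            yield (s, q, o), otimes(a1, a2)


def run(triples):
    """Simulate the shuffle phase with a dictionary, then reduce each group."""
    groups = defaultdict(list)
    for t in triples:
        for key, value in map_triple(t):
            groups[key].append(value)
    derived = []
    for values in groups.values():
        derived.extend(reduce_group(values))
    return derived


triples = [
    (("alice", "worksAt", "acme"), 0.9),
    (("worksAt", "sp", "affiliatedWith"), 0.8),
]
derived = run(triples)
```

Both input triples are mapped to the key `worksAt`, so one reducer sees them together and can derive the `affiliatedWith` triple.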
Challenges and solutions
  Generalization Rule
    Deletes triples from the data set
    Large data reconstruction cost
  Solution
    Only perform it at the beginning and at the end
    Combine the Generalization Rule with other rules
      E.g. when a reducer generates two annotated statements for the same triple, it emits a single combined statement instead.
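Folding the Generalization Rule into a reducer might look like the sketch below. It assumes fuzzy degrees with `max` as the operator that combines two annotations of the same triple; the function name `merge_generalization` and the sample data are hypothetical.

```python
def oplus(a, b):
    # Assumption: fuzzy degrees, where two annotations of the same
    # triple are combined by taking the larger one.
    return max(a, b)


def merge_generalization(derived):
    """Collapse annotated statements about the same triple before emitting."""
    best = {}
    for triple, ann in derived:
        best[triple] = oplus(best[triple], ann) if triple in best else ann
    return list(best.items())


merged = merge_generalization([
    (("a", "p", "b"), 0.6),
    (("a", "p", "b"), 0.8),
    (("c", "p", "d"), 0.5),
])
```

Merging inside the reducer means the weaker statement is never written out, so no separate deletion pass over the data set is needed for these triples.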
Challenges and solutions (cont’d)
  Unnecessary Derivation
    Wastes a lot of computation time
  Solution
    Incorporate the annotation into the mapped key
    E.g.
      Map to ((t1, p), (1, s, o, [1,2]))
      Map to ((t3, p), (2, q, [3,4]))
      They will not be grouped together!
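One way to read this idea is sketched below for temporal annotations. The slides do not say how the key components t1 and t3 are derived from the intervals, so this sketch assumes one key per integer time point covered by the interval; that keying scheme, the function names, and the sample triples are assumptions for illustration only.

```python
from collections import defaultdict


def map_temporal(triple):
    """Emit one (key, value) pair per time point in the triple's interval."""
    (s, p, o), (start, end) = triple
    for t in range(start, end + 1):
        if p == "sp":
            # Schema triple (P, sp, Q): key on (time point, P).
            yield (t, s), (2, o, (start, end))
        else:
            # Data triple (S, P, O): key on (time point, P).
            yield (t, p), (1, s, o, (start, end))


def group(triples):
    """Simulate the shuffle phase: collect values per key."""
    groups = defaultdict(list)
    for triple in triples:
        for key, value in map_temporal(triple):
            groups[key].append(value)
    return groups


groups = group([
    (("s", "p", "o"), (1, 2)),    # data triple valid during [1,2]
    (("p", "sp", "q"), (3, 4)),   # schema triple valid during [3,4]
])

# Keys under which a data triple and a schema triple actually meet.
shared = [k for k, vs in groups.items() if len({v[0] for v in vs}) == 2]
```

Because the two intervals are disjoint, no key receives both a data value and a schema value, and no reducer attempts the useless join.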
Challenges and solutions (cont’d)
  Fixpoint Calculation
    Subproperty/subclass rules require fixpoint iteration
  Solution
    Load subproperty/subclass schema triples into memory
    Calculate the closure
      Shortest path calculation
      Floyd-Warshall style algorithm

$$(x_1, \mathrm{sp}, x_2) : \lambda_1,\ (x_2, \mathrm{sp}, x_3) : \lambda_2,\ \ldots,\ (x_n, \mathrm{sp}, x_{n+1}) : \lambda_n \;\Rightarrow\; (x_1, \mathrm{sp}, x_{n+1}) : \lambda_1 \otimes \cdots \otimes \lambda_n$$

  Figure: the chain x_1 -> x_2 -> ... -> x_{n+1} viewed as a “shortest” path
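A Floyd-Warshall style closure over in-memory subproperty edges could be sketched as follows. The sketch assumes fuzzy annotations, with `min` combining annotations along a path and `max` choosing between alternative paths; the function name `sp_closure` and the sample edges are hypothetical, and the slides do not show the authors' actual code.

```python
def sp_closure(edges, otimes=min, oplus=max):
    """Compute the annotated closure of a subproperty graph.

    edges maps (x, y) to the annotation of the triple (x, sp, y).
    otimes combines annotations along a path; oplus picks between
    alternative paths. Both defaults are assumptions (fuzzy degrees).
    """
    nodes = sorted({x for pair in edges for x in pair})
    best = dict(edges)
    for k in nodes:              # intermediate node
        for i in nodes:
            if (i, k) not in best:
                continue
            for j in nodes:
                if (k, j) not in best:
                    continue
                via = otimes(best[(i, k)], best[(k, j)])
                if (i, j) in best:
                    best[(i, j)] = oplus(best[(i, j)], via)
                else:
                    best[(i, j)] = via
    return best


closure = sp_closure({
    ("a", "b"): 0.9,
    ("b", "c"): 0.7,
    ("a", "c"): 0.5,
})
```

Here the indirect path from `a` to `c` through `b` carries the annotation min(0.9, 0.7) = 0.7, which beats the direct edge annotated 0.5, so the closure keeps 0.7 for `(a, sp, c)`.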
Experiment setup
  Dataset
    Fuzzified DBPedia core ontology
    fpdLUBM 1000, 2000, 4000, 8000
  Cluster
    25 machines with 75 mapper/reducer slots
  Liu et al.: Reasoning with Large Scale Ontologies in Fuzzy pD* Using MapReduce. Computational Intelligence Magazine, IEEE 7(2), 54-66 (2012)
Experiment result - fuzzy DBPedia
Dataset: fuzzified DBPedia core ontology
Results:
  #units   Time (sec.)   Speedup
  128      122.653       6.70
  64       136.861       6.01
  32       146.393       5.62
  16       170.859       4.81
  8        282.802       2.91
  4        446.917       1.84
  2        822.269       1.00
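The reported speedups are consistent with dividing the 2-unit running time by the running time at each unit count. The short check below recomputes them from the times in the table; the variable names are illustrative.

```python
# Running times in seconds, keyed by number of units (from the table).
times = {
    128: 122.653,
    64: 136.861,
    32: 146.393,
    16: 170.859,
    8: 282.802,
    4: 446.917,
    2: 822.269,
}

# Speedup relative to the smallest configuration (2 units).
baseline = times[2]
speedup = {units: round(baseline / t, 2) for units, t in times.items()}
```

Going from 2 to 128 units multiplies the resources by 64 but yields a speedup of about 6.7, so the gain on this small dataset is far from linear.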
Experiment result – fpdLUBM
  Number of universities   Time of FuzzyPD (minutes)   Time of WebPIE (minutes)
  1000                     38.8                        41.32
  2000                     66.97                       74.57
  4000                     110.40                      130.87
  8000                     215.48                      210.01
Experimental results of FuzzyPD and WebPIE
Experiment result – fpdLUBM (cont’d)
  Number of units   Time (minutes)   Speedup
  128               38.80            4.01
  64                53.15            2.93
  32                91.58            1.70
  16                155.47           1.00
Scalability over number of units
Experiment result – fpdLUBM (cont’d)
  Number of universities   Input (Mtriples)   Output (Mtriples)   Time (minutes)   Throughput (Ktriples/second)
  1000                     155.51             92.01               38.8             39.52
  2000                     310.71             185.97              66.97            46.28
  4000                     621.46             380.06              110.40           57.37
  8000                     1243.20            792.54              215.50           61.29
Scalability over data volume
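The throughput column appears to be derived from the output size rather than the input size: dividing output triples by the running time in seconds reproduces the reported figures to within rounding. The check below uses the table values; the function name is illustrative.

```python
# (universities, output in Mtriples, time in minutes, reported Ktriples/s)
rows = [
    (1000, 92.01, 38.8, 39.52),
    (2000, 185.97, 66.97, 46.28),
    (4000, 380.06, 110.40, 57.37),
    (8000, 792.54, 215.50, 61.29),
]


def throughput_ktriples_per_sec(output_mtriples, minutes):
    """Convert Mtriples per run into Ktriples per second."""
    return output_mtriples * 1000 / (minutes * 60)


computed = {
    n: throughput_ktriples_per_sec(out, minutes)
    for n, out, minutes, _ in rows
}
```

Throughput rises with the data volume, which suggests that fixed per-job overhead is amortized better on the larger inputs.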
Conclusion and Future work
  We show how to design MapReduce algorithms to achieve scalable annotated RDFS reasoning
    Several challenges along with solutions
  Future work
    More experiments on annotated RDFS ontologies
    Annotated OWL 2 RL
Q&A